asynchronous queries to simpledb

stevel
3 posts

Your product looks great. So far it's the only thing I've found that does ORM for SimpleDB, which could save us some time. I'm wondering whether your code can fetch from SDB concurrently using asynchronous web requests, or whether it chains round trips to SimpleDB synchronously. For example, if I want 10,000 records it might be best to run 4 async range queries of 2500 (the current SDB limit). Amazon recommends this kind of pattern (performing the sort in the app).

I realize that there are scenarios where you want a query to be synchronous too. But of course there are also times when you want it to be async.

This isn't necessarily a "make or break" thing for us, but it would be nice if you could choose to go async if you wanted to.

Looking forward to your thoughts on this.

Steve

Posted on Sep 18 2009

ivan
5,431 posts

At the moment these chained queries are synchronous because we depend on the NextToken from one query to initialise the next one. I believe that using SELECT COUNT you can get the NextTokens more quickly (and can then use these to perform the actual downloads concurrently) but we have not yet implemented support for this, and we would still need to do four SELECT COUNT queries synchronously in order to obtain the required NextTokens, so we're not sure whether this would really be a worthwhile optimisation. As far as I have been able to determine, there isn't currently a way to jump directly to an offset (e.g. we could not concurrently issue SELECT TOP 2500 OFFSET 0, SELECT TOP 2500 OFFSET 2500, SELECT TOP 2500 OFFSET 5000...).
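To make the dependency concrete, here is a minimal sketch of token-chained paging. It is written in Python purely for illustration and is not LightSpeed code; `run_select` is a hypothetical callable that returns one page of rows plus the next token (or `None` on the last page), and `fake_select` is an in-memory stand-in for SimpleDB.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def fetch_all(run_select: Callable[[str, Optional[str]], Page], query: str) -> List[dict]:
    """Follow the token chain one page at a time.

    Each request needs the token returned by the previous one,
    so the requests cannot be issued concurrently.
    """
    rows: List[dict] = []
    token: Optional[str] = None
    while True:
        page, token = run_select(query, token)
        rows.extend(page)
        if token is None:
            return rows


def fake_select(query: str, token: Optional[str], _size: int = 6, _page: int = 2) -> Page:
    """Hypothetical stand-in for the service: serves 6 rows, 2 per page."""
    start = int(token) if token is not None else 0
    end = min(start + _page, _size)
    rows = [{"id": i} for i in range(start, end)]
    return rows, (str(end) if end < _size else None)
```

Calling `fetch_all(fake_select, "select * from t")` walks three pages in strict sequence, which is the round-trip chaining described above.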

If we're missing something that would allow us to do this more efficiently then we'd be keen to hear about it because I know we have other customers also performing multi-thousand-record queries -- let us know!

Posted on Sep 18 2009

stevel
3 posts

I'm not sure about your idea to use the NextToken from select count queries (I didn't know count returned a next token, but apparently it does). It might be worth testing.

I think I'm suggesting that it's possible to plan your application so that data can be broken up into meaningful chunks (ranges of 2500, for example). I'm not sure how many people are doing this. I know we are. In our case we have a row that gets generated every minute for every customer, so if I want a month of data I can easily break my queries up into a set of time ranges. I will run these concurrently and then sort the results when all my queries are done. I don't need to know anything about the NextToken in this case, but I do need to know how many records will be returned in my chunks for it to work.
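The pattern described here might look roughly like the following sketch. It is Python for illustration only, not LightSpeed or SimpleDB API code; `fetch_window` is a hypothetical callable that runs one range query, and `fake_fetch_window` is a stand-in that produces one row per minute.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Return half-open (lo, hi) windows that cover [start, end) without overlap."""
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def fetch_sorted(fetch_window, windows, max_workers=4):
    """Fetch every window on a thread pool, then merge and sort in the application."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(lambda w: fetch_window(*w), windows))
    rows = [row for chunk in chunks for row in chunk]
    rows.sort(key=lambda row: row["timestamp"])
    return rows


def fake_fetch_window(lo, hi):
    """Hypothetical stand-in for one range query: one row per minute in [lo, hi)."""
    minutes = int((hi - lo).total_seconds() // 60)
    return [{"timestamp": lo + timedelta(minutes=i)} for i in range(minutes)]
```

At one row per minute, a one-day window holds 1,440 rows, which stays under the 2500 limit, so `split_range(start, end, timedelta(days=1))` would be one reasonable choice for a single customer's data.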

Let me know what you think about this.

Posted on Sep 19 2009

ivan
5,431 posts

Ah, I see what you mean. You are not expecting LightSpeed to automatically issue multiple queries: you just want a way to issue multiple queries concurrently. You will use your knowledge of the application data to compose those queries.

However, I'm not sure I understand why you need to know how many records will be returned in your chunks. Is it that you are trying to avoid having any one query overrun the 2500 record limit? If so I think you don't actually need to know how many records will be returned from each time range, but whether a given time range will exceed 2500 and if so what a "safe" time range would be. I don't think this will be possible -- you could issue COUNT queries and use some sort of binary chop for anything that returns over 2500, but this could result in a sequence of up-front queries (suppose one of the time ranges has 10000+ records in a skewed distribution and therefore requires multiple binary chops to get each chunk down below 2500) which kinda defeats the purpose of kicking off all the queries in parallel in the first place. Maybe I've misunderstood why you need the record count here -- let me know!
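A rough sketch of the binary chop idea, in Python for illustration only: `count_rows(lo, hi)` is a hypothetical callable standing in for a COUNT query over the half-open key range [lo, hi), and `make_counter` is an in-memory stand-in that also records each call so the number of up-front round trips is visible.

```python
from bisect import bisect_left


def safe_ranges(count_rows, lo, hi, limit=2500):
    """Split [lo, hi) in half repeatedly until each piece holds at most `limit` rows.

    Every call to count_rows is one up-front round trip, made before
    any of the real data queries can start.
    """
    n = count_rows(lo, hi)
    if n <= limit or hi - lo <= 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (safe_ranges(count_rows, lo, mid, limit)
            + safe_ranges(count_rows, mid, hi, limit))


def make_counter(sorted_keys, calls):
    """Hypothetical stand-in for COUNT: counts keys in [lo, hi) and logs each call."""
    def count_rows(lo, hi):
        calls.append((lo, hi))
        return bisect_left(sorted_keys, hi) - bisect_left(sorted_keys, lo)
    return count_rows
```

With a skewed distribution (say, all rows bunched at one end of the key space), `calls` grows noticeably longer than the list of ranges returned, which illustrates the concern about a sequence of up-front queries.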

Ignoring the record count issue, I don't think LightSpeed will really be able to provide built-in support for "run multiple Finds simultaneously and return multiple result sets." However you could implement this using a BackgroundWorker or the thread pool. Because SimpleDB doesn't have a stateful database connection, there should be no problem running multiple requests in parallel through the same unit of work. I know this is not as efficient as a true async I/O request but given the other overheads of talking to SimpleDB I don't think this will be a huge issue. We could potentially create a lower-level FindAsync method, but this adds complexity to the programming model -- it adds yet another API to IUnitOfWork, and still leaves the developer adrift in the thread pool when it comes to handling the callbacks -- so we'd prefer not to unless the thread pool approach really does turn out to be unacceptably inefficient.
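For readers who want a picture of the thread pool approach, here is a minimal sketch in Python (illustration only; in .NET the equivalent would use BackgroundWorker or the thread pool as mentioned above). `find` is a hypothetical callable standing in for one synchronous query, and `find_many` is an invented helper name, not a LightSpeed API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_many(find, queries, max_workers=4):
    """Run find(query) for each query on a thread pool.

    Returns a dict mapping each query to its own result set.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(find, q): q for q in queries}
        for future in as_completed(futures):
            # result() re-raises any exception thrown inside find
            results[futures[future]] = future.result()
    return results
```

Each worker thread blocks on its own request, so this is not true async I/O, but the caller gets all result sets back from a single call.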

Needless to say we remain open to suggestions!

Posted on Sep 21 2009
