This thread has been locked and is no longer accepting new posts. If you have a question regarding this topic, please email us at support@mindscape.co.nz
I was wondering what your thoughts are on the indexability of "is null" in SimpleDB. Basically, we are worried that by turning on soft delete we have effectively introduced a performance hit. See the "Avoiding is null comparison" section of this article: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1394
We are open to implementing something like this if required. We would want to avoid an API change, so what we would probably do is store undeleted items with a magic date or a known invalid date string. This would be a breaking change for existing data, so it would be an option (possibly in the SimpleDB connection string), and you would have to run some sort of script to migrate existing items. However, this change is still non-trivial for us (I can't even promise it's possible), and we would need to understand what the impact is on your application. We would suggest that you run some tests to evaluate the performance impact, along the lines of:

1. Create a pair of domains and data sets, representative in size and volume of your application data sets, each including a "DeleteTime" attribute. (I'm deliberately avoiding the LightSpeed DeletedOn name here so that we can compare the queries more precisely.) Populate them with identical data, except that in one domain DeleteTime is set to null and in the other DeleteTime is set to, say, '0001-01-01T00:00:01' (for a representative subset of the items; in both cases you'll also want some items to have 'deleted' times).
2. On the first domain, run a Find<>(/* representative criteria */ && Entity.Attribute("DeleteTime") != null) and time the query. (By "representative criteria" I mean a query that is fairly typical of your application.)
3. On the second domain, run a Find<>(/* representative criteria */ && Entity.Attribute("DeleteTime") != new DateTime(1,1,1,0,0,1)) and time the query, using the same criteria.

This will give you some idea of how much performance difference you would realise if we were to change the implementation of soft delete. We can then take the discussion from there to determine whether the effort is worthwhile (and, to reiterate, it may not even be practically possible; I would need to dig through the code in more detail to be sure).
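A minimal sketch of step 1, written in Python rather than LightSpeed's C# so that it does not depend on any particular client library. It builds the two item sets in memory; uploading them is left to whatever SimpleDB client you use. The item names, the deletion timestamp and the `common_attrs` parameter are illustrative assumptions. One point worth keeping in mind: SimpleDB does not store nulls, so an attribute that "is null" is simply absent from the item.

```python
# Hypothetical "not deleted" marker, as suggested in step 1.
SENTINEL = "0001-01-01T00:00:01"


def build_paired_items(n_active, n_deleted, deleted_at="2010-01-01T00:00:00",
                       common_attrs=None):
    """Return (null_items, sentinel_items): dicts of item name -> attributes.

    In the first set, active items omit DeleteTime entirely (SimpleDB's
    notion of null). In the second set, active items carry SENTINEL.
    Deleted items get the same real timestamp in both sets.
    `common_attrs` stands in for your representative application attributes.
    """
    common_attrs = dict(common_attrs or {})
    null_items, sentinel_items = {}, {}
    for i in range(n_active):
        name = f"active-{i}"
        null_items[name] = dict(common_attrs)
        sentinel_items[name] = {**common_attrs, "DeleteTime": SENTINEL}
    for i in range(n_deleted):
        name = f"deleted-{i}"
        null_items[name] = {**common_attrs, "DeleteTime": deleted_at}
        sentinel_items[name] = {**common_attrs, "DeleteTime": deleted_at}
    return null_items, sentinel_items
```

Because SimpleDB compares attribute values as strings, ISO 8601 timestamps sort chronologically, and the sentinel sorts before any real deletion time.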
|
|
Er, in steps 2 and 3, those != comparisons should be ==, so that we get the non-deleted items rather than the deleted ones. Sorry!
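With that correction, the two timed queries come down to a pair of SimpleDB Select expressions of the kind you can paste into the Scratchpad. The helper below is a hypothetical sketch: the domain names and the `Status = 'open'` criterion are placeholders, and the exact expression LightSpeed generates may differ.

```python
SENTINEL = "0001-01-01T00:00:01"  # hypothetical "not deleted" marker


def active_items_select(domain, criteria, use_sentinel):
    """Build a SimpleDB Select expression for non-deleted items.

    `criteria` is a representative predicate from your application,
    e.g. "Status = 'open'". It is inserted verbatim, so only pass
    trusted, hand-written predicates. The DeleteTime filter uses
    equality (or 'is null'), so only non-deleted items are returned.
    """
    if use_sentinel:
        deleted_filter = f"DeleteTime = '{SENTINEL}'"
    else:
        deleted_filter = "DeleteTime is null"
    # Backticks quote the domain name in case it contains characters such as '-'.
    return f"select * from `{domain}` where {criteria} and {deleted_filter}"
```

For example, `active_items_select("orders-null", "Status = 'open'", False)` yields a query ending in `DeleteTime is null`, while passing `True` against the sentinel domain yields one ending in `DeleteTime = '0001-01-01T00:00:01'`.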
|
|
OK, I ran some tests on some test data. It is not our full amount of data, but it did produce some interesting results. It seems that response times are inconsistent for queries with 'is null' in them, but consistent for queries that compare against values. My guess is that SimpleDB uses an index for queries that do not contain 'is null', so value queries have consistent execution times whereas 'is null' queries do not. Also, it seems that the more predicates are put into the query, the smaller the difference between the two becomes. I used straight SimpleDB Scratchpad calls so I could see the response time, and I waited 10 minutes between tests so that the internal SimpleDB cache expired between executions. Test data: 737 active items and 737 deleted items. If you need us to, we can populate a larger set of test records this week; let us know. Thoughts? Thanks, -Joe Freeman
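To quantify "inconsistent" rather than eyeballing it, one option is to repeat each query several times with the same cache-expiry pause and compare the spread of the timings. This is a sketch under assumptions: `run_select` is a hypothetical callable wrapping your SimpleDB client, and wall-clock time measured in the client includes network latency, so it will not match the Scratchpad figures exactly.

```python
import statistics
import time


def time_select(run_select, expression, repeats=3, pause_seconds=600):
    """Time `run_select(expression)` `repeats` times, sleeping between runs.

    The default 600 s pause mirrors the 10-minute wait used above, so that
    SimpleDB's internal cache does not skew later runs.
    """
    timings = []
    for i in range(repeats):
        if i > 0:
            time.sleep(pause_seconds)
        start = time.perf_counter()
        run_select(expression)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Mean, sample standard deviation and coefficient of variation (CV).

    A higher CV means less consistent response times, which is the kind of
    difference reported between 'is null' and value queries.
    """
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean if mean else 0.0}
```

Comparing the CV of the 'is null' query with that of the sentinel-value query (same criteria, same data) gives a single number per query for the consistency difference.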
|
|
|
Thanks for that info, Joe. Based on the results you're seeing, my feeling is that it's not worth making the change to the DeletedOn implementation. In the simple case you are seeing an improvement from 0.0010s to 0.0009s; in more complex cases the difference is slightly smaller (and almost vanishes once sorting becomes involved). I appreciate that the 0.0001s improvement does represent 10%, and that under heavy load all those tenths of a millisecond add up, so if you find in your real environment that the impact is enough to affect your users, then of course we're open to discussion! But if performance under load is the issue, I suspect other optimisations will be more useful (e.g. stevel's comment about using async requests behind the scenes).
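To put "all those tenths of a millisecond add up" into numbers, the small helper below computes the per-request, relative and cumulative saving. The request volume is a hypothetical input: with 0.0010 s versus 0.0009 s, one million requests would save roughly 100 seconds of total query time.

```python
def time_saved(before_s, after_s, requests):
    """Per-request, relative and cumulative saving from a faster query.

    before_s and after_s are per-query times in seconds; `requests` is a
    hypothetical request volume over whatever period you care about.
    """
    per_request = before_s - after_s
    return {
        "per_request_s": per_request,
        "relative": per_request / before_s,
        "total_s": per_request * requests,
    }
```

Note that this measures cumulative latency, not throughput; concurrent requests overlap, so the wall-clock benefit under load is usually smaller than `total_s`.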
|
|
We will put together a larger dataset and see what happens. Thanks! -Joe
|
|
We have been running some reports, logging and tests. There is currently a difference of about 0.00001 BoxUsage per request between queries with DeletedOn = null and queries without it. With our current application activity, this comes to about $7 a day in extra SimpleDB charges ($210 a month). So, not too bad, but it will probably increase as the year goes by and usage and data size grow. I will continue to update this thread with more info. Looking forward to 3.0! Thanks, -Joe
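The cost figure can be reproduced with a little arithmetic, since BoxUsage is reported in machine hours and SimpleDB bills per machine hour. The sketch below takes the price as an input; the $0.14 per machine hour used in the example is an assumption, so substitute the rate for your region. Under that assumed price, 0.00001 BoxUsage per request and $7 a day would correspond to roughly five million requests a day, and 30 days at $7 gives the $210 monthly figure.

```python
def extra_cost(box_usage_delta, requests_per_day, price_per_machine_hour,
               days_per_month=30):
    """Extra daily and monthly charge from a per-request BoxUsage difference.

    BoxUsage is in machine hours, so cost = machine hours * hourly price.
    price_per_machine_hour is an assumption; check current SimpleDB pricing.
    """
    daily = box_usage_delta * requests_per_day * price_per_machine_hour
    return daily, daily * days_per_month


def implied_requests_per_day(box_usage_delta, daily_cost, price_per_machine_hour):
    """Invert extra_cost: the request volume that would produce daily_cost."""
    return daily_cost / (box_usage_delta * price_per_machine_hour)
```

Running `extra_cost` with projected request volumes is a quick way to see how the soft-delete overhead scales as usage and data size grow.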
|