I have been evaluating LightSpeed with VistaDB and I have found some interesting issues regarding performance with respect to the use of IdentityMethod KeyTable vs IdentityColumn, and the resulting response times and memory usage. All tests were run on the same machine using a simple database with one table, with the June 3 nightly build on a Core 2 Duo laptop.

Two side observations regarding the nightly build and KeyTable:

1 - At first I used OpenMode = ExclusiveWrite but I got a "Database is in use" error and a reference to VistaDB Express build 77 (which is most certainly not on my system). I changed to non-exclusive write and the error disappeared.

2 - From a previous post I found that I would need to change the IdentityBlockSize, as each allocation takes between 150 and 200 ms, so I used a value of 1000.

Using KeyTable: inside a using block with a UnitOfWork, I created 100000 entities in a for loop. I printed out the timing and set a ReadLine so that I could observe the system memory usage. The timing and memory use were great: approx 4 seconds and 50 MB of memory. Then I called the UnitOfWork SaveChanges method, and memory use jumps to 300+ MB. I then tried a loop within a loop where I created batches of inserts at 2000 records per SaveChanges; memory use drops to approx 100 to 120 MB and the write rate is approx 1000 records per second.

Using IdentityColumn: with the same for loop (and ExclusiveWrite????), the timing is 2 seconds and 50 MB. On SaveChanges, memory use jumps to approx 90 to 100 MB and overall speed is 700 items per second -- not bad overall.

This memory use issue is not new to me; I saw it with my last ORM. The issue is that the application/ORM and VistaDB run in the same memory space, so 2000-record batch inserts have been the workaround. The interesting thing is that if I use the identity column I get acceptable results compared to the key table.
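For reference, the batched pattern above looks roughly like this in code (a sketch only: MyUnitOfWork, MyEntity, and the connection string are placeholders, and the LightSpeedContext property names and DataProvider value should be checked against the current build):

```csharp
// Sketch of the batched-insert workaround described above.
// MyUnitOfWork, MyEntity and the connection string are placeholders.
var context = new LightSpeedContext<MyUnitOfWork>
{
    ConnectionString = @"Data Source=C:\data\test.vdb3",
    DataProvider = DataProvider.VistaDB,
    IdentityMethod = IdentityMethod.KeyTable,
    IdentityBlockSize = 1000   // each key-table allocation took 150-200 ms
};

const int total = 100000;
const int batchSize = 2000;    // flush every 2000 inserts to cap memory use

for (int offset = 0; offset < total; offset += batchSize)
{
    using (var unitOfWork = context.CreateUnitOfWork())
    {
        for (int i = 0; i < batchSize; i++)
        {
            unitOfWork.Add(new MyEntity { Name = "Item " + (offset + i) });
        }
        unitOfWork.SaveChanges();   // commit this batch before the next one
    }
}
```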
|
|
|
One thing you might want to try in the KeyTable scenario is experimenting with different values for UpdateBatchSize. By default, when you do a SaveChanges, LightSpeed performs 10 inserts (or updates, or deletes) in each database request, so 100000 records result in 10000 database requests. Increasing the UpdateBatchSize to, say, 25 would cut that down to 4000 database requests. But see below. (Also note there is a downside to increasing it too far: see http://www.mindscape.co.nz/blog/index.php/2008/09/18/saving-large-numbers-of-entities-in-lightspeed/ for guidance.)

My suspicion is that this will probably not affect memory consumption. My guess (and it is only a guess) is that the memory growth is related to VistaDB needing to store up the pending inserts for a transactional commit. Performing fewer inserts per transaction (i.e. per SaveChanges) results in fewer pending inserts at any given time, hence less memory blowout. Changing UpdateBatchSize will not affect the overall size of the transaction, just deliver it in different-sized pieces, and therefore will probably not affect memory usage.

While I'm making a fool of myself in public, I'll also speculate about why you're seeing good results for IdentityColumn compared to KeyTable. Normally IdentityColumn performance is weak because it defeats batching: LightSpeed has to issue a request for each insert, resulting in ten or more times as many trips to the database. That's a big deal if the database is in another process or across a network. But with an in-process database like VistaDB, the cost of sending a request is much smaller, so defeating batching should not have as much of an impact as it does in remote scenarios. (Conversely, tweaking UpdateBatchSize may not help as much as it can in remote scenarios.) I don't have any numbers to back this up though!
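To illustrate the arithmetic, reusing the context from the earlier sketch (the UpdateBatchSize property name is taken from the blog post above; verify it against your build):

```csharp
// With the default of 10 statements per request, 100000 inserts means
// 100000 / 10 = 10000 database requests. Raising it to 25 gives
// 100000 / 25 = 4000 requests.
context.UpdateBatchSize = 25;

using (var unitOfWork = context.CreateUnitOfWork())
{
    // ... add entities ...
    unitOfWork.SaveChanges();   // now batches 25 statements per request
}
```
|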
|
|
Experimented with the values and came to a happy medium: KeyTable with an IdentityBlockSize of 200 keys and an UpdateBatchSize of 50 can create about 900 records per second and uses 25 MB of memory on average. Now I have the following issue: we use build 84, and if we use IdentityColumns then we can use exclusive write, BUT not with KeyTable.
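For anyone following along, the tuned settings would look roughly like this, assuming the same property names as in the sketches above:

```csharp
// "Happy medium" settings from the experiments above:
context.IdentityMethod = IdentityMethod.KeyTable;
context.IdentityBlockSize = 200;   // reserve 200 keys per key-table hit
context.UpdateBatchSize = 50;      // 50 statements per database request
// Observed result: ~900 records per second at ~25 MB average memory use.
```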
|
|
|
Hmm, I think what is happening here is that VistaDB is objecting to LightSpeed trying to initiate a transaction to get keys (and update the key table) while already having a transaction in progress for the unit of work. Possibly they are interpreting "exclusive write" to mean "only one transaction" rather than "only one connection." This sounds like a VistaDB issue, and you might want to check with them to see if that is indeed the case and whether it is by design (to my eyes, the stack trace implies it may be a bug due to not reusing the file handle when in exclusive-write mode even on the same connection; but it may be deliberate). If so, let us know and we will see if we can come up with a workaround. |
|
|
Ivan, I will take it up with VistaDb early next week. Thank you for your help. If something comes of it I will post back. Cheers, |
|
|
Ivan, The following link discusses the use of transactions: http://www.vistadb.net/blog/post/2009/04/24/Transactions-can-hurt-performance.aspx
|
|
|
I have looked into this a bit further and it looks like we are actually using two connections -- one for the unit of work and one for the key table. Sorry for the confusion. So VistaDB's exclusive write mode seems to imply only one connection at a time (not one transaction at a time as I suggested earlier). But this still conflicts with the ExclusiveReadWrite documentation, which says "single process." In our case the two connections are from within a single process, so they should both succeed. So I would still suggest that you check this out with the VistaDB folks.

As background: as a general rule, we do need to keep the KeyTable query separate from other transactions -- e.g. if an insert failed we would not want the key table update to be rolled back (because the IDs could have been used by other inserts which *have* succeeded) -- and using a separate connection is the only way to guarantee this in general, because connections owned by UOWs could be in transactions.

So if the VistaDB folks say that ExclusiveReadWrite means a single connection rather than a single process, we would need to either special-case this scenario (and I am not sure how we could make it reliable given that there might be transactions on the sole connection), or just document an incompatibility between KeyTable and Exclusive mode. Hopefully VistaDB will be able to address the "single connection vs. single process" issue though, and we will not need to make this choice...
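For illustration only, here is a generic ADO.NET sketch of the "separate connection for key allocation" idea -- not LightSpeed's actual internals, and KeyTable/NextId are made-up names:

```csharp
using System;
using System.Data.Common;

// Generic sketch: reserve a block of IDs on a dedicated connection so the
// key-table update commits independently of any unit-of-work transaction.
static long AllocateKeyBlock(DbProviderFactory factory, string connectionString, int blockSize)
{
    using (var connection = factory.CreateConnection())
    {
        connection.ConnectionString = connectionString;
        connection.Open();

        using (var transaction = connection.BeginTransaction())
        using (var command = connection.CreateCommand())
        {
            command.Transaction = transaction;

            // Read the current high-water mark...
            command.CommandText = "SELECT NextId FROM KeyTable";
            long nextId = Convert.ToInt64(command.ExecuteScalar());

            // ...and advance it by one block. This commits even if the
            // caller's unit-of-work transaction later rolls back.
            command.CommandText = "UPDATE KeyTable SET NextId = NextId + " + blockSize;
            command.ExecuteNonQuery();

            transaction.Commit();
            return nextId;
        }
    }
}
```
|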
|
|
Ivan, I will send this to VistaDB. Although I am still a little mystified by the entire need for transactions in databases such as VistaDB, where there are no real multi-user connections. Cheers, |
|
|
The reason for transactions in a single-user database is atomicity rather than isolation. For example, consider a SaveChanges which saves 100 entities, and suppose we are doing them in batches of 20. Then there are 5 SQL commands being executed. It is possible that the first three of those commands could succeed, and the fourth fail. Without a transaction, that would leave the database in an unhappy intermediate state.

This might be only slightly annoying in a bulk insert scenario, as you could just examine the database to see which records were inserted and try again with the others, but it gets much more troubling when you have associations and updates to consider. E.g. suppose we were updating 100 Order rows to have a new CustomerId: then 60 of those rows have been assigned to the new Customer but 40 are still on the old Customer. This could rapidly get quite hard to untangle, especially once you start considering that there may have been modifications to other tables (e.g. Customer) in there as well! Using a transaction around the 5 SQL commands ensures that the database never gets saved in such an "intermediate" state. Hope this makes more sense now!

However, as noted, I think the KeyTable issue is down to connections rather than transactions -- so this may be a red herring anyway!
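In plain ADO.NET terms, the pattern is something like this (a generic sketch, not LightSpeed's actual code):

```csharp
using System.Collections.Generic;
using System.Data.Common;

// Sketch: wrap all the batched commands for one SaveChanges in a single
// transaction so a failure in any batch undoes the whole save.
static void SaveAtomically(DbConnection connection, IEnumerable<string> batchSql)
{
    using (var transaction = connection.BeginTransaction())
    {
        try
        {
            foreach (var sql in batchSql)   // e.g. 5 batches of 20 inserts
            {
                using (var command = connection.CreateCommand())
                {
                    command.Transaction = transaction;
                    command.CommandText = sql;
                    command.ExecuteNonQuery();
                }
            }
            transaction.Commit();       // all 100 rows are saved...
        }
        catch
        {
            transaction.Rollback();     // ...or none of them are
            throw;
        }
    }
}
```
|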
|
|
First, I agree it is likely the number of connections and not the transactions that caused the problem I saw. But I still have some concerns about the scenario you propose: what would cause the failure of the command? I am by no means a database expert, BUT from my understanding a transaction is intended to wrap an operation that involves multiple tables, such as where there are foreign keys or cascaded deletes -- in other words, where tasks require the database to do more than a simple delete, update, or insert. In the strictest sense, if it were possible for a simple insert batch as you outlined to fail, then I would have to question the sanity of using a SQL database. This is not to say that a transaction is not an essential part of SQL operations such as associative updates or cascaded deletes.

To conclude: we have chosen to use the SQL database as a data store; any relational operations that involve foreign key updates or cascaded deletions are done in our application. Undoubtedly this flies in the face of the accepted logic that the database be left to do this type of work, and there are likely performance penalties to be paid, but working around the various peculiarities of a SQL database and the flavour of SQL it supports makes for its own set of issues.
|
|