This thread looks to be a little on the old side and therefore may no longer be relevant. Please see if there is a newer thread on the subject and ensure you're using the most recent build of any software if your question regards a particular product.
This thread has been locked and is no longer accepting new posts. If you have a question regarding this topic, please email us at support@mindscape.co.nz
|
I'm getting some pretty horrible performance numbers with the LS 2.2 nightly on SQLite. While I've tried SQLite with prepared SQL statements and bound parameters, and am getting reasonable performance from SQLite per se, I'm getting bad numbers with LS. I am doing the right things in the sense that I'm using reasonable batch sizes, identity block sizes and a KeyTable. Stepping through the code (of the Enterprise edition) leads me to suspect that LS (at least the 2.2 nightlies) causes SQLite to waste a lot of time by parsing the same SQL queries again and again, where in theory it COULD prepare the statements once and simply bind the parameters of each insert to them.
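To illustrate the distinction being drawn here, a minimal sketch using Python's sqlite3 module (rather than .NET, but the mechanics are the same): `executemany` prepares the INSERT once and re-binds parameters per row, which is the behaviour being asked for, rather than re-parsing fresh SQL text for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
rows = [(i, "x" * 10) for i in range(100_000)]

# One INSERT statement, prepared once; only the bound parameters
# change per row, so SQLite never re-parses the SQL text.
with conn:
    conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 100000
```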
I'll refrain from quoting the source code in here for obvious reasons, but I would like some comments on whether I'm barking up the right tree...
It's my understanding that if LS used prepared SQL statements, instead of regenerating them and forcing SQLite to re-parse these statements one by one every time, I could get far better results.
Just as a comparison: loading 1,000,000 rows into an SQLite table takes 30 seconds with prepared statements (without using LS), while loading 250,000 rows with LS takes in excess of 100 seconds... |
|
|
It's a bit more complicated than that, unfortunately: because LightSpeed issues INSERT statements in batches (and there may be UPDATE, DELETE and "INSERT for other entity type" statements within that same batch), it can't just prepare a command once and then reuse it again and again. Although each INSERT statement for a given entity type is individually the same, batching means the overall SQL statement may be unique. You're right that there would be scope for optimising the use of prepared SQL in the bulk insert scenario, but to be honest that's not really the area LightSpeed is focused on: it's an object-relational mapper rather than an ETL-type tool. However, I suspect it would be reasonably easy for us to maintain a small cache of "already seen" SQL statements without impacting performance in the primary use case. I'll log this as a feature request, but I'm afraid it will have to be a post-3.0 enhancement because we are really trying to get 3.0 out the door right now! |
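The "batching makes the overall statement unique" point can be illustrated with a small sketch (table and column names here are hypothetical, and the batch-building function is a stand-in for whatever LightSpeed actually does internally): two flushes containing the same kinds of operations in a different mix produce different command text, so a single prepared command can't simply be reused across flushes.

```python
# Stand-in for building one batched command out of several operations.
def batch_sql(ops):
    return ";\n".join(ops)

# First flush: two inserts and an update batched together.
batch1 = batch_sql([
    "INSERT INTO Person (Id, Name) VALUES (@p0, @p1)",
    "INSERT INTO Person (Id, Name) VALUES (@p2, @p3)",
    "UPDATE Address SET City = @p4 WHERE Id = @p5",
])

# Second flush: one insert and a delete.
batch2 = batch_sql([
    "INSERT INTO Person (Id, Name) VALUES (@p0, @p1)",
    "DELETE FROM Address WHERE Id = @p2",
])

# The batches differ even though each individual INSERT has the
# same shape, so a naive "prepare once, reuse forever" fails here.
print(batch1 == batch2)  # False
```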
|
|
Thanks for the reply Ivan. I understand the push to 3.0 overrules all other requests right now, but I really do think that this sort of feature is an important one, especially for us. We currently have two entirely different production systems that need heavy bulk loading. I can't see how the distinction between ETL-type tools and OR/M tools helps here. Ultimately, we have (ideally) one code base that deals with all the database abstraction, be it bulk loading or querying the data. Ideally that code base would be LS, and not a combination of LS + something else, which would increase the amount of maintenance and so on. We're more than happy with LS from a quality/coherence-of-code standpoint, but we do need performance when bulk loading data. I would think that caching frequent queries, and refraining from regenerating them all the time, would be a worthwhile (even if not easy) optimization to have in LS.

As a work-around perhaps, a small cache of prepared statements could be held by LS: when the CommandText you're about to set on the SQL command matches the key of a Dictionary<String,IDbCommand>, you could use the cached command and at least save SQLite from having to reparse the whole thing from scratch. Ultimately, I would expect LS not to take precious CPU cycles needlessly and somehow (I admit I might not be aware of all the complications) avoid even having to generate the SQL text, "plan ahead", and know how to optimize the SQL generation, so that fewer cycles are spent in LS OR/M-ing and more time is spent somewhere else. |
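The cache being suggested here can be sketched generically (in Python for brevity; names are illustrative and not LightSpeed's actual API): commands are keyed by their SQL text, so an identical statement is prepared only once and looked up thereafter.

```python
# Minimal sketch of a statement cache keyed by SQL text, analogous
# to the suggested Dictionary<String, IDbCommand>.
class StatementCache:
    def __init__(self, prepare):
        self._prepare = prepare      # function: SQL text -> prepared command
        self._cache = {}
        self.prepared_count = 0      # how many times we actually parsed

    def get(self, sql):
        cmd = self._cache.get(sql)
        if cmd is None:
            cmd = self._prepare(sql)  # parse/prepare only on a cache miss
            self.prepared_count += 1
            self._cache[sql] = cmd
        return cmd

# Stand-in "prepare" step; a real one would call the driver.
cache = StatementCache(prepare=lambda sql: ("prepared", sql))
for _ in range(1000):
    cache.get("INSERT INTO t (a, b) VALUES (?, ?)")

print(cache.prepared_count)  # 1: parsed once, reused 999 times
```

Note that in a real implementation the cached commands would need explicit disposal when evicted, which is exactly the leakage risk raised later in this thread.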
|
|
Okay, I will see if I can take a look at this (I'm not on the critical path for LS3). Any immediate enhancement will have to be fairly simple though -- major optimisations like recognising that we don't need to generate the SQL text are probably beyond what we can do in the short term. I'll keep you posted. |
|
|
A quick update on this. So far, caching and reusing command objects appears to give us relatively little benefit for significantly increased complexity and risk (the danger of leakage increases because we can't dispose the commands until we know they can be uncached), though I want to revisit this because I'm not confident my initial results were trustworthy. However, in the course of exploring this, I've found a couple of things that have a significant impact on performance and might be worth trying:

1. Check the IdentityBlockSize. (You mentioned block size in your original message, but in my tests this initially dominated the timing, so it's worth bringing up just in case.) For 250K inserts, you want a block size of, ideally, 250K. If your block size is down in the tens or low hundreds, you'll incur a lot of KeyTable accesses. If you know the number of items in advance, use that as your IdentityBlockSize. If you know it'll be roughly 250K but you're not sure exactly how many, try an IBS of 10000 or 20000, something like that.

2. If you're using PluralizeTableNames, get rid of it and use TableAttribute instead. We seem to be handling pluralisation rather inefficiently (I'll log a bug for this). When I changed from PluralizeTableNames=true to a TableAttribute with the plural name in it, I got a 40% improvement in save times, with the time to save 250K records dropping from 32 seconds to 19 seconds.

3. Limit the number of entities per unit of work. At 50K entities, LightSpeed takes 80% longer than raw SQLite; at 250K entities, 120% longer; at 500K entities, 200% longer. This degradation is because we need to keep all those entities in memory, whereas raw bulk insert code doesn't. So creating 10 UOWs with 100K records each is going to be quicker than creating one UOW with a million records, because LightSpeed can discard the entities from the old UOWs. (Since you incur the overhead of connection setup for each new UOW, you don't want to squeeze this too far; in my tests I found a sweet spot around 25K-100K entities per UOW, but this may vary with the type of data being inserted.)

4. I've also found that on SQLite, assuming the database file is on the local filesystem, lower UpdateBlockSizes are better. Below a UBS of 10 the improvement is marginal, but you definitely don't want to be using a bigger UpdateBlockSize than the default: it causes a measurable slowdown.

With these changes I was getting insert rates of around 10K records per second -- not as good as your 30K per second reusing the same command object with raw SQLite, and obviously there's plenty of room for improvement (and I realise my test entities were probably smaller than your real entities), but significantly better than the 2.5K per second that you report from your 250K record test. So hopefully this will give you some immediate improvement while we continue to see where we can improve the engine. |
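The chunked-unit-of-work advice in point 3 can be sketched generically (hypothetical names throughout; `make_uow`, `append`, and the length check are stand-ins for creating a unit of work, adding entities, and saving changes, not LightSpeed's real API):

```python
# Commit rows in fixed-size chunks rather than one huge unit of
# work, so each completed chunk's entities can be discarded.
def save_in_chunks(rows, chunk_size, make_uow):
    saved = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        uow = make_uow()              # fresh unit of work per chunk
        for row in chunk:
            uow.append(row)           # stand-in for uow.Add(entity)
        saved += len(uow)             # stand-in for uow.SaveChanges()
        # uow goes out of scope here, so its entities can be reclaimed
    return saved

# 250K rows committed across 5 units of work of 50K each.
total = save_in_chunks(list(range(250_000)), 50_000, make_uow=list)
print(total)  # 250000
```

The trade-off mirrors the post above: smaller chunks bound memory use but incur per-UOW setup costs, hence the suggested 25K-100K sweet spot.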
|