Register
Login
YOUR CART IS EMPTY

Eager loading on MySQL performs badly

Boy does time fly!

This thread looks to be a little on the old side and therefore may no longer be relevant. Please see if there is a newer thread on the subject and ensure you're using the most recent build of any software if your question regards a particular product.

This thread has been locked

This thread has been locked and is no longer accepting new posts, if you have a question regarding this topic please email us at support@mindscape.co.nz

N4L
17 posts

Hi,

We have been looking closely at performance of various queries as our database is growing. The biggest problem we have noticed is that simple eager loading frequently creates queries which are doing full table scans and performing very badly. Here is the schema used with the examples below:

CREATE TABLE `Identity` (`Id` int(11) NOT NULL, `Name` varchar(45) DEFAULT NULL, PRIMARY KEY (`Id`), KEY `IX_Name` (`Name`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `Profile` (`Id` int(11) NOT NULL,`IdentityId` int(11) NOT NULL, `Body` varchar(45) NOT NULL, PRIMARY KEY (`Id`), UNIQUE KEY `IdentityId_UNIQUE` (`IdentityId`), CONSTRAINT `FK_Profile_Identity_IdentityId` FOREIGN KEY (`IdentityId`) REFERENCES `Identity` (`Id`) ON DELETE NO ACTION ON UPDATE NO ACTION ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

In the LightSpeed model we have [EagerLoad] specified on Identity.Profile, and we run this query:

var identities = unitOfWork.Identities
  .Where(x => x.Name == ("Jim"))
  .ToList();

The following SQL is included in the query batch to eager load the profiles for the matching identities:

SELECT
  `Profile`.`Id`,
  `Profile`.`Body`,
  `Profile`.`IdentityId`
FROM
  `Profile`
WHERE EXISTS (
  SELECT
    `Identity`.*
  FROM
    `Identity`
  WHERE
    `Identity`.`Id` = `Profile`.`IdentityId` AND
    `Identity`.`Name` = 'Jim'
);

MySQL isn't able to optimize this (see: http://dev.mysql.com/doc/refman/5.0/en/subquery-optimization-with-exists.html) and it ends up running the subquery for every row in Profile, which is getting worse the more our data grows. Here is some of the output from EXPLAIN:

id  select_type         table     type    rows
1   PRIMARY             Profile   ALL     10
2   DEPENDENT SUBQUERY  Identity  eq_ref  1

I can manually rewrite the query to get rid of the EXISTS and it performs much better:

SELECT
  `Profile`.`Id`,
  `Profile`.`Body`,
  `Profile`.`IdentityId`
FROM
  `Profile`
    INNER JOIN
  `Identity`
    ON `Identity`.`Id` = `Profile`.`IdentityId`
WHERE
  `Identity`.`Name` = 'Jim'

And gives the EXPLAIN output showing that the index on Identity.Name has been used and no unnecessary data is being touched:

id  select_type  table     type    rows
1   SIMPLE       Identity  ref     1
1   SIMPLE       Profile   eq_ref  1

Note that querying by Identity.Id does not have this issue as it does not need to load in the related table at all. It happens when you filter by something else on the root table. These examples were run with:

.NET 4.5.1
MySQL 5.6.16
LightSpeed 5.0.2953.0

I can't see any way for us to customise the eager loading or create our own query batches, how would you suggest we tackle this?

Thanks
Barry

Posted on Oct 13 2014

N4L
17 posts

Related to the above, we have been looking at the way we are logging query timings. On this page of the docs it outlines a basic way to measure query performance and detect slow queries:

http://www.mindscapehq.com/documentation/lightspeed/Testing-and-Debugging/Profiling

This works fine until eager loading is involved. When a query batch for eager loading is executed, the time returned appears to the be time taken to execute the first command in the batch only. There can still be a significant delay when NextResult is called on the data reader. Due to the above issues and the ordering of the commands by LightSpeed the quickest command is usually first in the batch and the query times are greatly under reported.

We are experimenting with using an interceptor to wrap any IDbCommand and IDataReaders generated and measure time until the reader is disposed. I understand this is similar to what the mini profiler does. This definitely includes time for all queries in the batch and appears to more accurately represent our DB performance.

The only caveat is that I'm not sure how much LightSpeed processing (loading objects to the unit of work etc) is included in this time, though I would expect this to be relatively short compared to SQL and network time. What would your recommendation for measuring this be?

Thanks again :)

Posted on Oct 15 2014

jeremy
3,400 posts

Yes using an Interceptor is going to give you a clearer picture of the overall timings measure client side, the times returned via the CommandLog object refer to the time taken at the server.

In term of measuring the overall processing time (object instantiation times could be significant if you are dealing with a large number of objects or a large graph of objects) we dont really have a facility for this. One thought would be you could measure the overall call time surrounding a query and subtract the server call time. Im guessing though that this isn't going to be that viable unless you have some centralised approach to your queries where you could hook in the surrounding timing code.

Jeremy

Posted on Oct 15 2014

N4L
17 posts

To be clear, the point was that the time reported via the CommandLog object is not actually accurate even for the server side time in the case of a single command returning multiple result sets, e.g. an eager load batch.

Client side processing done by LightSpeed doesn't seem to a problem so we're not too worried about that. As a hint to others, if you add a custom IEntityFactory you can count the number of entities being materialised which is also a good indicator of potential issues as it will highlight any queries selecting more rows than are sensible for a single request.

Posted on Oct 15 2014

Eager loading on MySQL performs badly

Data Products

DevOp Tools

Visual Controls

Popular Products

Quick Links