This thread looks to be a little on the old side and therefore may no longer be relevant. Please see if there is a newer thread on the subject and ensure you're using the most recent build of any software if your question regards a particular product.
This thread has been locked and is no longer accepting new posts. If you have a question regarding this topic, please email us at support@mindscape.co.nz
|
Are there any plans to support sharding in LightSpeed? I have an upcoming project where there are concerns about quick scalability. I'd love to use LightSpeed but can't see how to extend it to deal with sharding across database servers. |
|
|
It's been on the wish list for a while now, but there isn't any built-in support at the moment. It is a feature we'd like to add, so if your specific requirements are relatively simple then we could certainly look at implementing some level of support (though we can't make any promises until we understand what those requirements might be). |
|
|
The problem one runs into with sharding is how to keep the object graph together, or how to deal with the issues caused by cross-server requests. There are three approaches to sharding that I have had to take in the past.

The simplest approach, and the one we need for this particular project, is the ability to segregate a group of entities by an "account" identifier. In a multi-tenancy approach, requests can then be routed to the appropriate server based on the account making the request. The entities for the domain could live on any server, but the entities within the same account would always be found on the same server. This alleviates many of the performance concerns in the SQL, but makes it difficult to truly scale out.

The second approach I've used is the ability to move entities to different servers while keeping all entities of the same class on the same server - so all orders on one server and all customers on another, for example. This creates problems with eager loading and other performance techniques, but allows more scalability flexibility than the first option.

The third approach is the most difficult and would be almost impossible to achieve on any kind of performant basis, but it is also the most flexible and scalable: every record can be stored on any server, and the server an entity is stored on is determined from its primary key. An algorithm parses the primary key (usually a GUID) to determine which server a record is located on. Aggregate queries basically need to be run against all servers.

Anyway, we are interested in option #1 for this particular project. Thanks. |
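The account-based routing in option #1 boils down to a deterministic account-to-shard mapping. A minimal sketch of that idea, in Python purely for illustration (the function name `shard_for_account` is hypothetical, not part of any LightSpeed API):

```python
import hashlib

def shard_for_account(account_id: str, shard_count: int) -> int:
    """Deterministically map an account identifier to a shard index.

    Uses a stable cryptographic hash so every process in the farm
    computes the same shard for the same account.
    """
    digest = hashlib.md5(account_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

# The same account always resolves to the same shard, so all of its
# entities end up on one server, as described above.
assert shard_for_account("acme", 4) == shard_for_account("acme", 4)
```

A real deployment might prefer an explicit lookup table over hashing, so accounts can be rebalanced between shards without remapping everything.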
|
|
In your option #1, would it be the case that any given unit of work will involve only one "account"? E.g. in a multi-tenant CRM system, Alice has her set of customers that would all be stored on one shard, and Bob has his set of customers that might be stored on another shard but again would all be within one shard. There's no scenario where you need to load both Alice's and Bob's customers into a single unit of work. In other words, each server is effectively completely independent, and the only issue is figuring out which server to connect to for any given operation. Is that correct?

Do you need to break entities down further within an account, e.g. Alice's customers are stored on Server A but her products are stored on Server B? Are there shared entities that live somewhere else, e.g. customers are stored in shards but reference data is stored on a single server?

Sorry if these are dumb questions -- just trying to get a picture of how much work we'd need to do to meet your requirement, and how well it will fit into the existing architecture. |
|
|
In option #1, everything would have to live within the UOW - crossing "account" boundaries should never occur (which would be a nice security feature too, as you could always be sure that you are not hitting the wrong account's data). So, yes, your assumption is right. This is, I believe, the easiest and simplest method for an ORM to provide sharding in a scalable fashion in a multi-tenant environment. It would be wonderful if you could break entities out onto their own servers, but I think your complexity level increases by several orders of magnitude. I don't need it for this project, but if you really wanted to implement sharding in your ORM, that would be a killer feature. So basically, Alice's account data could live on one server and her order data could live on another. By the way, NHibernate currently has a sharding project underway. Not sure how far it's gotten, but you might want to see what they are doing for reference purposes. |
|
|
This isn't as elegant as having sharding built in, but it looks like you could do this in application code by creating a separate LightSpeedContext for each shard, with just the connection string differing between them. (LightSpeedContexts are cheap, and are typically static, so the overhead of keeping multiple LightSpeedContexts around is negligible.) Thus you have a pool of contexts instead of the normal single context. Then, when you need to create a unit of work, figure out which context to use based on the account of the logged-in user, and create the UOW from that context. You could of course hide all of this in a repository class.

The downsides are (a) it puts the onus on you to write the account-to-shard mapping function, rather than having a built-in load balancer or whatever, and (b) it requires that you can determine the account from some external information, e.g. the user login - an assumption that will fail in, say, an n-tier environment where the application tier sends the service tier only the ID of the entity (not the account that the entity belongs to). And there may be other issues that I'm missing. The upside is that it's something you can implement immediately, removing or minimising your dependency on us doing work in the core.

Do you think this approach might suffice for your requirements? |
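The pool-of-contexts pattern described above can be sketched as follows. This is Python pseudocode of the shape of the idea only: `ShardContext` and `UnitOfWork` are hypothetical stand-ins for the LightSpeed classes, not the actual .NET API, and the hash-based mapping is a placeholder for whatever account-to-shard function you write.

```python
import hashlib

class UnitOfWork:
    """Stand-in for a LightSpeed unit of work bound to one shard."""
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

class ShardContext:
    """Stand-in for a LightSpeedContext; only the connection string differs."""
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def create_unit_of_work(self) -> UnitOfWork:
        return UnitOfWork(self.connection_string)

# One context per shard, created once and kept for the process lifetime
# (mirroring the "cheap, typically static" contexts described above).
CONNECTION_STRINGS = [
    "Server=shard0;Database=app",  # illustrative connection strings
    "Server=shard1;Database=app",
]
CONTEXTS = [ShardContext(cs) for cs in CONNECTION_STRINGS]

def shard_index_for_account(account_id: str) -> int:
    # Placeholder mapping; a real system might use a lookup table instead.
    digest = hashlib.sha1(account_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(CONTEXTS)

def unit_of_work_for(account_id: str) -> UnitOfWork:
    """Pick the context for this account's shard and create the UOW from it."""
    return CONTEXTS[shard_index_for_account(account_id)].create_unit_of_work()
```

A repository class would wrap `unit_of_work_for`, so the rest of the application never sees that multiple contexts exist.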
|
|
Sorry I am just getting back to you but I got pulled off onto another project and had to drop this issue. This actually should work. We pass in the user id to the service layer, but all requests are made to a specific host (accountname.serviceurl...) and we grab the appropriate account based on the hostname the request was made to. So I think your solution will work. We will give it a shot. Thanks. |
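The hostname-based account lookup described above (requests arriving at a host of the form accountname.serviceurl) can be sketched in a couple of lines. The function name is hypothetical and the hostname scheme is assumed from the post:

```python
def account_from_host(host: str) -> str:
    """Extract the account name from a host like 'acme.service.example.com'.

    Takes everything before the first dot; if the host contains no dot,
    the whole host is returned unchanged.
    """
    return host.split(".", 1)[0]
```

With the account name in hand, the service layer can route the request to the matching shard context.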
|