|
Hello, we are using LightSpeed with the default search provider, and at one customer site we noticed that the search index grows to over 5K files. Once that happens, LightSpeed slows down significantly and ultimately slows the entire application.
I know the question is a little vague, but I am wondering if anyone experienced similar behavior and can point us in the right direction?
Thanks |
|
|
Hi Gil, What's the total size of the files combined? If you call the search engine's Rebuild() method, does the app perform more quickly (you should also see a reduction in the number of files)? If you have a large application, it's likely that you will want to configure the Lucene search engine parameters to suit your data volume. I can help you do that. There is a call in Lucene for optimizing the search store. It is likely that you'll want to run this from time to time as a scheduled task (I can help with this too, once I better understand your situation). Let me know how you get on with the Rebuild() call. Kind regards, John-Daniel Trask |
|
|
Thank you for your reply. Yes, calling Rebuild() reduces the number of files to around 4-5 and does improve the overall performance of the application. I did not record the total size of the files before the folder was cleared, but I will try to log that the next time it happens. We are dealing with a large application, but the data set at this point is still small (about 10K records). I am very interested to learn more about optimizing the Lucene parameters.
Thanks, Gil |
|
|
Hello John-Daniel, Just wanted to bump this as I need to get back to the customer. What are your thoughts on this issue? Is this a known limitation of the search engine? Also, where would I be able to find more information about optimizing Lucene?
Thanks for your help, Gil |
|
|
Hello Gil, Thanks for the reminder. The key values are the merge factor and merge-document settings (mergeFactor, minMergeDocs and maxMergeDocs on IndexWriter). You can read the API docs here: http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/index/IndexWriter.html
You'll need the source to our search engine implementation. If you don't have a source code license, please email me: jd@mindscape.co.nz and I'll send you the source to the Lucene search engine implementation we use so that you can tweak the values for your environment.
I'm unsure how fast the files are building up for you, but I would recommend setting up a scheduled task that runs often enough and calls for the index to be optimized: http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/index/IndexWriter.html#optimize() This is the function that collapses the file count and makes things fast again. We should look at shipping a small tool that wraps just that function for people to set up.
If your data volumes are high enough, take care that the task runs at a lower priority than your application so that it does not impact the performance of your system. You'll get a feel for how often it needs to run, but generally I'd advise that once the number of files exceeds approximately 15 it's a good time to run it. Tuning the merge factor should also stop the file count from exploding quite so fast (don't set it crazy high though, as you'll still want to optimize relatively often).
The documentation is for the Java API, as Lucene is a Java framework; we use a .NET port of it because it's a great tool. You can reference the assembly we use and will likely infer very quickly from our search engine source how to call the Optimize function on your index with just a few lines of code.
I hope that helps, John-Daniel |
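As a rough illustration of the scheduled task described above, here is a minimal sketch in Java (the language of the linked Lucene docs; the same shape should carry over to the .NET port). The class IndexOptimizeScheduler, its method names, and the optimizeAction callback are all hypothetical and not part of LightSpeed or Lucene. The callback is where you would open an IndexWriter on your index and call optimize(); that part is left out because it depends on how your search engine implementation opens the index. The threshold of 15 files comes from the advice above, and thread priority is only a hint that the operating system may ignore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

/** Hypothetical helper: runs an optimize action when the index directory holds too many files. */
public class IndexOptimizeScheduler {

    /** Counts the regular files directly inside the index directory. */
    public static long countIndexFiles(Path indexDir) throws IOException {
        try (Stream<Path> entries = Files.list(indexDir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    /** Runs optimizeAction if the file count exceeds maxFiles; returns true if it ran. */
    public static boolean optimizeIfNeeded(Path indexDir, long maxFiles, Runnable optimizeAction)
            throws IOException {
        if (countIndexFiles(indexDir) > maxFiles) {
            // Assumption: optimizeAction wraps the real call, e.g. IndexWriter.optimize().
            optimizeAction.run();
            return true;
        }
        return false;
    }

    /** Repeats the check at a fixed delay on a single minimum-priority thread. */
    public static ScheduledExecutorService schedule(
            Path indexDir, long maxFiles, Runnable optimizeAction, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-optimizer");
            t.setPriority(Thread.MIN_PRIORITY); // keep below the application's own threads
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                optimizeIfNeeded(indexDir, maxFiles, optimizeAction);
            } catch (IOException | RuntimeException e) {
                // Swallow and log so that one failed run does not cancel later runs.
                System.err.println("Index check failed: " + e.getMessage());
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

A call such as IndexOptimizeScheduler.schedule(indexDir, 15, myOptimizeCall, 30) would check every 30 minutes; keep the returned scheduler and call shutdown() on it when the application exits.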
|
|
Thank you for your detailed reply. We will use the Rebuild function as a short-term solution while we experiment with the different Lucene options.
Thanks, Gil |
|