Indexes load at startup

In this article we discuss how indexes are loaded at startup and what this means for incoming queries and for managing the search instance.

In older Sphinx versions, if the preopen option was set, indexes were pre-read and loaded into memory. During this time, the daemon refused all incoming connections. For small indexes this was not a big problem, as they loaded fast. But for a huge index that needed to load tens or even hundreds of GB, it was a real problem, as loading could take several minutes or more.

In Sphinx 2.3 this changed: index files are memory-mapped (with mmap) rather than simply read and loaded into memory. With mmap, a shadow thread of the daemon (not the main one) first reads one byte from each page and then continues reading the rest of the data, gradually bringing everything into RAM while still allowing access to the data. In short, index files are loaded lazily.
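To make the idea concrete, here is a minimal Python sketch of lazy loading with mmap. It is not searchd's actual code; the function name `open_lazy` is hypothetical, and the sketch only covers the "touch one byte per page" step described above. The file is mapped immediately, so the caller can read from it right away, while a background thread faults the pages in one by one.

```python
import mmap
import threading


def open_lazy(path):
    """Map a non-empty file read-only and warm it up in the background.

    Returns the file object, the mapping, and the warm-up thread.
    Illustrative only: the name and structure are assumptions.
    """
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def warm():
        # Reading a single byte forces the OS to bring the whole
        # page that contains it into memory.
        for offset in range(0, len(mm), mmap.PAGESIZE):
            mm[offset]

    thread = threading.Thread(target=warm, daemon=True)
    thread.start()
    return f, mm, thread
```

A read such as `mm[100:200]` works as soon as `open_lazy` returns. If the page has not been touched yet, the OS fetches it from disk on demand, which is the slower path a query takes while loading is still in progress.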

This way the daemon can accept connections as soon as it is up. If a query needs data that is not yet loaded in memory, the data is read from disk. This means the query may execute slower than usual (depending on storage performance), but at least the daemon can provide results. Moreover, as soon as the files of one index are loaded, the daemon can offer 'full speed' for that index even if other indexes are still loading.

While this is good news for most users, for some the 'lazy' loading might not be the best choice. For example, a load balancer using a simple strategy (random or round robin) that does not measure response times will add the rebooted instance as soon as the daemon is up. As queries will require reads directly from disk, the instance will respond slower, possibly much slower, than usual, which is not desirable. In this case it is better for the instance to be added to the cluster only when the daemon has finished loading the indexes.

To cover this situation, searchd has had an option since 2.3 which unfortunately remained undocumented until recently: '--force-preread'. With this option the daemon still mmaps the index files, but the preread is a blocking operation (just like in old versions) and searchd will not respond until everything is loaded in memory.
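Since a daemon started with '--force-preread' does not respond until loading is done, a deployment script can use that as a readiness signal before adding the instance to the load balancer. The Python sketch below is one way to do it, under a stated assumption: the listener being probed sends a greeting as soon as it serves a connection (as MySQL-protocol servers do). The function names, host, port and timeouts are all hypothetical.

```python
import socket
import time


def is_ready(host, port, timeout=2.0):
    """Return True if the server accepts a connection and sends a byte.

    Assumption: the probed listener greets the client first. A plain
    TCP connect is not enough, because the OS can complete it while
    the daemon is still busy and not serving connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            return len(s.recv(1)) == 1
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def wait_until_ready(host, port, timeout=600.0, interval=1.0):
    """Poll is_ready() until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready(host, port, timeout=interval):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A restart script would call something like `wait_until_ready("127.0.0.1", 9306)` (the port here is only an example) and register the instance with the balancer once it returns True.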
