Manticore Search kill-list feature

Data in plain indexes is immutable: to refresh it, the index has to be rebuilt in full. In many cases a full reindex takes a long time, which is why a main+delta scheme is used.

The concept assumes a big index that holds a snapshot of the data at a given time and a smaller index that holds the changes (the delta) made between the snapshot time and a more recent date. As the latter is smaller, it can be reindexed more frequently. The delta changes can be new, updated or deleted records. Updated and deleted records introduce an issue: when the engine searches both indexes, it does not know whether a record in the main index is still current. As a result, searches keep returning records that have actually been deleted or, for updated records, return old versions from the main index instead of the newer ones from the delta index.

To overcome this, the kill-list feature was introduced. A kill-list is a list of document IDs attached to the delta index that tells the engine to ignore those records in the preceding indexes.

sql_query_killlist = \
    SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
    SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex

In this example the kill-list includes the IDs of documents updated since @last_reindex (the time of the last main index build) as well as the IDs of deleted documents. The documents_deleted table can be filled manually whenever a record in documents is deleted, or a trigger can do it.
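To see which IDs such a query selects, here is a minimal sketch that runs the same UNION against an in-memory SQLite database. SQLite is only a stand-in for the real data source, the sample rows and integer timestamps are made up, and a bound parameter replaces the @last_reindex variable; kill_list_ids is a hypothetical helper, not part of Manticore.

```python
import sqlite3

def kill_list_ids(conn, last_reindex):
    """Return the sorted document IDs the kill-list query would select."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE updated_ts >= :ts "
        "UNION "
        "SELECT id FROM documents_deleted WHERE deleted_ts >= :ts",
        {"ts": last_reindex},
    ).fetchall()
    return sorted(row[0] for row in rows)

# Assumed sample data: timestamps are plain integers for simplicity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, updated_ts INTEGER);
    CREATE TABLE documents_deleted (id INTEGER PRIMARY KEY, deleted_ts INTEGER);
    INSERT INTO documents VALUES (1, 'unchanged', 100), (2, 'edited', 250);
    INSERT INTO documents_deleted VALUES (3, 300), (4, 50);
""")

# With the main index built at time 200, only document 2 (edited later)
# and document 3 (deleted later) belong in the kill-list.
killed = kill_list_ids(conn, last_reindex=200)
print(killed)
```

Document 4 was deleted before the main index was built, so it is not in the main index at all and does not need to be in the kill-list.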

An important thing to remember about kill-lists is that removals are applied to the preceding indexes, in the order in which the indexes are declared.

If you are doing a sequential search on the indexes, the delta must come after the main index.

   mysql> SELECT * FROM main,delta WHERE MATCH('...');

The same applies when using multiple deltas (such as delta_daily and delta_hourly): the sequence should be main,delta_daily,delta_hourly and not main,delta_hourly,delta_daily.
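The effect of the index order can be illustrated with a toy model. This is a sketch of the behaviour described above, not the engine's implementation: the dictionaries, the search function and the sample documents are all hypothetical.

```python
def search(indexes, word):
    """Search indexes in the given order; each index's kill-list
    suppresses matching IDs in the indexes listed before it."""
    hits = []
    for pos, index in enumerate(indexes):
        # Collect the kill-lists of every index that comes later in the list.
        killed = set()
        for later in indexes[pos + 1:]:
            killed |= later["kill_list"]
        for doc_id, text in index["docs"].items():
            if doc_id not in killed and word in text.split():
                hits.append((index["name"], doc_id))
    return hits

main = {"name": "main", "kill_list": set(),
        "docs": {1: "red car", 2: "red bike", 3: "red boat"}}
# Since the main snapshot: document 1 was edited, document 2 was deleted.
delta = {"name": "delta", "kill_list": {1, 2},
         "docs": {1: "blue car"}}

right_order = search([main, delta], "red")
wrong_order = search([delta, main], "red")
print(right_order)  # [('main', 3)]
print(wrong_order)  # [('main', 1), ('main', 2), ('main', 3)]
```

With the delta listed last, its kill-list hides the stale copies of documents 1 and 2 in main. With the order reversed, main has nothing after it to suppress those IDs, so the outdated and deleted records come back.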

The kill-list is also used in distributed indexes made of local indexes, and the order in which the indexes are defined matters here too, even if the local indexes are processed in parallel (dist_threads > 0):

  index dist
  {
     type = distributed
     local = main
     local = delta
  }

 

This article is based on "SphinxSearch kill list feature" by Yaroslav Vorozhko (https://www.ivinco.com/blog/sphinxsearch-kill-list-feature/) and is published with the owner's permission.
