Plain indexes replication

Manticore Search, like Sphinx, doesn’t yet support replication for plain or RT indexes out of the box (we’re working on it; if you want to be a beta tester, let us know at info@manticoresearch.com). So if you need a copy of your Manticore Search / Sphinx data somewhere else, you have to implement replication yourself. Here is why you may need it:

  • scalability: you want to balance load across your servers (e.g. send half of all Manticore queries to one server and the rest to another) to increase throughput and decrease latency or per-server load
  • high availability: you want Manticore index replicas that are immediately available when the main Manticore storage becomes unavailable for some reason
  • both of the above, combined with automatic problem detection and switching between servers (e.g. when one server crashes, another one automatically starts handling all of the queries)

The easiest way to replicate plain indexes is to copy your Manticore config to another server and run ‘indexer’ there. This has two drawbacks. First, it wastes resources, since every server rebuilds the same index. Second, it’s hard to keep the data well synchronized: the plain indexes are rebuilt separately from the same source (e.g. your main database), so they should be identical in the end, but one may lag behind the other, for example because one server is more heavily loaded or because the source is being updated intensively.

Here’s another way to do it, which makes sure the replicas are 100% identical and keeps the delay before new data appears to a minimum:

  1. do the indexing in only one place (let’s call it MASTER)
  2. use rsync or something similar to copy the index files to the SLAVEs (or to pull them from MASTER, if you run the copy on the slaves)
  3. tell Manticore on every server to start using the newly rebuilt data

Here’s an example of how it can look:

Indexation on the master

The trick here is to make a new “pseudo” index fully inherited from your normal index with only “path” modified (see idx_new below):

index idx {
    path = sphinx_tmp/idx
    source = src
}

index idx_new:idx {
    path = sphinx_tmp/idx_new
}

The “new” index can have any name and any path; all that matters is that we know where the new index files will be placed.

Build the “new” index:

[snikolaev@dev01 ~]$ indexer -c sphinx_replication.conf idx_new
Manticore 2.6.1 9a706b4@180119 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2018, Manticore Software LTD (http://manticoresearch.com)

using config file 'sphinx_replication.conf'...
indexing index 'idx_new'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 29 bytes
total 0.004 sec, 7045 bytes/sec, 728.86 docs/sec
total 7 reads, 0.000 sec, 13.7 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

After that you should have something like this in your indexes dir:


[snikolaev@dev01 ~]$ ls -la sphinx_tmp/
total 460
drwxrwxr-x 2 snikolaev snikolaev 4096 Feb 26 01:09 .
drwx------+ 149 snikolaev snikolaev 405504 Feb 26 01:09 ..
-rw-r--r-- 1 snikolaev snikolaev 112 Feb 26 01:09 idx_new.spa
-rw-r--r-- 1 snikolaev snikolaev 24 Feb 26 01:09 idx_new.spd
-rw-r--r-- 1 snikolaev snikolaev 1 Feb 26 01:09 idx_new.spe
-rw-r--r-- 1 snikolaev snikolaev 363 Feb 26 01:09 idx_new.sph
-rw-r--r-- 1 snikolaev snikolaev 57 Feb 26 01:09 idx_new.spi
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:09 idx_new.spk
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:09 idx_new.spm
-rw-r--r-- 1 snikolaev snikolaev 8 Feb 26 01:09 idx_new.spp
-rw-r--r-- 1 snikolaev snikolaev 35 Feb 26 01:09 idx_new.sps
-rw-r--r-- 1 snikolaev snikolaev 112 Feb 26 01:07 idx.spa
-rw-r--r-- 1 snikolaev snikolaev 24 Feb 26 01:07 idx.spd
-rw-r--r-- 1 snikolaev snikolaev 1 Feb 26 01:07 idx.spe
-rw-r--r-- 1 snikolaev snikolaev 363 Feb 26 01:07 idx.sph
-rw-r--r-- 1 snikolaev snikolaev 57 Feb 26 01:07 idx.spi
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:07 idx.spk
-rw------- 1 snikolaev snikolaev 0 Feb 26 01:09 idx.spl
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:07 idx.spm
-rw-r--r-- 1 snikolaev snikolaev 8 Feb 26 01:07 idx.spp
-rw-r--r-- 1 snikolaev snikolaev 35 Feb 26 01:07 idx.sps

idx.* are your normal index files and idx_new.* are the “new” index files.
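Before copying anything it can be worth checking that the build actually produced a complete set of files. Here is a minimal sketch of such a check. The helper name index_files_present is made up for this example, and the extension list is simply taken from the listing above, so treat it as an assumption: other Manticore / Sphinx versions may produce a different set of files.

```shell
# Hypothetical helper: succeed only if every expected file of a plain index exists.
# The extension list is copied from the directory listing above and is an
# assumption; adjust it to what your version of indexer really writes.
index_files_present() {
    prefix=$1   # index path without extension, e.g. sphinx_tmp/idx_new
    for ext in spa spd spe sph spi spk spm spp sps; do
        if [ ! -e "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

You could then guard the copy step with something like: index_files_present sphinx_tmp/idx_new && echo "ready to copy".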

Copying the index

Now that the new index is built, you need to deliver it to every server where you want it to run. You can use whatever you like for this: rsync, scp, ftp, samba etc.

scp sphinx_tmp/idx_new.sp* SLAVE:sphinx_tmp/
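If you have more than one slave, the same scp call can be wrapped into a loop. The sketch below is only an illustration: the function name copy_index and the host names in the usage line are placeholders, and it assumes the index directory has the same relative path on every server, as in the example above. It stops at the first failed copy so that you don’t go on to rotate a partially delivered index.

```shell
# Hypothetical helper: copy the new index files to each host given on the
# command line and stop at the first failure.
# Assumes the target directory has the same relative path on every host.
copy_index() {
    prefix=$1; shift            # e.g. sphinx_tmp/idx_new
    for host in "$@"; do
        scp "$prefix".sp* "$host:$(dirname "$prefix")/" || {
            echo "copy to $host failed" >&2
            return 1
        }
    done
    return 0
}
```

Usage would look like: copy_index sphinx_tmp/idx_new SLAVE1 SLAVE2 (SLAVE1 and SLAVE2 stand for your own host names).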

Rotating the indexes

Now that the index has been propagated to all the slaves, all that’s left is to rotate it so that the new one takes the place of the old one. Previously Sphinx allowed this only by sending a HUP signal to the searchd instance or by restarting it. Starting with Sphinx 2.3.1, and in Manticore Search, the RELOAD INDEX command is available. It lets you rotate an index via SphinxQL and, more importantly, pass the path where searchd should look for the new index files. In our case:

[snikolaev@dev01 ~]$ mysql -P9314 -h0 -e "RELOAD INDEX idx FROM 'sphinx_tmp/idx_new'"

After that you can see that the “new” index files have disappeared from the dir, and the idx.* files now carry the timestamps of the new build:

[snikolaev@dev01 ~]$ ls -la sphinx_tmp/
total 432
drwxrwxr-x 2 snikolaev snikolaev 4096 Feb 26 01:15 .
drwx------+ 149 snikolaev snikolaev 405504 Feb 26 01:09 ..
-rw-r--r-- 1 snikolaev snikolaev 112 Feb 26 01:09 idx.spa
-rw-r--r-- 1 snikolaev snikolaev 24 Feb 26 01:09 idx.spd
-rw-r--r-- 1 snikolaev snikolaev 1 Feb 26 01:09 idx.spe
-rw-r--r-- 1 snikolaev snikolaev 363 Feb 26 01:09 idx.sph
-rw-r--r-- 1 snikolaev snikolaev 57 Feb 26 01:09 idx.spi
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:09 idx.spk
-rw------- 1 snikolaev snikolaev 0 Feb 26 01:15 idx.spl
-rw-r--r-- 1 snikolaev snikolaev 0 Feb 26 01:09 idx.spm
-rw-r--r-- 1 snikolaev snikolaev 8 Feb 26 01:09 idx.spp
-rw-r--r-- 1 snikolaev snikolaev 35 Feb 26 01:09 idx.sps

and in the searchd log you can see that the rotate was successful (and took only 1ms):

[snikolaev@dev01 ~]$ tail -n 2 sphinx_replication.log
[Mon Feb 26 01:15:39.003 2018] [3346] rotating index 'idx': started
[Mon Feb 26 01:15:39.004 2018] [3346] rotating index 'idx': success

So, to rotate your indexes from the copies you’ve made on all the destinations, just run the RELOAD INDEX command on each of them, e.g.:

[snikolaev@dev01 ~]$ for host in SLAVE 0; do echo $host; mysql -P9314 -h$host -e "RELOAD INDEX idx FROM 'sphinx_tmp/idx_new';"; done;
SLAVE
0
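The loop above doesn’t tell you whether any of the reloads failed. A slightly more careful variant might look like the sketch below. The function name rotate_everywhere and the messages it prints are invented for this example; the port 9314 and the RELOAD INDEX statement are the same as in the commands above.

```shell
# Hypothetical helper: run RELOAD INDEX on every host and report failures.
# Returns non-zero if at least one host could not rotate the index.
rotate_everywhere() {
    index=$1; new_path=$2; shift 2   # e.g. idx sphinx_tmp/idx_new
    failed=0
    for host in "$@"; do
        if mysql -P9314 -h"$host" -e "RELOAD INDEX $index FROM '$new_path'"; then
            echo "$host: rotated"
        else
            echo "$host: RELOAD INDEX failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

With the hosts from the example it would be called as: rotate_everywhere idx sphinx_tmp/idx_new SLAVE 0. Unlike the copy step, it keeps going after a failure so that the healthy servers still get the new data.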

Notes

  • when you start Manticore you may see the following warning:

WARNING: index 'idx_new': prealloc: failed to open sphinx_tmp/idx_new.sph: No such file or directory; NOT SERVING

This warning is OK: we don’t need idx_new to be served by Manticore Search, since it’s only temporary storage for the new data.

  • synchronizing huge indexes may take a long time, but syncing data at the filesystem/network level is still easier on your servers than rebuilding the index from scratch with ‘indexer’.
  • remember that if you do attribute updates, you need to apply them on all your slaves too since, as I said in the beginning, there’s no high-grade replication yet. Contact us if you need that and want to be a beta tester.
  • normally you just have an indexing script which first prepares the new index, then copies it to all the needed places and then rotates the index everywhere. This is the simplest way of doing it. In the real world you may also want to validate the index, run indexing on a slave too to provide indexing HA, use locks to make sure you’re not rebuilding an index on N servers at the same time, and so on. We’ve done it many times for our clients. Let us know by sending an email to contact@manticoresearch.com if you need support in this area. You can see our support plans here.
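To give an idea of what such a script could look like, here is a minimal sketch that strings together the exact commands from this post and runs them under a simple lock. The names with_lock and replicate are made up for the example. The lock is just a directory created with mkdir (which is atomic), so it only protects against overlapping runs on one machine; coordinating N servers would need a lock in some shared place, which is not shown here.

```shell
# Hypothetical helper: run a command under a directory-based lock.
# Note: if the script is killed, the lock directory stays and must be
# removed by hand; a real script would want to handle that.
with_lock() {
    lockdir=$1; shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir, skipping" >&2
        return 1
    fi
    "$@"
    rc=$?
    rmdir "$lockdir"
    return "$rc"
}

# Build, copy, rotate: the same three commands as in the examples above.
# Each step runs only if the previous one succeeded.
replicate() {
    indexer -c sphinx_replication.conf idx_new &&
    scp sphinx_tmp/idx_new.sp* SLAVE:sphinx_tmp/ &&
    for host in SLAVE 0; do
        mysql -P9314 -h"$host" -e "RELOAD INDEX idx FROM 'sphinx_tmp/idx_new'" || return 1
    done
}
```

From cron or by hand you would run it as: with_lock /tmp/idx_replication.lock replicate (the lock path is an arbitrary example).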

As a bottom line: until there’s high-grade replication for RT indexes, and while many people still use plain indexes, the approach described here makes sense and lets you replicate plain indexes easily.
