Manticore 2.7.5 vs Sphinx 3.1.1

Hi

We previously benchmarked Sphinx 3.0.2 against Manticore 2.6.2. That was 8 months ago, and both Manticore and Sphinx have changed since then. According to the Sphinx 3.0.3 announcement, Sphinx 3.0.3 is up to 2x faster than 3.0.2, so it is worth running another benchmark. This time let's test on a real dataset: Hacker News comments.

The benchmark was conducted under the following conditions:

  • Hacker News curated comments dataset of 2016 in CSV format
  • OS: Ubuntu 16.04.4 LTS (Xenial Xerus), kernel: 4.15.0-30-generic
  • CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz, 8 cores
  • 64G RAM
  • HDD
  • Docker version 17.05.0-ce, build 89658be
  • Base image for indexing and searchd – alpine:3.6
  • Manticore Search was built inside the image; Sphinx binaries were downloaded from the site, since no Sphinx source code is available to build from
  • stress-tester for benchmarking

The config is identical for Manticore and Sphinx:

source full
{
  type = csvpipe
  csvpipe_command = cat /root/hacker_news_comments.prepared.csv|grep -v line_number
  csvpipe_attr_uint = story_id
  csvpipe_attr_timestamp = story_time
  csvpipe_field = story_text
  csvpipe_field = story_author
  csvpipe_attr_uint = comment_id
  csvpipe_field = comment_text
  csvpipe_field = comment_author
  csvpipe_attr_uint = comment_ranking
  csvpipe_attr_uint = author_comment_count
  csvpipe_attr_uint = story_comment_count
}

index full
{
  path = /root/idx_full
  source = full
  html_strip = 1
  mlock = 1
}

searchd
{
  listen = 9306:mysql41
  query_log = /root/query.log
  log = /root/searchd.log
  pid_file = /root/searchd.pid
  binlog_path =
  qcache_max_bytes = 0
}

Indexing

Indexing took 682 seconds for Manticore and 646 seconds for Sphinx:

Manticore:

indexing index 'full'...
collected 11654429 docs, 6198.6 MB
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6198580642 bytes
total 681.732 sec, 9092389 bytes/sec, 17095.30 docs/sec
total 11676 reads, 1.811 sec, 452.6 kb/call avg, 0.1 msec/call avg
total 9431 writes, 4.878 sec, 1065.3 kb/call avg, 0.5 msec/call avg

Sphinx:

using config file '/root/manticore.conf'...
indexing index 'full'...
collected 11654429 docs, 6198.6 MB
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6.199 Gb
total 645.7 sec, 9.600 Mb/sec, 18049 docs/sec

So on this dataset and index schema, Manticore indexes 5.6% slower than Sphinx.
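The 5.6% figure follows directly from the two total times reported by the indexers. As a minimal check, assuming a POSIX shell with awk (the helper name `pct_diff` is ours, purely for illustration):

```shell
# pct_diff A B: print how much larger A is than B, as a percentage of B.
# Illustrative helper, not part of the benchmark suite.
pct_diff() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# Total indexing times from the indexer output: Manticore vs Sphinx.
pct_diff 681.732 645.7   # prints 5.6
```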

Performance tests

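Both instances were warmed up before testing. The indexes were the same size and were fully in the OS cache (the Sphinx volume also contains the sphinx-3.1.1 distribution directory, which probably explains its higher file count and the 98.8% residency figure):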

Manticore:

snikolaev@dev:~$ sudo ls -lah /var/lib/docker/volumes/4702617c7b0d970ba660514053706125fd88f2a4b7fce7222aab1f53fba2b56d/_data/
total 4.6G
drwx------ 2 root root 4.0K Jan 25 04:50 .
drwxr-xr-x 3 root root 4.0K Jan 25 04:43 ..
-rw-r--r-- 1 root root 362M Jan 25 03:51 idx_full.spa
-rw-r--r-- 1 root root 3.1G Jan 25 03:56 idx_full.spd
-rw-r--r-- 1 root root 27M Jan 25 03:56 idx_full.spe
-rw-r--r-- 1 root root 601 Jan 25 03:56 idx_full.sph
-rw-r--r-- 1 root root 6.3M Jan 25 03:56 idx_full.spi
-rw-r--r-- 1 root root 0 Jan 25 03:51 idx_full.spk
-rw------- 1 root root 0 Jan 25 04:50 idx_full.spl
-rw-r--r-- 1 root root 0 Jan 25 03:51 idx_full.spm
-rw-r--r-- 1 root root 1.1G Jan 25 03:56 idx_full.spp
-rw-r--r-- 1 root root 1 Jan 25 03:56 idx_full.sps
-rw-rw-r-- 1 root root 750 Jan 25 03:27 manticore.conf
lrwxrwxrwx 1 root root 11 Jan 25 03:57 query.log -> /dev/stdout
lrwxrwxrwx 1 root root 11 Jan 25 03:57 searchd.log -> /dev/stdout
-rw------- 1 root root 2 Jan 25 04:50 searchd.pid
snikolaev@dev:~$ sudo vmtouch /var/lib/docker/volumes/4702617c7b0d970ba660514053706125fd88f2a4b7fce7222aab1f53fba2b56d/_data/
vmtouch: WARNING: not following symbolic link /var/lib/docker/volumes/4702617c7b0d970ba660514053706125fd88f2a4b7fce7222aab1f53fba2b56d/_data/searchd.log
vmtouch: WARNING: not following symbolic link /var/lib/docker/volumes/4702617c7b0d970ba660514053706125fd88f2a4b7fce7222aab1f53fba2b56d/_data/query.log
Files: 12
Directories: 1
Resident Pages: 1190485/1190682 4G/4G 100%
Elapsed: 0.070337 seconds

Sphinx:

snikolaev@dev:~$ sudo ls -lah /var/lib/docker/volumes/0aa7c89baeaa4c1bd2c52d2fbffe6f05fde5f43243534d5725482a702a90fd5b/_data
total 4.6G
drwx------ 3 root root 4.0K Jan 25 04:50 .
drwxr-xr-x 3 root root 4.0K Jan 25 04:43 ..
-rw-r--r-- 1 root root 362M Jan 25 03:37 idx_full.spa
-rw-r--r-- 1 root root 3.1G Jan 25 03:41 idx_full.spd
-rw-r--r-- 1 root root 27M Jan 25 03:41 idx_full.spe
-rw-r--r-- 1 root root 648 Jan 25 03:41 idx_full.sph
-rw-r--r-- 1 root root 6.3M Jan 25 03:41 idx_full.spi
-rw-r--r-- 1 root root 8 Jan 25 03:37 idx_full.spj
-rw-r--r-- 1 root root 1.4M Jan 25 03:37 idx_full.spk
-rw------- 1 root root 0 Jan 25 04:50 idx_full.spl
-rw-r--r-- 1 root root 1.1G Jan 25 03:41 idx_full.spp
-rw-rw-r-- 1 root root 750 Jan 25 03:27 manticore.conf
lrwxrwxrwx 1 root root 11 Jan 25 03:42 query.log -> /dev/stdout
lrwxrwxrwx 1 root root 11 Jan 25 03:42 searchd.log -> /dev/stdout
-rw------- 1 root root 2 Jan 25 04:50 searchd.pid
drwxr-sr-x 8 root root 4.0K Jan 23 08:01 sphinx-3.1.1
snikolaev@dev:~$ sudo vmtouch /var/lib/docker/volumes/0aa7c89baeaa4c1bd2c52d2fbffe6f05fde5f43243534d5725482a702a90fd5b/_data
vmtouch: WARNING: not following symbolic link /var/lib/docker/volumes/0aa7c89baeaa4c1bd2c52d2fbffe6f05fde5f43243534d5725482a702a90fd5b/_data/searchd.log
vmtouch: WARNING: not following symbolic link /var/lib/docker/volumes/0aa7c89baeaa4c1bd2c52d2fbffe6f05fde5f43243534d5725482a702a90fd5b/_data/query.log
Files: 156
Directories: 18
Resident Pages: 1195648/1210212 4G/4G 98.8%
Elapsed: 0.075835 seconds

Test 1 – time to process top 1000 terms from the collection

First, let's run a simple test: how long does it take to find documents containing each of the top 1000 terms from the collection?

for n in `head -1000 hn_top.txt|awk '{print $1}'`; do
mysql -P9306 -hhn_$engine -e "select * from full where match('@(comment_text,story_text,comment_author,story_author) $n') limit 10 option max_matches=1000" > /dev/null
done

The results are: 32.9 seconds for Sphinx and 28.1 seconds for Manticore.

So in this test Manticore Search is faster than Sphinx Search by 16.7%.
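One way to collect such wall-clock numbers is to wrap the loop in a function and time it once per engine. The sketch below is not the script used for this post: the helper names `elapsed` and `run_top_terms`, and the engine names `manticore` and `sphinx` (giving hosts `hn_manticore` and `hn_sphinx`), are assumptions. It also relies on GNU `date` for sub-second resolution.

```shell
# elapsed CMD [ARGS...]: run a command, discard its output,
# and print the wall-clock time it took, in seconds.
elapsed() {
  start=$(date +%s.%N)          # GNU date; %N is nanoseconds
  "$@" > /dev/null 2>&1
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f\n", e - s }'
}

# run_top_terms ENGINE: the loop from this test, parameterized by engine.
# hn_top.txt and the hn_$engine host naming come from the loop above;
# the concrete engine names are assumed.
run_top_terms() {
  engine=$1
  for n in $(head -1000 hn_top.txt | awk '{print $1}'); do
    mysql -P9306 -h"hn_$engine" -e "select * from full where match('@(comment_text,story_text,comment_author,story_author) $n') limit 10 option max_matches=1000"
  done
}

# Usage (needs both searchd instances running):
#   elapsed run_top_terms manticore
#   elapsed run_top_terms sphinx
```

Note that each query starts a new `mysql` client, so the measured time includes process start-up and connection overhead for both engines equally.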

Test 2 – top 1000 frequent terms from the collection broken down by groups (top 1-50, top 50-100 etc.)

Now let’s see how Sphinx and Manticore differ when processing terms from sub-groups of the top 1000 frequent terms.

Manticore is faster than Sphinx by 24% in 95p latency and by 21% in throughput.
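For readers unfamiliar with the metric, 95p latency is the value that 95% of the queries in a run stay at or below. A minimal sketch of computing it from raw per-query latencies, assuming the nearest-rank definition (we have not checked which definition stress-tester itself uses) and a hypothetical helper name `p95`:

```shell
# p95: read one latency value per line on stdin and print the
# 95th percentile using the nearest-rank method.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # ceil(0.95 * NR) in integer math
      print v[rank]
    }'
}

# Example with made-up latencies in milliseconds:
printf '%s\n' 12 15 11 40 13 14 90 12 16 13 | p95   # prints 90
```

With only ten samples the 95th percentile is simply the slowest query, which is why latency percentiles are only meaningful over runs with many queries.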

Test 3 – top 1000 frequent terms from the collection broken down by groups + 1 term from group 1-100

Let’s test the performance of document set intersection by adding one more frequent term to the query.

Again Manticore is faster than Sphinx: by 21% in throughput and by 13% on average in 95p latency. For groups 900-950 and 950-1000, however, Sphinx showed slightly better 95p latency, by 1.3% and 4.6% respectively, although its throughput was still 15% and 14% lower than that of Manticore Search.

Test 4 – top 1000 frequent terms from the collection broken down by groups + 1 term from group 1-100, both terms enclosed in quotes to make a phrase

On average Manticore is again faster: by 20% in throughput and by 11% in 95p latency. As in the previous test, Sphinx shows lower 95p latency for groups 850-900, 900-950 and 950-1000 (by 1.4%, 4.9% and 4.9% respectively), but its throughput is still significantly lower, by 14-15%.

Test 5 – 2 terms each from group 600-750 under different concurrencies

This test aims to show the difference in throughput at different query concurrency levels.

Manticore is faster at every concurrency level, by 14% on average. At concurrency 8, which gives the maximum throughput, Manticore's throughput is 17% higher and its 95p latency is 17% lower.
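The numbers in this post were produced with stress-tester. As a rough stand-in for trying different concurrency levels by hand, queries can be fanned out with `xargs -P`. This is a sketch under assumptions: it needs GNU xargs (for `-d`), the helper name `run_concurrent` and the file `queries.txt` are hypothetical, and the host name `hn_manticore` is assumed.

```shell
# run_concurrent CONCURRENCY QUERY_FILE CMD [ARGS...]
# Runs CMD once per line of QUERY_FILE, passing the line as the last
# argument, with up to CONCURRENCY invocations in flight at a time.
# -d '\n' (GNU xargs) keeps the quotes inside the queries intact.
run_concurrent() {
  conc=$1
  file=$2
  shift 2
  xargs -d '\n' -P "$conc" -I{} "$@" {} < "$file"
}

# Usage (queries.txt is a hypothetical file with one SQL query per line):
#   for c in 1 2 4 8 16; do
#     run_concurrent "$c" queries.txt mysql -P9306 -hhn_manticore -e > /dev/null
#   done
```

Dividing the number of queries by the wall-clock time of each run gives an approximate throughput per concurrency level. Because every query spawns its own client process here, absolute numbers will be lower than with a dedicated load generator, so only the relative comparison between engines is meaningful.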

Conclusions

Sphinx shows slightly better indexing performance.

In terms of search performance, Manticore 2.7.5 shows higher throughput in all the tests (by about 14% to 21%) and lower 95p latency in the vast majority of them.

The test is fully dockerized and open-sourced on our GitHub. The detailed results can be found here. We would appreciate it if you ran the same tests on your hardware, or added different tests to the suite, and let us know the results.

Thank you for reading!
