[UPDATE] A fresher benchmark is available here.
The long-awaited Sphinx 3 was recently released and has since been updated to 3.0.2. It brings document storage, A-indexes and snippet pre-indexing, but unfortunately it is no longer open source (at least for now, as of March 2018).
These are all very nice features, but are you curious how much they have affected Sphinx 3's performance, and how it compares to Manticore's? So are we!
To figure that out we’ve made a benchmark to measure:
- indexing time
- the maximum throughput Sphinx 3 and Manticore Search 2.6.2 can give
- the minimum latency both can provide
The benchmark is based on the following:
- luceneutil to generate the data to index and the query sets
- lucene2manticore to convert the data from Lucene to Manticore Search / Sphinx format
- stress-tester for benchmarking
- server: 8xIntel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz, 64G RAM, HDD
- OS: Ubuntu 16.04.3 LTS, kernel 4.8.0-45-generic
Here are the results:
As can be seen, in all tested scenarios Sphinx 3 shows much higher indexing time and much worse performance, in both throughput and latency. We tend to believe this may be caused by a compilation issue (again, Sphinx 3 is not open source, so we cannot recompile it) or by some general performance regression that could be debugged and fixed if the source code were available. It would be sad if the new features degraded performance this much. In any case, we want to warn all users of Manticore and Sphinx that you may see a performance degradation if you migrate to Sphinx 3.
Please let us know if you get different results when migrating to Sphinx 3 or when comparing Manticore with Sphinx 3; it would be great to figure out in which cases the performance does not degrade.
Here’s how you can reproduce the benchmark:
Be aware that downloading and preparing the data may take a few hours.
- Install the above supplementary tools and prepare the configs and stopwords files:
mkdir data
mkdir q
git clone http://github.com/mikemccand/luceneutil.git
git clone http://github.com/manticoresoftware/lucene2manticore
git clone http://github.com/Ivinco/stress-tester
cp lucene2manticore/*.conf ./
- Install the Manticore Search and Sphinx 3 binaries.
- Fetch and prepare the source data
cd luceneutil
python src/python/setup.py -download
cd ../data/
xzcat enwiki-20120502-lines-1k.txt.lzma > lucene.tsv
Convert the data from Lucene's TSV-like format to a proper TSV format that can be used with Manticore Search and Sphinx data sources:
cd ..
python lucene2manticore/lucene2tsv.py data/lucene.tsv --maxlen 2097152 > data/lc.tsv
head -n 100000 data/lc.tsv > data/lc100k.tsv
head -n 300000 data/lc.tsv > data/lc300k.tsv
head -n 1000000 data/lc.tsv > data/lc1m.tsv
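The `head` slicing above simply keeps the first N lines of the converted TSV. As a self-contained illustration of the pattern (the file names and contents below are synthetic, not the real Wikipedia dump):

```shell
# Build a small synthetic TSV and slice it the same way (all names here are made up)
seq 1 1000 | awk -v OFS='\t' '{print $1, "title " $1, "body " $1}' > demo.tsv
head -n 100 demo.tsv > demo100.tsv
wc -l < demo100.tsv    # prints 100
```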
- Prepare the queries
python lucene2manticore/lucene2query.py --types simple data/wikimedium500.tasks > q/q-wiki500-simple.txt
python lucene2manticore/lucene2query.py --types ext2 data/wikimedium500.tasks > q/q-wiki500-ext2.txt
python lucene2manticore/lucene2query.py --types simple luceneutil/tasks/wikimedium.10M.datefacets.nostopwords.tasks > q/q-wiki10m-simple.txt
python lucene2manticore/lucene2query.py --types ext2 luceneutil/tasks/wikimedium.10M.datefacets.nostopwords.tasks > q/q-wiki10m-ext2.txt
python lucene2manticore/lucene2query.py --types simple luceneutil/tasks/wikimedium.1M.nostopwords.tasks > q/q-wiki1m-simple.txt
python lucene2manticore/lucene2query.py --types ext2 luceneutil/tasks/wikimedium.1M.nostopwords.tasks > q/q-wiki1m-ext2.txt
cat q/q-wiki*-simple.txt > q/q-simple.txt
cat q/q-wiki*-ext2.txt > q/q-ext2.txt
- Prepare stop words
indexer -c lucene2manticore/sphinx3.conf i2_1m_no_stopwords --buildstops stopwords1k.txt 1000
head -100 stopwords1k.txt > stopwords.txt
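`--buildstops` collects the most frequent keywords from the index source, and the `head` call then keeps only the top 100 of them. The same idea can be sketched in plain shell on synthetic input (everything below is illustrative, not the real pipeline):

```shell
# Count word frequencies and keep the top 2 most frequent words
printf 'the quick the lazy the quick fox\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn \
  | awk '{print $2}' | head -n 2
# prints:
# the
# quick
```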
- Index the data and remember how much it takes:
./indexer -c lucene2manticore/manticore.conf --all
./indexer -c lucene2manticore/sphinx3.conf --all
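To record the indexing time mentioned above, each indexer run can simply be wrapped in a timer. A minimal sketch, using `sleep 1` as a stand-in for the real `./indexer` invocation (which needs the configs and data in place):

```shell
# Measure the wall-clock duration of a command; replace "sleep 1" with the real
# "./indexer -c lucene2manticore/manticore.conf --all" run
start=$(date +%s)
sleep 1
end=$(date +%s)
echo "indexing took $((end - start)) seconds"
```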
- Start the search daemons
/path/to/manticore/searchd -c lucene2manticore/manticore.conf
/path/to/sphinx3/searchd -c lucene2manticore/sphinx3.conf
- Warm up the servers
It’s worth warming up the search daemons before testing, e.g. like this:
cd stress-tester
for q in simple ext2; do for p in 8306 7406; do ./test.php --plugin=plain.php --data=../q/q-$q.txt -b=100 -c=8 --port=$p --index=i2_100k_stopwords_100 --maxmatches=100 --csv; done; done;
Throughput test cases
We now know how long indexing takes (see the "Index the data" step above). Let’s see how much throughput Sphinx 3 and Manticore Search can give.
Simple queries against 100K docs index with top 100 stop words:
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100 --maxmatches=1000 --csv; done; done; done
Simple queries against 100K docs index with top 1000 stop words:
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_1k --maxmatches=1000 --csv; done; done; done
Complex queries against 100K docs index with top 100 stop words:
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100 --maxmatches=1000 --csv; done; done; done
Complex queries against 100K docs index with top 1000 stop words:
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_1k --maxmatches=1000 --csv; done; done; done
Simple queries against 100K docs index with top 100 stop words and morphology enabled:
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100_morphology --maxmatches=1000 --csv; done; done; done
Simple queries against 1M docs index with top 100 stop words:
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --csv; done; done; done
Complex queries against 1M docs index with top 100 stop words:
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --csv; done; done; done
Simple queries against 1M docs index with top 100 stop words and morphology enabled:
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100_morphology --maxmatches=1000 --csv; done; done; done
Simple queries against 1M docs index with top 100 stop words, filtering by an attribute to skip half of the documents:
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --filter='ts<1199141654' --csv; done; done; done
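Each `test.php` run with `--csv` prints result lines that can be collected into one file for later comparison. A minimal, self-contained sketch of the idea — the column layout below (port, concurrency, throughput) and the numbers are invented for illustration; check the actual header produced by stress-tester's `--csv` output:

```shell
# Collect per-run CSV lines and show the best-throughput run first
# (the data and the port,concurrency,throughput layout are made up for this demo)
printf '%s\n' '7406,1,1200' '7406,8,3900' '8306,1,1500' '8306,8,5200' > results.csv
sort -t, -k3,3 -n -r results.csv | head -n 1
# prints: 8306,8,5200
```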