# Sphinx 3 vs Manticore: performance benchmark

**[UPDATE] A fresher benchmark is [here](https://manticoresearch.com/blog/manticore-2-7-5-vs-sphinx-3-1-1/).**

The long-awaited Sphinx 3 was recently [released](http://sphinxsearch.com/blog/2017/12/18/sphinx-3-0-1-released/) and then [updated to 3.0.2](http://sphinxsearch.com/blog/2018/02/26/sphinx-3-0-2-released/). It adds document storage, A-indexes and snippet pre-indexing but, unfortunately, is no longer open source (at least as of March 2018).

These are all very nice features, but how much did they affect the performance of Sphinx 3, and how does it now compare to Manticore's? We wanted to know too!

To find out, we ran a benchmark measuring:
- indexing time
- the maximum throughput Sphinx 3 and [Manticore Search 2.6.2](https://manticoresearch.com/downloads/) can deliver
- the minimum latency each of them can provide

The benchmark is based on the following:
- [luceneutil](http://github.com/mikemccand/luceneutil.git) to generate the data to index and the query sets
- [lucene2manticore](http://github.com/tomatolog/lucene2manticore) to convert the data from the Lucene format to the Manticore Search / Sphinx format
- [stress-tester](http://github.com/Ivinco/stress-tester) for benchmarking
- server: 8x Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz, 64 GB RAM, HDD
- OS: Ubuntu 16.04.3 LTS, kernel 4.8.0-45-generic

Here are the results:

![ms_vs_s3_indexation](./sphinx-3-vs-manticore-performance-benchmark/ms_vs_s3_indexation.png)

![ms_vs_s3_throughput](./sphinx-3-vs-manticore-performance-benchmark/ms_vs_s3_throughput.png)

![ms_vs_s3_latency](./sphinx-3-vs-manticore-performance-benchmark/ms_vs_s3_latency.png)

As you can see, **in all tested scenarios** Sphinx 3 takes much longer to index and performs much worse in both throughput and latency. We suspect this may be caused by a compilation issue (again, Sphinx 3 is not open source, so we can't recompile it) or by some general performance regression that could be debugged and fixed if the source code were available. It would be sad if the new features really degraded performance this much. In any case, we want to warn all users of Manticore and Sphinx that migrating to Sphinx 3 may degrade performance.

Please let us know if you get different results when migrating to Sphinx 3 or when comparing Manticore and Sphinx 3; it would be great to find out in which cases performance does not degrade.

[Ping us if you need our help.](https://manticoresearch.com/services/)

---

### Here's how you can reproduce the benchmark

Be aware that downloading and preparing the data may take a few hours.

1. Install the supplementary tools listed above and copy the config files:


```bash
mkdir data
mkdir q

git clone http://github.com/mikemccand/luceneutil.git
git clone http://github.com/manticoresoftware/lucene2manticore
git clone http://github.com/Ivinco/stress-tester

cp lucene2manticore/*.conf ./
```
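Later steps call `git`, `python`, `xzcat` and, assuming `test.php` is run by the PHP command-line interpreter, `php`. A minimal pre-flight check like this sketch can save a failed run halfway through; `check_tools` is our own hypothetical helper, not part of any of the tools above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every command that is not on PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The tool list is an assumption based on the commands used in this guide.
check_tools git python xzcat php || echo "install the missing tools before continuing" >&2
```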

2. Install the [Manticore Search](https://manticoresearch.com/downloads/) and [Sphinx 3](http://sphinxsearch.com/downloads/current/) binaries.
3. Fetch and prepare the source data

```bash
cd luceneutil
python src/python/setup.py -download
cd ../data/
xzcat enwiki-20120502-lines-1k.txt.lzma > lucene.tsv
```

Convert the data from the Lucene TSV-like format to a proper TSV format that Manticore Search and Sphinx data sources can use:

```bash
cd ..
python lucene2manticore/lucene2tsv.py data/lucene.tsv --maxlen 2097152 > data/lc.tsv
head -n 100000 data/lc.tsv >  data/lc100k.tsv
head -n 300000 data/lc.tsv > data/lc300k.tsv
head -n 1000000 data/lc.tsv > data/lc1m.tsv
```
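Since `head` silently writes fewer lines when its input is shorter, it may be worth confirming that each subset really has the intended size before indexing. A minimal sketch with a hypothetical `check_lines` helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if FILE does not have exactly EXPECTED lines.
check_lines() {
  local file=$1 expected=$2 actual
  actual=$(( $(wc -l < "$file") ))   # arithmetic expansion strips any padding wc adds
  if [ "$actual" -ne "$expected" ]; then
    echo "$file: expected $expected lines, got $actual" >&2
    return 1
  fi
}

# Usage, run from the directory that contains data/:
#   check_lines data/lc100k.tsv 100000
#   check_lines data/lc300k.tsv 300000
#   check_lines data/lc1m.tsv 1000000
```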

4. Prepare the queries

```bash
python lucene2manticore/lucene2query.py --types simple data/wikimedium500.tasks > q/q-wiki500-simple.txt
python lucene2manticore/lucene2query.py --types ext2 data/wikimedium500.tasks > q/q-wiki500-ext2.txt
python lucene2manticore/lucene2query.py --types simple luceneutil/tasks/wikimedium.10M.datefacets.nostopwords.tasks > q/q-wiki10m-simple.txt
python lucene2manticore/lucene2query.py --types ext2 luceneutil/tasks/wikimedium.10M.datefacets.nostopwords.tasks > q/q-wiki10m-ext2.txt
python lucene2manticore/lucene2query.py --types simple luceneutil/tasks/wikimedium.1M.nostopwords.tasks > q/q-wiki1m-simple.txt
python lucene2manticore/lucene2query.py --types ext2 luceneutil/tasks/wikimedium.1M.nostopwords.tasks > q/q-wiki1m-ext2.txt
cat q/q-wiki*-simple.txt > q/q-simple.txt
cat q/q-wiki*-ext2.txt > q/q-ext2.txt
```
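Assuming `lucene2query.py` writes one query per line, a quick count shows how many queries each generated file holds and whether any of them came out empty. `count_queries` is a hypothetical helper, sketched here:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<line count><TAB><file>" for each file
# (assumes one query per line) and return 1 if any file is empty.
count_queries() {
  local f n status=0
  for f in "$@"; do
    n=$(( $(wc -l < "$f") ))
    printf '%s\t%s\n' "$n" "$f"
    if [ "$n" -eq 0 ]; then
      echo "empty query file: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: count_queries q/q-*.txt
```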

5. Prepare the stop words:

```bash
indexer -c lucene2manticore/sphinx3.conf i2_1m_no_stopwords --buildstops stopwords1k.txt 1000
head -n 100 stopwords1k.txt > stopwords.txt
```

6. Index the data and note how long it takes (the `time` prefix prints the elapsed time of each run):

```bash
time /path/to/manticore/indexer -c lucene2manticore/manticore.conf --all
time /path/to/sphinx3/indexer -c lucene2manticore/sphinx3.conf --all
```

7. Start the search daemons:

```bash
/path/to/manticore/searchd -c lucene2manticore/manticore.conf
/path/to/sphinx3/searchd -c lucene2manticore/sphinx3.conf
```

8. Warm up the servers

It's worth warming up the search daemons before testing, e.g. like this:

```bash
cd stress-tester
for q in simple ext2; do for p in 8306 7406; do ./test.php --plugin=plain.php --data=../q/q-$q.txt -b=100 -c=8 --port=$p --index=i2_100k_stopwords_100 --maxmatches=100 --csv; done; done;
```
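The warm-up loop assumes both daemons already accept connections on ports 8306 and 7406. If it fails right after the daemons are started, they may still be starting up. Here is a minimal sketch of a wait loop based on bash's built-in `/dev/tcp` redirection; `wait_for_port` is a hypothetical helper, and its 60-second default timeout is an arbitrary choice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Tries once per second, gives up after TIMEOUT seconds (default 60).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  # The subshell exits non-zero if bash cannot open the /dev/tcp connection.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage: for p in 8306 7406; do wait_for_port 127.0.0.1 "$p" || exit 1; done
```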

### Throughput test cases

We now know how long indexing takes (see the indexing step above). Let's see how much throughput Sphinx 3 and Manticore Search can deliver.

Simple queries against the 100K-document index with the top 100 stop words:


```bash
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100 --maxmatches=1000 --csv; done; done; done
```
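The loop above and the ones below differ only in the query file, the index, the concurrency levels and, in the last case, an extra `--filter` argument. If you run them often, a wrapper function like this sketch keeps them consistent; `run_matrix` and the `TESTER` variable are hypothetical names of our own, not part of stress-tester:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the stress-tester loops in this section.
# TESTER defaults to ./test.php; TESTER=echo turns it into a dry run.
TESTER=${TESTER:-./test.php}

# run_matrix DATA INDEX "CONCURRENCY LEVELS" [EXTRA test.php ARGS...]
run_matrix() {
  local data=$1 index=$2 levels=$3
  shift 3  # whatever is left (e.g. --filter=...) is passed through
  local port c batch
  for port in 7406 8306; do
    for c in $levels; do  # unquoted on purpose: "1 4 6 8 12" becomes five words
      for batch in 1 100; do
        "$TESTER" --plugin=plain.php --data="$data" -b="$batch" -c="$c" \
          --port="$port" --index="$index" --maxmatches=1000 "$@" --csv
      done
    done
  done
}

# Usage (equivalent to the first and last loops in this section):
#   run_matrix ../q/q-simple.txt i2_100k_stopwords_100 "1 4 6 8 12"
#   run_matrix ../q/q-simple.txt i2_1m_stopwords_100 "1 8" --filter='ts<1199141654'
```

Setting `TESTER=echo` before calling `run_matrix` prints the commands instead of running them, which is a cheap way to double-check the matrix before a long benchmark run.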

Simple queries against the 100K-document index with the top 1000 stop words:

```bash
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_1k --maxmatches=1000 --csv; done; done; done
```

Complex queries against the 100K-document index with the top 100 stop words:


```bash
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100 --maxmatches=1000 --csv; done; done; done
```

Complex queries against the 100K-document index with the top 1000 stop words:

```bash
for port in 7406 8306; do for c in 1 4 6 8 12; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_1k --maxmatches=1000 --csv; done; done; done
```

Simple queries against the 100K-document index with the top 100 stop words and morphology enabled:


```bash
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_100k_stopwords_100_morphology --maxmatches=1000 --csv; done; done; done
```

Simple queries against the 1M-document index with the top 100 stop words:


```bash
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --csv; done; done; done
```

Complex queries against the 1M-document index with the top 100 stop words:

```bash
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-ext2.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --csv; done; done; done
```

Simple queries against the 1M-document index with the top 100 stop words and morphology enabled:

```bash
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100_morphology --maxmatches=1000 --csv; done; done; done
```

Simple queries against the 1M-document index with the top 100 stop words and an attribute filter that skips half of the documents:

```bash
for port in 7406 8306; do for c in 1 8; do for batchSize in 1 100; do ./test.php --plugin=plain.php --data=../q/q-simple.txt -b=$batchSize -c=$c --port=$port --index=i2_1m_stopwords_100 --maxmatches=1000 --filter='ts<1199141654' --csv; done; done; done
```
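Assuming `ts` holds a Unix timestamp, the cutoff in the filter above corresponds to 2007-12-31 22:54:14 UTC; the first command below shows the conversion with GNU `date`. The post doesn't say how this value was chosen. If you want a similar half-and-half split on your own data, one option is the median of the timestamp column, which is what the hypothetical `median_col` helper computes. Which TSV column holds the timestamp is an assumption you would need to check against your data and config:

```bash
#!/usr/bin/env bash
# Show the cutoff from --filter='ts<1199141654' as a UTC date (GNU date syntax).
date -u -d @1199141654 '+%Y-%m-%d %H:%M:%S'   # prints 2007-12-31 22:54:14

# Hypothetical helper: median of numeric column COL in tab-separated FILE.
# With a median cutoff, "ts < cutoff" matches roughly half of the rows.
median_col() {
  local file=$1 col=$2
  cut -f "$col" "$file" | sort -n |
    awk '{ v[NR] = $1 } END { if (NR > 0) print v[int((NR + 1) / 2)] }'
}

# Usage (the column number here is an assumption, not taken from the configs):
#   median_col data/lc1m.tsv 3
```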
