Back in 1958, Hans Peter Luhn proposed in his paper “The Automatic Creation of Literature Abstracts” that “the frequency of word occurrence in an article furnishes a useful measurement of word significance”. This is probably still one of the most significant ideas in information retrieval, and it is used by virtually every well-known search engine, big or small, from Google and Yahoo to custom search solutions such as Elasticsearch and Manticore Search.
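To make Luhn's idea concrete, here is a minimal Python sketch that counts word occurrences in a text and returns the most frequent words as the "significant" ones. The tokenizer regex and the tiny `STOP_WORDS` set are illustrative assumptions: Luhn also discounted very common words, but not with this exact list, and real engines such as Manticore Search use their own tokenization and ranking formulas.

```python
import re
from collections import Counter

# Illustrative stand-in for discarding very common words; this is an
# assumption, not the cut-off Luhn actually used.
STOP_WORDS = {"the", "a", "an", "and", "in", "of", "to", "is"}


def significant_words(text, top_n=3):
    """Return the top_n most frequent non-stop words as (word, count) pairs."""
    # Naive tokenization: lowercase, keep runs of ASCII letters only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common() sorts by count; ties keep first-occurrence order.
    return counts.most_common(top_n)


sample = ("The search engine ranks documents. A search engine counts words "
          "in documents, and the search ranking uses counts.")
print(significant_words(sample))
# → [('search', 3), ('engine', 2), ('documents', 2)]
```

Raw counts are the simplest form of the idea; dividing each count by the total number of words gives a relative frequency, which makes documents of different lengths comparable.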
In an earlier post we benchmarked Sphinx 3.0.2 against Manticore 2.6.2. That was 8 months ago, and both Manticore and Sphinx have changed since then. According to the Sphinx 3.0.3 announcement, Sphinx 3.0.3 is up to 2x faster than 3.0.2, so it's worth running another benchmark. This time let's test on a real dataset: Hacker News comments.
The benchmark was conducted with the following conditions:
Many databases and search engines let you customize your queries with your own so-called “user-defined functions”, or UDFs. Sphinx and Manticore are no exception. The documentation has a long section on this: https://docs.manticoresearch.com/latest/html/extending.html#udfs-user-defined-functions
Here I want to give just a quick example of how to write a UDF that adds functionality which can be really useful in some cases but is missing out of the box: a sleep() function.
Did you know that Sphinx and Manticore Search configs support scripting via shebang syntax? Here is an example of how useful that can be:
Imagine you have 3 tables with identical structure that you would like to index into 3 indexes, one per table. Instead of describing each source and index separately, you can write a PHP script that generates them and use that script as your Manticore Search config:
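For illustration, here is a rough equivalent of such a generator written in Python rather than PHP. It is a minimal sketch, assuming, as described above, that Manticore Search executes a config file that starts with a shebang line and parses whatever the script prints to standard output. The table names, the `src_` source prefix, the MySQL credentials, the column list and all paths are hypothetical placeholders.

```python
#!/usr/bin/env python3
# Hypothetical config generator: when Manticore Search runs this file via the
# shebang line, whatever it prints to stdout is parsed as the configuration.

# Hypothetical names of the 3 tables that share the same structure.
TABLES = ["table1", "table2", "table3"]

# One source and one index per table; {{ and }} are literal braces in str.format.
SOURCE_TEMPLATE = """source src_{name}
{{
    type      = mysql
    sql_host  = localhost
    sql_user  = user
    sql_pass  = password
    sql_db    = test
    sql_query = SELECT id, title, body FROM {name}
}}
"""

INDEX_TEMPLATE = """index {name}
{{
    source = src_{name}
    path   = /var/lib/manticore/data/{name}
}}
"""

# Minimal searchd section with placeholder paths (not passed through format).
SEARCHD_SECTION = """searchd
{
    listen   = 9306:mysql41
    log      = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}
"""


def render_config(tables):
    """Return the full config text: a source and an index for each table."""
    sections = []
    for name in tables:
        sections.append(SOURCE_TEMPLATE.format(name=name))
        sections.append(INDEX_TEMPLATE.format(name=name))
    sections.append(SEARCHD_SECTION)
    return "\n".join(sections)


if __name__ == "__main__":
    print(render_config(TABLES))
```

Adding a fourth table then means adding one name to `TABLES` instead of copying two more config sections. Saved as, say, `manticore.conf`, the script could be passed to the indexer with `indexer --config manticore.conf --all`; depending on how your version launches the interpreter, the file may also need to be executable.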