Building a 1M docs index without a single real doc


I want to share a trick for quickly building a test index with Sphinx / Manticore Search without populating a database with lots of data or anything like that. Below are a complete Sphinx / Manticore Search config that builds a 1M docs index of random 3-character words and geo coordinates, an example command for building the index, and an example SphinxQL query that searches it. All you need is a connection to any database (in this case ‘mysql -u root’ works).

[snikolaev@dev01 ~]$ cat sphinx_1m.conf
source min
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_query_range = select 1, 1000000
    sql_range_step = 1
    sql_query = select $start, mid(md5(rand()), 1, 3) body, rand() * 180 lat, rand($end) * 90 lng
    sql_attr_float = lat
    sql_attr_float = lng
}

index idx
{
    path = idx_1m
    source = min
}

searchd
{
    binlog_path = #
    listen = 9314:mysql41
    log = sphinx_1m.log
    pid_file =
}

[snikolaev@dev01 ~]$ indexer -c sphinx_1m.conf --all --rotate
Manticore 2.6.1 9a706b4@180119 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (
Copyright (c) 2017-2018, Manticore Software LTD (

using config file 'sphinx_1m.conf'...
indexing index 'idx'...
WARNING: sql_range_step=1: too small; might hurt indexing performance!
collected 1000000 docs, 3.0 MB
sorted 1.0 Mhits, 100.0% done
total 1000000 docs, 3000000 bytes
total 86.580 sec, 34649 bytes/sec, 11549.98 docs/sec
total 5 reads, 0.014 sec, 4512.0 kb/call avg, 2.9 msec/call avg
total 24 writes, 0.031 sec, 1806.1 kb/call avg, 1.3 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=17284).

mysql> select id, geodist(lat,lng,73.9667,40.78, {in=deg,out=km}) dist, lat, lng from idx where dist < 5;
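+--------+----------+-----------+-----------+
| id     | dist     | lat       | lng       |
+--------+----------+-----------+-----------+
| 636503 | 4.880664 | 73.952385 | 40.929459 |
+--------+----------+-----------+-----------+
1 row in set (0.09 sec)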
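
As a rough sanity check on the dist value returned by the query, here is a minimal Python sketch that recomputes the distance between the query point and the matched document with the haversine formula. The function name haversine_km and the Earth radius constant are assumptions made for illustration; Manticore's geodist may use a different Earth model, so the result (about 4.86 km here) is only expected to land close to the reported 4.880664, not to match it exactly.

```python
import math

# Mean Earth radius in km; an assumption for this sketch,
# not necessarily the constant Manticore uses internally.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Matched document (lat, lng) versus the point used in the query.
dist = haversine_km(73.952385, 40.929459, 73.9667, 40.78)
print(f"{dist:.2f} km")
```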

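As you can see, the trick is to use the directives sql_query_range and sql_range_step to make Manticore loop until it has collected 1M docs: sql_query_range returns the bounds 1 and 1000000, and with a step of 1 the indexer runs sql_query once per ID, each run producing exactly one generated document. The drawback is that indexing is slower than actually fetching the same amount of data from a database (hence the indexer's warning about sql_range_step), but come on, you’re not going to use this in production, right?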
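
To make the looping concrete, here is a minimal Python sketch of how a ranged query is expanded into individual queries. It follows the documented behaviour of sql_query_range and sql_range_step rather than the indexer's actual code, and the helper names expand_ranges and render are hypothetical.

```python
from typing import Iterator, Tuple


def expand_ranges(min_id: int, max_id: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield ($start, $end) pairs covering [min_id, max_id] in chunks of `step`."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


def render(template: str, start: int, end: int) -> str:
    """Substitute the $start and $end macros into the query text."""
    return template.replace("$start", str(start)).replace("$end", str(end))


# The sql_query from the config above.
SQL_QUERY = ("select $start, mid(md5(rand()), 1, 3) body, "
             "rand() * 180 lat, rand($end) * 90 lng")

# With a step of 1, $start equals $end, so every query yields one document.
first_three = [render(SQL_QUERY, s, e) for s, e in expand_ranges(1, 3, 1)]
for query in first_three:
    print(query)

# Bounds from sql_query_range (1, 1000000) with sql_range_step = 1.
total_queries = sum(1 for _ in expand_ranges(1, 1_000_000, 1))
print(total_queries)
```

Since the query selects from no table, each run returns a single row whose ID is $start, which is why one million tiny queries add up to one million documents.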

I hope you’ll find it helpful when you decide to play with Manticore Search.

