Building a 1M-document index without a single real document

Hi folks

I just want to share an interesting trick for quickly building a test index with Sphinx / Manticore Search without first populating a database with lots of data. Below you'll find a complete Sphinx / Manticore Search config that builds a 1M-document index of random 3-character words and geo coordinates, an example command that builds the index, and an example SphinxQL query that searches it. All you need is a working connection to any database (in this case mysql -u root works).

[snikolaev@dev01 ~]$ cat sphinx_1m.conf
source min
{
  type = mysql
  sql_host = localhost
  sql_user = root
  sql_pass =
  sql_db = test
  sql_query_range = select 1, 1000000
  sql_range_step = 1
  sql_query = select $start, mid(md5(rand()), 1, 3) body, rand() * 180 lat, rand($end) * 90 lng
  sql_attr_float = lat
  sql_attr_float = lng
}

index idx
{
  path = idx_1m
  source = min
}

searchd
{
  binlog_path = #
  listen = 9314:mysql41
  log = sphinx_1m.log
  pid_file = sphinx_1m.pid
}

[snikolaev@dev01 ~]$ indexer -c sphinx_1m.conf --all --rotate
Manticore 2.6.1 9a706b4@180119 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2018, Manticore Software LTD (http://manticoresearch.com)

using config file 'sphinx_1m.conf'...
indexing index 'idx'...
WARNING: sql_range_step=1: too small; might hurt indexing performance!
collected 1000000 docs, 3.0 MB
sorted 1.0 Mhits, 100.0% done
total 1000000 docs, 3000000 bytes
total 86.580 sec, 34649 bytes/sec, 11549.98 docs/sec
total 5 reads, 0.014 sec, 4512.0 kb/call avg, 2.9 msec/call avg
total 24 writes, 0.031 sec, 1806.1 kb/call avg, 1.3 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=17284).

mysql> select id, geodist(lat,lng,73.9667,40.78, {in=deg,out=km}) dist, lat, lng from idx where dist < 5;
+--------+----------+-----------+-----------+
| id | dist | lat | lng |
+--------+----------+-----------+-----------+
| 636503 | 4.880664 | 73.952385 | 40.929459 |
+--------+----------+-----------+-----------+
1 row in set (0.09 sec)
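If you want to sanity-check the distance in that row, here is a minimal Python sketch using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km and treats the first value of each pair as latitude and the second as longitude, the same argument order as geodist. The helper name haversine_km is made up for this example, and Manticore's own geodist implementation may use a different formula, so expect a small difference from 4.880664.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Matched document (lat, lng) vs. the point passed to geodist in the query.
dist = haversine_km(73.952385, 40.929459, 73.9667, 40.78)
print(round(dist, 2))
```

The result comes out a little under 5 km, which agrees with the document passing the dist < 5 filter.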

As you can see, the trick is to use the sql_query_range and sql_range_step directives to make Manticore loop until it has collected 1M documents. The drawback is that indexing is slower than actually fetching the same amount of data from a database (the indexer even warns that sql_range_step=1 is too small), but come on, you’re not going to use this in production, right?
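To picture what the loop does, here is a rough Python sketch that emulates how a ranged query gets expanded. It is not the indexer's actual code: the function ranged_queries is hypothetical, and it assumes each step covers the interval from $start to $start + step - 1, clamped to the upper bound returned by sql_query_range.

```python
def ranged_queries(template, range_min, range_max, step):
    """Yield one SQL statement per step, with $start and $end substituted."""
    start = range_min
    while start <= range_max:
        # Assumed interval: [start, start + step - 1], clamped to range_max.
        end = min(start + step - 1, range_max)
        yield template.replace("$start", str(start)).replace("$end", str(end))
        start = end + 1


TEMPLATE = ("select $start, mid(md5(rand()), 1, 3) body, "
            "rand() * 180 lat, rand($end) * 90 lng")

# Use a tiny range (1..5) instead of 1..1000000 to keep the output short.
queries = list(ranged_queries(TEMPLATE, 1, 5, 1))
for q in queries:
    print(q)
```

With a step of 1, every statement yields exactly one row whose id is $start, so the real config runs one million tiny queries. That is where the slowdown comes from.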

I hope you’ll find it helpful when you decide to play with Manticore Search.
