Building a 1M-document index without a single real document

Hi

Just want to share an interesting trick for quickly building a test index with Sphinx / Manticore Search without populating a database with lots of data or anything like that. Below is a full Sphinx / Manticore Search config that lets you build a 1M-document index consisting of random 3-character words and geo coordinates, an example command to build the index, and an example SphinxQL query that searches the index. All you need is a connection to any database (in this case ‘mysql -u root’ works).


[snikolaev@dev01 ~]$ cat sphinx_1m.conf
source min
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
sql_query_range = select 1, 1000000
sql_range_step = 1
sql_query = select $start, mid(md5(rand()), 1, 3) body, rand() * 180 lat, rand($end) * 90 lng
sql_attr_float = lat
sql_attr_float = lng
}
index idx
{
path = idx_1m
source = min
}
searchd
{
binlog_path = #
listen = 9314:mysql41
log = sphinx_1m.log
pid_file = sphinx_1m.pid
}

[snikolaev@dev01 ~]$ indexer -c sphinx_1m.conf --all --rotate
Manticore 2.6.1 9a706b4@180119 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2018, Manticore Software LTD (http://manticoresearch.com)

using config file 'sphinx_1m.conf'...
indexing index 'idx'...
WARNING: sql_range_step=1: too small; might hurt indexing performance!
collected 1000000 docs, 3.0 MB
sorted 1.0 Mhits, 100.0% done
total 1000000 docs, 3000000 bytes
total 86.580 sec, 34649 bytes/sec, 11549.98 docs/sec
total 5 reads, 0.014 sec, 4512.0 kb/call avg, 2.9 msec/call avg
total 24 writes, 0.031 sec, 1806.1 kb/call avg, 1.3 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=17284).

mysql> select id, geodist(lat,lng,73.9667,40.78, {in=deg,out=km}) dist, lat, lng from idx where dist < 5;
+--------+----------+-----------+-----------+
| id | dist | lat | lng |
+--------+----------+-----------+-----------+
| 636503 | 4.880664 | 73.952385 | 40.929459 |
+--------+----------+-----------+-----------+
1 row in set (0.09 sec)
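
If you want to sanity-check the dist value outside of searchd, here is a minimal Python sketch that recomputes the distance for the returned row with the plain haversine formula. The function name haversine_km and the 6371 km Earth radius are my assumptions for illustration; the engine's own GEODIST implementation may use a slightly different formula or Earth model, so expect a value close to the one above, not an identical one.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees.

    radius_km is an assumed mean Earth radius, not a value taken from Manticore.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Row returned by the query above vs. the anchor point passed to geodist()
dist = haversine_km(73.952385, 40.929459, 73.9667, 40.78)
print(round(dist, 2))
```

The result lands a little under 5 km, which agrees with the row passing the dist < 5 filter.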

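As you can see, the tricky part is using the sql_query_range and sql_range_step directives to make Manticore loop until it has collected 1M documents. With sql_range_step = 1, sql_query runs once for every id from 1 to 1000000 and returns a single generated row each time. The drawback is slower indexing compared to actually fetching the same amount of data from a database (hence the indexer warning above), but come on, you’re not going to use this in production, right?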
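
To make the range loop easier to picture, here is a rough Python sketch of what happens conceptually. It is not how indexer is implemented: ranged_queries and fake_row are hypothetical names, the demo uses 1000 ids instead of 1M to stay fast, and Python's random numbers will not match what MySQL's rand() produces.

```python
import hashlib
import random


def ranged_queries(range_min, range_max, step):
    """Yield inclusive ($start, $end) pairs covering [range_min, range_max]."""
    start = range_min
    while start <= range_max:
        end = min(start + step - 1, range_max)
        yield start, end
        start = end + 1


def fake_row(start, end, rng):
    """Mimic the SELECT in sql_query: id, 3-char body, lat, lng."""
    body = hashlib.md5(str(rng.random()).encode()).hexdigest()[:3]  # like mid(md5(rand()), 1, 3)
    lat = rng.random() * 180                                        # like rand() * 180
    lng = random.Random(end).random() * 90                          # like rand($end) * 90 (seeded)
    return start, body, lat, lng


rng = random.Random(0)
# sql_query_range = select 1, 1000 (shrunk for the demo); sql_range_step = 1
rows = [fake_row(start, end, rng) for start, end in ranged_queries(1, 1000, 1)]
print(len(rows), rows[0][0], rows[-1][0])  # prints: 1000 1 1000
```

With a step of 1, every ($start, $end) pair collapses to a single id, so each "query" contributes exactly one document. That is what lets a SELECT without any table behind it produce as many rows as you like.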

I hope you’ll find it helpful when you decide to play with Manticore Search.


© 2018 Manticore Software Ltd. Registered Address: Office 2, Derby House, 123 Watling Street, Gillingham, Kent, ME7 2YY
Company No. 10772872