Getting started with Manticore Search in Docker

In this post, we’ll discuss how you can quick start using Manticore Search with Docker.

Installing and running

Official Docker images are hosted at https://hub.docker.com/r/manticoresearch/manticore/.
To start Manticore Search you just need to run:

$ docker run --name manticore -p 9306:9306 -d manticoresearch/manticore

The Docker recipe is hosted on github, for those who would want to extend the existing images.

The Manticore Search container doesn’t have a persistent storage and if the container is stopped, any changes are lost. A simple way to have persistance is to mount several folders to the local machines.

The folders for which we want to have persistance are:

  • /etc/sphinxsearch – location of sphinx.conf
  • /var/lib/manticore/data – used for index files
  • /var/lib/manticore/log – used for log files

We consider a manticore/ folder in our home directiory where we’ll create  the etc/ , data/ and logs/ folders, as well as add a valid sphinx.conf in ~/manticore/etc/. We can use the sphinx.conf  included in the recipe repository.
To the run command we add the mounts:


$ docker run --name manticore -v ~/manticore/etc/:/etc/sphinxsearch/ -v ~/manticore/data/:/var/lib/manticore/data -v ~/manticore/logs/:/var/lib/manticore/log -p 9306:9306 -d manticoresearch/manticore

The container can be stopped by simply doing:

$ docker stop manticore

The Docker image also come with the indexer and indextoo utilities, which can run with Docker’s exec command:

$ docker exec -it manticore indexer --all --rotate

Running queries

The simple way to connect and do some tests is to use the SphinxQL protocol. For this, you need a mysql command line client.

While it implements the MySQL protocol, SphinxQL is not 100% compatible with MySQL syntax. There are specific extensions, like MATCH clause [the most powerful thing in Manticore] or WITHIN GROUP BY and many functions available in MySQL are not implemented (or they are dummy just to allow compatibility with MySQL connector e.g.) or JOINs between indexes which are not supported yet.

First, let’s connect to Manticore Search and take a look at the available indexes:

$ mysql -P9306 -h0
mysql> SHOW TABLES;
+-------+-------------+
| Index | Type        |
+-------+-------------+
| dist1 | distributed |
| testrt| rt          |
+-------+-------------+
2 rows in set (0.00 sec)

Now let’s look at our RT index:

mysql> DESCRIBE testrt;
+---------+--------+
| Field | Type     |
+---------+--------+
| id      | bigint |
| title   | field  |
| content | field  |
| gid     | uint   |
+---------+--------+
4 rows in set (0.00 sec)

As the RT indexes start empty, let’s add some data into it first

mysql> INSERT INTO testrt VALUES(1,'List of HP business laptops','Elitebook Probook',10);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(2,'List of Dell business laptops','Latitude Precision Vostro',10);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(3,'List of Dell gaming laptops','Inspirion Alienware',20);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(4,'Lenovo laptops list','Yoga IdeaPad',30);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO testrt VALUES(5,'List of ASUS ultrabooks and laptops','Zenbook Vivobook',30);
Query OK, 1 row affected (0.01 sec)

Now we have some data, we can do some queries.

Fulltext searches are done with the special clause MATCH, which is the main workhorse.

mysql> SELECT * FROM testrt WHERE MATCH('list of laptops');
+------+------+
| id   | gid  |
+------+------+
| 1    | 10   |
| 2    | 10   |
| 3    | 20   |
| 5    | 30   |
+------+------+
4 rows in set (0.00 sec)

As you see in the result set we can only get back the doc id and the attributes. The full-text fields values are not returned since the text is only indexed, not stored also, and it’s impossible to rebuild the original text.

Now let’s add some filtering and more ordering:

mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCG('list of laptops') AND gid>10 ORDER BY WEIGHT() DESC,gid DESC;
+------+------+----------+
| id   | gid  | weight() |
+------+------+----------+
| 5    | 30   | 2334     |
| 3    | 20   | 2334     |
+------+------+----------+
2 rows in set (0.00 sec)

The WEIGHT() function returns the calculated matching score. If no ordering specified, the result is sorted descending by the score provided by WEIGHT(). In this example we order first by weight and then by an integer attribute.

The search above does a simple matching, where all words need to be present. But we can do more (and this is just a simple example):

mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
+------+------+----------+
| id   | gid  | weight() |
+------+------+----------+
| 1    | 10   | 2397     |
| 2    | 10   | 2397     |
| 3    | 20   | 2375     |
| 5    | 30   | 2375     |
+------+------+----------+
4 rows in set (0.00 sec)

mysql> SHOW META;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 4        |
| total_found   | 4        |
| time          | 0.000    |
| keyword[0]    | list     |
| docs[0]       | 5        |
| hits[0]       | 5        |
| keyword[1]    | of       |
| docs[1]       | 4        |
| hits[1]       | 4        |
| keyword[2]    | business |
| docs[2]       | 2        |
| hits[2]       | 2        |
| keyword[3]    | laptops  |
| docs[3]       | 5        |
| hits[3]       | 5        |
+---------------+----------+
15 rows in set (0.00 sec)

Here we search for 4 words, but we can have a match even if only 3 words (of 4) are found. The search will rank higher first the documents that contain all the words. We also added a SHOW META command. SHOW META returns information about previous executed query, that is number of found records (in total_found), execution time (in time) and statistics about the keywords of the search.

Using plain indexes

Unlike RT, a plain also requires configuring a source for it. In our example we are using a MySQL source.

Add in your sphinx.conf:

source src1
{
type = mysql

sql_host = 172.17.0.1
sql_user = test
sql_pass =
sql_db = test
sql_port = 3306 # optional, default is 3306

sql_query_pre = SET NAMES utf8

sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents

sql_attr_uint = group_id
sql_attr_timestamp = date_added

}
index test1
{

source = src1
path = /var/lib/manticore/data/test1
min_word_len = 1

}

In this example we assume we have MySQL running on the local host, but as Manticore Search runs inside a Docker container, we need to use ‘172.17.0.1’, the static IP address of the Docker host. For more details, please check Docker documentation. You also need to adjust the MySQL credentials accordingly.

Then we look after the sql_query, which is the query that grabs the data

sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents

And we’ll use this SQL snippet to create our test table in MySQL:

DROP TABLE IF EXISTS test.documents;
CREATE TABLE test.documents
(
id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,
group_id INTEGER NOT NULL,
date_added DATETIME NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL
);

INSERT INTO test.documents ( id, group_id, date_added, title, content ) VALUES
( 1, 1, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
( 2, 1, NOW(), 'test two', 'this is my test document number two' ),
( 3, 2, NOW(), 'another doc', 'this is another group' ),
( 4, 2, NOW(), 'doc number four', 'this is to test groups' );

If you want to use another table, keep in mind that the first column in the result set must be an unsigned unique integer – for most cases this is your primary key id of a table.

If not specified, the rest of the columns are indexed as fulltext fields. Columns which should be used as attributes need to be declared. In our example group_id and date_added are attributes:

sql_attr_uint = group_id
sql_attr_timestamp = date_added

Once we have this setup, we can run the indexing process:

$ docker exec -it manticore indexer test1 --rotate
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.015 sec, 12335 bytes/sec, 255.65 docs/sec
total 4 reads, 0.000 sec, 8.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

Index is created and is ready to be used:

mysql> SHOW TABLES;
+-------+-------------+
| Index | Type        |
+-------+-------------+
| dist1 | distributed |
| rt    | rt          |
| test1 | local       |
+-------+-------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM test1;
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
| 1    | 1        | 1507904567 |
| 2    | 1        | 1507904567 |
| 3    | 2        | 1507904567 |
| 4    | 2        | 1507904567 |
+------+----------+------------+
4 rows in set (0.00 sec)

A quick test of a search which should match 2 terms, but not match another one:

mysql> SELECT * FROM test1 WHERE MATCH('test document -one');
+------+----------+------------+-------+
| id   | group_id | date_added | tag   |
+------+----------+------------+-------+
| 2    | 1        | 1519040667 | 2,4,6 |   
+------+----------+------------+-------+
1 row in set (0.00 sec)

 

Leave a Reply