# Getting Started with Manticore Search in Docker

In this article, we'll look at how to get started quickly with Manticore Search using Docker.

### Installing and running


The official Docker image is hosted at <https://hub.docker.com/r/manticoresearch/manticore/>.
To start Manticore Search, simply run:


```bash
$ docker run --name manticore -p 9306:9306 -d manticoresearch/manticore
```


The Docker recipe is hosted on [GitHub](https://github.com/manticoresoftware/docker) for anyone who wants to extend the existing image.

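The Manticore Search container has no persistent storage: once the container is removed, all changes are lost. A simple solution is to mount a few folders from the local machine.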

The folders we want to persist are:
- /etc/sphinxsearch - location of sphinx.conf
- /var/lib/manticore/data - index files
- /var/lib/manticore/log - log files


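We'll use a `manticore/` folder in the home directory. Inside it we create the `etc/`, `data/` and `logs/` folders, and we put a valid `sphinx.conf` into `~/manticore/etc/`. We can use the [sphinx.conf](https://github.com/manticoresoftware/docker/blob/master/sphinx.conf) included in the recipe repository.
Then we add the mounts to the run command: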


```bash
$ docker run --name manticore -v ~/manticore/etc/:/etc/sphinxsearch/ -v ~/manticore/data/:/var/lib/manticore/data -v ~/manticore/logs/:/var/lib/manticore/log -p 9306:9306 -d manticoresearch/manticore
```
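You can also prepare the folders with a small script. The sketch below is only an illustration: `prepare_manticore_dirs` and the `MANTICORE_HOME` override are hypothetical names, and `~/manticore` is assumed as the default base folder. It creates the three host folders and warns if `sphinx.conf` has not been copied yet.

```bash
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): create the host folders that are
# bind-mounted into the container by the docker run command above.
prepare_manticore_dirs() {
  local base=$1
  mkdir -p "$base/etc" "$base/data" "$base/logs" || return 1
  # etc/ is mounted as /etc/sphinxsearch, where the container expects sphinx.conf.
  if [ ! -f "$base/etc/sphinx.conf" ]; then
    echo "note: copy a valid sphinx.conf into $base/etc/ before starting the container" >&2
  fi
}

# MANTICORE_HOME is a hypothetical override; ~/manticore is the layout used in this article.
prepare_manticore_dirs "${MANTICORE_HOME:-$HOME/manticore}"
```

After that, you can use the `docker run` command with the three `-v` mounts as is.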


The container can be stopped with:


```bash
$ docker stop manticore
```


The Docker image also includes the `indexer` and `indextool` utilities, which can be run with Docker's `exec` command:


```bash
$ docker exec -it manticore indexer --all --rotate
```


### Running queries

The simplest way to connect and run a few tests is over the SphinxQL protocol. For this you need a mysql command-line client.

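Although SphinxQL implements the MySQL protocol, it is not fully compatible with MySQL syntax. It has specific extensions, such as the MATCH clause (one of Manticore's most powerful features) or WITHIN GROUP BY. Many functions available in MySQL are not implemented (or are implemented only as dummies so that MySQL connectors keep working), and JOINs between indexes are not supported yet.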

First, let's connect to Manticore Search and look at the available indexes:


```sql
$ mysql -P9306 -h0
mysql> SHOW TABLES;
+--------+-------------+
| Index  | Type        |
+--------+-------------+
| dist1  | distributed |
| testrt | rt          |
+--------+-------------+
2 rows in set (0.00 sec)

```


Now let's look at our RT index:


```sql
mysql> DESCRIBE testrt;
+---------+--------+
| Field   | Type   |
+---------+--------+
| id      | bigint |
| title   | field  |
| content | field  |
| gid     | uint   |
+---------+--------+
4 rows in set (0.00 sec)

```


The RT index starts out empty, so let's add some data to it first:


```sql
mysql> INSERT INTO testrt VALUES(1,'List of HP business laptops','Elitebook Probook',10);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(2,'List of Dell business laptops','Latitude Precision Vostro',10);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(3,'List of Dell gaming laptops','Inspirion Alienware',20);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO testrt VALUES(4,'Lenovo laptops list','Yoga IdeaPad',30);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO testrt VALUES(5,'List of ASUS ultrabooks and laptops','Zenbook Vivobook',30);
Query OK, 1 row affected (0.01 sec)
```


Now that we have some data, we can run a few queries.

Full-text search uses the special MATCH clause, which is the main tool for it.


```sql
mysql> SELECT * FROM testrt WHERE MATCH('list of laptops');
+------+------+
| id   | gid  |
+------+------+
| 1    | 10   |
| 2    | 10   |
| 3    | 20   |
| 5    | 30   |
+------+------+
4 rows in set (0.00 sec)
```


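As the result set shows, we only get back the document IDs and attributes. The full-text field values are not returned because the text is only indexed, not stored, and the original text cannot be rebuilt from the index.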

Now let's add some filtering and more sorting:


```sql
mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('list of laptops') AND gid>10 ORDER BY WEIGHT() DESC,gid DESC;
+------+------+----------+
| id   | gid  | weight() |
+------+------+----------+
| 5    | 30   | 2334     |
| 3    | 20   | 2334     |
+------+------+----------+
2 rows in set (0.00 sec)
```


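The WEIGHT() function returns the computed match score. If no sort order is specified, results are sorted by the WEIGHT() score in descending order. In this example, we sort by weight first and then by the integer attribute gid.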
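As a rough analogy outside Manticore, the standard `sort` utility can reproduce the same two-level ordering on the two rows from the result above (columns: id, gid, weight). `order_by_weight_then_gid` is a hypothetical helper name. Manticore itself does this sorting inside the query.

```bash
#!/usr/bin/env bash
# Analogy only: mimic "ORDER BY WEIGHT() DESC, gid DESC" on rows copied from
# the result above, one row per line as "id gid weight".
order_by_weight_then_gid() {
  # -k3,3nr : weight, numeric, descending (primary key)
  # -k2,2nr : gid, numeric, descending (tie-breaker when weights are equal)
  # -s      : keep input order for rows that tie on both keys
  sort -s -k3,3nr -k2,2nr
}

# Both rows have weight 2334, so gid decides: id 5 (gid 30) comes before id 3 (gid 20).
printf '%s\n' '3 20 2334' '5 30 2334' | order_by_weight_then_gid
```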

The search above performs a simple match in which all words must be present. We can do much more, though (this is just a simple example):


```sql
mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
+------+------+----------+
| id   | gid  | weight() |
+------+------+----------+
| 1    | 10   | 2397     |
| 2    | 10   | 2397     |
| 3    | 20   | 2375     |
| 5    | 30   | 2375     |
+------+------+----------+
4 rows in set (0.00 sec)

mysql> SHOW META;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 4        |
| total_found   | 4        |
| time          | 0.000    |
| keyword[0]    | list     |
| docs[0]       | 5        |
| hits[0]       | 5        |
| keyword[1]    | of       |
| docs[1]       | 4        |
| hits[1]       | 4        |
| keyword[2]    | business |
| docs[2]       | 2        |
| hits[2]       | 2        |
| keyword[3]    | laptops  |
| docs[3]       | 5        |
| hits[3]       | 5        |
+---------------+----------+
15 rows in set (0.00 sec)
```


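Here we search for 4 words, but a document also matches if only 3 of them (3 out of 4) are found. Documents that contain all the words are ranked higher. We also ran the SHOW META command, which returns information about the previously executed query: the number of records found (in total_found), the execution time (in time), and statistics for each search keyword.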
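To make the matching rule concrete, the sketch below models the quorum operator on the five rows inserted earlier. It is a simplification: words are compared case-insensitively as whole words, and Manticore's real tokenizer, morphology and ranking are not reproduced. The helpers `sample_docs`, `normalize` and `quorum_match` are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model of the quorum operator '"w1 w2 ... wK"/N': a document
# matches when at least N of the K query words occur in it. Only the
# match/no-match rule is modeled, not Manticore's ranking.

# The five rows inserted into testrt above, as "id<TAB>title content".
sample_docs() {
  printf '%s\t%s\n' \
    1 'List of HP business laptops Elitebook Probook' \
    2 'List of Dell business laptops Latitude Precision Vostro' \
    3 'List of Dell gaming laptops Inspirion Alienware' \
    4 'Lenovo laptops list Yoga IdeaPad' \
    5 'List of ASUS ultrabooks and laptops Zenbook Vivobook'
}

# Lower-case the text, turn punctuation into spaces and pad it with spaces
# so that whole words can be tested with a simple pattern.
normalize() {
  printf ' %s ' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c '[:alnum:]' ' ')"
}

# usage: quorum_match "<lower-case words>" <N>   (documents are read on stdin)
quorum_match() {
  local words=$1 min=$2 id text w hits ids=()
  while IFS=$'\t' read -r id text; do
    text=$(normalize "$text")
    hits=0
    for w in $words; do
      [[ $text == *" $w "* ]] && hits=$((hits + 1))
    done
    (( hits >= min )) && ids+=("$id")
  done
  echo "${ids[*]}"
}

sample_docs | quorum_match 'list of business laptops' 3   # quorum: 3 of 4 words
sample_docs | quorum_match 'list of laptops' 3            # all 3 words required
```

Both calls print `1 2 3 5`, which agrees with the result sets shown above. Document 4 contains only `list` and `laptops`, so it reaches neither threshold.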


### Using plain indexes

Unlike an RT index, a plain index also needs a source configured for it. In our example we use a MySQL source.

Add the following to your sphinx.conf:


```ini
source src1
{
type = mysql

sql_host = 172.17.0.1
sql_user = test
sql_pass =
sql_db = test
sql_port = 3306 # optional, default is 3306

sql_query_pre = SET NAMES utf8

sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents

sql_attr_uint = group_id
sql_attr_timestamp = date_added

}
index test1
{

source = src1
path = /var/lib/manticore/data/test1
min_word_len = 1

}
```


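In this example we assume that MySQL runs on the local host. Because Manticore Search runs inside a Docker container, we need to use '172.17.0.1', which is the Docker host's IP address on Docker's default bridge network. See the Docker documentation for more details. You also need to adjust the MySQL credentials to match your setup.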

Next, let's look at sql_query, the query that fetches the data:


```ini
sql_query = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents
```


We'll use this SQL snippet to create the test table in MySQL:


```sql
DROP TABLE IF EXISTS test.documents;
CREATE TABLE test.documents
(
id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,
group_id INTEGER NOT NULL,
date_added DATETIME NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL
);

INSERT INTO test.documents ( id, group_id, date_added, title, content ) VALUES
( 1, 1, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
( 2, 1, NOW(), 'test two', 'this is my test document number two' ),
( 3, 2, NOW(), 'another doc', 'this is another group' ),
( 4, 2, NOW(), 'doc number four', 'this is to test groups' );
```


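If you want to use a different table, remember that the first column of the result set must be a unique unsigned integer. In most cases this is the table's primary key, id.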

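Unless declared otherwise, the remaining columns are indexed as full-text fields. Columns that should be used as attributes must be declared explicitly. In our example, group_id and date_added are attributes: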

```ini
sql_attr_uint = group_id
sql_attr_timestamp = date_added
```
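Here is a small sketch of this mapping rule applied to the example source. It is only an illustration: `attr_names` and `classify_columns` are hypothetical helpers, not part of Manticore, and they only understand simple one-line `sql_attr_* = name` declarations.

```bash
#!/usr/bin/env bash
# Sketch of the rule above (not Manticore's actual config parser): the first
# column of sql_query is the document id, columns declared with sql_attr_*
# become attributes, and every remaining column becomes a full-text field.

# Print the names declared with sql_attr_* in a config fragment read on stdin.
attr_names() {
  awk -F'=' '/^[ \t]*sql_attr_[a-z_]+[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# usage: classify_columns "<columns in sql_query order>" "<attribute names>"
classify_columns() {
  local cols=($1) attrs=" $2 " c i=0
  for c in "${cols[@]}"; do
    if (( i++ == 0 )); then
      echo "$c document_id"
    elif [[ $attrs == *" $c "* ]]; then
      echo "$c attribute"
    else
      echo "$c full_text_field"
    fi
  done
}

attrs=$(printf '%s\n' 'sql_attr_uint = group_id' 'sql_attr_timestamp = date_added' | attr_names | tr '\n' ' ')
classify_columns "id group_id date_added title content" "$attrs"
```

For the example source, this prints `id` as the document id, `group_id` and `date_added` as attributes, and `title` and `content` as full-text fields.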


With this setup in place, we can run the indexing process:


```bash
$ docker exec -it manticore indexer test1 --rotate
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.015 sec, 12335 bytes/sec, 255.65 docs/sec
total 4 reads, 0.000 sec, 8.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
```


The index is created and ready to use:


```sql
mysql> SHOW TABLES;
+--------+-------------+
| Index  | Type        |
+--------+-------------+
| dist1  | distributed |
| testrt | rt          |
| test1  | local       |
+--------+-------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM test1;
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
| 1    | 1        | 1507904567 |
| 2    | 1        | 1507904567 |
| 3    | 2        | 1507904567 |
| 4    | 2        | 1507904567 |
+------+----------+------------+
4 rows in set (0.00 sec)
```


A quick test of a search that should match two terms but exclude a third:


```sql
mysql> SELECT * FROM test1 WHERE MATCH('test document -one');
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
| 2    | 1        | 1507904567 |
+------+----------+------------+
1 row in set (0.00 sec)
```
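The boolean rule behind this query can be modeled in a few lines of shell over the four rows inserted into `test.documents`. This is a simplification: terms are compared as lower-case whole words, and Manticore's tokenizer, stemming and ranking are not reproduced. `sample_docs` and `bool_match` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Simplified model of MATCH('test document -one'): keep a document only if it
# contains every plain term and none of the terms written with a leading "-".

# The four rows of test.documents above, as "id<TAB>title content".
sample_docs() {
  printf '%s\t%s\n' \
    1 'test one this is my test document number one. also checking search within phrases.' \
    2 'test two this is my test document number two' \
    3 'another doc this is another group' \
    4 'doc number four this is to test groups'
}

# usage: bool_match "<query>"   (documents are read on stdin)
bool_match() {
  local query=$1 id text term ok ids=()
  while IFS=$'\t' read -r id text; do
    # lower-case, turn punctuation into spaces, pad so whole words can be tested
    text=" $(printf '%s' "$text" | tr '[:upper:]' '[:lower:]' | tr -c '[:alnum:]' ' ') "
    ok=1
    for term in $query; do
      if [[ $term == -* ]]; then
        [[ $text == *" ${term#-} "* ]] && ok=0   # excluded term is present
      else
        [[ $text == *" $term "* ]] || ok=0       # required term is missing
      fi
    done
    (( ok )) && ids+=("$id")
  done
  echo "${ids[*]}"
}

sample_docs | bool_match 'test document -one'
```

It prints `2`. Document 1 contains the excluded word `one`, document 3 lacks `test`, and document 4 contains `doc` but not `document`.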