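In this tutorial, we will explore the full-text search operators available in Manticore Search.
Introduction to full-text operators and basic search
All search operations in Manticore Search are based on the standard boolean operators (AND, OR, NOT), which can be combined and arranged freely to include or exclude keywords in a search and obtain more relevant results.
The default and simplest full-text operator is AND, which is assumed when you simply list several words in a search.
Because AND is the default operator, the query **fast slow** returns documents that contain both terms, 'fast' and 'slow'. If one term is in a document and the other is not, that document is not included in the result set.
By default, words are searched in all available full-text fields.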
SELECT * FROM testrt WHERE MATCH('fast slow');
OR matches any of the terms (or both). Terms are separated with a vertical bar, e.g. **fast | slow**, which finds documents containing either fast or slow.
SELECT * FROM testrt WHERE MATCH('fast | slow');
The OR operator has higher precedence than AND, so the query **'find me fast|slow'** is interpreted as 'find me (fast|slow)':
SELECT * FROM testrt WHERE MATCH('find me fast | slow');
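When the default precedence is not what you want, the query can be grouped explicitly with parentheses. The two queries below are a minimal sketch against the same testrt table: the first spells out the default interpretation, the second forces a different grouping.

```sql
-- Default reading made explicit: find AND me AND (fast OR slow)
SELECT * FROM testrt WHERE MATCH('find me (fast | slow)');

-- Parentheses change the grouping: (find AND me AND fast) OR slow
SELECT * FROM testrt WHERE MATCH('(find me fast) | slow');
```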
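NOT makes sure that a term marked with - or ! is absent from the results: any document containing such a term is excluded. For example, **fast !slow** finds documents containing fast as long as they do not contain slow. Use it with care, since a search can become too specific and exclude good documents.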
SELECT * FROM testrt WHERE MATCH('find !slow');
SELECT * FROM testrt WHERE MATCH('find -slow');
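MAYBE is a special operator that works like OR, except that the term on the left is required while the term on the right is optional. When both are present, the document gets a higher rank. For example, **fast MAYBE slow** finds documents containing fast, and those that also contain slow score higher.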
SELECT * FROM testrt WHERE MATCH('find MAYBE slow');
Usage examples
Let's connect to Manticore using the mysql client:
# mysql -P9306 -h0
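The examples below query a table named testrt. The article does not show how it was created, so the following is only a sketch of one possible setup: the column types are assumptions inferred from the result sets, and the rows are copied from the outputs of the boolean examples. It also assumes a Manticore version that supports CREATE TABLE.

```sql
-- Hypothetical setup; column types are assumed, not taken from the original article
CREATE TABLE testrt(title text, content text, gid int);

-- Rows copied from the result sets shown in the boolean examples
INSERT INTO testrt (id, gid, title, content) VALUES
(1, 1, 'find me', 'fast and quick'),
(2, 1, 'find me fast', 'quick'),
(3, 1, 'find me slow', 'quick'),
(5, 1, 'find me quick and fast', 'quick'),
(6, 1, 'find me fast now', 'quick');
```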
For boolean searches, the OR operator | can be used:
MySQL [(none)]> select * from testrt where match('find | me fast');
+------+------+------------------------+----------------+
| id | gid | title | content |
+------+------+------------------------+----------------+
| 1 | 1 | find me | fast and quick|
| 2 | 1 | find me fast | quick |
| 6 | 1 | find me fast now | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+----------------+
4 rows in set (0.00 sec)
The OR operator has higher precedence than AND, so the query find me fast|slow is interpreted as find me (fast|slow):
MySQL [(none)]> SELECT * FROM testrt WHERE MATCH('find me fast|slow');
+------+------+------------------------+----------------+
| id | gid | title | content |
+------+------+------------------------+----------------+
| 1 | 1 | find me | fast and quick|
| 2 | 1 | find me fast | quick |
| 6 | 1 | find me fast now | quick |
| 3 | 1 | find me slow | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+----------------+
5 rows in set (0.00 sec)
For negation, the NOT operator can be written as - or !:
MySQL [(none)]> select * from testrt where match('find me -fast');
+------+------+--------------+---------+
| id | gid | title | content |
+------+------+--------------+---------+
| 3 | 1 | find me slow | quick |
+------+------+--------------+---------+
1 row in set (0.00 sec)
Note that by default Manticore does not support full negation queries, so it is not possible to run just -fast (this becomes possible as of v3.5.2).
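As a sketch of what a standalone negation could look like once it is allowed: the query below assumes that your version provides the per-query option not_terms_only_allowed. The article itself does not name the switch, so check the documentation for your release before relying on it.

```sql
-- Assumption: not_terms_only_allowed is available as a query option in your version
SELECT * FROM testrt WHERE MATCH('-fast') OPTION not_terms_only_allowed=1;
```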
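Another basic operator is MAYBE. A term introduced by MAYBE may or may not be present in a document. If it is present, it affects ranking, and documents containing it rank higher.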
MySQL [(none)]> select * from testrt where match('find me MAYBE slow');
+------+------+------------------------+----------------+
| id | gid | title | content |
+------+------+------------------------+----------------+
| 3 | 1 | find me slow | quick |
| 1 | 1 | find me | fast and quick|
| 2 | 1 | find me fast | quick |
| 5 | 1 | find me quick and fast | quick |
| 6 | 1 | find me fast now | quick |
+------+------+------------------------+----------------+
5 rows in set (0.00 sec)
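To make the effect on ranking visible, the relevance score can be selected explicitly. This is a minimal sketch using the WEIGHT() function. The actual values depend on the ranker and the data, so none are shown here.

```sql
-- Show the relevance score next to each match, highest first
SELECT id, title, WEIGHT() FROM testrt
WHERE MATCH('find me MAYBE slow')
ORDER BY WEIGHT() DESC;
```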
Field operator
If we want to limit the search to a specific field, we can use the '@' operator:
mysql> select * from testrt where match('@title find me fast');
+------+------+------------------------+---------+
| id | gid | title | content |
+------+------+------------------------+---------+
| 2 | 1 | find me fast | quick |
| 6 | 1 | find me fast now | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+---------+
3 rows in set (0.00 sec)
We can also specify several fields to restrict the search to:
mysql> select * from testrt where match('@(title,content) find me fast');
+------+------+------------------------+----------------+
| id | gid | title | content |
+------+------+------------------------+----------------+
| 1 | 1 | find me | fast and quick |
| 2 | 1 | find me fast | quick |
| 6 | 1 | find me fast now | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+----------------+
4 rows in set (0.00 sec)
The field operator can also restrict the search to the first x words of a field. For example:
mysql> select * from testrt where match('@title lazy dog');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set (0.00 sec)
However, if we search only in the first 5 words, we get no results:
mysql> select * from testrt where match('@title[5] lazy dog');
Empty set (0.00 sec)
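In some situations a search may run over multiple indexes that do not have the same full-text fields.
By default, specifying a field that does not exist in the index results in a query error. To work around this, the special operator @@relaxed can be used:
mysql> select * from testrt where match('@(title,keywords) lazy dog');
ERROR 1064 (42000): index testrt: query error: no field 'keywords' found in schema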
mysql> select * from testrt where match('@@relaxed @(title,keywords) lazy dog');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set, 1 warning (0.01 sec)
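The result above reports 1 warning. Its text can be inspected with SHOW WARNINGS right after the query. The sketch below only shows the statements, since the exact message is not given in the article.

```sql
SELECT * FROM testrt WHERE MATCH('@@relaxed @(title,keywords) lazy dog');
-- Retrieve the warning produced by the previous query
SHOW WARNINGS;
```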
Fuzzy search
Fuzzy matching allows matching only some of the words in a query string, for example:
mysql> select * from testrt where match('"fox bird lazy dog"/3');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set (0.00 sec)
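In this case we use the QUORUM operator and specify that matching only 3 of the words is enough. A search with /1 is equivalent to an OR boolean search, while a search with /N, where N is the number of input words, is equivalent to an AND search.
You can also specify a number between 0.0 and 1.0 (standing for 0% and 100%), and Manticore will match only documents containing at least that percentage of the given words. The example above can also be written with a fraction, e.g. "fox bird lazy dog"/0.3, which matches documents containing at least 30% of the 4 words.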
mysql> select * from testrt where match('"fox bird lazy dog"/0.3');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set (0.00 sec)
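The equivalences mentioned for /1 and /N can be written out as pairs of queries. This is a sketch only: each pair is expected to select the same documents, though the ranking is not guaranteed to be identical.

```sql
-- /1: any one of the four words is enough, like an OR search
SELECT * FROM testrt WHERE MATCH('"fox bird lazy dog"/1');
SELECT * FROM testrt WHERE MATCH('fox | bird | lazy | dog');

-- /4: all four words are required, like an AND search
SELECT * FROM testrt WHERE MATCH('"fox bird lazy dog"/4');
SELECT * FROM testrt WHERE MATCH('fox bird lazy dog');
```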
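Advanced operators
Besides the simple operators, there are a number of advanced operators that are used less often but can be absolutely necessary in some cases.
One of the most used advanced operators is the phrase operator.
The phrase operator matches only if the given words are found in the exact order specified. This also restricts the words to being found in the same field: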
mysql> SELECT * FROM testrt WHERE MATCH('"quick brown fox"');
+------+------+-------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+-------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+-------------------------------------------------------------------+---------------------------------------+
2 rows in set (0.00 sec)
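A more relaxed version of the phrase operator is the strict order operator.
The order operator requires the words to be found in exactly the order specified, but allows other words between them: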
mysql> SELECT * FROM testrt WHERE MATCH('find << me << fast');
+------+------+------------------------+---------+
| id | gid | title | content |
+------+------+------------------------+---------+
| 2 | 1 | find me fast | quick |
| 6 | 1 | find me fast now | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+---------+
3 rows in set (0.00 sec)
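Another pair of operators that work with word positions are the start-of-field and end-of-field operators.
These restrict a word to appearing at the start or at the end of a field.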
mysql> SELECT * FROM testrt WHERE MATCH('^find me fast$');
+------+------+------------------------+---------+
| id | gid | title | content |
+------+------+------------------------+---------+
| 2 | 1 | find me fast | quick |
| 5 | 1 | find me quick and fast | quick |
+------+------+------------------------+---------+
2 rows in set (0.00 sec)
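The proximity operator is similar to the AND operator, but adds a maximum distance between the words for them to still be considered a match. Let's start with an example that uses only the AND operator: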
mysql> SELECT * FROM testrt WHERE MATCH('brown fox jumps');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set (0.00 sec)
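Our query returns 3 results: one in which all the words are close to each other, and two in which the words are farther apart.
If we want to match only when the words are within a certain distance of each other, we can add that restriction with the proximity operator: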
mysql> SELECT * FROM testrt WHERE MATCH('"brown fox jumps"~5');
+------+------+---------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+---------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+---------------------------------------------+---------------------------------------+
1 row in set (0.00 sec)
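A more general form of the proximity operator is the NEAR operator. With proximity, a single distance is specified for a set of words, whereas NEAR works with 2 operands, each of which can be a single word or an expression.
In the following example, 'brown' and 'fox' must be within a distance of 2, and 'fox' and 'jumps' within a distance of 6: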
mysql> SELECT * FROM testrt WHERE MATCH('brown NEAR/2 fox NEAR/6 jumps');
+------+------+-------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+-------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+-------------------------------------------------------------------+---------------------------------------+
2 rows in set (0.00 sec)
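The previous query leaves out one document because it does not satisfy the first NEAR condition. Relaxing that condition to NEAR/3 brings it back (it is the last row here):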
mysql> SELECT * FROM testrt WHERE MATCH('brown NEAR/3 fox NEAR/6 jumps');
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
| 4 | 1 | The quick brown fox jumps over the lazy dog | The five boxing wizards jump quickly |
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
| 8 | 1 | The brown and beautiful fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+----------------------------------------------------------------------------+---------------------------------------+
3 rows in set (0.09 sec)
A variation of the NEAR operator is NOTNEAR, which matches only if the operands are separated by a minimum distance.
mysql> SELECT * FROM testrt WHERE MATCH('"brown fox" NOTNEAR/5 jumps');
+------+------+-------------------------------------------------------------------+---------------------------------------+
| id | gid | title | content |
+------+------+-------------------------------------------------------------------+---------------------------------------+
| 7 | 1 | The quick brown fox take a step back and jumps over the lazy dog | The five boxing wizards jump quickly |
+------+------+-------------------------------------------------------------------+---------------------------------------+
1 row in set (0.00 sec)
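Manticore can also detect sentences in plain text and paragraphs in HTML content.
To index sentences, the index_sp option must be enabled, while paragraphs additionally require html_strip=1. Let's take the following example: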
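As a rough illustration of where these two settings go, here is a sketch of a table definition with both enabled. The table name testrt_sp and the column types are hypothetical. The article does not show how its own table was configured.

```sql
-- Hypothetical table; index_sp enables sentence/paragraph boundary indexing,
-- html_strip is needed for paragraph detection in HTML content
CREATE TABLE testrt_sp(title text, content text, gid int)
index_sp = '1' html_strip = '1';
```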
mysql> select * from testrt where match('"the brown fox" jumps')\G
*************************** 1. row ***************************
id: 15
gid: 2
title: The brown fox takes a step back. Then it jumps over the lazydog
content:
1 row in set (0.00 sec)
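The document contains 2 sentences; the phrase is found only in the first one, and 'jumps' only in the second.
With the SENTENCE operator we can restrict the search to match only when the operands are in the same sentence:
mysql> select * from testrt where match('"the brown fox" SENTENCE jumps')\G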
Empty set (0.00 sec)
We can see that the document no longer matches. If we change the search query so that all the words come from the same sentence, we get a match:
mysql> select * from testrt where match('"the brown fox" SENTENCE back')\G
*************************** 1. row ***************************
id: 15
gid: 2
title: The brown fox takes a step back. Then it jumps over the lazydog
content:
1 row in set (0.00 sec)
To demonstrate PARAGRAPH, let's use the following search:
mysql> select * from testrt where match('Samsung Galaxy');
+------+------+-------------------------------------------------------------------------------------+---------+
| id | gid | title | content |
+------+------+-------------------------------------------------------------------------------------+---------+
| 9 | 2 | <h1>Samsung Galaxy S10</h1>Is a smartphone introduced by Samsung in 2019 | |
| 10 | 2 | <h1>Samsung</h1>Galaxy,Note,A,J | |
+------+------+-------------------------------------------------------------------------------------+---------+
2 rows in set (0.00 sec)
These 2 documents have different HTML tags.
If we add PARAGRAPH, only the documents in which the search terms are found within a single tag remain.
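The article does not show the PARAGRAPH query itself. A minimal sketch, following the same operand-operator-operand form used for SENTENCE, would be:

```sql
-- Both words must fall within the same paragraph (the same HTML block)
SELECT * FROM testrt WHERE MATCH('Samsung PARAGRAPH Galaxy');
```

Per the description above, only documents with both words inside a single tag are expected to match.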
More general operators are ZONE and its variant ZONESPAN. A "zone" is the text inside an HTML or XML tag.
The tags to be treated as zones must be declared in the index_zones setting, e.g. index_zones = h*, th, title.
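For a table created over SQL, the declaration might look like the sketch below. The table name testrt_zones and the column types are hypothetical, and enabling html_strip alongside index_zones is an assumption here rather than something the article states.

```sql
-- Hypothetical table; zones are declared for h* (h1, h2, ...), th and title tags
CREATE TABLE testrt_zones(title text, content text, gid int)
html_strip = '1' index_zones = 'h*,th,title';
```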
For example:
mysql> select * from testrt where match('hello world');
+------+------+-------------------------------+---------+
| id | gid | title | content |
+------+------+-------------------------------+---------+
| 12 | 2 | Hello world | |
| 14 | 2 | <h1>Hello world</h1> | |
| 13 | 2 | <h1>Hello</h1> <h1>world</h1> | |
+------+------+-------------------------------+---------+
3 rows in set (0.00 sec)
We have 3 documents, in which 'hello' and 'world' are found in plain text, in different zones of the same type, or in a single zone.
mysql> select * from testrt where match('ZONE:h1 hello world');
+------+------+-------------------------------+---------+
| id | gid | title | content |
+------+------+-------------------------------+---------+
| 14 | 2 | <h1>Hello world</h1> | |
| 13 | 2 | <h1>Hello</h1> <h1>world</h1> | |
+------+------+-------------------------------+---------+
2 rows in set (0.00 sec)
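In this case the words are present in H1 zones, but they are not required to be in the same zone. If we want to limit the match to a single zone, we can use ZONESPAN: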
mysql> select * from testrt where match('ZONESPAN:h1 hello world');
+------+------+----------------------+---------+
| id | gid | title | content |
+------+------+----------------------+---------+
| 14 | 2 | <h1>Hello world</h1> | |
+------+------+----------------------+---------+
1 row in set (0.00 sec)
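We hope this article has shown you how the full-text search operators in Manticore work. For more hands-on experience, you can try our interactive course right in your browser.
Interactive course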
<img src="Manticore-Full-text-operators-Interactive-course-optimized.webp" alt="img">
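In our "Introduction to full-text operators" interactive course you can learn more about full-text matching; the course provides a command line for easier learning.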
