
Phrase, Quorum, and Proximity Search with OR

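If you have ever had to write several queries to catch every variant of a phrase, you know how repetitive and messy that gets. With OR now supported inside phrases, you can match "happy customer", "sad customer", and any other variant in one clean query.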

What's New in Manticore Search 13.6.7

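We're happy to announce the release of Manticore Search 13.6.7, which improves support for a useful feature: the OR operator (|) inside the phrase operator (quotes). It provides flexible phrase matching and can change the way you build search features.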

The Magic of OR in Phrases

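Traditional search engines make you choose between exact phrase matching and loose keyword matching. But what if you need something in between? That is where OR inside a phrase shines. Every alternative is checked at the same position in the phrase, and the phrase matches if any one of them fits that position.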

Syntax That Makes Sense

"( a | b ) c"                           -- Either "a c" or "b c"
"( ( a b c ) | d ) e"                   -- Either "a b c e" or "d e"  
"man ( happy | sad ) but all ( ( as good ) | ( as fast ) )"  -- Complex nested possibilities

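To make these rules concrete, here is a small Python sketch that expands a phrase-with-OR into the plain phrases it stands for. It is a toy model written for this post, not Manticore's query parser: it understands only words, parentheses, and |, and the names tokenize, expand, and the parse_* helpers are hypothetical.

```python
import re
from itertools import product


def tokenize(query: str) -> list[str]:
    # Split into parentheses, the | operator, and plain words.
    return re.findall(r"[()|]|[^\s()|]+", query)


def parse_alternatives(tokens: list[str], pos: int):
    """alternatives := sequence ('|' sequence)*  -> (list of word lists, new pos)"""
    options, pos = parse_sequence(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "|":
        more, pos = parse_sequence(tokens, pos + 1)
        options += more
    return options, pos


def parse_sequence(tokens: list[str], pos: int):
    """sequence := item*, where an item is a word or '(' alternatives ')'"""
    parts = []  # each part is the list of word lists allowed at that step
    while pos < len(tokens) and tokens[pos] not in ("|", ")"):
        if tokens[pos] == "(":
            inner, pos = parse_alternatives(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            parts.append(inner)
        else:
            parts.append([[tokens[pos]]])
        pos += 1
    # One choice per part, concatenated in order, gives one plain phrase.
    options = [sum(choice, []) for choice in product(*parts)]
    return options, pos


def expand(phrase: str) -> list[str]:
    """Expand the text between the quotes into every plain phrase it matches."""
    tokens = tokenize(phrase)
    options, pos = parse_alternatives(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected ')'")
    return [" ".join(words) for words in options]


print(expand("( ( a b c ) | d ) e"))  # ['a b c e', 'd e']
```

The nested example shows why parentheses matter: ( a b c ) is a single three-word alternative, so it competes with d for the same position rather than being split apart.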
Let's See It in Action

Let's build a realistic example using customer feedback data. First, we'll set up the test environment:

-- Clean slate for easy reproduction
DROP TABLE IF EXISTS phrase_or_demo;

CREATE TABLE phrase_or_demo (title TEXT, content TEXT, category TEXT);

INSERT INTO phrase_or_demo (id, title, content, category) VALUES 
(1, 'Happy Customer Review', 'I am a very happy customer with excellent service', 'reviews'),
(2, 'Sad Customer Feedback', 'I am a very sad customer with poor experience', 'reviews'), 
(3, 'Customer Service Report', 'The customer was happy but had some concerns', 'reports'),
(4, 'Angry Customer Complaint', 'I am an angry customer demanding refund', 'complaints'),
(5, 'Neutral Customer Survey', 'The customer seemed neutral about our service', 'surveys'),
(6, 'Fast Delivery Service', 'Our delivery service is really fast and reliable', 'services'),
(7, 'Slow Delivery Issues', 'The delivery was extremely slow this time', 'issues'),
(8, 'Good Service Quality', 'We provide good service to all customers', 'services'),
(9, 'Bad Service Report', 'There were complaints about bad service quality', 'reports'),
(10, 'Customer Happy Experience', 'The happy customer left positive feedback', 'feedback'),
(11, 'Premium Quality Product', 'This is a premium quality item with excellent features', 'products'),
(12, 'Budget Quality Option', 'A budget quality alternative for cost-conscious buyers', 'products'),
(13, 'Standard Quality Service', 'Our standard quality offering meets basic needs', 'services');

Example 1: Capturing Every Emotional State

Query: "(happy | sad | angry) customer"

SELECT * FROM phrase_or_demo WHERE MATCH('"(happy | sad | angry) customer"');

Results:

+------+---------------------------+---------------------------------------------------+------------+
| id   | title                     | content                                           | category   |
+------+---------------------------+---------------------------------------------------+------------+
|    2 | Sad Customer Feedback     | I am a very sad customer with poor experience     | reviews    |
|    4 | Angry Customer Complaint  | I am an angry customer demanding refund           | complaints |
|    1 | Happy Customer Review     | I am a very happy customer with excellent service | reviews    |
|   10 | Customer Happy Experience | The happy customer left positive feedback         | feedback   |
+------+---------------------------+---------------------------------------------------+------------+
4 rows in set (0.00 sec)

Why this matters: instead of writing three separate phrase queries and combining them with OR, you get exact phrase matching from a single, compact query.
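One way to check that claim is to model it. The Python sketch below is a simplified stand-in, not Manticore's matcher: it assumes lowercase tokens split on non-alphanumeric characters, no morphology, and that a phrase must fit inside a single field (only title and content are modeled). Each phrase position is a set of allowed words, and the combined query is compared with the three separate phrases.

```python
import re

# (title, content) of each row in phrase_or_demo, copied from the INSERT above.
ROWS = {
    1: ("Happy Customer Review", "I am a very happy customer with excellent service"),
    2: ("Sad Customer Feedback", "I am a very sad customer with poor experience"),
    3: ("Customer Service Report", "The customer was happy but had some concerns"),
    4: ("Angry Customer Complaint", "I am an angry customer demanding refund"),
    5: ("Neutral Customer Survey", "The customer seemed neutral about our service"),
    6: ("Fast Delivery Service", "Our delivery service is really fast and reliable"),
    7: ("Slow Delivery Issues", "The delivery was extremely slow this time"),
    8: ("Good Service Quality", "We provide good service to all customers"),
    9: ("Bad Service Report", "There were complaints about bad service quality"),
    10: ("Customer Happy Experience", "The happy customer left positive feedback"),
    11: ("Premium Quality Product", "This is a premium quality item with excellent features"),
    12: ("Budget Quality Option", "A budget quality alternative for cost-conscious buyers"),
    13: ("Standard Quality Service", "Our standard quality offering meets basic needs"),
}


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer: lowercase, split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def field_matches(text: str, slots: list[set[str]]) -> bool:
    """True if some run of consecutive words fits the slots, one word per slot."""
    words = tokenize(text)
    n = len(slots)
    return any(
        all(words[start + i] in slots[i] for i in range(n))
        for start in range(len(words) - n + 1)
    )


def search(slots: list[set[str]]) -> set[int]:
    # A phrase must sit inside one field; a row matches if any field does.
    return {doc_id for doc_id, fields in ROWS.items()
            if any(field_matches(f, slots) for f in fields)}


# One phrase with an OR group in the first position...
combined = search([{"happy", "sad", "angry"}, {"customer"}])
# ...versus three plain phrases joined with OR.
separate = set().union(*(search([{w}, {"customer"}]) for w in ("happy", "sad", "angry")))
print(sorted(combined))  # [1, 2, 4, 10]
```

Row 3 is correctly left out: it contains "customer was happy", not "happy customer".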

Example 2: Service Quality Variants

Query: "(good | bad | premium | budget | standard) (service | quality)"

SELECT * FROM phrase_or_demo WHERE MATCH('"(good | bad | premium | budget | standard) (service | quality)"');

Results:

+------+--------------------------+--------------------------------------------------------+----------+
| id   | title                    | content                                                | category |
+------+--------------------------+--------------------------------------------------------+----------+
|    8 | Good Service Quality     | We provide good service to all customers               | services |
|    9 | Bad Service Report       | There were complaints about bad service quality        | reports  |
|   11 | Premium Quality Product  | This is a premium quality item with excellent features | products |
|   12 | Budget Quality Option    | A budget quality alternative for cost-conscious buyers | products |
|   13 | Standard Quality Service | Our standard quality offering meets basic needs        | services |
+------+--------------------------+--------------------------------------------------------+----------+
5 rows in set (0.00 sec)

Benefit: one query catches every adjective/noun combination (5 × 2 = 10 phrases) while keeping exact phrase precision.

Beyond Basic Phrases: Quorum and Proximity

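OR is not limited to simple phrases. Sometimes you need more flexibility: matching documents even when not every term is present, or finding terms that are near each other but not necessarily in exact order. That is what the quorum and proximity operators are for, and both work seamlessly with OR.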

Quorum with OR: Flexible Fuzzy Matching

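Combined with OR, the quorum operator gives you flexible fuzzy matching: each OR group counts once toward the threshold, no matter how many of its alternatives a document contains: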

-- Find documents with at least 2 out of these word groups
SELECT id, content FROM phrase_or_demo WHERE MATCH('@content "(excellent | good | premium) (service | quality | experience) customer"/2');

Results:

+------+--------------------------------------------------------+
| id   | content                                                |
+------+--------------------------------------------------------+
|    8 | We provide good service to all customers               |
|    1 | I am a very happy customer with excellent service      |
|   11 | This is a premium quality item with excellent features |
|    2 | I am a very sad customer with poor experience          |
|    5 | The customer seemed neutral about our service          |
+------+--------------------------------------------------------+
5 rows in set (0.00 sec)

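Explanation: this matches documents that contain words from at least 2 of the 3 groups: (excellent|good|premium), (service|quality|experience), and "customer". Document 9 ("bad service quality") does not match, because service and quality belong to the same group and therefore count only once.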
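The group-counting rule can be sketched in Python too. This is a simplified model, not Manticore's implementation: it assumes default settings with no morphology (so "customers" in row 8 does not count as "customer"), ignores word positions, and looks only at the content field, as the @content query does. Against the demo rows it reproduces the five ids above.

```python
import re

# content column of phrase_or_demo, copied from the INSERT above.
CONTENT = {
    1: "I am a very happy customer with excellent service",
    2: "I am a very sad customer with poor experience",
    3: "The customer was happy but had some concerns",
    4: "I am an angry customer demanding refund",
    5: "The customer seemed neutral about our service",
    6: "Our delivery service is really fast and reliable",
    7: "The delivery was extremely slow this time",
    8: "We provide good service to all customers",
    9: "There were complaints about bad service quality",
    10: "The happy customer left positive feedback",
    11: "This is a premium quality item with excellent features",
    12: "A budget quality alternative for cost-conscious buyers",
    13: "Our standard quality offering meets basic needs",
}

# One set per OR group in "(excellent | good | premium) (service | quality | experience) customer".
GROUPS = [{"excellent", "good", "premium"}, {"service", "quality", "experience"}, {"customer"}]


def tokenize(text: str) -> set[str]:
    # Simplified tokenizer: lowercase, split on non-alphanumerics, no stemming.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groups_hit(text: str, groups: list[set[str]]) -> int:
    words = tokenize(text)
    # A group counts once, however many of its alternatives appear.
    return sum(1 for group in groups if group & words)


def quorum(threshold: int) -> set[int]:
    return {doc_id for doc_id, text in CONTENT.items()
            if groups_hit(text, GROUPS) >= threshold}


print(sorted(quorum(2)))  # [1, 2, 5, 8, 11]
print(sorted(quorum(3)))  # [1]
```

Raising the threshold to 3 (every group present) leaves only row 1, which is a quick way to see how the quorum value trades recall for precision.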

Advanced Quorum Example

-- Match documents containing at least 50% of these 4 word groups (2 of 4)
SELECT id, title FROM phrase_or_demo 
WHERE MATCH('"(happy | satisfied) (customer | experience) (excellent | good) (service | quality)"/0.5');

Proximity with OR: Nearby Alternatives

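With OR, the proximity operator checks each alternative separately within the specified distance. Word order does not have to match the query: in both results below, "delivery" comes before the adjective.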

-- Find "delivery" within 3 words of either "fast" or "slow"
SELECT id, title, content FROM phrase_or_demo WHERE MATCH('"(fast | slow) delivery"~3');

Results:

+------+-----------------------+--------------------------------------------------+
| id   | title                 | content                                          |
+------+-----------------------+--------------------------------------------------+
|    7 | Slow Delivery Issues  | The delivery was extremely slow this time        |
|    6 | Fast Delivery Service | Our delivery service is really fast and reliable |
+------+-----------------------+--------------------------------------------------+
2 rows in set (0.00 sec)

Complex Proximity Example

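-- "customer" within ~2 of an emotion word, plus any one of quality/service/experience.
-- No field is specified, so the last group may match in title (rows 3 and 10 match that way).
SELECT id, title, content FROM phrase_or_demo WHERE MATCH('"customer (happy | sad | angry)"~2 (quality | service | experience)');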

Results:

+------+---------------------------+---------------------------------------------------+
| id   | title                     | content                                           |
+------+---------------------------+---------------------------------------------------+
|   10 | Customer Happy Experience | The happy customer left positive feedback         |
|    2 | Sad Customer Feedback     | I am a very sad customer with poor experience     |
|    1 | Happy Customer Review     | I am a very happy customer with excellent service |
|    3 | Customer Service Report   | The customer was happy but had some concerns      |
+------+---------------------------+---------------------------------------------------+
4 rows in set (0.00 sec)

Comparison: Traditional vs. Modern

Traditional Approach (Multiple Full-Text Phrases)

-- The old way: spell out every variant as its own phrase
SELECT id, title FROM phrase_or_demo WHERE MATCH('"happy customer"|"sad customer"|"angry customer"');

Modern Approach (A Single Phrase with OR)

-- The elegant way: one query to rule them all
SELECT id, title FROM phrase_or_demo WHERE MATCH('"(happy | sad | angry) customer"');
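To see what the single phrase replaces, this hypothetical helper spells a phrase-with-OR back out as the old-style list of phrases. It is a sketch for illustration only: it supports one word per alternative, and the name legacy_query is made up here.

```python
from itertools import product


def legacy_query(groups: list[list[str]]) -> str:
    """Spell a phrase-with-OR out as plain phrases joined by |.

    groups: one list of alternatives per phrase position, e.g.
    [["happy", "sad", "angry"], ["customer"]] for "(happy | sad | angry) customer".
    """
    phrases = (" ".join(choice) for choice in product(*groups))
    return "|".join(f'"{p}"' for p in phrases)


print(legacy_query([["happy", "sad", "angry"], ["customer"]]))
# "happy customer"|"sad customer"|"angry customer"

quality = legacy_query([["good", "bad", "premium", "budget", "standard"], ["service", "quality"]])
print(quality.count("|") + 1)  # 10 phrases for Example 2
```

The number of spelled-out phrases is the product of the group sizes, so the old form grows quickly with every extra OR group, while the modern form grows only by the words you add.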

Real-World Applications

1. E-commerce Product Search

-- Capture color, product-type, and size variations
"(red | blue | green | black) (shirt | t-shirt | tee) (small | medium | large)"

2. Content Management Systems

-- Track document status changes
"(draft | published | archived | deleted) (document | article | post)"

3. Customer Support Ticket Analysis

-- Categorize support issues with quorum
"(urgent | critical | high) (priority | importance) (bug | issue | problem)"/2

4. Social Media Sentiment Monitoring

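-- Capture brand mentions with emotional context. "brandname" is a placeholder for your brand's token;
-- "@" is a special character in Manticore query syntax (field search), so avoid it inside the phrase.
"brandname (love | hate | like | dislike) (product | service | experience)"~5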

5. Medical Record Search

-- Find patient symptoms with proximity
"patient (experienced | reported | complained) (pain | discomfort | symptoms)"~4

6. Financial Transaction Analysis

-- Track transaction types and statuses
"(credit | debit | transfer) (completed | pending | failed | cancelled)"

Advanced Usage Patterns

1. Layered Precision

Combine phrase OR with other operators, such as field search, for precise queries:

@title "(urgent | critical) (update | patch)" @body "security"

2. Performance Optimization

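Quorum with OR gives fuzzy matching over an explicit list of word forms, which may be faster than a wildcard search. With /1, words from any single group are enough for a match: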

"(run | running | runner | runs) (fast | quick | speed)"/1

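A small Python sketch shows what /1 means here and how an explicit list of forms differs from a prefix wildcard such as run*. It is a simplified model (lowercase tokens, no morphology), not Manticore code, and it says nothing about speed: it only shows which texts each rule accepts. The sample sentences are made up for illustration.

```python
import re

# Word groups from "(run | running | runner | runs) (fast | quick | speed)".
RUN_FORMS = {"run", "running", "runner", "runs"}
SPEED_WORDS = {"fast", "quick", "speed"}


def words(text: str) -> set[str]:
    # Simplified tokenizer: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def quorum_hits(text: str) -> int:
    # Each OR group counts at most once toward the threshold.
    w = words(text)
    return sum(1 for group in (RUN_FORMS, SPEED_WORDS) if group & w)


def prefix_match(text: str, prefix: str) -> bool:
    # What a wildcard such as run* accepts: any word starting with the prefix.
    return any(word.startswith(prefix) for word in words(text))


for text in ["The runner was fast", "A quick fix", "Runway lights"]:
    hits = quorum_hits(text)
    print(f"{text!r}: groups={hits}, /1={hits >= 1}, /2={hits >= 2}, "
          f"run*={prefix_match(text, 'run')}")
```

Two things stand out: with /1, "A quick fix" qualifies through the speed group alone, and run* accepts "runway", which the explicit list deliberately leaves out.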
3. Contextual Flexibility

Use proximity with OR to handle natural-language variants:

"user (wants | needs | requires) (feature | functionality)"~3

Key Benefits

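  1. Precision: keeps the exact phrase structure while allowing variants
  2. Maintainability: one query to update instead of many variants to manage
  3. Analytics: a single, unified result set makes analysis and ranking more meaningful
  4. Flexibility: handles real-world language variation effectively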

Conclusion

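OR inside phrases offers a useful middle ground between strict exact-match search and loose keyword matching. Whether you are building e-commerce search, analyzing customer feedback, or creating a content discovery system, it combines the precision of phrases with the flexibility of alternatives.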

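Manticore Search 13.6.7 includes this feature as part of its full-text search capabilities. Combining OR with the phrase, proximity, and quorum operators gives you more options for handling complex search requirements.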

To learn more about this feature and other improvements, see the Manticore Search 13.6.7 release notes.

Install Manticore Search
