⚠️ 此页面为自动翻译,翻译可能不完美。
blog-post

Understanding Pagination in Manticore Search

分页是任何搜索引擎的重要功能,它允许用户高效地浏览大量结果集。Manticore Search 提供了多种强大的分页方法,每种方法都有其自身的优缺点。本文将探讨 Manticore Search 中可用的不同分页选项,并通过实际的 SQL 示例帮助您为您的使用场景实现最佳解决方案。

目录

  1. 分页简介
  2. 设置测试环境
  3. 传统的基于偏移的分页
  4. 结果集窗口和 max_matches
  5. 基于滚动的分页
  6. 通过 HTTP JSON 进行分页
  7. 分页的排序最佳实践
  8. 限制和性能考虑
  9. 带分页的面向方面的搜索
  10. 实际使用案例
  11. 结论

分页简介

在处理大型数据集时,一次性返回所有匹配结果是不切实际的。分页通过将结果划分为可管理的块或“页面”来解决这个问题。Manticore Search 提供了多种分页方法,以适应不同的使用场景,从简单的列表导航到无限滚动界面。

在本文中,我们将探讨三种主要的分页方法:

  • 传统的基于偏移的分页 ( 第3节 )
  • 使用 max_matches 的深度分页 ( 第4节 )
  • 用于高效浏览大型结果集的基于滚动的分页 ( 第5节 )

我们还将讨论它们通过 SQL 和 HTTP JSON 接口的实现方式,性能考虑因素以及实际应用。

设置测试环境

在深入分页示例之前,让我们使用 Manticore Load 工具创建一个带有示例数据的测试环境。该工具使生成和加载测试数据以进行基准测试和实验变得容易:

# Create a products table and populate it with 10,000 records containing "smartphone" in the title
manticore-load --quiet \
  --batch-size=1000 \
  --threads=4 \
  --total=10000 \
  --drop \
  --init="CREATE TABLE products(title text, description text, price int, category string, rating float, stock int)" \
  --load="INSERT INTO products(id, title, description, price, category, rating, stock) VALUES(<increment>, '<text/3/5> smartphone', '<text/20/50>', <int/10/1000>, '<string/5/10>', <float/1/5>, <int/0/100>)"

输出:
manticore-load 输出

有了这个测试环境,我们现在可以探索不同的分页方法。

传统的基于偏移的分页

Manticore Search 中最直接的分页方法使用 LIMIT 子句与偏移量和计数参数。这种方法对于任何使用过 SQL 数据库的人都很熟悉。

基本语法

SELECT ... FROM ... [LIMIT [offset,] row_count]
SELECT ... FROM ... [LIMIT row_count][ OFFSET offset]

示例

-- Return the first 5 results (page 1)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5;

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+
-- Return results 6-10 (page 2)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone            |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 |
+------+----------------------------------------------+-------+----------+
-- Alternative syntax with separate LIMIT and OFFSET
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5 OFFSET 10;
-- Results 11-15 (page 3)

输出:

+------+----------------------------------------+-------+----------+
| id   | title                                  | price | weight() |
+------+----------------------------------------+-------+----------+
| 7318 | Her spent at. smartphone               |    11 |     1272 |
| 8436 | His? felt work. Angry fast. smartphone |    11 |     1272 |
| 9699 | Feel spend funny. smartphone           |    11 |     1272 |
| 2326 | Quietly wet drew found. smartphone     |    12 |     1272 |
| 4171 | Set perhaps new send soon. smartphone  |    12 |     1272 |
+------+----------------------------------------+-------+----------+

此方法易于实现,适合浏览结果的前几页。然而,由于 Manticore 内部处理查询的方式,它在深度分页时效率较低。有关这些限制的详细信息,请参见 第8节:限制和性能考虑

结果集窗口和 max_matches

默认情况下,Manticore Search 限制返回的匹配项数量为 1000。如果您尝试分页超过此限制,查询将导致错误。可以使用 max_matches 选项调整此限制。

理解 max_matches

max_matches 选项控制 Manticore 在搜索时在 RAM 中保留的匹配项数量。它旨在限制每个查询的内存使用。默认值为 1000,对于大多数常见搜索场景来说已经足够,但当您需要访问更深层的结果页面时,可以增加此值。

-- Increase max_matches to allow deeper pagination (results 1001-1005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 1000, 5 
OPTION max_matches=1500;

输出:

+------+-----------------------------------------+-------+
| id   | title                                   | price |
+------+-----------------------------------------+-------+
| 3550 | Open sold being late wide. smartphone   |   111 |
| 4896 | Grew. Wrote as hungry. smartphone       |   111 |
|  647 | Went seen had. smartphone               |   112 |
|  883 | Hold lose; have thought. smartphone     |   112 |
| 1774 | Get! began sick. Gone, wild. smartphone |   112 |
+------+-----------------------------------------+-------+
-- For very deep pagination (results 5000-5005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5000, 5 
OPTION max_matches=5500;

输出:

+------+------------------------------------------------+-------+
| id   | title                                          | price |
+------+------------------------------------------------+-------+
| 4894 | Subtly? inside for spend evening. smartphone   |   507 |
| 5203 | Bad; artistically from yearly. smartphone      |   507 |
| 7446 | Empty probably clever! universally. smartphone |   507 |
| 8053 | In! rich daily irregularly. smartphone         |   507 |
| 8055 | Short cold fall; keep physically. smartphone   |   507 |
+------+------------------------------------------------+-------+

请注意,增加 max_matches 会带来内存成本。每个匹配项都会消耗内存,因此将此值设置得过高可能会影响服务器性能,尤其是在高负载情况下。对于一种更节省内存的深度分页替代方法,请参见 基于滚动的分页

基于滚动的分页

基于滚动的分页提供了一种高效浏览大型结果集的方法。与传统的基于偏移的分页不同,它使用一个标记来跟踪结果集中的当前位置,这有助于高效地浏览结果的连续页面。

工作原理

  1. 执行一个带有排序条件的初始查询(必须在 ORDER BY 中包含 id
  2. 获取一个封装当前位置的滚动标记
  3. 在后续查询中使用此标记来获取下一批结果

示例

-- Initial query with sorting criteria (must include id in ORDER BY)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
-- Subsequent paginated query using the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MywidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo0NCwidHlwZSI6ImludCJ9XX0=';

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

基于滚动的分页特别适合在 Web 应用程序中实现“加载更多”或无限滚动功能。有关完整实现示例,请参见 实际使用案例

通过 HTTP JSON 进行分页

除了 SQL,Manticore Search 还提供通过 HTTP JSON 进行分页。这对于 Web 应用程序或微服务架构特别有用。传统的基于偏移的分页和基于滚动的分页都通过 JSON API 支持。

通过 HTTP JSON 的传统基于偏移的分页

HTTP JSON API 使用 offsetlimit 参数来控制分页。这等同于 SQL 的 LIMIT offset, count 语法。

基本语法

{
  "table": "table_name",
  "query": { "match": { ... } },
  "limit": number_of_results,
  "offset": starting_position
}

示例

# First page (results 1-5)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 0,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}'|jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1179,
        "_score": 1272,
        "_source": {
          "title": "Weak spoke? fall. smartphone",
          "description": "Made down never long 3 angry are went asked closed! cold historically got take can to commonly emotionally; socially asked get tame frequently think nowhere? carefully? scientifically small.",
          "price": 10,
          "category": "hkadwpwwk",
          "rating": 2.300000,
          "stock": 63
        }
      },
      // ... more results ...
    ]
  }
}
# Second page (results 6-10)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 5,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}' | jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 246,
        "_score": 1272,
        "_source": {
          "title": "Soft technically took. smartphone",
          "description": "This sometimes empty short normally! been send. Locally kind! quickly might infrequently culturally softly became tried quietly good gradually; inside dead loud.",
          "price": 11,
          "category": "wuiymrjdp",
          "rating": 1.600000,
          "stock": 87
        }
      },
      // ... more results ...
    ]
  }
}

替代语法

HTTP JSON API 还支持使用 fromsize 代替 offsetlimit 的替代语法:

{
  "table": "table_name",
  "query": { "match": { ... } },
  "size": number_of_results,
  "from": starting_position
}

这与之前的格式等效,但遵循 Elasticsearch 的约定,对某些用户可能更熟悉。

通过 HTTP JSON 进行滚动分页

HTTP JSON API 还支持滚动分页以实现更高效的深度分页。这需要设置适当的排序条件并使用滚动令牌。

初始查询

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ],
  "options": {
    "scroll": true
  }
}' | jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1,
        "_score": 1272,
        "_source": {
          "title": "Incorrectly heavy soft. smartphone",
          "description": "Carefully late sit; put draw afraid; gave hardly been vaguely 1000 this at understood gave always east fake, dull internationally sent. At cold; hot.",
          "price": 540,
          "category": "ogqejby",
          "rating": 2.800000,
          "stock": 52
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
}

注意响应中的 scroll 值,这是一个 base64 编码的令牌,表示您在结果集中的当前位置。

后续查询

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "options": {
    "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
  }
}' | jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 9995,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 6,
        "_score": 1272,
        "_source": {
          "title": "Negatively! it become. smartphone",
          "description": "Be quiet cold nightly poor hourly had thin got dry, informally work, 100 50 needed specifically serious do bring below began! rich loud.",
          "price": 602,
          "category": "hjpwt",
          "rating": 2.500000,
          "stock": 47
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9XX0="
}

每个响应都包含一个用于下一页的新滚动令牌,允许继续分页。

HTTP JSON 分页的最佳实践

  1. 包含排序条件:为了获得可预测的分页结果,始终包含显式的排序,例如:
    "sort": [
      { "_score": "desc" },
      { "price": "asc" },
      { "id": "asc" }
    ]
    
  2. 正确处理滚动令牌:存储每个响应中的滚动令牌,并在下一个请求中使用它。
  3. 使用适当的错误处理:检查错误响应,尤其是在深度分页时。
  4. 考虑响应大小:通过使用 _source 参数仅选择所需的字段来控制返回的数据量:
    "_source": ["title", "price", "rating"]
    
  5. 监控性能:对于非常大的结果集,请监控服务器性能并考虑使用基于滚动的分页。

HTTP JSON 分页方法遵循与 SQL 对应方法相同的原则(在 第3节第5节 中描述),但使用 JSON 语法。这使它们非常适合现代 Web 应用程序和 REST API。

分页的排序最佳实践

正确的排序对于有效的分页至关重要,尤其是基于滚动的分页。以下是关键考虑因素:

1. 包含唯一标识符

为了获得一致的分页结果,始终在 ORDER BY 子句中包含唯一标识符(通常是 id 字段)。这确保了即使多个文档在其他排序条件上具有相同的值,分页仍会产生一致的结果。

-- Good: Includes id as the tie-breaker
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: No unique identifier, may lead to inconsistent pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC;

2. 包含 weight() 用于全文搜索

当在查询中使用 MATCH() 时,始终在排序条件中包含 weight(),以确保最相关的结果首先出现:

-- Good: Includes weight() for relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Less optimal: Missing relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC, id ASC;

3. 保持排序顺序一致性

当使用传统的基于偏移的分页时,始终在所有分页查询中保持相同的排序顺序。在页面之间更改排序顺序可能导致结果丢失或重复。

-- First page (correct)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10;

-- Second page (correct - same sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10, 10;

-- Second page (incorrect - different sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY rating DESC, id ASC LIMIT 10, 10;

4. 使用确定性排序

对于基于滚动的分页,始终使用确定性排序条件。应避免使用非确定性函数(如 RAND()),因为它们会在查询之间产生不一致的结果。

-- Good: Deterministic sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: Non-deterministic sorting, will break scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY RAND();

5. 使用 ID 进行排序方向

当使用 id 字段作为排序的平局处理时,您可以使用 ASC 或 DESC 顺序,但必须一致应用:

-- These are both valid for scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id DESC;

6. 优化排序性能

为了提高排序性能,请考虑使用 sort_method 选项:

-- For pre-sorted data (e.g., by id)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY id ASC 
LIMIT 5 OPTION sort_method=kbuffer;

-- For other sorting criteria
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 OPTION sort_method=pq;

您选择的排序方法会显著影响分页的性能和一致性。例如,确定性排序对于基于滚动的分页至关重要,而包含唯一标识符对于所有分页方法的一致结果至关重要。有关排序如何影响性能的更多详细信息,请参阅 限制和性能考虑

限制和性能考虑

了解不同分页方法的限制将帮助您为特定用例选择合适的方法。

基于偏移的分页限制

  1. 大型偏移量性能下降:随着偏移量值的增加,查询性能会下降。这是因为 Manticore 仍需要处理所有记录直到偏移量值,然后才能返回请求的页面。
  2. max_matches 限制:默认情况下,您只能分页浏览前 1000 个结果。要超出此限制,必须增加 max_matches,这会增加内存使用。
  3. 内存使用:较大的 max_matches 值需要每个查询更多的 RAM,这可能在高负载下影响服务器性能。
  4. 结果一致性无法保证:如果在页面请求之间添加或删除文档,可能会看到重复或遗漏的结果。

对于深度分页场景或实现“加载更多”功能,请考虑使用 基于滚动的分页 ,如 第5节 中所述。

基于滚动的分页限制

  1. 需要在排序条件中包含 ID:您必须在 ORDER BY 子句中包含 id 字段,这可能与您期望的排序逻辑不一致。
  2. 令牌管理:您需要在请求之间存储和管理滚动令牌,这会增加应用程序的复杂性。
  3. 无一致性保证:基于滚动的分页不保证时间点一致性。如果在请求之间添加、删除或修改文档,结果仍可能受到影响。
  4. 无随机访问:与基于偏移的分页不同,您无法直接跳转到特定页面;必须顺序导航。

带分页的面向方面的搜索

面向方面的搜索允许用户通过多个维度过滤和导航搜索结果。当与分页结合使用时,了解分页如何影响面向方面的结果非常重要。

-- Faceted search with pagination (first page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 0, 5
FACET category ORDER BY COUNT(*) DESC;

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+
-- Faceted search with pagination (second page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5
FACET category ORDER BY COUNT(*) DESC;

输出:

+------+-----------------------------------------------------------+-------+----------+
| id   | title                                                     | price | weight() |
+------+-----------------------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone                         |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone              |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone                 |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone                        |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone              |    11 |     1272 |
+------+-----------------------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+

重要说明: 分面结果是基于整个匹配查询结果集计算的,而不是仅基于分页部分。这意味着无论您查看的是哪一页,分面计数都代表所有匹配文档的分布情况。LIMIT 子句仅影响主结果集中返回的文档,不影响分面计算。如上例所示,两页的分面结果相同,确认了它们是基于整个结果集计算的。

分面结果中的分页

除了对主搜索结果进行分页外,Manticore Search 还允许对分面结果本身进行分页。这在处理大量分面值时特别有用,因为显示所有分面值可能不切实际。

您可以在 FACET 语句中使用 LIMIT 子句来控制返回的分面值数量。

-- Get only the first 5 categories ordered by count
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5;

输出:

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+

您可以在同一查询中为不同的分面提供不同的分页参数:

-- Multiple facets with different pagination parameters
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5 
FACET category ORDER BY category ASC LIMIT 3,5 
FACET price ORDER BY FACET() ASC LIMIT 5;

输出:

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaejmtubv  |        1 |
| aaethytj   |        1 |
| aaktjgaa   |        1 |
| aalfwcvwil |        1 |
| aaqmumofe  |        1 |
+------------+----------+
+-------+----------+
| price | count(*) |
+-------+----------+
|    10 |        5 |
|    11 |        8 |
|    12 |       10 |
|    13 |        9 |
|    14 |        8 |
+-------+----------+

此示例演示了多种分面分页技术:

  1. 第一个分面返回按计数排序的前 5 个类别(偏移量 0,限制 5)
  2. 第二个分面按字母顺序返回类别,跳过前 3 个(偏移量 3,限制 5)
  3. 第三个分面返回前 5 个价格点,按升序排列

分面分页特别适用于:

  • 处理大量分面值(例如数千个产品类别)
  • 为分面值实现“显示更多”功能
  • 创建动态过滤界面,首先显示最受欢迎的过滤器
  • 通过限制返回的分面值数量进行性能优化

注意: 与主结果分页不同,主结果分页在深度偏移时会影响性能,而分面分页即使在大型偏移时也通常效率较高,因为分面计算发生在文档匹配阶段之后。

通过 HTTP JSON 进行分面分页

当使用 HTTP JSON 接口时,也可以使用分面分页。您可以通过聚合规范中的 size 参数来限制返回的分面值数量:

{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 3,
  "aggs": {
    "category_counts": {
      "terms": {
        "field": "category",
        "size": 5
      }
    }
  }
}

此示例将类别分面结果限制为前 5 个值。

重要: 与 SQL 分面不同,HTTP JSON 接口目前仅支持使用 size 参数限制分面值数量,不支持分面内的 OFFSET 参数进行分页。您只能指定返回多少结果,而不能指定跳过哪些结果。

有关通过 HTTP JSON 进行分面分页的更多详细信息,请参阅 Manticore Search 文档

实际使用案例

带高亮的分页

-- Pagination with highlighting
SELECT id, title, price, weight(),
       HIGHLIGHT({limit=100}) as highlight
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出:

+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
| id   | title                                        | price | weight() | highlight                                           |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
|  246 | Soft technically took. smartphone            |    11 |     1272 | Soft technically took. <b>smartphone</b>            |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 | Dirty up are! temporarily called. <b>smartphone</b> |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 | Instantly thick ran that hurt. <b>smartphone</b>    |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 | Work her wrong locally. <b>smartphone</b>           |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 | Stale loud passively sweet clean. <b>smartphone</b> |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+

使用滚动分页实现“加载更多”按钮

基于滚动的分页非常适合实现“加载更多”功能,因为它可以高效地导航大型结果集。以下是实现方法:

-- Initial query with SHOW SCROLL to get the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出:

+------+------------------------------------------------+-------+----------+
| id   | title                                          | price | weight() |
+------+------------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                   |    10 |     1272 |
| 1388 | 3; her day of. smartphone                      |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone        |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone        |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone   |    10 |     1272 |
+------+------------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
1 row in set (0.00 sec)
-- When user clicks "Load More", use the scroll token to get the next batch
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ=='; SHOW SCROLL\G

输出:

+------+---------------------------------------------+-------+----------+
| id   | title                                       | price | weight() |
+------+---------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone           |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone|    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone   |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone          |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone|    11 |     1272 |
+------+---------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMSwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo2OTc4LCJ0eXBlIjoiaW50In1dfQ==

每次点击“加载更多”时,您将使用前一个查询返回的新滚动令牌。这种方法相比传统的基于偏移的分页有以下优势:

  1. 性能更好:深度分页时不会出现性能下降
  2. 无最大匹配限制:可以高效地分页数百万个结果
  3. 顺序访问:高效地导航连续的结果页

实现提示:

  • 在应用程序的状态中存储滚动令牌(客户端或服务器端)
  • 当用户点击“加载更多”时,在后续请求中包含该令牌
  • 始终使用最新的令牌,因为每个令牌代表结果集中的当前位置
  • 请记住滚动令牌是临时的,最终会过期

此实现方法解决了传统基于偏移分页在 第 8 节 中描述的许多限制,特别是针对深度分页场景。

结论

Manticore Search 提供了灵活的分页选项,以满足不同的使用场景。传统的基于偏移的分页适用于简单界面和有限的页面导航,而基于滚动的分页则在深度分页场景(如无限滚动或数据导出)中提供更好的性能。

在 Manticore Search 中实现分页时:

  1. 对于浅层分页(前几页):使用传统的基于偏移的分页,并保持一致的排序。
  2. 对于深层分页(多页或大型数据集):使用基于滚动的分页,并使用适当的基于 ID 的排序标准。
  3. 对于一致的结果:始终在 ORDER BY 子句中包含 id 字段,并在查询中保持一致的排序标准。
  4. 对于全文搜索:在排序标准中包含 weight()_score,以确保最相关的结果首先出现。

通过了解每种方法的优势、限制和最佳实践,您可以为特定的应用需求实现最有效的分页策略。请记住,最佳的分页方法取决于您的使用场景、数据量和用户体验要求。

安装Manticore Search

安装Manticore Search