Understanding Pagination in Manticore Search

Understanding Pagination in Manticore Search

已发布: Mar 18, 2025
自动翻译: Understanding Pagination in Manticore Search

分页是任何搜索引擎的重要功能，它允许用户高效地浏览大量结果集。Manticore Search 提供了多种强大的分页方法，每种方法都有其自身的优缺点。本文将探讨 Manticore Search 中可用的不同分页选项，并通过实际的 SQL 示例帮助您为您的使用场景实现最佳解决方案。

分页简介

在处理大型数据集时，一次性返回所有匹配结果是不切实际的。分页通过将结果划分为可管理的块或“页面”来解决这个问题。Manticore Search 提供了多种分页方法，以适应不同的使用场景，从简单的列表导航到无限滚动界面。

在本文中，我们将探讨三种主要的分页方法：

传统的基于偏移的分页 ( 第3节 )
使用 max_matches 的深度分页 ( 第4节 )
用于高效浏览大型结果集的基于滚动的分页 ( 第5节 )

我们还将讨论它们通过 SQL 和 HTTP JSON 接口的实现方式，性能考虑因素以及实际应用。

设置测试环境

在深入分页示例之前，让我们使用 Manticore Load 工具创建一个带有示例数据的测试环境。该工具使生成和加载测试数据以进行基准测试和实验变得容易：

# Create a products table and populate it with 10,000 records containing "smartphone" in the title
manticore-load --quiet \
  --batch-size=1000 \
  --threads=4 \
  --total=10000 \
  --drop \
  --init="CREATE TABLE products(title text, description text, price int, category string, rating float, stock int)" \
  --load="INSERT INTO products(id, title, description, price, category, rating, stock) VALUES(<increment>, '<text/3/5> smartphone', '<text/20/50>', <int/10/1000>, '<string/5/10>', <float/1/5>, <int/0/100>)"

输出：
manticore-load 输出

有了这个测试环境，我们现在可以探索不同的分页方法。

传统的基于偏移的分页

Manticore Search 中最直接的分页方法使用 LIMIT 子句与偏移量和计数参数。这种方法对于任何使用过 SQL 数据库的人都很熟悉。

基本语法

SELECT ... FROM ... [LIMIT [offset,] row_count]
SELECT ... FROM ... [LIMIT row_count][ OFFSET offset]

示例

-- Return the first 5 results (page 1)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5;

输出：

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

-- Return results 6-10 (page 2)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出：

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone            |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 |
+------+----------------------------------------------+-------+----------+

-- Alternative syntax with separate LIMIT and OFFSET
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5 OFFSET 10;
-- Results 11-15 (page 3)

输出：

+------+----------------------------------------+-------+----------+
| id   | title                                  | price | weight() |
+------+----------------------------------------+-------+----------+
| 7318 | Her spent at. smartphone               |    11 |     1272 |
| 8436 | His? felt work. Angry fast. smartphone |    11 |     1272 |
| 9699 | Feel spend funny. smartphone           |    11 |     1272 |
| 2326 | Quietly wet drew found. smartphone     |    12 |     1272 |
| 4171 | Set perhaps new send soon. smartphone  |    12 |     1272 |
+------+----------------------------------------+-------+----------+

此方法易于实现，适合浏览结果的前几页。然而，由于 Manticore 内部处理查询的方式，它在深度分页时效率较低。有关这些限制的详细信息，请参见第8节：限制和性能考虑。

结果集窗口和 max_matches

默认情况下，Manticore Search 限制返回的匹配项数量为 1000。如果您尝试分页超过此限制，查询将导致错误。可以使用 max_matches 选项调整此限制。

理解 max_matches

max_matches 选项控制 Manticore 在搜索时在 RAM 中保留的匹配项数量。它旨在限制每个查询的内存使用。默认值为 1000，对于大多数常见搜索场景来说已经足够，但当您需要访问更深层的结果页面时，可以增加此值。

-- Increase max_matches to allow deeper pagination (results 1001-1005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 1000, 5 
OPTION max_matches=1500;

输出：

+------+-----------------------------------------+-------+
| id   | title                                   | price |
+------+-----------------------------------------+-------+
| 3550 | Open sold being late wide. smartphone   |   111 |
| 4896 | Grew. Wrote as hungry. smartphone       |   111 |
|  647 | Went seen had. smartphone               |   112 |
|  883 | Hold lose; have thought. smartphone     |   112 |
| 1774 | Get! began sick. Gone, wild. smartphone |   112 |
+------+-----------------------------------------+-------+

-- For very deep pagination (results 5000-5005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5000, 5 
OPTION max_matches=5500;

输出：

+------+------------------------------------------------+-------+
| id   | title                                          | price |
+------+------------------------------------------------+-------+
| 4894 | Subtly? inside for spend evening. smartphone   |   507 |
| 5203 | Bad; artistically from yearly. smartphone      |   507 |
| 7446 | Empty probably clever! universally. smartphone |   507 |
| 8053 | In! rich daily irregularly. smartphone         |   507 |
| 8055 | Short cold fall; keep physically. smartphone   |   507 |
+------+------------------------------------------------+-------+

请注意，增加 max_matches 会带来内存成本。每个匹配项都会消耗内存，因此将此值设置得过高可能会影响服务器性能，尤其是在高负载情况下。对于一种更节省内存的深度分页替代方法，请参见基于滚动的分页。

基于滚动的分页

基于滚动的分页提供了一种高效浏览大型结果集的方法。与传统的基于偏移的分页不同，它使用一个标记来跟踪结果集中的当前位置，这有助于高效地浏览结果的连续页面。

工作原理

执行一个带有排序条件的初始查询（必须在 ORDER BY 中包含 id）
获取一个封装当前位置的滚动标记
在后续查询中使用此标记来获取下一批结果

示例

-- Initial query with sorting criteria (must include id in ORDER BY)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出：

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==

-- Subsequent paginated query using the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MywidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo0NCwidHlwZSI6ImludCJ9XX0=';

输出：

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

基于滚动的分页特别适合在 Web 应用程序中实现“加载更多”或无限滚动功能。有关完整实现示例，请参见实际使用案例。

通过 HTTP JSON 进行分页

除了 SQL，Manticore Search 还提供通过 HTTP JSON 进行分页。这对于 Web 应用程序或微服务架构特别有用。传统的基于偏移的分页和基于滚动的分页都通过 JSON API 支持。

通过 HTTP JSON 的传统基于偏移的分页

HTTP JSON API 使用 offset 和 limit 参数来控制分页。这等同于 SQL 的 LIMIT offset, count 语法。

基本语法

{
  "table": "table_name",
  "query": { "match": { ... } },
  "limit": number_of_results,
  "offset": starting_position
}

示例

# First page (results 1-5)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 0,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}'|jq .

输出：

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1179,
        "_score": 1272,
        "_source": {
          "title": "Weak spoke? fall. smartphone",
          "description": "Made down never long 3 angry are went asked closed! cold historically got take can to commonly emotionally; socially asked get tame frequently think nowhere? carefully? scientifically small.",
          "price": 10,
          "category": "hkadwpwwk",
          "rating": 2.300000,
          "stock": 63
        }
      },
      // ... more results ...
    ]
  }
}

# Second page (results 6-10)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 5,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}' | jq .

输出：

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 246,
        "_score": 1272,
        "_source": {
          "title": "Soft technically took. smartphone",
          "description": "This sometimes empty short normally! been send. Locally kind! quickly might infrequently culturally softly became tried quietly good gradually; inside dead loud.",
          "price": 11,
          "category": "wuiymrjdp",
          "rating": 1.600000,
          "stock": 87
        }
      },
      // ... more results ...
    ]
  }
}

替代语法

HTTP JSON API 还支持使用 from 和 size 代替 offset 和 limit 的替代语法：

{
  "table": "table_name",
  "query": { "match": { ... } },
  "size": number_of_results,
  "from": starting_position
}

这与之前的格式等效，但遵循 Elasticsearch 的约定，对某些用户可能更熟悉。

通过 HTTP JSON 进行滚动分页

HTTP JSON API 还支持滚动分页以实现更高效的深度分页。这需要设置适当的排序条件并使用滚动令牌。

初始查询

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ],
  "options": {
    "scroll": true
  }
}' | jq .

输出：

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1,
        "_score": 1272,
        "_source": {
          "title": "Incorrectly heavy soft. smartphone",
          "description": "Carefully late sit; put draw afraid; gave hardly been vaguely 1000 this at understood gave always east fake, dull internationally sent. At cold; hot.",
          "price": 540,
          "category": "ogqejby",
          "rating": 2.800000,
          "stock": 52
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
}

注意响应中的 scroll 值，这是一个 base64 编码的令牌，表示您在结果集中的当前位置。

后续查询

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "options": {
    "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
  }
}' | jq .

输出：

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 9995,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 6,
        "_score": 1272,
        "_source": {
          "title": "Negatively! it become. smartphone",
          "description": "Be quiet cold nightly poor hourly had thin got dry, informally work, 100 50 needed specifically serious do bring below began! rich loud.",
          "price": 602,
          "category": "hjpwt",
          "rating": 2.500000,
          "stock": 47
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9XX0="
}

每个响应都包含一个用于下一页的新滚动令牌，允许继续分页。

HTTP JSON 分页的最佳实践

包含排序条件：为了获得可预测的分页结果，始终包含显式的排序，例如：
```
"sort": [
  { "_score": "desc" },
  { "price": "asc" },
  { "id": "asc" }
]
```
正确处理滚动令牌：存储每个响应中的滚动令牌，并在下一个请求中使用它。
使用适当的错误处理：检查错误响应，尤其是在深度分页时。
考虑响应大小：通过使用 _source 参数仅选择所需的字段来控制返回的数据量：
```
"_source": ["title", "price", "rating"]
```
监控性能：对于非常大的结果集，请监控服务器性能并考虑使用基于滚动的分页。

HTTP JSON 分页方法遵循与 SQL 对应方法相同的原则（在第3节和第5节中描述），但使用 JSON 语法。这使它们非常适合现代 Web 应用程序和 REST API。

分页的排序最佳实践

正确的排序对于有效的分页至关重要，尤其是基于滚动的分页。以下是关键考虑因素：

1. 包含唯一标识符

为了获得一致的分页结果，始终在 ORDER BY 子句中包含唯一标识符（通常是 id 字段）。这确保了即使多个文档在其他排序条件上具有相同的值，分页仍会产生一致的结果。

-- Good: Includes id as the tie-breaker
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: No unique identifier, may lead to inconsistent pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC;

2. 包含 weight() 用于全文搜索

当在查询中使用 MATCH() 时，始终在排序条件中包含 weight()，以确保最相关的结果首先出现：

-- Good: Includes weight() for relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Less optimal: Missing relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC, id ASC;

3. 保持排序顺序一致性

当使用传统的基于偏移的分页时，始终在所有分页查询中保持相同的排序顺序。在页面之间更改排序顺序可能导致结果丢失或重复。

-- First page (correct)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10;

-- Second page (correct - same sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10, 10;

-- Second page (incorrect - different sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY rating DESC, id ASC LIMIT 10, 10;

4. 使用确定性排序

对于基于滚动的分页，始终使用确定性排序条件。应避免使用非确定性函数（如 RAND()），因为它们会在查询之间产生不一致的结果。

-- Good: Deterministic sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: Non-deterministic sorting, will break scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY RAND();

5. 使用 ID 进行排序方向

当使用 id 字段作为排序的平局处理时，您可以使用 ASC 或 DESC 顺序，但必须一致应用：

-- These are both valid for scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id DESC;

6. 优化排序性能

为了提高排序性能，请考虑使用 sort_method 选项：

-- For pre-sorted data (e.g., by id)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY id ASC 
LIMIT 5 OPTION sort_method=kbuffer;

-- For other sorting criteria
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 OPTION sort_method=pq;

您选择的排序方法会显著影响分页的性能和一致性。例如，确定性排序对于基于滚动的分页至关重要，而包含唯一标识符对于所有分页方法的一致结果至关重要。有关排序如何影响性能的更多详细信息，请参阅限制和性能考虑。

限制和性能考虑

了解不同分页方法的限制将帮助您为特定用例选择合适的方法。

基于偏移的分页限制

大型偏移量性能下降：随着偏移量值的增加，查询性能会下降。这是因为 Manticore 仍需要处理所有记录直到偏移量值，然后才能返回请求的页面。
max_matches 限制：默认情况下，您只能分页浏览前 1000 个结果。要超出此限制，必须增加 max_matches，这会增加内存使用。
内存使用：较大的 max_matches 值需要每个查询更多的 RAM，这可能在高负载下影响服务器性能。
结果一致性无法保证：如果在页面请求之间添加或删除文档，可能会看到重复或遗漏的结果。

对于深度分页场景或实现“加载更多”功能，请考虑使用基于滚动的分页，如第5节中所述。

基于滚动的分页限制

需要在排序条件中包含 ID：您必须在 ORDER BY 子句中包含 id 字段，这可能与您期望的排序逻辑不一致。
令牌管理：您需要在请求之间存储和管理滚动令牌，这会增加应用程序的复杂性。
无一致性保证：基于滚动的分页不保证时间点一致性。如果在请求之间添加、删除或修改文档，结果仍可能受到影响。
无随机访问：与基于偏移的分页不同，您无法直接跳转到特定页面；必须顺序导航。

带分页的面向方面的搜索

面向方面的搜索允许用户通过多个维度过滤和导航搜索结果。当与分页结合使用时，了解分页如何影响面向方面的结果非常重要。

-- Faceted search with pagination (first page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 0, 5
FACET category ORDER BY COUNT(*) DESC;

输出：

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+

-- Faceted search with pagination (second page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5
FACET category ORDER BY COUNT(*) DESC;

输出：

+------+-----------------------------------------------------------+-------+----------+
| id   | title                                                     | price | weight() |
+------+-----------------------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone                         |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone              |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone                 |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone                        |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone              |    11 |     1272 |
+------+-----------------------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+

重要说明： 分面结果是基于整个匹配查询结果集计算的，而不是仅基于分页部分。这意味着无论您查看的是哪一页，分面计数都代表所有匹配文档的分布情况。LIMIT 子句仅影响主结果集中返回的文档，不影响分面计算。如上例所示，两页的分面结果相同，确认了它们是基于整个结果集计算的。

分面结果中的分页

除了对主搜索结果进行分页外，Manticore Search 还允许对分面结果本身进行分页。这在处理大量分面值时特别有用，因为显示所有分面值可能不切实际。

您可以在 FACET 语句中使用 LIMIT 子句来控制返回的分面值数量。

-- Get only the first 5 categories ordered by count
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5;

输出：

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+

您可以在同一查询中为不同的分面提供不同的分页参数：

-- Multiple facets with different pagination parameters
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5 
FACET category ORDER BY category ASC LIMIT 3,5 
FACET price ORDER BY FACET() ASC LIMIT 5;

输出：

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaejmtubv  |        1 |
| aaethytj   |        1 |
| aaktjgaa   |        1 |
| aalfwcvwil |        1 |
| aaqmumofe  |        1 |
+------------+----------+
+-------+----------+
| price | count(*) |
+-------+----------+
|    10 |        5 |
|    11 |        8 |
|    12 |       10 |
|    13 |        9 |
|    14 |        8 |
+-------+----------+

此示例演示了多种分面分页技术：

第一个分面返回按计数排序的前 5 个类别（偏移量 0，限制 5）
第二个分面按字母顺序返回类别，跳过前 3 个（偏移量 3，限制 5）
第三个分面返回前 5 个价格点，按升序排列

分面分页特别适用于：

处理大量分面值（例如数千个产品类别）
为分面值实现“显示更多”功能
创建动态过滤界面，首先显示最受欢迎的过滤器
通过限制返回的分面值数量进行性能优化

注意： 与主结果分页不同，主结果分页在深度偏移时会影响性能，而分面分页即使在大型偏移时也通常效率较高，因为分面计算发生在文档匹配阶段之后。

通过 HTTP JSON 进行分面分页

当使用 HTTP JSON 接口时，也可以使用分面分页。您可以通过聚合规范中的 size 参数来限制返回的分面值数量：

{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 3,
  "aggs": {
    "category_counts": {
      "terms": {
        "field": "category",
        "size": 5
      }
    }
  }
}

此示例将类别分面结果限制为前 5 个值。

重要： 与 SQL 分面不同，HTTP JSON 接口目前仅支持使用 size 参数限制分面值数量，不支持分面内的 OFFSET 参数进行分页。您只能指定返回多少结果，而不能指定跳过哪些结果。

有关通过 HTTP JSON 进行分面分页的更多详细信息，请参阅 Manticore Search 文档。

实际使用案例

带高亮的分页

-- Pagination with highlighting
SELECT id, title, price, weight(),
       HIGHLIGHT({limit=100}) as highlight
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出：

+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
| id   | title                                        | price | weight() | highlight                                           |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
|  246 | Soft technically took. smartphone            |    11 |     1272 | Soft technically took. <b>smartphone</b>            |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 | Dirty up are! temporarily called. <b>smartphone</b> |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 | Instantly thick ran that hurt. <b>smartphone</b>    |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 | Work her wrong locally. <b>smartphone</b>           |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 | Stale loud passively sweet clean. <b>smartphone</b> |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+

使用滚动分页实现“加载更多”按钮

基于滚动的分页非常适合实现“加载更多”功能，因为它可以高效地导航大型结果集。以下是实现方法：

-- Initial query with SHOW SCROLL to get the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出：

+------+------------------------------------------------+-------+----------+
| id   | title                                          | price | weight() |
+------+------------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                   |    10 |     1272 |
| 1388 | 3; her day of. smartphone                      |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone        |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone        |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone   |    10 |     1272 |
+------+------------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
1 row in set (0.00 sec)

-- When user clicks "Load More", use the scroll token to get the next batch
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ=='; SHOW SCROLL\G

输出：

+------+---------------------------------------------+-------+----------+
| id   | title                                       | price | weight() |
+------+---------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone           |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone|    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone   |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone          |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone|    11 |     1272 |
+------+---------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMSwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo2OTc4LCJ0eXBlIjoiaW50In1dfQ==

每次点击“加载更多”时，您将使用前一个查询返回的新滚动令牌。这种方法相比传统的基于偏移的分页有以下优势：

性能更好：深度分页时不会出现性能下降
无最大匹配限制：可以高效地分页数百万个结果
顺序访问：高效地导航连续的结果页

实现提示：

在应用程序的状态中存储滚动令牌（客户端或服务器端）
当用户点击“加载更多”时，在后续请求中包含该令牌
始终使用最新的令牌，因为每个令牌代表结果集中的当前位置
请记住滚动令牌是临时的，最终会过期

此实现方法解决了传统基于偏移分页在第 8 节中描述的许多限制，特别是针对深度分页场景。

结论

Manticore Search 提供了灵活的分页选项，以满足不同的使用场景。传统的基于偏移的分页适用于简单界面和有限的页面导航，而基于滚动的分页则在深度分页场景（如无限滚动或数据导出）中提供更好的性能。

在 Manticore Search 中实现分页时：

对于浅层分页（前几页）：使用传统的基于偏移的分页，并保持一致的排序。
对于深层分页（多页或大型数据集）：使用基于滚动的分页，并使用适当的基于 ID 的排序标准。
对于一致的结果：始终在 ORDER BY 子句中包含 id 字段，并在查询中保持一致的排序标准。
对于全文搜索：在排序标准中包含 weight() 或 _score，以确保最相关的结果首先出现。

通过了解每种方法的优势、限制和最佳实践，您可以为特定的应用需求实现最有效的分页策略。请记住，最佳的分页方法取决于您的使用场景、数据量和用户体验要求。