blog-post

理解 Manticore Search 中的分页

分页是任何搜索引擎的重要功能,使用户能够高效地浏览大量结果集。Manticore Search 提供了几种强大的分页方法,每种方法都有其优缺点。本文探讨了 Manticore Search 中可用的不同分页选项,并提供了实用的 SQL 示例,以帮助您为您的用例实施最佳解决方案。

目录

  1. 分页简介
  2. 设置测试环境
  3. 传统的基于偏移量的分页
  4. 结果集窗口和 max_matches
  5. 基于滚动的分页
  6. 通过 HTTP JSON 进行分页
  7. 分页的最佳实践
  8. 限制和性能考虑
  9. 带分页的分面搜索
  10. 实际用例
  11. 结论

分页简介

在处理大型数据集时,一次性返回所有匹配结果是不切实际的。分页通过将结果分成可管理的块或“页面”来解决这个问题。Manticore Search 提供了多种分页方法,以适应不同的用例,从简单的列表导航到无限滚动界面。

在本文中,我们将探讨三种主要的分页方法:

  • 传统的基于偏移量的分页 ( 第 3 节 )
  • 使用 max_matches 的深度分页 ( 第 4 节 )
  • 基于滚动的分页,以高效浏览大型结果集 ( 第 5 节 )

我们还将讨论它们通过 SQL 和 HTTP JSON 接口的实现、性能考虑和实际应用。

设置测试环境

在深入分页示例之前,让我们使用 Manticore Load 工具创建一个带有示例数据的测试环境。该工具使生成和加载测试数据以进行基准测试和实验变得简单:

# Create a products table and populate it with 10,000 records containing "smartphone" in the title
manticore-load --quiet \
  --batch-size=1000 \
  --threads=4 \
  --total=10000 \
  --drop \
  --init="CREATE TABLE products(title text, description text, price int, category string, rating float, stock int)" \
  --load="INSERT INTO products(id, title, description, price, category, rating, stock) VALUES(<increment>, '<text/3/5> smartphone', '<text/20/50>', <int/10/1000>, '<string/5/10>', <float/1/5>, <int/0/100>)"

输出:
manticore-load output

有了这个测试环境,我们现在可以探索不同的分页方法。

传统的基于偏移量的分页

Manticore Search 中最简单的分页方法使用 LIMIT 子句与偏移量和计数参数。对于任何曾经使用 SQL 数据库的人来说,这种方法都很熟悉。

基本语法

SELECT ... FROM ... [LIMIT [offset,] row_count]
SELECT ... FROM ... [LIMIT row_count][ OFFSET offset]

示例

-- Return the first 5 results (page 1)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5;

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+
-- Return results 6-10 (page 2)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone            |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 |
+------+----------------------------------------------+-------+----------+
-- Alternative syntax with separate LIMIT and OFFSET
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5 OFFSET 10;
-- Results 11-15 (page 3)

输出:

+------+----------------------------------------+-------+----------+
| id   | title                                  | price | weight() |
+------+----------------------------------------+-------+----------+
| 7318 | Her spent at. smartphone               |    11 |     1272 |
| 8436 | His? felt work. Angry fast. smartphone |    11 |     1272 |
| 9699 | Feel spend funny. smartphone           |    11 |     1272 |
| 2326 | Quietly wet drew found. smartphone     |    12 |     1272 |
| 4171 | Set perhaps new send soon. smartphone  |    12 |     1272 |
+------+----------------------------------------+-------+----------+

这种方法简单易用,适合浏览初始结果页面。然而,由于 Manticore 内部处理查询的方式,对于深度分页来说效率较低。有关这些限制的详细信息,请参见 第 8 节:限制和性能考虑

结果集窗口和 max_matches

默认情况下,Manticore Search 将可以返回的匹配数限制为 1000。如果您尝试超出此限制进行分页,查询将导致错误。可以使用 max_matches 选项调整此限制。

理解 max_matches

max_matches 选项控制 Manticore 在搜索时将保留多少匹配项在 RAM 中。它旨在限制每个查询的内存使用。默认值为 1000,这对于大多数常见搜索场景来说是足够的,但在需要访问更深的结果页面时可以增加。

-- Increase max_matches to allow deeper pagination (results 1001-1005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 1000, 5 
OPTION max_matches=1500;

输出:

+------+-----------------------------------------+-------+
| id   | title                                   | price |
+------+-----------------------------------------+-------+
| 3550 | Open sold being late wide. smartphone   |   111 |
| 4896 | Grew. Wrote as hungry. smartphone       |   111 |
|  647 | Went seen had. smartphone               |   112 |
|  883 | Hold lose; have thought. smartphone     |   112 |
| 1774 | Get! began sick. Gone, wild. smartphone |   112 |
+------+-----------------------------------------+-------+
-- For very deep pagination (results 5000-5005)
SELECT id, title, price FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5000, 5 
OPTION max_matches=5500;

输出:

+------+------------------------------------------------+-------+
| id   | title                                          | price |
+------+------------------------------------------------+-------+
| 4894 | Subtly? inside for spend evening. smartphone   |   507 |
| 5203 | Bad; artistically from yearly. smartphone      |   507 |
| 7446 | Empty probably clever! universally. smartphone |   507 |
| 8053 | In! rich daily irregularly. smartphone         |   507 |
| 8055 | Short cold fall; keep physically. smartphone   |   507 |
+------+------------------------------------------------+-------+

请记住,增加 max_matches 会带来内存成本。每个匹配项都会消耗内存,因此将此值设置得过高可能会影响服务器性能,尤其是在负载较重的情况下。有关更节省内存的深度分页替代方法,请参见 基于滚动的分页

基于滚动的分页

基于滚动的分页提供了一种高效的方式来浏览大型结果集。与传统的基于偏移量的分页不同,它使用一个令牌来跟踪结果集中的当前位置,这有助于高效地浏览结果的顺序页面。

工作原理

  1. 执行带有排序标准的初始查询(必须在 ORDER BY 中包含 id
  2. 检索一个封装当前位置信息的滚动令牌
  3. 在后续查询中使用此令牌获取下一批结果

示例

-- Initial query with sorting criteria (must include id in ORDER BY)
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
-- Subsequent paginated query using the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MywidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo0NCwidHlwZSI6ImludCJ9XX0=';

输出:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+

基于滚动的分页特别适合在 Web 应用程序中实现“加载更多”或无限滚动功能。有关完整的实现示例,请参见 实际用例

通过 HTTP JSON 进行分页

除了 SQL,Manticore Search 还提供通过 HTTP JSON 进行分页。这对于 Web 应用程序或微服务架构特别有用。传统的基于偏移量的分页和基于滚动的分页都通过 JSON API 得到支持。

通过 HTTP JSON 的传统基于偏移量的分页

HTTP JSON API 使用 offsetlimit 参数来控制分页。这相当于 SQL 的 LIMIT offset, count 语法。

基本语法

{
  "table": "table_name",
  "query": { "match": { ... } },
  "limit": number_of_results,
  "offset": starting_position
}

示例

# First page (results 1-5)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 0,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}'|jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1179,
        "_score": 1272,
        "_source": {
          "title": "Weak spoke? fall. smartphone",
          "description": "Made down never long 3 angry are went asked closed! cold historically got take can to commonly emotionally; socially asked get tame frequently think nowhere? carefully? scientifically small.",
          "price": 10,
          "category": "hkadwpwwk",
          "rating": 2.300000,
          "stock": 63
        }
      },
      // ... more results ...
    ]
  }
}
# Second page (results 6-10)
curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "offset": 5,
  "sort": [
    { "_score": "desc" },
    { "price": "asc" },
    { "id": "asc" }
  ]
}' | jq .

输出:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 246,
        "_score": 1272,
        "_source": {
          "title": "Soft technically took. smartphone",
          "description": "This sometimes empty short normally! been send. Locally kind! quickly might infrequently culturally softly became tried quietly good gradually; inside dead loud.",
          "price": 11,
          "category": "wuiymrjdp",
          "rating": 1.600000,
          "stock": 87
        }
      },
      // ... more results ...
    ]
  }
}

替代语法

The HTTP JSON API also supports the alternative syntax using from and size instead of offset and limit:

{
  "table": "table_name",
  "query": { "match": { ... } },
  "size": number_of_results,
  "from": starting_position
}

This is equivalent to the previous format but follows the Elasticsearch convention, which may be more familiar to some users.

Scroll-Based Pagination via HTTP JSON

The HTTP JSON API also supports scroll-based pagination for more efficient deep pagination. This requires setting up proper sorting criteria and using the scroll token.

Initial Query

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ],
  "options": {
    "scroll": true
  }
}' | jq .

Output:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 1,
        "_score": 1272,
        "_source": {
          "title": "Incorrectly heavy soft. smartphone",
          "description": "Carefully late sit; put draw afraid; gave hardly been vaguely 1000 this at understood gave always east fake, dull internationally sent. At cold; hot.",
          "price": 540,
          "category": "ogqejby",
          "rating": 2.800000,
          "stock": 52
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
}

Note the scroll value in the response, which is a base64-encoded token representing your current position in the result set.

Subsequent Queries

curl -s 0:9308/search -d '{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 5,
  "track_scores": true,
  "options": {
    "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
  }
}' | jq .

Output:

{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 9995,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 6,
        "_score": 1272,
        "_source": {
          "title": "Negatively! it become. smartphone",
          "description": "Be quiet cold nightly poor hourly had thin got dry, informally work, 100 50 needed specifically serious do bring below began! rich loud.",
          "price": 602,
          "category": "hjpwt",
          "rating": 2.500000,
          "stock": 47
        }
      },
      // ... more results ...
    ]
  },
  "scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9XX0="
}

Each response includes a new scroll token for the next page, allowing for continued pagination.

Best Practices for HTTP JSON Pagination

  1. Include Sort Criteria: For predictable pagination results, always include explicit sorting, e.g.:
    "sort": [
      { "_score": "desc" },
      { "price": "asc" },
      { "id": "asc" }
    ]
    
  2. Handle Scroll Tokens Properly: Store the scroll token from each response and use it for the next request.
  3. Use Appropriate Error Handling: Check for error responses, especially when paginating deeply.
  4. Consider Response Size: Control the amount of data returned by selecting only the needed fields using the _source parameter:
    "_source": ["title", "price", "rating"]
    
  5. Monitor Performance: For very large result sets, monitor server performance and consider using scroll-based pagination.

The HTTP JSON pagination methods follow the same principles as their SQL counterparts (described in Sections 3 and 5 ), but with JSON syntax. This makes them ideal for modern web applications and REST APIs.

Sorting Best Practices for Pagination

Proper sorting is crucial for effective pagination, especially for scroll-based pagination. Here are the key considerations:

1. Include a Unique Identifier

For consistent pagination results, always include a unique identifier (typically the id field) in your ORDER BY clause. This ensures that even if multiple documents have the same value for other sorting criteria, the pagination will still produce consistent results.

-- Good: Includes id as the tie-breaker
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: No unique identifier, may lead to inconsistent pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC;

When using MATCH() in your query, always include weight() in your sorting criteria to ensure the most relevant results appear first:

-- Good: Includes weight() for relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Less optimal: Missing relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC, id ASC;

3. Maintain Sort Order Consistency

When using traditional offset-based pagination, always maintain the same sort order across all paginated queries. Changing the sort order between pages can lead to missed or duplicated results.

-- First page (correct)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10;

-- Second page (correct - same sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10, 10;

-- Second page (incorrect - different sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY rating DESC, id ASC LIMIT 10, 10;

4. Use Deterministic Sorting

For scroll-based pagination, always use deterministic sorting criteria. Non-deterministic functions like RAND() should be avoided as they will produce inconsistent results between queries.

-- Good: Deterministic sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;

-- Bad: Non-deterministic sorting, will break scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY RAND();

5. Sort Direction with ID

When using the id field as a tie-breaker in sorting, you can use either ASC or DESC order, but it must be consistently applied:

-- These are both valid for scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id DESC;

6. Optimizing Sort Performance

For better sorting performance, consider using the sort_method option:

-- For pre-sorted data (e.g., by id)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY id ASC 
LIMIT 5 OPTION sort_method=kbuffer;

-- For other sorting criteria
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5 OPTION sort_method=pq;

The sorting approach you choose can significantly impact pagination performance and consistency. For example, deterministic sorting is essential for scroll-based pagination, while including a unique identifier is crucial for consistent results in all pagination methods. For more details on how sorting affects performance, see Limitations and Performance Considerations .

Limitations and Performance Considerations

Understanding the limitations of different pagination methods will help you choose the right approach for your specific use case.

Offset-Based Pagination Limitations

  1. Performance Degrades with Large Offsets: As the offset value increases, query performance decreases. This is because Manticore still needs to process all records up to the offset value before returning the requested page.
  2. max_matches Constraint: By default, you can only paginate through the first 1000 results. To go beyond this, you must increase max_matches, which increases memory usage.
  3. Memory Usage: Larger max_matches values require more RAM per query, which can impact server performance under heavy load.
  4. Consistent Results Not Guaranteed: If documents are added or removed between page requests, you might see duplicates or miss results.

For deep pagination scenarios or implementing "Load More" functionality, consider using Scroll-Based Pagination as described in Section 5 .

Scroll-Based Pagination Limitations

  1. Requires ID in Sort Criteria: You must include the id field in your ORDER BY clause, which may not always align with your desired sorting logic.
  2. Token Management: You need to store and manage scroll tokens between requests, adding complexity to your application.
  3. No Consistency Guarantee: Scroll-based pagination does not guarantee point-in-time consistency. If documents are added, removed, or modified between requests, the results may still be affected.
  4. No Random Access: Unlike offset-based pagination, you cannot jump directly to a specific page; you must navigate sequentially.

Faceted Search with Pagination

Faceted search allows users to filter and navigate through search results using multiple dimensions. When combined with pagination, it's important to understand how pagination affects facet results.

-- Faceted search with pagination (first page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 0, 5
FACET category ORDER BY COUNT(*) DESC;

Output:

+------+----------------------------------------------+-------+----------+
| id   | title                                        | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                 |    10 |     1272 |
| 1388 | 3; her day of. smartphone                    |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone      |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone      |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone |    10 |     1272 |
+------+----------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+
-- Faceted search with pagination (second page)
SELECT id, title, price, weight()
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5
FACET category ORDER BY COUNT(*) DESC;

Output:

+------+-----------------------------------------------------------+-------+----------+
| id   | title                                                     | price | weight() |
+------+-----------------------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone                         |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone              |    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone                 |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone                        |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone              |    11 |     1272 |
+------+-----------------------------------------------------------+-------+----------+
+-----------+----------+
| category  | count(*) |
+-----------+----------+
| ogqejby   |        1 |
| unmfujgqr |        1 |
| ttefm     |        1 |
| hceihdy   |        1 |
| sicjr     |        1 |
| hjpwt     |        1 |
| tvfqyj    |        1 |
| mvdjhbexo |        1 |
| scayuo    |        1 |
| esmlh     |        1 |
| fvbhplj   |        1 |
| lcphmiqmv |        1 |
| lnjfhb    |        1 |
| qexfdulub |        1 |
| tbswa     |        1 |
| eekarf    |        1 |
| airjuod   |        1 |
| ozkbuvgj  |        1 |
| yafbhr    |        1 |
| duccr     |        1 |
+-----------+----------+

重要说明: Facet 结果是基于与查询匹配的整个结果集计算的,而不仅仅是分页部分。这意味着无论您查看哪个页面,facet 计数都代表所有匹配文档的分布。LIMIT 子句仅影响主结果集中返回的文档,而不影响 facet 计算。如上面的示例所示,两个页面的 facet 结果是相同的,确认它们是从整个结果集计算得出的。

Facet 结果中的分页

除了对主搜索结果进行分页外,Manticore Search 还允许您对 facet 结果本身进行分页。当处理大量 facet 值时,这尤其有用,因为显示所有值可能不切实际。

您可以在 FACET 语句中使用 LIMIT 子句来控制返回的 facet 值的数量。

-- Get only the first 5 categories ordered by count
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5;

输出:

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+

您可以为同一查询中的不同 facet 提供不同的分页参数:

-- Multiple facets with different pagination parameters
SELECT id, title FROM products 
WHERE MATCH('smartphone') 
LIMIT 3 
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5 
FACET category ORDER BY category ASC LIMIT 3,5 
FACET price ORDER BY FACET() ASC LIMIT 5;

输出:

+------+---------------------------------------------+
| id   | title                                       |
+------+---------------------------------------------+
|    1 | Incorrectly heavy soft. smartphone          |
|    2 | Know of make afraid foolish. smartphone     |
|    3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaaabzwfzn |        1 |
| aabswb     |        1 |
| aacla      |        1 |
| aaejmtubv  |        1 |
| aaethytj   |        1 |
+------------+----------+
+------------+----------+
| category   | count(*) |
+------------+----------+
| aaejmtubv  |        1 |
| aaethytj   |        1 |
| aaktjgaa   |        1 |
| aalfwcvwil |        1 |
| aaqmumofe  |        1 |
+------------+----------+
+-------+----------+
| price | count(*) |
+-------+----------+
|    10 |        5 |
|    11 |        8 |
|    12 |       10 |
|    13 |        9 |
|    14 |        8 |
+-------+----------+

此示例演示了几种 facet 分页技术:

  1. 第一个 facet 返回按计数排序的前 5 个类别(偏移量 0,限制 5)
  2. 第二个 facet 返回按字母顺序排列的类别,跳过前 3 个(偏移量 3,限制 5)
  3. 第三个 facet 返回按升序排列的前 5 个价格点

Facet 分页特别适用于:

  • 处理大量的 facet 值(例如,数千个产品类别)
  • 实现 facet 值的“显示更多”功能
  • 创建动态过滤界面,最受欢迎的过滤器优先显示
  • 通过限制返回的 facet 值数量来优化性能

注意: 与主结果分页不同,主结果分页在深偏移量时会影响性能,而 facet 分页通常即使在大偏移量时也很高效,因为 facet 计算发生在文档匹配阶段之后。

通过 HTTP JSON 进行 Facet 分页

在使用 HTTP JSON 接口时,facet 分页也是可用的。您可以使用聚合规范中的 size 参数来限制返回的 facet 值的数量:

{
  "table": "products",
  "query": { "match": { "*": "smartphone" } },
  "limit": 3,
  "aggs": {
    "category_counts": {
      "terms": {
        "field": "category",
        "size": 5
      }
    }
  }
}

此示例将类别 facet 结果限制为前 5 个值。

重要: 与 SQL facet 不同,HTTP JSON 接口目前仅支持使用 size 参数限制 facet 值的数量。它不支持在 facet 内进行分页的 OFFSET 参数。您只能指定返回多少结果,而不能指定跳过哪些结果。

有关通过 HTTP JSON 进行 facet 分页的更多详细信息,请参见 Manticore Search 文档

实际用例

带高亮的分页

-- Pagination with highlighting
SELECT id, title, price, weight(),
       HIGHLIGHT({limit=100}) as highlight
FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;

输出:

+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
| id   | title                                        | price | weight() | highlight                                           |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
|  246 | Soft technically took. smartphone            |    11 |     1272 | Soft technically took. <b>smartphone</b>            |
| 1105 | Dirty up are! temporarily called. smartphone |    11 |     1272 | Dirty up are! temporarily called. <b>smartphone</b> |
| 3293 | Instantly thick ran that hurt. smartphone    |    11 |     1272 | Instantly thick ran that hurt. <b>smartphone</b>    |
| 3736 | Work her wrong locally. smartphone           |    11 |     1272 | Work her wrong locally. <b>smartphone</b>           |
| 6978 | Stale loud passively sweet clean. smartphone |    11 |     1272 | Stale loud passively sweet clean. <b>smartphone</b> |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+

实现“加载更多”按钮与滚动分页

基于滚动的分页非常适合实现“加载更多”功能,因为它提供了高效的导航,适用于大型结果集。以下是如何实现它:

-- Initial query with SHOW SCROLL to get the scroll token
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
ORDER BY weight() DESC, price ASC, id ASC 
LIMIT 5; SHOW SCROLL\G

输出:

+------+------------------------------------------------+-------+----------+
| id   | title                                          | price | weight() |
+------+------------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone                   |    10 |     1272 |
| 1388 | 3; her day of. smartphone                      |    10 |     1272 |
| 1636 | Positively bad winter clean. smartphone        |    10 |     1272 |
| 5628 | Foolish from went heard low. smartphone        |    10 |     1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone   |    10 |     1272 |
+------+------------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
1 row in set (0.00 sec)
-- When user clicks "Load More", use the scroll token to get the next batch
SELECT id, title, price, weight() FROM products 
WHERE MATCH('smartphone') 
LIMIT 5 
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ=='; SHOW SCROLL\G

输出:

+------+---------------------------------------------+-------+----------+
| id   | title                                       | price | weight() |
+------+---------------------------------------------+-------+----------+
|  246 | Soft technically took. smartphone           |    11 |     1272 |
| 1105 | Dirty up are! temporarily called. smartphone|    11 |     1272 |
| 3293 | Instantly thick ran that hurt. smartphone   |    11 |     1272 |
| 3736 | Work her wrong locally. smartphone          |    11 |     1272 |
| 6978 | Stale loud passively sweet clean. smartphone|    11 |     1272 |
+------+---------------------------------------------+-------+----------+

*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMSwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo2OTc4LCJ0eXBlIjoiaW50In1dfQ==

对于每次后续的“加载更多”点击,您将使用上一个查询返回的新滚动令牌。这种方法相较于传统的基于偏移量的分页有几个优点:

  1. 更好的性能: 深分页时没有性能下降
  2. 没有 max_matches 限制: 您可以高效地分页浏览数百万个结果
  3. 顺序访问: 高效地浏览连续的结果页面

实施提示:

  • 将滚动令牌存储在应用程序的状态中(客户端或服务器端)
  • 当用户点击“加载更多”时,在每个后续请求中包含该令牌
  • 始终使用最新的令牌,因为每个令牌代表结果集中的当前位置
  • 请记住,滚动令牌是临时的,最终会过期

这种实施方法解决了传统基于偏移量的分页在 第 8 节 中描述的许多限制,特别是在深分页场景中。

结论

Manticore Search 提供灵活的分页选项,以适应不同的用例。传统的基于偏移量的分页适用于具有有限页面导航的简单界面,而基于滚动的分页在深分页场景(如无限滚动或数据导出)中提供更好的性能。

在 Manticore Search 中实现分页时:

  1. 对于浅分页(前几页):使用传统的基于偏移量的分页,并保持一致的排序。
  2. 对于深分页(许多页面或大型数据集):使用基于滚动的分页,并使用适当的基于 ID 的排序标准。
  3. 对于一致的结果:始终在您的 ORDER BY 子句中包含 id 字段,并在查询中保持一致的排序标准。
  4. 对于全文搜索:在您的排序标准中包含 weight()_score,以确保最相关的结果优先出现。

通过了解每种方法的优点、限制和最佳实践,您可以为特定应用需求实施最有效的分页策略。请记住,最佳的分页方法取决于您的用例、数据量和用户体验要求。

安装Manticore Search

安装Manticore Search