分页是任何搜索引擎的重要功能,它允许用户高效地浏览大量结果集。Manticore Search 提供了多种强大的分页方法,每种方法都有其自身的优缺点。本文将探讨 Manticore Search 中可用的不同分页选项,并通过实际的 SQL 示例帮助您为您的使用场景实现最佳解决方案。
目录
分页简介
在处理大型数据集时,一次性返回所有匹配结果是不切实际的。分页通过将结果划分为可管理的块或“页面”来解决这个问题。Manticore Search 提供了多种分页方法,以适应不同的使用场景,从简单的列表导航到无限滚动界面。
在本文中,我们将探讨三种主要的分页方法:
我们还将讨论它们通过 SQL 和 HTTP JSON 接口的实现方式,性能考虑因素以及实际应用。
设置测试环境
在深入分页示例之前,让我们使用 Manticore Load 工具创建一个带有示例数据的测试环境。该工具使生成和加载测试数据以进行基准测试和实验变得容易:
# Create a products table and populate it with 10,000 records containing "smartphone" in the title
manticore-load --quiet \
--batch-size=1000 \
--threads=4 \
--total=10000 \
--drop \
--init="CREATE TABLE products(title text, description text, price int, category string, rating float, stock int)" \
--load="INSERT INTO products(id, title, description, price, category, rating, stock) VALUES(<increment>, '<text/3/5> smartphone', '<text/20/50>', <int/10/1000>, '<string/5/10>', <float/1/5>, <int/0/100>)"
输出:
有了这个测试环境,我们现在可以探索不同的分页方法。
传统的基于偏移的分页
Manticore Search 中最直接的分页方法使用 LIMIT 子句与偏移量和计数参数。这种方法对于任何使用过 SQL 数据库的人都很熟悉。
基本语法
SELECT ... FROM ... [LIMIT [offset,] row_count]
SELECT ... FROM ... [LIMIT row_count][ OFFSET offset]
示例
-- Return the first 5 results (page 1)
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5;
输出:
+------+----------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone | 10 | 1272 |
| 1388 | 3; her day of. smartphone | 10 | 1272 |
| 1636 | Positively bad winter clean. smartphone | 10 | 1272 |
| 5628 | Foolish from went heard low. smartphone | 10 | 1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone | 10 | 1272 |
+------+----------------------------------------------+-------+----------+
-- Return results 6-10 (page 2)
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;
输出:
+------+----------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------------+-------+----------+
| 246 | Soft technically took. smartphone | 11 | 1272 |
| 1105 | Dirty up are! temporarily called. smartphone | 11 | 1272 |
| 3293 | Instantly thick ran that hurt. smartphone | 11 | 1272 |
| 3736 | Work her wrong locally. smartphone | 11 | 1272 |
| 6978 | Stale loud passively sweet clean. smartphone | 11 | 1272 |
+------+----------------------------------------------+-------+----------+
-- Alternative syntax with separate LIMIT and OFFSET
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5 OFFSET 10;
-- Results 11-15 (page 3)
输出:
+------+----------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------+-------+----------+
| 7318 | Her spent at. smartphone | 11 | 1272 |
| 8436 | His? felt work. Angry fast. smartphone | 11 | 1272 |
| 9699 | Feel spend funny. smartphone | 11 | 1272 |
| 2326 | Quietly wet drew found. smartphone | 12 | 1272 |
| 4171 | Set perhaps new send soon. smartphone | 12 | 1272 |
+------+----------------------------------------+-------+----------+
此方法易于实现,适合浏览结果的前几页。然而,由于 Manticore 内部处理查询的方式,它在深度分页时效率较低。有关这些限制的详细信息,请参见 第8节:限制和性能考虑 。
结果集窗口和 max_matches
默认情况下,Manticore Search 限制返回的匹配项数量为 1000。如果您尝试分页超过此限制,查询将导致错误。可以使用 max_matches 选项调整此限制。
理解 max_matches
max_matches 选项控制 Manticore 在搜索时在 RAM 中保留的匹配项数量。它旨在限制每个查询的内存使用。默认值为 1000,对于大多数常见搜索场景来说已经足够,但当您需要访问更深层的结果页面时,可以增加此值。
-- Increase max_matches to allow deeper pagination (results 1001-1005)
SELECT id, title, price FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 1000, 5
OPTION max_matches=1500;
输出:
+------+-----------------------------------------+-------+
| id | title | price |
+------+-----------------------------------------+-------+
| 3550 | Open sold being late wide. smartphone | 111 |
| 4896 | Grew. Wrote as hungry. smartphone | 111 |
| 647 | Went seen had. smartphone | 112 |
| 883 | Hold lose; have thought. smartphone | 112 |
| 1774 | Get! began sick. Gone, wild. smartphone | 112 |
+------+-----------------------------------------+-------+
-- For very deep pagination (results 5000-5005)
SELECT id, title, price FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5000, 5
OPTION max_matches=5500;
输出:
+------+------------------------------------------------+-------+
| id | title | price |
+------+------------------------------------------------+-------+
| 4894 | Subtly? inside for spend evening. smartphone | 507 |
| 5203 | Bad; artistically from yearly. smartphone | 507 |
| 7446 | Empty probably clever! universally. smartphone | 507 |
| 8053 | In! rich daily irregularly. smartphone | 507 |
| 8055 | Short cold fall; keep physically. smartphone | 507 |
+------+------------------------------------------------+-------+
请注意,增加 max_matches 会带来内存成本。每个匹配项都会消耗内存,因此将此值设置得过高可能会影响服务器性能,尤其是在高负载情况下。对于一种更节省内存的深度分页替代方法,请参见
基于滚动的分页
。
基于滚动的分页
基于滚动的分页提供了一种高效浏览大型结果集的方法。与传统的基于偏移的分页不同,它使用一个标记来跟踪结果集中的当前位置,这有助于高效地浏览结果的连续页面。
工作原理
- 执行一个带有排序条件的初始查询(必须在 ORDER BY 中包含
id) - 获取一个封装当前位置的滚动标记
- 在后续查询中使用此标记来获取下一批结果
示例
-- Initial query with sorting criteria (must include id in ORDER BY)
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5; SHOW SCROLL\G
输出:
+------+----------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone | 10 | 1272 |
| 1388 | 3; her day of. smartphone | 10 | 1272 |
| 1636 | Positively bad winter clean. smartphone | 10 | 1272 |
| 5628 | Foolish from went heard low. smartphone | 10 | 1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone | 10 | 1272 |
+------+----------------------------------------------+-------+----------+
*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
-- Subsequent paginated query using the scroll token
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MywidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo0NCwidHlwZSI6ImludCJ9XX0=';
输出:
+------+----------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone | 10 | 1272 |
| 1388 | 3; her day of. smartphone | 10 | 1272 |
| 1636 | Positively bad winter clean. smartphone | 10 | 1272 |
| 5628 | Foolish from went heard low. smartphone | 10 | 1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone | 10 | 1272 |
+------+----------------------------------------------+-------+----------+
基于滚动的分页特别适合在 Web 应用程序中实现“加载更多”或无限滚动功能。有关完整实现示例,请参见 实际使用案例 。
通过 HTTP JSON 进行分页
除了 SQL,Manticore Search 还提供通过 HTTP JSON 进行分页。这对于 Web 应用程序或微服务架构特别有用。传统的基于偏移的分页和基于滚动的分页都通过 JSON API 支持。
通过 HTTP JSON 的传统基于偏移的分页
HTTP JSON API 使用 offset 和 limit 参数来控制分页。这等同于 SQL 的 LIMIT offset, count 语法。
基本语法
{
"table": "table_name",
"query": { "match": { ... } },
"limit": number_of_results,
"offset": starting_position
}
示例
# First page (results 1-5)
curl -s 0:9308/search -d '{
"table": "products",
"query": { "match": { "*": "smartphone" } },
"limit": 5,
"offset": 0,
"sort": [
{ "_score": "desc" },
{ "price": "asc" },
{ "id": "asc" }
]
}'|jq .
输出:
{
"took": 0,
"timed_out": false,
"hits": {
"total": 10000,
"total_relation": "eq",
"hits": [
{
"_id": 1179,
"_score": 1272,
"_source": {
"title": "Weak spoke? fall. smartphone",
"description": "Made down never long 3 angry are went asked closed! cold historically got take can to commonly emotionally; socially asked get tame frequently think nowhere? carefully? scientifically small.",
"price": 10,
"category": "hkadwpwwk",
"rating": 2.300000,
"stock": 63
}
},
// ... more results ...
]
}
}
# Second page (results 6-10)
curl -s 0:9308/search -d '{
"table": "products",
"query": { "match": { "*": "smartphone" } },
"limit": 5,
"offset": 5,
"sort": [
{ "_score": "desc" },
{ "price": "asc" },
{ "id": "asc" }
]
}' | jq .
输出:
{
"took": 0,
"timed_out": false,
"hits": {
"total": 10000,
"total_relation": "eq",
"hits": [
{
"_id": 246,
"_score": 1272,
"_source": {
"title": "Soft technically took. smartphone",
"description": "This sometimes empty short normally! been send. Locally kind! quickly might infrequently culturally softly became tried quietly good gradually; inside dead loud.",
"price": 11,
"category": "wuiymrjdp",
"rating": 1.600000,
"stock": 87
}
},
// ... more results ...
]
}
}
替代语法
HTTP JSON API 还支持使用 from 和 size 代替 offset 和 limit 的替代语法:
{
"table": "table_name",
"query": { "match": { ... } },
"size": number_of_results,
"from": starting_position
}
这与之前的格式等效,但遵循 Elasticsearch 的约定,对某些用户可能更熟悉。
通过 HTTP JSON 进行滚动分页
HTTP JSON API 还支持滚动分页以实现更高效的深度分页。这需要设置适当的排序条件并使用滚动令牌。
初始查询
curl -s 0:9308/search -d '{
"table": "products",
"query": { "match": { "*": "smartphone" } },
"limit": 5,
"track_scores": true,
"sort": [
{ "_score": "desc" },
{ "id": "asc" }
],
"options": {
"scroll": true
}
}' | jq .
输出:
{
"took": 0,
"timed_out": false,
"hits": {
"total": 10000,
"total_relation": "eq",
"hits": [
{
"_id": 1,
"_score": 1272,
"_source": {
"title": "Incorrectly heavy soft. smartphone",
"description": "Carefully late sit; put draw afraid; gave hardly been vaguely 1000 this at understood gave always east fake, dull internationally sent. At cold; hot.",
"price": 540,
"category": "ogqejby",
"rating": 2.800000,
"stock": 52
}
},
// ... more results ...
]
},
"scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
}
注意响应中的 scroll 值,这是一个 base64 编码的令牌,表示您在结果集中的当前位置。
后续查询
curl -s 0:9308/search -d '{
"table": "products",
"query": { "match": { "*": "smartphone" } },
"limit": 5,
"track_scores": true,
"options": {
"scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo1LCJ0eXBlIjoiaW50In1dfQ=="
}
}' | jq .
输出:
{
"took": 0,
"timed_out": false,
"hits": {
"total": 9995,
"total_relation": "eq",
"hits": [
{
"_id": 6,
"_score": 1272,
"_source": {
"title": "Negatively! it become. smartphone",
"description": "Be quiet cold nightly poor hourly had thin got dry, informally work, 100 50 needed specifically serious do bring below began! rich loud.",
"price": 602,
"category": "hjpwt",
"rating": 2.500000,
"stock": 47
}
},
// ... more results ...
]
},
"scroll": "eyJvcmRlcl9ieV9zdHIiOiJAd2VpZ2h0IGRlc2MsIGlkIGFzYyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9XX0="
}
每个响应都包含一个用于下一页的新滚动令牌,允许继续分页。
HTTP JSON 分页的最佳实践
- 包含排序条件:为了获得可预测的分页结果,始终包含显式的排序,例如:
"sort": [ { "_score": "desc" }, { "price": "asc" }, { "id": "asc" } ] - 正确处理滚动令牌:存储每个响应中的滚动令牌,并在下一个请求中使用它。
- 使用适当的错误处理:检查错误响应,尤其是在深度分页时。
- 考虑响应大小:通过使用
_source参数仅选择所需的字段来控制返回的数据量:"_source": ["title", "price", "rating"] - 监控性能:对于非常大的结果集,请监控服务器性能并考虑使用基于滚动的分页。
HTTP JSON 分页方法遵循与 SQL 对应方法相同的原则(在 第3节 和 第5节 中描述),但使用 JSON 语法。这使它们非常适合现代 Web 应用程序和 REST API。
分页的排序最佳实践
正确的排序对于有效的分页至关重要,尤其是基于滚动的分页。以下是关键考虑因素:
1. 包含唯一标识符
为了获得一致的分页结果,始终在 ORDER BY 子句中包含唯一标识符(通常是 id 字段)。这确保了即使多个文档在其他排序条件上具有相同的值,分页仍会产生一致的结果。
-- Good: Includes id as the tie-breaker
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
-- Bad: No unique identifier, may lead to inconsistent pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC;
2. 包含 weight() 用于全文搜索
当在查询中使用 MATCH() 时,始终在排序条件中包含 weight(),以确保最相关的结果首先出现:
-- Good: Includes weight() for relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
-- Less optimal: Missing relevance sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY price ASC, id ASC;
3. 保持排序顺序一致性
当使用传统的基于偏移的分页时,始终在所有分页查询中保持相同的排序顺序。在页面之间更改排序顺序可能导致结果丢失或重复。
-- First page (correct)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10;
-- Second page (correct - same sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC LIMIT 10, 10;
-- Second page (incorrect - different sort order)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY rating DESC, id ASC LIMIT 10, 10;
4. 使用确定性排序
对于基于滚动的分页,始终使用确定性排序条件。应避免使用非确定性函数(如 RAND()),因为它们会在查询之间产生不一致的结果。
-- Good: Deterministic sorting
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
-- Bad: Non-deterministic sorting, will break scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY RAND();
5. 使用 ID 进行排序方向
当使用 id 字段作为排序的平局处理时,您可以使用 ASC 或 DESC 顺序,但必须一致应用:
-- These are both valid for scroll pagination
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC;
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id DESC;
6. 优化排序性能
为了提高排序性能,请考虑使用 sort_method 选项:
-- For pre-sorted data (e.g., by id)
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY id ASC
LIMIT 5 OPTION sort_method=kbuffer;
-- For other sorting criteria
SELECT * FROM products WHERE MATCH('smartphone') ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5 OPTION sort_method=pq;
您选择的排序方法会显著影响分页的性能和一致性。例如,确定性排序对于基于滚动的分页至关重要,而包含唯一标识符对于所有分页方法的一致结果至关重要。有关排序如何影响性能的更多详细信息,请参阅 限制和性能考虑 。
限制和性能考虑
了解不同分页方法的限制将帮助您为特定用例选择合适的方法。
基于偏移的分页限制
- 大型偏移量性能下降:随着偏移量值的增加,查询性能会下降。这是因为 Manticore 仍需要处理所有记录直到偏移量值,然后才能返回请求的页面。
- max_matches 限制:默认情况下,您只能分页浏览前 1000 个结果。要超出此限制,必须增加
max_matches,这会增加内存使用。 - 内存使用:较大的
max_matches值需要每个查询更多的 RAM,这可能在高负载下影响服务器性能。 - 结果一致性无法保证:如果在页面请求之间添加或删除文档,可能会看到重复或遗漏的结果。
对于深度分页场景或实现“加载更多”功能,请考虑使用 基于滚动的分页 ,如 第5节 中所述。
基于滚动的分页限制
- 需要在排序条件中包含 ID:您必须在 ORDER BY 子句中包含
id字段,这可能与您期望的排序逻辑不一致。 - 令牌管理:您需要在请求之间存储和管理滚动令牌,这会增加应用程序的复杂性。
- 无一致性保证:基于滚动的分页不保证时间点一致性。如果在请求之间添加、删除或修改文档,结果仍可能受到影响。
- 无随机访问:与基于偏移的分页不同,您无法直接跳转到特定页面;必须顺序导航。
带分页的面向方面的搜索
面向方面的搜索允许用户通过多个维度过滤和导航搜索结果。当与分页结合使用时,了解分页如何影响面向方面的结果非常重要。
-- Faceted search with pagination (first page)
SELECT id, title, price, weight()
FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 0, 5
FACET category ORDER BY COUNT(*) DESC;
输出:
+------+----------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+----------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone | 10 | 1272 |
| 1388 | 3; her day of. smartphone | 10 | 1272 |
| 1636 | Positively bad winter clean. smartphone | 10 | 1272 |
| 5628 | Foolish from went heard low. smartphone | 10 | 1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone | 10 | 1272 |
+------+----------------------------------------------+-------+----------+
+-----------+----------+
| category | count(*) |
+-----------+----------+
| ogqejby | 1 |
| unmfujgqr | 1 |
| ttefm | 1 |
| hceihdy | 1 |
| sicjr | 1 |
| hjpwt | 1 |
| tvfqyj | 1 |
| mvdjhbexo | 1 |
| scayuo | 1 |
| esmlh | 1 |
| fvbhplj | 1 |
| lcphmiqmv | 1 |
| lnjfhb | 1 |
| qexfdulub | 1 |
| tbswa | 1 |
| eekarf | 1 |
| airjuod | 1 |
| ozkbuvgj | 1 |
| yafbhr | 1 |
| duccr | 1 |
+-----------+----------+
-- Faceted search with pagination (second page)
SELECT id, title, price, weight()
FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5
FACET category ORDER BY COUNT(*) DESC;
输出:
+------+-----------------------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+-----------------------------------------------------------+-------+----------+
| 246 | Soft technically took. smartphone | 11 | 1272 |
| 1105 | Dirty up are! temporarily called. smartphone | 11 | 1272 |
| 3293 | Instantly thick ran that hurt. smartphone | 11 | 1272 |
| 3736 | Work her wrong locally. smartphone | 11 | 1272 |
| 6978 | Stale loud passively sweet clean. smartphone | 11 | 1272 |
+------+-----------------------------------------------------------+-------+----------+
+-----------+----------+
| category | count(*) |
+-----------+----------+
| ogqejby | 1 |
| unmfujgqr | 1 |
| ttefm | 1 |
| hceihdy | 1 |
| sicjr | 1 |
| hjpwt | 1 |
| tvfqyj | 1 |
| mvdjhbexo | 1 |
| scayuo | 1 |
| esmlh | 1 |
| fvbhplj | 1 |
| lcphmiqmv | 1 |
| lnjfhb | 1 |
| qexfdulub | 1 |
| tbswa | 1 |
| eekarf | 1 |
| airjuod | 1 |
| ozkbuvgj | 1 |
| yafbhr | 1 |
| duccr | 1 |
+-----------+----------+
重要说明: 分面结果是基于整个匹配查询结果集计算的,而不是仅基于分页部分。这意味着无论您查看的是哪一页,分面计数都代表所有匹配文档的分布情况。LIMIT 子句仅影响主结果集中返回的文档,不影响分面计算。如上例所示,两页的分面结果相同,确认了它们是基于整个结果集计算的。
分面结果中的分页
除了对主搜索结果进行分页外,Manticore Search 还允许对分面结果本身进行分页。这在处理大量分面值时特别有用,因为显示所有分面值可能不切实际。
您可以在 FACET 语句中使用 LIMIT 子句来控制返回的分面值数量。
-- Get only the first 5 categories ordered by count
SELECT id, title FROM products
WHERE MATCH('smartphone')
LIMIT 3
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5;
输出:
+------+---------------------------------------------+
| id | title |
+------+---------------------------------------------+
| 1 | Incorrectly heavy soft. smartphone |
| 2 | Know of make afraid foolish. smartphone |
| 3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category | count(*) |
+------------+----------+
| aaaabzwfzn | 1 |
| aabswb | 1 |
| aacla | 1 |
| aaejmtubv | 1 |
| aaethytj | 1 |
+------------+----------+
您可以在同一查询中为不同的分面提供不同的分页参数:
-- Multiple facets with different pagination parameters
SELECT id, title FROM products
WHERE MATCH('smartphone')
LIMIT 3
FACET category ORDER BY COUNT(*) DESC, category asc LIMIT 0,5
FACET category ORDER BY category ASC LIMIT 3,5
FACET price ORDER BY FACET() ASC LIMIT 5;
输出:
+------+---------------------------------------------+
| id | title |
+------+---------------------------------------------+
| 1 | Incorrectly heavy soft. smartphone |
| 2 | Know of make afraid foolish. smartphone |
| 3 | Poor spring evening drove young. smartphone |
+------+---------------------------------------------+
+------------+----------+
| category | count(*) |
+------------+----------+
| aaaabzwfzn | 1 |
| aabswb | 1 |
| aacla | 1 |
| aaejmtubv | 1 |
| aaethytj | 1 |
+------------+----------+
+------------+----------+
| category | count(*) |
+------------+----------+
| aaejmtubv | 1 |
| aaethytj | 1 |
| aaktjgaa | 1 |
| aalfwcvwil | 1 |
| aaqmumofe | 1 |
+------------+----------+
+-------+----------+
| price | count(*) |
+-------+----------+
| 10 | 5 |
| 11 | 8 |
| 12 | 10 |
| 13 | 9 |
| 14 | 8 |
+-------+----------+
此示例演示了多种分面分页技术:
- 第一个分面返回按计数排序的前 5 个类别(偏移量 0,限制 5)
- 第二个分面按字母顺序返回类别,跳过前 3 个(偏移量 3,限制 5)
- 第三个分面返回前 5 个价格点,按升序排列
分面分页特别适用于:
- 处理大量分面值(例如数千个产品类别)
- 为分面值实现“显示更多”功能
- 创建动态过滤界面,首先显示最受欢迎的过滤器
- 通过限制返回的分面值数量进行性能优化
注意: 与主结果分页不同,主结果分页在深度偏移时会影响性能,而分面分页即使在大型偏移时也通常效率较高,因为分面计算发生在文档匹配阶段之后。
通过 HTTP JSON 进行分面分页
当使用 HTTP JSON 接口时,也可以使用分面分页。您可以通过聚合规范中的 size 参数来限制返回的分面值数量:
{
"table": "products",
"query": { "match": { "*": "smartphone" } },
"limit": 3,
"aggs": {
"category_counts": {
"terms": {
"field": "category",
"size": 5
}
}
}
}
此示例将类别分面结果限制为前 5 个值。
重要: 与 SQL 分面不同,HTTP JSON 接口目前仅支持使用 size 参数限制分面值数量,不支持分面内的 OFFSET 参数进行分页。您只能指定返回多少结果,而不能指定跳过哪些结果。
有关通过 HTTP JSON 进行分面分页的更多详细信息,请参阅 Manticore Search 文档 。
实际使用案例
带高亮的分页
-- Pagination with highlighting
SELECT id, title, price, weight(),
HIGHLIGHT({limit=100}) as highlight
FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5, 5;
输出:
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
| id | title | price | weight() | highlight |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
| 246 | Soft technically took. smartphone | 11 | 1272 | Soft technically took. <b>smartphone</b> |
| 1105 | Dirty up are! temporarily called. smartphone | 11 | 1272 | Dirty up are! temporarily called. <b>smartphone</b> |
| 3293 | Instantly thick ran that hurt. smartphone | 11 | 1272 | Instantly thick ran that hurt. <b>smartphone</b> |
| 3736 | Work her wrong locally. smartphone | 11 | 1272 | Work her wrong locally. <b>smartphone</b> |
| 6978 | Stale loud passively sweet clean. smartphone | 11 | 1272 | Stale loud passively sweet clean. <b>smartphone</b> |
+------+----------------------------------------------+-------+----------+-----------------------------------------------------+
使用滚动分页实现“加载更多”按钮
基于滚动的分页非常适合实现“加载更多”功能,因为它可以高效地导航大型结果集。以下是实现方法:
-- Initial query with SHOW SCROLL to get the scroll token
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
ORDER BY weight() DESC, price ASC, id ASC
LIMIT 5; SHOW SCROLL\G
输出:
+------+------------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+------------------------------------------------+-------+----------+
| 1179 | Weak spoke? fall. smartphone | 10 | 1272 |
| 1388 | 3; her day of. smartphone | 10 | 1272 |
| 1636 | Positively bad winter clean. smartphone | 10 | 1272 |
| 5628 | Foolish from went heard low. smartphone | 10 | 1272 |
| 8561 | Left sent. Infrequently, try lay. smartphone | 10 | 1272 |
+------+------------------------------------------------+-------+----------+
*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ==
1 row in set (0.00 sec)
-- When user clicks "Load More", use the scroll token to get the next batch
SELECT id, title, price, weight() FROM products
WHERE MATCH('smartphone')
LIMIT 5
OPTION scroll='eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMCwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo4NTYxLCJ0eXBlIjoiaW50In1dfQ=='; SHOW SCROLL\G
输出:
+------+---------------------------------------------+-------+----------+
| id | title | price | weight() |
+------+---------------------------------------------+-------+----------+
| 246 | Soft technically took. smartphone | 11 | 1272 |
| 1105 | Dirty up are! temporarily called. smartphone| 11 | 1272 |
| 3293 | Instantly thick ran that hurt. smartphone | 11 | 1272 |
| 3736 | Work her wrong locally. smartphone | 11 | 1272 |
| 6978 | Stale loud passively sweet clean. smartphone| 11 | 1272 |
+------+---------------------------------------------+-------+----------+
*************************** 1. row ***************************
scroll_token: eyJvcmRlcl9ieV9zdHIiOiJ3ZWlnaHQoKSBERVNDLCBwcmljZSBBU0MsIGlkIEFTQyIsIm9yZGVyX2J5IjpbeyJhdHRyIjoid2VpZ2h0KCkiLCJkZXNjIjp0cnVlLCJ2YWx1ZSI6MTI3MiwidHlwZSI6ImludCJ9LHsiYXR0ciI6InByaWNlIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjoxMSwidHlwZSI6ImludCJ9LHsiYXR0ciI6ImlkIiwiZGVzYyI6ZmFsc2UsInZhbHVlIjo2OTc4LCJ0eXBlIjoiaW50In1dfQ==
每次点击“加载更多”时,您将使用前一个查询返回的新滚动令牌。这种方法相比传统的基于偏移的分页有以下优势:
- 性能更好:深度分页时不会出现性能下降
- 无最大匹配限制:可以高效地分页数百万个结果
- 顺序访问:高效地导航连续的结果页
实现提示:
- 在应用程序的状态中存储滚动令牌(客户端或服务器端)
- 当用户点击“加载更多”时,在后续请求中包含该令牌
- 始终使用最新的令牌,因为每个令牌代表结果集中的当前位置
- 请记住滚动令牌是临时的,最终会过期
此实现方法解决了传统基于偏移分页在 第 8 节 中描述的许多限制,特别是针对深度分页场景。
结论
Manticore Search 提供了灵活的分页选项,以满足不同的使用场景。传统的基于偏移的分页适用于简单界面和有限的页面导航,而基于滚动的分页则在深度分页场景(如无限滚动或数据导出)中提供更好的性能。
在 Manticore Search 中实现分页时:
- 对于浅层分页(前几页):使用传统的基于偏移的分页,并保持一致的排序。
- 对于深层分页(多页或大型数据集):使用基于滚动的分页,并使用适当的基于 ID 的排序标准。
- 对于一致的结果:始终在 ORDER BY 子句中包含
id字段,并在查询中保持一致的排序标准。 - 对于全文搜索:在排序标准中包含
weight()或_score,以确保最相关的结果首先出现。
通过了解每种方法的优势、限制和最佳实践,您可以为特定的应用需求实现最有效的分页策略。请记住,最佳的分页方法取决于您的使用场景、数据量和用户体验要求。
