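We are excited to share a new feature that makes building semantic search applications as simple as writing SQL: Auto Embeddings.
With this addition, Manticore Search handles embedding generation for you: no extra pipelines, no separate embedding service to run, no hassle.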
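The challenge before
Until now, semantic search usually meant dealing with:
- Setting up a separate machine learning pipeline for embedding generation
- Managing models and their dependencies
- Keeping your application, embedding service, and search engine in sync
- Handling vector dimension mismatches and preprocessing
- Ensuring embeddings are always generated the same way
With Auto Embeddings, that overhead goes away.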
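What are Auto Embeddings?
With Auto Embeddings, you just insert text. Manticore automatically:
✨ Generates embeddings using state-of-the-art models
✨ Stores them efficiently in a vector index
✨ Lets you query in natural language
✨ Hides the complexity so you can focus on features rather than infrastructure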
How it works
Build a semantic search application in 3 steps:
1. Create the table (SQL example)
CREATE TABLE products (
title TEXT,
description TEXT,
category STRING,
price INT,
vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
FROM='title,description'
);
One line of configuration: the MODEL_NAME and FROM options tell Manticore to generate embeddings from title and description.
2. Insert data (SQL example)
INSERT INTO products(id, title, description, category, price) VALUES
(1, 'green hiking backpack', 'Lightweight backpack suitable for hiking trails', 'outdoors', 5999),
(2, 'laptop sleeve', 'Slim padded case for 15-inch laptops', 'electronics', 1999),
(3, 'travel daypack', 'Compact daypack perfect for light travel or hiking', 'luggage', 3999),
(4, 'black laptop backpack', 'Spacious backpack with padded laptop compartment', 'electronics', 6900),
(5, 'mountain hiking bag', 'Durable trail-ready backpack for mountain hikes', 'outdoors', 8950),
(6, 'everyday backpack', 'Versatile backpack for work, gym and school', 'general', 4900),
(7, 'trail running shoes', 'Lightweight shoes with great grip for trails', 'footwear', 7500),
(8, 'camping gear set', 'Complete set for weekend camping adventures', 'outdoors', 12000),
(9, 'outdoor laptop pack', 'Trail-optimized backpack with laptop sleeve', 'outdoors', 7800),
(10, 'compact hiking backpack', 'Light and foldable backpack for trail hikes', 'outdoors', 4200),
(11, 'portable solar charger', 'Foldable solar panel charger for phones and USB devices', 'electronics', 3400),
(12, 'reusable water bottle', 'Insulated stainless steel bottle keeps drinks cold or hot', 'lifestyle', 2500),
(13, 'noise-cancelling headphones', 'Over-ear headphones with noise cancellation', 'electronics', 13900),
(14, 'organic trail mix', 'Healthy mix of nuts and dried fruit, ideal for hikes', 'food', 899),
(15, 'wireless mouse', 'Compact wireless mouse for laptops and desktops', 'electronics', 1599),
(16, 'office chair', 'Ergonomic office chair with lumbar support and mesh back', 'furniture', 27900),
(17, 'notebook and pen set', 'Elegant A5 notebook with smooth-writing pen', 'stationery', 1200),
(18, 'children\'s adventure book', 'Illustrated storybook about outdoor exploration', 'books', 1299),
(19, 'mini drone', 'Lightweight drone with HD camera and remote control', 'gadgets', 4599),
(20, 'wooden puzzle box', 'Challenging mechanical puzzle made of natural wood', 'toys', 1899);
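This varied dataset covers outdoor gear, electronics, furniture, books, toys, and more. Note that the INSERT supplies no vectors: all embeddings are generated automatically from the text.
Note: prices are in cents (for example, 5999 = $59.99).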
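Because prices are stored as integer cents, any code that displays them needs one conversion step. A minimal Python sketch of that step follows; `format_price` is a hypothetical helper name used for illustration and is not part of Manticore.

```python
def format_price(cents: int) -> str:
    """Render a non-negative integer price in cents as a dollar string."""
    if cents < 0:
        raise ValueError("this sketch only handles non-negative prices")
    dollars, remainder = divmod(cents, 100)
    # Thousands separator for dollars, always two digits for cents.
    return f"${dollars:,}.{remainder:02d}"


print(format_price(5999))    # $59.99
print(format_price(159900))  # $1,599.00
```

Keeping the stored value as an integer avoids floating-point rounding in filters and sorting; formatting happens only at the display edge.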
3. Search in natural language (SQL example)
SELECT id, title, description, price, knn_dist()
FROM products
WHERE knn(vector, 5, 'lightweight laptop backpack for trail hiking')
LIMIT 5;
Result:
+------+-------------------------+--------------------------------------------------+-------+------------+
| id | title | description | price | knn_dist() |
+------+-------------------------+--------------------------------------------------+-------+------------+
| 9 | outdoor laptop pack | Trail-optimized backpack with laptop sleeve | 7800 | 0.35392243 |
| 1 | green hiking backpack | Lightweight backpack suitable for hiking trails | 5999 | 0.53113687 |
| 5 | mountain hiking bag | Durable trail-ready backpack for mountain hikes | 8950 | 0.62034285 |
| 4 | black laptop backpack | Spacious backpack with padded laptop compartment | 6900 | 0.65785009 |
| 10 | compact hiking backpack | Light and foldable backpack for trail hikes | 4200 | 0.68591022 |
+------+-------------------------+--------------------------------------------------+-------+------------+
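The query "lightweight laptop backpack for trail hiking" ranks the most relevant item first: "outdoor laptop pack", which combines the laptop and hiking aspects. It is followed by hiking backpacks and a laptop-oriented backpack.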
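The table was declared with HNSW_SIMILARITY='l2', so knn_dist() is a distance: smaller values mean closer matches. The Python sketch below illustrates that ordering with toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions). The vectors, the ids attached to them, and the helper names are made up for illustration, and the sketch uses plain Euclidean distance; whether the engine reports this or the squared form is not stated in this post.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.dist(a, b)


def nearest(query_vec, docs, k):
    """docs maps id -> vector. Return the k (id, distance) pairs closest to query_vec."""
    scored = [(doc_id, l2_distance(query_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending: smallest distance first
    return scored[:k]


# Toy vectors, invented for this example only.
toy_docs = {
    9: [0.9, 0.8, 0.1],
    1: [0.2, 0.9, 0.1],
    16: [0.0, 0.1, 0.9],
}
toy_query = [1.0, 1.0, 0.0]

for doc_id, distance in nearest(toy_query, toy_docs, k=2):
    print(doc_id, round(distance, 3))
```

An HNSW index reaches a similar ranking without comparing the query against every stored vector, which is what makes it practical on large tables.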
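Choosing the right model
You can choose a model to suit your needs:
- 🏠 Local (Hugging Face models): no API key required, no usage limits
- 🌐 OpenAI models: top-tier semantic quality
- 🚀 Voyage & Jina models: optimized for specific domains and languages
Hybrid search and filtering (SQL example)
Combine semantic, keyword, and structured filtering in one query: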
SELECT id, price, highlight()
FROM products
WHERE knn(vector, 7, 'lightweight laptop backpack for trail hiking')
AND category = 'outdoors'
AND MATCH('"lightweight laptop backpack for trail hiking"/0.5');
Result:
+------+-------+-----------------------------------------------------------------------------------------------+
| id | price | highlight() |
+------+-------+-----------------------------------------------------------------------------------------------+
| 9 | 7800 | outdoor <b>laptop</b> pack | <b>Trail</b>-optimized <b>backpack</b> with <b>laptop</b> sleeve |
| 1 | 5999 | green <b>hiking backpack</b> | <b>Lightweight backpack</b> suitable <b>for hiking</b> trails |
| 5 | 8950 | mountain <b>hiking</b> bag | Durable <b>trail</b>-ready <b>backpack for</b> mountain hikes |
| 10 | 4200 | compact <b>hiking backpack</b> | Light and foldable <b>backpack for trail</b> hikes |
+------+-------+-----------------------------------------------------------------------------------------------+
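Note: highlight() returns markup (for example, <b>...</b>).
This single query filters by category (outdoors), ranks by semantic relevance through embeddings, requires a text-level keyword match, and highlights the matched terms.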
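The /0.5 suffix in the MATCH clause is a quorum threshold: a document passes the keyword condition if it contains at least half of the quoted words. The Python sketch below illustrates the idea on two rows of the sample data. The tokenizer (lowercase, split on non-alphanumerics, no stemming) and the rounding up of the threshold are simplifying assumptions and do not reproduce Manticore's actual text processing.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quorum_match(query, document, fraction):
    """True if the document contains at least `fraction` of the query's distinct words."""
    query_words = set(tokenize(query))
    document_words = set(tokenize(document))
    required = math.ceil(len(query_words) * fraction)  # rounding up is an assumption
    return len(query_words & document_words) >= required


QUERY = "lightweight laptop backpack for trail hiking"  # 6 distinct words, 3 required at 0.5

# id 9 shares "laptop", "backpack" and "trail" with the query.
print(quorum_match(QUERY, "outdoor laptop pack Trail-optimized backpack with laptop sleeve", 0.5))
# id 15 shares only "for".
print(quorum_match(QUERY, "wireless mouse Compact wireless mouse for laptops and desktops", 0.5))
```

The first call prints True and the second prints False. Rows that pass the quorum still have to satisfy the category filter and fall within the KNN candidates to appear in the result.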
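Full HTTP/JSON API support
Auto Embeddings work with Manticore's HTTP/JSON API, which offers the same functionality as SQL through REST endpoints.
Inserting data via JSON (HTTP/JSON API example)
Use the /insert endpoint; the embedding is generated automatically: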
curl "http://localhost:9308/insert" -H "Content-Type: application/json" \
-d '{
"table": "products",
"id": 21,
"doc": {
"title": "wireless headphones",
"description": "Bluetooth headphones with noise cancellation",
"category": "electronics",
"price": 15900
}
}'
Response:
{
"table": "products",
"id": 21,
"created": true,
"result": "created",
"status": 201
}
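The same insert can be issued from application code. The sketch below uses only the Python standard library and mirrors the curl call above (same endpoint, same body shape, no vector field). The helper names `build_insert_request` and `send` are hypothetical, and the base URL assumes the local instance from the curl example.

```python
import json
import urllib.request


def build_insert_request(table, doc_id, doc, base_url="http://localhost:9308"):
    """Build a POST request for the /insert endpoint; the document carries text only."""
    payload = {"table": table, "id": doc_id, "doc": doc}
    return urllib.request.Request(
        url=f"{base_url}/insert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_insert_request(
    "products",
    21,
    {
        "title": "wireless headphones",
        "description": "Bluetooth headphones with noise cancellation",
        "category": "electronics",
        "price": 15900,
    },
)
print(request.full_url)
# send(request) would perform the insert against a live instance.
```

Separating request construction from sending keeps the payload easy to inspect and test without a server.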
Bulk insert with Auto Embeddings (HTTP/JSON API example)
Use /bulk to insert multiple documents efficiently:
curl "http://localhost:9308/bulk" -H "Content-Type: application/x-ndjson" \
--data-raw $'{"insert": {"table": "products", "id": 22, "doc": {"title": "gaming laptop", "description": "High-performance laptop for gaming and work", "category": "electronics", "price": 159900}}}
{"insert": {"table": "products", "id": 23, "doc": {"title": "smartphone", "description": "Latest flagship smartphone with 5G", "category": "electronics", "price": 89900}}}
{"insert": {"table": "products", "id": 24, "doc": {"title": "tablet computer", "description": "Lightweight tablet for work and entertainment", "category": "electronics", "price": 49900}}}'
Response:
{
"items": [
{
"bulk": {
"table": "products",
"_id": 24,
"created": 3,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
}
],
"current_line": 3,
"skipped_lines": 0,
"errors": false,
"error": ""
}
The bulk operation inserted 3 documents, each with an automatically generated embedding.
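The /bulk body is newline-delimited JSON: one {"insert": ...} object per line. A short Python sketch of producing that body from a list of documents follows; `build_bulk_body` is a hypothetical helper name, and the output follows the shape of the curl example above (no trailing newline).

```python
import json


def build_bulk_body(table, docs):
    """docs is an iterable of (id, doc) pairs. Returns one JSON object per line."""
    lines = [
        json.dumps({"insert": {"table": table, "id": doc_id, "doc": doc}})
        for doc_id, doc in docs
    ]
    return "\n".join(lines)


bulk_body = build_bulk_body(
    "products",
    [
        (22, {"title": "gaming laptop", "description": "High-performance laptop for gaming and work",
              "category": "electronics", "price": 159900}),
        (23, {"title": "smartphone", "description": "Latest flagship smartphone with 5G",
              "category": "electronics", "price": 89900}),
        (24, {"title": "tablet computer", "description": "Lightweight tablet for work and entertainment",
              "category": "electronics", "price": 49900}),
    ],
)
print(bulk_body)
```

The resulting string would be sent with the Content-Type: application/x-ndjson header, as in the curl call.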
Semantic search via JSON (HTTP/JSON API example)
Use /search for natural-language queries:
curl "http://localhost:9308/search" -H "Content-Type: application/json" \
-d '{
"table": "products",
"_source": ["title"],
"size": 5,
"knn": {
"field": "vector",
"query": "outdoor hiking adventure",
"k": 3
}
}'
Response:
{
"took": 8,
"timed_out": false,
"hits": {
"total": 24,
"total_relation": "eq",
"hits": [
{
"_id": 18,
"_score": 1,
"_knn_dist": 0.75467718,
"_source": {
"title": "children's adventure book"
}
},
{
"_id": 1,
"_score": 1,
"_knn_dist": 0.83226496,
"_source": {
"title": "green hiking backpack"
}
},
{
"_id": 5,
"_score": 1,
"_knn_dist": 0.89348459,
"_source": {
"title": "mountain hiking bag"
}
},
{
"_id": 10,
"_score": 1,
"_knn_dist": 0.92611158,
"_source": {
"title": "compact hiking backpack"
}
},
{
"_id": 3,
"_score": 1,
"_knn_dist": 0.98721427,
"_source": {
"title": "travel daypack"
}
}
]
}
}
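For the query "outdoor hiking adventure", the closest match is "children's adventure book" (distance 0.754), followed by hiking-related backpacks. This shows that semantic search can find conceptually related items, not just literal keyword matches.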
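Application code usually needs only a few fields from that response. The Python sketch below flattens the hits into (id, title, distance) rows ordered by distance. The sample dictionary is trimmed from the response shown above, and `hits_to_rows` is a hypothetical helper name.

```python
def hits_to_rows(response):
    """Flatten a /search response into (id, title, knn distance) tuples, closest first."""
    hits = response.get("hits", {}).get("hits", [])
    rows = [
        (hit["_id"], hit["_source"].get("title"), hit["_knn_dist"])
        for hit in hits
    ]
    return sorted(rows, key=lambda row: row[2])


# First two hits from the response above.
sample_response = {
    "took": 8,
    "timed_out": False,
    "hits": {
        "total": 24,
        "hits": [
            {"_id": 18, "_score": 1, "_knn_dist": 0.75467718,
             "_source": {"title": "children's adventure book"}},
            {"_id": 1, "_score": 1, "_knn_dist": 0.83226496,
             "_source": {"title": "green hiking backpack"}},
        ],
    },
}

for doc_id, title, distance in hits_to_rows(sample_response):
    print(f"{doc_id:>3}  {distance:.3f}  {title}")
```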
Filtering and hybrid search via JSON (HTTP/JSON API example)
Combine semantic search with traditional filtering:
curl "http://localhost:9308/search" -H "Content-Type: application/json" \
-d '{
"table": "products",
"_source": ["title", "price"],
"size": 5,
"knn": {
"field": "vector",
"query": "technology electronic device",
"k": 5,
"filter": {
"range": {"price": {"gte": 15000}}
}
}
}'
Response:
{
"took": 10,
"timed_out": false,
"hits": {
"total": 5,
"total_relation": "eq",
"hits": [
{
"_id": 24,
"_score": 1,
"_knn_dist": 1.31113040,
"_source": {
"title": "tablet computer",
"price": 49900
}
},
{
"_id": 23,
"_score": 1,
"_knn_dist": 1.56920886,
"_source": {
"title": "smartphone",
"price": 89900
}
},
{
"_id": 22,
"_score": 1,
"_knn_dist": 1.59042466,
"_source": {
"title": "gaming laptop",
"price": 159900
}
},
{
"_id": 16,
"_score": 1,
"_knn_dist": 1.84979212,
"_source": {
"title": "office chair",
"price": 27900
}
},
{
"_id": 21,
"_score": 1,
"_knn_dist": 1.88567829,
"_source": {
"title": "wireless headphones",
"price": 15900
}
}
]
}
}
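The search for "technology electronic device" with a price filter (at least $150, that is, 15000 cents) ranks electronics at the top and excludes cheaper products such as the hiking backpacks and small electronics. Note that "tablet computer" ranks highest because it is the closest semantic match to the query.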
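Since prices are stored in cents, a dollar threshold has to be converted before it goes into the range filter. The Python sketch below builds the same request body as the curl example from a dollar amount. The helper names are hypothetical, and Decimal is used so that amounts such as 59.99 convert to an exact number of cents.

```python
from decimal import Decimal


def dollars_to_cents(amount):
    """Convert a dollar amount (int, str, or float) to integer cents without float drift."""
    return int((Decimal(str(amount)) * 100).to_integral_value())


def knn_search_with_min_price(table, field, query, k, min_dollars, size=5):
    """Build a /search body: KNN over `field`, restricted to rows priced at or above min_dollars."""
    return {
        "table": table,
        "_source": ["title", "price"],
        "size": size,
        "knn": {
            "field": field,
            "query": query,
            "k": k,
            "filter": {"range": {"price": {"gte": dollars_to_cents(min_dollars)}}},
        },
    }


body = knn_search_with_min_price("products", "vector", "technology electronic device", k=5, min_dollars=150)
print(body["knn"]["filter"])
```

This prints `{'range': {'price': {'gte': 15000}}}`, the same filter used in the request above.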
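Direct vector vs. auto-embedded text queries
The HTTP/JSON API supports both:
- Auto-embedded text query: "query": "outdoor hiking adventure" (embedded automatically)
- Direct vector query: "query": [0.1, 0.2, 0.3, ...] (precomputed vector)
This flexibility lets you mix auto-generated embeddings and custom vectors in the same application.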
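One way to support both forms in application code is a single builder that accepts either a string or a sequence of numbers. This is a minimal Python sketch under the assumption stated above, that both forms go into the same "query" key; `knn_clause` is a hypothetical helper name.

```python
def knn_clause(field, query, k):
    """Build the "knn" part of a /search body from text or a precomputed vector."""
    if isinstance(query, str):
        if not query.strip():
            raise ValueError("text query must not be empty")
        value = query  # the server embeds this text
    else:
        value = [float(x) for x in query]  # caller supplies the embedding
        if not value:
            raise ValueError("vector query must not be empty")
    return {"field": field, "query": value, "k": k}


print(knn_clause("vector", "outdoor hiking adventure", 3))
print(knn_clause("vector", (0.1, 0.2, 0.3), 3))
```

A precomputed vector has to come from the same model, with the same dimension, as the vectors stored in the table; otherwise distances are not comparable.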
OpenAI integration (OpenAI API example)
For stronger semantic understanding, you can use OpenAI's embedding models:
-- Create table with OpenAI embeddings
CREATE TABLE products_openai (
title TEXT,
description TEXT,
category STRING,
price INT,
vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
MODEL_NAME='openai/text-embedding-ada-002'
FROM='title, description'
API_KEY='your-openai-api-key'
);
-- Insert data (embeddings generated via OpenAI API)
INSERT INTO products_openai(title, description, category, price) VALUES
('smartphone device', 'latest mobile technology with advanced features', 'electronics', 79900),
('laptop computer', 'portable workstation for developers and professionals', 'electronics', 129900);
-- Search with natural language
SELECT id, title, description, knn_dist()
FROM products_openai
WHERE knn(vector, 2, 'mobile phone technology');
Result:
+---------------------+-------------------+-------------------------------------------------------+------------+
| id | title | description | knn_dist() |
+---------------------+-------------------+-------------------------------------------------------+------------+
| 2309215617435041807 | smartphone device | latest mobile technology with advanced features | 0.20333229 |
| 2309215617435041808 | laptop computer | portable workstation for developers and professionals | 0.40197325 |
+---------------------+-------------------+-------------------------------------------------------+------------+
OpenAI's model captures the nuance here: for "mobile phone technology" it ranks the smartphone as closer (0.203) than the laptop (0.402).
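The API_KEY option puts a secret into the DDL, so it is worth keeping the key out of source files. The Python sketch below generates the CREATE TABLE statement shown above with the key read from an environment variable. The variable name OPENAI_API_KEY, the helper names, and the backslash-based quoting (modeled on the `children\'s` literal used earlier in this post) are assumptions for illustration; the table name is interpolated as-is and must come from trusted code.

```python
import os


def sql_quote(value):
    """Single-quote a string literal, backslash-escaping quotes and backslashes (assumed rules)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def openai_table_sql(table, api_key_env="OPENAI_API_KEY"):
    """Return the CREATE TABLE statement with the API key taken from the environment."""
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    return (
        f"CREATE TABLE {table} (\n"
        "  title TEXT,\n"
        "  description TEXT,\n"
        "  category STRING,\n"
        "  price INT,\n"
        "  vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'\n"
        "    MODEL_NAME='openai/text-embedding-ada-002'\n"
        "    FROM='title, description'\n"
        f"    API_KEY={sql_quote(api_key)}\n"
        ");"
    )


# Usage, with the key exported in the shell beforehand:
#   statement = openai_table_sql("products_openai")
```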
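Built for production
- ⚡ Fast: HNSW indexing, optional quantization, optimized storage
- 🛡️ Reliable: multiple model providers, handling of empty vectors
- 🔧 Flexible: embed from any fields you choose
Use cases
Auto Embeddings make it simple to build:
- 🛍️ E-commerce search: "waterproof hiking boots" → finds relevant products
- 📚 Document discovery: "contracts about data privacy" → surfaces legal documents
- 🎵 Content recommendation: "upbeat music for workouts" → matches by mood
- 🏠 Real estate search: "cozy apartment near a park" → finds homes that fit a lifestyle
More real-world examples
Let's see Auto Embeddings at work in different search scenarios:
Finding work and productivity items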
SELECT id, title, description, price, knn_dist()
FROM products
WHERE knn(vector, 3, 'work productivity office')
LIMIT 3;
Result:
+------+----------------------+----------------------------------------------------------+-------+------------+
| id | title | description | price | knn_dist() |
+------+----------------------+----------------------------------------------------------+-------+------------+
| 24 | tablet computer | Lightweight tablet for work and entertainment | 49900 | 1.306459 |
| 16 | office chair | Ergonomic office chair with lumbar support and mesh back | 27900 | 1.44871426 |
| 17 | notebook and pen set | Elegant A5 notebook with smooth-writing pen | 1200 | 1.48466742 |
+------+----------------------+----------------------------------------------------------+-------+------------+
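The search interprets "work productivity office" and returns a work-oriented tablet, office furniture, and stationery.
Smart category filtering
Sometimes semantic search is too broad. Let's search for "usb charger for outdoor camping":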
SELECT id, title, description, price, knn_dist()
FROM products
WHERE knn(vector, 5, 'usb charger for outdoor camping');
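The top results include a mix of items: the solar charger (0.888), an outdoor backpack (1.139), hiking gear (1.213), and so on.
But when we add a category filter and a keyword match: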
SELECT id, highlight()
FROM products
WHERE knn(vector, 5, 'usb charger for outdoor camping')
AND category = 'electronics'
AND MATCH('"usb charger for outdoor camping"/0.5')
LIMIT 3;
Precise result:
+------+-------------------------------------------------------------------------------------------------------+
| id | highlight() |
+------+-------------------------------------------------------------------------------------------------------+
| 11 | portable solar <b>charger</b> | Foldable solar panel <b>charger for</b> phones and <b>USB</b> devices |
+------+-------------------------------------------------------------------------------------------------------+
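Note: highlight() returns markup (for example, <b>...</b>). The bold in the table is for readability.
Semantic understanding + category filtering + keyword matching together return exactly the result we wanted.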
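Because highlight() returns <b>...</b> markup, client code decides how to present it. The Python sketch below shows three simple treatments of the snippet returned above: stripping the tags, listing the highlighted terms, and rendering bold in a terminal with ANSI codes. The helper names are hypothetical, and the sketch assumes the only tags present are <b> and </b>.

```python
import re

SNIPPET = (
    "portable solar <b>charger</b> | Foldable solar panel "
    "<b>charger for</b> phones and <b>USB</b> devices"
)


def strip_highlight(snippet):
    """Remove the <b> and </b> tags, leaving plain text."""
    return re.sub(r"</?b>", "", snippet)


def highlighted_terms(snippet):
    """Return the text of each highlighted span, in order."""
    return re.findall(r"<b>(.*?)</b>", snippet)


def to_ansi_bold(snippet):
    """Replace the tags with ANSI escape codes for terminal output."""
    return snippet.replace("<b>", "\033[1m").replace("</b>", "\033[0m")


print(strip_highlight(SNIPPET))
print(highlighted_terms(SNIPPET))
print(to_ansi_bold(SNIPPET))
```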
Finding fun and creative items
SELECT id, title, description, price, knn_dist()
FROM products
WHERE knn(vector, 3, 'fun creative play toys')
LIMIT 3;
Result:
+------+---------------------------+----------------------------------------------------+-------+------------+
| id | title | description | price | knn_dist() |
+------+---------------------------+----------------------------------------------------+-------+------------+
| 8 | camping gear set | Complete set for weekend camping adventures | 12000 | 1.30462146 |
| 20 | wooden puzzle box | Challenging mechanical puzzle made of natural wood | 1899 | 1.305056 |
| 18 | children's adventure book | Illustrated storybook about outdoor exploration | 1299 | 1.47192979 |
+------+---------------------------+----------------------------------------------------+-------+------------+
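Auto Embeddings pick up on the idea of "fun creative play toys" and return adventure gear, a puzzle, and a children's book, all items related to creativity and play.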
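Behind the scenes
Auto Embeddings rely on:
- Sentence Transformers for semantic understanding
- HNSW for fast similarity search
- Smart caching for efficient inference
- A multi-provider API for flexibility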
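Try it today
As the examples show, Auto Embeddings provide powerful semantic search with simple setup. Whether you are building:
- An e-commerce platform with natural-language product search
- A content management system with intelligent document discovery
- A recommendation engine that understands user intent
- A knowledge base with semantic question answering
Auto Embeddings take care of the hardest part, managing embeddings, so you can focus on building features your users love.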
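🚀 Ready to transform your search experience?
👉 Download Manticore Search and start building with Auto Embeddings.
📚 See the KNN search documentation for a detailed guide.
💬 Join our Slack community to share your success stories.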
