Introducing Auto Embeddings: AI-Powered Search Made Simple

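We're excited to share a new feature that makes building semantic search applications as simple as writing SQL: Auto Embeddings.
With this feature, Manticore Search handles embedding generation for you: no extra pipelines, no external services, no hassle.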

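The Challenge Before

Until now, semantic search usually meant dealing with:

  • Setting up a separate ML pipeline for embedding generation
  • Managing models and their dependencies
  • Keeping your application, embedding service, and search engine in sync
  • Handling vector dimension mismatches and preprocessing
  • Ensuring embeddings are always generated the same way

That overhead is now gone.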

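What Are Auto Embeddings?

With Auto Embeddings, you just insert text. Manticore automatically:

  • Generates embeddings using state-of-the-art models
  • Stores them efficiently in a vector index
  • Lets you query in natural language
  • Hides the complexity so you can focus on features rather than infrastructure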

How It Works

Build a semantic search application in 3 steps:

1. Create a table (SQL example)

CREATE TABLE products (
    title TEXT,
    description TEXT,
    category STRING,
    price INT,
    vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
        MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
        FROM='title,description'
);

One line of configuration: Manticore generates the embeddings from the title and description fields.

2. Insert data (SQL example)

INSERT INTO products(id, title, description, category, price) VALUES
  (1, 'green hiking backpack', 'Lightweight backpack suitable for hiking trails', 'outdoors', 5999),
  (2, 'laptop sleeve', 'Slim padded case for 15-inch laptops', 'electronics', 1999),
  (3, 'travel daypack', 'Compact daypack perfect for light travel or hiking', 'luggage', 3999),
  (4, 'black laptop backpack', 'Spacious backpack with padded laptop compartment', 'electronics', 6900),
  (5, 'mountain hiking bag', 'Durable trail-ready backpack for mountain hikes', 'outdoors', 8950),
  (6, 'everyday backpack', 'Versatile backpack for work, gym and school', 'general', 4900),
  (7, 'trail running shoes', 'Lightweight shoes with great grip for trails', 'footwear', 7500),
  (8, 'camping gear set', 'Complete set for weekend camping adventures', 'outdoors', 12000),
  (9, 'outdoor laptop pack', 'Trail-optimized backpack with laptop sleeve', 'outdoors', 7800),
  (10, 'compact hiking backpack', 'Light and foldable backpack for trail hikes', 'outdoors', 4200),
  (11, 'portable solar charger', 'Foldable solar panel charger for phones and USB devices', 'electronics', 3400),
  (12, 'reusable water bottle', 'Insulated stainless steel bottle keeps drinks cold or hot', 'lifestyle', 2500),
  (13, 'noise-cancelling headphones', 'Over-ear headphones with noise cancellation', 'electronics', 13900),
  (14, 'organic trail mix', 'Healthy mix of nuts and dried fruit, ideal for hikes', 'food', 899),
  (15, 'wireless mouse', 'Compact wireless mouse for laptops and desktops', 'electronics', 1599),
  (16, 'office chair', 'Ergonomic office chair with lumbar support and mesh back', 'furniture', 27900),
  (17, 'notebook and pen set', 'Elegant A5 notebook with smooth-writing pen', 'stationery', 1200),
  (18, 'children\'s adventure book', 'Illustrated storybook about outdoor exploration', 'books', 1299),
  (19, 'mini drone', 'Lightweight drone with HD camera and remote control', 'gadgets', 4599),
  (20, 'wooden puzzle box', 'Challenging mechanical puzzle made of natural wood', 'toys', 1899);

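This varied dataset covers outdoor gear, electronics, furniture, books, toys, and more. Note that no vectors are supplied: all embeddings are generated automatically from the text.

Note: prices are stored in cents (for example, 5999 = $59.99).

3. Search with natural language (SQL example)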

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'lightweight laptop backpack for trail hiking')
LIMIT 5;

Result:

+------+-------------------------+--------------------------------------------------+-------+------------+
| id   | title                   | description                                      | price | knn_dist() |
+------+-------------------------+--------------------------------------------------+-------+------------+
|    9 | outdoor laptop pack     | Trail-optimized backpack with laptop sleeve      |  7800 | 0.35392243 |
|    1 | green hiking backpack   | Lightweight backpack suitable for hiking trails  |  5999 | 0.53113687 |
|    5 | mountain hiking bag     | Durable trail-ready backpack for mountain hikes  |  8950 | 0.62034285 |
|    4 | black laptop backpack   | Spacious backpack with padded laptop compartment |  6900 | 0.65785009 |
|   10 | compact hiking backpack | Light and foldable backpack for trail hikes      |  4200 | 0.68591022 |
+------+-------------------------+--------------------------------------------------+-------+------------+

The query "lightweight laptop backpack for trail hiking" ranks the most relevant product first: "outdoor laptop pack", which combines laptop and hiking features. It is followed by hiking backpacks and laptop-oriented products.
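
To make the ranking idea concrete, here is a minimal Python sketch of nearest-neighbor ordering by L2 (Euclidean) distance. It is an illustration only: the 3-dimensional vectors are invented (real all-MiniLM-L6-v2 embeddings have 384 dimensions), the search is brute force rather than HNSW, and the exact formula behind knn_dist() may differ from the one assumed here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, docs, k):
    """Return the k (title, distance) pairs closest to query_vec."""
    scored = [(title, l2_distance(query_vec, vec)) for title, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Hypothetical toy embeddings, made up for this sketch.
toy_docs = {
    "outdoor laptop pack": [0.9, 0.8, 0.1],
    "green hiking backpack": [0.2, 0.9, 0.1],
    "office chair": [0.1, 0.0, 0.9],
}
toy_query = [1.0, 1.0, 0.0]

top = nearest(toy_query, toy_docs, k=2)
for title, dist in top:
    print(f"{title}: {dist:.3f}")
```

As in the result table above, a smaller distance means a closer semantic match, so rows are listed in ascending order of distance.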

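Choosing the Right Model

You can choose different models depending on your needs:

  • 🏠 Local (Hugging Face models): no API key required, unlimited use
  • 🌐 OpenAI models: best semantic quality
  • 🚀 Voyage & Jina models: domain- and language-optimized

Hybrid Search and Filtering (SQL example)

Combine semantic, keyword, and structured filters in a single query: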

SELECT id, price, highlight()
FROM products
WHERE knn(vector, 7, 'lightweight laptop backpack for trail hiking')
  AND category = 'outdoors'
  AND MATCH('"lightweight laptop backpack for trail hiking"/0.5');

Result:

+------+-------+-----------------------------------------------------------------------------------------------+
| id   | price | highlight()                                                                                   |
+------+-------+-----------------------------------------------------------------------------------------------+
|    9 |  7800 | outdoor <b>laptop</b> pack | <b>Trail</b>-optimized <b>backpack</b> with <b>laptop</b> sleeve |
|    1 |  5999 | green <b>hiking backpack</b> | <b>Lightweight backpack</b> suitable <b>for hiking</b> trails  |
|    5 |  8950 | mountain <b>hiking</b> bag | Durable <b>trail</b>-ready <b>backpack for</b> mountain hikes    |
|   10 |  4200 | compact <b>hiking backpack</b> | Light and foldable <b>backpack for trail</b> hikes           |
+------+-------+-----------------------------------------------------------------------------------------------+

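Note: highlight() returns markup (for example, <b>...</b>).

This powerful combination filters by category (outdoors), ensures semantic relevance through embeddings, requires a text-level keyword match, and highlights the matched terms, all in a single query!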
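
The "/0.5" suffix in the MATCH clause is a quorum threshold: roughly, at least half of the quoted words must appear in the document. The Python sketch below models that idea under simplifying assumptions. The tokenizer, the lack of stemming, and the rounding with ceil are all choices made for this illustration, not a description of Manticore's actual matching.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quorum_match(query, document, fraction):
    """True if at least `fraction` of the query words occur in the document."""
    query_words = tokenize(query)
    doc_words = set(tokenize(document))
    needed = math.ceil(fraction * len(query_words))  # rounding is an assumption
    matched = sum(1 for word in query_words if word in doc_words)
    return matched >= needed


query = "lightweight laptop backpack for trail hiking"
# Title and description of rows 9 and 16 from the sample data.
doc_9 = "outdoor laptop pack Trail-optimized backpack with laptop sleeve"
doc_16 = "office chair Ergonomic office chair with lumbar support and mesh back"

print(quorum_match(query, doc_9, 0.5))   # laptop, backpack, trail: 3 of 6 words
print(quorum_match(query, doc_16, 0.5))  # no query words present
```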

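Full HTTP/JSON API Support

Auto Embeddings works seamlessly with Manticore's HTTP/JSON API, providing the same functionality as SQL through REST endpoints.

Inserting Data via JSON (HTTP/JSON API example)

Use the /insert endpoint; embeddings are generated automatically: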

curl "http://localhost:9308/insert" -H "Content-Type: application/json" \
  -d '{
    "table": "products", 
    "id": 21, 
    "doc": {
      "title": "wireless headphones", 
      "description": "Bluetooth headphones with noise cancellation", 
      "category": "electronics", 
      "price": 15900
    }
  }'

Response:

{
  "table": "products",
  "id": 21,
  "created": true,
  "result": "created",
  "status": 201
}

Bulk Insert with Auto Embeddings (HTTP/JSON API example)

Use /bulk to insert multiple documents efficiently:

curl "http://localhost:9308/bulk" -H "Content-Type: application/x-ndjson" \
  --data-raw $'{"insert": {"table": "products", "id": 22, "doc": {"title": "gaming laptop", "description": "High-performance laptop for gaming and work", "category": "electronics", "price": 159900}}}
{"insert": {"table": "products", "id": 23, "doc": {"title": "smartphone", "description": "Latest flagship smartphone with 5G", "category": "electronics", "price": 89900}}}
{"insert": {"table": "products", "id": 24, "doc": {"title": "tablet computer", "description": "Lightweight tablet for work and entertainment", "category": "electronics", "price": 49900}}}'

Response:

{
  "items": [
    {
      "bulk": {
        "table": "products",
        "_id": 24,
        "created": 3,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 3,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}

The bulk operation successfully inserted 3 documents with automatically generated embeddings.
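
If you generate bulk requests from application code, the newline-delimited body can be assembled programmatically. The sketch below builds the same kind of payload as the curl example, one JSON action per line. The helper name build_bulk_payload is hypothetical, and ending the body with a trailing newline follows general NDJSON convention rather than a documented requirement.

```python
import json


def build_bulk_payload(table, docs):
    """Build an NDJSON body with one insert action per (id, fields) pair."""
    lines = []
    for doc_id, fields in docs:
        action = {"insert": {"table": table, "id": doc_id, "doc": fields}}
        lines.append(json.dumps(action))
    # One JSON object per line, terminated by a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(
    "products",
    [
        (22, {"title": "gaming laptop",
              "description": "High-performance laptop for gaming and work",
              "category": "electronics", "price": 159900}),
        (23, {"title": "smartphone",
              "description": "Latest flagship smartphone with 5G",
              "category": "electronics", "price": 89900}),
    ],
)
print(payload, end="")
```

The resulting string would be sent as the request body to /bulk with the Content-Type header set to application/x-ndjson, as in the curl command above.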

Semantic Search via JSON (HTTP/JSON API example)

Use /search to query in natural language:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title"],
    "size": 5,
    "knn": {
      "field": "vector",
      "query": "outdoor hiking adventure",
      "k": 3
    }
  }'

Response:

{
  "took": 8,
  "timed_out": false,
  "hits": {
    "total": 24,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 18,
        "_score": 1,
        "_knn_dist": 0.75467718,
        "_source": {
          "title": "children's adventure book"
        }
      },
      {
        "_id": 1,
        "_score": 1,
        "_knn_dist": 0.83226496,
        "_source": {
          "title": "green hiking backpack"
        }
      },
      {
        "_id": 5,
        "_score": 1,
        "_knn_dist": 0.89348459,
        "_source": {
          "title": "mountain hiking bag"
        }
      },
      {
        "_id": 10,
        "_score": 1,
        "_knn_dist": 0.92611158,
        "_source": {
          "title": "compact hiking backpack"
        }
      },
      {
        "_id": 3,
        "_score": 1,
        "_knn_dist": 0.98721427,
        "_source": {
          "title": "travel daypack"
        }
      }
    ]
  }
}

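The closest match for "outdoor hiking adventure" is "children's adventure book" (distance 0.755), followed by hiking-related backpacks. This shows how semantic search finds conceptually related products rather than only literal keyword matches.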
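
On the client side, a response like the one above can be reduced to the fields an application usually needs. The following Python sketch parses a trimmed copy of that response (only the first two hits are kept) and extracts the ID, title, and distance. The helper name summarize_hits is hypothetical.

```python
import json

# Trimmed copy of the /search response shown above (first two hits only).
response_text = """
{
  "took": 8,
  "timed_out": false,
  "hits": {
    "total": 24,
    "total_relation": "eq",
    "hits": [
      {"_id": 18, "_score": 1, "_knn_dist": 0.75467718,
       "_source": {"title": "children's adventure book"}},
      {"_id": 1, "_score": 1, "_knn_dist": 0.83226496,
       "_source": {"title": "green hiking backpack"}}
    ]
  }
}
"""


def summarize_hits(response):
    """Return (id, title, distance) tuples in the order the server sent them."""
    return [
        (hit["_id"], hit["_source"]["title"], hit["_knn_dist"])
        for hit in response["hits"]["hits"]
    ]


summary = summarize_hits(json.loads(response_text))
for doc_id, title, dist in summary:
    print(f"{doc_id:>3}  {dist:.3f}  {title}")
```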

Filtering and Hybrid Search via JSON (HTTP/JSON API example)

Combine semantic search with traditional filters:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title", "price"],
    "size": 5,
    "knn": {
      "field": "vector", 
      "query": "technology electronic device",
      "k": 5,
      "filter": {
        "range": {"price": {"gte": 15000}}
      }
    }
  }'

Response:

{
  "took": 10,
  "timed_out": false,
  "hits": {
    "total": 5,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 24,
        "_score": 1,
        "_knn_dist": 1.31113040,
        "_source": {
          "title": "tablet computer",
          "price": 49900
        }
      },
      {
        "_id": 23,
        "_score": 1,
        "_knn_dist": 1.56920886,
        "_source": {
          "title": "smartphone",
          "price": 89900
        }
      },
      {
        "_id": 22,
        "_score": 1,
        "_knn_dist": 1.59042466,
        "_source": {
          "title": "gaming laptop",
          "price": 159900
        }
      },
      {
        "_id": 16,
        "_score": 1,
        "_knn_dist": 1.84979212,
        "_source": {
          "title": "office chair",
          "price": 27900
        }
      },
      {
        "_id": 21,
        "_score": 1,
        "_knn_dist": 1.88567829,
        "_source": {
          "title": "wireless headphones",
          "price": 15900
        }
      }
    ]
  }
}

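The search for "technology electronic device", combined with a price filter (≥ $150), correctly prioritizes electronics and excludes lower-priced products such as hiking backpacks and small electronics. Note that "tablet computer" ranks highest because of its strong semantic match with the query.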
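
Because prices in this table are stored in cents, a filter expressed in dollars has to be converted before it goes into the request. Here is a small Python sketch of that step, assuming the same request shape as the curl example above. The function names are hypothetical.

```python
def dollars_to_cents(amount):
    """Convert a dollar amount to whole cents, rounding to the nearest cent."""
    return int(round(amount * 100))


def build_filtered_knn_search(table, query_text, k, min_price_dollars):
    """Build a /search body with a KNN query and a minimum-price filter."""
    return {
        "table": table,
        "_source": ["title", "price"],
        "size": k,
        "knn": {
            "field": "vector",
            "query": query_text,
            "k": k,
            "filter": {
                "range": {"price": {"gte": dollars_to_cents(min_price_dollars)}}
            },
        },
    }


body = build_filtered_knn_search("products", "technology electronic device", 5, 150.00)
print(body["knn"]["filter"])
```

Serializing body with json.dumps would give a payload equivalent to the one in the curl command, with the $150 threshold expressed as 15000 cents.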

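Direct Vector vs. Auto-Embedded Text Queries

The HTTP/JSON API supports:

  • Auto-embedded text queries: "query": "outdoor hiking adventure" (embedded automatically)
  • Direct vector queries: "query": [0.1, 0.2, 0.3, ...] (precomputed vector)

This flexibility lets you mix automatically generated embeddings and custom vectors in the same application.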
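
A client that supports both query styles may want to validate the input before sending it. The Python sketch below accepts either a text string or a list of numbers and produces the "knn" clause in the shape shown in the two bullets above. The helper name and the validation rules are assumptions for illustration.

```python
from numbers import Real


def build_knn_clause(query, k, field="vector"):
    """Build a "knn" clause from a text query or a precomputed vector."""
    if isinstance(query, str):
        if not query.strip():
            raise ValueError("text query must not be empty")
        value = query  # the server embeds the text
    elif isinstance(query, (list, tuple)) and query and all(
        isinstance(x, Real) and not isinstance(x, bool) for x in query
    ):
        value = [float(x) for x in query]  # precomputed vector, sent as-is
    else:
        raise TypeError("query must be text or a non-empty sequence of numbers")
    return {"field": field, "query": value, "k": k}


text_clause = build_knn_clause("outdoor hiking adventure", 3)
vector_clause = build_knn_clause([0.1, 0.2, 0.3], 3)
print(text_clause)
print(vector_clause)
```

A precomputed vector should have the same number of dimensions as the model configured for the table; this sketch does not check that.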

OpenAI Integration (OpenAI API example)

For better semantic understanding, you can use OpenAI's embedding models:

-- Create table with OpenAI embeddings
CREATE TABLE products_openai (
  title TEXT,
  description TEXT,
  category string,
  price INT,
  vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002'
    FROM='title, description'
    API_KEY='your-openai-api-key'
);

-- Insert data (embeddings generated via OpenAI API)
INSERT INTO products_openai(title, description, category, price) VALUES
  ('smartphone device', 'latest mobile technology with advanced features', 'electronics', 79900),
  ('laptop computer', 'portable workstation for developers and professionals', 'electronics', 129900);

-- Search with natural language
SELECT id, title, description, knn_dist()
FROM products_openai 
WHERE knn(vector, 2, 'mobile phone technology');

Result:

+---------------------+-------------------+-------------------------------------------------------+------------+
| id                  | title             | description                                           | knn_dist() |
+---------------------+-------------------+-------------------------------------------------------+------------+
| 2309215617435041807 | smartphone device | latest mobile technology with advanced features       | 0.20333229 |
| 2309215617435041808 | laptop computer   | portable workstation for developers and professionals | 0.40197325 |
+---------------------+-------------------+-------------------------------------------------------+------------+

OpenAI's models are good at capturing nuanced relationships: for "mobile phone technology", the smartphone is correctly identified as more relevant than the laptop.

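Built for Production

  • Fast: HNSW indexing, optional quantization, optimized storage
  • 🛡️ Reliable: multiple model providers, empty-vector handling
  • 🔧 Flexible: embed from any fields you choose

Use Cases

Auto Embeddings makes it easy to build:

  • 🛍️ E-commerce search: "waterproof hiking boots" → finds relevant products
  • 📚 Document discovery: "contracts about data privacy" → surfaces legal documents
  • 🎵 Content recommendations: "upbeat music for workouts" → matches by mood
  • 🏠 Real estate search: "cozy apartments near parks" → finds homes that fit a lifestyle

More Real-World Examples

Let's look at Auto Embeddings in action across different search scenarios:

Finding Work and Productivity Items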

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'work productivity office')
LIMIT 3;

Result:

+------+----------------------+----------------------------------------------------------+-------+------------+
| id   | title                | description                                              | price | knn_dist() |
+------+----------------------+----------------------------------------------------------+-------+------------+
|   24 | tablet computer      | Lightweight tablet for work and entertainment            | 49900 |   1.306459 |
|   16 | office chair         | Ergonomic office chair with lumbar support and mesh back | 27900 | 1.44871426 |
|   17 | notebook and pen set | Elegant A5 notebook with smooth-writing pen              |  1200 | 1.48466742 |
+------+----------------------+----------------------------------------------------------+-------+------------+

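The search understands "work productivity office" and returns office furniture, stationery, and work-friendly gear.

Smart Category Filtering

Sometimes semantic search is too broad. Let's search for "usb charger for outdoor camping":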

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping');

The top results include many items: the solar charger (0.888), an outdoor backpack (1.139), hiking gear (1.213), and more.

But when we add a category filter:

SELECT id, highlight()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping')
  AND category = 'electronics'
  AND MATCH('"usb charger for outdoor camping"/0.5')
LIMIT 3;

Precise result:

+------+-------------------------------------------------------------------------------------------------------+
| id   | highlight()                                                                                           |
+------+-------------------------------------------------------------------------------------------------------+
|   11 | portable solar <b>charger</b> | Foldable solar panel <b>charger for</b> phones and <b>USB</b> devices |
+------+-------------------------------------------------------------------------------------------------------+

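Note: highlight() returns markup (for example, <b>...</b>). The tags appear as-is in the table above.

The combination of semantic understanding, category filtering, and keyword matching gives us exactly the result we want!

Finding Fun and Creative Items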

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'fun creative play toys')
LIMIT 3;

Result:

+------+---------------------------+----------------------------------------------------+-------+------------+
| id   | title                     | description                                        | price | knn_dist() |
+------+---------------------------+----------------------------------------------------+-------+------------+
|    8 | camping gear set          | Complete set for weekend camping adventures        | 12000 | 1.30462146 |
|   20 | wooden puzzle box         | Challenging mechanical puzzle made of natural wood |  1899 |   1.305056 |
|   18 | children's adventure book | Illustrated storybook about outdoor exploration    |  1299 | 1.47192979 |
+------+---------------------------+----------------------------------------------------+-------+------------+

Auto Embeddings understands the concept of "fun creative play" and finds adventure gear, puzzles, and children's books, all items related to creativity and play!

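Under the Hood

Auto Embeddings relies on:

  • Sentence Transformers for semantic understanding
  • HNSW for fast similarity search
  • Smart caching for efficient inference
  • Multi-provider APIs for flexibility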

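Try It Today

As our examples show, Auto Embeddings delivers powerful semantic search with minimal setup. Whether you are building:

  • An e-commerce platform with natural-language product search
  • A content management system with intelligent document discovery
  • A recommendation engine that understands user intent
  • A knowledge base with semantic question answering

Auto Embeddings removes the hardest part, managing embedding vectors, so you can focus on building great features your users will love.

🚀 Ready to transform your search experience?

👉 Download Manticore Search and start building with Auto Embeddings today.
📚 See the KNN search documentation for a detailed guide.
💬 Join our Slack community to share your success stories.

Questions or feedback? Join our community forum or follow us on Twitter.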

Install Manticore Search