Introducing Auto Embeddings: AI-Powered Search Made Simple

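We're excited to share a new feature that makes building semantic search applications as easy as writing SQL: Auto Embeddings.
With this feature, Manticore Search handles embedding generation for you: no extra pipelines, no external services, no hassle.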

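The challenge until now

Until now, semantic search often meant dealing with:

  • Setting up a separate machine-learning pipeline to generate embeddings
  • Managing models and their dependencies
  • Keeping your application, embedding service, and search engine in sync
  • Handling vector dimension mismatches and preprocessing
  • Making sure embeddings are always generated the same way

All of that overhead is now gone.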

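What are Auto Embeddings?

With Auto Embeddings, you just insert text. Manticore automatically:

  • Generates embeddings using state-of-the-art models
  • Stores them efficiently in a vector index
  • Lets you query in natural language
  • Hides the complexity so you can focus on features, not infrastructure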

How it works

Build a semantic search application in 3 steps:

1. Create a table (SQL example)

CREATE TABLE products (
    title TEXT,
    description TEXT,
    category STRING,
    price INT,
    vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
        MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
        FROM='title,description'
);

One line of configuration: Manticore generates the embeddings from title and description.
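The following hypothetical Python sketch shows what the FROM setting means in practice, namely which text from a row is passed to the model. The post does not say how Manticore joins multiple FROM fields internally. The space-separated join and the helper name text_to_embed are assumptions made for illustration only.

```python
def text_to_embed(doc: dict, from_fields: str) -> str:
    """Collect the columns named in a FROM='a,b' setting into one string.

    Hypothetical: assumes fields are joined with a single space, which may
    not match Manticore's internal behaviour.
    """
    # Accept both 'title,description' and 'title, description'.
    fields = [name.strip() for name in from_fields.split(",") if name.strip()]
    # Skip fields that are missing (None) or empty strings in this document.
    return " ".join(
        str(doc[name]) for name in fields if doc.get(name) not in (None, "")
    )


row = {
    "title": "green hiking backpack",
    "description": "Lightweight backpack suitable for hiking trails",
    "category": "outdoors",
    "price": 5999,
}
print(text_to_embed(row, "title,description"))
```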

2. Insert data (SQL example)

INSERT INTO products(id, title, description, category, price) VALUES
  (1, 'green hiking backpack', 'Lightweight backpack suitable for hiking trails', 'outdoors', 5999),
  (2, 'laptop sleeve', 'Slim padded case for 15-inch laptops', 'electronics', 1999),
  (3, 'travel daypack', 'Compact daypack perfect for light travel or hiking', 'luggage', 3999),
  (4, 'black laptop backpack', 'Spacious backpack with padded laptop compartment', 'electronics', 6900),
  (5, 'mountain hiking bag', 'Durable trail-ready backpack for mountain hikes', 'outdoors', 8950),
  (6, 'everyday backpack', 'Versatile backpack for work, gym and school', 'general', 4900),
  (7, 'trail running shoes', 'Lightweight shoes with great grip for trails', 'footwear', 7500),
  (8, 'camping gear set', 'Complete set for weekend camping adventures', 'outdoors', 12000),
  (9, 'outdoor laptop pack', 'Trail-optimized backpack with laptop sleeve', 'outdoors', 7800),
  (10, 'compact hiking backpack', 'Light and foldable backpack for trail hikes', 'outdoors', 4200),
  (11, 'portable solar charger', 'Foldable solar panel charger for phones and USB devices', 'electronics', 3400),
  (12, 'reusable water bottle', 'Insulated stainless steel bottle keeps drinks cold or hot', 'lifestyle', 2500),
  (13, 'noise-cancelling headphones', 'Over-ear headphones with noise cancellation', 'electronics', 13900),
  (14, 'organic trail mix', 'Healthy mix of nuts and dried fruit, ideal for hikes', 'food', 899),
  (15, 'wireless mouse', 'Compact wireless mouse for laptops and desktops', 'electronics', 1599),
  (16, 'office chair', 'Ergonomic office chair with lumbar support and mesh back', 'furniture', 27900),
  (17, 'notebook and pen set', 'Elegant A5 notebook with smooth-writing pen', 'stationery', 1200),
  (18, 'children\'s adventure book', 'Illustrated storybook about outdoor exploration', 'books', 1299),
  (19, 'mini drone', 'Lightweight drone with HD camera and remote control', 'gadgets', 4599),
  (20, 'wooden puzzle box', 'Challenging mechanical puzzle made of natural wood', 'toys', 1899);

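This diverse dataset covers outdoor gear, electronics, furniture, books, toys, and more. Note that no vectors are supplied: all embeddings are generated automatically from the text.

Note: prices are stored in cents (e.g. 5999 = $59.99).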
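Storing prices as integer cents avoids floating-point rounding. Below is a small Python sketch that converts in both directions. The helper names are illustrative and are not part of Manticore.

```python
from decimal import Decimal


def format_cents(cents: int) -> str:
    """Render a non-negative price in cents as dollars, e.g. 5999 -> '$59.99'."""
    if cents < 0:
        raise ValueError("expected a non-negative price")
    dollars, remainder = divmod(cents, 100)
    return f"${dollars:,}.{remainder:02d}"


def to_cents(amount: str) -> int:
    """Parse a dollar amount such as '59.99' into integer cents (5999)."""
    # Decimal keeps the arithmetic exact; float math can introduce rounding error.
    return int((Decimal(amount) * 100).to_integral_value())


print(format_cents(5999))    # $59.99
print(format_cents(159900))  # $1,599.00
print(to_cents("150"))       # 15000
```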

3. Search in natural language (SQL example)

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'lightweight laptop backpack for trail hiking')
LIMIT 5;

Result:

+------+-------------------------+--------------------------------------------------+-------+------------+
| id   | title                   | description                                      | price | knn_dist() |
+------+-------------------------+--------------------------------------------------+-------+------------+
|    9 | outdoor laptop pack     | Trail-optimized backpack with laptop sleeve      |  7800 | 0.35392243 |
|    1 | green hiking backpack   | Lightweight backpack suitable for hiking trails  |  5999 | 0.53113687 |
|    5 | mountain hiking bag     | Durable trail-ready backpack for mountain hikes  |  8950 | 0.62034285 |
|    4 | black laptop backpack   | Spacious backpack with padded laptop compartment |  6900 | 0.65785009 |
|   10 | compact hiking backpack | Light and foldable backpack for trail hikes      |  4200 | 0.68591022 |
+------+-------------------------+--------------------------------------------------+-------+------------+

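The query "lightweight laptop backpack for trail hiking" ranks "outdoor laptop pack" first, because it combines laptop and hiking features. Hiking backpacks and laptop-oriented products follow.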
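knn_dist() is the distance between the query embedding and each stored embedding under the table's HNSW_SIMILARITY='l2' setting, so a smaller value means a closer match. HNSW finds approximate nearest neighbours without scanning every vector. The exact search that HNSW approximates can be sketched in Python with toy 3-dimensional vectors. The numbers below are made up; real all-MiniLM-L6-v2 embeddings have 384 dimensions. This post does not say whether knn_dist() reports plain or squared L2, but the ranking is identical either way.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k):
    """Brute-force k nearest neighbours: [(distance, doc_id), ...], closest first."""
    scored = ((l2_distance(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Toy vectors for illustration only; not real model output.
toy_vectors = {
    9: [0.9, 0.8, 0.1],   # "outdoor laptop pack"
    1: [0.7, 0.9, 0.0],   # "green hiking backpack"
    16: [0.0, 0.1, 0.9],  # "office chair"
}
query = [1.0, 1.0, 0.0]

for dist, doc_id in exact_knn(query, toy_vectors, k=2):
    print(doc_id, round(dist, 3))
```

HNSW trades this linear scan for a graph walk, which is why it scales to large tables while returning approximately the same neighbours.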

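Choosing the right model

You can choose different models depending on your needs:

  • 🏠 Local (Hugging Face models): no API key needed, unlimited usage
  • 🌐 OpenAI models: top-tier semantic quality
  • 🚀 Voyage and Jina models: optimized for specific domains and languages

Hybrid search and filtering (SQL example)

Combine semantic, keyword, and structured filtering in a single query: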

SELECT id, price, highlight()
FROM products
WHERE knn(vector, 7, 'lightweight laptop backpack for trail hiking')
  AND category = 'outdoors'
  AND MATCH('"lightweight laptop backpack for trail hiking"/0.5');

Result:

+------+-------+-----------------------------------------------------------------------------------------------+
| id   | price | highlight()                                                                                   |
+------+-------+-----------------------------------------------------------------------------------------------+
|    9 |  7800 | outdoor <b>laptop</b> pack | <b>Trail</b>-optimized <b>backpack</b> with <b>laptop</b> sleeve |
|    1 |  5999 | green <b>hiking backpack</b> | <b>Lightweight backpack</b> suitable <b>for hiking</b> trails  |
|    5 |  8950 | mountain <b>hiking</b> bag | Durable <b>trail</b>-ready <b>backpack for</b> mountain hikes    |
|   10 |  4200 | compact <b>hiking backpack</b> | Light and foldable <b>backpack for trail</b> hikes           |
+------+-------+-----------------------------------------------------------------------------------------------+

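Note: highlight() returns markup (e.g. <b>...</b>).

This combination filters by category (outdoors), ensures semantic relevance through embeddings, requires a text-level keyword match, and highlights the matched terms, all in a single query!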
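The MATCH('"..."/0.5') clause uses Manticore's quorum operator. A document matches when at least the given fraction of the query words occur in it. The Python sketch below approximates the idea on a few rows from the dataset above and combines it with the category filter. It is a simplification with several assumptions. It splits on plain lowercase words with no stemming or morphology, it rounds the threshold up, and it leaves out the KNN part. Manticore's tokenizer and exact rounding may differ.

```python
import math
import re


def words(text):
    """Lowercase alphanumeric tokens (no stemming: 'trails' != 'trail')."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quorum_match(query, text, fraction):
    """True if at least ceil(fraction * n) of the n distinct query words occur in text."""
    query_words = set(words(query))
    found = query_words & set(words(text))
    return len(found) >= math.ceil(fraction * len(query_words))


rows = [
    (1, "green hiking backpack", "Lightweight backpack suitable for hiking trails", "outdoors"),
    (2, "laptop sleeve", "Slim padded case for 15-inch laptops", "electronics"),
    (5, "mountain hiking bag", "Durable trail-ready backpack for mountain hikes", "outdoors"),
    (8, "camping gear set", "Complete set for weekend camping adventures", "outdoors"),
    (9, "outdoor laptop pack", "Trail-optimized backpack with laptop sleeve", "outdoors"),
    (10, "compact hiking backpack", "Light and foldable backpack for trail hikes", "outdoors"),
]
q = "lightweight laptop backpack for trail hiking"

# Structured filter AND keyword quorum, mirroring the SQL query above.
matches = [doc_id for doc_id, title, desc, category in rows
           if category == "outdoors" and quorum_match(q, f"{title} {desc}", 0.5)]
print(matches)
```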

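Full HTTP/JSON API support

Auto Embeddings work seamlessly with Manticore's HTTP/JSON API, offering the same capabilities as SQL through REST endpoints.

Inserting data via JSON (HTTP/JSON API example)

Use the /insert endpoint; embeddings are generated automatically: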

curl "http://localhost:9308/insert" -H "Content-Type: application/json" \
  -d '{
    "table": "products", 
    "id": 21, 
    "doc": {
      "title": "wireless headphones", 
      "description": "Bluetooth headphones with noise cancellation", 
      "category": "electronics", 
      "price": 15900
    }
  }'

Response:

{
  "table": "products",
  "id": 21,
  "created": true,
  "result": "created",
  "status": 201
}
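Here is the same insert from Python, using only the standard library. This sketch builds the request without sending it. The helper name is an assumption, and the default URL reuses port 9308 from the curl examples. Note that the document carries no vector field, because the embedding is generated server-side from title and description.

```python
import json
import urllib.request


def build_insert_request(table, doc_id, doc, base_url="http://localhost:9308"):
    """Build (but do not send) a POST /insert request for one document."""
    body = {"table": table, "id": doc_id, "doc": doc}
    return urllib.request.Request(
        f"{base_url}/insert",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_insert_request("products", 21, {
    "title": "wireless headphones",
    "description": "Bluetooth headphones with noise cancellation",
    "category": "electronics",
    "price": 15900,
})
print(req.get_method(), req.full_url)
# With a Manticore server running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```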

Bulk inserts with Auto Embeddings (HTTP/JSON API example)

Use /bulk to insert multiple documents efficiently:

curl "http://localhost:9308/bulk" -H "Content-Type: application/x-ndjson" \
  --data-raw $'{"insert": {"table": "products", "id": 22, "doc": {"title": "gaming laptop", "description": "High-performance laptop for gaming and work", "category": "electronics", "price": 159900}}}
{"insert": {"table": "products", "id": 23, "doc": {"title": "smartphone", "description": "Latest flagship smartphone with 5G", "category": "electronics", "price": 89900}}}
{"insert": {"table": "products", "id": 24, "doc": {"title": "tablet computer", "description": "Lightweight tablet for work and entertainment", "category": "electronics", "price": 49900}}}'

Response:

{
  "items": [
    {
      "bulk": {
        "table": "products",
        "_id": 24,
        "created": 3,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 3,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}

The bulk operation inserted all 3 documents, each with an automatically generated embedding.
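Building the NDJSON body programmatically avoids hand-escaping quotes in the shell. The Python sketch below uses hypothetical helper names. It produces one insert action per line and reads a /bulk response of the shape shown above.

```python
import json


def build_bulk_body(table, docs):
    """NDJSON: one {"insert": ...} action per line."""
    return "\n".join(
        json.dumps({"insert": {"table": table, "id": doc_id, "doc": doc}})
        for doc_id, doc in docs
    )


def created_count(response):
    """Total documents created, or raise if the bulk call reported errors."""
    if response.get("errors"):
        raise RuntimeError(response.get("error") or "bulk insert failed")
    return sum(item["bulk"]["created"] for item in response["items"])


body = build_bulk_body("products", [
    (22, {"title": "gaming laptop",
          "description": "High-performance laptop for gaming and work",
          "category": "electronics", "price": 159900}),
    (23, {"title": "smartphone",
          "description": "Latest flagship smartphone with 5G",
          "category": "electronics", "price": 89900}),
])
print(body)

# The response returned by the curl example above.
sample = {"items": [{"bulk": {"table": "products", "_id": 24, "created": 3,
                              "deleted": 0, "updated": 0,
                              "result": "created", "status": 201}}],
          "current_line": 3, "skipped_lines": 0, "errors": False, "error": ""}
print(created_count(sample))  # 3
```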

Semantic search via JSON (HTTP/JSON API example)

Search with a natural-language query by calling /search:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title"],
    "size": 5,
    "knn": {
      "field": "vector",
      "query": "outdoor hiking adventure",
      "k": 3
    }
  }'

Response:

{
  "took": 8,
  "timed_out": false,
  "hits": {
    "total": 24,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 18,
        "_score": 1,
        "_knn_dist": 0.75467718,
        "_source": {
          "title": "children's adventure book"
        }
      },
      {
        "_id": 1,
        "_score": 1,
        "_knn_dist": 0.83226496,
        "_source": {
          "title": "green hiking backpack"
        }
      },
      {
        "_id": 5,
        "_score": 1,
        "_knn_dist": 0.89348459,
        "_source": {
          "title": "mountain hiking bag"
        }
      },
      {
        "_id": 10,
        "_score": 1,
        "_knn_dist": 0.92611158,
        "_source": {
          "title": "compact hiking backpack"
        }
      },
      {
        "_id": 3,
        "_score": 1,
        "_knn_dist": 0.98721427,
        "_source": {
          "title": "travel daypack"
        }
      }
    ]
  }
}

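For the query "outdoor hiking adventure", the closest match is "children's adventure book" (distance 0.754), followed by hiking backpacks. This shows that semantic search can find conceptually related items, not just literal keyword matches.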
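The Python sketch below uses hypothetical function names. It builds the same /search body and turns a response like the one above into (id, title, distance) rows, closest first. To keep it short, it reuses only two hits from that response. They are listed out of order on purpose to show the sort.

```python
def knn_search_body(table, query, k, size, fields=("title",)):
    """Body for POST /search with an auto-embedded text query."""
    return {
        "table": table,
        "_source": list(fields),
        "size": size,
        "knn": {"field": "vector", "query": query, "k": k},
    }


def nearest(response):
    """(id, title, knn distance) tuples sorted by ascending distance."""
    hits = response["hits"]["hits"]
    rows = [(h["_id"], h["_source"].get("title"), h["_knn_dist"]) for h in hits]
    return sorted(rows, key=lambda row: row[2])


body = knn_search_body("products", "outdoor hiking adventure", k=3, size=5)

response = {"hits": {"hits": [
    {"_id": 1, "_score": 1, "_knn_dist": 0.83226496,
     "_source": {"title": "green hiking backpack"}},
    {"_id": 18, "_score": 1, "_knn_dist": 0.75467718,
     "_source": {"title": "children's adventure book"}},
]}}
for doc_id, title, dist in nearest(response):
    print(doc_id, title, dist)
```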

Filtering and hybrid search via JSON (HTTP/JSON API example)

Combine semantic search with traditional filtering:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title", "price"],
    "size": 5,
    "knn": {
      "field": "vector", 
      "query": "technology electronic device",
      "k": 5,
      "filter": {
        "range": {"price": {"gte": 15000}}
      }
    }
  }'

Response:

{
  "took": 10,
  "timed_out": false,
  "hits": {
    "total": 5,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 24,
        "_score": 1,
        "_knn_dist": 1.31113040,
        "_source": {
          "title": "tablet computer",
          "price": 49900
        }
      },
      {
        "_id": 23,
        "_score": 1,
        "_knn_dist": 1.56920886,
        "_source": {
          "title": "smartphone",
          "price": 89900
        }
      },
      {
        "_id": 22,
        "_score": 1,
        "_knn_dist": 1.59042466,
        "_source": {
          "title": "gaming laptop",
          "price": 159900
        }
      },
      {
        "_id": 16,
        "_score": 1,
        "_knn_dist": 1.84979212,
        "_source": {
          "title": "office chair",
          "price": 27900
        }
      },
      {
        "_id": 21,
        "_score": 1,
        "_knn_dist": 1.88567829,
        "_source": {
          "title": "wireless headphones",
          "price": 15900
        }
      }
    ]
  }
}

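The search for "technology electronic device" uses a price filter of at least $150, which is 15000 in cents. It correctly prioritizes electronics and excludes lower-priced items such as hiking backpacks and small electronics. Note that "tablet computer" ranks highest because of its strong semantic match with the query.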
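The price filter is expressed in cents, matching how prices are stored. The Python sketch below uses a hypothetical helper. It builds the filtered KNN body from a whole-dollar threshold, then checks the returned prices client-side using the prices from the response above.

```python
def filtered_knn_body(query: str, min_price_dollars: int, k: int = 5, size: int = 5) -> dict:
    """/search body: KNN text query plus a price >= threshold filter (in cents)."""
    min_cents = min_price_dollars * 100  # prices are stored in cents
    return {
        "table": "products",
        "_source": ["title", "price"],
        "size": size,
        "knn": {
            "field": "vector",
            "query": query,
            "k": k,
            "filter": {"range": {"price": {"gte": min_cents}}},
        },
    }


body = filtered_knn_body("technology electronic device", 150)
threshold = body["knn"]["filter"]["range"]["price"]["gte"]

# Prices of the five hits returned above.
returned_prices = [49900, 89900, 159900, 27900, 15900]
print(threshold, all(p >= threshold for p in returned_prices))  # 15000 True
```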

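Direct vectors vs. auto-embedded text queries

The HTTP/JSON API supports both:

  • Auto-embedded text query: "query": "outdoor hiking adventure" (embedded automatically)
  • Direct vector query: "query": [0.1, 0.2, 0.3, ...] (precomputed vector)

This flexibility lets you mix auto-generated embeddings and custom vectors in the same application.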
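A client can accept either form and validate vectors before sending them. The Python sketch below uses a hypothetical helper. It passes text through for auto-embedding and checks that a precomputed vector has the model's dimension. The value 384 is the output size of sentence-transformers/all-MiniLM-L6-v2, the model used in the products table, and would differ for other models.

```python
from typing import Sequence, Union

MINILM_DIM = 384  # embedding size of sentence-transformers/all-MiniLM-L6-v2


def knn_clause(query: Union[str, Sequence[float]], k: int = 5,
               field: str = "vector", dim: int = MINILM_DIM) -> dict:
    """Build the "knn" part of a /search body from text or a precomputed vector."""
    if isinstance(query, str):
        value = query  # sent as text; Manticore embeds it
    else:
        value = [float(x) for x in query]
        if len(value) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(value)}")
    return {"field": field, "query": value, "k": k}


print(knn_clause("outdoor hiking adventure", k=3))
print(len(knn_clause([0.0] * MINILM_DIM)["query"]))  # 384
```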

OpenAI integration (OpenAI API example)

For even stronger semantic understanding, you can use OpenAI's embedding models:

-- Create table with OpenAI embeddings
CREATE TABLE products_openai (
  title TEXT,
  description TEXT,
  category STRING,
  price INT,
  vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002'
    FROM='title, description'
    API_KEY='your-openai-api-key'
);

-- Insert data (embeddings generated via OpenAI API)
INSERT INTO products_openai(title, description, category, price) VALUES
  ('smartphone device', 'latest mobile technology with advanced features', 'electronics', 79900),
  ('laptop computer', 'portable workstation for developers and professionals', 'electronics', 129900);

-- Search with natural language
SELECT id, title, description, knn_dist()
FROM products_openai 
WHERE knn(vector, 2, 'mobile phone technology');

Result:

+---------------------+-------------------+-------------------------------------------------------+------------+
| id                  | title             | description                                           | knn_dist() |
+---------------------+-------------------+-------------------------------------------------------+------------+
| 2309215617435041807 | smartphone device | latest mobile technology with advanced features       | 0.20333229 |
| 2309215617435041808 | laptop computer   | portable workstation for developers and professionals | 0.40197325 |
+---------------------+-------------------+-------------------------------------------------------+------------+

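OpenAI's models excel at capturing nuanced relationships: for "mobile phone technology", the smartphone correctly ranks as more relevant than the laptop.

Built for production

  • Fast: HNSW indexing, optional quantization, optimized storage
  • 🛡️ Reliable: multiple model providers, handling of empty vectors
  • 🔧 Flexible: embed from any fields you choose

Use cases

Auto Embeddings make it easy to build:

  • 🛍️ E-commerce search: "waterproof hiking boots" → finds relevant products
  • 📚 Document discovery: "contracts about data privacy" → surfaces legal documents
  • 🎵 Content recommendations: "upbeat music for workouts" → matches by vibe
  • 🏠 Real estate search: "cozy apartment near a park" → finds homes that fit a lifestyle

More real-world examples

Let's see Auto Embeddings in action across different search scenarios:

Finding work and productivity items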

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'work productivity office')
LIMIT 3;

Result:

+------+----------------------+----------------------------------------------------------+-------+------------+
| id   | title                | description                                              | price | knn_dist() |
+------+----------------------+----------------------------------------------------------+-------+------------+
|   24 | tablet computer      | Lightweight tablet for work and entertainment            | 49900 |   1.306459 |
|   16 | office chair         | Ergonomic office chair with lumbar support and mesh back | 27900 | 1.44871426 |
|   17 | notebook and pen set | Elegant A5 notebook with smooth-writing pen              |  1200 | 1.48466742 |
+------+----------------------+----------------------------------------------------------+-------+------------+

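The search understood "work productivity office" and returned a work tablet, office furniture, and stationery.

Smart category filtering

Sometimes semantic search is too broad. Let's search for "usb charger for outdoor camping":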

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping');

The top results include many items: the solar charger (0.888), outdoor packs (1.139), hiking gear (1.213), and more.

But when we add a category filter and a keyword quorum match:

SELECT id, highlight()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping')
  AND category = 'electronics'
  AND MATCH('"usb charger for outdoor camping"/0.5')
LIMIT 3;

A precise result:

+------+-------------------------------------------------------------------------------------------------------+
| id   | highlight()                                                                                           |
+------+-------------------------------------------------------------------------------------------------------+
|   11 | portable solar <b>charger</b> | Foldable solar panel <b>charger for</b> phones and <b>USB</b> devices |
+------+-------------------------------------------------------------------------------------------------------+

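Note: highlight() returns markup (e.g. <b>...</b>), which is why the tags appear around the matched terms in the table.

Combining semantic understanding, category filtering, and keyword matching gives us exactly the result we wanted!

Finding fun and creative items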

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'fun creative play toys')
LIMIT 3;

Result:

+------+---------------------------+----------------------------------------------------+-------+------------+
| id   | title                     | description                                        | price | knn_dist() |
+------+---------------------------+----------------------------------------------------+-------+------------+
|    8 | camping gear set          | Complete set for weekend camping adventures        | 12000 | 1.30462146 |
|   20 | wooden puzzle box         | Challenging mechanical puzzle made of natural wood |  1899 |   1.305056 |
|   18 | children's adventure book | Illustrated storybook about outdoor exploration    |  1299 | 1.47192979 |
+------+---------------------------+----------------------------------------------------+-------+------------+

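Auto Embeddings understood the concept of "fun creative play" and found adventure gear, a puzzle, and a children's book, all items related to creativity and play!

Behind the scenes

Auto Embeddings rely on:

  • Sentence Transformers for semantic understanding
  • HNSW for fast similarity search
  • Smart caching for efficient inference
  • Multi-provider APIs for flexibility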
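The post does not describe how Manticore's caching works. As a generic illustration only, and not Manticore's implementation, the sketch below memoizes embeddings by input text. A repeated text, such as a popular query, then skips model inference. The embed function is a deterministic stand-in, not a real model.

```python
import hashlib
from functools import lru_cache

model_calls = 0  # counts how often the (fake) model actually runs


@lru_cache(maxsize=10_000)
def embed(text: str) -> tuple:
    """Stand-in embedder: derives a fake 8-dim vector from a SHA-256 hash."""
    global model_calls
    model_calls += 1  # only reached on a cache miss
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return tuple(b / 255 for b in digest[:8])


embed("lightweight laptop backpack")
embed("lightweight laptop backpack")  # served from the cache
embed("usb charger for outdoor camping")
print(model_calls, embed.cache_info().hits)  # 2 1
```

Returning a tuple keeps cached vectors immutable, so one caller cannot corrupt another caller's result.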

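Try it today

As our examples show, Auto Embeddings deliver powerful semantic search with a simple setup. Whether you're building:

  • E-commerce platforms with natural-language product search
  • Content management systems with smart document discovery
  • Recommendation engines that understand user intent
  • Knowledge bases with semantic question answering

Auto Embeddings take care of the hardest part, managing embeddings, so you can focus on building great features your users love.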

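🚀 Ready to transform your search experience?

👉 Download Manticore Search and start building with Auto Embeddings today.
📚 Check out the KNN search documentation for detailed guides.
💬 Join our Slack community to share your success stories.


Questions or feedback? Join our community forum or follow us on Twitter.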

Install Manticore Search
