Introducing Auto Embeddings: AI-Powered Search Made Simple

We’re excited to share a new feature that makes building semantic search apps as simple as writing SQL: Auto Embeddings.
With this addition, Manticore Search takes care of embedding generation for you—no extra pipelines, no external services, no hassle.

The Challenge Before

Until now, semantic search often meant wrestling with:

  • Setting up separate ML pipelines for embedding generation
  • Managing models and their dependencies
  • Syncing your app, embedding service, and search engine
  • Handling vector dimension mismatches and preprocessing
  • Making sure embeddings are always generated the same way

That overhead is now gone.

What Are Auto Embeddings?

With Auto Embeddings, you just insert text. Manticore automatically:

  • Generates embeddings with state-of-the-art models
  • Stores them efficiently in vector indexes
  • Lets you query in natural language
  • Hides the complexity so you can focus on features, not infrastructure

How It Works

Build a semantic search app in 3 steps:

1. Create a Table (SQL Example)

CREATE TABLE products (
    title TEXT,
    description TEXT,
    category STRING,
    price INT,
    vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
        MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
        FROM='title,description'
);

That single column definition is all the configuration needed: Manticore generates embeddings from title and description using the model named in MODEL_NAME.

2. Insert Data (SQL Example)

INSERT INTO products(id, title, description, category, price) VALUES
  (1, 'green hiking backpack', 'Lightweight backpack suitable for hiking trails', 'outdoors', 5999),
  (2, 'laptop sleeve', 'Slim padded case for 15-inch laptops', 'electronics', 1999),
  (3, 'travel daypack', 'Compact daypack perfect for light travel or hiking', 'luggage', 3999),
  (4, 'black laptop backpack', 'Spacious backpack with padded laptop compartment', 'electronics', 6900),
  (5, 'mountain hiking bag', 'Durable trail-ready backpack for mountain hikes', 'outdoors', 8950),
  (6, 'everyday backpack', 'Versatile backpack for work, gym and school', 'general', 4900),
  (7, 'trail running shoes', 'Lightweight shoes with great grip for trails', 'footwear', 7500),
  (8, 'camping gear set', 'Complete set for weekend camping adventures', 'outdoors', 12000),
  (9, 'outdoor laptop pack', 'Trail-optimized backpack with laptop sleeve', 'outdoors', 7800),
  (10, 'compact hiking backpack', 'Light and foldable backpack for trail hikes', 'outdoors', 4200),
  (11, 'portable solar charger', 'Foldable solar panel charger for phones and USB devices', 'electronics', 3400),
  (12, 'reusable water bottle', 'Insulated stainless steel bottle keeps drinks cold or hot', 'lifestyle', 2500),
  (13, 'noise-cancelling headphones', 'Over-ear headphones with noise cancellation', 'electronics', 13900),
  (14, 'organic trail mix', 'Healthy mix of nuts and dried fruit, ideal for hikes', 'food', 899),
  (15, 'wireless mouse', 'Compact wireless mouse for laptops and desktops', 'electronics', 1599),
  (16, 'office chair', 'Ergonomic office chair with lumbar support and mesh back', 'furniture', 27900),
  (17, 'notebook and pen set', 'Elegant A5 notebook with smooth-writing pen', 'stationery', 1200),
  (18, 'children\'s adventure book', 'Illustrated storybook about outdoor exploration', 'books', 1299),
  (19, 'mini drone', 'Lightweight drone with HD camera and remote control', 'gadgets', 4599),
  (20, 'wooden puzzle box', 'Challenging mechanical puzzle made of natural wood', 'toys', 1899);

This diverse dataset spans outdoors, electronics, furniture, books, toys, and more. Notice: no vectors needed. All embeddings are generated automatically from the text.

Note: Prices are in cents (e.g., 5999 = $59.99).

3. Search with Natural Language (SQL Example)

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'lightweight laptop backpack for trail hiking')
LIMIT 5;

Results:

+------+-------------------------+--------------------------------------------------+-------+------------+
| id   | title                   | description                                      | price | knn_dist() |
+------+-------------------------+--------------------------------------------------+-------+------------+
|    9 | outdoor laptop pack     | Trail-optimized backpack with laptop sleeve      |  7800 | 0.35392243 |
|    1 | green hiking backpack   | Lightweight backpack suitable for hiking trails  |  5999 | 0.53113687 |
|    5 | mountain hiking bag     | Durable trail-ready backpack for mountain hikes  |  8950 | 0.62034285 |
|    4 | black laptop backpack   | Spacious backpack with padded laptop compartment |  6900 | 0.65785009 |
|   10 | compact hiking backpack | Light and foldable backpack for trail hikes      |  4200 | 0.68591022 |
+------+-------------------------+--------------------------------------------------+-------+------------+

The query “lightweight laptop backpack for trail hiking” returned the most relevant item first: the “outdoor laptop pack”, which combines laptop and trail features. It is followed by hiking backpacks and a laptop-oriented backpack. Lower knn_dist() values mean a closer match.
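
To make the knn_dist() column more concrete, here is a toy Python sketch of nearest-neighbour ranking under an L2 metric. The 3-dimensional vectors are made up for illustration (real ones come from the embedding model), and this post does not say whether Manticore reports plain or squared L2 distance, so the squared form used here is an assumption.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query_vec, docs, k):
    """Return the k (doc_id, distance) pairs closest to query_vec, nearest first."""
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical embeddings, hand-picked for illustration only.
toy_docs = {
    9: [0.9, 0.8, 0.1],   # stands in for "outdoor laptop pack"
    1: [0.2, 0.9, 0.1],   # stands in for "green hiking backpack"
    16: [0.1, 0.0, 0.9],  # stands in for "office chair"
}
toy_query = [0.8, 0.9, 0.0]  # stands in for the embedded query text

top = nearest(toy_query, toy_docs, k=2)
print(top)
```

The real index uses HNSW to avoid comparing the query against every row; the exhaustive loop above only illustrates what "closest" means.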

Pick the Right Model

You can choose different models depending on your needs:

  • 🏠 Local (Hugging Face models) — no API keys, unlimited use
  • 🌐 OpenAI models — best-in-class semantic quality
  • 🚀 Voyage & Jina models — domain- and language-optimized

Hybrid Search & Filtering (SQL Example)

Combine semantic, keyword, and structured filters in one query:

SELECT id, price, highlight()
FROM products
WHERE knn(vector, 7, 'lightweight laptop backpack for trail hiking')
  AND category = 'outdoors'
  AND MATCH('"lightweight laptop backpack for trail hiking"/0.5');

Results:

+------+-------+-----------------------------------------------------------------------------------------------+
| id   | price | highlight()                                                                                   |
+------+-------+-----------------------------------------------------------------------------------------------+
|    9 |  7800 | outdoor <b>laptop</b> pack | <b>Trail</b>-optimized <b>backpack</b> with <b>laptop</b> sleeve |
|    1 |  5999 | green <b>hiking backpack</b> | <b>Lightweight backpack</b> suitable <b>for hiking</b> trails  |
|    5 |  8950 | mountain <b>hiking</b> bag | Durable <b>trail</b>-ready <b>backpack for</b> mountain hikes    |
|   10 |  4200 | compact <b>hiking backpack</b> | Light and foldable <b>backpack for trail</b> hikes           |
+------+-------+-----------------------------------------------------------------------------------------------+

Note: highlight() returns markup (e.g., <b>...</b>).

This powerful combination filters by category (outdoors), ensures semantic relevance through embeddings, requires text-level keyword matches, and highlights the matching terms — all in one query!
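
The "/0.5" suffix in the MATCH clause is a quorum threshold: roughly, at least half of the query words must appear in a document. The Python sketch below approximates that check. It is a simplification under stated assumptions: the tokenizer is a plain split on non-alphanumeric characters, and the rounding rule for the threshold is a guess, so Manticore's real matching (tokenization, morphology, stopwords) may behave differently.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def passes_quorum(query, document, fraction):
    """True if at least `fraction` of the distinct query words occur in the document."""
    query_words = set(tokenize(query))
    doc_words = set(tokenize(document))
    required = math.ceil(len(query_words) * fraction)  # rounding rule is an assumption
    return len(query_words & doc_words) >= required


query = "lightweight laptop backpack for trail hiking"
doc_9 = "outdoor laptop pack Trail-optimized backpack with laptop sleeve"
doc_8 = "camping gear set Complete set for weekend camping adventures"

print(passes_quorum(query, doc_9, 0.5))  # laptop, trail, backpack: 3 of 6 words
print(passes_quorum(query, doc_8, 0.5))  # only "for" matches
```

Under these assumptions, document 8 is in the 'outdoors' category but fails the keyword requirement, which is consistent with its absence from the results above.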

Complete HTTP/JSON API Support

Auto Embeddings work seamlessly with Manticore’s HTTP/JSON API, providing the same functionality as SQL but through REST endpoints.

Inserting Data via JSON (HTTP/JSON API Example)

Use the /insert endpoint; embeddings are generated automatically:

curl "http://localhost:9308/insert" -H "Content-Type: application/json" \
  -d '{
    "table": "products", 
    "id": 21, 
    "doc": {
      "title": "wireless headphones", 
      "description": "Bluetooth headphones with noise cancellation", 
      "category": "electronics", 
      "price": 15900
    }
  }'

Response:

{
  "table": "products",
  "id": 21,
  "created": true,
  "result": "created",
  "status": 201
}
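
The same request can be sent from application code. Here is a minimal Python sketch using only the standard library; the helper names (build_insert_payload, insert_document) are hypothetical, and the URL simply mirrors the curl example. Note that the payload carries no vector field.

```python
import json
import urllib.request

MANTICORE_URL = "http://localhost:9308"  # same host and port as the curl example


def build_insert_payload(table, doc_id, doc):
    """Return the JSON body for /insert; the document contains no vector field."""
    return json.dumps({"table": table, "id": doc_id, "doc": doc})


def insert_document(table, doc_id, doc):
    """POST one document to /insert and return the decoded JSON response."""
    request = urllib.request.Request(
        MANTICORE_URL + "/insert",
        data=build_insert_payload(table, doc_id, doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_insert_payload(
    "products",
    21,
    {
        "title": "wireless headphones",
        "description": "Bluetooth headphones with noise cancellation",
        "category": "electronics",
        "price": 15900,
    },
)
print(payload)
```

Calling insert_document(...) requires a running Manticore instance with the products table from step 1; building the payload does not.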

Bulk Inserts with Auto Embeddings (HTTP/JSON API Example)

Insert multiple documents efficiently using /bulk:

curl "http://localhost:9308/bulk" -H "Content-Type: application/x-ndjson" \
  --data-raw $'{"insert": {"table": "products", "id": 22, "doc": {"title": "gaming laptop", "description": "High-performance laptop for gaming and work", "category": "electronics", "price": 159900}}}
{"insert": {"table": "products", "id": 23, "doc": {"title": "smartphone", "description": "Latest flagship smartphone with 5G", "category": "electronics", "price": 89900}}}
{"insert": {"table": "products", "id": 24, "doc": {"title": "tablet computer", "description": "Lightweight tablet for work and entertainment", "category": "electronics", "price": 49900}}}'

Response:

{
  "items": [
    {
      "bulk": {
        "table": "products",
        "_id": 24,
        "created": 3,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 3,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}

The bulk operation successfully inserted 3 documents with auto-generated embeddings.
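
The /bulk body is newline-delimited JSON: one action object per line. A small Python sketch of building such a body is shown below; build_bulk_body is a hypothetical helper, and ending the body with a trailing newline follows common NDJSON practice rather than anything stated in this post.

```python
import json


def build_bulk_body(table, docs):
    """Serialize (doc_id, doc) pairs as NDJSON: one insert action per line."""
    lines = [
        json.dumps({"insert": {"table": table, "id": doc_id, "doc": doc}})
        for doc_id, doc in docs
    ]
    return "\n".join(lines) + "\n"  # trailing newline is an assumption


body = build_bulk_body(
    "products",
    [
        (22, {"title": "gaming laptop", "price": 159900}),
        (23, {"title": "smartphone", "price": 89900}),
    ],
)
print(body)
```

Because json.dumps never emits raw newlines inside an object, each document stays on exactly one line.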

Semantic Search via JSON (HTTP/JSON API Example)

Search with natural language queries using /search:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title"],
    "size": 5,
    "knn": {
      "field": "vector",
      "query": "outdoor hiking adventure",
      "k": 3
    }
  }'

Response:

{
  "took": 8,
  "timed_out": false,
  "hits": {
    "total": 24,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 18,
        "_score": 1,
        "_knn_dist": 0.75467718,
        "_source": {
          "title": "children's adventure book"
        }
      },
      {
        "_id": 1,
        "_score": 1,
        "_knn_dist": 0.83226496,
        "_source": {
          "title": "green hiking backpack"
        }
      },
      {
        "_id": 5,
        "_score": 1,
        "_knn_dist": 0.89348459,
        "_source": {
          "title": "mountain hiking bag"
        }
      },
      {
        "_id": 10,
        "_score": 1,
        "_knn_dist": 0.92611158,
        "_source": {
          "title": "compact hiking backpack"
        }
      },
      {
        "_id": 3,
        "_score": 1,
        "_knn_dist": 0.98721427,
        "_source": {
          "title": "travel daypack"
        }
      }
    ]
  }
}

The query “outdoor hiking adventure” found the most relevant match to be the “children’s adventure book” (0.754 distance), followed by hiking-related backpacks. This shows how semantic search can find conceptually related items beyond just literal keyword matches.
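
In application code you will usually reduce the response to the fields you need. The following Python sketch extracts id, title, and distance from a response shaped like the one above; summarize_hits is a hypothetical helper, and the sample is a trimmed copy of that response.

```python
import json


def summarize_hits(response_text):
    """Return (id, title, distance) tuples from a /search response body."""
    response = json.loads(response_text)
    return [
        (hit["_id"], hit["_source"]["title"], hit["_knn_dist"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the response shown above (first two hits only).
sample = """
{"took": 8, "timed_out": false,
 "hits": {"total": 24, "total_relation": "eq", "hits": [
   {"_id": 18, "_score": 1, "_knn_dist": 0.75467718,
    "_source": {"title": "children's adventure book"}},
   {"_id": 1, "_score": 1, "_knn_dist": 0.83226496,
    "_source": {"title": "green hiking backpack"}}
 ]}}
"""

for doc_id, title, distance in summarize_hits(sample):
    print(f"{doc_id:>3}  {distance:.3f}  {title}")
```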

Filtering and Hybrid Search via JSON (HTTP/JSON API Example)

Combine semantic search with traditional filters:

curl "http://localhost:9308/search" -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "_source": ["title", "price"],
    "size": 5,
    "knn": {
      "field": "vector", 
      "query": "technology electronic device",
      "k": 5,
      "filter": {
        "range": {"price": {"gte": 15000}}
      }
    }
  }'

Response:

{
  "took": 10,
  "timed_out": false,
  "hits": {
    "total": 5,
    "total_relation": "eq",
    "hits": [
      {
        "_id": 24,
        "_score": 1,
        "_knn_dist": 1.31113040,
        "_source": {
          "title": "tablet computer",
          "price": 49900
        }
      },
      {
        "_id": 23,
        "_score": 1,
        "_knn_dist": 1.56920886,
        "_source": {
          "title": "smartphone",
          "price": 89900
        }
      },
      {
        "_id": 22,
        "_score": 1,
        "_knn_dist": 1.59042466,
        "_source": {
          "title": "gaming laptop",
          "price": 159900
        }
      },
      {
        "_id": 16,
        "_score": 1,
        "_knn_dist": 1.84979212,
        "_source": {
          "title": "office chair",
          "price": 27900
        }
      },
      {
        "_id": 21,
        "_score": 1,
        "_knn_dist": 1.88567829,
        "_source": {
          "title": "wireless headphones",
          "price": 15900
        }
      }
    ]
  }
}

The search for “technology electronic device” with a price filter (price ≥ 15000 cents, i.e. $150) ranked electronics first and excluded lower-priced products such as the hiking backpacks and cheaper electronics. “tablet computer” ranks highest because it is the closest semantic match to the query.
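
Since prices are stored in cents, it is easy to pass a dollar amount to the filter by mistake. Here is a Python sketch that builds the same request body from a dollar threshold; both function names are hypothetical, and the body layout simply mirrors the curl example above.

```python
import json


def dollars_to_cents(amount):
    """Convert a dollar amount to integer cents, the unit stored in `price`."""
    return round(amount * 100)


def knn_query_with_min_price(table, text, k, min_price_dollars):
    """Build a /search body: KNN on `vector`, filtered to price >= the minimum."""
    return {
        "table": table,
        "_source": ["title", "price"],
        "size": k,
        "knn": {
            "field": "vector",
            "query": text,
            "k": k,
            "filter": {
                "range": {"price": {"gte": dollars_to_cents(min_price_dollars)}}
            },
        },
    }


body = knn_query_with_min_price("products", "technology electronic device", 5, 150)
print(json.dumps(body, indent=2))
```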

Direct Vector vs Auto-Embedded Text Queries

The HTTP/JSON API supports both:

  • Auto-embedded text queries: "query": "outdoor hiking adventure" (the text is embedded at query time)
  • Direct vector queries: "query": [0.1, 0.2, 0.3, ...] (pre-computed vector)

This flexibility allows you to mix auto-generated embeddings with custom vectors in the same application.
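
One way to support both query styles in a single code path is sketched below in Python. knn_clause is a hypothetical helper; it follows the key names shown in this post and only validates the input shape. It does not check that a vector has the dimension the table's model expects.

```python
def knn_clause(field, query, k):
    """Build a `knn` clause from either query text or a pre-computed vector."""
    if isinstance(query, str):
        if not query.strip():
            raise ValueError("query text must not be empty")
        value = query  # embedded by the server at query time
    elif isinstance(query, (list, tuple)):
        if not query or not all(isinstance(x, (int, float)) for x in query):
            raise ValueError("query vector must be a non-empty sequence of numbers")
        value = [float(x) for x in query]  # sent as-is, no embedding step
    else:
        raise TypeError("query must be a string or a sequence of numbers")
    return {"field": field, "query": value, "k": k}


text_clause = knn_clause("vector", "outdoor hiking adventure", 3)
vector_clause = knn_clause("vector", [0.1, 0.2, 0.3], 3)  # toy 3-d vector
print(text_clause)
print(vector_clause)
```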

OpenAI Integration (OpenAI API Example)

For even better semantic understanding, you can use OpenAI’s embedding models:

-- Create table with OpenAI embeddings
CREATE TABLE products_openai (
  title TEXT,
  description TEXT,
  category STRING,
  price INT,
  vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002'
    FROM='title, description'
    API_KEY='your-openai-api-key'
);

-- Insert data (embeddings generated via OpenAI API)
INSERT INTO products_openai(title, description, category, price) VALUES
  ('smartphone device', 'latest mobile technology with advanced features', 'electronics', 79900),
  ('laptop computer', 'portable workstation for developers and professionals', 'electronics', 129900);

-- Search with natural language
SELECT id, title, description, knn_dist()
FROM products_openai 
WHERE knn(vector, 2, 'mobile phone technology');

Results:

+---------------------+-------------------+-------------------------------------------------------+------------+
| id                  | title             | description                                           | knn_dist() |
+---------------------+-------------------+-------------------------------------------------------+------------+
| 2309215617435041807 | smartphone device | latest mobile technology with advanced features       | 0.20333229 |
| 2309215617435041808 | laptop computer   | portable workstation for developers and professionals | 0.40197325 |
+---------------------+-------------------+-------------------------------------------------------+------------+

OpenAI’s models excel at understanding nuanced relationships — “mobile phone technology” correctly identified the smartphone as much more relevant than the laptop.

Built for Production

  • Fast: HNSW indexing, optional quantization, optimized storage
  • 🛡️ Reliable: multiple model providers, empty-vector handling
  • 🔧 Flexible: embed from any field(s) you choose

Use Cases

Auto Embeddings make it easy to build:

  • 🛍️ E-commerce search: “waterproof hiking boots” → finds relevant products
  • 📚 Document discovery: “contracts about data privacy” → surfaces legal docs
  • 🎵 Content recommendations: “upbeat music for workouts” → matches by vibe
  • 🏠 Real estate search: “cozy apartments near parks” → finds lifestyle-fit homes

More Real-World Examples

Let’s see Auto Embeddings in action with different search scenarios:

Finding Work & Productivity Items

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'work productivity office')
LIMIT 3;

Results:

+------+----------------------+----------------------------------------------------------+-------+------------+
| id   | title                | description                                              | price | knn_dist() |
+------+----------------------+----------------------------------------------------------+-------+------------+
|   24 | tablet computer      | Lightweight tablet for work and entertainment            | 49900 |   1.306459 |
|   16 | office chair         | Ergonomic office chair with lumbar support and mesh back | 27900 | 1.44871426 |
|   17 | notebook and pen set | Elegant A5 notebook with smooth-writing pen              |  1200 | 1.48466742 |
+------+----------------------+----------------------------------------------------------+-------+------------+

The search understood “work productivity office” and returned a tablet described as suitable for work, an office chair, and a stationery set.

Smart Category Filtering

Sometimes semantic search is too broad. Let’s search for “usb charger for outdoor camping”:

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping');

Top results include many items: solar charger (0.888), outdoor packs (1.139), hiking gear (1.213), etc.

But when we add category filtering:

SELECT id, highlight()
FROM products 
WHERE knn(vector, 5, 'usb charger for outdoor camping')
  AND category = 'electronics'
  AND MATCH('"usb charger for outdoor camping"/0.5')
LIMIT 3;

Precise result:

+------+-------------------------------------------------------------------------------------------------------+
| id   | highlight()                                                                                           |
+------+-------------------------------------------------------------------------------------------------------+
|   11 | portable solar <b>charger</b> | Foldable solar panel <b>charger for</b> phones and <b>USB</b> devices |
+------+-------------------------------------------------------------------------------------------------------+

Note: highlight() returns markup (e.g., <b>...</b>). Bold in the table is for readability.

The combination of semantic understanding + category filtering + keyword matching gave us exactly what we wanted!

Finding Fun & Creative Items

SELECT id, title, description, price, knn_dist()
FROM products 
WHERE knn(vector, 3, 'fun creative play toys')
LIMIT 3;

Results:

+------+---------------------------+----------------------------------------------------+-------+------------+
| id   | title                     | description                                        | price | knn_dist() |
+------+---------------------------+----------------------------------------------------+-------+------------+
|    8 | camping gear set          | Complete set for weekend camping adventures        | 12000 | 1.30462146 |
|   20 | wooden puzzle box         | Challenging mechanical puzzle made of natural wood |  1899 |   1.305056 |
|   18 | children's adventure book | Illustrated storybook about outdoor exploration    |  1299 | 1.47192979 |
+------+---------------------------+----------------------------------------------------+-------+------------+

Auto Embeddings understood the concept of “fun creative play” and found adventure gear, puzzles, and children’s books—all items that relate to creativity and play!

Behind the Scenes

Auto Embeddings rely on:

  • Sentence Transformers for semantic understanding
  • HNSW for fast similarity search
  • Smart caching for efficient inference
  • Multi-provider APIs for flexibility

Try It Today

As you’ve seen from our examples, Auto Embeddings deliver powerful semantic search capabilities with minimal setup. Whether you’re building:

  • E-commerce platforms with natural language product search
  • Content management systems with intelligent document discovery
  • Recommendation engines that understand user intent
  • Knowledge bases with semantic question answering

Auto Embeddings remove the hardest part — managing embeddings — so you can focus on building great features that users love.

🚀 Ready to transform your search experience?

👉 Download Manticore Search and start building with Auto Embeddings today.
📚 Check out the KNN search documentation for detailed guides.
💬 Join our Slack community to share your success stories.


Questions or feedback? Join our community forum or follow us on Twitter.

Install Manticore Search
