TL;DR: Learn how Manticore Search enables text-to-image search by combining natural language processing with vector-based image retrieval. Explore different approaches, from traditional methods to advanced vector search, and try our text-to-image search demo to see it in action.
Introduction
Finding the right image by describing it in natural language — like “a sunset over the beach” — is no longer just a vision of the future. Text-to-image search makes this possible by seamlessly connecting language with visual content.
Traditional methods, such as keyword matching or manual tagging, have their limitations in accuracy and scalability. Modern approaches, like the one powered by Manticore Search, leverage semantic understanding and vector embeddings to provide more precise and context-aware results. Whether you’re creating an e-commerce platform, managing a media library, or building a creative tool, text-to-image search can simplify the way users interact with your content.
Want to see it in action? Try our text-to-image search demo.
Different Approaches to Text-to-Image Search
1. Traditional Keyword-Based Search
Keyword-based search is one of the simplest methods for implementing text-to-image search. To start, you need to set up the appropriate data structure:
CREATE TABLE images (
title text,
description text,
tags text,
path string stored
);
To make this work effectively, you’ll need to manually tag each image first. Once tagged, you can then search using queries like this:
SELECT * FROM images WHERE MATCH('@tags red dress');
You can also search by the title and description you’ve added to your images:
SELECT * FROM images WHERE MATCH('@(title,description) red dress');
This approach matches search terms against text metadata. It is simple, but it requires manual tagging and has several limitations: it misses semantic relationships between terms, cannot understand broader context, and is restricted to literal term matches rather than conceptual similarity.
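To make the limitation concrete, here is a toy in-memory sketch. It is not how Manticore works internally: it only mimics the "every query word must appear" behaviour of a plain keyword match, and the image paths and tags are invented for illustration.

```python
def keyword_match(query: str, tags: str) -> bool:
    """Return True only if every query word appears literally in the tags."""
    tag_words = set(tags.lower().split())
    return all(word in tag_words for word in query.lower().split())


# Hypothetical tagged images (path -> manually assigned tags).
images = {
    "/images/red_dress.jpg": "red dress summer",
    "/images/scarlet_gown.jpg": "scarlet gown evening",
}


def search(query: str) -> list:
    """Return the paths whose tags contain all words of the query."""
    return [path for path, tags in images.items() if keyword_match(query, tags)]
```

Searching for "red dress" finds the first image, but "crimson dress" finds nothing, and the scarlet gown is never returned for "red dress" even though a person would consider it relevant.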
2. Categorization-Based Search
To enhance search functionality for products, you can implement categorization. This approach allows you to filter images based on category selection while simultaneously applying full-text search to custom attributes. This dual-search method helps users find exactly what they need more efficiently.
You need to sort your images into distinct categories yourself. Here’s how the table structure might look:
CREATE TABLE images (
category_id int,
subcategory_id int,
attributes json,
path string stored
);
This method organizes images into predefined categories, providing a more structured approach than keyword-based search while being easier to navigate. However, it is limited by rigid categorization and requires manual classification of images into specific groups.
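A category filter against this table could be assembled as follows. This is a minimal sketch under assumptions: the helper name `build_category_query` is hypothetical, and the query uses only the `category_id`, `subcategory_id`, and `path` columns from the table above.

```python
from typing import Optional


def build_category_query(category_id: int,
                         subcategory_id: Optional[int] = None,
                         limit: int = 20) -> str:
    """Build a SELECT that filters the images table by category."""
    # int() raises ValueError on non-numeric strings, so only numbers
    # are ever interpolated into the SQL text.
    conditions = [f"category_id = {int(category_id)}"]
    if subcategory_id is not None:
        conditions.append(f"subcategory_id = {int(subcategory_id)}")
    where = " AND ".join(conditions)
    return f"SELECT path FROM images WHERE {where} LIMIT {int(limit)}"
```

Calling `build_category_query(3, 12, 5)` yields a query restricted to one category and one subcategory. The rigidity described above shows up here too: an image is only found if someone assigned it to exactly those IDs.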
3. Using Manticore Search with Image Captions
There are many AI models capable of generating image captions. While these captions may not be polished enough for customer-facing content, they can be valuable for enhancing search functionality. This approach complements traditional keyword-based methods, offering an additional layer of search capabilities.
No matter how you organize your images - whether through categories or manual tagging - you’ll need to utilize full-text search operators to effectively find images by searching through those populated fields.
Let’s examine how the workflow might unfold if you choose to implement captioning. The examples assume the images table has auto_caption and indexed_caption text fields:
- Insert your data with properly formatted captions:
INSERT INTO images (id, auto_caption, indexed_caption, path) VALUES
(1, 'A person walking on a beach during sunset', 'beach sunset walking person', '/images/sunset_beach.jpg'),
(2, 'Red car parked in an urban setting', 'car red urban parking', '/images/red_car.jpg');
- Perform basic text searches using the indexed_caption field:
SELECT * FROM images WHERE MATCH('@indexed_caption beach sunset');
- Use advanced search operators for more precise results:
-- Phrase matching
SELECT * FROM images WHERE MATCH('"red car"');
-- Boolean operators
SELECT * FROM images WHERE MATCH('@indexed_caption (beach | sunset) -night');
Key considerations for optimal search:
- Pre-process captions to remove stop words
- Use stemming for better matching
- Consider synonyms for common terms
- Implement relevance scoring for better results
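The first consideration, pre-processing captions, can be sketched in a few lines of Python. This is an illustration only: the stop-word list is deliberately tiny, and the function name `to_indexed_caption` is hypothetical. Manticore can also apply stop words and stemming at indexing time through table settings such as `stopwords` and `morphology`, which may make client-side pre-processing unnecessary.

```python
import re

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "on", "in", "at", "of", "during", "with", "and"}


def to_indexed_caption(caption: str) -> str:
    """Lowercase a caption, drop stop words and duplicates, keep word order."""
    words = re.findall(r"[a-z0-9]+", caption.lower())
    seen = set()
    kept = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)
```

For the first caption in the INSERT example, this produces `person walking beach sunset`, which contains the same words as the hand-written `indexed_caption` value, only in caption order.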
You can also combine full-text search with additional filters:
SELECT path, WEIGHT() as relevance
FROM images
WHERE MATCH('@indexed_caption sunset')
AND category_id = 1
ORDER BY relevance DESC
LIMIT 10;
4. Vector-Based Search (Recommended Approach)
As we’ve explored, traditional methods like keyword-based or categorization-based search, along with enhanced options using captions, offer varying levels of efficiency and complexity. While these approaches can be valuable in specific contexts, they often fall short in handling nuanced or ambiguous queries.
To overcome these limitations, vector-based search with Manticore Search offers a powerful alternative. By using vector embeddings, this approach bridges the gap between text and images, enabling semantic understanding and precise matching. Let’s break down how it works and the steps to implement it.
How Vector-Based Search Works
Vector-based search relies on embeddings — numerical representations of data that encode semantic meaning. A multimodal model processes both text and image inputs to generate these embeddings, allowing the system to match queries and images based on meaning rather than literal keywords. The workflow includes:
- Generating embeddings for both images and text using a machine learning model.
- Storing these embeddings in a database optimized for similarity search.
- Querying the database to find the most relevant matches for a given input.
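The matching step in this workflow comes down to comparing vectors. Below is a brute-force sketch with made-up 3-dimensional vectors; real embeddings from a CLIP-style model have hundreds of dimensions, and Manticore uses an HNSW index instead of scanning every vector. The names `image_vectors` and `nearest_images` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" standing in for model output.
image_vectors = {
    "/images/sunset_beach.jpg": [0.9, 0.1, 0.2],
    "/images/red_car.jpg": [0.1, 0.9, 0.3],
}


def nearest_images(query_vector, k=1):
    """Return the k image paths most similar to the query vector."""
    ranked = sorted(
        image_vectors,
        key=lambda path: cosine_similarity(query_vector, image_vectors[path]),
        reverse=True,
    )
    return ranked[:k]
```

A query vector such as `[0.8, 0.2, 0.1]` points in nearly the same direction as the beach image's vector, so that image ranks first. No keyword has to match: closeness in the embedding space is the only criterion.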
To help visualize this process, the following diagram illustrates how text and images are converted into embeddings and matched through Manticore Search:
Setting Up Vector-Based Search with Manticore
Here’s how you can implement vector-based search step by step:
- Set Up Your Multimodal Model
Use a model like TinyCLIP, a lightweight version of CLIP, to generate embeddings for images and text. Load the model and its processor:
from transformers import CLIPProcessor, CLIPModel
clip_model = CLIPModel.from_pretrained("wkcn/TinyCLIP-ViT-61M-32-Text-29M-LAION400M")
clip_processor = CLIPProcessor.from_pretrained("wkcn/TinyCLIP-ViT-61M-32-Text-29M-LAION400M")
- Create the Database Structure
Set up a table in Manticore Search to store embeddings:
CREATE TABLE images (
id bigint,
image_path text,
embeddings float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='COSINE'
);
- Generate and Insert Embeddings
Process your images through the model to create embeddings and store them in the database. For guidance on implementing this step, refer to the load-dataset script on GitHub. While this script is not meant for direct use, it serves as a detailed reference for creating embeddings and inserting them into the database.
- Query the Database
Embed the input with the same model, then run a KNN query against the embeddings field to retrieve the 10 nearest matches:
$embeddings = $Embed->getImageEmbeddings($image->getPath());
$query = new Manticoresearch\Query\KnnQuery('embeddings', $embeddings, 10);
$docs = $client->index('images')->search($query)->get();
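The insert and query steps can also be expressed as plain SQL. The sketch below builds both statements in Python, assuming the `images` table defined above and Manticore's SQL `knn()` and `knn_dist()` functions. The helper names are hypothetical, the backslash escaping of the path is an assumption about string literals, and the short vectors are for readability only: real vectors must have the 512 dimensions declared in `knn_dims`.

```python
def format_vector(vector):
    """Render a list of floats as a parenthesised, comma-separated SQL tuple."""
    return "(" + ", ".join(f"{float(x):.6f}" for x in vector) + ")"


def build_insert(doc_id, image_path, vector):
    """Build an INSERT statement storing one image path and its embedding."""
    # Escape backslashes and single quotes so the path stays one string literal.
    safe_path = image_path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "INSERT INTO images (id, image_path, embeddings) "
        f"VALUES ({int(doc_id)}, '{safe_path}', {format_vector(vector)})"
    )


def build_knn_query(vector, k=10):
    """Build a SELECT returning the k nearest images and their distances."""
    return (
        "SELECT id, image_path, knn_dist() FROM images "
        f"WHERE knn(embeddings, {int(k)}, {format_vector(vector)})"
    )
```

The resulting strings can be sent through any client that speaks Manticore's SQL interface. For text-to-image search, the vector passed to `build_knn_query` would be the embedding of the user's text, produced by the same model that embedded the images.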
Advantages of Vector-Based Search
By implementing vector-based image search with Manticore, you gain several key benefits:
- Semantic Understanding: Matches queries and images based on their meaning rather than exact keywords.
- Visual Concept Matching: Accurately captures complex visual ideas and aligns them with textual descriptions.
- Natural Language Compatibility: Handles natural language queries with ease, enabling intuitive searches.
- Enhanced Accuracy: Tends to deliver more precise results than metadata-based methods, even for abstract or nuanced inputs.
- Scalability: Reduces manual effort and works seamlessly with large datasets.
Comparing Approaches
Here’s how different methods perform in real-world scenarios:
Approach | Description | Accuracy | Setup Complexity | Maintenance | Query Speed |
---|---|---|---|---|---|
Keyword-based | Matches user queries to manually added tags or metadata. Simple but lacks semantic understanding. | Low | Low | High | Fast |
Categorization-based | Organizes images into predefined categories and allows structured filtering. | Medium | Medium | High | Fast |
Using captions | Uses AI-generated or manually created captions for enhanced text-based search capabilities. | Medium | High | Medium | Fast |
Vector-based | Leverages embeddings for semantic search, connecting text and images meaningfully. | High | High | Low | Medium |
Try the Demo on GitHub
Curious to see how it all comes together? We’ve open-sourced our demo, so you can explore and even build your own implementation. Visit the GitHub repository for the full source code: Manticore Image Search Demo.
The repository includes detailed instructions and sample scripts to help you set up and customize your text-to-image search application. Whether you’re experimenting with the demo or planning a production-ready solution, this is a great starting point.
Conclusion
Vector-based text-to-image search using Manticore Search offers significant advantages over traditional approaches. It provides:
- Superior accuracy in matching user queries with relevant images
- No need for manual tagging or captioning
- Better understanding of semantic relationships
- Scalable performance for growing image collections
While other methods like keyword-based or categorization approaches can be useful for specific use cases, vector-based search represents the most advanced and efficient solution for modern text-to-image search requirements.
Try our demo at image.manticoresearch.com or dive into the source code on GitHub: Manticore Image Search Demo.