TL;DR:
- Locally runs Manticore Search on Google Cloud as a core part of its production infrastructure
- Tens of thousands of queries per second across geo search, ecommerce search, aggregations, and vector search
- Migrated from Elasticsearch after long-term challenges with complexity, cost, and indexing speed
- Observed ~16× lower costs compared to equivalent Elasticsearch services
- Leverages GCP Compute Engine, Managed Instance Groups, and HAProxy for horizontal scaling
- Remains stable through peak seasonal traffic with minimal operational overhead
- Recommended by Ben Hirsch, CTO & co-founder of Locally
Context
This article is based on direct input from Ben Hirsch, CTO and co-founder of Locally. It describes how the company runs Manticore Search in production on Google Cloud today, focusing on architectural decisions, GCP service integration, operational experience, and lessons that may be useful to other teams deploying Manticore on GCP.
A Founder’s Perspective on Search as Core Infrastructure
For Locally, search is core infrastructure. The platform powers location-aware ecommerce experiences for tens of thousands of retailers worldwide and sits directly in the critical path of many high-traffic brand and retailer websites. Traffic patterns are highly dynamic and strongly seasonal, which places strict requirements on performance, reliability, and cost control.
From Ben Hirsch’s perspective as both CTO and co-founder, search decisions are long-term infrastructure choices. They influence not only query speed, but also operating costs, engineering productivity, and the ability to scale without adding unnecessary complexity.
Locally had been running Elasticsearch on Google Cloud since its early days in 2013. Over time, the team found that the operational overhead, cost, and indexing performance at scale made it harder to evolve the platform efficiently. This led them to explore alternatives in early 2024, including Manticore Search.
What began as internal experimentation progressed to a production proof-of-concept and later to a broader migration, all within their existing Google Cloud infrastructure.
A High-Throughput, Multi-Purpose Search Platform
Today, Manticore Search is deeply integrated into Locally's production systems and supports multiple workloads.
“Locally processes tens of thousands of Manticore queries per second in production.”
— Ben Hirsch, CTO & Co-founder, Locally
Rather than serving a single purpose, Manticore functions as a general-purpose search layer. It supports aggregation-heavy queries, geo-based lookups, ecommerce search, and vector-powered typeahead.
As Hirsch explains:
“We use it in four distinct ways:
- As a highly available index of common aggregations which are too expensive or slow to query out of our primary, relational database at runtime.
- As a geo-spatial index of location and inventory data used in the critical path of thousands of high-traffic brand and retailer web sites.
- To power an ecommerce search engine, providing a facet-rich, location-specific product availability UI for 50,000+ retailers internationally.
- Using vector embeddings to power a domain-specific, natural language capable typeahead experience for our vast client and shopper user base.”
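The geo and vector workloads above map onto Manticore's SQL dialect. The following queries are purely illustrative sketches — the table names, column names, coordinates, and embedding values are assumptions, not Locally's actual schema:

```sql
-- Illustrative only: schema and values are assumed, not Locally's.

-- Geo lookup: matching inventory within 25 miles of a shopper, nearest first
SELECT id, store_name,
       GEODIST(lat, lon, 29.9511, -90.0715, {in=deg, out=mi}) AS dist
FROM store_inventory
WHERE MATCH('running shoes') AND dist < 25.0
ORDER BY dist ASC
LIMIT 20;

-- Vector typeahead: nearest stored embeddings to a query vector
-- (Manticore's KNN search; a real embedding would have many more dimensions)
SELECT id, suggestion, knn_dist()
FROM typeahead
WHERE knn(embedding, 10, (0.12, -0.03, 0.57, 0.44));
```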
This consolidation reduced the need for multiple specialized systems and simplified how search-related features are built and maintained. Running Manticore on Google Cloud allowed the team to keep search infrastructure aligned with their existing operational tooling and cloud environment.
Why Locally Moved Away from Elasticsearch
Locally’s decision to migrate away from Elasticsearch was informed by more than a decade of production experience. While Elasticsearch met early needs, it became increasingly difficult to operate efficiently at Locally’s scale.
"However, our engineers always found it to be unapproachable and as a result we were never able to get sufficient momentum developing on it. Additionally Elastic was incredibly complicated and expensive to host and maintain and indexing data at our scale was too slow."
When the team began evaluating Manticore Search in early 2024, indexing performance stood out immediately, while query performance remained comparable.
"We began tinkering with Manticore in early 2024 and immediately realized how fast it was - comparable to Elastic at runtime and much faster indexing."
The cost impact became clear after deploying a proof-of-concept.
"We then deployed a proof-of-concept to production and noticed a 16x reduction in cost from equivalent Elasticsearch services."
At that point, Locally decided to migrate its remaining Elasticsearch workloads to Manticore Search.
Deployment on Google Cloud
All of Locally's production infrastructure runs on Google Cloud Platform, which made it the natural environment for deploying Manticore Search. This allows the team to leverage existing cloud infrastructure, monitoring tools, and operational practices.
The team initially evaluated running Manticore on Google Kubernetes Engine (GKE). Due to large index sizes and the nature of their workload, this approach proved impractical.
"We originally tried deploying Manticore to GKE but ultimately scrapped this approach due to the size of some of our indexes."
Locally instead adopted a Compute Engine–based deployment model built around virtual machines, Managed Instance Groups, and external load balancing. This approach proved better suited for their large-scale, stateful search workloads.
"Now we are using Compute instances and multiple instance groups with auto-scaling as well as HA Proxy nodes to distribute index and read loads."
The early rollout included some operational challenges, but the setup stabilized as the team refined its deployment and scaling strategy.
“We have now made it through two holiday seasons with minimal issues and virtually no downtime.”
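Locally's exact HAProxy configuration isn't published, but the pattern of fronting a pool of Manticore nodes is straightforward. A minimal sketch follows — the hostnames and IPs are assumptions, with 9306 being Manticore's default MySQL-protocol port:

```
# haproxy.cfg sketch -- distributes MySQL-protocol traffic across Manticore nodes
frontend manticore_sql
    bind *:9306
    mode tcp
    default_backend manticore_read

backend manticore_read
    mode tcp
    balance leastconn          # favor the least-loaded node
    option tcp-check           # basic per-node TCP health check
    server search-1 10.0.0.11:9306 check
    server search-2 10.0.0.12:9306 check
    server search-3 10.0.0.13:9306 check
```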
Architecture at a Glance
In production, Locally's Manticore setup consists of:
- Multiple purpose-driven Manticore clusters rather than a single monolithic cluster
- Google Cloud Managed Instance Groups for automated scaling and node replacement
- HAProxy nodes running on Compute Engine for distributing indexing and query load
- Horizontal scaling leveraging auto-scaling capabilities combined with Manticore's built-in replication for redundancy
- Custom Grafana dashboards integrated with monitoring for tracking performance and system health
This structure evolved over time as the team adapted the architecture to real traffic patterns and data growth, taking full advantage of Google Cloud's infrastructure automation features.
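Manticore's built-in replication, mentioned above, is driven by a handful of SQL statements. A minimal sketch of how a node brought up by a Managed Instance Group could join an existing cluster — cluster, table, and address names here are assumptions:

```sql
-- On the first node: create a replication cluster and add a table to it
CREATE CLUSTER geo_cluster;
ALTER CLUSTER geo_cluster ADD store_inventory;

-- On a new node (e.g. one launched by a Managed Instance Group):
-- join the existing cluster; its tables replicate over automatically
JOIN CLUSTER geo_cluster AT '10.0.0.11:9312';

-- Writes to replicated tables are prefixed with the cluster name
INSERT INTO geo_cluster:store_inventory (id, store_name) VALUES (1, 'Store One');
```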
Performance, Stability, and Scaling
From a query performance perspective, Manticore has remained consistently fast under load.
“Query performance has been phenomenal and unwavering.”
In terms of stability, the main learning curve was understanding how the system behaves under real-world traffic patterns.
“In terms of stability, it took us some time to get adjusted to how Manticore consumes resources.”
As data volume and traffic increased, early architectural decisions were revisited. Initially, all indexes lived in a single cluster, which proved limiting as the platform grew. The monolithic approach caused new nodes to take a long time to join the cluster and created a single point of failure for many production services.
"However, over time we split the monolith into several, purpose-driven mini clusters (using GCP's instance groups). This allowed us to handle the growth of data and traffic with ease."
Combined with horizontal scaling capabilities and Manticore's built-in replication, this approach resulted in a more efficient and resilient system that can adapt to changing workloads.
Day-to-Day Operations
Today, operating Manticore Search is largely automated. Scaling, node replacement, and load distribution are handled through managed infrastructure services and internal tooling, reducing the need for constant manual intervention.
As Hirsch summarizes:
"The day-to-day operations with Manticore are now very hands-off and fully automated. This is thanks to GCP's Managed Instance Groups auto-scaling, HAProxy Load Balancers (with an in-house dynamic configuration tool we created to monitor the MIG changes), and custom Grafana dashboards for monitoring all of the involved service's health metrics."
This operational model, built on Google Cloud's automation capabilities, allows the team to focus more on product development and less on maintaining search infrastructure. The integration with monitoring, logging, and scaling services creates a cohesive operational experience.
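The dynamic-configuration idea Hirsch describes — watching MIG membership and rewriting HAProxy's backend list — can be sketched as follows. This is not Locally's in-house tool; it is a minimal illustration in which the instance IPs are hardcoded, where a real version would fetch them from the GCP API or `gcloud compute instance-groups managed list-instances`:

```python
# Sketch: regenerate an HAProxy backend stanza from a MIG's current
# instance list. IPs are hardcoded here for illustration only.

def render_backend(name: str, ips: list[str], port: int = 9306) -> str:
    """Render an HAProxy backend stanza with one server line per node."""
    lines = [f"backend {name}", "    mode tcp", "    balance leastconn"]
    for i, ip in enumerate(ips, start=1):
        lines.append(f"    server {name}-{i} {ip}:{port} check")
    return "\n".join(lines)

# Instances currently reported by the (hypothetical) instance group
current_ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
config = render_backend("manticore_read", current_ips)
print(config)
```

After writing the rendered stanza into haproxy.cfg, a graceful reload would pick up the new node set without dropping in-flight connections.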
Conclusion: A Deliberate Infrastructure Choice
For Locally, adopting Manticore Search on Google Cloud was not about chasing new technology. It was about choosing infrastructure that aligned with real production needs: high traffic, seasonal variability, cost sensitivity, and a desire for operational clarity—all within their existing cloud ecosystem.
The migration from Elasticsearch to Manticore Search delivered measurable results: 16× cost reduction, faster indexing, and comparable query performance. But perhaps more importantly, it simplified operations and gave the engineering team a system they could confidently develop on and scale, all while leveraging Google Cloud's robust infrastructure services.
From Ben Hirsch's perspective as both CTO and co-founder, the most important outcome has been confidence in how the system behaves under real-world conditions. After more than a decade operating Elasticsearch in production, the team now has a search infrastructure that is both powerful and approachable.
Today, Manticore Search supports multiple critical workloads across Locally's platform and continues to perform reliably through peak seasonal demand. The architecture has evolved from a single monolithic cluster to purpose-driven mini clusters, demonstrating that the combination of Manticore Search and Google Cloud can adapt as needs grow.
"I would absolutely recommend this setup to other teams, especially if they are seeking to migrate away from ElasticSearch in favor of a system that is faster, easier to scale up and down, and has a low barrier to entry."
— Ben Hirsch, CTO & Co-founder, Locally
Locally's experience illustrates a pragmatic approach to infrastructure: evaluate carefully, adapt the architecture as the system grows, and invest in tools that stay dependable once in place. It also shows how a carefully chosen search stack can become long-term infrastructure when performance, cost, and operational clarity matter.
