A Glimpse into ClauseBase
ClauseBase was founded by three former lawyers frustrated with how tedious contract drafting is. They saw that legal teams and law firms recreate contracts in an extremely labor-intensive way, which drives up the cost of legal services. Drafting typically starts from an old file, cleaned up by removing old names, followed by a lengthy process of adding, editing, and rearranging clauses. Copy-pasting clauses from older contracts wastes a lot of time, especially since the relevant clauses are often scattered across old files and email chains.
To address this, ClauseBase created Clause9, a tool for automating the drafting of full legal documents. Instead of spending hours on manual drafting, lawyers can produce sophisticated documents with just a few clicks. ClauseBuddy, their second product, lets users build a “clause library” to collect and reuse useful contract elements. Originally, these clauses had to be uploaded manually. Recently, ClauseBase added a feature that automatically extracts clauses from old contracts and organizes them into a library stored with Manticore Search.
Vector Search for Legal Clauses
ClauseBase wanted a way for lawyers to easily find alternate versions of contract clauses, whether for drafting or during negotiations. For any given clause, there are countless variations: different lengths, different tones (neutral, aggressive, or industry-specific), and so on. Lawyers needed a way to quickly browse these variations to avoid reinventing the wheel and find inspiration.
A manually curated clause library with detailed metadata is the ideal, but creating one takes time. Many lawyers are just beginning their journey with technology and are looking for an easier way to find similar clauses. Vector search provides that: a text vector is stored for each clause in Manticore, so lawyers can quickly search for clauses with similar meaning without any manual tagging. It’s the next best thing to having perfectly annotated clauses.
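To make the idea concrete, here is a minimal sketch of similarity search over clause vectors in plain Python. It is an illustration only, not ClauseBase's or Manticore's implementation: the clause names and the three-dimensional vectors are made up, whereas real embeddings come from a language model and have hundreds of dimensions. Cosine similarity is assumed as the distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-made "embeddings"; a real system would compute these
# with an embedding model and store them alongside each clause.
clause_vectors = {
    "limitation_of_liability_short": [0.9, 0.1, 0.0],
    "limitation_of_liability_long": [0.8, 0.2, 0.1],
    "governing_law": [0.0, 0.1, 0.9],
}

def most_similar(query_vector, vectors, k=2):
    """Return the k (name, similarity) pairs closest to the query vector."""
    scored = [(name, cosine_similarity(query_vector, vec))
              for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# A query vector that sits close to the two liability clauses.
results = most_similar([0.85, 0.15, 0.05], clause_vectors)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Both liability clauses come back ahead of the governing-law clause even though nobody tagged them as related, which is the property that makes vector search useful for an unannotated library.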
Why Manticore Search?
Initially, ClauseBase stored its extracted clauses in PostgreSQL. PostgreSQL coped up to a point, but it had significant limitations for full-text search. Features like autocomplete, faceted search, and phrase proximity search were difficult to implement. PostgreSQL also lacked advanced ranking functions such as BM25, which matters for legal texts, where common words like “obligation,” “party,” and “liability” need to be weighted appropriately to produce good search results.
On top of this, performance issues began to arise. ClauseBuddy lets users extract clauses from their own documents, and it also provides a public sample database with millions of clauses sourced from the US EDGAR library. The combination of PostgreSQL and pgvector became noticeably slow at this volume of data, taking several seconds to return results. At that point, ClauseBase decided to switch to Manticore Search.
Having previously used Manticore for searching through entire legal documents, ClauseBase already knew it to be fast and feature-rich. So they migrated the clause library to Manticore, and it paid off: users can now seamlessly jump between a clause and the original document.
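The weighting point can be illustrated with a small BM25 sketch. This is a toy version of the standard Okapi BM25 formula written for this article, not Manticore's ranker; the four sample clauses are invented, and the parameters k1 = 1.2 and b = 0.75 are commonly used defaults assumed here.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def idf(term, tokenized_docs):
    """Inverse document frequency: rare terms get a higher weight."""
    n_docs = len(tokenized_docs)
    n_containing = sum(1 for doc in tokenized_docs if term in doc)
    return math.log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))

def bm25_score(query, doc, tokenized_docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a query string."""
    avg_len = sum(len(d) for d in tokenized_docs) / len(tokenized_docs)
    counts = Counter(doc)
    score = 0.0
    for term in tokenize(query):
        tf = counts[term]
        if tf == 0:
            continue
        # Term frequency saturates and is normalized by document length.
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf(term, tokenized_docs) * tf * (k1 + 1) / norm
    return score

# Invented sample clauses; "party" is common, "confidential" is rare.
clauses = [
    "each party shall indemnify the other party",
    "the party shall keep information confidential",
    "liability of each party is limited",
    "the supplier shall indemnify the customer against claims",
]
docs = [tokenize(c) for c in clauses]

scores = [bm25_score("party confidential", d, docs) for d in docs]
best = max(range(len(clauses)), key=lambda i: scores[i])
print(clauses[best])
```

In this toy corpus the first clause mentions “party” twice, yet the confidentiality clause ranks first because the rare term “confidential” carries far more weight than the ubiquitous “party”.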
Working Together on Vector Search
When ClauseBase decided they needed vector search, it wasn’t yet available in Manticore. They reached out to the Manticore team, and the two teams worked together to implement it. Just a few weeks later, ClauseBase was able to start experimenting with the new functionality. When ClauseBase later ran into issues, the Manticore team promptly fixed them, ensuring a smooth experience.
The Impact of Vector Search
The integration of vector search in ClauseBase has been in place for almost a year, and the results are very promising. ClauseBase aimed to find a solution that would avoid relying on commercial vector services, which often have slower response times. After conducting extensive experiments, they selected models that balanced quality and performance, specifically tailored to their use case.
In addition to using vector search within Manticore Search, ClauseBase introduced a reranking process to further refine the search results. The reranker processes the top matches from the initial vector search, effectively “thinking more deeply” about a smaller set of potential answers. This approach has significantly improved the relevance of search outcomes, providing users with results that are not only fast but also contextually accurate and highly useful.
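The two-stage pattern can be sketched as follows. The article does not say which reranker ClauseBase uses, so both scoring functions here are simple hypothetical stand-ins: a cheap shared-word count plays the role of the initial vector search, and a word-order-aware bigram overlap plays the role of the slower, more careful reranker. The sample clauses are invented.

```python
def tokens(text):
    return text.lower().split()

def shared_words(query, clause):
    """Cheap first-stage score: number of distinct words in common."""
    return len(set(tokens(query)) & set(tokens(clause)))

def shared_bigrams(query, clause):
    """Stand-in reranker score: word pairs in common, so word order matters."""
    q, c = tokens(query), tokens(clause)
    return len(set(zip(q, q[1:])) & set(zip(c, c[1:])))

def retrieve_then_rerank(query, candidates, first_stage, reranker, top_k=3):
    """Shortlist top_k candidates cheaply, then reorder only that shortlist."""
    shortlist = sorted(candidates,
                       key=lambda c: first_stage(query, c),
                       reverse=True)[:top_k]
    return sorted(shortlist, key=lambda c: reranker(query, c), reverse=True)

candidates = [
    "customer shall indemnify supplier for all losses",
    "supplier shall indemnify customer against third party claims",
    "supplier shall deliver goods",
    "governing law is the law of Belgium",
]

ranked = retrieve_then_rerank(
    "supplier shall indemnify customer",
    candidates,
    first_stage=shared_words,
    reranker=shared_bigrams,
)
for clause in ranked:
    print(clause)
```

The first stage cannot tell the first two clauses apart, since both share four words with the query, but the reranker notices that only one of them has the supplier indemnifying the customer and moves it to the top. The expensive scorer runs on three candidates rather than the whole library, which is what keeps the approach fast.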