Tatoeba

About Tatoeba

Tatoeba is a large database of sentences and translations. Its content is ever-growing and results from the voluntary contributions of thousands of members.

Tatoeba provides a tool for you to see examples of how words are used in the context of a sentence. You specify the words that interest you, and it returns sentences containing these words along with their translations in the languages you want. The name Tatoeba (which means "for example" in Japanese) captures this concept.
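To make the idea concrete, here is a minimal sketch of that lookup in Python. It is only an illustration: the `Sentence` class, the `CORPUS` list and the `search` function are hypothetical names invented for this example, the sample sentences are made up, and a plain in-memory scan stands in for the real search engine. It does not reflect Tatoeba's actual code or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sentence:
    lang: str
    text: str
    # Translations keyed by language code (hypothetical layout, not Tatoeba's schema).
    translations: Dict[str, str] = field(default_factory=dict)


# Toy corpus invented for illustration only.
CORPUS = [
    Sentence("eng", "I have a cat.", {"fra": "J'ai un chat.", "deu": "Ich habe eine Katze."}),
    Sentence("eng", "The cat sleeps.", {"fra": "Le chat dort."}),
    Sentence("eng", "I have a dog.", {"fra": "J'ai un chien."}),
]


def search(words: List[str], target_langs: List[str],
           corpus: List[Sentence] = CORPUS) -> List[Tuple[str, Dict[str, str]]]:
    """Return (sentence, translations) pairs for sentences containing every word."""
    wanted = {w.lower() for w in words}
    results = []
    for sentence in corpus:
        # Naive tokenization: split on whitespace and drop trailing punctuation.
        tokens = {t.strip(".,!?").lower() for t in sentence.text.split()}
        if wanted <= tokens:
            # Keep only the translations in the languages the user asked for.
            shown = {lang: text for lang, text in sentence.translations.items()
                     if lang in target_langs}
            results.append((sentence.text, shown))
    return results


if __name__ == "__main__":
    for text, translations in search(["cat"], ["fra"]):
        print(text, translations)
```

Searching for "cat" with French as the target language returns the two sentences that contain the word, each paired with its French translation. A real deployment would delegate the matching step to a full-text search engine rather than scanning every sentence.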

The project was founded by Trang Ho in 2006 and hosted on SourceForge under the codename multilangdict.

The challenge

We had known about Manticore since November 2017, but it took us a while to actually migrate. We were using Sphinx, but lately it had been crashing quite often, leaving our homepage completely broken (#1767).

Why Manticore?

A long time ago (2010) we were using Lucene and decided to switch to Sphinx due to memory restrictions. Before switching to Manticore we had a quick look at other solutions, like Elasticsearch, but rewriting all the search-related code would have been a big effort. While Elastic has a lot of fancy stuff, our data is pretty “flat” (sentences with metadata) and Manticore just fit in.

Outcome

From #1767: “we now only have some quick performance drops, instead of a continuous failure. In addition, it looks like the search daemon does not block any more when this happens, so the page will just be slow or failing for a few visitors.”
Beyond that, search speed (and our overall website speed) seems to have improved.

We handle between 220K and 280K searches per month, or roughly 7.5K to 10K per day.

Trang Ho & Gilles Bedel