From SphinxSE to FEDERATED

In this article we discuss the differences between using SphinxSE and FEDERATED engine with Manticore Search

The SphinxSE MySQL plugin was made available since the first public release of Sphinx Search. The idea was to allow users to perform search queries using the same MySQL connection. Another benefit was the search result could be joined with another MySQL table for further filtering/sorting or to display data not available in the search result. Also at that time SphinxQL didn’t existed yet.

One issue with the SphinxSE is that the plugin is not included in all MySQL flavors. Currently only MariaDB includes SphinxSE in their bundled plugins list. Someone using MySQL or Percona variant would need to download the sources of MySQL and compile SphinxSE plugin, which is not an out-of-box experience.  Upgrading MySQL to newer versions might require recompiling again for that version, which made things even more difficult.  Maintenance of SphinxSE was also painful on development side: it required keeping it up-to-date with MySQL API changes and also, since it had own custom search syntax, maintaining yet another query parser.

Adding FEDERATED support

Recently we added in Manticore Search support for queries executed on a FEDERATED engine table. Since FEDERATED is included by default in all MySQL flavors, no special setup or compiling is needed. All you have to do is to create a FEDERATED table that links to Manticore SphinxQL and you can perform search queries.

SphinxSE requires to send the query in a special format, as a suite of parameters delimited by semicolon like

WHERE query='test;filter=cat_id,1,5,10;groupby=day:published_ts;fieldweights=title,10,abstract,3,content,1;'

This syntax never covered all features, for example ability to do GROUP N BY.

In FEDERATED, the query string is a full SphinxQL SELECT command:

WHERE query='SELECT * FROM test_index WHERE MATCH (\'test\') AND ANY(cat_id) IN (1,5,10 GROUP BY day,published_ts OPTION field_weights=(title=10,abstract=3,content=1)';

Besides having the full search features at hand, SphinxQL queries offers portability, making it easier to test separately (using a mysql client).

Limitations

The FEDERATED table connected to Manticore uses a similar schema to a SphinxSE table: there is a mandatory column for query string. To get attribute values from a search results, they require to have columns declared in the table.

Some may ask why it’s not possible to run directly the SphinxQL query instead of putting it in a string column. The biggest problem is that SphinxQL has custom clauses, like MATCH(), and MySQL parser would not recognize that. FEDERATED itself has some drawbacks, for example it ignores LIMIT clause.

Another limitation is regarding joining with MySQL table in order to perform additional filtering or sorting/grouping. The joining is done with the result set provided by the Manticore search query. If you need for example to perform an ordering involving a column from a MySQL table that needs to apply to the whole search, the entire result set should be brought from Manticore. This can be a problem on large results. It might be better to add the needed columns in Manticore and perform in Manticore the desired operation.

The Manticore index used in the  FEDERATED table definition must be an index with storage (plain or RealTime).

Text Highlighting

With FEDERATED it’s possible to perform snippeting on texts that are stored only in MySQL. The setup requires a secondary FEDERATED table that will be used for performing SELECTS with SNIPPET function. We’ll consider an example using wikipedia articles, where we want only to get back the article id and the title (which is not stored in our index):

First we declare the FEDERATED table, where WikiSearch will be used for making the actual search and WikiSnippet for the SELECT with snippeting:


CREATE TABLE WikiSearch
(
    id INTEGER UNSIGNED NOT NULL,
    query VARCHAR(3072) NOT NULL,
    INDEX(query)
) ENGINE=FEDERATED
DEFAULT CHARSET=utf8
CONNECTION='mysql://FEDERATED@127.0.0.1:9306/DB/wikipedia';


CREATE TABLE WikiSnippet
(
    id          INTEGER UNSIGNED NOT NULL,
    snippets    VARCHAR(3072) NOT NULL,
    query       VARCHAR(3072) NOT NULL,
    INDEX(query)
) ENGINE=FEDERATED
DEFAULT CHARSET=utf8
CONNECTION='mysql://FEDERATED@127.0.0.1:9306/DB/wikipedia';

We’re going to perform a search on the wikipedia index from Manticore and join the result with the wikipedia table (named wikipediaTable) from MySQL. The id and article from the joined select will be used to perform an individual SELECT for each article that will return back the snippet.

mysql> SELECT id, 
(SELECT snippets FROM WikiSnippet WHERE query=concat('SELECT SNIPPET(\'',a.title,'\',\'town\') AS snippets FROM wikipedia WHERE id=',a.id)) AS snippet
 FROM ( SELECT WikiSearch.id,wikipediaTable.title FROM WikiSearch JOIN wikipediaTable ON WikiSearch.id=content.id WHERE 
         query='SELECT id FROM wikipedia WHERE match(\'@title town\')') a;
+-----------+----------------------------------------------------------+
| id        | snippet                                                  |
+-----------+----------------------------------------------------------+
| 221741590 | New England <b>town</b>                                  |
| 225879601 | <b>Town</b> meeting                                      |
| 227372942 | Huddersfield <b>Town</b> F.C.                            |
| 227789260 | Cape <b>Town</b>                                         |
| 228005065 | Lincoln <b>Town</b> Car                                  |
| 228031371 | Swindon <b>Town</b> F.C.                                 |
| 228037958 | Newburgh (<b>town</b>), New York                         |
| 209273335 | Lewiston (<b>town</b>), New York                         |
| 218353936 | List of Cape <b>Town</b> suburbs                         |
| 218900544 | New <b>Town</b>, Prague                                  |
| 219113587 | History of Swindon <b>Town</b> F.C.                      |
| 221331452 | Chrysler <b>Town</b> and Country                         |
| 222331137 | List of Ipswich <b>Town</b> F.C. statistics and records  |
| 225807671 | East Hampton (<b>town</b>), New York                     |
| 226345846 | Old <b>Town</b>                                          |
| 226474919 | Sengkang New <b>Town</b>                                 |
| 227031418 | Camden <b>Town</b>                                       |
| 227043533 | List of <b>town</b> tramway systems in the United States |
| 227328024 | Ghost <b>town</b>                                        |
| 227337559 | <b>Town</b>                                              |
+-----------+----------------------------------------------------------+
20 rows in set (0.01 sec)

For more information, check out the FEDERATED support page in the official documentation.

Leave a Reply