Faceted search

Faceted search is a crucial functionality for finding a needle in a haystack and better user search experience for all types of applications. In this tutorial we will figure out what is faceted search and how to make a simple one.

Faceted search is as essential function of a modern search application as autocomplete, spell correction and search keywords highlighting. Especially in E-commerce products.

It comes to the rescue when we deal with large amounts of data and various properties related to each other, whether it is size, color, manufacturer or something else. When querying large amounts of data search results often include large swaths of entries which does not fit user’s expectations. Faceted search allows an end-user to explicitly specify the dimensions that they want their search results to meet.

In simple terms, it gives the ability to make filtration of search results by properties of searched items.

Let’s suppose that the user wants to find some movie starring Nicolas Cage, so he types his name in the search bar. That will return search results with movies from different years, different directors and with different IMDB’s scores and ratings.

This is how it might look:

We can see that the user has 34 different movies with his favourite actor, but how can  he choose what to watch from such a big number of movies?

Here comes the Faceted search. We can let the user find the movie he wants to watch by giving him the ability to choose additional parameters such as “Year”, “Rating”, “Director’s name” and “IMDB score“ etc.

This is what the faceted search results will look like:

With the faceted search we make it possible to determine what exactly the user wants. By selecting the facets users get more relevant search results which makes the search better, more convenient, and useful in terms of user experience and business goals.

So let’s see how faceted search works and how you can make it yourself for your search application.

Implementing the movies faceted search

In Manticore Search there is an optimization that retains the result set of the original query and reuses it for each facet calculation. As the aggregations are applied to already calculated subset of documents they are fast and the total execution time can be in many cases only marginally bigger than the initial query. Facets can be added to any query and the facet can be any attribute or expression. A facet result contains the facet values along with the facet counts. Facets are available using SQL SELECT statement by declaring them at the very end of the query in the following format:

FACET {expr_list} [BY {expr_list}] [ORDER BY {expr | FACET()} {ASC | DESC}] [LIMIT [offset,] count]

Of course you can just send all the queries you want in a single multi-query, but if you’re looking for a more elegant solution there’s the FACET clause.

Now we’re going to run several queries with movie facets like:

  • Title_year
  • Content_rating
  • Director_name

And will use the interval to get values within the integer range for the IMDB score.

Let’s try some simple facets on the ‘movies’ index from the autocomplete course and perform simple facets on several attributes. For example, let’s select ‘robert de niro’ from the ‘Director name’ properties.

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 10 FACET title_year FACET content_rating FACET director_name;

 

This gives multiple results set, where the first one is from our main query and the rest are the facets. Each facet result contains the attribute values and the group count.

+------------+----------+
| title_year | count(*) |
+------------+----------+
|       1987 |        1 |
|       1991 |        1 |
|       2005 |        1 |
|       1997 |        3 |
|       1974 |        1 |
|       2001 |        2 |
|       2002 |        2 |
|       1999 |        2 |
|       1985 |        1 |
|       1995 |        1 |
|       2016 |        2 |
|       2009 |        1 |
|       2004 |        4 |
|       1990 |        1 |
|       2013 |        3 |
|       2015 |        3 |
|       2011 |        2 |
|       2010 |        3 |
|       1996 |        3 |
|       1973 |        1 |
+------------+----------+
20 rows in set (0.10 sec)

+----------------+----------+
| content_rating | count(*) |
+----------------+----------+
| R              |       37 |
| PG-13          |       12 |
| PG             |        4 |
+----------------+----------+
3 rows in set (0.10 sec)

+----------------------+----------+
| director_name        | count(*) |
+----------------------+----------+
| Brian De Palma       |        1 |
| Martin Scorsese      |        7 |
| John Polson          |        1 |
| Quentin Tarantino    |        1 |
| Francis Ford Coppola |        1 |
| John Herzfeld        |        1 |
| Harold Ramis         |        2 |
| Terry Gilliam        |        1 |
| Michael Caton-Jones  |        1 |
| James Mangold        |        1 |
| Dan Mazer            |        1 |
| Kirk Jones           |        1 |
| Joel Schumacher      |        1 |
| Nick Hamm            |        1 |
| Peter Segal          |        1 |
| Jonathan Jakubowicz  |        1 |
| Scott Mann           |        1 |
| David O. Russell     |        2 |
| Gary McKendry        |        1 |
| Jon Turteltaub       |        1 |
+----------------------+----------+
20 rows in set (0.10 sec)

Sorting in FACETS

By default facets results are not sorted and are also limited to 20 rows. Each facet can have it’s own limit clause. This can be done like this:

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 10 FACET title_year LIMIT 5 FACET content_rating LIMIT 1 FACET director_name LIMIT 100;

+------------+----------+
| title_year | count(*) |
+------------+----------+
|       1987 |        1 |
|       1991 |        1 |
|       2005 |        1 |
|       1997 |        3 |
|       1974 |        1 |
+------------+----------+
5 rows in set (0.19 sec)

+----------------+----------+
| content_rating | count(*) |
+----------------+----------+
| R              |       37 |
+----------------+----------+
1 row in set (0.19 sec)

+----------------------+----------+
| director_name        | count(*) |
+----------------------+----------+
| Brian De Palma       |        1 |
| Martin Scorsese      |        7 |
| John Polson          |        1 |
| Quentin Tarantino    |        1 |
| Francis Ford Coppola |        1 |
 .................................
| Tony Scott           |        1 |
| Nancy Meyers         |        1 |
| Frank Oz             |        1 |
| Neil Jordan          |        1 |
+----------------------+----------+
42 rows in set (0.19 sec)

 

As you can see, the results are not sorted by default. Each facet can have it’s own sorting rule, e.g. we can sort the groups by counts using COUNT(*):

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 10 FACET title_year ORDER BY COUNT(*) DESC;
+------------+----------+
| title_year | count(*) |
+------------+----------+
|       2004 |        4 |
|       2013 |        3 |
|       2015 |        3 |
|       1997 |        3 |
|       2010 |        3 |
|       1996 |        3 |
|       2000 |        3 |
|       2001 |        2 |
|       2002 |        2 |
|       1999 |        2 |
|       2016 |        2 |
|       2011 |        2 |
|       2012 |        2 |
|       2008 |        2 |
|       1985 |        1 |
|       1991 |        1 |
|       1995 |        1 |
|       2009 |        1 |
|       1990 |        1 |
|       2005 |        1 |
+------------+----------+
20 rows in set (0.09 sec)

 

Or we can sort by an attribute, e.g. title_year:

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 10 FACET title_year ORDER BY title_year DESC;
+------------+----------+
| title_year | count(*) |
+------------+----------+
|       2016 |        2 |
|       2015 |        3 |
|       2013 |        3 |
|       2012 |        2 |
|       2011 |        2 |
|       2010 |        3 |
|       2009 |        1 |
|       2008 |        2 |
|       2005 |        1 |
|       2004 |        4 |
|       2002 |        2 |
|       2001 |        2 |
|       2000 |        3 |
|       1999 |        2 |
|       1998 |        1 |
|       1997 |        3 |
|       1996 |        3 |
|       1995 |        1 |
|       1991 |        1 |
|       1990 |        1 |
+------------+----------+
20 rows in set (0.09 sec)

 

Facet selection

In the simplest examples we use FACET attr_name and the result set will contain the attr_name and count columns.

But a facet can be made from multiple attributes:

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 10 FACET director_facebook_likes,director_name BY director_name ORDER BY director_facebook_likes DESC;
+-------------------------+---------------------+----------+
| director_facebook_likes | director_name       | count(*) |
+-------------------------+---------------------+----------+
|                   17000 | Martin Scorsese     |        7 |
|                   16000 | Quentin Tarantino   |        1 |
|                   12000 | Tony Scott          |        1 |
|                   11000 | Harold Ramis        |        2 |
|                     737 | David O. Russell    |        2 |
|                     541 | Joel Schumacher     |        1 |
|                     517 | Michael Cimino      |        1 |
|                     446 | James Mangold       |        1 |
|                     287 | John Frankenheimer  |        1 |
|                     278 | Nancy Meyers        |        1 |
|                     277 | Neil Jordan         |        1 |
|                     272 | Barry Levinson      |        3 |
|                     226 | Jon Turteltaub      |        1 |
|                     116 | Jay Roach           |        2 |
|                     105 | Michael Caton-Jones |        1 |
|                     102 | Martin Brest        |        1 |
|                      89 | Rodrigo Cortés      |        1 |
|                      88 | Peter Segal         |        1 |
|                      88 | George Tillman Jr.  |        1 |
|                      80 | Paul Weitz          |        1 |
+-------------------------+---------------------+----------+
20 rows in set (0.01 sec)

Expressions in FACETs

In some situations, faceting on actual values is not something we want. The most common example is product prices, as we can have a wide range of values. Instead of getting facet of actual values, we want to get a list of ranges instead. This can be achieved easily with function INTERVAL().

In our example, we use the imdb_score as it is a float value which we obviously don’t want to group by, instead we are grouping by ranges between only integer values:

SELECT * FROM movies WHERE MATCH('robert de niro') LIMIT 100 FACET title_year LIMIT 100 FACET content_rating LIMIT 100 FACET director_name LIMIT 100;
Empty set (0.09 sec)

+------------+----------+
| title_year | count(*) |
+------------+----------+
|       1987 |        1 |
|       1991 |        1 |
|       2005 |        1 |
|       1997 |        3 |
|       1974 |        1 |
|       2001 |        2 |
|       2002 |        2 |
|       1999 |        2 |
|       1985 |        1 |
|       1995 |        1 |
|       2016 |        2 |
|       2009 |        1 |
|       2004 |        4 |
|       1990 |        1 |
|       2013 |        3 |
|       2015 |        3 |
|       2011 |        2 |
|       2010 |        3 |
|       1996 |        3 |
|       1973 |        1 |
|       2000 |        3 |
|       1988 |        1 |
|       1977 |        1 |
|       1984 |        1 |
|       1980 |        1 |
|       2012 |        2 |
|       2008 |        2 |
|       1998 |        1 |
|       1976 |        1 |
|       1978 |        1 |
|       1989 |        1 |
+------------+----------+
31 rows in set (0.07 sec)

+----------------+----------+
| content_rating | count(*) |
+----------------+----------+
| R              |       37 |
| PG-13          |       12 |
| PG             |        4 |
+----------------+----------+
3 rows in set (0.08 sec)

+----------------------+----------+
| director_name        | count(*) |
+----------------------+----------+
| Brian De Palma       |        1 |
| Martin Scorsese      |        7 |
| John Polson          |        1 |
| Quentin Tarantino    |        1 |
| Francis Ford Coppola |        1 |
| John Herzfeld        |        1 |
| Harold Ramis         |        2 |
| Terry Gilliam        |        1 |
| Michael Caton-Jones  |        1 |
| James Mangold        |        1 |
| Dan Mazer            |        1 |
| Kirk Jones           |        1 |
| Joel Schumacher      |        1 |
| Nick Hamm            |        1 |
| Peter Segal          |        1 |
| Jonathan Jakubowicz  |        1 |
| Scott Mann           |        1 |
| David O. Russell     |        2 |
| Gary McKendry        |        1 |
| Jon Turteltaub       |        1 |
| Paul Weitz           |        1 |
| Ethan Maniquis       |        1 |
| Jerry Zaks           |        1 |
| Jay Roach            |        2 |
| George Tillman Jr.   |        1 |
| Martin Brest         |        1 |
| Garry Marshall       |        1 |
| Sergio Leone         |        1 |
| Rodrigo Cortés       |        1 |
| Jon Avnet            |        1 |
| John Frankenheimer   |        1 |
| Bibo Bergeron        |        1 |
| Barry Levinson       |        3 |
| John Curran          |        1 |
| Des McAnuff          |        1 |
| Justin Zackham       |        1 |
| Mary McGuckian       |        1 |
| Michael Cimino       |        1 |
| Tony Scott           |        1 |
| Nancy Meyers         |        1 |
| Frank Oz             |        1 |
| Neil Jordan          |        1 |
+----------------------+----------+
42 rows in set (0.08 sec)

Interactive Course

You can learn more about faceted search if you take our free "Manticore Faceting" interactive course which features a command line and web panel for easier learning.

Leave a Reply