Besides autocomplete, for which we covered a simple example in this course, another common feature of search applications is suggesting corrections for mistyped words.
Manticore Search comes with a feature that allows getting suggestions for a word from the index dictionary.
This is done by enabling the infixing option. Not only does infixing allow wildcard searches, it also creates n-gram hashes from the indexed words.
N-grams (parts of words that are N characters long) are used to find words that are close to each other as plain text, not linguistically. Combined with the Levenshtein distance between each candidate word and the original word, this lets us provide suggestions that are suitable as corrections for the mistyped word. This functionality is provided by the CALL SUGGEST and CALL QSUGGEST statements (read more in the documentation).
Let’s review how it works in Manticore Search below, or you can try it on your own in our interactive course.
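To make the two ideas concrete, here is a minimal Python sketch of character trigrams and Levenshtein distance. It only illustrates the concepts: the function names `ngrams` and `levenshtein` are ours, and the code does not reproduce how Manticore builds or stores its n-gram hashes internally.

```python
def ngrams(word, n=3):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


typo, candidate = "rvenge", "revenge"
shared = ngrams(typo) & ngrams(candidate)
print(sorted(shared))                # trigrams the two words have in common
print(levenshtein(typo, candidate))  # 1: one inserted letter fixes the typo
```

Words that share several trigrams with the input are cheap to find and make good candidates; the edit distance then ranks those candidates by how many single-character changes separate them from the input.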
First, we should enable infixing in our index.
index movies
{
type = plain
path = /var/lib/manticore/data/movies
source = movies
min_infix_len = 3
}
CALL SUGGEST usage
When a user performs a query that returns no results, it’s possible that the user mistyped something.
Let’s connect to Manticore and try an example (note the typo in ‘revenge’):
mysql -P9306 -h0
root@didyoumean-b85fb586f-2nvh2:/tutorial# mysql -P9306 -h0
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 3.2.0 e526a014@191017 release
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Now let’s try a quick word suggestion:
CALL SUGGEST('rvenge','movies');
MySQL [(none)]> CALL SUGGEST('rvenge','movies');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| revenge | 1        | 77   |
| range   | 2        | 5    |
| avenger | 2        | 3    |
| avenged | 2        | 1    |
| event   | 3        | 9    |
+---------+----------+------+
5 rows in set (0.00 sec)
The output contains 3 columns: the suggestion, the calculated Levenshtein distance, and the number of documents in the index that contain the suggested word.
The first suggestion has a distance of 1 from our input, and it’s the word we expected to be suggested.
This is the best-case scenario: a single suggestion at the minimal distance is most likely the word we are looking for.
Even at distance 1, it is possible to get more than one suggestion:
CALL SUGGEST('aprentice','movies');
MySQL [(none)]> CALL SUGGEST('aprentice','movies');
+------------+----------+------+
| suggest    | distance | docs |
+------------+----------+------+
| apprentice | 1        | 6    |
| prentice   | 1        | 1    |
| practice   | 3        | 5    |
| argentine  | 3        | 1    |
| prestige   | 3        | 1    |
+------------+----------+------+
5 rows in set (0.00 sec)
When they share the same distance, suggestions are sorted again by their doc hits.
In this example, ‘apprentice’ is most likely what the user wanted as it has more hits than ‘prentice’.
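The ordering described here (distance first, then doc hits) can be sketched in Python as a two-part sort key. The server already returns rows in this order; the helper name `rank_suggestions` and the tuple layout `(suggest, distance, docs)` are assumptions made for illustration, for example if you merge suggestions from several calls on the client side.

```python
def rank_suggestions(rows):
    """Order (suggest, distance, docs) rows by distance, then by docs descending."""
    return sorted(rows, key=lambda row: (row[1], -row[2]))


# Rows from the 'aprentice' example, deliberately shuffled.
rows = [
    ("practice", 3, 5),
    ("prentice", 1, 1),
    ("argentine", 3, 1),
    ("apprentice", 1, 6),
    ("prestige", 3, 1),
]

for suggest, distance, docs in rank_suggestions(rows):
    print(suggest, distance, docs)
```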
Of course, when the input word is actually found in our index, it appears as the first suggestion with distance=0:
CALL SUGGEST('revenge','movies');
MySQL [(none)]> CALL SUGGEST('revenge','movies');
+----------+----------+------+
| suggest  | distance | docs |
+----------+----------+------+
| revenge  | 0        | 77   |
| reverse  | 2        | 2    |
| revelle  | 2        | 1    |
| seven    | 3        | 11   |
| berenger | 3        | 9    |
+----------+----------+------+
5 rows in set (0.01 sec)
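On the application side, the distance of the first row is enough to decide whether a word needs correcting at all. A minimal Python sketch, assuming the rows have already been fetched as `(suggest, distance, docs)` tuples (the helper name `pick_correction` is hypothetical):

```python
def pick_correction(rows):
    """Return a replacement word, or None when no correction is needed."""
    if not rows:
        return None      # nothing close enough in the dictionary
    suggestion, distance, _docs = rows[0]
    if distance == 0:
        return None      # the input word itself exists in the index
    return suggestion


# First rows from the 'revenge' and 'rvenge' examples.
print(pick_correction([("revenge", 0, 77), ("reverse", 2, 2)]))  # None
print(pick_correction([("revenge", 1, 77), ("range", 2, 5)]))    # revenge
```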
If we want to increase the number of suggestions, we can add the limit parameter:
CALL SUGGEST('aprentice','movies', 10 as limit);
MySQL [(none)]> CALL SUGGEST('aprentice','movies', 10 as limit);
+------------+----------+------+
| suggest    | distance | docs |
+------------+----------+------+
| apprentice | 1        | 6    |
| prentice   | 1        | 1    |
| practice   | 3        | 5    |
| argentine  | 3        | 1    |
| prestige   | 3        | 1    |
| adventure  | 4        | 894  |
| lawrence   | 4        | 43   |
| laurence   | 4        | 10   |
| terence    | 4        | 9    |
| prejudice  | 4        | 9    |
+------------+----------+------+
10 rows in set (0.00 sec)
If we want to restrict the suggestions, we can lower the maximum Levenshtein distance (max_edits, default is 4) and the maximum difference in word length (delta_len, default is 3):
CALL SUGGEST('aprentice','movies', 10 as limit,3 as max_edits,2 as delta_len);
MySQL [(none)]> CALL SUGGEST('aprentice','movies', 10 as limit,3 as max_edits,2 as delta_len);
+------------+----------+------+
| suggest    | distance | docs |
+------------+----------+------+
| apprentice | 1        | 6    |
| prentice   | 1        | 1    |
| practice   | 3        | 5    |
| argentine  | 3        | 1    |
| prestige   | 3        | 1    |
+------------+----------+------+
+------------+----------+------+
5 rows in set (0.00 sec)
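The effect of the two options can be mimicked in Python by filtering the ten rows of the previous result. This is only a sketch of the idea: the helper name `filter_candidates` is ours, and treating both bounds as inclusive is an assumption rather than a statement about Manticore’s exact comparison.

```python
def filter_candidates(word, rows, max_edits=4, delta_len=3):
    """Keep rows within the edit-distance and length-difference bounds."""
    return [
        (suggest, distance, docs)
        for suggest, distance, docs in rows
        if distance <= max_edits and abs(len(suggest) - len(word)) <= delta_len
    ]


# The ten rows returned for 'aprentice' with 10 as limit.
rows = [
    ("apprentice", 1, 6), ("prentice", 1, 1), ("practice", 3, 5),
    ("argentine", 3, 1), ("prestige", 3, 1), ("adventure", 4, 894),
    ("lawrence", 4, 43), ("laurence", 4, 10), ("terence", 4, 9),
    ("prejudice", 4, 9),
]

for row in filter_candidates("aprentice", rows, max_edits=3, delta_len=2):
    print(row)
```

With these bounds the distance-4 rows drop out, which is why the query returns five rows even though the limit is 10.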
For the next step, we need to exit the MySQL client:
exit;
MySQL [(none)]> exit;
Bye
Working example
A simple working example of ‘Did you mean’ can be seen in the Web panel of our interactive course.
The PHP script provides a simple search results page.
If the input string returns no results, the script tests each word with ‘CALL SUGGEST’ and tries to build a new query string.
If the new query string matches, its result set is shown.
The script can be viewed with cat /html/index.php:
root@didyoumean-b85fb586f-2nvh2:/tutorial# cat /html/index.php
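The flow described above can be sketched in a few lines of Python. This is not the course’s PHP script, only an illustration of the same steps. `search` and `suggest` are hypothetical callables: in a real application they would run the full-text query and CALL SUGGEST against Manticore, while here small in-memory stand-ins are used so the sketch runs without a server.

```python
def did_you_mean(query, search, suggest):
    """Return (results, query actually used) with a per-word fallback."""
    results = search(query)
    if results:
        return results, query

    corrected_words = []
    for word in query.split():
        rows = suggest(word)  # expected as (suggest, distance, docs) tuples
        # Replace the word only if a suggestion exists and differs from it.
        if rows and rows[0][1] > 0:
            corrected_words.append(rows[0][0])
        else:
            corrected_words.append(word)

    corrected = " ".join(corrected_words)
    if corrected == query:
        return [], query  # nothing to correct, keep the empty result
    return search(corrected), corrected


# Hypothetical stand-ins for the real search and CALL SUGGEST calls.
FAKE_INDEX = {"revenge": ["doc 1", "doc 2"]}
FAKE_SUGGESTIONS = {"rvenge": [("revenge", 1, 77), ("range", 2, 5)]}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


def fake_suggest(word):
    return FAKE_SUGGESTIONS.get(word, [])


print(did_you_mean("rvenge", fake_search, fake_suggest))
```

Returning the query that was actually used lets the page show a "Did you mean" notice next to the corrected results.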
If you feel something is missing, try the course, read the documentation or ask in the community.