Percolate Queries: docs_id option

In this article we discuss the docs_id option which provides an easier manipulation of CALL PQ result set.

Let’s consider the following PQ batch call:


mysql> CALL PQ ('pq', ('{"title":"butter is good as", "id":3}', 
                       '{"title":"was butter","id":4}', 
                       '{"title":"sas was butter","id":5}', 
                       '{"title":"bas was butter", "id":6}', 
                       '{"title":"butter is good as","id":7}'), 
                 1 as docs_json ,1 as docs,1 as query);
+------+-----------+------------+------+---------+
| UID  | Documents | Query      | Tags | Filters |
+------+-----------+------------+------+---------+
|    1 | 1,5       | butter is  |      |         |
|    2 | 2,3,4     | butter was |      |         |
+------+-----------+------------+------+---------+
2 rows in set (0.00 sec)

What is wrong with this output?

We receive list of pairs between queries and positions of documents in the input array, however our documents have an identificator – the id attribute.

If we want to pass the result to a next process, we would need to do a post-processing work to extract the document ids from the input array using the positions.

To avoid this post-processing in 2.7.0 we added a new option called docs_id which allows replacing the document positions with the value of a document attribute.


mysql> CALL PQ ('pq', ('{"title":"butter is good as", "id":3}', 
                       '{"title":"was butter","id":4}', 
                       '{"title":"sas was butter","id":5}', 
                       '{"title":"bas was butter", "id":6}', 
                       '{"title":"butter is good as","id":7}'), 
                1 as docs_json ,1 as docs,1 as query, 'id' as docs_id);
+------+-----------+------------+------+---------+
| UID  | Documents | Query      | Tags | Filters |
+------+-----------+------------+------+---------+
|    1 | 3,7       | butter is  |      |         |
|    2 | 4,5,6     | butter was |      |         |
+------+-----------+------------+------+---------+
2 rows in set (0.01 sec)

Now we have the document ids (or other identificator at choice) in the PQ result set.

The chosen ‘docs_id’ attribute must be an integer attribute. It’s values don’t necessary need to be unique. For example the queries could have a logical grouping by another integer attribute. If we want to find out just the groups that match criteria we can use that attribute instead of the document id:


mysql> CALL PQ ('pq', ('{"title":"butter is good as", "id":3,"gid":10}', 
                       '{"title":"was butter","id":4,"gid":20}', 
                       '{"title":"sas was butter","id":5,"gid":10}', 
                       '{"title":"bas was butter", "id":6,"gid":20}', 
                       '{"title":"butter is good as","id":7,"gid":10}'),
                1 as docs_json ,1 as docs,1 as query, 'gid' as docs_id);
+------+-----------+------------+------+---------+
| UID  | Documents | Query      | Tags | Filters |
+------+-----------+------------+------+---------+
|    1 | 10        | butter is  |      |         |
|    2 | 10,20     | butter was |      |         |
+------+-----------+------------+------+---------+
2 rows in set (0.00 sec)

Leave a Reply

Training

Personal and team training will maximize them performance. 

Custom development

Need cone custom or individual features?

Fill the form and don’t forget to make the description of what you need.

Free config review

There are often optimizations that can be made to a Sphinx / Manticore setup by changing some simple directives in the configuration or making quick changes to an index definition.

Some common mistakes and issues can include:

  • doing main+delta without kill-lists, even if the delta does include updated records found in the main
  • using wildcarding with very short prefix/infix which can hammer performance in some cases
  • disabled (unintentional) seamless rotates and getting stalls on index rotations
  • adding texts as string attributes even if they are not using for any kind of operation (filtering, grouping, sorting) or mandatory to be present in results
  • using deprecated settings¬†

Having a quick look on the configuration can show issues or potential issues, this is why we want to offer a gift to our growing community!

When uploading your configuration file, we recommend to remove any database credentials first.

We suggest also you give as many possible details about your setup: how big is the data you have, how typical queries look and what issues you experience.

Contact us