Manticore Search 4.0.2: full columnar store support, auto index compaction, locks system revamp, pseudo sharding

Published: Sep 21, 2021

Major new features

Full support of Manticore Columnar Library. Previously Manticore Columnar Library was supported only for plain indexes. Now it’s supported:
in real-time indexes for INSERT, REPLACE, DELETE, OPTIMIZE
in replication
in ALTER
in indextool --check
Automatic index compaction (#478). You no longer have to call OPTIMIZE manually or via a cron task or some other kind of automation. Manticore now does it on its own. You can set the default compaction threshold via optimize_cutoff.
Chunk snapshots and locks system revamp. These changes may be invisible from the outside at first glance, but they significantly improve the behaviour of many things happening in real-time indexes. In a nutshell, most Manticore data manipulation operations previously relied heavily on locks; now we use disk chunk snapshots instead. In particular:
- read operations (e.g. SELECTs, replication) are performed with snapshots
- operations that just change internal index structure without modifying schema/documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks in the end
- UPDATEs and DELETEs are performed against existing chunks, but if a merge is in progress, the writes are collected and then applied to the new chunks
- UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at the same time only one (merge or update) operation has access to attributes of the chunk.
- when a merge reaches the phase where it needs attributes, it sets a special flag. When an UPDATE finishes, it checks the flag, and if it’s set, the whole update is stored in a special collection. Finally, when the merge finishes, it applies the stored updates to the newly created disk chunk
- ALTER runs via an exclusive lock
- replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST
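The interplay between an UPDATE and a running merge described in the list above can be pictured with a toy model. The following is only an illustrative Python sketch, not Manticore's implementation: the `Chunk` class, the `merge_in_progress` flag and the `pending_updates` list are hypothetical names, and a plain mutex stands in for the shared/exclusive locks mentioned in the text.

```python
import threading


class Chunk:
    """Toy disk chunk: maps doc id -> attribute value (hypothetical model)."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.lock = threading.Lock()    # stands in for the per-chunk attribute lock
        self.merge_in_progress = False  # the "special flag" raised by a merge
        self.pending_updates = []       # updates to replay on the merged chunk

    def update(self, doc_id, value):
        # The update is applied to the existing chunk under the lock.
        with self.lock:
            if doc_id in self.docs:
                self.docs[doc_id] = value
                # If a merge has already read the attributes, remember the update.
                if self.merge_in_progress:
                    self.pending_updates.append((doc_id, value))


def begin_merge(a, b):
    """Copy the attributes of both chunks and raise the flag on each."""
    snapshot = {}
    for chunk in (a, b):
        with chunk.lock:
            chunk.merge_in_progress = True
            snapshot.update(chunk.docs)
    return snapshot


def finish_merge(a, b, snapshot):
    """Build the new chunk from the snapshot, then replay collected updates."""
    merged = Chunk(snapshot)
    for chunk in (a, b):
        with chunk.lock:
            for doc_id, value in chunk.pending_updates:
                merged.docs[doc_id] = value
            chunk.pending_updates.clear()
            chunk.merge_in_progress = False
    return merged


a, b = Chunk({1: "red"}), Chunk({2: "blue"})
snap = begin_merge(a, b)
a.update(1, "green")  # arrives while the merge is running
new_chunk = finish_merge(a, b, snap)
print(new_chunk.docs)  # the late update is present in the merged chunk
```

The point of the sketch is that the update never waits for the whole merge: it only takes the short per-chunk lock, and the merge replays whatever arrived after it read the attributes.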
ALTER can add/remove a full-text field. Previously it could only add/remove an attribute.
🔬 Experimental: pseudo sharding for full-scan queries. It allows parallelizing any non-full-text search query. Instead of preparing shards manually you can now just enable the new option searchd.pseudo_sharding and expect response time for non-full-text search queries to drop by a factor of up to the number of CPU cores. Note that it can easily occupy all available CPU cores, so if you care not only about latency but about throughput too, use it with caution.
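The general idea behind pseudo sharding, splitting one full scan into slices, scanning them concurrently and merging the partial results, can be sketched as follows. This is a conceptual Python illustration under stated assumptions, not how searchd implements it: `scan_slice` and `pseudo_sharded_scan` are hypothetical helpers, and Python threads are used only to show the split/merge structure (they do not give a real CPU-bound speedup in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def scan_slice(rows, predicate):
    """Full scan over one pseudo shard: keep the rows matching the filter."""
    return [row for row in rows if predicate(row)]


def pseudo_sharded_scan(rows, predicate, shards):
    """Split rows into up to `shards` contiguous slices, scan them, merge."""
    size = max(1, -(-len(rows) // shards))  # ceiling division
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        parts = list(pool.map(scan_slice, slices, [predicate] * len(slices)))
    # pool.map preserves slice order, so the merged result keeps row order.
    return [row for part in parts for row in part]


rows = [{"id": i, "price": i % 50} for i in range(1, 1001)]
result = pseudo_sharded_scan(rows, lambda r: r["price"] > 47, shards=4)
print(len(result))
```

Every worker is busy for the duration of the query, which is why the text above warns about throughput: one query can occupy all cores that other concurrent queries would otherwise use.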

Minor changes

Linux Mint and Ubuntu Hirsute Hippo are supported via APT repository
faster update by id via HTTP in big indexes in some cases (depends on the ids distribution)
custom startup flags for systemd. Now you don’t need to start searchd manually in case you need to run Manticore with some specific startup flag
new function LEVENSHTEIN() which calculates Levenshtein distance
added new searchd startup flags --replay-flags=ignore-trx-errors and --replay-flags=ignore-all-errors so one can still start searchd if the binlog is corrupted
#621 - expose errors from RE2
more accurate COUNT(DISTINCT) for distributed indexes consisting of local plain indexes
FACET DISTINCT to remove duplicates when you do faceted search
exact form modifier doesn’t require morphology now and works for indexes with infix/prefix search enabled
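For readers unfamiliar with the metric behind the new LEVENSHTEIN() function mentioned above: the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a sketch of the textbook dynamic-programming algorithm in Python. It illustrates the definition only; it is not Manticore's implementation and says nothing about the SQL function's exact signature or options.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```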

Breaking changes

the new version can read older indexes, but the older versions can’t read Manticore 4’s indexes
removed implicit sorting by id. Sort explicitly if required
charset_table’s default value changes from 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451 to non_cjk
OPTIMIZE happens automatically. If you don’t need it, make sure to set auto_optimize=0 in the searchd section of the configuration file
#616 ondisk_attrs_default were deprecated, now they are removed
for contributors: we now use Clang compiler for Linux builds as according to our tests it can build a faster Manticore Search and Manticore Columnar Library
if max_matches is not specified in a search query it gets updated implicitly with the lowest needed value for the sake of performance of the new columnar storage. It can affect metric total in SHOW META, but not total_found which is the actual number of found documents.
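The difference between total and total_found in the last item can be illustrated with a toy top-N collector. This Python sketch rests on an assumption that goes slightly beyond the text: that total reflects the number of matches kept in the result set (bounded by max_matches), while total_found counts every matching document. The `search` function and its return shape are hypothetical and only mimic the two counters.

```python
import heapq


def search(doc_weights, max_matches):
    """Keep only the best `max_matches` matches while counting all of them."""
    total_found = 0
    top = []  # min-heap holding the best matches seen so far
    for doc_id, weight in doc_weights:
        total_found += 1
        if len(top) < max_matches:
            heapq.heappush(top, (weight, doc_id))
        elif (weight, doc_id) > top[0]:
            heapq.heapreplace(top, (weight, doc_id))
    matches = sorted(top, reverse=True)
    return {"matches": matches, "total": len(matches), "total_found": total_found}


meta = search([(i, i) for i in range(1, 101)], max_matches=20)
print(meta["total"], meta["total_found"])
```

In this model a smaller max_matches shrinks the buffer and therefore total, but total_found stays the same, which matches the behaviour described above.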

Migration from Manticore 3

make sure you stop Manticore 3 cleanly:
no binlog files should be in /var/lib/manticore/binlog/ (only binlog.meta should be in the directory)
otherwise the indexes Manticore 4 can’t replay binlogs for won’t be run
the new version can read older indexes, but the older versions can’t read Manticore 4’s indexes, so make sure you make a backup if you want to be able to rollback the new version easily
if you run a replication cluster make sure you:
stop all your nodes first cleanly
and then start the node which was stopped last with --new-cluster (on Linux, run the manticore_new_cluster tool).
read about restarting a cluster for more details
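The clean-stop check from the first migration step can be scripted. The Python sketch below lists everything in a binlog directory except binlog.meta; an empty result means the directory is in the state the migration notes ask for. The helper name `leftover_binlogs` and the file name `binlog.001` used in the demo are hypothetical, and the demo runs against a throwaway temporary directory rather than a real installation.

```python
import os
import tempfile


def leftover_binlogs(binlog_dir):
    """Return the names in binlog_dir other than binlog.meta, sorted."""
    return sorted(name for name in os.listdir(binlog_dir) if name != "binlog.meta")


# Demo on a temporary directory (file names here are made up for illustration).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("binlog.meta", "binlog.001"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_leftovers = leftover_binlogs(demo_dir)

print(demo_leftovers)
```

On a real host you would call `leftover_binlogs("/var/lib/manticore/binlog/")` after stopping Manticore 3, or pass whichever directory your configuration uses for binlogs, and proceed with the upgrade only if the list is empty.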

Bugfixes

Lots of replication issues have been fixed:
696f8649 - fixed crash during SST on joiner with active index; added sha1 verify at joiner node at writing file chunks to speed up index loading; added rotation of changed index files at joiner node on index load; added removal of index files at joiner node when active index gets replaced by a new index from donor node; added replication log points at donor node for sending files and chunks
b296c55a - crash on JOIN CLUSTER in case the address is incorrect
418bf880 - during initial replication of a large index the joining node could fail with ERROR 1064 (42000): invalid GTID, (null), and the donor could become unresponsive while another node was joining
6fd350d2 - hash could be calculated wrong for a big index which could result in replication failure
#615 - replication failed on cluster restart
#574 - indextool --help doesn’t display parameter --rotate
#578 - searchd high CPU usage while idle after ca. a day
#587 - flush .meta immediately
#617 - manticore.json gets emptied
#618 - searchd --stopwait fails under root. It also fixes systemctl behaviour (previously it was showing failure for ExecStop and didn’t wait long enough for searchd to stop properly)
#619 - INSERT/REPLACE/DELETE vs SHOW STATUS. command_insert, command_replace and others were showing wrong metrics
#620 - charset_table for a plain index had a wrong default value
8f753688 - new disk chunks don’t get mlocked
#607 - Manticore cluster node crashes when unable to resolve a node by name
#623 - replication of updated index can lead to undefined state
ca03d228 - indexer could hang on indexing a plain index source with a json attribute
53c75305 - fixed not equal expression filter at PQ index
ccf94e02 - fixed select windows at list queries above 1000 matches. SELECT * FROM pq ORDER BY id desc LIMIT 1000, 100 OPTION max_matches=1100 was not working previously
a0483fe9 - HTTPS request to Manticore could cause a warning like “max packet size(8388608) exceeded”
