Manticore Search 4.0.2: full columnar store support, auto index compaction, locks system revamp, pseudo sharding

Major new features

  • Full support of Manticore Columnar Library. Previously Manticore Columnar Library was supported only for plain indexes. Now it's supported:
    • in real-time indexes for INSERT, REPLACE, DELETE, OPTIMIZE
    • in replication
    • in ALTER
    • in indextool --check
  • Automatic index compaction (#478). You no longer have to call OPTIMIZE manually or via a cron task or some other kind of automation. Manticore now does it on its own. You can set the default compaction threshold via optimize_cutoff.
  • Chunk snapshots and locks system revamp. These changes may be invisible from the outside at first glance, but they significantly improve the behaviour of many things happening in real-time indexes. In a nutshell, most Manticore data manipulation operations previously relied heavily on locks; now we use disk chunk snapshots instead. In particular:
    • read operations (e.g. SELECTs, replication) are performed with snapshots
    • operations that just change internal index structure without modifying schema/documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks in the end
    • UPDATEs and DELETEs are performed against existing chunks, but if a merge is in progress the writes are collected and then applied to the new chunks
    • UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at any given time only one operation (merge or update) has access to the attributes of a chunk.
    • when a merge reaches the phase where it needs attributes, it sets a special flag. When an UPDATE finishes, it checks the flag and, if it's set, the whole update is stored in a special collection. Finally, when the merge finishes, it applies the collected updates to the newly created disk chunk
    • ALTER runs via an exclusive lock
    • replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST
  • ALTER can add/remove a full-text field. Previously it could only add/remove an attribute.
  • 🔬 Experimental: pseudo sharding for full-scan queries. It parallelizes any non-full-text search query. Instead of preparing shards manually, you can now just enable the new option searchd.pseudo_sharding and expect the response time of non-full-text search queries to drop by a factor of up to the number of CPU cores. Note that it can easily occupy all available CPU cores, so if you care about throughput as well as latency, use it with caution.
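
To give a feel for what pseudo sharding does conceptually, here is a minimal Python sketch that splits a full scan into slices, scans them in parallel and adds up the partial results. It is not Manticore's implementation: the names scan and pseudo_sharded_count and the sample rows are made up for illustration, and Python threads only show the split-and-merge shape, not the real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(rows, predicate):
    """Full scan of one slice: count the rows that pass the filter."""
    return sum(1 for row in rows if predicate(row))


def pseudo_sharded_count(rows, predicate, shards):
    """Split rows into contiguous slices, scan them in parallel, sum the counts."""
    size = max(1, -(-len(rows) // shards))  # ceiling division, at least 1
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        partial_counts = pool.map(lambda part: scan(part, predicate), slices)
        return sum(partial_counts)


# Hypothetical data: 10,000 documents with a numeric "price" attribute.
rows = [{"id": i, "price": i % 100} for i in range(10_000)]
cheap = lambda row: row["price"] < 10

print(pseudo_sharded_count(rows, cheap, shards=4))  # 1000
```

Each worker keeps a core busy for the duration of the query, which is why the throughput warning above matters.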

Minor changes

  • Linux Mint and Ubuntu Hirsute Hippo are supported via APT repository
  • faster update by id via HTTP in big indexes in some cases (depends on the ids distribution)
  • custom startup flags for systemd. Now you don't need to start searchd manually in case you need to run Manticore with some specific startup flag
  • new function LEVENSHTEIN() which calculates Levenshtein distance
  • added new searchd startup flags --replay-flags=ignore-trx-errors and --replay-flags=ignore-all-errors so one can still start searchd if the binlog is corrupted
  • #621 - expose errors from RE2
  • more accurate COUNT(DISTINCT) for distributed indexes consisting of local plain indexes
  • FACET DISTINCT to remove duplicates when you do faceted search
  • exact form modifier doesn't require morphology now and works for indexes with infix/prefix search enabled
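
For readers unfamiliar with the metric behind the new LEVENSHTEIN() function, here is a short Python sketch of the classic dynamic-programming edit distance. It only illustrates what Levenshtein distance means; it is not Manticore's code and says nothing about the function's options.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```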

Breaking changes

  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes
  • removed implicit sorting by id. Sort explicitly if required
  • charset_table's default value changes from 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451 to non_cjk
  • OPTIMIZE happens automatically. If you don't need it, make sure to set auto_optimize=0 in the searchd section of the configuration file
  • #616 ondisk_attrs_default was deprecated earlier and is now removed
  • for contributors: we now use Clang compiler for Linux builds as according to our tests it can build a faster Manticore Search and Manticore Columnar Library
  • if max_matches is not specified in a search query, it is implicitly set to the lowest value needed, for the sake of performance of the new columnar storage. This can affect the total metric in SHOW META, but not total_found, which is the actual number of found documents.
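
The difference between total and total_found in the last item can be pictured with a tiny Python model. It is a simplification that assumes total is capped by max_matches while total_found is not; meta_counters is a hypothetical helper, not a Manticore API, and the numbers are made up.

```python
def meta_counters(found: int, max_matches: int) -> dict:
    """Simplified model: total is capped by max_matches, total_found is not."""
    return {"total": min(found, max_matches), "total_found": found}


# The same query matching 5000 documents under two different max_matches values.
print(meta_counters(found=5000, max_matches=1000))  # {'total': 1000, 'total_found': 5000}
print(meta_counters(found=5000, max_matches=20))    # {'total': 20, 'total_found': 5000}
```

If your application relies on total, pass max_matches explicitly in the query.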

Migration from Manticore 3

  • make sure you stop Manticore 3 cleanly:
    • no binlog files should be in /var/lib/manticore/binlog/ (only binlog.meta should be in the directory)
    • otherwise the indexes whose binlogs Manticore 4 can't replay won't be run
  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes, so make sure you make a backup if you want to be able to rollback the new version easily
  • if you run a replication cluster make sure you:
    • stop all your nodes cleanly first
    • then start the node which was stopped last with --new-cluster (run the tool manticore_new_cluster on Linux)
    • read about restarting a cluster for more details
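
The binlog check from the first step can be scripted. Below is a minimal Python sketch that lists every file in the binlog directory other than binlog.meta. The helper name leftover_binlogs is made up, and the path is the one mentioned above; adjust it if your binlog directory is configured elsewhere.

```python
from pathlib import Path


def leftover_binlogs(binlog_dir):
    """Return the sorted names of files in binlog_dir other than binlog.meta."""
    return sorted(
        entry.name
        for entry in Path(binlog_dir).iterdir()
        if entry.is_file() and entry.name != "binlog.meta"
    )


if __name__ == "__main__":
    directory = Path("/var/lib/manticore/binlog")
    if not directory.is_dir():
        print(f"{directory} not found")
    else:
        leftovers = leftover_binlogs(directory)
        if leftovers:
            print(f"leftover binlog files: {leftovers}")
        else:
            print("no binlog files other than binlog.meta")
```

Run it after stopping Manticore 3 and before installing Manticore 4. An empty result is what you want to see.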

Bugfixes

  • Lots of replication issues have been fixed:
    • 696f8649 - fixed crash during SST on joiner with active index; added sha1 verification at the joiner node when writing file chunks to speed up index loading; added rotation of changed index files at the joiner node on index load; added removal of index files at the joiner node when the active index gets replaced by a new index from the donor node; added replication log points at the donor node for sending files and chunks
    • b296c55a - crash on JOIN CLUSTER in case the address is incorrect
    • 418bf880 - during initial replication of a large index the joining node could fail with ERROR 1064 (42000): invalid GTID, (null), and the donor could become unresponsive while another node was joining
    • 6fd350d2 - hash could be calculated wrong for a big index which could result in replication failure
    • #615 - replication failed on cluster restart
  • #574 - indextool --help doesn't display parameter --rotate
  • #578 - searchd high CPU usage while idle after about a day
  • #587 - flush .meta immediately
  • #617 - manticore.json gets emptied
  • #618 - searchd --stopwait fails under root. It also fixes systemctl behaviour (previously it was showing failure for ExecStop and didn't wait long enough for searchd to stop properly)
  • #619 - INSERT/REPLACE/DELETE vs SHOW STATUS. command_insert, command_replace and others were showing wrong metrics
  • #620 - charset_table for a plain index had a wrong default value
  • 8f753688 - new disk chunks don't get mlocked
  • #607 - Manticore cluster node crashes when unable to resolve a node by name
  • #623 - replication of updated index can lead to undefined state
  • ca03d228 - indexer could hang on indexing a plain index source with a json attribute
  • 53c75305 - fixed not equal expression filter in PQ index
  • ccf94e02 - fixed select windows in list queries above 1000 matches. SELECT * FROM pq ORDER BY id desc LIMIT 1000, 100 OPTION max_matches=1100 was not working previously
  • a0483fe9 - HTTPS request to Manticore could cause warning like "max packet size(8388608) exceeded"

2 thoughts on “Manticore Search 4.0.2: full columnar store support, auto index compaction, locks system revamp, pseudo sharding”

  • Really great work!

    I have been using the Columnar Engine since version 3.6.0, when it was still very unstable, but the performance was already foreseeable. In the dev versions it got better and now it’s really great.

    One thing I noticed:
    CALL SUGGEST('search term', 'index') is not working with distributed indexes.
    It works with every single index from the distributed index, but not with the distributed index itself.
    I’ve noticed this problem since using the Columnar engine, but I’m not sure if it affects the standard engine as well.

    Otherwise, I am very pleased with the current version and I will keep you updated with feedback.

    Thanks a lot!

    • Thank you Johannes. That’s correct: CALL SUGGEST doesn’t support distributed indexes. I’d appreciate it if you filed a feature request explaining why it’s important in your case. It’s also correct that the storage an index uses doesn’t matter for CALL SUGGEST (and KEYWORDS), since they deal with the full-text part of the index, not attributes, while the storage engines only change how attributes behave.
