Manticore Search 4.0.2: full columnar store support, auto index compaction, locks system revamp, pseudo sharding

Major new features

  • Full support of Manticore Columnar Library. Previously Manticore Columnar Library was supported only for plain indexes. Now it’s also supported:
    • in real-time indexes for INSERT, REPLACE, DELETE, OPTIMIZE
    • in replication
    • in ALTER
    • in indextool --check
  • Automatic index compaction (#478). You no longer have to call OPTIMIZE manually, via a cron task, or through some other kind of automation. Manticore now does it on its own. You can set the default compaction threshold via optimize_cutoff.
  • Chunk snapshots and locks system revamp. These changes may be invisible from the outside at first glance, but they significantly improve the behaviour of many things happening in real-time indexes. In a nutshell, most Manticore data manipulation operations previously relied heavily on locks; now we use disk chunk snapshots instead. In particular:
    • read operations (e.g. SELECTs, replication) are performed with snapshots
    • operations that only change the internal index structure without modifying the schema or documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks at the end
    • UPDATEs and DELETEs are performed against existing chunks, but if a merge is in progress, the writes are collected and then applied to the new chunks
    • UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at any given time only one operation (merge or update) has access to the attributes of a chunk.
    • when a merge reaches the phase where it needs attributes, it sets a special flag. When an UPDATE finishes, it checks the flag and, if it is set, the whole update is stored in a special collection. Finally, when the merge finishes, it applies the stored updates to the newly created disk chunk
    • ALTER runs via an exclusive lock
    • replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST
  • ALTER can add/remove a full-text field. Previously it could only add/remove an attribute.
  • 🔬 Experimental: pseudo sharding for full-scan queries. It allows parallelizing any non-full-text search query. Instead of preparing shards manually, you can now just enable the new option searchd.pseudo_sharding and expect response time for non-full-text search queries to drop by a factor of up to the number of CPU cores. Note that it can easily occupy all available CPU cores, so if you care about throughput as well as latency, use it with caution.
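
The chunk snapshot scheme described in the list above can be pictured with a small toy model. This is only an illustrative sketch and not Manticore's actual implementation: the class ToyRtIndex, its methods and the use of plain dictionaries as "chunks" are all hypothetical names and simplifications. It shows three ideas from the list: readers keep working on the snapshot they took, a merge builds its result from a copy and swaps it in at the end, and updates that arrive while the merge flag is set are collected and then applied to the new chunk.

```python
import threading


class ToyRtIndex:
    """Toy model (hypothetical): snapshot reads plus deferred updates during a merge."""

    def __init__(self, chunks):
        self._chunks = tuple(chunks)    # current chunk list; each chunk is {doc_id: attr}
        self._lock = threading.Lock()   # guards only the short bookkeeping sections
        self._merging = False           # the "special flag" set by a running merge
        self._pending = []              # updates collected while the flag is set

    def snapshot(self):
        # Readers take the current tuple and keep using it, even if the
        # index later swaps in a different chunk list.
        return self._chunks

    def update(self, doc_id, value):
        with self._lock:
            for chunk in self._chunks:
                if doc_id in chunk:
                    chunk[doc_id] = value               # applied to the existing chunk
            if self._merging:
                self._pending.append((doc_id, value))   # remembered for the new chunk

    def begin_merge(self):
        with self._lock:
            self._merging = True
            self._pending = []
            old = self._chunks
            merged = {}
            for chunk in old:
                merged.update(chunk)    # the merge result is built from a copy
        return old, merged

    def finish_merge(self, old, merged):
        with self._lock:
            for doc_id, value in self._pending:
                if doc_id in merged:
                    merged[doc_id] = value              # replay collected updates
            old_ids = {id(chunk) for chunk in old}
            kept = tuple(c for c in self._chunks if id(c) not in old_ids)
            self._chunks = kept + (merged,)             # swap in the new chunk
            self._merging = False
            self._pending = []


index = ToyRtIndex([{1: "a"}, {2: "b"}])
reader_view = index.snapshot()       # a reader starts before the merge
old, merged = index.begin_merge()
index.update(2, "B")                 # arrives while the merge is running
index.finish_merge(old, merged)
```

After the swap, the reader that started earlier still holds its two original chunks, while new readers see a single merged chunk that already contains the update made during the merge.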

Minor changes

  • Linux Mint and Ubuntu Hirsute Hippo are supported via APT repository
  • faster update by id via HTTP in big indexes in some cases (depends on the ids distribution)
  • custom startup flags for systemd. Now you don’t need to start searchd manually when you need to run Manticore with a specific startup flag
  • new function LEVENSHTEIN() which calculates Levenshtein distance
  • added new searchd startup flags --replay-flags=ignore-trx-errors and --replay-flags=ignore-all-errors so one can still start searchd if the binlog is corrupted
  • #621 - expose errors from RE2
  • more accurate COUNT(DISTINCT) for distributed indexes consisting of local plain indexes
  • FACET DISTINCT to remove duplicates when you do faceted search
  • the exact form modifier doesn’t require morphology now and works for indexes with infix/prefix search enabled
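
For readers unfamiliar with the metric behind the new LEVENSHTEIN() function, here is a minimal Python sketch of the classic Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. It illustrates the metric only. It is not Manticore's code, and any options the SQL function may accept are not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")  # 3
```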

Breaking changes

  • the new version can read older indexes, but the older versions can’t read Manticore 4’s indexes
  • removed implicit sorting by id. Sort explicitly if required
  • charset_table’s default value changes from 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451 to non_cjk
  • OPTIMIZE happens automatically. If you don’t need it, make sure to set auto_optimize=0 in the searchd section of the configuration file
  • #616 - ondisk_attrs_default was deprecated earlier and is now removed
  • for contributors: we now use Clang compiler for Linux builds as according to our tests it can build a faster Manticore Search and Manticore Columnar Library
  • if max_matches is not specified in a search query, it gets updated implicitly with the lowest needed value for the sake of performance of the new columnar storage. It can affect the total metric in SHOW META, but not total_found, which is the actual number of found documents.

Migration from Manticore 3

  • make sure you stop Manticore 3 cleanly:
    • no binlog files should be in /var/lib/manticore/binlog/ (only binlog.meta should be in the directory)
    • otherwise the indexes for which Manticore 4 can’t replay binlogs won’t be run
  • the new version can read older indexes, but the older versions can’t read Manticore 4’s indexes, so make sure you make a backup if you want to be able to roll back the new version easily
  • if you run a replication cluster make sure you:
    • stop all your nodes cleanly first
    • and then start the node which was stopped last with --new-cluster (run the manticore_new_cluster tool on Linux)
    • read about restarting a cluster for more details
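
The clean-shutdown condition above (nothing but binlog.meta left in the binlog directory) is easy to check before upgrading. The following is a small sketch of such a check. The helper name leftover_binlogs is hypothetical, and the path in the usage comment is the default mentioned above, which may differ on your installation.

```python
from pathlib import Path


def leftover_binlogs(binlog_dir):
    """Return the names of files in binlog_dir other than binlog.meta."""
    directory = Path(binlog_dir)
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name != "binlog.meta"
    )


# Example usage (assumes the default binlog path from the list above):
#   leftovers = leftover_binlogs("/var/lib/manticore/binlog/")
#   if leftovers:
#       print("Manticore 3 was not stopped cleanly, found:", leftovers)
```

An empty result means the directory holds nothing but binlog.meta, which is the condition stated above.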

Bugfixes

  • Lots of replication issues have been fixed:
    • 696f8649 - fixed a crash during SST on the joiner with an active index; added sha1 verification at the joiner node when writing file chunks to speed up index loading; added rotation of changed index files at the joiner node on index load; added removal of index files at the joiner node when the active index gets replaced by a new index from the donor node; added replication log points at the donor node for sending files and chunks
    • b296c55a - crash on JOIN CLUSTER in case the address is incorrect
    • 418bf880 - during the initial replication of a large index the joining node could fail with ERROR 1064 (42000): invalid GTID, (null), and the donor could become unresponsive while another node was joining
    • 6fd350d2 - the hash could be calculated incorrectly for a big index, which could result in replication failure
    • #615 - replication failed on cluster restart
  • #574 - indextool --help doesn’t display parameter --rotate
  • #578 - searchd high CPU usage while idle after ca. a day
  • #587 - flush .meta immediately
  • #617 - manticore.json gets emptied
  • #618 - searchd --stopwait fails under root. It also fixes systemctl behaviour (previously it was showing failure for ExecStop and didn’t wait long enough for searchd to stop properly)
  • #619 - INSERT/REPLACE/DELETE vs SHOW STATUS. command_insert, command_replace and others were showing wrong metrics
  • #620 - charset_table for a plain index had a wrong default value
  • 8f753688 - new disk chunks don’t get mlocked
  • #607 - Manticore cluster node crashes when unable to resolve a node by name
  • #623 - replication of updated index can lead to undefined state
  • ca03d228 - indexer could hang on indexing a plain index source with a json attribute
  • 53c75305 - fixed not equal expression filter at PQ index
  • ccf94e02 - fixed select windows at list queries above 1000 matches. SELECT * FROM pq ORDER BY id desc LIMIT 1000, 100 OPTION max_matches=1100 was not working previously
  • a0483fe9 - HTTPS request to Manticore could cause warning like “max packet size(8388608) exceeded”

Install Manticore Search
