
Manticore Search Re-indexing with mysqldump

This blog post is written by Marius Matilionis, a senior developer and an expert in Manticore Search at Ivinco. Ivinco specializes in delivering cutting-edge search solutions, database optimizations, incident management, and observability setups to help businesses achieve faster, more efficient, and scalable operations.


Understanding the Challenge

Full re-indexing of large-scale Manticore Search tables can be a time-consuming and resource-intensive task. When significant changes are made to the table configuration, such as altering exceptions or modifying data structures, a full re-index is often necessary to ensure accurate search results.

Traditional Approach: Script-Based Re-indexing

The traditional method involves writing custom scripts to iterate over documents, parse them, and send them to the table. While this approach offers flexibility, it can be slow, especially for large datasets. The performance bottleneck often lies in the iterative nature of the process, which can lead to significant overhead.
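
As an illustration, here is a minimal sketch of such a script, assuming hypothetical table and field names (rt_index as the source, rt_index_2 as the re-created target, with title and content text fields) and the connection settings used later in this post. A real script would also have to escape quotes, tabs and newlines in the text fields, which is exactly the kind of per-document overhead that makes this approach slow:

    # Sketch only: table and field names are assumptions, escaping is omitted
    BATCH=1000
    LAST_ID=0
    while true; do
      ROWS=$(mysql -P7103 -h0 -N -B -e \
        "SELECT id, title, content FROM rt_index WHERE id > $LAST_ID ORDER BY id ASC LIMIT $BATCH")
      [ -z "$ROWS" ] && break
      # Re-insert every fetched row into the table built with the new configuration
      echo "$ROWS" | while IFS=$'\t' read -r id title content; do
        mysql -P7103 -h0 -e \
          "REPLACE INTO rt_index_2 (id, title, content) VALUES ($id, '$title', '$content')"
      done
      LAST_ID=$(echo "$ROWS" | tail -n 1 | cut -f1)
    done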

A More Efficient Approach: Leveraging mysqldump

mysqldump, a powerful tool for backing up and restoring MySQL databases, can be effectively utilized to streamline the re-indexing process. By directly dumping and restoring the table data, we can significantly reduce the time required for this operation.

Key Steps:

  • Prepare the Table:

    • Store Text-Indexed Columns: Make sure every full-text indexed column is declared as stored, so the original text is kept in the table. Fields that are indexed but not stored cannot be selected back, so mysqldump would have nothing to export for them.
    • Create a New Table: Create a new table with the desired configuration (for example, the updated exceptions file) to receive the re-indexed data; see the example table definition after this list.
  • Perform the mysqldump:

    • Use the following mysqldump command to dump the table data and pipe it straight back into Manticore. Here -e produces extended (multi-row) inserts, -t skips the CREATE TABLE statement, -c writes complete inserts with column names, and --replace emits REPLACE instead of INSERT statements:
      mysqldump -etc --replace -P7103 -h0 manticore rt_index_2 | mysql -P7103 -h0
      
  • Optimize the Table (optional; not needed if auto_optimize is enabled, which is the default):

    • Run OPTIMIZE (optimize table rt_index_2 option sync=1): after the re-indexing run the table had grown to 8.9GB, since the REPLACE-based restore leaves the old versions of documents in the disk chunks until they are merged. OPTIMIZE reclaims that space (the table shrank back to 4.4GB) and keeps the chunk structure compact, which matters for both query performance and storage overhead.
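
For reference, below is a minimal sketch of what the new table definition could look like. The field names, the timestamp attribute and the exceptions file path are assumptions for illustration; the important part is that the full-text fields are declared as both indexed and stored, so their original text stays in the table:

    CREATE TABLE rt_index_2 (
        title TEXT INDEXED STORED,
        content TEXT INDEXED STORED,
        published_at TIMESTAMP
    ) exceptions = '/etc/manticoresearch/exceptions.txt';

With the fields stored, mysqldump can read the full document text back out of the table, which is what makes the dump-and-restore step above possible.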

Performance Analysis and Considerations

Our testing revealed a significant performance improvement when using mysqldump:

Table Type                  Initial Size (GB)
Text Indexed                3.5
Text Indexed Stored         4.4

Re-index Type               Re-indexation Time (minutes)
Script-Based Re-indexing    94
mysqldump                   17

As you can see, declaring the text-indexed columns as stored increases the initial table size by about 25% (from 3.5GB to 4.4GB), but it cuts the re-indexing time from 94 minutes to 17 minutes, roughly a 5.5x speedup.

Key Considerations:

  • Disk Space: The re-indexing process temporarily needs additional disk space, but the final table size is unchanged once the table is compacted. In our case the table started at 4.4GB, grew to 8.9GB during the restore, and after the table was compacted (the OPTIMIZE step above) the size was reduced back to 4.4GB.
  • Table Structure: The specific structure of the table can impact the performance of both methods. Experimentation may be necessary to identify the optimal approach for your particular use case.
  • Data Consistency: Ensure data consistency and avoid conflicts during the re-indexing process, especially if the table is actively being written to. This may involve techniques such as locking writes or queuing updates for the duration of the dump and restore; a simple post-restore sanity check is sketched after this list.
  • Hardware and Software Configuration: The performance of the re-indexing process can be influenced by factors such as hardware resources (CPU, memory, disk I/O), database configuration, and network latency.
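
As a basic sanity check for the consistency point above, document counts in the source and the rebuilt table can be compared after the restore (a sketch, assuming the hypothetical table names used earlier; it does not replace a full consistency check):

    SELECT COUNT(*) FROM rt_index;
    SELECT COUNT(*) FROM rt_index_2;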

Conclusion

By leveraging mysqldump for re-indexing, we can significantly reduce the time and resource requirements associated with this critical task. This optimization is particularly beneficial for large-scale search tables where performance and efficiency are paramount. When considering re-indexing strategies, carefully evaluate the specific needs of your application and infrastructure to determine the most suitable approach.


Marius Matilionis is a senior developer and Manticore Search expert at Ivinco, a company specializing in search solutions, database optimizations, incident management, and observability setups. This blog post reflects his expertise in optimizing Manticore Search for large-scale applications.

Install Manticore Search