This blog post is written by Marius Matilionis, a senior developer and an expert in Manticore Search at Ivinco. Ivinco specializes in delivering cutting-edge search solutions, database optimizations, incident management, and observability setups to help businesses achieve faster, more efficient, and scalable operations.
Understanding the Challenge
Full re-indexing of large-scale Manticore Search tables can be a time-consuming and resource-intensive task. When significant changes are made to the table configuration, such as altering exceptions or modifying data structures, a full re-index is often necessary to ensure accurate search results.
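For instance, full-text settings changed on an existing real-time table generally apply only to documents inserted after the change, which is exactly why the existing documents have to be re-indexed. A minimal sketch of such a change, assuming ALTER TABLE accepts per-table full-text settings in RT mode; the table name, port, and exceptions file path are purely illustrative:

```bash
# Hypothetical example: change a per-table full-text setting (here, the
# exceptions file) on an existing RT table. In RT mode such settings
# typically affect only documents inserted after the change, so the
# existing documents must be re-inserted to pick it up.
mysql -P7103 -h0 -e \
  "ALTER TABLE rt_index_2 exceptions='/etc/manticoresearch/exceptions.txt'"
```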
Traditional Approach: Script-Based Re-indexing
The traditional method involves writing custom scripts to iterate over documents, parse them, and send them to the table. While this approach offers flexibility, it can be slow, especially for large datasets. The performance bottleneck often lies in the iterative nature of the process, which can lead to significant overhead.
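As a rough illustration (not the exact script from our setup), such a script usually pages through the documents in batches and re-inserts them one batch at a time, paying the query, parsing, and insert cost on every iteration. In this sketch the documents are read back from the table itself (which requires stored fields; reading from the original data source is equally common), and all names, the port, and the batch size are assumptions:

```bash
#!/bin/bash
# Hypothetical sketch of script-based re-indexing: page through the old table
# by id and re-insert each batch into a table with the new settings.
# Table/column names, port and batch size are illustrative.
PORT=7103
BATCH=1000
last_id=0
while true; do
  rows=$(mysql -P"$PORT" -h0 -N -e \
    "SELECT id, title, body FROM rt_index_2 WHERE id > $last_id ORDER BY id ASC LIMIT $BATCH")
  [ -z "$rows" ] && break
  # ...parse each row, escape the text, and build REPLACE statements
  # targeting the re-configured table...
  last_id=$(echo "$rows" | tail -n 1 | cut -f 1)
done
```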
A More Efficient Approach: Leveraging mysqldump
`mysqldump`, a powerful tool for backing up and restoring MySQL databases, can be effectively utilized to streamline the re-indexing process. By directly dumping and restoring the table data, we can significantly reduce the time required for this operation.
Key Steps:
Prepare the Table:
- Store Text-Indexed Columns: Ensure that all text-indexed columns are stored as text to optimize the dump and restore process. This format is more efficient for data transfer and minimizes potential issues during re-indexing.
- Create a New Table: Create a new table with the desired configuration to accommodate the changes (see the sketch below).
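To make these two points concrete, here is a minimal sketch of creating a Manticore table whose full-text fields are declared as both indexed and stored; the table and field names are hypothetical, and the port matches the commands used later in this post:

```bash
# Hypothetical sketch: a table whose full-text fields are "indexed stored",
# so mysqldump can later read the original text straight back out of Manticore.
mysql -P7103 -h0 -e "
CREATE TABLE rt_index_2_new (
    title text indexed stored,
    body  text indexed stored,
    price float
)"
```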
Perform the mysqldump:
- Use the following `mysqldump` command to dump the table data: `mysqldump -etc --replace -P7103 -h0 manticore rt_index_2 | mysql -P7103 -h0`
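For readability, here is the same pipeline with the long-form flag names (`-etc` is simply `-e -t -c` combined); the host and port are the ones from the command above:

```bash
# Same one-liner, spelled out:
#   --extended-insert (-e)  pack many rows into each statement
#   --no-create-info  (-t)  dump data only, no CREATE TABLE statements
#   --complete-insert (-c)  include column names in every statement
#   --replace               emit REPLACE instead of INSERT
mysqldump --extended-insert --no-create-info --complete-insert --replace \
  -P7103 -h0 manticore rt_index_2 \
| mysql -P7103 -h0
```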
Optimize the Table (optional; not needed if `auto_optimize` is enabled, which is the default):
- Run `optimize table rt_index_2 option sync=1`: after the re-indexing process, the table size was 8.9GB. The optimize process helps to reclaim disk space (after optimizing, the table size was 4.4GB) and consolidate the table structure. This step helps ensure optimal performance and reduce storage overhead.
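If you prefer to run this step from the shell, a minimal sketch using the same port as above (`SHOW TABLE ... STATUS` is assumed here as a quick way to check the resulting on-disk size):

```bash
# Optional manual optimize (skip when auto_optimize is on):
mysql -P7103 -h0 -e "OPTIMIZE TABLE rt_index_2 OPTION sync=1"
# Inspect the table's status (disk size, chunk count, etc.) afterwards:
mysql -P7103 -h0 -e "SHOW TABLE rt_index_2 STATUS"
```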
Performance Analysis and Considerations
Our testing revealed a significant performance improvement when using `mysqldump`:
| Table Type | Initial Size (GB) |
|---|---|
| Text Indexed | 3.5 |
| Text Indexed Stored | 4.4 |
| Re-index Type | Re-indexing Time (minutes) |
|---|---|
| Script-Based Re-indexing | 94 |
| mysqldump | 17 |
As you can see, while storing the text-indexed columns increases the initial table size by roughly 25% (from 3.5GB to 4.4GB), it significantly reduces the re-indexing time from 94 minutes to 17 minutes, a speedup of roughly 5.5x.
Key Considerations:
- Disk Space: While `mysqldump` requires additional disk space during the re-indexing process, the final table size remains the same after optimization. In our case, the initial table size was 4.4GB, and after the re-indexing process it had grown to 8.9GB; after the optimize (compress) step, it was reduced back to 4.4GB.
- Table Structure: The specific structure of the table can impact the performance of both methods. Experimentation may be necessary to identify the optimal approach for your particular use case.
- Data Consistency: Ensure data consistency and avoid conflicts during the re-indexing process, especially if the table is actively being updated. This may involve using techniques like locking or asynchronous updates.
- Hardware and Software Configuration: The performance of the re-indexing process can be influenced by factors such as hardware resources (CPU, memory, disk I/O), database configuration, and network latency.
Conclusion
By leveraging `mysqldump` for re-indexing, we can significantly reduce the time and resource requirements associated with this critical task. This optimization is particularly beneficial for large-scale search tables where performance and efficiency are paramount. When considering re-indexing strategies, carefully evaluate the specific needs of your application and infrastructure to determine the most suitable approach.
Marius Matilionis is a senior developer and Manticore Search expert at Ivinco, a company specializing in search solutions, database optimizations, incident management, and observability setups. This blog post reflects his expertise in optimizing Manticore Search for large-scale applications.