Migrating to Manticore 3: document ids

This article covers the change of the document id data type in Manticore Search 3.0.

In previous versions document ids were unsigned big integers. In 3.0 we switched to signed big integers. The reason behind that decision was to make document ids uniform with bigint attributes, which are signed. In most cases signed bigints should be enough, and in any case we’re moving toward auto-generated ids.

However, in some rare cases this change may become a problem. In this article we’ll look at the difference in more detail and at how to work around the problem if it arises.

Unsigned big integers support values between 0 and 18,446,744,073,709,551,615 (2^64 - 1), while signed big integers can take values between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (-2^63 to 2^63 - 1). The signed positive range is big enough to fit large datasets (there is no known Manticore collection of more than 9 quintillion documents). The issue arises only when the document id is not an incremented value but a hash that uses the full unsigned big integer range. In those cases, if it’s not possible to switch to a hash that fits the signed type, a fairly simple conversion, explained below, can be used to store and retrieve the hashes.
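As an illustration of a hash that fits the signed type, here is a minimal Python sketch. The choice of BLAKE2b and the function names `signed_hash_id` and `positive_hash_id` are assumptions made for this example, not something Manticore prescribes; any 64-bit hash could be treated the same way.

```python
import hashlib

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def signed_hash_id(key: str) -> int:
    """Hash a key to 8 bytes and read them as a signed 64-bit integer."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


def positive_hash_id(key: str) -> int:
    """Same hash with the top bit cleared, so the id is never negative."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, byteorder="big") & INT64_MAX


if __name__ == "__main__":
    print(signed_hash_id("article-42"))
    print(positive_hash_id("article-42"))
```

The first variant keeps all 64 bits of the hash and only changes how they are interpreted. The second gives up one bit of hash space in exchange for ids that are always non-negative.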

For RealTime indexes, support for auto-generated ids will be added soon. Where the RT document id was not taken from a database but generated (by a hash or another method), things will get easier once the ids are generated on Manticore’s side.

Why not support both data types?

Supporting both signed and unsigned ids in Manticore would currently cause more trouble than it would solve. For example, some clients are very strict about the data types they expect in responses. Sending unsigned values in responses would cause a lot of confusion, as clients are instructed to expect a signed big integer.

How to deal with this change?

To stay consistent with the change in Manticore Search, the document id type might need to change in your application and data sources. If the ids are stored in a database and never exceed the maximum signed value (2^63 - 1), things are easy: just convert the column to signed. If your ids do go past that value, the numbers need to be mapped into the signed range.

Values above the signed positive range (2^63 - 1) can be turned into negative numbers by subtracting 2^64 from them: this converts 2^63 (the first number above the signed positive range) to -2^63, and 2^64 - 1 (the biggest possible unsigned big integer) to -1. The same approach can be used if you generate ids with a hash function that returns unsigned big integers.
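The subtraction described above can be sketched on the application side as follows. This is a minimal Python illustration; the helper names `to_signed` and `to_unsigned` are hypothetical and not part of any Manticore client.

```python
TWO_64 = 2**64
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def to_signed(unsigned_id: int) -> int:
    """Map an unsigned 64-bit id into the signed 64-bit range."""
    if not 0 <= unsigned_id < TWO_64:
        raise ValueError("id does not fit in an unsigned 64-bit integer")
    # Values above 2^63 - 1 become negative by subtracting 2^64.
    return unsigned_id - TWO_64 if unsigned_id > INT64_MAX else unsigned_id


def to_unsigned(signed_id: int) -> int:
    """Recover the original unsigned id from its signed mapping."""
    if not INT64_MIN <= signed_id <= INT64_MAX:
        raise ValueError("id does not fit in a signed 64-bit integer")
    # Negative values are shifted back up by 2^64.
    return signed_id + TWO_64 if signed_id < 0 else signed_id


if __name__ == "__main__":
    for value in (0, INT64_MAX, 2**63, TWO_64 - 1):
        print(value, to_signed(value), to_unsigned(to_signed(value)))
```

Because the mapping is one-to-one, every unsigned id round-trips unchanged, so the application can keep showing the original unsigned value while Manticore stores the signed one.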

For example, in MySQL we can map an unsigned big integer into a signed big integer with IF(id>>63, -(~id) - 1, id). To convert the “mapped” signed number back to unsigned we can use IF(signed_id<0, ~0^~signed_id, signed_id). Bit functions are used because MySQL supports only bit functions for numbers bigger than 9223372036854775807 (the 63-bit limit).

SELECT id,
       IF(id>>63, -(~id) - 1, id) AS mapped_to_signed,
       signed_id,
       IF(signed_id<0, ~0^~signed_id, signed_id) AS unsigned_from_mapped
FROM test
ORDER BY id;
id                    mapped_to_signed      signed_id             unsigned_from_mapped
0                     0                     0                     0
1                     1                     1                     1
9223372036854775807   9223372036854775807   9223372036854775807   9223372036854775807
9223372036854775808   -9223372036854775808  -9223372036854775808  9223372036854775808
9223372036854775809   -9223372036854775807  -9223372036854775807  9223372036854775809
18446744073709551613  -3                    -3                    18446744073709551613
18446744073709551614  -2                    -2                    18446744073709551614
18446744073709551615  -1                    -1                    18446744073709551615

If you are using a strongly typed language (for example, on .NET), review your code to make sure it expects the document ids in search responses to be signed, not unsigned.
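To show why the declared type matters, here is a small Python sketch that uses the standard `struct` module as a stand-in for a fixed-width typed field. The 8-byte little-endian layout and the helper names are assumptions made for this illustration only and do not describe Manticore’s actual wire format.

```python
import struct


def decode_signed(raw: bytes) -> int:
    """Read 8 little-endian bytes as a signed 64-bit id."""
    (doc_id,) = struct.unpack("<q", raw)
    return doc_id


def decode_unsigned(raw: bytes) -> int:
    """Read the same 8 bytes as an unsigned 64-bit id."""
    (doc_id,) = struct.unpack("<Q", raw)
    return doc_id


def fits_unsigned_field(value: int) -> bool:
    """Return True if the value can be stored in an unsigned 64-bit field."""
    try:
        struct.pack("<Q", value)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    raw = struct.pack("<q", -3)      # a negative (mapped) document id
    print(decode_signed(raw))        # read with the signed type
    print(decode_unsigned(raw))      # same bytes read with the unsigned type
    print(fits_unsigned_field(-3))   # an unsigned field rejects a negative id
```

The same bytes give two different numbers depending on the declared type, and an unsigned field cannot hold a negative id at all, which is why client code should be checked for unsigned id types.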

We know this change can cause some trouble, but it’s a one-off change that has to be made to enjoy the latest Manticore Search.
