A new way of tokenizing Chinese

Why is the Chinese language considered difficult for search engines to handle? The answer is that Chinese belongs to the group of so-called CJK languages, which also includes Japanese and Korean, and these languages are traditionally written without spaces or any other separators between words. Why is that a problem? Because every sentence in a text, whether delimited by spaces or not, still consists of words, and to find correct matches in full-text search we first need to tokenize the text, i.e. determine the boundaries between its words.
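To make the problem concrete, here is a minimal sketch in Python. It uses the third-party jieba segmenter purely as an illustration of dictionary-based word segmentation; it is not necessarily the tool discussed later in this article, and the sample sentence and its segmentation are illustrative assumptions.

```python
# Why Chinese needs explicit tokenization: a whitespace split finds nothing,
# while a dedicated segmenter recovers word boundaries.
# Requires: pip install jieba
import jieba

text = "我爱自然语言处理"  # "I love natural language processing", written with no spaces

# Naive whitespace splitting yields a single indivisible blob:
print(text.split())      # ['我爱自然语言处理']

# A dictionary-based segmenter restores the word boundaries:
print(jieba.lcut(text))  # e.g. ['我', '爱', '自然语言', '处理']
```

Without that second step, a full-text engine would have to index the whole sentence as one "word", and a query for 自然语言 ("natural language") could never match it exactly.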
[…]