Benchmark: Manticore 3 vs Sphinx 3 - now even faster

Published: May 14, 2019

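We recently released Manticore 3.0.0, which includes many improvements and some new optimizations that increase performance. In this article we compare the new version's performance with that of Sphinx 3.1.1.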

TL;DR

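Manticore shows:

about 2x better search performance in some cases, especially with longer queries
a smaller gain in all other search tests, while still outperforming Sphinx
slower indexing, the one exception, where Sphinx is about 2% faster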

Test environment

As in our previous benchmark of Manticore 2.7 vs Sphinx 3, we run the tests on a dataset of 11.6 million user comments from Hacker News.

The benchmark was run under the following conditions:

Hacker News curated comments dataset of 2016, in CSV format
OS: Ubuntu 18.04.1 LTS (Bionic Beaver), kernel: 4.15.0-47-generic
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 8 logical cores
32G RAM
HDD
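Docker version 18.09.2
Base image for indexing and searchd: Ubuntu:bionic
Manticore Search was built inside Docker; the Sphinx binaries were downloaded from its website, since no open-source version is available to build from
stress-tester for benchmarking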

The configuration is identical for Manticore and Sphinx:

source full
{
  type = csvpipe
  csvpipe_command = cat /root/hacker_news_comments.prepared.csv|grep -v line_number
  csvpipe_attr_uint = story_id
  csvpipe_attr_timestamp = story_time
  csvpipe_field = story_text
  csvpipe_field = story_author
  csvpipe_attr_uint = comment_id
  csvpipe_field = comment_text
  csvpipe_field = comment_author
  csvpipe_attr_uint = comment_ranking
  csvpipe_attr_uint = author_comment_count
  csvpipe_attr_uint = story_comment_count
}

index full
{
  path = /root/idx_full
  source = full
  html_strip = 1
  mlock = 1
}

searchd
{
  listen = 9306:mysql41
  query_log = /root/query.log
  log = /root/searchd.log
  pid_file = /root/searchd.pid
  binlog_path =
  qcache_max_bytes = 0
}

Indexing

Indexing took 1263 seconds for Manticore and 1237 seconds for Sphinx:

Manticore:

indexing index 'full'...
collected 11654429 docs, 6198.6 MB
creating lookup: 11654.4 Kdocs, 100.0% done
creating histograms: 11654.4 Kdocs, 100.0% done
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6198580642 bytes
total 1263.497 sec, 4905890 bytes/sec, 9223.94 docs/sec
total 22924 reads, 1.484 sec, 238.4 kb/call avg, 0.0 msec/call avg
total 11687 writes, 11.773 sec, 855.1 kb/call avg, 1.0 msec/call avg

Sphinx:

indexing index 'full'...
collected 11654429 docs, 6198.6 MB
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6.199 Gb
total 1236.9 sec, 5.011 Mb/sec, 9422 docs/sec

So on this dataset and with this index schema, Manticore indexes about 2% slower than Sphinx.
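The ~2% figure can be checked directly from the two `total ... sec` lines in the indexer output above. The following is a small sketch of that arithmetic; the function name `percent_slower` is our own and is not part of either tool.

```python
def percent_slower(slower_sec: float, faster_sec: float) -> float:
    """Return how much longer `slower_sec` is than `faster_sec`, in percent."""
    return (slower_sec - faster_sec) / faster_sec * 100.0


# Wall-clock totals taken from the indexer logs above.
manticore_sec = 1263.497
sphinx_sec = 1236.9

diff_sec = manticore_sec - sphinx_sec
slowdown = percent_slower(manticore_sec, sphinx_sec)
print(f"difference: {diff_sec:.1f} s, Manticore is {slowdown:.2f}% slower")
# prints "difference: 26.6 s, Manticore is 2.15% slower"
```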

Performance tests

Both instances were warmed up before the tests. The index files are as follows:

Manticore:

root@bench# ls -lah /var/lib/docker/volumes/64746c338de981014c7c1ea93d4c55f55e13de63ac9e49c2d31292bb239a82b6/_data
total 4.7G
drwx------ 2 root root 4.0K May 14 09:03 .
drwxr-xr-x 3 root root 4.0K May 14 09:01 ..
-rw-r--r-- 1 root root 362M May 13 17:22 idx_full.spa
-rw-r--r-- 1 root root 3.1G May 13 17:31 idx_full.spd
-rw-r--r-- 1 root root  90M May 13 17:31 idx_full.spe
-rw-r--r-- 1 root root  628 May 13 17:31 idx_full.sph
-rw-r--r-- 1 root root  29K May 13 17:22 idx_full.sphi
-rw-r--r-- 1 root root 6.5M May 13 17:31 idx_full.spi
-rw------- 1 root root    0 May 14 09:03 idx_full.spl
-rw-r--r-- 1 root root 1.4M May 13 17:22 idx_full.spm
-rw-r--r-- 1 root root 1.1G May 13 17:31 idx_full.spp
-rw-r--r-- 1 root root  59M May 13 17:22 idx_full.spt

Sphinx:

root@bench /var/lib/docker/volumes # ls -lah /var/lib/docker/volumes/bd28586b5102ff91d4c367f612e2f7b1fe0a066917c8e0b4636d203dd3ba5b0b/_data
total 4.6G
drwx------ 3 root root 4.0K May 14 09:04 .
drwxr-xr-x 3 root root 4.0K May 14 09:03 ..
-rw-r--r-- 1 root root 362M May 13 19:09 idx_full.spa
-rw-r--r-- 1 root root 3.1G May 13 19:17 idx_full.spd
-rw-r--r-- 1 root root  27M May 13 19:17 idx_full.spe
-rw-r--r-- 1 root root  648 May 13 19:17 idx_full.sph
-rw-r--r-- 1 root root 6.3M May 13 19:17 idx_full.spi
-rw-r--r-- 1 root root    8 May 13 19:09 idx_full.spj
-rw-r--r-- 1 root root 1.4M May 13 19:09 idx_full.spk
-rw------- 1 root root    0 May 14 09:04 idx_full.spl
-rw-r--r-- 1 root root 1.1G May 13 19:17 idx_full.spp

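Test 1 - time to process the top 1000 terms from the collection

Let's start with a simple test: how long does it take to find the documents containing each of the top 1000 terms from the collection: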

for n in `head -1000 hn_top.txt|awk '{print $1}'`; do
mysql -P9306 -hhn_$engine -e "select * from full where match('@(comment_text,story_text,comment_author,story_author) $n') limit 10 option max_matches=1000" > /dev/null
done

The results: 77.61 seconds for Sphinx and 71.46 seconds for Manticore.

So in this test Manticore Search is 8.59% faster than Sphinx Search.
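For readers who prefer to drive this test from a script rather than the shell loop, here is a rough Python sketch of the same idea. It only builds the query text and times a caller-supplied function; `build_query`, `time_queries` and the `run` callback are hypothetical names, and how `run` actually talks to searchd (any MySQL-protocol client on port 9306) is left as an assumption. Terms are inserted without escaping, as in the shell loop.

```python
import time
from typing import Callable, Iterable

# Full-text fields searched in Test 1, as listed in the shell loop.
FIELDS = ("comment_text", "story_text", "comment_author", "story_author")


def build_query(term: str) -> str:
    """Build the same SELECT the shell loop sends for one term (no escaping)."""
    fields = ",".join(FIELDS)
    return (
        f"select * from full where match('@({fields}) {term}') "
        "limit 10 option max_matches=1000"
    )


def time_queries(terms: Iterable[str], run: Callable[[str], object]) -> float:
    """Run one query per term through `run` and return total elapsed seconds."""
    start = time.perf_counter()
    for term in terms:
        run(build_query(term))
    return time.perf_counter() - start


# Dry run: print the query instead of sending it to a server.
elapsed = time_queries(["about"], print)
```

Replacing `print` with a function that executes the statement against each engine would give the two totals to compare.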

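Test 2 - top 1000 frequent terms from the collection broken down by groups (top 1-50, top 50-100, etc.)

Now let's see how Sphinx and Manticore differ when processing terms from sub-groups of the top 1000 frequent terms.

To give a better idea of the queries, here are some random query examples: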

1-50	50-100	100-150	150-200	200-250	250-300
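about	get	thing	used	different	high
my	work	is	off	system	build
more	things	seems	sure	didn't	next

300-350	350-400	400-450	450-500	500-550	550-600
day	value	20	wanted	price	itself
write	server	focus	website	g	file
without	phone	story	recently	model	store

600-650	650-700	700-750	750-800	800-850	850-900
hiring	smart	welcome	choice	taken	topic
customer	wish	wants	pg	details	core
iphone	reasons	environment	usually	css	thinking

900-950	950-1000
complete	keyword
resources	late
ideas	ipad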

Manticore is faster than Sphinx by 6.8% in throughput and by 12.2% in 95p latency.
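The two metrics reported from here on are throughput and 95p (95th percentile) latency. As an illustration only, the sketch below derives both from a list of per-query latencies using the nearest-rank percentile definition. This is an assumption about one reasonable way to compute them, not a description of how the stress-tester does it; `percentile` and `summarize` are hypothetical helpers.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(latencies_sec: Sequence[float], wall_time_sec: float) -> Dict[str, float]:
    """Throughput in queries per second plus 95p latency in seconds."""
    return {
        "throughput_qps": len(latencies_sec) / wall_time_sec,
        "p95_latency_sec": percentile(latencies_sec, 95),
    }


# Synthetic example: 100 queries with latencies of 1..100 ms, finished in 2 s.
sample = [ms / 1000 for ms in range(1, 101)]
print(summarize(sample, wall_time_sec=2.0))
# prints {'throughput_qps': 50.0, 'p95_latency_sec': 0.095}
```

Higher throughput is better and lower 95p latency is better, which is why the results below quote one as "higher" and the other as "lower".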

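Test 3 - top 1000 frequent terms from the collection broken down by groups + 1 term from group 1-100

Let's see how it works when you have one very frequent term and another, less frequent one. Examples: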

1-50	50-100	100-150	150-200	200-250	250-300
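more	already	way	not interfere	just launched	already paid
they as	can other	can	want own	really else	if design
about	that time	is etc	from app	to small	also in

300-350	350-400	400-450	450-500	500-550	550-600
just any	an internet	who web	your location	will program	basically
not far	through social	past	a website	several	have company
1 write	means	most guys	anyway	have space	if basically

600-650	650-700	700-750	750-800	800-850	850-900
way programmers	see title	because word	think mac	most expensive	time changed
is difficult	i success	customers	been heard	very	already food
their week	than yc	because good	also desktop	for understand	most traffic

900-950	950-1000
if party	really quite
when serious	here server
think speed	is lost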

Wow: Manticore is faster than Sphinx by 106% in throughput and by 91.8% on average in 95p latency.

Test 4 - top 1000 frequent terms from the collection broken down by groups + 1 term from group 1-100, with both terms enclosed in quotes to form a phrase

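1-50	50-100	100-150	150-200	200-250	250-300
"they all"	"also think"	"that code"	"we app"	"from given"	"by users"
"http you"	"like already"	"is our"	"as article"	"use making"	"really least"
"by some"	"use working"	"do well"	"i app"	"could different"	"if understand"

300-350	350-400	400-450	450-500	500-550	550-600
"other developers"	"by building"	"want create"	"that front"	"any government"	"they consider"
"have thought"	"i python"	"who given"	"been completely"	"have price"	"ve started"
"way facebook"	"been edit"	"on link"	"use position"	"really deal"	"now early"

600-650	650-700	700-750	750-800	800-850	850-900
"been weeks"	"on engineering"	"have asked"	"s p"	"who css"	"that plus"
"make api"	"we expect"	"really willing"	"t step"	"because powerful"	"for traffic"
"is note"	"would own"	"http degree"	"http desktop"	"not otherwise"	"lot food"

900-950	950-1000
"s bring"	"very required"
"this choice"	"all examples"
"now bring"	"is argument"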

Here Manticore is faster on average by 11.8% in throughput and by 21.2% in 95p latency.

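Test 5 - 2 terms, each from groups 600-750, at different concurrency levels

This test shows the throughput difference at different query concurrency levels. Here is what we got:

Across all concurrency levels, Manticore is on average 31% faster in throughput and shows 28% lower 95p latency.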
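To make "query concurrency" concrete: it is the number of queries in flight at the same time. The sketch below shows one way such a run could be structured with a thread pool. It is not the stress-tester's implementation; `run_at_concurrency` and the `run` callback are hypothetical, and the callback is assumed to send one query and block until the response arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_at_concurrency(
    queries: Sequence[str],
    run: Callable[[str], object],
    concurrency: int,
) -> Tuple[List[float], float]:
    """Execute `queries` with at most `concurrency` in flight at once.

    Returns (per-query latencies in seconds, total wall time in seconds).
    """

    def timed(query: str) -> float:
        started = time.perf_counter()
        run(query)
        return time.perf_counter() - started

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    wall_time = time.perf_counter() - start
    return latencies, wall_time


# Stand-in for a real client call: pretend every query takes about 10 ms.
fake_run = lambda query: time.sleep(0.01)
for level in (1, 4):
    lat, wall = run_at_concurrency(["q"] * 8, fake_run, level)
    print(f"concurrency={level}: {len(lat) / wall:.0f} qps")
```

With a real client in place of `fake_run`, repeating the loop over the concurrency levels of interest yields one throughput figure per level for each engine.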

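Test 6 - 3-5 terms from different groups

This test shows the difference in search performance with longer queries (3-5 terms):

3 terms from groups 100-200, 400-500, 800-900
4 terms from groups 100-200, 300-400, 500-600, 800-900
5 terms from groups 100-200, 300-400, 500-600, 800-900, 900-1000

Query examples: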

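3 terms	4 terms	5 terms
work own post	use thanks second 12	switch start b 12 place
each website missing	around school model link	switch github class hate recently
his content avoid	through working amount effect	switch facebook office absolutely english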

Manticore's throughput is 77.6% higher and its 95p latency is 81.4% lower.

Test 7 - 3 AND terms from groups 300-600 and 1 NOT term from group 300-400

In this test we add a NOT term to a 3-term query:

Throughput is 66.6% higher and 95p latency is 57% lower.
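The article does not show the exact query text used in this test. As a sketch, assuming the standard extended query syntax in which space-separated terms are ANDed and a leading `-` negates a term, such a query could be assembled like this. The helper name and the sample terms are hypothetical.

```python
from typing import Sequence


def build_match_expr(and_terms: Sequence[str], not_terms: Sequence[str] = ()) -> str:
    """Join required terms with spaces and prefix excluded terms with '-'."""
    positive = list(and_terms)
    if not positive:
        # This sketch always expects something to match before excluding.
        raise ValueError("at least one required term is needed")
    negative = [f"-{term}" for term in not_terms]
    return " ".join(positive + negative)


# Three required terms and one excluded term (illustrative words only).
print(build_match_expr(["server", "price", "file"], ["value"]))
# prints "server price file -value"
```

The resulting string would go inside `match('...')` in the same kind of SELECT statement used in Test 1.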

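Conclusions

Sphinx is about 27 seconds (roughly 2%) faster at indexing, out of about 21 minutes in total.
As for search performance, which we consider more important, Manticore 3.0.0 showed higher throughput and lower latency in all tests. The whole test suite is fully dockerized and open-sourced on our GitHub. Detailed results can be found here. We would really appreciate it if you ran the same tests on your hardware or added different tests to the test suite, and let us know the results.

If you are considering migrating to Manticore 3, please read this article. We understand that your indexes may be large, so to simplify migration there is a new tool, index_converter, which can easily convert your existing Sphinx 2 / Manticore 2 indexes to the new Manticore 3 index format.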

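If you have any questions, concerns or comments, feel free to contact us:

On Twitter
Send an email to [email protected]
Post on our forum
Chat with us in our community Slack
Complain about how terrible everything is in our bug tracker on GitHub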