Manticore 2.8.2 vs 3.0 - 2x faster in some tests

Published: May 17, 2019

As you may know, Manticore 3.0 was released recently.

In this benchmark we check whether it actually performs better than 2.8. The test environment:

Hacker News curated comments dataset, 2016, CSV format
OS: Ubuntu 18.04.1 LTS (Bionic Beaver), kernel 4.15.0-47-generic
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 8 logical cores (4 physical cores, 8 threads)
32G RAM
HDD
Docker version 18.09.2
Base image for indexing and searching: Ubuntu:bionic
stress-tester for benchmarking

The Manticore configuration is identical for both versions:

source full
{
  type = csvpipe
  csvpipe_command = cat /root/hacker_news_comments.prepared.csv|grep -v line_number
  csvpipe_attr_uint = story_id
  csvpipe_attr_timestamp = story_time
  csvpipe_field = story_text
  csvpipe_field = story_author
  csvpipe_attr_uint = comment_id
  csvpipe_field = comment_text
  csvpipe_field = comment_author
  csvpipe_attr_uint = comment_ranking
  csvpipe_attr_uint = author_comment_count
  csvpipe_attr_uint = story_comment_count
}

index full
{
  path = /root/idx_full
  source = full
  html_strip = 1
  mlock = 1
}

searchd
{
  listen = 9306:mysql41
  query_log = /root/query.log
  log = /root/searchd.log
  pid_file = /root/searchd.pid
  binlog_path =
  qcache_max_bytes = 0
}
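With a csvpipe source, the first CSV column is always the document ID. The remaining columns map to the csvpipe_field / csvpipe_attr_* declarations in the order they appear. The `grep -v line_number` in csvpipe_command presumably removes the CSV header row; that the header contains a column named line_number is an assumption based on the filter. The sketch below parses the declarations above and prints the column order the prepared CSV is expected to have. `csvpipe_columns` is a hypothetical helper, not part of Manticore.

```python
import re

# csvpipe_* directives copied from "source full" above.
SOURCE_FULL = """
  csvpipe_attr_uint = story_id
  csvpipe_attr_timestamp = story_time
  csvpipe_field = story_text
  csvpipe_field = story_author
  csvpipe_attr_uint = comment_id
  csvpipe_field = comment_text
  csvpipe_field = comment_author
  csvpipe_attr_uint = comment_ranking
  csvpipe_attr_uint = author_comment_count
  csvpipe_attr_uint = story_comment_count
"""

# Matches "csvpipe_field = name" and "csvpipe_attr_<type> = name" (but not csvpipe_command).
DECL = re.compile(r"\s*csvpipe_(field|attr_\w+)\s*=\s*(\S+)\s*$")

def csvpipe_columns(config_text):
    """Return the expected CSV column order: document ID first, then declarations in order."""
    columns = ["<document id>"]
    for line in config_text.splitlines():
        m = DECL.match(line)
        if m:
            kind, name = m.groups()
            columns.append(f"{name} ({kind})")
    return columns

columns = csvpipe_columns(SOURCE_FULL)
for position, column in enumerate(columns, start=1):
    print(position, column)
```

This yields 11 columns: the document ID, followed by the ten declared fields and attributes.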

Indexing

Indexing took 1303 seconds with Manticore 3.0 and 1322 seconds with Manticore 2.8.2:

Manticore 3.0:

indexing index 'full'...
collected 11654429 docs, 6198.6 MB
creating lookup: 11654.4 Kdocs, 100.0% done
creating histograms: 11654.4 Kdocs, 100.0% done
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6198580642 bytes
total 1303.470 sec, 4755444 bytes/sec, 8941.07 docs/sec
total 22924 reads, 16.605 sec, 238.4 kb/call avg, 0.7 msec/call avg
total 11687 writes, 13.532 sec, 855.1 kb/call avg, 1.1 msec/call avg

Manticore 2.8:

indexing index 'full'...
collected 11654429 docs, 6198.6 MB
sorted 1115.7 Mhits, 100.0% done
total 11654429 docs, 6198580642 bytes
total 1322.239 sec, 4687939 bytes/sec, 8814.15 docs/sec
total 11676 reads, 15.248 sec, 452.6 kb/call avg, 1.3 msec/call avg
total 9431 writes, 12.800 sec, 1065.3 kb/call avg, 1.3 msec/call avg

So with this dataset and index schema, 3.0 indexes roughly 1.4% faster than 2.8 (1303.5 s vs 1322.2 s).
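The percentage can be checked directly against the indexer summaries. The sketch below assumes the log text is available as a string. It pulls the `total ... sec` line out of each summary and compares the two times. The log excerpts are copied from the output above, and `indexing_seconds` is a hypothetical helper.

```python
import re

# Excerpts of the indexer summaries shown above.
LOG_30 = "total 11654429 docs, 6198580642 bytes\ntotal 1303.470 sec, 4755444 bytes/sec, 8941.07 docs/sec"
LOG_28 = "total 11654429 docs, 6198580642 bytes\ntotal 1322.239 sec, 4687939 bytes/sec, 8814.15 docs/sec"

def indexing_seconds(log):
    """Return the wall-clock indexing time from the 'total <N> sec, ...' summary line."""
    m = re.search(r"^total ([\d.]+) sec,", log, re.MULTILINE)
    if m is None:
        raise ValueError("no 'total ... sec' line found")
    return float(m.group(1))

t30 = indexing_seconds(LOG_30)
t28 = indexing_seconds(LOG_28)
speedup = t28 / t30 - 1  # fraction by which 3.0 outpaces 2.8
print(f"3.0: {t30} s, 2.8: {t28} s, 3.0 faster by {speedup:.2%}")
```

This reports about 1.44%. The docs/sec figures (8941.07 vs 8814.15) give the same ratio.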

Performance tests

Both instances were warmed up before testing. Their index directories:

Manticore 3.0:

total 4.7G
drwx------ 2 root root 4.0K May 14 17:41 .
drwxr-xr-x 3 root root 4.0K May 14 17:40 ..
-rw-r--r-- 1 root root 362M May 14 17:24 idx_full.spa
-rw-r--r-- 1 root root 3.1G May 14 17:36 idx_full.spd
-rw-r--r-- 1 root root  90M May 14 17:36 idx_full.spe
-rw-r--r-- 1 root root  628 May 14 17:36 idx_full.sph
-rw-r--r-- 1 root root  29K May 14 17:24 idx_full.sphi
-rw-r--r-- 1 root root 6.5M May 14 17:36 idx_full.spi
-rw------- 1 root root    0 May 14 17:41 idx_full.spl
-rw-r--r-- 1 root root 1.4M May 14 17:24 idx_full.spm
-rw-r--r-- 1 root root 1.1G May 14 17:36 idx_full.spp
-rw-r--r-- 1 root root  59M May 14 17:24 idx_full.spt

Manticore 2.8:

total 4.6G
drwx------ 2 root root 4.0K May 16 18:38 .
drwxr-xr-x 3 root root 4.0K May 14 17:43 ..
-rw-r--r-- 1 root root 362M May 14 17:24 idx_full.spa
-rw-r--r-- 1 root root 3.1G May 14 17:36 idx_full.spd
-rw-r--r-- 1 root root  27M May 14 17:36 idx_full.spe
-rw-r--r-- 1 root root  601 May 14 17:36 idx_full.sph
-rw-r--r-- 1 root root 6.3M May 14 17:36 idx_full.spi
-rw-r--r-- 1 root root    0 May 14 17:24 idx_full.spk
-rw------- 1 root root    0 May 16 18:38 idx_full.spl
-rw-r--r-- 1 root root    0 May 14 17:24 idx_full.spm
-rw-r--r-- 1 root root 1.1G May 14 17:36 idx_full.spp
-rw-r--r-- 1 root root    1 May 14 17:36 idx_full.sps

Test 1 - time to process the top 1000 terms in the collection

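First, a simple test: how long does it take to process the 1000 most frequent terms in the collection and find every document that contains each of them?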

Results: 77.61 seconds for Manticore 2.8 and 71.79 seconds for Manticore 3.0.

So in this test Manticore Search 3.0 is 8% faster than the previous version.
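"X% faster" can be read two ways: extra work done per unit of time (old/new - 1), or wall-clock time saved (1 - new/old). The sketch below computes both from the two Test 1 timings. The helper names are illustrative only.

```python
def pct_faster(old_seconds, new_seconds):
    """Extra throughput of the new run: old/new - 1, as a percentage."""
    return (old_seconds / new_seconds - 1) * 100

def pct_less_time(old_seconds, new_seconds):
    """Wall-clock time saved by the new run: 1 - new/old, as a percentage."""
    return (1 - new_seconds / old_seconds) * 100

t28, t30 = 77.61, 71.79  # Test 1 timings from the text
print(f"{pct_faster(t28, t30):.1f}% faster, {pct_less_time(t28, t30):.1f}% less time")
```

This prints "8.1% faster, 7.5% less time", so the 8% figure corresponds to the throughput reading.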

Test 2 - top 1000 frequent terms of the collection, split into groups (top 1-50, 50-100, etc.)

Now let's see whether 3.0 handles terms from different frequency groups any better. Below are a few random examples from each group:

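1-50	50-100	100-150	150-200	200-250	250-300
one	much	our	every	less	pay
with	really	into	without	another	understand
is	other	still	down	already	everyone

300-350	350-400	400-450	450-500	500-550	550-600
search	developer	create	interest	general	common
reason	whole	give	try	model	office
nothing	name	friend	access	number	paid

600-650	650-700	700-750	750-800	800-850	850-900
manage	themselves	across	pg	paper	core
related	marketing	learning	opinion	pick	highly
go	unless	post	risk	strong	traffic

900-950	950-1000
idea	decent
interface	young
reaction	english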

On average, Manticore 2.8 has 0.4% lower latency than 3.0 and delivers 0.5% higher throughput. This is within the margin of error.
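The article doesn't say how the frequency-ranked term list was produced. One possible source is `indexer --buildstops <file> 1000 --buildfreqs`, which dumps the most frequent terms of an index with their counts. The sketch below assumes such a list, ordered from most to least frequent, is already loaded. It splits the list into groups of 50 and draws a few random examples per group. The function names and the `termN` stand-in data are hypothetical.

```python
import random

def frequency_groups(terms_by_freq, group_size=50, top_n=1000):
    """Split the top_n most frequent terms into consecutive rank groups.

    Labels follow the article's naming ("1-50", "50-100", ...); the group
    labelled "50-100" holds ranks 51..100.
    """
    top = terms_by_freq[:top_n]
    groups = {}
    for start in range(0, len(top), group_size):
        label = f"{max(start, 1)}-{start + group_size}"
        groups[label] = top[start:start + group_size]
    return groups

def sample_examples(groups, k=3, seed=0):
    """Pick k random example terms from every group, reproducibly."""
    rng = random.Random(seed)
    return {label: rng.sample(terms, k) for label, terms in groups.items()}

# Hypothetical stand-in for a real frequency-ranked dictionary dump.
terms = [f"term{i}" for i in range(1, 1001)]
groups = frequency_groups(terms)
examples = sample_examples(groups)
print(examples["1-50"], examples["950-1000"])
```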

Test 3 - top 1000 frequent terms of the collection, split into groups, + 1 term from group 1-100

Let's see how it performs with one very frequent term plus one less frequent term from each of the other frequency groups. Examples:

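1-50	50-100	100-150	150-200	200-250	250-300
other can	without	all over	about his	the big	forever
not	my use	other why	one day	was given	they let
here some	already know	where location	how determine	already big	here the

300-350	350-400	400-450	450-500	500-550	550-600
who develop	its book	now single	not access	at solution	their call
work start	a ycombinator	from add	use site	know microsoft	most are
at hours	now value	also given	which build	than power	early the

600-650	650-700	700-750	750-800	800-850	850-900
know science	should market	should kids	digital	their drive	who highly
if agree	a minute	post	time pg	there pick	has chance
would related	has country	get post	http education	also extremely	can topic

900-950	950-1000
any party	especially
only response	can computer
people firefox	about computer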

Manticore 3.0 shows 86.3% higher average throughput, and Manticore 2.8's 95th percentile (95p) latency is 109.5% higher than that of 3.0.
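A latency cannot drop by more than 100%. A 95p gap above 100% (109.5% here, 113% in Test 6) is therefore presumably measured relative to 3.0's value: 2.8's 95p latency is 2.095x (or 2.13x) that of 3.0. That reading is an assumption. The sketch below converts an "X% higher" figure into the equivalent "Y% lower" figure seen from the other side.

```python
def percent_lower_from_higher(pct_higher):
    """If x is pct_higher % above y, return how many percent y is below x."""
    ratio = 1 + pct_higher / 100      # x / y
    return (1 - 1 / ratio) * 100      # (x - y) / x, as a percentage

# 95p latency gaps reported in Test 3 and Test 6.
for reported in (109.5, 113.0):
    lower = percent_lower_from_higher(reported)
    print(f"2.8's 95p latency {reported}% higher -> 3.0's is {lower:.1f}% lower")
```

This gives 52.3% and 53.1%. Under that reading, 3.0's 95p latency is a little under half of 2.8's in both tests.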

Test 4 - top 1000 frequent terms of the collection, split into groups, + 1 term from group 1-100, with the two terms in quotes to form a phrase

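1-50	50-100	100-150	150-200	200-250	250-300
"work not"	"your the"	"we still"	"own the"	"i get"	"can run"
"my for"	"very the"	"your work"	"using got"	"here bad"	"then never"
"but that"	"get com"	"i first"	"now day"	"up help"	"then make"

300-350	350-400	400-450	450-500	500-550	550-600
"it information"	"they community"	"1 care"	"what phone"	"out happy"	"go watch"
"com next"	"this server"	"com location"	"in huge"	"how stop"	"s written"
"he looks"	"is x"	"time become"	"trying to"	"should come"	"other sound"

600-650	650-700	700-750	750-800	800-850	850-900
"via api"	"right donate"	"very curious"	"has lots"	"fix what"	"things absolutely"
"three what"	"use coming"	"that lack"	"who ui"	"have understanding"	"most topic"
"really talking"	"make app"	"really environment"	"who hiring"	"at expensive"	"more core"

900-950	950-1000
"just framework"	"sorry"
"work resources"	"want benefits"
"their resources"	"further"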

Manticore v3 has on average 5.6% higher throughput and 25.1% lower 95p latency.
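In Manticore's extended query syntax, wrapping terms in double quotes makes a phrase query: the words must appear next to each other, in that order. Unquoted terms only have to occur somewhere in the same document. The sketch below builds such a phrase from plain words. It rejects anything that isn't alphanumeric rather than trying to escape query-syntax characters. `phrase_query` is a hypothetical helper.

```python
import re

PLAIN_TERM = re.compile(r"[A-Za-z0-9]+")

def phrase_query(*terms):
    """Build a phrase query such as '"via api"' from plain alphanumeric terms."""
    if len(terms) < 2:
        raise ValueError("a phrase needs at least two terms")
    for term in terms:
        # Refuse anything that could be parsed as query syntax instead of escaping it.
        if not PLAIN_TERM.fullmatch(term):
            raise ValueError(f"not a plain term: {term!r}")
    return '"' + " ".join(terms) + '"'

print(phrase_query("via", "api"))  # "via api"
```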

Test 5 - 2 terms from groups 600-750, at different concurrency levels

This test shows how throughput differs across query concurrency levels. A few random examples:

Query examples: "talk view", "imagine 15", "curious terms"

So version 3 is on average 18% faster across all concurrency levels, with 15% lower 95p latency.
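The benchmark itself was run with stress-tester, whose internals aren't described here. The sketch below only illustrates what "throughput" and "95p latency" at a given concurrency mean. It runs a list of queries on N worker threads, then reports queries per second and the 95th percentile latency (nearest-rank method). `run_query` is a hypothetical callable, for example one that sends a SphinxQL query to port 9306 with a MySQL client library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def p95(latencies):
    """95th percentile using the nearest-rank method."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]

def run_at_concurrency(run_query, queries, concurrency):
    """Run all queries on `concurrency` threads; return (queries/sec, p95 latency in sec)."""
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start
    return len(queries) / wall, p95(latencies)
```

A sweep would then call `run_at_concurrency(run_query, queries, c)` for several values of `c` (1, 2, 4, 8, ...) and compare the two versions at each level.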

Test 6 - 3-5 terms from different groups

Now let's check performance on longer queries.

3 terms from groups 100-200, 400-500, 800-900
4 terms from groups 100-200, 300-400, 500-600, 800-900
5 terms from groups 100-200, 300-400, 500-600, 800-900, 900-1000

Query examples:

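3 terms	4 terms	5 terms
work bad environment	these search awesome background	get reason result adopted
always try story	again work comment link	feel process situation faster bring
work text card	google number network feature	past days browser known salary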

Version 3 wins again: throughput is 104% higher, and 2.8's 95p latency is 113% higher than that of 3.0.
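The multi-term queries above can be generated by picking one random term from each rank range and joining the picks with spaces. Space-separated terms are implicitly ANDed in Manticore's extended syntax. Test 3's two-term queries are the special case `["1-100", "<group>"]`. The sketch below assumes a frequency-ranked list of the top 1000 terms like the one described for Test 2. The helper names and `tN` stand-in terms are hypothetical.

```python
import random

# Rank ranges used in Test 6 (labels as in the article).
TEST6_GROUPS = {
    3: ["100-200", "400-500", "800-900"],
    4: ["100-200", "300-400", "500-600", "800-900"],
    5: ["100-200", "300-400", "500-600", "800-900", "900-1000"],
}

def terms_in_rank_range(terms_by_freq, spec):
    """Return the terms for a label like "100-200" from a frequency-ranked list.

    The article's labels share boundaries ("1-50", "50-100", ...), so a range
    starting at 1 begins at rank 1 and any other range begins just after its
    lower bound.
    """
    lo, hi = (int(part) for part in spec.split("-"))
    start = 0 if lo == 1 else lo
    return terms_by_freq[start:hi]

def multi_group_query(terms_by_freq, specs, rng):
    """AND together one randomly chosen term from each rank range."""
    return " ".join(rng.choice(terms_in_rank_range(terms_by_freq, s)) for s in specs)

# Hypothetical stand-in for the real top-1000 list (t1 is the most frequent term).
terms = [f"t{i}" for i in range(1, 1001)]
rng = random.Random(42)
for n_terms, specs in TEST6_GROUPS.items():
    print(n_terms, multi_group_query(terms, specs, rng))
```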

Test 7: 3 AND terms from groups 300-600 and 1 NOT term from 300-400

Now let's add a NOT term to the 3 ANDed terms.

Version 3 delivers 33.3% higher throughput and 32% lower 95p latency.
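In the extended query syntax, a leading `-` (or `!`) excludes a term. Manticore does not accept a query made only of negated terms by default, so at least one positive term is required. The sketch below builds a Test 7 style query and wraps it in a SphinxQL statement against the `full` index from the config, assuming that name needs no quoting. The example terms come from the groups listed in Test 2. `and_not_query` and `sphinxql` are hypothetical helpers, not stress-tester code.

```python
import re

PLAIN_TERM = re.compile(r"[A-Za-z0-9]+")

def and_not_query(must, must_not):
    """Build 'a b c -d': space-separated terms are ANDed, '-' excludes a term."""
    if not must:
        raise ValueError("need at least one positive term")
    for term in [*must, must_not]:
        if not PLAIN_TERM.fullmatch(term):
            raise ValueError(f"not a plain term: {term!r}")
    return " ".join(must) + " -" + must_not

def sphinxql(query, index="full"):
    """Wrap a full-text query in SphinxQL; safe here only because terms were validated."""
    return f"SELECT id FROM {index} WHERE MATCH('{query}')"

q = and_not_query(["search", "interest", "general"], "developer")
print(sphinxql(q))
```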

Conclusion

The new version performed significantly better in every test except Test #2, where the difference is within the margin of error (0.4-0.5%).

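The test is fully dockerized and open-sourced on our GitHub. Detailed test results can be found here. We would appreciate it if you ran the same test on your hardware, or contributed more tests to the suite, and let us know the results.
If you find any issues or inaccuracies, please let us know.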

Thanks for reading!