Demo: GitHub search with Manticore Search


Published: Feb 27, 2024

TL;DR: In this blog post, we show how we used Manticore Search to build a search app that closely resembles the one GitHub provides for searching issues.

Try the demo:
- Crawl your own repo: https://github.manticoresearch.com/ (crawling takes some time, so you will have to wait).
- Search a repo that has already been crawled: https://github.manticoresearch.com/manticoresoftware/manticoresearch
GitHub project: https://github.com/manticoresoftware/manticore-github-issue-search
Set it up yourself to see your GitHub issues, pull requests, and comments in a new way.

Introduction

While looking for an effective way to showcase the capabilities and performance of Manticore Search, we realized how important it was to pick a real-world application that would make a compelling demonstration. We considered several options, including the usual suspects:

E-commerce sites
Directory listings
Movie databases, etc.

These are familiar, easy-to-understand examples, and Manticore is a perfect fit for them, but they offer little practical value.

Then inspiration struck: why not build a search tool for GitHub issues? Not only did this give us the chance to present a powerful demo, it was also an opportunity to enhance the search experience with Manticore Search's advanced features. The result should be useful at least to us, the Manticore core team.

We took on the challenge and are proud to present the result: a specialized search engine tailored to GitHub issues. It is not just a showcase but a practical tool that we hope the GitHub community will find very useful.

We invite you to explore and interact with our GitHub issue search at https://github.manticoresearch.com. Get a hands-on feel for the full potential of Manticore Search: enjoy the benefits of its advanced search capabilities and see how it can transform data exploration.

In some cases, our searches run up to 30 times faster than GitHub's. Curious how it works? Let's look at how we built it.

GitHub search: 215 ms; GitHub search by Manticore: 6 ms

Requirements

The concept was straightforward: bring data from selected GitHub repos into a Manticore Search database. By full-text indexing that data, we could provide effective search capabilities.

We decided to keep a design very close to GitHub's, with minor user interface improvements so that it suits not only tech-savvy users but also people with less technical expertise.

In addition, we set out to introduce extra features such as:

Combined search across issues and comments
Advanced filtering options
The ability to sort results by reactions
Infinite scroll pagination

Let's dive into the details, examine the challenges we ran into, and see, through a practical example, how Manticore can solve them.

Choosing the right tools for the MVP

Developing a demo can be a challenge, and when you are racing against the clock, you need all the help you can get. That's why we relied on the tried-and-true combination of PHP for the backend and JavaScript for the client side, with a touch of SEO-friendly hybrid magic. Why PHP, you ask? Well, it's like strapping a jetpack to your project's back! It's quick to get started with, simple to validate ideas in, and easy to test. And, of course, our team has more experience with PHP than with other elegant, modern programming languages. (BTW, read about how you can build a PHP plugin for Manticore Search, which itself is written in C++.)

Manticore Search also has a PHP client, which we use in the demo. Using it is as simple as this:

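<?php
require 'vendor/autoload.php'; // Composer autoloader (manticoresoftware/manticoresearch-php)

use Manticoresearch\Client;

// Connect to Manticore's HTTP endpoint (port 9308 by default)
$client = new Client(['host' => 'localhost', 'port' => 9308]);
$index = $client->index('repo'); // select the table to query
$docs = $index->search('bug')->get(); // full-text search for "bug"
foreach ($docs as $doc) {
    // Each hit exposes its document ID and its stored data
    var_dump($doc->getId(), $doc->getData());
}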

Just like that, you create a Manticore client, pick the table you want to work with, send a search request, and voilà, the results start flowing in.

We won't dive into the deep end of the Manticore Search PHP client here, but if you're ready to give it a try, check out the Manticore Search PHP Client repository.

The demo consists of several components, because it:

pulls data from GitHub,
maintains a queue of repositories to process,
can send notifications by email,
and so on.

For your convenience, all the code that interacts with Manticore Search lives in Manticore.php. This may also be useful to anyone who wants to compare different storage engines in the future.

Interesting challenges we had to overcome

While working on the demo, besides implementing the common features mentioned above, we ran into some interesting challenges that you may also face in your own projects.

Relevance in search results when combining two tables

Relevance of results is a crucial aspect of any search system. When building the GitHub issue demo on a Manticore Search backend, it is worth noting that relevance is handled efficiently right out of the box. Manticore Search uses traditional BM25-based ranking, which orders search results by how frequent and how significant the query keywords are in each document, with normalization by the length of the field in which the matching term is found. This means you don't need complex configuration or intricate algorithms to get a highly effective search experience off the ground. For more details, see the documentation: Ranking overview.
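To make the ranking description more concrete, here is a small PHP sketch of one common formulation of the Okapi BM25 score for a single query term. It is a textbook illustration, not Manticore's internal ranker (whose default ranker also takes phrase proximity into account). The function name and the k1 = 1.2, b = 0.75 parameters are assumptions.

```php
<?php
// Textbook Okapi BM25 score for one query term (illustration only,
// not Manticore's code). k1 and b are commonly used defaults.
function bm25TermScore(
    int $tf,        // occurrences of the term in the field
    int $fieldLen,  // length of this field, in tokens
    float $avgLen,  // average field length across the table
    int $docCount,  // total number of documents
    int $docFreq,   // number of documents containing the term
    float $k1 = 1.2,
    float $b = 0.75
): float {
    // Inverse document frequency: rarer terms weigh more.
    $idf = log(1 + ($docCount - $docFreq + 0.5) / ($docFreq + 0.5));
    // Field length normalization: longer fields dilute each match.
    $norm = $k1 * (1 - $b + $b * $fieldLen / $avgLen);
    return $idf * ($tf * ($k1 + 1)) / ($tf + $norm);
}

// The same term frequency scores higher in a shorter field.
$short = bm25TermScore(2, 10, 50.0, 1000, 20);
$long  = bm25TermScore(2, 200, 50.0, 1000, 20);
printf("short field: %.3f, long field: %.3f\n", $short, $long);
```

The two scores differ only in field length, which is the normalization effect described above.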

Our challenge was to search GitHub issues and comments together. Technically, we split them into two separate tables at the Manticore level: one for issues and one for comments. After researching ranking mechanisms, we decided to implement the rank-biased precision (RBP) algorithm, which lets us combine results from two different sources. In addition, Manticore Search provides a 'score' field, which you can read from the PHP client with the $doc->getScore() method. You can check the code here: Manticore.php code.

As a result, we not only get relevance "out of the box" but also use RBP to combine the two sources, maximizing the effectiveness of the search results!
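One way to use RBP-style weights to fuse two ranked lists is sketched below. It is an illustration under stated assumptions, not the demo's actual code: the function name, the persistence value $p = 0.8, and the string IDs are hypothetical, and both input lists are assumed to be already sorted by relevance. For example, they could be sorted by the score Manticore returns for each table. An item at 1-based rank r gets weight (1 - p) * p^(r - 1), and weights add up when the same ID appears in both lists.

```php
<?php
// Hypothetical sketch: merge two relevance-sorted ID lists with
// rank-biased precision (RBP) weights, (1 - p) * p^(rank - 1).
function mergeByRbp(array $issues, array $comments, float $p = 0.8): array
{
    $weights = [];
    foreach ([$issues, $comments] as $list) {
        foreach (array_values($list) as $i => $id) {
            // $i is 0-based, so p ** $i equals p^(rank - 1).
            $weights[$id] = ($weights[$id] ?? 0.0) + (1 - $p) * $p ** $i;
        }
    }
    arsort($weights); // highest combined weight first
    return array_keys($weights);
}

$merged = mergeByRbp(
    ['issue-10', 'issue-7', 'issue-3'],
    ['issue-7', 'issue-42'] // e.g. comment hits mapped to their parent issue
);
print_r($merged);
```

Because the weights decay geometrically, top-ranked hits from either table dominate. An item found in both tables, like issue-7 here, rises above items found in only one.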

Advanced filtering of issues and comments

Step 1: Rendering the ranges

When it comes to search functionality, a plain search is often not enough: users frequently rely on filters to narrow down their results. Applying simple range or equality filters is straightforward in Manticore Search and many other search engines. Grouping results into fixed ranges can seem daunting, but with Manticore Search it is actually quite manageable.

Our goal is to let users pick predefined ranges and filter accordingly, without storing or caching any additional data. For example, we want to filter issues by their number of comments: fewer than 5, from 5 to 9, and 10 or more. Manticore Search simplifies this with its INTERVAL function. Let's see how it is implemented in the demo.

We wrote a dedicated method that generates the desired ranges along with the number of items falling into each one. Here is a simplified version of it:

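<?php
require 'vendor/autoload.php';

use Manticoresearch\Client;

$client = new Client(['host' => 'localhost', 'port' => 9308]);
$repoId = 1;       // hypothetical repository ID
$values = [5, 10]; // range boundaries: <5, 5-9, >=10
$index = $client->index('issue');
$search = $index->search(''); // empty query: match all issues
$range = implode(',', $values);
$facets = $search
    ->limit(0) // only the counts are needed, not the hits
    ->filter('repo_id', $repoId) // restrict to one repository
    ->expression('range', "INTERVAL(comments, $range)") // bucket index per issue
    ->facet('range', 'counters', sizeof($values) + 1) // N boundaries give N + 1 buckets
    ->get()
    ->getFacets();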

You can see the full code here:

View the full code
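To see which bucket each issue lands in, the sketch below emulates INTERVAL's documented semantics in plain PHP. The function returns 0 if the value is below the first point, 1 if it falls between the first and second points, and so on, with the points in ascending order. The sketch then counts issues per bucket, which is what the facet above reports. The function names and sample comment counts are hypothetical.

```php
<?php
// Plain-PHP emulation of INTERVAL(value, p1, p2, ...): returns how many
// of the ascending points are <= value (0 if value < p1, 1 if p1 <= value < p2, ...).
function interval(int $value, array $points): int
{
    $bucket = 0;
    foreach ($points as $point) {
        if ($value >= $point) {
            $bucket++;
        }
    }
    return $bucket;
}

// Count items per bucket; N points yield N + 1 buckets,
// matching the sizeof($values) + 1 facet limit.
function countByInterval(array $commentCounts, array $points): array
{
    $counters = array_fill(0, count($points) + 1, 0);
    foreach ($commentCounts as $comments) {
        $counters[interval($comments, $points)]++;
    }
    return $counters;
}

// Hypothetical comment counts, bucketed as <5, 5-9, >=10.
print_r(countByInterval([0, 2, 5, 9, 10, 31], [5, 10]));
```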

Step 2: Applying the filters

The next step is filtering the results. This is done by combining a gt (greater than) filter with an OR condition. Below is a simplified version of the code:

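use Manticoresearch\Search; // the FILTER_* constants live on the Search class
$search->filter('comments', 'gt', 0, Search::FILTER_AND); // comments > 0, joined with AND
$search->filter('comments', 'lte', 3, Search::FILTER_OR); // comments <= 3, joined with OR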

You can check this code snippet here:

Check the snippet
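When a user picks one of the ranges from Step 1, its bucket number has to be translated back into comment bounds for the filters. The helper sketched below is hypothetical and does that translation. It assumes integer comment counts and the same ascending boundary list that was passed to INTERVAL. It also assumes that the client's filter() accepts 'gte' in addition to the 'gt' and 'lte' operators shown above.

```php
<?php
// Hypothetical helper: convert an INTERVAL bucket index back into
// inclusive [min, max] comment bounds; null marks an open-ended side.
function bucketBounds(int $bucket, array $points): array
{
    // Bucket 0 has no lower bound; bucket k starts at point k-1.
    $min = $bucket === 0 ? null : $points[$bucket - 1];
    // The last bucket has no upper bound; otherwise it ends just below point k.
    $max = $bucket < count($points) ? $points[$bucket] - 1 : null;
    return [$min, $max];
}

// With boundaries [5, 10]: bucket 0 => [null, 4], 1 => [5, 9], 2 => [10, null].
[$min, $max] = bucketBounds(1, [5, 10]);
// These could then feed $search->filter('comments', 'gte', $min)
// and $search->filter('comments', 'lte', $max).
echo "min=$min max=$max\n";
```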

Sorting by reactions

When searching on GitHub, you may notice that it doesn't let you display or filter by reactions. Yet identifying the issues with the most reactions can be especially insightful, for example to understand the most wanted features or to anticipate emerging concerns. This is where sorting by reactions becomes invaluable.

First, we need to capture the reaction data. The GitHub API provides it as a simple JSON object:

{
  "url": "https://api.github.com/repos/ClickHouse/ClickHouse/issues/35407/reactions",
  "total_count": 0,
  "+1": 0,
  "-1": 0,
  "laugh": 0,
  "hooray": 0,
  "confused": 0,
  "heart": 0,
  "rocket": 0,
  "eyes": 0
}

This is excellent news, because Manticore Search supports JSON natively!

Next, we have to pin down our sorting requirement. Do we need to sort by individual JSON fields or by the sum of several fields? Fortunately, Manticore Search lets us do both, which fits our needs exactly! We can store the JSON directly in the table and enable sorting with the following code snippet:

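// Sum the "positive" reaction counters stored in the JSON attribute;
// the `+1` key is backquoted because it is not a plain identifier
$search->expression(
    'positive_reactions',
    'integer(reactions.`+1`) + integer(reactions.hooray) + integer(reactions.heart) + integer(reactions.rocket)'
);
$search->sort('positive_reactions', 'desc'); // most positive reactions first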

For a complete view of the sorting implementation, see the full code snippet here: Manticore PHP client sorting example

As shown, we use the expression() method of the Manticore PHP client to access JSON fields with dot notation. This approach eliminates the need to cache counters or perform additional calculations: you can create a JSON field, access it in expressions, keep queries fast, and avoid the overhead of caching mechanisms!
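The arithmetic behind the positive_reactions expression can be illustrated in plain PHP. The hypothetical sketch below recomputes the same sum client-side over two made-up issues and sorts them by it, only to show what the expression evaluates to. In the demo, Manticore does this work server-side. Treating missing keys as 0 is an assumption of this sketch.

```php
<?php
// Client-side illustration of the positive_reactions expression:
// the sum of the +1, hooray, heart and rocket counters.
function positiveReactions(array $reactions): int
{
    $sum = 0;
    foreach (['+1', 'hooray', 'heart', 'rocket'] as $key) {
        $sum += (int) ($reactions[$key] ?? 0); // missing keys count as 0 here
    }
    return $sum;
}

// Made-up issues with reactions objects shaped like GitHub's.
$issues = [
    ['id' => 1, 'reactions' => json_decode('{"+1": 3, "heart": 1, "laugh": 5}', true)],
    ['id' => 2, 'reactions' => json_decode('{"+1": 0, "rocket": 7, "-1": 2}', true)],
];

// Highest positive-reaction total first, like sorting by the expression.
usort($issues, fn ($a, $b) => positiveReactions($b['reactions']) <=> positiveReactions($a['reactions']));
echo implode(',', array_column($issues, 'id')), "\n";
```

Note that laugh and -1 are deliberately left out of the sum, just as they are in the expression above.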

Faceted search

Searching and filtering capabilities are essential components of any robust search functionality. However, a common challenge is obtaining counts quickly. It's widely acknowledged that fast count operations in MySQL require indexes. These indexes not only increase the database size but also add complexity to heavily loaded applications, which often resort to caching the counts and then adjusting them as needed.

The good news is that Manticore Search sidesteps these issues entirely! With Manticore Search, retrieving counts from the database is both straightforward and swift, eliminating the need for additional caching layers.

To display real-time counts that reflect the filters applied on a page, we reuse the same filters as the search itself and add one extra query for facets, which takes just a few milliseconds. This gives us current counts for the specified groups with virtually no overhead. Below is a concise PHP code snippet demonstrating how to do this:

$facets = $search
    ->limit(0) // We're only interested in counts, hence no results needed
    ->filter('repo_id', $repoId) // Filter by repository ID
    ->expression('open', 'if(closed_at=0,1,0)') // Evaluate whether issues are open
    ->facet('open', 'counters', 2) // Get facet counts for open and closed issues
    ->get() // Execute the search query and retrieve the results
    ->getFacets(); // Extract the facets data from the results

Let's break it down: we set the limit to zero because our goal is to obtain counters, not search results. We filter by the repository ID and apply an expression that turns the closed_at field into an open/closed flag, which we then facet on. This grouping gives us counters for both open and closed issues.
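The following sketch shows what that facet computes, emulated in plain PHP on made-up data. The expression if(closed_at=0,1,0) maps each issue to 1 (open) or 0 (closed), and the facet counts issues per value. The function name and the closed_at timestamps are hypothetical. As in the expression, a closed_at of 0 is taken to mean the issue is still open.

```php
<?php
// Emulates the 'open' facet: map each issue with if(closed_at=0,1,0),
// then count how many issues fall into each value.
function openClosedCounts(array $closedAt): array
{
    $counts = [0 => 0, 1 => 0]; // 0 = closed, 1 = open
    foreach ($closedAt as $ts) {
        $counts[$ts === 0 ? 1 : 0]++;
    }
    return $counts;
}

// Hypothetical closed_at values: three open issues, two closed ones.
print_r(openClosedCounts([0, 1708992000, 0, 0, 1700000000]));
```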

For those interested in the full implementation, the complete code snippet is available on GitHub: Manticore GitHub Issue Search - Manticore.php

With Manticore Search, the challenge of efficiently obtaining counts is addressed with an almost out-of-the-box solution. What could be more efficient and user-friendly? 😊

Conclusion and further plans

In developing our demo project, we aimed to showcase the capabilities and efficiency of Manticore Search. The result has not only met our expectations but also given us a tool that improves the way we navigate our GitHub repositories. Through this initiative, we've been able to demonstrate the potential of Manticore Search and add a number of improvements and features that go beyond what GitHub currently offers:

We've achieved noticeably faster search speeds: searches typically complete in about 5-10 ms, compared with over 200 ms for GitHub's search.
Our demo project allows for the inclusion of comments within search results, providing a broader scope of information than what is currently available on GitHub.
We’ve introduced the ability for users to sort issues based on the number of reactions, offering an additional dimension of user interaction.
Advanced filtering options are available, allowing for more precise searches, such as displaying issues within a specific range of comments or focusing searches exclusively within comments.

We encourage you to explore these enhancements by visiting: https://github.manticoresearch.com

Additionally, for those interested in the open-source code or in running the project locally, it is accessible here: https://github.com/manticoresoftware/manticore-github-issue-search

We're also excited to announce plans to add vector search (available in Manticore dev packages and being prepared for release) to our demo. Combined with full-text search, this upcoming feature should further improve the quality of results and show how to use Manticore's new capabilities to enhance search functionality. Stay tuned and follow us on Twitter.

We welcome your feedback on this practical demonstration of Manticore Search's features and capabilities, and we look forward to sharing more updates with you. Share your feedback via issues or discussions.