Replication: cluster creation, joining, updating table settings

Published: Aug 09, 2024

About me

Hi, I'm Mike.

I recently started working at Manticore as a Developer Advocate. I am someone who is not completely removed from IT, but I am still catching up with modern technologies. In this blog, I'll share my experience and what I learn about Manticore. I plan to document my journey in a diary format, explaining what Manticore is and how to use it. Let's see how things work together, identify issues, and talk to the developers in real time.

This is one of my first blog posts. If you are interested in learning about Manticore with me, I will keep you updated on:

Twitter
Telegram: EN / RU
Slack

Replication

What is replication generally for?

In the previous article, we updated the full-text search settings by changing the wordforms file. While we did that, our table was unavailable. In our store example the downtime was brief, only a few seconds. For small projects, such a short interruption may be a negligible price for the added convenience. However, as our store grows with more products and customers, changes to server settings can now result in system downtime lasting several hours. The larger the database, the longer reindexing takes. Although reindexing with Manticore takes only hours, not days or weeks, we would prefer to avoid even that delay. Besides, what if something happens to our only server? Or what if the number of customers becomes too large for a single server? All the advantages of a high-speed search engine would be lost. So we are now considering creating several copies of the database that work simultaneously and in parallel. This setup ensures that when data is written to one server, it is automatically replicated to the other connected servers, or nodes.

In Manticore, replication between nodes is implemented through the Galera library. Galera uses synchronous multi-master replication, which provides high availability and fault tolerance for database clusters. When a new record is added on one server (node), the change is immediately sent to all connected nodes. The process has three phases: writing to the local transaction log on the source node, replicating the changes to the other nodes, and confirming receipt of the data by all nodes before the transaction is actually applied. The transaction is applied only once confirmation arrives from all nodes in the cluster, which keeps the data consistent across nodes. These processes are invisible to the user, and the new data is accessible from any node immediately after the transaction completes successfully on that node.
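To make the three phases easier to picture, here is a toy model in Python. It is only a sketch of the idea described above: the `Node` class and the `replicate` function are made up for illustration and are not Manticore's or Galera's API. A real cluster also deals with conflict detection and failed nodes, which this model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a cluster node (hypothetical, not a real API)."""
    name: str
    online: bool = True
    received: list = field(default_factory=list)  # delivered, not yet applied
    applied: list = field(default_factory=list)   # visible to readers

    def receive(self, change: str) -> bool:
        """Accept a replicated change and acknowledge it if reachable."""
        if not self.online:
            return False
        self.received.append(change)
        return True


def replicate(source: Node, peers: list, change: str) -> bool:
    """Apply `change` on every node only if every node acknowledges it."""
    # Phase 1: record the change in the source node's local log.
    source.received.append(change)
    # Phase 2: send the change to the other nodes and collect acknowledgements.
    acks = [peer.receive(change) for peer in peers]
    # Phase 3: apply only after all nodes have confirmed receipt.
    if not all(acks):
        # Discard the pending change wherever it was delivered.
        for node in [source, *peers]:
            if change in node.received:
                node.received.remove(change)
        return False
    for node in [source, *peers]:
        node.applied.append(change)
    return True


node_a, node_b = Node("node_a"), Node("node_b")
print(replicate(node_a, [node_b], "insert #1"), node_b.applied)
```

In this model a write either becomes visible on every node or on none of them, which is the consistency property the paragraph above describes.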

Initial setup

Right now our container from the previous article is working fine, but if it is stopped and removed, the data will be lost forever. The Manticore Docker guide recommends mapping the data directory outside the container. In addition, the configuration we used last time did not forward binary port 9312, which replication requires. Let's fix this by taking a snapshot of the folder from the running container and launching a new one with the right port and storage settings.

First, we need to ensure data integrity. We enter the container, log in to Manticore, and "freeze" the table so that any data still in memory is reliably flushed to disk and nothing on disk changes while we copy.

FREEZE products;

Next, copy the data directory out of the container:

docker cp Manticore:/var/lib/manticore .

The command has three parts:

cp - copy,
Manticore:/var/lib/manticore - the container name and the path to the folder inside it,
. - the local path to copy to, in this case the current directory.

To let the current container keep working as before, we "unfreeze" the table:

UNFREEZE products;
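If you ever script this snapshot, the main thing to get right is that the table is unfrozen even when the copy fails. A minimal sketch in Python, assuming a hypothetical `run` callback that sends SQL to Manticore and shell commands to the host (here it only records the commands, so nothing is actually executed):

```python
from typing import Callable, List


def snapshot_table(table: str, container: str, run: Callable[[str], None]) -> None:
    """Freeze the table, copy the data directory, and always unfreeze afterwards."""
    run(f"FREEZE {table};")
    try:
        run(f"docker cp {container}:/var/lib/manticore .")
    finally:
        # Runs even if the copy fails, so the table is never left frozen.
        run(f"UNFREEZE {table};")


# Dry run: collect the commands instead of executing them.
issued: List[str] = []
snapshot_table("products", "Manticore", issued.append)
print("\n".join(issued))
```

The table and container names are the ones used in this article; `snapshot_table` and `run` are names invented for the example.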

Now let's create a new container with the required settings:

docker run --name manticore_new -v $(pwd)/manticore:/var/lib/manticore -p 10306:9306 -p 10312:9312 -d manticoresearch/manticore
docker exec -it manticore_new mysql

As a result, we have a clone of our container with forwarded ports and with the database files stored on the server. Let's check that everything is fine in our new container:

View of the new container

Great, all the data has been transferred, including the configuration and the wordforms file, and now we will build a cluster based on this container.

By the way, a note about the wordforms file: the folder we copied contains a copy of it. I recommend copying it somewhere else, because practice shows you may need to edit it later, and editing the file that sits in the table's folder is a bad idea that can eventually cause problems. I made a copy outside the database folder: cp manticore/products/wf_pet_products.txt wf_pet_products.txt. And some good news: I talked to my colleagues, and soon you won't need to move the wordforms file manually when using mysqldump, because everything will be saved in the dump automatically. Here is the task on GitHub.

Creating our first cluster

Setting up our new cluster requires no complex operations: it is enough to create a named cluster with one command and then attach the necessary table to it. After that, we just need to check how everything is set up.

To add a cluster, use the CREATE CLUSTER command.
To add a table to the cluster, use the ALTER CLUSTER ADD command. It's important to note that clustering, replication, and other Manticore features are available only for real-time tables!

So, let’s create our first cluster and immediately add our table to it:

create cluster pet_shop;
alter cluster pet_shop add products;

Now let's check what we got. The show status command does this, but it returns a lot of information; to avoid getting lost in it, you can filter the output with like:

show status like '%cluster%'

Show status result

At the moment, we are interested in the following lines: cluster_name, cluster_pet_shop_status, cluster_pet_shop_indexes. These show the name of the cluster, the status (if it says Primary, then everything is good), and the tables that are currently in the cluster.
We should also note the line cluster_pet_shop_incoming_addresses. In my setup, it looks like this: 172.17.0.5:9312,172.17.0.5:9315:replication. We will need the address 172.17.0.5:9312. We have port 9312 mapped to port 10312 outside of Docker, but in the example, we will run a new node within the same Docker network 172.17.0.0, making port usage simpler.
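If you want to pull the join address out of that status value programmatically, a small helper is enough. This is a sketch in Python based only on the value shown above: it assumes the entry we need is the one without the `:replication` suffix, and `pick_join_address` is a name made up for the example.

```python
def pick_join_address(incoming_addresses: str) -> str:
    """Return the first address that is not tagged with ':replication'."""
    for entry in incoming_addresses.split(","):
        entry = entry.strip()
        if entry and not entry.endswith(":replication"):
            return entry
    raise ValueError("no joinable address found")


value = "172.17.0.5:9312,172.17.0.5:9315:replication"
address = pick_join_address(value)
print(f"join cluster pet_shop at '{address}';")
```

With the value from my setup this yields 172.17.0.5:9312, the address used when joining the second node.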

Technically, we could have used the original container from the article about Wordforms by just adding the cluster to it and connecting the table. Then we could create a new container with external storage, and replication would copy everything to the local image. But then I wouldn’t show how to solve the problem in another way, by saving a dump through copying the folder from the container… =)
We make our own bed and then lie in it.

The first node is already set up, no more additional actions are needed. Simple? It’s a breeze!

Adding another node to the cluster

First, we need to start another container with Manticore. We won't transfer anything to it, just connect it to the existing cluster. Its local storage must be different (if you are doing this on the same server, mount a different folder). Keep the ports in mind too: 9306, 10306, and 10312 are already in use, so let's assign different ones, for example 11306 and 11312.

We create another container with an instance of Manticore, naming it manticore_new_1. We specify ports 11306 and 11312, and for the volume we specify manticore_new_1 (the local folder must already exist).

New node

Or the same thing in one command:

docker run --name manticore_new_1 -v $(pwd)/manticore_new_1:/var/lib/manticore -p 11306:9306 -p 11312:9312 -d manticoresearch/manticore

Log in through the MySQL client. Here's a nuance: if you connect with a local MySQL client rather than the one inside the container, use the external port you specified when creating the node, 11306. If you use the Docker interface and go in through the container terminal (docker exec), use Manticore's default port, 9306. Either way, connect and check whether there are any tables (show tables). The result is empty, as expected, since we just created an empty container with Manticore. Now connect it to the existing cluster: join cluster pet_shop at '172.17.0.5:9312';

Connecting to the cluster and checking parameters

For clarity, I changed the console color for the second node.

As we can see, the table has been added, the number of records matches the original node, and the configuration of the stemmer and wordforms file is correct.
Basically, that’s it. The cluster is assembled and working, data is being transferred between nodes.

Important note: if the node you are connecting to the cluster has tables with the same names as tables in the cluster, the tables on the node will be overwritten with data from the cluster. Cluster data takes priority over local tables, so if you are connecting an existing node that already has some data, make sure the names of its tables differ from those in the cluster. And as always, in any risky situation, make a backup.
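A quick way to spot this risk before joining is to compare the two lists of table names. The sketch below is plain Python with a made-up helper name; it assumes you have already collected the local names (for example from show tables on the joining node) and the cluster's names (for example from the cluster_pet_shop_indexes status line). The extra table names in the example are invented.

```python
from typing import Iterable, List


def tables_at_risk(local_tables: Iterable[str], cluster_tables: Iterable[str]) -> List[str]:
    """Return local table names that also exist in the cluster, sorted."""
    return sorted(set(local_tables) & set(cluster_tables))


# Hypothetical example: the joining node already has two tables of its own.
clashes = tables_at_risk(["products", "drafts"], ["products"])
if clashes:
    print("Would be overwritten on join:", ", ".join(clashes))
```

An empty result means no local table shares a name with a cluster table; a backup is still a good idea.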

Managing data in the cluster

Working with table data in a cluster is slightly different. The insert command now needs an adjustment: besides the table name, we must specify the name of its cluster: insert into <cluster name>:<table name>(<fields>) values (<values>). Don't forget to update this command in your client.

Let’s add another record while being in the newly created node:

insert into pet_shop:products (name, info, price, avl) values ('Aquarium ship', 'Decorative ship model for aquarium', 6, 1);

Result from add new record

According to the result, the record is added, but how about the other node?
Result for second node

Everything is in order here too!
Let's try updating the data:

mysql> update products set price = 8.0 where id = 3317338896206921730;
ERROR 1064 (42000): table products: table 'products' is a part of cluster 'pet_shop', use 'pet_shop:products'

To update, replace, and especially delete records, we now also have to specify the cluster name along with the table name:

update pet_shop:products set price = 8 where id = 3317338896206921730;
Query OK, 1 row affected (0.01 sec)

So data is now transferred between nodes automatically and without complications, apart from a small change to the write commands.
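On the application side, that small change can live in one place. Here is a sketch in Python of a helper that adds the cluster prefix only when a cluster name is given. The function names are invented for the example, and the statement is built with plain string formatting purely for illustration.

```python
from typing import Optional


def qualified(table: str, cluster: Optional[str] = None) -> str:
    """Return 'cluster:table' for a replicated table, or just 'table' otherwise."""
    return f"{cluster}:{table}" if cluster else table


def update_price(table: str, doc_id: int, price: int, cluster: Optional[str] = None) -> str:
    """Build an UPDATE statement like the one used in this article."""
    return f"update {qualified(table, cluster)} set price = {price} where id = {doc_id};"


print(update_price("products", 3317338896206921730, 8, cluster="pet_shop"))
```

Keeping the prefix in a single helper also makes it easier to handle the moment when a table is temporarily taken out of the cluster.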

Changing the settings of a replicated table

What if we need to change the table configuration or drop the table, for example to update the wordforms file? In the previous article, we had to drop and recreate the table, which left users without server responses for a while. In that example, updating the settings took very little time because the table was small. But with large data sets, tables with millions or billions of records, updating and indexing can take a long time, often measured in hours. To ensure uninterrupted service for Manticore-based applications there are distributed tables, but we will discuss them in another article.

Right now, we have a replicated database with the products table across several nodes. We can change this table's configuration using the cluster name prefix, but we cannot drop it, even with the prefix. To change the settings of a replicated table, first disconnect it from the cluster: ALTER CLUSTER <cluster name> DROP <table name>. This only removes the table from the cluster, not from the database. Once the table is detached from the cluster, the application can no longer update its data, because the application refers to the cluster (for example, insert into pet_shop:products ...) and the table is no longer in it (the application should handle this situation). Now we are ready to drop or reconfigure the table.

For example, let's update the table configuration: switch from the stemmer to the lemmatizer. Here are the steps:

Disconnect the table from the cluster.
Change the morphology in the table from stemmer to lemmatizer.
Reload the data into the table.
Restore the table in the cluster.
Check on the other node.

Disconnecting the table from the cluster:

ALTER CLUSTER pet_shop DROP products;

The table is now disconnected from the cluster on all nodes, and its schema and settings can be changed. The logic of our approach is that we do the maintenance on one node while the other answers users' select queries. As a safety measure, adding new records is no longer possible, because the application issues commands in the <cluster>:<table> format and the table is no longer in the cluster.

update pet_shop:products set price = 9 where id = 3317338896206921730;
ERROR 1064 (42000): table products: table 'products' is not in any cluster, use just 'products'

Now that we have removed the table from the cluster, let's try a select query:

Result for other node

As we can see, the query is processed, the data is returned, and the end user should be happy.

Now let's change the morphology from stemmer to lemmatizer, reindex the records, and reconnect everything. In the previous article, we changed the wordforms file and the stemmer using some crude methods. Here we will use more civilized tools. Changing the wordforms file or the morphology of a table can be done with a single command: ALTER TABLE <table name> morphology='<morph type>'. Let's replace our stemmer with a lemmatizer:

ALTER TABLE products morphology='lemmatize_en_all';

After changing any parameter related to text preprocessing, all existing records need to be reindexed so that the morphology and other tokenization settings apply to the old documents:

mysqldump -P9306 -h0 --replace --skip-comments manticore products | mysql -P9306 -h0;

Here we use the mysqldump technique, redirecting the dump output straight back into Manticore through the MySQL client. The --replace option makes mysqldump generate REPLACE commands instead of INSERT, which lets us "reload" the whole table in one go. Keep in mind that this command can take a long time on a large table or a weak server, but that does not bother us much, because we have a backup node currently handling user requests and the mysqldump command does not lock the table.

After this simple restructuring of the products table, we get a new version:

New table

The new settings and all the data have been applied; now let's add the table back to the cluster:

ALTER CLUSTER pet_shop ADD products;

That's it: the table is now updated on all servers, and the data was available to users from the second node the whole time while we configured everything and checked that it works properly.

Other node
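To recap, the whole procedure from this section can be written down as an ordered plan. The Python sketch below only builds the command strings used in this article and prints them; `reconfigure_plan` is a made-up name, and nothing is sent to a server. Note that the third step is a shell command, while the others are SQL.

```python
from typing import List


def reconfigure_plan(cluster: str, table: str, morphology: str) -> List[str]:
    """Return the commands, in order, for changing a replicated table's morphology."""
    return [
        # 1. Detach the table from the cluster (the table itself is kept).
        f"ALTER CLUSTER {cluster} DROP {table};",
        # 2. Change the morphology setting.
        f"ALTER TABLE {table} morphology='{morphology}';",
        # 3. Reload existing rows so the new setting applies to old documents.
        f"mysqldump -P9306 -h0 --replace --skip-comments manticore {table} | mysql -P9306 -h0",
        # 4. Attach the table to the cluster again.
        f"ALTER CLUSTER {cluster} ADD {table};",
    ]


for step in reconfigure_plan("pet_shop", "products", "lemmatize_en_all"):
    print(step)
```

The order matters: the table has to leave the cluster before its settings change and return only after the data has been reloaded.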

It is important to pay attention to recovering the whole cluster if all nodes fail: with a wrong recovery sequence there is a risk of losing all its settings. Detailed recovery methods are described in the documentation.

By the way, you can easily play with replication in our interactive course at play.manticoresearch.com.

That's all for today! May the road ahead be smooth! This was Michael, wishing you all the best!