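In this tutorial, you will learn how to highlight search results in Manticore Search. This is useful if you want to make search results in your application or website easier to read.
Highlighting lets you retrieve fragments of the search results that contain the matching keywords, which improves the search experience in your application.
Introduction
Manticore Search offers several ways to highlight keywords in text.
- The CALL SNIPPETS statement returns a list of fragments of a document that contain matches (called snippets). It can be used independently of a search query to highlight a string or a list of strings. Here is an example: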
CALL SNIPPETS('my text with keyword', 'index', 'keyword');
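- The SNIPPET() function builds snippets from the provided data and query using the settings of the specified index. It is mainly used in SELECT statements to highlight given text, field values, or text fetched from other sources through a UDF (user-defined function). The query it highlights can be the same as the one in the MATCH clause or a different one; that is up to you. For example: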
SELECT SNIPPET(content,'camera') FROM index WHERE MATCH('camera');
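- The HIGHLIGHT() function highlights search results. It was added in Manticore 3.2.2. When your documents are stored in Manticore, not just indexed, it makes highlighting keywords much easier. Here is an example call: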
SELECT HIGHLIGHT() FROM index WHERE MATCH('text feature');
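The first two, CALL SNIPPETS and SNIPPET(), return a list of the parts of a document (snippets) that contain matches for the search keywords. The last one, HIGHLIGHT(), fetches all available fields from the document storage and highlights them according to the given query. Unlike SNIPPET(), HIGHLIGHT() supports field syntax in the query.
All three share the same highlighting options, which we discuss in the following steps. In this tutorial, all examples use HIGHLIGHT().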
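All three functions accept options written as a block of the form {name=value,...}. Below is a minimal Python sketch of a helper that assembles the HIGHLIGHT() queries used in this tutorial from a dict of options.

The function names are hypothetical. The helper assumes string values are single-quoted with backslash escaping, as in Manticore's MySQL-style SQL. It only builds query text and does not connect to the server.

```python
from typing import Optional


def render_options(options: dict) -> str:
    """Render a dict as an option block, e.g. {limit=10,around=1}."""
    parts = []
    for name, value in options.items():
        if isinstance(value, str):
            # Assumption: quotes and backslashes are escaped with a backslash.
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"{name}='{escaped}'")
        elif isinstance(value, bool):
            # bool is checked before int because bool is a subclass of int.
            parts.append(f"{name}={int(value)}")
        else:
            parts.append(f"{name}={value}")
    return "{" + ",".join(parts) + "}"


def highlight_query(index: str, query: str, options: Optional[dict] = None,
                    field: Optional[str] = None, alias: str = "h") -> str:
    """Build a SELECT HIGHLIGHT(...) statement (hypothetical helper)."""
    if options or field:
        args = render_options(options or {})
        if field:
            args += f",'{field}'"  # highlight only this stored field
    else:
        args = ""  # plain HIGHLIGHT() highlights all stored fields
    safe_query = query.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT HIGHLIGHT({args}) AS {alias} FROM {index} WHERE MATCH('{safe_query}')"


print(highlight_query("highlight", "text feature"))
print(highlight_query("highlight", "syntax", {}, field="content"))
print(highlight_query("highlight", "text feature",
                      {"before_match": "*", "after_match": "*",
                       "around": 1, "chunk_separator": "###"}))
```

The three printed statements match queries used later in the tutorial, without the `\G` terminator of the mysql client.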
Suppose you have an index named 'highlight' with the following settings:
index highlight
{
type = rt
path = highlight
rt_field = title
rt_field = content
rt_attr_uint = gid
stored_fields = title, content
index_sp = 1
html_strip = 1
}
Basic usage
A quick example.
First, add a document:
INSERT INTO highlight(title,content,gid) VALUES('Syntax highlighting','Syntax highlighting is a feature of text editors that are used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colors and fonts according to the category of terms.[1] This feature facilitates writing in a structured language such as a programming language or a markup language as both structures and syntax errors are visually distinct. Highlighting does not affect the meaning of the text itself; it is intended only for human readers.',1);
Then run SELECT HIGHLIGHT():
SELECT HIGHLIGHT() AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: Syntax highlighting is a <b>feature</b> of <b>text</b> editors that are used ... , such as HTML. The <b>feature</b> displays <b>text</b>, especially source code, in ... of terms.[1] This <b>feature</b> facilitates writing in a structured ... affect the meaning of the <b>text</b> itself; it is intended ...
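By default, every matched word is wrapped in bold tags (<b>...</b>), and up to 5 words around each match are included to form a passage.
By default, passages are separated by "...".
Because snippets are usually displayed inside HTML content, matches are highlighted with HTML tags. You can customize this behavior with the "before_match", "after_match", "around" and "chunk_separator" settings. For example: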
SELECT HIGHLIGHT({before_match='*',after_match='*',around=1,chunk_separator='###'}) AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: ### a *feature* of *text*###. The *feature* displays *text*###] This *feature* facilitates### the *text* itself###
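The sketch below is a deliberately simplified Python model of passage building. It shows what before_match, after_match, around and chunk_separator control. It is not Manticore's algorithm, which also involves tokenizer settings, morphology, size limits and passage ranking.

The toy does four things:
- it wraps exact word matches;
- it keeps `around` words on each side of a match;
- it merges windows that overlap or touch;
- it inserts the separator wherever text was skipped, which mirrors the leading and trailing separators in the output above.

The defaults are chosen to match the defaults described above.

```python
import re


def toy_highlight(text, keywords, before_match="<b>", after_match="</b>",
                  around=5, chunk_separator=" ... "):
    """Simplified passage builder, for illustration only."""
    tokens = re.findall(r"\w+|\W+", text)  # words and the gaps between them
    words = [i for i, t in enumerate(tokens) if re.fullmatch(r"\w+", t)]
    wanted = {k.lower() for k in keywords}
    hits = [n for n, i in enumerate(words) if tokens[i].lower() in wanted]
    if not hits:
        return ""

    # Window of `around` words on each side of every hit; merge windows that touch.
    windows = []
    for n in hits:
        lo, hi = max(0, n - around), min(len(words) - 1, n + around)
        if windows and lo <= windows[-1][1] + 1:
            windows[-1][1] = max(windows[-1][1], hi)
        else:
            windows.append([lo, hi])

    hit_tokens = {words[n] for n in hits}
    passages = []
    for lo, hi in windows:
        parts = [f"{before_match}{tokens[i]}{after_match}" if i in hit_tokens else tokens[i]
                 for i in range(words[lo], words[hi] + 1)]
        passages.append("".join(parts))

    result = chunk_separator.join(passages)
    if windows[0][0] > 0:                # text was skipped before the first passage
        result = chunk_separator + result
    if windows[-1][1] < len(words) - 1:  # text was skipped after the last passage
        result += chunk_separator
    return result.strip()


sample = "Syntax highlighting is a feature of text editors that are used for programming"
print(toy_highlight(sample, ["text", "feature"], around=1))
print(toy_highlight(sample, ["text", "feature"], before_match="*", after_match="*",
                    around=1, chunk_separator="###"))
```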
Controlling snippet size
By default, the maximum snippet size is 256 characters, set by the "limit" option. You can change it like this:
SELECT HIGHLIGHT({limit=10}) AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: ... a <b>feature</b> ...
Another limit you can change is the number of words included in the snippet, defined by "limit_words":
SELECT HIGHLIGHT({limit_words=5},'content') AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: ... . The <b>feature</b> displays <b>text</b>, especially ...
You can also limit the number of passages. For example, to get only one passage:
SELECT HIGHLIGHT({limit_passages=1}) AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: ... languages, such as HTML. The <b>feature</b> displays <b>text</b>, especially source code, in ...
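On the client side, you may want to check how many passages and matches came back, for example to confirm the effect of limit_passages. Here is a minimal Python sketch with hypothetical helper names.

It assumes the configured chunk separator (by default "...") and the before_match marker (by default <b>) do not also occur in the document text itself.

```python
from typing import List


def split_passages(highlighted: str, separator: str = "...") -> List[str]:
    """Split a HIGHLIGHT() result into passages, dropping empty edge chunks."""
    return [part.strip() for part in highlighted.split(separator) if part.strip()]


def count_matches(highlighted: str, before_match: str = "<b>") -> int:
    """Count highlighted matches by counting the opening marker."""
    return highlighted.count(before_match)


h = "... languages, such as HTML. The <b>feature</b> displays <b>text</b>, especially source code, in ..."
print(split_passages(h))
print(count_matches(h))
```

For output produced with a custom separator, pass that separator instead, e.g. split_passages(h, "###").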
By default, HIGHLIGHT() returns the passages it finds, separated by the configured separator, within the limits.
Since the limits may not be large enough to fit all passages, we may get only some of the possible passages.
To demonstrate this, let's first add a document with a longer text.
INSERT INTO highlight(title,content) VALUES('wikipedia','Syntax highlighting is a feature of text editors that are used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colors and fonts according to the category of terms.[1] This feature facilitates writing in a structured language such as a programming language or a markup language as both structures and syntax errors are visually distinct. Highlighting does not affect the meaning of the text itself; it is intended only for human readers. Syntax highlighting is a form of secondary notation, since the highlights are not part of the text meaning, but serve to reinforce it. Some editors also integrate syntax highlighting with other features, such as spell-checking or code folding, as aids to editing which are external to the language. Contents 1Practical benefits 2Support in text editors 3Syntax elements 3.1Examples 4History and limitations 5See also 6References Practical benefits Highlighting the effect of missing delimiter (after watch=false) in Javascript Syntax highlighting is one strategy to improve the readability and context of the text; especially for code that spans several pages. The reader can easily ignore large sections of comments or code, depending on what they are looking for. Syntax highlighting also helps programmers find errors in their program. For example, most editors highlight string literals in a different color. Consequently, spotting a missing delimiter is much easier because of the contrasting color of the text. Brace matching is another important feature of many popular editors. This makes it simple to see if a brace has been left out or locate the match of the brace the cursor is on by highlighting the pair in a different color. 
A study published in the conference PPIG evaluated the effects of syntax highlighting on the comprehension of short programs, finding that the presence of syntax highlighting significantly reduces the time taken for a programmer to internalize the semantics of a program.[2] Additionally, data gathered from an eye-tracker during the study suggested that syntax highlighting enables programmers to pay less attention to standard syntactic components such as keywords. Support in text editors gedit supports syntax highlighting Some text editors can also export the colored markup in a format that is suitable for printing or for importing into word-processing and other kinds of text-formatting software; for instance as a HTML, colorized LaTeX, PostScript or RTF version of its syntax highlighting. There are several syntax highlighting libraries or "engines" that can be used in other applications, but are not complete programs in themselves, for example the Generic Syntax Highlighter (GeSHi) extension for PHP. For editors that support more than one language, the user can usually specify the language of the text, such as C, LaTeX, HTML, or the text editor can automatically recognize it based on the file extension or by scanning contents of the file. This automatic language detection presents potential problems. For example, a user may want to edit a document containing: more than one language (for example when editing an HTML file that contains embedded Javascript code), a language that is not recognized (for example when editing source code for an obscure or relatively new programming language), a language that differs from the file type (for example when editing source code in an extension-less file in an editor that uses file extensions to detect the language). In these cases, it is not clear what language to use, and a document may not be highlighted or be highlighted incorrectly. 
Syntax elements Most editors with syntax highlighting allow different colors and text styles to be given to dozens of different lexical sub-elements of syntax. These include keywords, comments, control-flow statements, variables, and other elements. Programmers often heavily customize their settings in an attempt to show as much useful information as possible without making the code difficult to read. ');
Now let's highlight it:
SELECT HIGHLIGHT({},'content') AS h FROM highlight WHERE MATCH('syntax')\G
*************************** 1. row ***************************
h: <b>syntax</b> highlighting is a feature of ... language as both structures and <b>syntax</b> errors are visually distinct. Highlighting ...
*************************** 2. row ***************************
h: ... version of its <b>syntax</b> highlighting. There are several <b>syntax</b> highlighting libraries ... highlighted incorrectly. <b>syntax</b> elements Most editors with <b>syntax</b> highlighting allow different ... different lexical sub-elements of <b>syntax</b>. These include keywords, comments, ...
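For the newly added document, HIGHLIGHT() did not return all passages. We can work around this by increasing the limit, but the question is by how much. If the value is too high, HIGHLIGHT() returns the full body of the content, with the matches highlighted: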
SELECT HIGHLIGHT({limit=10000},'content') AS h FROM highlight WHERE MATCH('syntax')\G
*************************** 1. row ***************************
h: <b>syntax</b> highlighting is a feature of text editors that are used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colors and fonts according to the category of terms.[1] This feature facilitates writing in a structured language such as a programming language or a markup language as both structures and <b>syntax</b> errors are visually distinct. Highlighting does not affect the meaning of the text itself; it is intended only for human readers.
*************************** 2. row ***************************
h: <b>syntax</b> highlighting is a feature of text editors that are used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colors and fonts according to the category of terms.[1] This feature facilitates writing in a structured language such as a programming language or a markup language as both structures and <b>syntax</b> errors are visually distinct. Highlighting does not affect the meaning of the text itself; it is intended only for human readers. <b>syntax</b> highlighting is a form of secondary notation, since the highlights are not part of the text meaning, but serve to reinforce it. Some editors also integrate <b>syntax</b> highlighting with other features, such as spell checking or code folding, as aids to editing which are external to the language. Contents 1Practical benefits 2Support in text editors 3Syntax elements 3.1Examples 4History and limitations 5See also 6References Practical benefits Highlighting the effect of missing delimiter (after watch=false) in Javascript <b>syntax</b> highlighting is one strategy to improve the readability and context of the text; especially for code that spans several pages. The reader can easily ignore large sections of comments or code, depending on what they are looking for. <b>syntax</b> highlighting also helps programmers find errors in their program. For example, most editors highlight string literals in a different color. Consequently, spotting a missing delimiter is much easier because of the contrasting color of the text. Brace matching is another important feature with many popular editors. This makes it simple to see if a brace has been left out or locate the match of the brace the cursor is on by highlighting the pair in a different color. 
A study published in the conference PPIG evaluated the effects of <b>syntax</b> highlighting on the comprehension of short programs, finding that the presence of <b>syntax</b> highlighting significantly reduces the time taken for a programmer to internalise the semantics of a program.[2] Additionally, data gathered from an eye-tracker during the study suggested that <b>syntax</b> highlighting enables programmers to pay less attention to standard syntactic components such as keywords. Support in text editors gedit supports <b>syntax</b> highlighting Some text editors can also export the coloured markup in a format that is suitable for printing or for importing into word-processing and other kinds of text-formatting software; for instance as a HTML, colorized LaTeX, PostScript or RTF version of its <b>syntax</b> highlighting. There are several <b>syntax</b> highlighting libraries or "engines" that can be used in other applications, but are not complete programs in themselves, for example the Generic <b>syntax</b> Highlighter (GeSHi) extension for PHP. For editors that support more than one language, the user can usually specify the language of the text, such as C, LaTeX, HTML, or the text editor can automatically recognize it based on the file extension or by scanning contents of the file. This automatic language detection presents potential problems. For example, a user may want to edit a document containing: more than one language (for example when editing an HTML file that contains embedded Javascript code), a language that is not recognized (for example when editing source code for an obscure or relatively new programming language), a language that differs from the file type (for example when editing source code in an extension-less file in an editor that uses file extensions to detect the language). In these cases, it is not clear what language to use, and a document may not be highlighted or be highlighted incorrectly. 
<b>syntax</b> elements Most editors with <b>syntax</b> highlighting allow different colors and text styles to be given to dozens of different lexical sub-elements of <b>syntax</b>. These include keywords, comments, control-flow statements, variables, and other elements. Programmers often heavily customize their settings in an attempt to show as much useful information as possible without making the code difficult to read.
If we want only the passages rather than the whole highlighted text, we need to use the "force_passages" option:
SELECT HIGHLIGHT({limit=10000,force_passages=1},'content') AS h FROM highlight WHERE MATCH('syntax')\G
*************************** 1. row ***************************
h: <b>syntax</b> highlighting is a feature of ... language as both structures and <b>syntax</b> errors are visually distinct. Highlighting ...
*************************** 2. row ***************************
h: <b>syntax</b> highlighting is a feature of ... language as both structures and <b>syntax</b> errors are visually distinct. Highlighting ... intended only for human readers. <b>syntax</b> highlighting is a form of ... it. Some editors also integrate <b>syntax</b> highlighting with other features, such ... (after watch=false) in Javascript <b>syntax</b> highlighting is one strategy to ... what they are looking for. <b>syntax</b> highlighting also helps programmers find ... PPIG evaluated the effects of <b>syntax</b> highlighting on the comprehension of ... , finding that the presence of <b>syntax</b> highlighting significantly reduces the time ... during the study suggested that <b>syntax</b> highlighting enables programmers to pay ... in text editors gedit supports <b>syntax</b> highlighting Some text editors can ... version of its <b>syntax</b> highlighting. There are several <b>syntax</b> highlighting libraries or ... themselves, for example the Generic <b>syntax</b> Highlighter (GeSHi) extension for PHP ... be highlighted incorrectly. <b>syntax</b> elements Most editors with <b>syntax</b> highlighting allow different ... different lexical sub-elements of <b>syntax</b>. These include keywords, comments, control ...
Another way to get the entire highlighted text is simply to use limit=0:
SELECT HIGHLIGHT({limit=0},'content') AS h FROM highlight WHERE MATCH('text feature')\G
*************************** 1. row ***************************
h: Syntax highlighting is a <b>feature</b> of <b>text</b> editors that are used for programming, scripting, or markup languages, such as HTML. The <b>feature</b> displays <b>text</b>, especially source code, in different colors and fonts according to the category of terms.[1] This <b>feature</b> facilitates writing in a structured language such as a programming language or a markup language as both structures and syntax errors are visually distinct. Highlighting does not affect the meaning of the <b>text</b> itself; it is intended only for human readers.
*************************** 2. row ***************************
h: Syntax highlighting is a <b>feature</b> of <b>text</b> editors that are used for programming, scripting, or markup languages, such as HTML. The <b>feature</b> displays <b>text</b>, especially source code, in different colors and fonts according to the category of terms.[1] This <b>feature</b> facilitates writing in a structured language such as a programming language or a markup language as both structures and syntax errors are visually distinct. Highlighting does not affect the meaning of the <b>text</b> itself; it is intended only for human readers. Syntax highlighting is a form of secondary notation, since the highlights are not part of the <b>text</b> meaning, but serve to reinforce it. Some editors also integrate syntax highlighting with other features, such as spell checking or code folding, as aids to editing which are external to the language. Contents 1Practical benefits 2Support in <b>text</b> editors 3Syntax elements 3.1Examples 4History and limitations 5See also 6References Practical benefits Highlighting the effect of missing delimiter (after watch=false) in Javascript Syntax highlighting is one strategy to improve the readability and context of the <b>text</b>; especially for code that spans several pages. The reader can easily ignore large sections of comments or code, depending on what they are looking for. Syntax highlighting also helps programmers find errors in their program. For example, most editors highlight string literals in a different color. Consequently, spotting a missing delimiter is much easier because of the contrasting color of the <b>text</b>. Brace matching is another important <b>feature</b> with many popular editors. This makes it simple to see if a brace has been left out or locate the match of the brace the cursor is on by highlighting the pair in a different color. 
A study published in the conference PPIG evaluated the effects of syntax highlighting on the comprehension of short programs, finding that the presence of syntax highlighting significantly reduces the time taken for a programmer to internalise the semantics of a program.[2] Additionally, data gathered from an eye-tracker during the study suggested that syntax highlighting enables programmers to pay less attention to standard syntactic components such as keywords. Support in <b>text</b> editors gedit supports syntax highlighting Some <b>text</b> editors can also export the coloured markup in a format that is suitable for printing or for importing into word-processing and other kinds of <b>text</b>-formatting software; for instance as a HTML, colorized LaTeX, PostScript or RTF version of its syntax highlighting. There are several syntax highlighting libraries or "engines" that can be used in other applications, but are not complete programs in themselves, for example the Generic Syntax Highlighter (GeSHi) extension for PHP. For editors that support more than one language, the user can usually specify the language of the <b>text</b>, such as C, LaTeX, HTML, or the <b>text</b> editor can automatically recognize it based on the file extension or by scanning contents of the file. This automatic language detection presents potential problems. For example, a user may want to edit a document containing: more than one language (for example when editing an HTML file that contains embedded Javascript code), a language that is not recognized (for example when editing source code for an obscure or relatively new programming language), a language that differs from the file type (for example when editing source code in an extension-less file in an editor that uses file extensions to detect the language). In these cases, it is not clear what language to use, and a document may not be highlighted or be highlighted incorrectly. 
Syntax elements Most editors with syntax highlighting allow different colors and <b>text</b> styles to be given to dozens of different lexical sub-elements of syntax. These include keywords, comments, control-flow statements, variables, and other elements. Programmers often heavily customize their settings in an attempt to show as much useful information as possible without making the code difficult to read.
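The three behaviors above can be summarized as option sets:
- keep the default passages;
- force passages under a large limit;
- return the whole highlighted field with limit=0.

Below is a hypothetical Python helper that records this choice in one place. The value 10000 is simply the one used in the examples, not a recommended constant. Whether a given limit is "large enough" depends on the length of your documents.

```python
def highlight_options(mode: str, big_limit: int = 10000) -> dict:
    """Return HIGHLIGHT() options for one of three behaviors (hypothetical helper)."""
    if mode == "passages":
        # Defaults: passages within the default 256-character limit.
        return {}
    if mode == "all_passages":
        # A large limit alone may return the whole text; force_passages keeps passages.
        return {"limit": big_limit, "force_passages": 1}
    if mode == "full_text":
        # limit=0 removes the size limit and returns the whole highlighted field.
        return {"limit": 0}
    raise ValueError(f"unknown mode: {mode!r}")


def option_block(options: dict) -> str:
    """Render numeric options in the {name=value,...} form used in the queries."""
    return "{" + ",".join(f"{name}={value}" for name, value in options.items()) + "}"


for mode in ("passages", "all_passages", "full_text"):
    print(mode, option_block(highlight_options(mode)))
```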
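HTML stripping and boundaries
If the index has sentence detection enabled (index_sp = 1 in our settings), we can configure highlighting so that passages do not cross sentence boundaries. First, the default behavior: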
SELECT HIGHLIGHT({},'content') AS h FROM highlight WHERE MATCH('html text')\G
*************************** 1. row ***************************
h: ... highlighting is a feature of <b>text</b> editors that are used for ... markup languages, such as <b>HTML</b>. The feature displays <b>text</b>, especially source code ... affect the meaning of the <b>text</b> itself; it is intended only ...
1 row in set (0.00 sec)
In this example, the passage '... markup languages, such as HTML. The feature displays text, especially source code ...' spans two sentences.
With passage_boundary=sentence, this passage is split into two:
SELECT HIGHLIGHT({passage_boundary='sentence'},'content') AS h FROM highlight WHERE MATCH('html text')\G
*************************** 1. row ***************************
h: ... highlighting is a feature of <b>text</b> editors that are used for ... , or markup languages, such as <b>HTML</b>. ... The feature displays <b>text</b>, especially source code, in different ... affect the meaning of the <b>text</b> itself; it is intended only ...
1 row in set (0.05 sec)
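The effect of passage_boundary='sentence' can be pictured with a toy Python model in which a keyword window is clipped at sentence ends. The function name is hypothetical. The naive split on ".", "!" or "?" followed by whitespace is only an assumption for illustration; Manticore's own sentence detection (enabled by index_sp) is more elaborate, and this model does not merge overlapping windows.

```python
import re
from typing import List


def keyword_windows(text: str, keywords: List[str], around: int = 5,
                    sentence_boundary: bool = False) -> List[str]:
    """Toy model: plain-text windows around keyword hits, optionally sentence-bounded."""
    # With sentence_boundary=True, each window is built inside a single sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text) if sentence_boundary else [text]
    wanted = {k.lower() for k in keywords}
    windows = []
    for sentence in sentences:
        words = sentence.split()
        for n, word in enumerate(words):
            if re.sub(r"\W", "", word).lower() in wanted:
                lo, hi = max(0, n - around), min(len(words), n + around + 1)
                windows.append(" ".join(words[lo:hi]))
    return windows


text = "Markup languages, such as HTML. The feature displays text, especially source code."
print(keyword_windows(text, ["html"], around=2))
print(keyword_windows(text, ["html"], around=2, sentence_boundary=True))
```

Without the boundary, the window around "HTML" runs into the next sentence ("such as HTML. The feature"). With it, the window stops at "HTML.", just as the passage in the output above is split in two.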
Let's add a document with HTML content.
INSERT INTO highlight(title,content) values('html content','The ideas of syntax highlighting overlap significantly with those of <a title="Structure editor" href="/wiki/Structure_editor">syntax-directed editors</a> One of the first such class of editors for code was Wilfred Hansens 1969 code editor, Emily.<sup id="cite_ref-hansen_3-0" class="reference"><a href="#cite_note-hansen-3">[3]</a></sup><sup id="cite_ref-4" class="reference"><a href="#cite_note-4">[4]</a></sup> It provided advanced language-independent <a title="Autocomplete" href="/wiki/Autocomplete">code completion</a> facilities, and unlike modern editors with syntax highlighting, actually made it impossible to create syntactically incorrect programs.');
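Long documents like these are easy to break with an unescaped single quote, for example in a possessive such as "Hansen's". When you build INSERT statements in code, escape string values first; in a real application, parameter binding in your MySQL driver is preferable.

Below is a minimal Python sketch with hypothetical helper names. It assumes Manticore's SQL string literals use backslash escaping for quotes and backslashes.

```python
def sql_string(value: str) -> str:
    """Quote a Python string as a single-quoted SQL literal (backslash escaping assumed)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def insert_statement(index: str, row: dict) -> str:
    """Build an INSERT for string and integer columns (hypothetical helper)."""
    columns = ",".join(row)
    values = ",".join(sql_string(v) if isinstance(v, str) else str(v)
                      for v in row.values())
    return f"INSERT INTO {index}({columns}) VALUES({values})"


print(insert_statement("highlight", {"title": "html content",
                                     "content": "Hansen's editor", "gid": 1}))
```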
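By default, highlighting handles HTML content according to the index settings. If HTML stripping is enabled in the index (html_strip = 1 in our settings), the HIGHLIGHT() results are stripped as well.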
SELECT HIGHLIGHT({},'content') AS h FROM highlight WHERE MATCH('code class')\G
*************************** 1. row ***************************
h: ... of the first such <b>class</b> of editors for code was Wilfred Hansens ... 1969 <b>code</b> editor, Emily.[3][4 ... ] It provided advanced language-independent <b>code</b> completion facilities, and unlike modern ...
If we want the highlighted output to include the HTML tags, we need to set "html_strip_mode=none":
SELECT HIGHLIGHT({html_strip_mode='none'},'content') AS h FROM highlight WHERE MATCH('code class')\G
*************************** 1. row ***************************
h: ... the first such <b>class</b> of editors for <b>code</b> was Wilfred ... 1969 <b>code</b> editor, Emily. <sup id="cite_ref-hansen_3-0" <b>class</b>=" ... sup id="cite_ref-4" <b>class</b>="reference"><a title="Autocomplete" href="#cite_note- ... ="><b>code</b> completion facilities, and ...
1 row in set (0.05 sec)
Note that with html_strip_mode=none, words that are part of the HTML markup itself, such as the attribute name 'class', can also be highlighted.
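To see why 'class' only matches inside tags when markup is kept, it helps to look at what stripping removes: tag names, attribute names and attribute values, leaving only the visible text. The sketch below approximates stripping with Python's standard html.parser module. It is an illustration, not Manticore's html_strip implementation.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only character data; tags and attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


fragment = ('Emily.<sup id="cite_ref-4" class="reference">'
            '<a href="#cite_note-4">[4]</a></sup> It provided')
text = strip_html(fragment)
print(text)
print("class" in fragment, "class" in text)
```

The stripped text keeps the citation marker but loses class="reference", which matches the "Emily.[3][4 ..." shape of the stripped HIGHLIGHT() output earlier.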
To keep the HTML markup intact and protect it from highlighting, use the retain mode. It requires an unlimited snippet size (limit=0):
SELECT HIGHLIGHT({html_strip_mode='retain',limit=0},'content') AS h FROM highlight WHERE MATCH('code class')\G
*************************** 1. row ***************************
h: The ideas of syntax highlighting overlap significantly with those of <a title="Structure editor" href="/wiki/Structure_editor">syntax-directed editors</a> One of the first such <b>class</b> of editors for <b>code</b> was Wilfred Hansens 1969 <b>code</b> editor, Emily.<sup id="cite_ref-hansen_3-0" class="reference"><a href="#cite_note-hansen-3">[3]</a></sup><sup id="cite_ref-4" class="reference"><a href="#cite_note-4">[4]</a></sup> It provided advanced language-independent <a title="Autocomplete" href="/wiki/Autocomplete"><b>code</b> completion</a> facilities, and unlike modern editors with syntax highlighting, actually made it impossible to create syntactically incorrect programs.
This tutorial has shown how to use the HIGHLIGHT() function to highlight search results in Manticore Search.
Interactive course
This blog post is also available as an interactive course with a built-in command line, so you can run the examples above yourself.
