Commit c833893

[8.19] Add clarification to semantic_text documentation on default quantization and lexical search support (elastic#128990)
* Add clarification docs for 8.19
* Regenerate ES|QL match docs
1 parent 41137a3 commit c833893

6 files changed: +68 -67 lines changed

docs/reference/esql/functions/description/match.asciidoc

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/esql/functions/kibana/definition/match.json

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/esql/functions/kibana/docs/match.md

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -18,6 +18,7 @@ If you don’t specify an inference endpoint, the `inference_id` field defaults
1818

1919
Using `semantic_text`, you won't need to specify how to generate embeddings for your data, or how to index it.
2020
The {infer} endpoint automatically determines the embedding generation, indexing, and query to use.
21+
Newly created indices with `semantic_text` fields using dense embeddings will be <<dense-vector-quantization,quantized>> to `bbq_hnsw` automatically.
2122

2223
If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:
2324

@@ -225,7 +226,8 @@ In these cases - when you use `sparse_vector` or `dense_vector` field types inst
225226
For indices containing `semantic_text` fields, updates that use scripts have the following behavior:
226227

227228
* Are supported through the https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update[Update API].
228-
* Are not supported through the https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk-1[Bulk API] and will fail. Even if the script targets non-`semantic_text` fields, the update will fail when the index contains a `semantic_text` field.
229+
* Are not supported through the https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk-1[Bulk API] and will fail.
230+
Even if the script targets non-`semantic_text` fields, the update will fail when the index contains a `semantic_text` field.
229231

230232
[discrete]
231233
[[copy-to-support]]

docs/reference/query-dsl/match-query.asciidoc

Lines changed: 61 additions & 64 deletions
Original file line number | Diff line number | Diff line change
@@ -1,19 +1,18 @@
11
[[query-dsl-match-query]]
22
=== Match query
3+
34
++++
45
<titleabbrev>Match</titleabbrev>
56
++++
67

7-
Returns documents that match a provided text, number, date or boolean value. The
8-
provided text is analyzed before matching.
9-
10-
The `match` query is the standard query for performing a full-text search,
11-
including options for fuzzy matching.
8+
Returns documents that match a provided text, number, date or boolean value.
9+
The provided text is analyzed before matching.
1210

13-
`Match` will also work against <<semantic-text, semantic_text>> fields,
14-
however when performing `match` queries against `semantic_text` fields options
15-
that specifically target lexical search such as `fuzziness` or `analyzer` will be ignored.
11+
The `match` query is the standard query for performing a full-text search, including options for fuzzy matching.
1612

13+
`Match` will also work against <<semantic-text, semantic_text>> fields.
14+
As `semantic_text` does not support lexical text search, `match` queries against `semantic_text` fields will automatically perform the correct semantic search.
15+
Because of this, options that specifically target lexical search such as `fuzziness` or `analyzer` will be ignored.
1716

1817
[[match-query-ex-request]]
1918
==== Example request
@@ -32,85 +31,89 @@ GET /_search
3231
}
3332
--------------------------------------------------
3433

35-
3634
[[match-top-level-params]]
3735
==== Top-level parameters for `match`
3836

3937
`<field>`::
4038
(Required, object) Field you wish to search.
4139

42-
4340
[[match-field-params]]
4441
==== Parameters for `<field>`
42+
4543
`query`::
4644
+
4745
--
4846
(Required) Text, number, boolean value or date you wish to find in the provided
4947
`<field>`.
5048

51-
The `match` query <<analysis,analyzes>> any provided text before performing a
52-
search. This means the `match` query can search <<text,`text`>> fields for
53-
analyzed tokens rather than an exact term.
49+
The `match` query <<analysis,analyzes>> any provided text before performing a search.
50+
This means the `match` query can search <<text,`text`>> fields for analyzed tokens rather than an exact term.
5451
--
5552

5653
`analyzer`::
5754
(Optional, string) <<analysis,Analyzer>> used to convert the text in the `query`
58-
value into tokens. Defaults to the <<specify-index-time-analyzer,index-time
59-
analyzer>> mapped for the `<field>`. If no analyzer is mapped, the index's
60-
default analyzer is used.
55+
value into tokens.
56+
Defaults to the <<specify-index-time-analyzer,index-time analyzer>> mapped for the `<field>`.
57+
If no analyzer is mapped, the index's default analyzer is used.
6158

6259
`auto_generate_synonyms_phrase_query`::
6360
+
6461
--
6562
(Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
66-
queries are automatically created for multi-term synonyms. Defaults to `true`.
63+
queries are automatically created for multi-term synonyms.
64+
Defaults to `true`.
6765

68-
See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
69-
example.
66+
See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an example.
7067
--
7168

7269
`boost`::
7370
+
7471
--
7572
(Optional, float) Floating point number used to decrease or increase the
76-
<<relevance-scores,relevance scores>> of the query. Defaults to `1.0`.
73+
<<relevance-scores,relevance scores>> of the query.
74+
Defaults to `1.0`.
7775

78-
Boost values are relative to the default value of `1.0`. A boost value between
79-
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
76+
Boost values are relative to the default value of `1.0`.
77+
A boost value between
78+
`0` and `1.0` decreases the relevance score.
79+
A value greater than `1.0`
8080
increases the relevance score.
8181
--
8282

8383
`fuzziness`::
84-
(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
85-
for valid values and more information. See <<query-dsl-match-query-fuzziness>>
84+
(Optional, string) Maximum edit distance allowed for matching.
85+
See <<fuzziness>>
86+
for valid values and more information.
87+
See <<query-dsl-match-query-fuzziness>>
8688
for an example.
8789

8890
`max_expansions`::
89-
(Optional, integer) Maximum number of terms to which the query will
90-
expand. Defaults to `50`.
91+
(Optional, integer) Maximum number of terms to which the query will expand.
92+
Defaults to `50`.
9193

9294
`prefix_length`::
93-
(Optional, integer) Number of beginning characters left unchanged for fuzzy
94-
matching. Defaults to `0`.
95+
(Optional, integer) Number of beginning characters left unchanged for fuzzy matching.
96+
Defaults to `0`.
9597

9698
`fuzzy_transpositions`::
97-
(Optional, Boolean) If `true`, edits for fuzzy matching include
98-
transpositions of two adjacent characters (ab → ba). Defaults to `true`.
99+
(Optional, Boolean) If `true`, edits for fuzzy matching include transpositions of two adjacent characters (ab → ba).
100+
Defaults to `true`.
99101

100102
`fuzzy_rewrite`::
101103
+
102104
--
103-
(Optional, string) Method used to rewrite the query. See the
104-
<<query-dsl-multi-term-rewrite, `rewrite` parameter>> for valid values and more
105-
information.
105+
(Optional, string) Method used to rewrite the query.
106+
See the
107+
<<query-dsl-multi-term-rewrite, `rewrite` parameter>> for valid values and more information.
106108

107109
If the `fuzziness` parameter is not `0`, the `match` query uses a `fuzzy_rewrite`
108110
method of `top_terms_blended_freqs_${max_expansions}` by default.
109111
--
110112

111113
`lenient`::
112114
(Optional, Boolean) If `true`, format-based errors, such as providing a text
113-
`query` value for a <<number,numeric>> field, are ignored. Defaults to `false`.
115+
`query` value for a <<number,numeric>> field, are ignored.
116+
Defaults to `false`.
114117

115118
`operator`::
116119
+
@@ -130,16 +133,17 @@ AND of AND Hungary`.
130133
`minimum_should_match`::
131134
+
132135
--
133-
(Optional, string) Minimum number of clauses that must match for a document to
134-
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
136+
(Optional, string) Minimum number of clauses that must match for a document to be returned.
137+
See the <<query-dsl-minimum-should-match, `minimum_should_match`
135138
parameter>> for valid values and more information.
136139
--
137140

138141
`zero_terms_query`::
139142
+
140143
--
141144
(Optional, string) Indicates whether no documents are returned if the `analyzer`
142-
removes all tokens, such as when using a `stop` filter. Valid values are:
145+
removes all tokens, such as when using a `stop` filter.
146+
Valid values are:
143147

144148
`none` (Default)::
145149
No documents are returned if the `analyzer` removes all tokens.
@@ -151,15 +155,15 @@ query.
151155
See <<query-dsl-match-query-zero>> for an example.
152156
--
153157

154-
155158
[[match-query-notes]]
156159
==== Notes
157160

158161
[[query-dsl-match-query-short-ex]]
159162
===== Short request example
160163

161164
You can simplify the match query syntax by combining the `<field>` and `query`
162-
parameters. For example:
165+
parameters.
166+
For example:
163167

164168
[source,console]
165169
----
@@ -176,11 +180,11 @@ GET /_search
176180
[[query-dsl-match-query-boolean]]
177181
===== How the match query works
178182

179-
The `match` query is of type `boolean`. It means that the text
180-
provided is analyzed and the analysis process constructs a boolean query
181-
from the provided text. The `operator` parameter can be set to `or` or `and`
182-
to control the boolean clauses (defaults to `or`). The minimum number of
183-
optional `should` clauses to match can be set using the
183+
The `match` query is of type `boolean`.
184+
It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text.
185+
The `operator` parameter can be set to `or` or `and`
186+
to control the boolean clauses (defaults to `or`).
187+
The minimum number of optional `should` clauses to match can be set using the
184188
<<query-dsl-minimum-should-match,`minimum_should_match`>>
185189
parameter.
186190

@@ -201,13 +205,11 @@ GET /_search
201205
}
202206
--------------------------------------------------
203207

204-
The `analyzer` can be set to control which analyzer will perform the
205-
analysis process on the text. It defaults to the field explicit mapping
206-
definition, or the default search analyzer.
208+
The `analyzer` can be set to control which analyzer will perform the analysis process on the text.
209+
It defaults to the field explicit mapping definition, or the default search analyzer.
207210

208-
The `lenient` parameter can be set to `true` to ignore exceptions caused by
209-
data-type mismatches, such as trying to query a numeric field with a text
210-
query string. Defaults to `false`.
211+
The `lenient` parameter can be set to `true` to ignore exceptions caused by data-type mismatches, such as trying to query a numeric field with a text query string.
212+
Defaults to `false`.
211213

212214
[[query-dsl-match-query-fuzziness]]
213215
===== Fuzziness in the match query
@@ -218,17 +220,12 @@ See <<fuzziness>> for allowed settings.
218220
The `prefix_length` and
219221
`max_expansions` can be set in this case to control the fuzzy process.
220222
If the fuzzy option is set the query will use `top_terms_blended_freqs_${max_expansions}`
221-
as its <<query-dsl-multi-term-rewrite,rewrite
222-
method>> the `fuzzy_rewrite` parameter allows to control how the query will get
223-
rewritten.
223+
as its <<query-dsl-multi-term-rewrite,rewrite method>> the `fuzzy_rewrite` parameter allows to control how the query will get rewritten.
224224

225-
Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
226-
by setting `fuzzy_transpositions` to `false`.
225+
Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled by setting `fuzzy_transpositions` to `false`.
227226

228-
NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
229-
analysis process produces multiple tokens at the same position. Under the hood
230-
these terms are expanded to a special synonym query that blends term frequencies,
231-
which does not support fuzzy expansion.
227+
NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position.
228+
Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
232229

233230
[source,console]
234231
--------------------------------------------------
@@ -247,9 +244,9 @@ GET /_search
247244

248245
[[query-dsl-match-query-zero]]
249246
===== Zero terms query
250-
If the analyzer used removes all tokens in a query like a `stop` filter
251-
does, the default behavior is to match no documents at all. In order to
252-
change that the `zero_terms_query` option can be used, which accepts
247+
248+
If the analyzer used removes all tokens in a query like a `stop` filter does, the default behavior is to match no documents at all.
249+
In order to change that the `zero_terms_query` option can be used, which accepts
253250
`none` (default) and `all` which corresponds to a `match_all` query.
254251

255252
[source,console]
@@ -271,8 +268,8 @@ GET /_search
271268
[[query-dsl-match-query-synonyms]]
272269
===== Synonyms
273270

274-
The `match` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter,
275-
synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
271+
The `match` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter, synonym_graph>> token filter.
272+
When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
276273
For example, the following synonym: `"ny, new york"` would produce:
277274

278275
`(ny OR ("new york"))`

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/Match.java

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -148,6 +148,7 @@ public class Match extends FullTextFunction implements OptionalArgument, PostAna
148148
149149
Match can be used on fields from the text family like <<text, text>> and <<semantic-text, semantic_text>>,
150150
as well as other field types like keyword, boolean, dates, and numeric types.
151+
When Match is used on a <<semantic-text, semantic_text>> field, it will perform a semantic query on the field.
151152
152153
Match can use <<esql-function-named-params,function named parameters>> to specify additional options for the match query.
153154
All <<match-field-params,match query parameters>> are supported.
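
The behavior documented by this commit can be illustrated with a minimal Python sketch. It only builds a Query DSL `match` request body and reports which of the supplied options the changed docs name as ignored on `semantic_text` fields. The field name `content`, the query text, and both helper functions are hypothetical and are not part of Elasticsearch or this commit. The docs list `fuzziness` and `analyzer` with "such as", so the set below is illustrative rather than exhaustive.

```python
import json

# Options the changed docs name as lexical-only. The docs say "such as",
# so this set is an assumption for illustration, not a complete list.
LEXICAL_ONLY_OPTIONS = {"fuzziness", "analyzer"}


def build_match_query(field, text, **options):
    """Return a Query DSL `match` request body for `field` (hypothetical helper)."""
    return {"query": {"match": {field: {"query": text, **options}}}}


def ignored_on_semantic_text(options):
    """List supplied options that the docs describe as ignored on semantic_text."""
    return sorted(LEXICAL_ONLY_OPTIONS & set(options))


# `content` is assumed to be mapped as a semantic_text field.
options = {"fuzziness": "AUTO", "boost": 2.0}
body = build_match_query("content", "how do I reset my password", **options)

print(json.dumps(body, indent=2))
print(ignored_on_semantic_text(options))
```

The request body has the same shape as a `match` query against a `text` field. According to the updated docs, the difference is on the server side: a semantic search is performed and the lexical-only options have no effect.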
