Commit 911b868

Adds unique inference chunking settings for elasticsearch
1 parent e36beb6 commit 911b868

1 file changed (+61 / -0 lines)

specification/inference/_types/Services.ts

Lines changed: 61 additions & 0 deletions
@@ -322,6 +322,67 @@ export class InferenceEndpointInfoWatsonx extends InferenceEndpoint {
   task_type: TaskTypeWatsonx
 }
 
+/**
+ * Chunking configuration object
+ */
+export class ElasticsearchInferenceChunkingSettings {
+  /**
+   * The maximum size of a chunk in words.
+   * This value cannot be lower than `20` (for `sentence` strategy) or `10` (for `word` strategy).
+   * This value should not exceed the window size for the associated model.
+   * @server_default 250
+   */
+  max_chunk_size?: integer
+  /**
+   * The number of overlapping words for chunks.
+   * It applies only to the `word` chunking strategy.
+   * This value cannot be higher than half the `max_chunk_size` value.
+   * @server_default 100
+   */
+  overlap?: integer
+  /**
+   * The number of overlapping sentences for chunks.
+   * It applies only to the `sentence` chunking strategy.
+   * It can be either `1` or `0`.
+   * @server_default 1
+   */
+  sentence_overlap?: integer
+  /**
+   * Only applicable to the `recursive` strategy, which requires either this or `separators`.
+   *
+   * Sets a predefined list of separators in the saved chunking settings based on the selected text type.
+   * Values can be `markdown` or `plaintext`.
+   *
+   * Using this parameter is an alternative to manually specifying a custom `separators` list.
+   */
+  separator_group?: string
+  /**
+   * Only applicable to the `recursive` strategy, which requires either this or `separator_group`.
+   *
+   * A list of strings used as possible split points when chunking text.
+   *
+   * Each string can be a plain string or a regular expression (regex) pattern.
+   * The system tries each separator in order to split the text, starting from the first item in the list.
+   *
+   * After splitting, it attempts to recombine smaller pieces into larger chunks that stay within
+   * the `max_chunk_size` limit, to reduce the total number of chunks generated.
+   */
+  separators?: string[]
+  /**
+   * The chunking strategy: `sentence`, `word`, `none` or `recursive`.
+   *
+   * If `strategy` is set to `recursive`, you must also specify:
+   *
+   * - `max_chunk_size`
+   * - either `separators` or `separator_group`
+   *
+   * Learn more about different chunking strategies in the linked documentation.
+   * @server_default sentence
+   * @ext_doc_id chunking-strategies
+   */
+  strategy?: string
+}
+
 /**
  * Chunking configuration object
  */
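The constraints spread across the JSDoc comments of `ElasticsearchInferenceChunkingSettings` can be gathered into one client-side check. The TypeScript below is a minimal sketch, not part of the specification or of any Elasticsearch client: the `ChunkingSettings` interface and `validateChunkingSettings` function are hypothetical names, `integer` is modeled as `number`, and reading "either `separators` or `separator_group`" as "exactly one of the two" is an assumption.

```typescript
// Hypothetical local mirror of the fields added in the diff (integer modeled as number).
interface ChunkingSettings {
  max_chunk_size?: number
  overlap?: number
  sentence_overlap?: number
  separator_group?: string
  separators?: string[]
  strategy?: string
}

const STRATEGIES = ['sentence', 'word', 'none', 'recursive']
const SEPARATOR_GROUPS = ['markdown', 'plaintext']

// Returns human-readable problems; an empty array means no documented rule was violated.
// strategy and max_chunk_size fall back to their documented server defaults
// ('sentence' and 250); every other field is checked only when explicitly set.
function validateChunkingSettings(s: ChunkingSettings): string[] {
  const errors: string[] = []
  const strategy = s.strategy ?? 'sentence'
  const maxChunkSize = s.max_chunk_size ?? 250

  if (STRATEGIES.indexOf(strategy) === -1) {
    errors.push(`unknown strategy "${strategy}"`)
  }
  // Strategy-dependent lower bound on max_chunk_size.
  if (strategy === 'sentence' && maxChunkSize < 20) {
    errors.push('max_chunk_size cannot be lower than 20 for the sentence strategy')
  }
  if (strategy === 'word' && maxChunkSize < 10) {
    errors.push('max_chunk_size cannot be lower than 10 for the word strategy')
  }
  // overlap: word strategy only, at most half of max_chunk_size.
  if (s.overlap !== undefined) {
    if (strategy !== 'word') errors.push('overlap applies only to the word strategy')
    else if (s.overlap > maxChunkSize / 2) errors.push('overlap cannot exceed half of max_chunk_size')
  }
  // sentence_overlap: sentence strategy only, either 0 or 1.
  if (s.sentence_overlap !== undefined) {
    if (strategy !== 'sentence') errors.push('sentence_overlap applies only to the sentence strategy')
    else if (s.sentence_overlap !== 0 && s.sentence_overlap !== 1) errors.push('sentence_overlap must be 0 or 1')
  }
  const hasGroup = s.separator_group !== undefined
  const hasList = s.separators !== undefined
  if (strategy === 'recursive') {
    if (s.max_chunk_size === undefined) errors.push('recursive strategy requires max_chunk_size')
    // Assumption: "either separators or separator_group" means exactly one of the two.
    if (hasGroup === hasList) errors.push('recursive strategy requires exactly one of separators or separator_group')
    if (s.separator_group !== undefined && SEPARATOR_GROUPS.indexOf(s.separator_group) === -1) {
      errors.push('separator_group must be "markdown" or "plaintext"')
    }
  } else if (hasGroup || hasList) {
    errors.push('separators and separator_group apply only to the recursive strategy')
  }
  return errors
}
```

With no fields set, the documented defaults (`sentence`, 250) pass, while `{ strategy: 'recursive', separator_group: 'markdown' }` is flagged because `max_chunk_size` is missing.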

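The `separators` description (try each separator in order, then recombine smaller pieces into chunks that stay within `max_chunk_size`) can be illustrated with a short split-and-merge sketch. This is an approximation under stated assumptions, not Elasticsearch's implementation: `recursiveChunk` and `wordCount` are hypothetical names, words are counted as whitespace-separated tokens, each separator is compiled with `new RegExp` (patterns containing capture groups would leak captured text into the output), the separator text is dropped and merged pieces are rejoined with a newline, and the hard word split used when no separators remain is a fallback invented for this sketch.

```typescript
// Approximate word count: whitespace-separated tokens (the server may count differently).
function wordCount(text: string): number {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

// Hypothetical recursive chunker: split on the first separator, recurse into oversized
// pieces with the remaining separators, then greedily merge adjacent small pieces.
function recursiveChunk(text: string, separators: string[], maxChunkSize: number): string[] {
  if (wordCount(text) <= maxChunkSize) return [text]

  if (separators.length === 0) {
    // No separators left: hard-split every maxChunkSize words (fallback assumed for this sketch).
    const words = text.trim().split(/\s+/)
    const out: string[] = []
    for (let i = 0; i < words.length; i += maxChunkSize) {
      out.push(words.slice(i, i + maxChunkSize).join(' '))
    }
    return out
  }

  const sep = separators[0]
  const rest = separators.slice(1)
  // Literal strings without regex metacharacters match themselves.
  const pieces = text.split(new RegExp(sep)).filter((p) => p.trim() !== '')
  if (pieces.length <= 1) {
    // This separator does not split the text: try the next one.
    return recursiveChunk(text, rest, maxChunkSize)
  }

  // Pieces that are still too large are split further with the remaining separators.
  const small: string[] = []
  for (const piece of pieces) {
    small.push(...recursiveChunk(piece, rest, maxChunkSize))
  }

  // Greedily recombine adjacent pieces while the merged chunk stays within maxChunkSize.
  const merged: string[] = []
  let current = ''
  for (const piece of small) {
    const candidate = current === '' ? piece : current + '\n' + piece
    if (wordCount(candidate) <= maxChunkSize) {
      current = candidate
    } else {
      if (current !== '') merged.push(current)
      current = piece
    }
  }
  if (current !== '') merged.push(current)
  return merged
}
```

For example, `recursiveChunk('a b c\n\nd e f\n\ng h', ['\n\n', '\n'], 6)` first splits on blank lines into three pieces, then merges the first two (six words) into one chunk and leaves `g h` on its own, which is the "fewer, larger chunks" behavior the comment describes.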