
Commit 27fc493

RDoc-3468 create _overview-csharp.mdx
1 parent 01884e3 commit 27fc493

2 files changed: +145 −118 lines
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
import Admonition from '@theme/Admonition';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

<Admonition type="note" title="">

* RavenDB can serve as a vector database, see [Why choose RavenDB as your vector database](../../../ai-integration/vector-search/ravendb-as-vector-database.mdx#why-choose-ravendb-as-your-vector-database).

* Vector search can be performed on:
  * Raw text stored in your documents.
  * Pre-made embeddings that you created yourself and stored using these [Data types](../../../ai-integration/vector-search/data-types-for-vector-search.mdx#numerical-data).
  * Pre-made embeddings that are automatically generated from your document content by RavenDB's
    **embeddings generation tasks** using external service providers, as explained below.
* In this article:
  * [Embeddings generation - overview](../../../ai-integration/generating-embeddings/overview.mdx#embeddings-generation---overview)
  * [Embeddings generation - process flow](../../../ai-integration/generating-embeddings/overview.mdx#embeddings-generation---process-flow)
  * [Supported providers](../../../ai-integration/generating-embeddings/overview.mdx#supported-providers)
  * [Creating an embeddings generation task](../../../ai-integration/generating-embeddings/overview.mdx#creating-an-embeddings-generation-task)
  * [Monitoring the tasks](../../../ai-integration/generating-embeddings/overview.mdx#monitoring-the-tasks)

</Admonition>

## Embeddings generation - overview

<Admonition type="note" title="">

#### Embeddings generation - process flow

* **Define an Embeddings Generation Task**:
  Specify a [connection string](../../../ai-integration/connection-strings/connection-strings-overview.mdx) that defines the AI provider and model for generating embeddings.
  Define the source content - what parts of the documents will be used to create the embeddings.

* **Source content is processed**:
  1. The task extracts the specified content from the documents.
  2. If a processing script is defined, it transforms the content before further processing.
  3. The text is split according to the defined chunking method; a separate embedding will be created for each chunk.
  4. Before contacting the provider, RavenDB checks the [embeddings cache](../../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection)
     to determine whether an embedding already exists for the given content from that provider.
  5. If a matching embedding is found, it is reused, avoiding unnecessary requests.
     If no cached embedding is found, the transformed and chunked content is sent to the configured AI provider.

* **Embeddings are generated by the AI provider**:
  The provider generates embeddings and sends them back to RavenDB.
  If quantization was defined in the task, RavenDB applies it to the embeddings before storing them.

* **Embeddings are stored in your database**:
  * Each embedding is stored as an attachment in a [dedicated collection](../../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-collection).
  * RavenDB maintains an [embeddings cache](../../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection),
    allowing reuse of embeddings for the same source content and reducing provider calls.
    Cached embeddings expire after a configurable duration.

* **Perform vector search**:
  Once the embeddings are stored, you can perform vector searches on your document content by:
  * Running a [dynamic query](../../../ai-integration/vector-search/vector-search-using-dynamic-query.mdx#querying-pre-made-embeddings-generated-by-tasks), which automatically creates an auto-index for the search.
  * Defining a [static index](../../../ai-integration/vector-search/vector-search-using-static-index.mdx#indexing-pre-made-text-embeddings) to store and query embeddings efficiently.

  The query search term is split into chunks, and each chunk is looked up in the cache.
  If not found, RavenDB requests an embedding from the provider and caches it.
  The embedding (cached or newly created) is then used to compare against stored vectors.

* **Continuous processing**:
  * Embeddings generation tasks are [Ongoing Tasks](../../../studio/database/tasks/ongoing-tasks/general-info.mdx) that process documents as they change.
    Before contacting the provider after a document change, the task first checks the cache to see if a matching embedding already exists, avoiding unnecessary requests.
  * The requests to generate embeddings from the source text are sent to the provider in batches.
    The batch size is configurable; see the [Ai.Embeddings.MaxBatchSize](../../../server/configuration/ai-integration-configuration.mdx#aiembeddingsmaxbatchsize) configuration key.
  * A failed embeddings generation task will retry after the duration set in the
    [Ai.Embeddings.MaxFallbackTimeInSec](../../../server/configuration/ai-integration-configuration.mdx#aiembeddingsmaxfallbacktimeinsec) configuration key.

</Admonition>
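
The *Perform vector search* step above can be issued from the Client API with a dynamic query.
The snippet below is a minimal sketch using a raw RQL query; `Products` and `Name` are placeholder names,
`store` is assumed to be an initialized `DocumentStore`, and the exact `vector.search()` / `embedding.text()`
syntax for task-generated embeddings is documented in the dynamic-query article linked above.

```csharp
using (var session = store.OpenSession())
{
    // Minimal sketch: 'Products' and 'Name' are placeholder names.
    // The RQL below follows the general vector.search() shape described in
    // the vector-search articles linked above - refer to them for the
    // authoritative syntax.
    var similarProducts = session.Advanced
        .RawQuery<Product>(@"from 'Products'
                             where vector.search(embedding.text(Name), $searchTerm)")
        .AddParameter("searchTerm", "italian food")
        .ToList();
}
```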

<Admonition type="note" title="">

#### Supported providers

* The following service providers are supported for auto-generating embeddings using tasks:

  * [OpenAI & OpenAI-compatible providers](../../../ai-integration/connection-strings/open-ai.mdx)
  * [Azure Open AI](../../../ai-integration/connection-strings/azure-open-ai.mdx)
  * [Google AI](../../../ai-integration/connection-strings/google-ai.mdx)
  * [Hugging Face](../../../ai-integration/connection-strings/hugging-face.mdx)
  * [Ollama](../../../ai-integration/connection-strings/ollama.mdx)
  * [Mistral AI](../../../ai-integration/connection-strings/mistral-ai.mdx)
  * [bge-micro-v2](../../../ai-integration/connection-strings/embedded.mdx) (a local embedded model within RavenDB)

</Admonition>
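
When working from the Client API, a connection string for one of the providers above is defined first.
The sketch below assumes OpenAI and uses `AiConnectionString` / `OpenAiSettings` as illustrative type names -
verify the exact classes and properties against the provider-specific connection-string articles linked above.

```csharp
// Sketch only - the AiConnectionString / OpenAiSettings names and the
// constructor shape are assumptions; see the provider-specific
// connection-string articles for the exact API.
var connectionString = new AiConnectionString
{
    Name = "open-ai-cs",                                // placeholder name
    OpenAiSettings = new OpenAiSettings(
        apiKey: "your-api-key",
        endpoint: "https://api.openai.com/v1",
        model: "text-embedding-3-small")
};

// PutConnectionStringOperation is the standard way to store a
// connection string from the Client API.
store.Maintenance.Send(
    new PutConnectionStringOperation<AiConnectionString>(connectionString));
```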

![Embeddings generation task - flow chart](../assets/embeddings-generation-task-flow.png)

![Vector search - flow chart](../assets/vector-search-flow.png)

## Creating an embeddings generation task

* An embeddings generation task can be created from:
  * The **AI Tasks view in the Studio**, where you can create, edit, and delete tasks. Learn more in [AI Tasks - list view](../../../ai-integration/ai-tasks-list-view.mdx).
  * The **Client API** - see [Configuring an embeddings generation task - from the Client API](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task---from-the-client-api), and the brief sketch at the end of this section.
* From the Studio:

![Add ai task 1](../assets/add-ai-task-1.png)

1. Go to the **AI Hub** menu.
2. Open the **AI Tasks** view.
3. Click **Add AI Task** to add a new task.

![Add ai task 2](../assets/add-ai-task-2.png)

* See the complete details of the task configuration in the [Embeddings generation task](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx) article.
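
As a rough illustration of the Client API route, the sketch below defines and adds a task.
The `EmbeddingsGenerationConfiguration` and `AddEmbeddingsGenerationOperation` names and the properties shown
are assumptions for illustration only - the Client API section of the task article linked above is the definitive reference.

```csharp
// Sketch only - the configuration class, operation name, and properties
// below are illustrative assumptions; see the Embeddings generation task
// article linked above for the definitive Client API example.
var config = new EmbeddingsGenerationConfiguration
{
    Name = "embeddings-for-products",      // placeholder task name
    ConnectionStringName = "open-ai-cs",   // the AI connection string defined earlier
    Collection = "Products"                // placeholder source collection
    // Source fields, chunking, quantization, and cache expiration
    // are configured here as described in the task article.
};

// Register the ongoing task on the database.
store.Maintenance.Send(new AddEmbeddingsGenerationOperation(config));
```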

## Monitoring the tasks

* The status and state of each embeddings generation task are visible in the [AI Tasks - list view](../../../ai-integration/ai-tasks-list-view.mdx).

* Task performance and activity over time can be analyzed in the _AI Tasks Stats_ view,
  where you can track processing duration, batch sizes, and overall progress.
  Learn more about the functionality of the stats view in the [Ongoing Tasks Stats](../../../studio/database/stats/ongoing-tasks-stats/overview.mdx) article.

* The number of embeddings generation tasks across all databases can also be monitored using [SNMP](../../../server/administration/snmp/snmp-overview.mdx).
  The following SNMP OIDs provide relevant metrics:
  * [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
  * [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.
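
The state of a specific task can also be checked programmatically, complementing the Studio views and SNMP.
The sketch below uses the generic `GetOngoingTaskInfoOperation`; the `OngoingTaskType` value for embeddings
generation tasks and the task name are assumptions here - check your client version for the exact member.

```csharp
// Sketch - GetOngoingTaskInfoOperation is the generic way to read an
// ongoing task's state; the OngoingTaskType value below is an assumed
// name for embeddings generation tasks.
var taskInfo = store.Maintenance.Send(
    new GetOngoingTaskInfoOperation(
        "embeddings-for-products",                  // placeholder task name
        OngoingTaskType.EmbeddingsGeneration));     // assumed enum member

Console.WriteLine($"{taskInfo.TaskName}: {taskInfo.TaskState}");
```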
Lines changed: 25 additions & 118 deletions
@@ -1,133 +1,40 @@
 ---
 title: "Generating Embeddings - Overview"
 hide_table_of_contents: true
-sidebar_label: Overview
+sidebar_label: "Overview"
 sidebar_position: 0
 ---

-import Admonition from '@theme/Admonition';
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-import CodeBlock from '@theme/CodeBlock';
 import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
 import LanguageContent from "@site/src/components/LanguageContent";

-# Generating Embeddings - Overview
-<Admonition type="note" title="">
+import OverviewCsharp from './content/_overview-csharp.mdx';

-* RavenDB can serve as a vector database, see [Why choose RavenDB as your vector database](../../ai-integration/vector-search/ravendb-as-vector-database.mdx#why-choose-ravendb-as-your-vector-database).
+export const supportedLanguages = ["csharp"];

-* Vector search can be performed on:
-* Raw text stored in your documents.
-* Pre-made embeddings that you created yourself and stored using these [Data types](../../ai-integration/vector-search/data-types-for-vector-search.mdx#numerical-data).
-* Pre-made embeddings that are automatically generated from your document content by RavenDB's tasks
-using external service providers, as explained below.
-* In this article:
-* [Embeddings generation - overview](../../ai-integration/generating-embeddings/overview.mdx#embeddings-generation---overview)
-* [Embeddings generation - process flow](../../ai-integration/generating-embeddings/overview.mdx#embeddings-generation---process-flow)
-* [Supported providers](../../ai-integration/generating-embeddings/overview.mdx#supported-providers)
-* [Creating an embeddings generation task](../../ai-integration/generating-embeddings/overview.mdx#creating-an-embeddings-generation-task)
-* [Monitoring the tasks](../../ai-integration/generating-embeddings/overview.mdx#monitoring-the-tasks)
+<LanguageSwitcher supportedLanguages={supportedLanguages} />

-</Admonition>
-## Embeddings generation - overview
-
-<Admonition type="note" title="">
-
-#### Embeddings generation - process flow
-* **Define an Embeddings Generation Task**:
-Specify a [connection string](../../ai-integration/connection-strings/connection-strings-overview.mdx) that defines the AI provider and model for generating embeddings.
-Define the source content - what parts of the documents will be used to create the embeddings.
-
-* **Source content is processed**:
-1. The task extracts the specified content from the documents.
-2. If a processing script is defined, it transforms the content before further processing.
-3. The text is split according to the defined chunking method; a separate embedding will be created for each chunk.
-4. Before contacting the provider, RavenDB checks the [embeddings cache](../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection)
-to determine whether an embedding already exists for the given content from that provider.
-5. If a matching embedding is found, it is reused, avoiding unnecessary requests.
-If no cached embedding is found, the transformed and chunked content is sent to the configured AI provider.
-
-* **Embeddings are generated by the AI provider**:
-The provider generates embeddings and sends them back to RavenDB.
-If quantization was defined in the task, RavenDB applies it to the embeddings before storing them.
-
-* **Embeddings are stored in your database**:
-* Each embedding is stored as an attachment in a [dedicated collection](../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-collection).
-* RavenDB maintains an [embeddings cache](../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection),
-allowing reuse of embeddings for the same source content and reducing provider calls.
-Cached embeddings expire after a configurable duration.
-
-* **Perform vector search:**
-Once the embeddings are stored, you can perform vector searches on your document content by:
-* Running a [dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query.mdx#querying-pre-made-embeddings-generated-by-tasks), which automatically creates an auto-index for the search.
-* Defining a [static index](../../ai-integration/vector-search/vector-search-using-static-index.mdx#indexing-pre-made-text-embeddings) to store and query embeddings efficiently.
-
-The query search term is split into chunks, and each chunk is looked up in the cache.
-If not found, RavenDB requests an embedding from the provider and caches it.
-The embedding (cached or newly created) is then used to compare against stored vectors.
-
-* **Continuous processing**:
-* Embeddings generation tasks are [Ongoing Tasks](../../studio/database/tasks/ongoing-tasks/general-info.mdx) that process documents as they change.
-Before contacting the provider after a document change, the task first checks the cache to see if a matching embedding already exists, avoiding unnecessary requests.
-* The requests to generate embeddings from the source text are sent to the provider in batches.
-The batch size is configurable, see the [Ai.Embeddings.MaxBatchSize](../../server/configuration/ai-integration-configuration.mdx#aiembeddingsmaxbatchsize) configuration key.
-* A failed embeddings generation task will retry after the duration set in the
-[Ai.Embeddings.MaxFallbackTimeInSec](../../server/configuration/ai-integration-configuration.mdx#aiembeddingsmaxfallbacktimeinsec) configuration key.
-
-</Admonition>
-<Admonition type="note" title="">
-
-#### Supported providers
-* The following service providers are supported for auto-generating embeddings using tasks:
-
-* [OpenAI & OpenAI-compatible providers](../../ai-integration/connection-strings/open-ai.mdx)
-* [Azure Open AI](../../ai-integration/connection-strings/azure-open-ai.mdx)
-* [Google AI](../../ai-integration/connection-strings/google-ai.mdx)
-* [Hugging Face](../../ai-integration/connection-strings/hugging-face.mdx)
-* [Ollama](../../ai-integration/connection-strings/ollama.mdx)
-* [Mistral AI](../../ai-integration/connection-strings/mistral-ai.mdx)
-* [bge-micro-v2](../../ai-integration/connection-strings/embedded.mdx) (a local embedded model within RavenDB)
-
-</Admonition>
-
-![flow chart](./assets/embeddings-generation-task-flow.png)
-
-![flow chart](./assets/vector-search-flow.png)
-
-
-
-## Creating an embeddings generation task
-
-* An embeddings generation tasks can be created from:
-* The **AI Tasks view in the Studio**, where you can create, edit, and delete tasks. Learn more in [AI Tasks - list view](../../ai-integration/ai-tasks-list-view.mdx).
-* The **Client API** - see [Configuring an embeddings generation task - from the Client API](../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task---from-the-client-api)
-* From the Studio:
-
-![Add ai task 1](./assets/add-ai-task-1.png)
-
-1. Go to the **AI Hub** menu.
-2. Open the **AI Tasks** view.
-3. Click **Add AI Task** to add a new task.
-
-![Add ai task 2](./assets/add-ai-task-2.png)
-
-* See the complete details of the task configuration in the [Embeddings generation task](../../ai-integration/generating-embeddings/embeddings-generation-task.mdx) article.
-
-
-
-## Monitoring the tasks
-
-* The status and state of each embeddings generation task are visible in the [AI Tasks - list view](../../ai-integration/ai-tasks-list-view.mdx).
-
-* Task performance and activity over time can be analyzed in the _AI Tasks Stats_ view,
-where you can track processing duration, batch sizes, and overall progress.
-Learn more about the functionality of the stats view in the [Ongoing Tasks Stats](../../studio/database/stats/ongoing-tasks-stats/overview.mdx) article.
-
-* The number of embeddings generation tasks across all databases can also be monitored using [SNMP](../../server/administration/snmp/snmp-overview.mdx).
-The following SNMP OIDs provide relevant metrics:
-* [5.1.11.25](../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
-* [5.1.11.26](../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.
+<LanguageContent language="csharp">
+<OverviewCsharp />
+</LanguageContent>

+<!---
+### Vector Search
+- [RavenDB as a vector database](../../ai-integration/vector-search/ravendb-as-vector-database)
+- [Vector search using a static index](../../ai-integration/vector-search/vector-search-using-static-index)
+- [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query)

+### Embeddings Generation
+- [The Embedding Collections](../../ai-integration/generating-embeddings/embedding-collections)
+- [The Embedding generation task](../../ai-integration/generating-embeddings/embeddings-generation-task)

+### AI Connection Strings
+- [Connection strings - overview](../../ai-integration/connection-strings/connection-strings-overview)
+- [Azure Open AI](../../ai-integration/connection-strings/azure-open-ai)
+- [Google AI](../../ai-integration/connection-strings/google-ai)
+- [Hugging Face](../../ai-integration/connection-strings/hugging-face)
+- [Ollama](../../ai-integration/connection-strings/ollama)
+- [OpenAI](../../ai-integration/connection-strings/open-ai)
+- [Mistral AI](../../ai-integration/connection-strings/mistral-ai)
+- [Embedded model](../../ai-integration/connection-strings/embedded)
+-->
