
[Inference API] Add VoyageAI inference service integration#142562

Open
fzowl wants to merge 15 commits into elastic:main from voyage-ai:voyageai-embedding-task

Conversation

@fzowl
Contributor

@fzowl fzowl commented Feb 16, 2026

Summary
This PR contains the cleaned-up VoyageAI integration with the following improvements:

  • Text embeddings functionality verified
  • Multimodal embeddings support added
  • Rerank functionality included

Testing:
All tests compile successfully

 - text embedding models
 - multimodal models
@elasticsearchmachine
Collaborator

@fzowl please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

@elasticsearchmachine elasticsearchmachine added the needs:triage, v9.4.0 and external-contributor labels Feb 16, 2026
@fzowl fzowl changed the title [ML] VoyageAI Integration [:ML] VoyageAI Integration Feb 16, 2026
 - text embedding models
 - multimodal models
@fzowl fzowl changed the title [:ML] VoyageAI Integration [:Team:ML, >type:enhancement] VoyageAI Integration Feb 16, 2026
@fzowl fzowl changed the title [:Team:ML, >type:enhancement] VoyageAI Integration [:Team:ML] [>type:enhancement] VoyageAI Integration Feb 16, 2026
@fzowl fzowl changed the title [:Team:ML] [>type:enhancement] VoyageAI Integration [:ML, >enhancement] VoyageAI Integration Feb 16, 2026
@fzowl fzowl changed the title [:ML, >enhancement] VoyageAI Integration [Inference API] Add VoyageAI inference service integration Feb 16, 2026
@john-wagster john-wagster added the :SearchOrg/Inference label Feb 18, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine elasticsearchmachine removed the needs:triage label Feb 18, 2026
@DonalEvans DonalEvans self-assigned this Feb 19, 2026
@DonalEvans DonalEvans added the >enhancement and Feature:GenAI labels Feb 19, 2026
fzowl added 2 commits March 5, 2026 15:01
…task

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIService.java
Contributor

@DonalEvans DonalEvans left a comment


In order for the Voyage integration to be used with the embedding task type, a change is needed in SimpleEmbeddingServiceIntegrationValidator.BASE64_IMAGE_DATA, specifically changing the start of the string from data:image/jpg;base64 to data:image/jpeg;base64. This was an oversight on my part when adding that string, since image/jpg is not actually a valid MIME type. For the Jina integration, this wasn't a problem because they discard the MIME type from the input, but Voyage requires the MIME type to be valid, so attempting to create a Voyage endpoint with the embedding task type currently fails.
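To illustrate why the prefix matters, here is a minimal sketch of the kind of MIME-type check a strict provider might apply to a base64 data URI. The class name, method names, and the allow-list are assumptions made for this example only; this is not the validator's code and not Voyage's actual validation logic.

```java
import java.util.Set;

/** Sketch: extract and check the MIME type of a base64 data URI. */
final class DataUriMimeCheck {
    // Illustrative allow-list (assumption); a real provider's accepted set may differ.
    private static final Set<String> KNOWN_IMAGE_TYPES =
        Set.of("image/jpeg", "image/png", "image/gif", "image/webp");

    /** Returns the MIME type of "data:<mime>;base64,<payload>", or throws if malformed. */
    static String mimeTypeOf(String dataUri) {
        String prefix = "data:";
        int end = dataUri.indexOf(";base64,");
        if (!dataUri.startsWith(prefix) || end < 0) {
            throw new IllegalArgumentException("not a base64 data URI");
        }
        return dataUri.substring(prefix.length(), end);
    }

    /** True only when the declared MIME type is in the allow-list. */
    static boolean isKnownImageType(String dataUri) {
        return KNOWN_IMAGE_TYPES.contains(mimeTypeOf(dataUri));
    }

    public static void main(String[] args) {
        System.out.println(isKnownImageType("data:image/jpg;base64,AAAA"));  // false
        System.out.println(isKnownImageType("data:image/jpeg;base64,AAAA")); // true
    }
}
```

A service that discards the declared type never reaches a check like this, which matches the behaviour described for Jina above.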

Please also add unit test coverage for any new behaviour. Some examples off the top of my head are:

  • Add a test class for VoyageAIEmbeddingServiceSettings which extends AbstractBWCWireSerializationTestCase and which covers the fromMap(), toXContent() and updateEmbeddingDetails() methods
  • Change the existing VoyageAIEmbeddingsServiceSettingsTests class to extend AbstractBWCWireSerializationTestCase instead of AbstractWireSerializingTestCase and add a test for updateEmbeddingDetails()
  • Test that VoyageAIModel.uri() returns the correct values for multimodal vs. text-only model/service settings
  • Test that VoyageAIEmbeddingsRequestEntity.toXContent() creates the appropriate request body for multimodal inputs
  • Add tests to VoyageAIEmbeddingsResponseEntityTests for when the element type of the request/model is not FLOAT
  • Add tests to VoyageAIEmbeddingsResponseEntityTests for when the request/model is multimodal (the type returned from parsedResults.embeddings() should be GenericDenseEmbedding*Results rather than DenseEmbedding*Results)
  • Add tests to VoyageAIServiceTests that call VoyageAIService.doEmbeddingInfer() and cover the three main paths, i.e. the model is not a VoyageAIEmbeddingsModel, the inputs contain non-text entries and the model is not multimodal, and the happy path

Incidentally, if you want to manually test your changes against a real Voyage model, you can do the following, which is what led me to find the bug with the invalid MIME type:
1. Run your local build of Elasticsearch using ./gradlew :run -Drun.license_type=trial
2. Create a VoyageAI endpoint with the embedding task type:

Create endpoint

PUT http://elastic:password@localhost:9200/_inference/embedding/voyageai-embeddings
Content-Type: application/json
Accept: application/json

{
    "service": "voyageai",
    "service_settings": {
        "model_id": "voyage-multimodal-3.5",
        "api_key": "{{voyage_api_key}}"
    }
}
3. Perform multimodal embedding:
Perform embedding
POST http://elastic:password@localhost:9200/_inference/embedding/voyageai-embeddings
Content-Type: application/json
Accept: application/json

{
    "input": {
        "content": {
            "type": "image",
            "format": "base64",
            "value": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABwAAAA4CAIAAABhUg/jAAAAMklEQVR4nO3MQREAMAgAoLkoFreTiSzhy4MARGe9bX99lEqlUqlUKpVKpVKpVCqVHksHaBwCA2cPf0cAAAAASUVORK5CYII="
        }
    }
}

72,
"voyage-02",
72
private static final Map<String, Integer> MODEL_BATCH_SIZES = Map.ofEntries(
Contributor


Where are these values coming from? In the documentation for the embeddings API and the multimodal embeddings API, the maximum number of inputs is given as 1000. 7 seems extremely small.

Contributor Author


@DonalEvans These values come from a deliberately conservative calculation. There is a token limit for the whole request (batch) and another per document, so to stay on the safe side the number of documents per batch is capped at total_number_of_tokens_per_batch / maximum_number_of_tokens_per_document.
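As a rough sketch of that calculation (the class and method names are made up for illustration, and the numbers in `main` are placeholders rather than Voyage's real limits):

```java
/** Sketch: worst-case batch size when every document may hit the per-document token limit. */
final class ConservativeBatchSize {
    /**
     * Integer division rounds down, so even a batch made up entirely of
     * maximum-length documents stays within the per-batch token limit.
     */
    static int maxDocumentsPerBatch(int maxTokensPerBatch, int maxTokensPerDocument) {
        if (maxTokensPerBatch <= 0 || maxTokensPerDocument <= 0) {
            throw new IllegalArgumentException("token limits must be positive");
        }
        return maxTokensPerBatch / maxTokensPerDocument;
    }

    public static void main(String[] args) {
        // Placeholder limits, not taken from the Voyage documentation.
        System.out.println(maxDocumentsPerBatch(100_000, 16_000)); // 6
    }
}
```

The trade-off is that typical documents are far shorter than the per-document limit, so this bound usually leaves most of the batch token budget unused.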

The ideal solution would be token-aware batching, which is technically possible but requires the tokeniser to be downloaded from HuggingFace. If you are fine with this, I'd be more than happy to build it; it would be a more elegant solution.
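For reference, token-aware batching could look roughly like the greedy sketch below. The token counter is passed in as a function because the real tokeniser is not available here; everything in this block (class name, parameters, the whitespace-based counter in `main`) is hypothetical and only meant to show the shape of the approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Sketch: greedily pack documents into batches under a token budget and a document cap. */
final class TokenAwareBatcher {
    static List<List<String>> batch(
        List<String> docs,
        ToIntFunction<String> countTokens, // stand-in for a real tokeniser (assumption)
        int maxTokensPerBatch,
        int maxDocsPerBatch
    ) {
        if (maxTokensPerBatch < 1 || maxDocsPerBatch < 1) {
            throw new IllegalArgumentException("limits must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentTokens = 0;
        for (String doc : docs) {
            int tokens = countTokens.applyAsInt(doc);
            if (tokens > maxTokensPerBatch) {
                throw new IllegalArgumentException("document exceeds the batch token limit");
            }
            // Start a new batch when adding this document would break either limit.
            boolean full = current.size() >= maxDocsPerBatch || currentTokens + tokens > maxTokensPerBatch;
            if (full && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentTokens = 0;
            }
            current.add(doc);
            currentTokens += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Whitespace word count as a crude placeholder for real token counting.
        ToIntFunction<String> words = s -> s.split(" ").length;
        System.out.println(batch(List.of("a b c", "d e", "f g h i", "j"), words, 5, 10));
    }
}
```

With a real tokeniser plugged in, batches of short documents could grow well beyond the fixed conservative size while still respecting the per-request token limit.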

fzowl and others added 4 commits March 12, 2026 19:44
 - text embedding models
 - multimodal models
 - text embedding models
 - multimodal models
 - text embedding models
 - multimodal models
@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 74aafbf2-816d-4669-b41f-5aeecf2eeccb

📥 Commits

Reviewing files that changed from the base of the PR and between 18ea08d and 99c3611.

⛔ Files ignored due to path filters (2)
  • server/src/main/resources/transport/definitions/referable/voyage_ai_multimodal_embeddings_added.csv is excluded by !**/*.csv
  • server/src/main/resources/transport/upper_bounds/9.4.csv is excluded by !**/*.csv
📒 Files selected for processing (12)
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/validation/SimpleEmbeddingServiceIntegrationValidator.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIService.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModel.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettings.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIServiceTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingServiceSettingsTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModelTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettingsTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestEntityTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntityTests.java

📝 Walkthrough

Walkthrough

Adds a changelog entry and integrates VoyageAI as an inference service supporting EMBEDDING in addition to TEXT_EMBEDDING and RERANK. Registers new VoyageAI embedding service settings as a NamedWriteable and introduces BaseVoyageAIEmbeddingsServiceSettings and VoyageAIEmbeddingServiceSettings with embedding-type, dimensions, similarity, and multimodal flags. Extends VoyageAI service, models, request/response entities, and action creation to handle structured InferenceStringGroup inputs, multimodal endpoints, and generic embedding result types. Updates model mappings, configuration metadata, and extensive tests for embedding and multimodal behavior.


Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java (1)

182-187: ⚠️ Potential issue | 🟠 Major

Missing TaskType check for INT8 embeddings.

The FLOAT and BIT/BINARY branches return generic result types when taskType == EMBEDDING, but the INT8 branch (lines 182-187) unconditionally returns DenseEmbeddingByteResults. This inconsistency may cause issues when TaskType.EMBEDDING is used with INT8 embedding type.

Proposed fix
             } else if (embeddingType == VoyageAIEmbeddingType.INT8) {
                 var embeddingResult = EmbeddingInt8Result.PARSER.apply(jsonParser, null);
                 List<DenseEmbeddingByteResults.Embedding> embeddingList = embeddingResult.entries.stream()
                     .map(EmbeddingInt8ResultEntry::toInferenceByteEmbedding)
                     .toList();
+
+                if (taskType == TaskType.EMBEDDING) {
+                    return new GenericDenseEmbeddingByteResults(embeddingList);
+                }
                 return new DenseEmbeddingByteResults(embeddingList);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java`
around lines 182 - 187, VoyageAIEmbeddingsResponseEntity currently always
returns DenseEmbeddingByteResults for VoyageAIEmbeddingType.INT8; update the
INT8 branch to mirror the FLOAT and BIT/BINARY branches by checking the
request's taskType (TaskType.EMBEDDING) and returning the appropriate generic
result when taskType == EMBEDDING (e.g., return the same generic embedding
response used for FLOAT/BIT branches) otherwise return DenseEmbeddingByteResults
for non-EMBEDDING tasks; locate the INT8 handling in method parsing
embeddingType (symbol: embeddingType, VoyageAIEmbeddingType.INT8) and use the
taskType variable / enum (TaskType.EMBEDDING) to conditionally construct the
correct response class instead of unconditionally returning
DenseEmbeddingByteResults.
♻️ Duplicate comments (1)
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java (1)

41-42: ⚠️ Potential issue | 🔴 Critical

Add a new transport version for the multimodalModel wire field.

multimodalModel changes the serialized form, but both read and write paths gate it on voyage_ai_integration_added, which predates this field. In a mixed-version cluster, new code will try to read/write the extra boolean against peers that still use the older layout, which can shift the stream and break deserialization. This field needs its own fresh TransportVersion definition and should be gated only behind that new version.

As per coding guidelines, "For changes to a Writeable implementation (writeTo and constructor from StreamInput), add a new public static final <UNIQUE_DESCRIPTIVE_NAME> = TransportVersion.fromName("<unique_descriptive_name>") and use it in the new code paths. Confirm the backport branches and then generate a new version file with ./gradlew generateTransportVersion".

Also applies to: 148-160, 261-271
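The failure mode can be illustrated with a simplified simulation that uses plain java.io streams in place of Elasticsearch's StreamInput/StreamOutput and integer ids in place of TransportVersion. All names and version numbers below are hypothetical; the point is only that a field added later must be gated on a version that the older layout does not satisfy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch: gate a newly added wire field behind its own version id. */
final class VersionGatedFieldSketch {
    // Hypothetical version ids standing in for named transport versions.
    static final int INTEGRATION_ADDED = 1;
    static final int MULTIMODAL_ADDED = 2;

    /** Writes the old layout, plus the new boolean only for peers that know about it. */
    static byte[] write(int peerVersion, int dimensions, boolean multimodal) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(dimensions);
            if (peerVersion >= MULTIMODAL_ADDED) {
                out.writeBoolean(multimodal);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the new boolean only when the sender could have written it. */
    static boolean readMultimodal(int peerVersion, byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            in.readInt(); // dimensions, present in both layouts
            return peerVersion >= MULTIMODAL_ADDED ? in.readBoolean() : false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(INTEGRATION_ADDED, 1024, true).length); // 4
        System.out.println(write(MULTIMODAL_ADDED, 1024, true).length);  // 5
    }
}
```

If both paths were gated on the older id instead, a new node would emit the 5-byte form to a peer that still expects 4 bytes, which is the stream shift described above.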

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java`
around lines 41 - 42, Add a new TransportVersion constant for the
multimodalModel wire change (e.g., public static final TransportVersion
VOYAGE_AI_MULTIMODAL_MODEL_ADDED =
TransportVersion.fromName("voyage_ai_multimodal_model_added")) in
BaseVoyageAIEmbeddingsServiceSettings, and update the writeTo and StreamInput
constructor code paths that currently gate multimodalModel on
VOYAGE_AI_INTEGRATION_ADDED to instead gate only on the new
VOYAGE_AI_MULTIMODAL_MODEL_ADDED; ensure all places handling multimodalModel
serialization/deserialization (the read/write branches in this class) use the
new constant, then run ./gradlew generateTransportVersion and add the generated
version to the transport versions for backport branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 69c87ebe-a9f5-4140-8e5d-a53c13fb1957

📥 Commits

Reviewing files that changed from the base of the PR and between 918ed0b and 18ea08d.

📒 Files selected for processing (21)
  • docs/changelog/142562.yaml
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIModel.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIService.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/action/VoyageAIActionCreator.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingServiceSettings.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModel.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModelCreator.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettings.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequest.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestEntity.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIUtils.java
  • x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIServiceTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/action/VoyageAIEmbeddingsActionTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModelTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettingsTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestEntityTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestTests.java
  • x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntityTests.java

fzowl added 4 commits March 14, 2026 18:26
…ive tests

- Fix INT8 embedding type missing TaskType.EMBEDDING check in response parser
- Fix MIME type from image/jpg to image/jpeg in SimpleEmbeddingServiceIntegrationValidator
- Add TransportVersion gating for multimodalModel field in BaseVoyageAIEmbeddingsServiceSettings
- Remove null checks for embeddingType in elementType() and toXContentFragmentOfExposedFields()
- Simplify validation with throwIfValidationErrorsExist()
- Remove redundant parsePersistedConfig methods (now handled by SenderService)
- Add batch size documentation comment explaining values source
- Move holdover comment from class javadoc to NAME constant
- Change buildUriFromSettings to accept ServiceSettings instead of cast
- Add VoyageAIEmbeddingServiceSettingsTests (BWC wire serialization tests)
- Update VoyageAIEmbeddingsServiceSettingsTests to BWC with mutateInstanceForVersion
- Add multimodal model factory methods and URI tests to VoyageAIEmbeddingsModelTests
- Add multimodal input XContent test to VoyageAIEmbeddingsRequestEntityTests
- Add INT8, BIT, and generic embedding response tests to VoyageAIEmbeddingsResponseEntityTests
- Add doEmbeddingInfer tests to VoyageAIServiceTests (invalid model, non-text input, multimodal)
Resolve transport version upper bounds conflict by regenerating
voyage_ai_multimodal_embeddings_added to 9318000.
…iceSettings

Cosmetic rename to avoid confusion with VoyageAIEmbeddingServiceSettings.
The serialized NAME constant remains 'voyageai_embeddings_service_settings'
for backwards compatibility.
@fzowl fzowl requested a review from DonalEvans March 16, 2026 11:12

Labels

>enhancement, external-contributor, Feature:GenAI, :SearchOrg/Inference, Team:Search - Inference, v9.4.0


4 participants