[Inference API] Add VoyageAI inference service integration#142562
[Inference API] Add VoyageAI inference service integration#142562fzowl wants to merge 15 commits intoelastic:mainfrom
Conversation
- text embedding models - multimodal models
- text embedding models - multimodal models
|
@fzowl please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation. |
- text embedding models - multimodal models
|
Pinging @elastic/search-inference-team (Team:Search - Inference) |
…task # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIService.java
…yageai-embedding-task
DonalEvans
left a comment
There was a problem hiding this comment.
In order for the Voyage integration to be used with the embedding task type, a change is needed in SimpleEmbeddingServiceIntegrationValidator.BASE64_IMAGE_DATA, specifically changing the start of the string from data:image/jpg;base64 to data:image/jpeg;base64. This was an oversight on my part when adding that string, since image/jpg is not actually a valid MIME type. For the Jina integration, this wasn't a problem because they discard the MIME type from the input, but Voyage requires the MIME type to be valid, so attempting to create a Voyage endpoint with the embedding task type currently fails.
Please also add unit test coverage for any new behaviour. Some examples off the top of my head are:
- Add a test class for
VoyageAIEmbeddingServiceSettingswhich extendsAbstractBWCWireSerializationTestCaseand which covers thefromMap(),toXContent()andupdateEmbeddingDetails()methods - Change the existing
VoyageAIEmbeddingsServiceSettingsTestsclass to extendAbstractBWCWireSerializationTestCaseinstead ofAbstractWireSerializingTestCaseand add a test forupdateEmbeddingDetails() - Test that
VoyageAIModel.uri()returns the correct values for multimodal vs. text-only model/service settings - Test that
VoyageAIEmbeddingsRequestEntity.toXContent()creates the appropriate request body for multimodal inputs - Add tests to
VoyageAIEmbeddingsResponseEntityTestsfor when the element type of the request/model is notFLOAT - Add tests to
VoyageAIEmbeddingsResponseEntityTestsfor when the request/model is multimodal (the type returned fromparsedResults.embeddings()should beGenericDenseEmbedding*Resultsrather thanDenseEmbedding*Results) - Add tests to
VoyageAIServiceTeststhat callVoyageAIService.doEmbeddingInfer()and cover the three main paths, i.e. the model is not aVoyageAIEmbeddingsModel, the inputs contain non-text entries and the model is not multimodal, and the happy path
Incidentally, if you want to manually test your changes against a real Voyage model, you can do the following, which is what led me to find the bug with the invalid MIME type:
1. Run your local build of Elasticsearch using ./gradlew :run -Drun.license_type=trial
2. Create a VoyageAI endpoint with the embedding task type:
Create endpoint
PUT http://elastic:password@localhost:9200/_inference/embedding/voyageai-embeddings
Content-Type: application/json
Accept: application/json
{
"service": "voyageai",
"service_settings": {
"model_id": "voyage-multimodal-3.5",
"api_key": "{{voyage_api_key}}"
}
}
Perform embedding
POST http://elastic:password@localhost:9200/_inference/embedding/voyageai-embeddings
Content-Type: application/json
Accept: application/json
{
"input": {
"content": {
"type": "image",
"format": "base64",
"value": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABwAAAA4CAIAAABhUg/jAAAAMklEQVR4nO3MQREAMAgAoLkoFreTiSzhy4MARGe9bX99lEqlUqlUKpVKpVKpVCqVHksHaBwCA2cPf0cAAAAASUVORK5CYII="
}
}
}
...icsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettings.java
Outdated
Show resolved
Hide resolved
...arch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
Outdated
Show resolved
Hide resolved
...arch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
Show resolved
Hide resolved
.../org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModel.java
Outdated
Show resolved
Hide resolved
| 72, | ||
| "voyage-02", | ||
| 72 | ||
| private static final Map<String, Integer> MODEL_BATCH_SIZES = Map.ofEntries( |
There was a problem hiding this comment.
Where are these values coming from? In the documentation for the embeddings API and the multimodal embeddings API, the maximum number of inputs is given as 1000. 7 seems extremely small.
There was a problem hiding this comment.
@DonalEvans These values are based on the safe and conservative calculation. We have a token limit for the total request (batch) and also per document - so, to be on the safe side with the batching, the total number of documents can be total_number_of_tokens_per_batch/maximum_number_of_tokens_per_document.
The ideal would be to build token-aware batching (which is technically possible), but it requires the tokeniser to be downloaded from HuggingFace. If you are fine with this, i'd be more than happy to build it, that'd be a more elegant solution.
...arch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
Outdated
Show resolved
Hide resolved
...arch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java
Outdated
Show resolved
Hide resolved
- text embedding models - multimodal models
- text embedding models - multimodal models
- text embedding models - multimodal models
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (12)
📝 WalkthroughWalkthroughAdds a changelog entry and integrates VoyageAI as an inference service supporting EMBEDDING in addition to TEXT_EMBEDDING and RERANK. Registers new VoyageAI embedding service settings as a NamedWriteable and introduces BaseVoyageAIEmbeddingsServiceSettings and VoyageAIEmbeddingServiceSettings with embedding-type, dimensions, similarity, and multimodal flags. Extends VoyageAI service, models, request/response entities, and action creation to handle structured InferenceStringGroup inputs, multimodal endpoints, and generic embedding result types. Updates model mappings, configuration metadata, and extensive tests for embedding and multimodal behavior. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java (1)
182-187:⚠️ Potential issue | 🟠 MajorMissing TaskType check for INT8 embeddings.
The FLOAT and BIT/BINARY branches return generic result types when
taskType == EMBEDDING, but the INT8 branch (lines 182-187) unconditionally returnsDenseEmbeddingByteResults. This inconsistency may cause issues whenTaskType.EMBEDDINGis used with INT8 embedding type.Proposed fix
} else if (embeddingType == VoyageAIEmbeddingType.INT8) { var embeddingResult = EmbeddingInt8Result.PARSER.apply(jsonParser, null); List<DenseEmbeddingByteResults.Embedding> embeddingList = embeddingResult.entries.stream() .map(EmbeddingInt8ResultEntry::toInferenceByteEmbedding) .toList(); + + if (taskType == TaskType.EMBEDDING) { + return new GenericDenseEmbeddingByteResults(embeddingList); + } return new DenseEmbeddingByteResults(embeddingList);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java` around lines 182 - 187, VoyageAIEmbeddingsResponseEntity currently always returns DenseEmbeddingByteResults for VoyageAIEmbeddingType.INT8; update the INT8 branch to mirror the FLOAT and BIT/BINARY branches by checking the request's taskType (TaskType.EMBEDDING) and returning the appropriate generic result when taskType == EMBEDDING (e.g., return the same generic embedding response used for FLOAT/BIT branches) otherwise return DenseEmbeddingByteResults for non-EMBEDDING tasks; locate the INT8 handling in method parsing embeddingType (symbol: embeddingType, VoyageAIEmbeddingType.INT8) and use the taskType variable / enum (TaskType.EMBEDDING) to conditionally construct the correct response class instead of unconditionally returning DenseEmbeddingByteResults.
♻️ Duplicate comments (1)
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java (1)
41-42:⚠️ Potential issue | 🔴 CriticalAdd a new transport version for the
multimodalModelwire field.
multimodalModelchanges the serialized form, but both read and write paths gate it onvoyage_ai_integration_added, which predates this field. In a mixed-version cluster, new code will try to read/write the extra boolean against peers that still use the older layout, which can shift the stream and break deserialization. This field needs its own freshTransportVersiondefinition and should be gated only behind that new version.As per coding guidelines, "For changes to a
Writeableimplementation (writeToand constructor fromStreamInput), add a newpublic static final <UNIQUE_DESCRIPTIVE_NAME> = TransportVersion.fromName("<unique_descriptive_name>")and use it in the new code paths. Confirm the backport branches and then generate a new version file with./gradlew generateTransportVersion".Also applies to: 148-160, 261-271
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java` around lines 41 - 42, Add a new TransportVersion constant for the multimodalModel wire change (e.g., public static final TransportVersion VOYAGE_AI_MULTIMODAL_MODEL_ADDED = TransportVersion.fromName("voyage_ai_multimodal_model_added")) in BaseVoyageAIEmbeddingsServiceSettings, and update the writeTo and StreamInput constructor code paths that currently gate multimodalModel on VOYAGE_AI_INTEGRATION_ADDED to instead gate only on the new VOYAGE_AI_MULTIMODAL_MODEL_ADDED; ensure all places handling multimodalModel serialization/deserialization (the read/write branches in this class) use the new constant, then run ./gradlew generateTransportVersion and add the generated version to the transport versions for backport branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.java`:
- Around line 182-187: VoyageAIEmbeddingsResponseEntity currently always returns
DenseEmbeddingByteResults for VoyageAIEmbeddingType.INT8; update the INT8 branch
to mirror the FLOAT and BIT/BINARY branches by checking the request's taskType
(TaskType.EMBEDDING) and returning the appropriate generic result when taskType
== EMBEDDING (e.g., return the same generic embedding response used for
FLOAT/BIT branches) otherwise return DenseEmbeddingByteResults for non-EMBEDDING
tasks; locate the INT8 handling in method parsing embeddingType (symbol:
embeddingType, VoyageAIEmbeddingType.INT8) and use the taskType variable / enum
(TaskType.EMBEDDING) to conditionally construct the correct response class
instead of unconditionally returning DenseEmbeddingByteResults.
---
Duplicate comments:
In
`@x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.java`:
- Around line 41-42: Add a new TransportVersion constant for the multimodalModel
wire change (e.g., public static final TransportVersion
VOYAGE_AI_MULTIMODAL_MODEL_ADDED =
TransportVersion.fromName("voyage_ai_multimodal_model_added")) in
BaseVoyageAIEmbeddingsServiceSettings, and update the writeTo and StreamInput
constructor code paths that currently gate multimodalModel on
VOYAGE_AI_INTEGRATION_ADDED to instead gate only on the new
VOYAGE_AI_MULTIMODAL_MODEL_ADDED; ensure all places handling multimodalModel
serialization/deserialization (the read/write branches in this class) use the
new constant, then run ./gradlew generateTransportVersion and add the generated
version to the transport versions for backport branches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 69c87ebe-a9f5-4140-8e5d-a53c13fb1957
📒 Files selected for processing (21)
docs/changelog/142562.yamlx-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIModel.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIService.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/action/VoyageAIActionCreator.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/BaseVoyageAIEmbeddingsServiceSettings.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingServiceSettings.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModel.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModelCreator.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettings.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequest.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestEntity.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIUtils.javax-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntity.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/VoyageAIServiceTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/action/VoyageAIEmbeddingsActionTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsModelTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/embeddings/VoyageAIEmbeddingsServiceSettingsTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestEntityTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/request/VoyageAIEmbeddingsRequestTests.javax-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/voyageai/response/VoyageAIEmbeddingsResponseEntityTests.java
…i/elasticsearch into voyageai-embedding-task
…ive tests - Fix INT8 embedding type missing TaskType.EMBEDDING check in response parser - Fix MIME type from image/jpg to image/jpeg in SimpleEmbeddingServiceIntegrationValidator - Add TransportVersion gating for multimodalModel field in BaseVoyageAIEmbeddingsServiceSettings - Remove null checks for embeddingType in elementType() and toXContentFragmentOfExposedFields() - Simplify validation with throwIfValidationErrorsExist() - Remove redundant parsePersistedConfig methods (now handled by SenderService) - Add batch size documentation comment explaining values source - Move holdover comment from class javadoc to NAME constant - Change buildUriFromSettings to accept ServiceSettings instead of cast - Add VoyageAIEmbeddingServiceSettingsTests (BWC wire serialization tests) - Update VoyageAIEmbeddingsServiceSettingsTests to BWC with mutateInstanceForVersion - Add multimodal model factory methods and URI tests to VoyageAIEmbeddingsModelTests - Add multimodal input XContent test to VoyageAIEmbeddingsRequestEntityTests - Add INT8, BIT, and generic embedding response tests to VoyageAIEmbeddingsResponseEntityTests - Add doEmbeddingInfer tests to VoyageAIServiceTests (invalid model, non-text input, multimodal)
Resolve transport version upper bounds conflict by regenerating voyage_ai_multimodal_embeddings_added to 9318000.
…iceSettings Cosmetic rename to avoid confusion with VoyageAIEmbeddingServiceSettings. The serialized NAME constant remains 'voyageai_embeddings_service_settings' for backwards compatibility.
Summary
This PR contains the cleaned up VoyageAI integration with the following improvements:
Testing:
All tests compile successfully