Releases: kubernetes-sigs/gateway-api-inference-extension
v1.0.1
v1.0.1-rc.1
This is a small patch release to fix Helm chart issues.
Context: #1616
v1.0.0
Inference Gateway v1
This release marks the v1 of Inference Gateway, and with it the promotion of the InferencePool CRD to v1.
We're excited to announce our v1 release of Inference Gateway! A huge thank you to our contributors, gateway implementers, and downstream community for helping to shape IGW into something we are proud of.
If you're new: Please take a look at our guide to get started! Or learn more about IGW here: https://gateway-api-inference-extension.sigs.k8s.io/
There is still much to do and more enhancements to come. Namely:
- SLO-based predictive scheduling
- Flow Control for multi-tenancy support
- An improved pluggable Data Layer system
- Multi-modal support
- APIs to support meeting multiple different SLOs in a single InferencePool
We look forward to what's next in the inference space and to continuing to grow with it.
Onwards!
Cheers,
The IGW maintainer team
What's Changed
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
- feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
- removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
- Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
- feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
- Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
- feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
- chore: remove duplicated import for code polish by @Xunzhuo in #1179
- Add documentation for the new Configuration via text feature by @shmuelk in #1110
- fix: set epp image tag when releasing by @Xunzhuo in #1182
- feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
- Update istio release by @LiorLieberman in #1186
- test: kubectl-validate manifests in presubmit by @chewong in #1083
- Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
- feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
- (feat) initial types and interfaces for pluggable data layer by @elevran in #1154
- Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
- feat: generate crd with version annotation. by @zetxqx in #1134
- chore: update vllm deployment tag to latest by @Xunzhuo in #1184
- moved build details to version package by @nirrozenbaum in #1185
- Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
- feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
- feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
- docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
- added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
- feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
- feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
- Enhanced InferencePool Chart Configurability by @vMaroon in #1211
- refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
- random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
- removed cmd/registry file by @nirrozenbaum in #1206
- Support scraping metrics from target running with TLS by @pierDipi in #1190
- gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
- added join slack badge to readme by @nirrozenbaum in #1218
- chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
- style: ✨ optimize import order and more readable. by @yafengio in #1220
- Remove TODO stubs from website by @sats-23 in #1221
- docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
- release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
- fix try it out section in quickstart by @nirrozenbaum in #1197
- Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
- Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
- Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
- chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
- fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
- fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
- feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
- Add a set of configuration defaults by @shmuelk in #1223
- Proposing the successor to the InferenceModel API by @kfswain in #1199
- cleanup of unused fields and functions by @nirrozenbaum in #1233
- chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
- Change String() to accept a value reciever. by @elevran in #1239
- renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
- Add unit tests by @elevran in #1195
- test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
- added scheduler config logging on bootstrap by @nirrozenbaum in #1247
- fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
- chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
- chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
- chore(deps): bump google.golang.org/grpc from 1.73.0...
v1.0.0-rc.4
PRs cherry-picked into RC4:
- CRD updates:
- Performance issues fixed in pickers:
- Helm chart fix:
- Bug fix in prefix when no request ID header is supplied by the gateway:
- #1490 (was on the original list but somehow missed; without this, prefix cache won't work in bursty workloads)
- Test flake fix, required for llm-d to use the formal image of IGW:
Note: all the items in this list have been cherry-picked successfully into the release branch.
v1.0.0-rc.3
v1.0.0-rc.2
This release primarily updates the InferencePool API and conformance tests following the completion of the API review conducted in this PR: #1173
NOTE: Barring any breaking change after this RC, the APIs are considered frozen for the remainder of the v1.0 release cycle.
v1.0.0-rc.1
What's Changed
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
- feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
- removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
- Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
- feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
- Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
- feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
- chore: remove duplicated import for code polish by @Xunzhuo in #1179
- Add documentation for the new Configuration via text feature by @shmuelk in #1110
- fix: set epp image tag when releasing by @Xunzhuo in #1182
- feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
- Update istio release by @LiorLieberman in #1186
- test: kubectl-validate manifests in presubmit by @chewong in #1083
- Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
- feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
- (feat) initial types and interfaces for pluggable data layer by @elevran in #1154
- Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
- feat: generate crd with version annotation. by @zetxqx in #1134
- chore: update vllm deployment tag to latest by @Xunzhuo in #1184
- moved build details to version package by @nirrozenbaum in #1185
- Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
- feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
- feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
- docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
- added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
- feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
- feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
- Enhanced InferencePool Chart Configurability by @vMaroon in #1211
- refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
- random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
- removed cmd/registry file by @nirrozenbaum in #1206
- Support scraping metrics from target running with TLS by @pierDipi in #1190
- gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
- added join slack badge to readme by @nirrozenbaum in #1218
- chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
- style: ✨ optimize import order and more readable. by @yafengio in #1220
- Remove TODO stubs from website by @sats-23 in #1221
- docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
- release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
- fix try it out section in quickstart by @nirrozenbaum in #1197
- Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
- Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
- Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
- chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
- fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
- fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
- feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
- Add a set of configuration defaults by @shmuelk in #1223
- Proposing the successor to the InferenceModel API by @kfswain in #1199
- cleanup of unused fields and functions by @nirrozenbaum in #1233
- chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
- Change String() to accept a value reciever. by @elevran in #1239
- renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
- Add unit tests by @elevran in #1195
- test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
- added scheduler config logging on bootstrap by @nirrozenbaum in #1247
- fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
- chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
- chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
- chore(deps): bump google.golang.org/grpc from 1.73.0 to 1.74.2 by @dependabot[bot] in #1252
- Update the Endpoint Picker Protocol with a new metadata field that communicates status associated with picked endpoints by @AndresGuedez in #1226
- chore(deps): bump sigs.k8s.io/controller-tools from 0.17.3 to 0.18.0 by @dependabot[bot] in #1254
- Update golangci lint to v2.x by @elevran in #1256
- Add nightly benchmarking documentation by @kaushikmitr in #1234
- normalize score to make sure it is always in the range of [0,1] by @nirrozenbaum in #1236
- updated metrics and logging for plugins by @nirrozenbaum in https://github....
v0.5.1
v0.5.1-rc.1
This patch release resolves a few bugs. Justification and breakdown here: #1215
v0.5.0
Overview
Major Highlights
- Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
- New Config API: A new Config API allows plugins to be configured through a config file without touching core code.
- Helm Charts: The Helm chart is updated to make it easy to reuse the Config API.
What's Changed
- Add scripts for running e2es by @keithmattix in #978
- fix: istio example destination rule by @EyalPazz in #970
- Bump Istio tag reference by @keithmattix in #974
- adds New functions to the scorers for consistency by @nirrozenbaum in #975
- feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
- e2e makefile comment fix by @nirrozenbaum in #976
- API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
- feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
- moved the creation of the context to main.go. by @nirrozenbaum in #995
- doc: fix dead links by @caozhuozi in #989
- feat: add health check for epp cluster by @zhengkezhou1 in #966
- test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
- Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
- refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
- feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
- feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
- adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
- feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
- fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
- feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
- refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
- fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
- bump cpu deployment version by @nirrozenbaum in #1016
- fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
- Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
- Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
- profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
- cleanup after config api PR was merged by @nirrozenbaum in #1012
- Making inferenceModel optional by @kfswain in #1024
- Adding Design Principles by @robscott in #596
- Adding Nir as a maintainer! by @kfswain in #1026
- [Fix] Missing property "apiGroup" error by @yafengio in #1015
- API: Adds default status condition to InferencePool by @danehans in #830
- feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
- update sim deployment tag to latest by @nirrozenbaum in #1041
- refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
- docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
- added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
- feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
- feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
- Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
- chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
- Only create LOCALBIN directory when it does not exist by @elevran in #1054
- remove datastore dependency from the scheduler by @nirrozenbaum in #1049
- add e2e test for epp metrics by @delavet in #938
- refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
- Reintroduce Plugin.Name() by @elevran in #1057
- Extensible/Pluggable data layer proposal by @elevran in #1023
- Add subsetting logic for epp by @rlakhtakia in #981
- docs: added gke clean up instructions by @capri-xiyue in #1064
- feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
- refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
- feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
- Adding pprof endpoints to metrics port by @kfswain in #1069
- version in README by @nirrozenbaum in #1072
- feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
- Update model server protocol with prefix cache reuse by @liu-cong in #1077
- Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
- refactor(conformance) merge similar utility functions. by @zetxqx in #1055
- fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
- e2e cleanup by @nirrozenbaum in #988
- fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
- API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in https://github.com/kubernetes-sigs/gateway-api-infe...