We're excited to announce our v1 release of Inference Gateway! A huge thank you to our contributors, gateway implementers, and downstream community for helping to shape IGW into something we are proud of.

If you're new, please take a look at our guide to get started, or learn more about IGW here: https://gateway-api-inference-extension.sigs.k8s.io/

There is still much to do and more enhancements to come. Namely:

SLO-based predictive scheduling
Flow Control for multi-tenancy support
An improved pluggable Data Layer system
Multi-modal support
APIs to support meeting multiple different SLOs in a single InferencePool

We look forward to what's next in the inference space and to continuing to grow with it.

Onwards!

Cheers,
The IGW maintainer team

What's Changed

chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
chore: remove duplicated import for code polish by @Xunzhuo in #1179
Add documentation for the new Configuration via text feature by @shmuelk in #1110
fix: set epp image tag when releasing by @Xunzhuo in #1182
feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
Update istio release by @LiorLieberman in #1186
test: kubectl-validate manifests in presubmit by @chewong in #1083
Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
(feat) initial types and interfaces for pluggable data layer by @elevran in #1154
Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
feat: generate crd with version annotation. by @zetxqx in #1134
chore: update vllm deployment tag to latest by @Xunzhuo in #1184
moved build details to version package by @nirrozenbaum in #1185
Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
Enhanced InferencePool Chart Configurability by @vMaroon in #1211
refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
removed cmd/registry file by @nirrozenbaum in #1206
Support scraping metrics from target running with TLS by @pierDipi in #1190
gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
added join slack badge to readme by @nirrozenbaum in #1218
chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
style: ✨ optimize import order and more readable. by @yafengio in #1220
Remove TODO stubs from website by @sats-23 in #1221
docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
fix try it out section in quickstart by @nirrozenbaum in #1197
Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
Add a set of configuration defaults by @shmuelk in #1223
Proposing the successor to the InferenceModel API by @kfswain in #1199
cleanup of unused fields and functions by @nirrozenbaum in #1233
chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
Change String() to accept a value reciever. by @elevran in #1239
renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
Add unit tests by @elevran in #1195
test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
added scheduler config logging on bootstrap by @nirrozenbaum in #1247
fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
chore(deps): bump google.golang.org/grpc from 1.73.0...

Contributors

aslakknutsen, robscott, and 38 other contributors

08 Sep 12:10

nirrozenbaum

v1.0.0-rc.4

7ce1f47

v1.0.0-rc.4 Pre-release

PRs cherry-picked into RC4:

Bug fix in prefix when no request ID header is supplied by the gateway:

#1490 (was on the original list but was missed; without it, the prefix cache won't work under bursty workloads)

Test flake fix, required for llm-d to use the formal image of IGW:

#1534

Note: all items in this list have been cherry-picked successfully into the release branch.

05 Sep 11:57

nirrozenbaum

v1.0.0-rc.3

9c24d20

v1.0.0-rc.3 Pre-release

Cherry-picked PRs:
#1508 - critical bug fix to allow setting custom plugin config through the Helm chart
#1509 - prefix plugin writing its state to CycleState
#1412 - new weighted random picker
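
For readers unfamiliar with the idea behind a weighted random picker (#1412), here is a minimal sketch in Python. It is not the IGW implementation: the function name `weighted_random_pick`, the endpoint names, and the uniform fallback for all-zero scores are assumptions made purely for illustration.

```python
import random
from typing import Optional, Sequence


def weighted_random_pick(
    endpoints: Sequence[str],
    scores: Sequence[float],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one endpoint with probability proportional to its score.

    Hypothetical helper for illustration only; not part of IGW.
    """
    if not endpoints or len(endpoints) != len(scores):
        raise ValueError("endpoints and scores must be non-empty and equal in length")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    rng = rng or random.Random()
    if sum(scores) == 0:
        # Assumption: when every score is zero, fall back to a uniform choice.
        return rng.choice(list(endpoints))
    # random.choices draws with probability proportional to the given weights.
    return rng.choices(list(endpoints), weights=list(scores), k=1)[0]


if __name__ == "__main__":
    print(weighted_random_pick(["pod-a", "pod-b"], [0.8, 0.2]))
```

With scores of 0.8 and 0.2, `pod-a` would be selected about 80% of the time over many requests, which spreads load instead of always sending traffic to the single highest-scoring endpoint.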

29 Aug 00:11

kfswain

v1.0.0-rc.2

d70b157

v1.0.0-rc.2 Pre-release

This release primarily updates the InferencePool API and conformance tests following the completion of the API review conducted in PR #1173.

NOTE: Barring any breaking change after this RC, the APIs are considered frozen for the remainder of the v1.0 release cycle.

26 Aug 12:59

kfswain

v1.0.0-rc.1

dc82925

v1.0.0-rc.1 Pre-release

What's Changed

chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
chore: remove duplicated import for code polish by @Xunzhuo in #1179
Add documentation for the new Configuration via text feature by @shmuelk in #1110
fix: set epp image tag when releasing by @Xunzhuo in #1182
feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
Update istio release by @LiorLieberman in #1186
test: kubectl-validate manifests in presubmit by @chewong in #1083
Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
(feat) initial types and interfaces for pluggable data layer by @elevran in #1154
Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
feat: generate crd with version annotation. by @zetxqx in #1134
chore: update vllm deployment tag to latest by @Xunzhuo in #1184
moved build details to version package by @nirrozenbaum in #1185
Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
Enhanced InferencePool Chart Configurability by @vMaroon in #1211
refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
removed cmd/registry file by @nirrozenbaum in #1206
Support scraping metrics from target running with TLS by @pierDipi in #1190
gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
added join slack badge to readme by @nirrozenbaum in #1218
chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
style: ✨ optimize import order and more readable. by @yafengio in #1220
Remove TODO stubs from website by @sats-23 in #1221
docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
fix try it out section in quickstart by @nirrozenbaum in #1197
Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
Add a set of configuration defaults by @shmuelk in #1223
Proposing the successor to the InferenceModel API by @kfswain in #1199
cleanup of unused fields and functions by @nirrozenbaum in #1233
chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
Change String() to accept a value reciever. by @elevran in #1239
renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
Add unit tests by @elevran in #1195
test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
added scheduler config logging on bootstrap by @nirrozenbaum in #1247
fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
chore(deps): bump google.golang.org/grpc from 1.73.0 to 1.74.2 by @dependabot[bot] in #1252
Update the Endpoint Picker Protocol with a new metadata field that communicates status associated with picked endpoints by @AndresGuedez in #1226
chore(deps): bump sigs.k8s.io/controller-tools from 0.17.3 to 0.18.0 by @dependabot[bot] in #1254
Update golangci lint to v2.x by @elevran in #1256
Add nightly benchmarking documentation by @kaushikmitr in #1234
normalize score to make sure it is always in the range of [0,1] by @nirrozenbaum in #1236
updated metrics and logging for plugins by @nirrozenbaum in https://github....
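
Two of the scheduler changes listed above, keeping scores in the range [0,1] (#1236) and picking a random endpoint on a tie in the max score picker (#1205), can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the code from those PRs: clamping is only one possible way to keep a score in range, and the names `clamp_score` and `max_score_pick` are hypothetical.

```python
import random
from typing import Dict, Optional


def clamp_score(score: float) -> float:
    """Force a score into [0, 1] by clamping (one possible approach, assumed here)."""
    return max(0.0, min(1.0, score))


def max_score_pick(scores: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Return the highest-scoring endpoint, breaking ties at random.

    Hypothetical helper for illustration only; not part of IGW.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    rng = rng or random.Random()
    clamped = {name: clamp_score(s) for name, s in scores.items()}
    best = max(clamped.values())
    # Collect every endpoint that shares the top score, then choose one at random
    # so that ties do not always resolve to the same endpoint.
    tied = [name for name, s in clamped.items() if s == best]
    return rng.choice(tied)


if __name__ == "__main__":
    print(max_score_pick({"pod-a": 0.9, "pod-b": 0.9, "pod-c": 0.4}))
```

Without the random tie-break, a picker that always returns the first maximum would repeatedly favor whichever endpoint happens to come first whenever several endpoints score equally.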

Contributors

aslakknutsen, robscott, and 38 other contributors

23 Jul 20:04

kfswain

v0.5.1

978c23b

v0.5.1

This patch release is intended to resolve a few bugs. Justification and breakdown here: #1215

22 Jul 23:20

kfswain

v0.5.1-rc.1

a3dda17

v0.5.1-rc.1 Pre-release

This patch release is intended to resolve a few bugs. Justification and breakdown here: #1215

21 Jul 18:20

nirrozenbaum

v0.5.0

38577e6

v0.5.0

Overview

Major Highlights

Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
New Config API: Allows plugins to be configured through a config file without touching core code.
Helm Charts: Helm chart updated to support easy reuse of the Config API.

What's Changed

Add scripts for running e2es by @keithmattix in #978
fix: istio example destination rule by @EyalPazz in #970
Bump Istio tag reference by @keithmattix in #974
adds New functions to the scorers for consistency by @nirrozenbaum in #975
feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
e2e makefile comment fix by @nirrozenbaum in #976
API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
moved the creation of the context to main.go. by @nirrozenbaum in #995
doc: fix dead links by @caozhuozi in #989
feat: add health check for epp cluster by @zhengkezhou1 in #966
test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
bump cpu deployment version by @nirrozenbaum in #1016
fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
cleanup after config api PR was merged by @nirrozenbaum in #1012
Making inferenceModel optional by @kfswain in #1024
Adding Design Principles by @robscott in #596
Adding Nir as a maintainer! by @kfswain in #1026
[Fix] Missing property "apiGroup" error by @yafengio in #1015
API: Adds default status condition to InferencePool by @danehans in #830
feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
update sim deployment tag to latest by @nirrozenbaum in #1041
refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
Only create LOCALBIN directory when it does not exist by @elevran in #1054
remove datastore dependency from the scheduler by @nirrozenbaum in #1049
add e2e test for epp metrics by @delavet in #938
refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
Reintroduce Plugin.Name() by @elevran in #1057
Extensible/Pluggable data layer proposal by @elevran in #1023
Add subsetting logic for epp by @rlakhtakia in #981
docs: added gke clean up instructions by @capri-xiyue in #1064
feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
Adding pprof endpoints to metrics port by @kfswain in #1069
version in README by @nirrozenbaum in #1072
feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
Update model server protocol with prefix cache reuse by @liu-cong in #1077
Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
refactor(conformance) merge similar utility functions. by @zetxqx in #1055
fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
e2e cleanup by @nirrozenbaum in #988
fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in https://github.com/kubernetes-sigs/gateway-api-infe...

Contributors

aslakknutsen, robscott, and 27 other contributors

Releases: kubernetes-sigs/gateway-api-inference-extension
