v1.0.0
Inference Gateway v1
This release marks the v1 release of Inference Gateway and, with it, the promotion of the InferencePool CRD to v1.
We're excited to announce our v1 release of Inference Gateway! A huge thank you to our contributors, gateway implementers, and downstream community for helping to shape IGW into something we are proud of.
If you're new, please take a look at our guide to get started, or learn more about IGW here: https://gateway-api-inference-extension.sigs.k8s.io/
There is still much to do, with more enhancements to come, including:
- SLO-based predictive scheduling
- Flow Control for multi-tenancy support
- An improved pluggable Data Layer system
- Multi-modal support
- APIs to support multiple different SLOs within a single InferencePool
We look forward to what's next in the inference space and to continuing to grow with it.
Onwards!
Cheers,
The IGW maintainer team
What's Changed
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
- feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
- removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
- Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
- feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
- Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
- feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
- chore: remove duplicated import for code polish by @Xunzhuo in #1179
- Add documentation for the new Configuration via text feature by @shmuelk in #1110
- fix: set epp image tag when releasing by @Xunzhuo in #1182
- feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
- Update istio release by @LiorLieberman in #1186
- test: kubectl-validate manifests in presubmit by @chewong in #1083
- Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
- feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
- (feat) initial types and interfaces for pluggable data layer by @elevran in #1154
- Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
- feat: generate crd with version annotation. by @zetxqx in #1134
- chore: update vllm deployment tag to latest by @Xunzhuo in #1184
- moved build details to version package by @nirrozenbaum in #1185
- Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
- feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
- feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
- docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
- added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
- feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
- feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
- Enhanced InferencePool Chart Configurability by @vMaroon in #1211
- refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
- random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
- removed cmd/registry file by @nirrozenbaum in #1206
- Support scraping metrics from target running with TLS by @pierDipi in #1190
- gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
- added join slack badge to readme by @nirrozenbaum in #1218
- chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
- style: ✨ optimize import order and more readable. by @yafengio in #1220
- Remove TODO stubs from website by @sats-23 in #1221
- docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
- release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
- fix try it out section in quickstart by @nirrozenbaum in #1197
- Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
- Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
- Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
- chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
- fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
- fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
- feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
- Add a set of configuration defaults by @shmuelk in #1223
- Proposing the successor to the InferenceModel API by @kfswain in #1199
- cleanup of unused fields and functions by @nirrozenbaum in #1233
- chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
- Change String() to accept a value receiver. by @elevran in #1239
- renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
- Add unit tests by @elevran in #1195
- test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
- added scheduler config logging on bootstrap by @nirrozenbaum in #1247
- fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
- chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
- chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
- chore(deps): bump google.golang.org/grpc from 1.73.0 to 1.74.2 by @dependabot[bot] in #1252
- Update the Endpoint Picker Protocol with a new metadata field that communicates status associated with picked endpoints by @AndresGuedez in #1226
- chore(deps): bump sigs.k8s.io/controller-tools from 0.17.3 to 0.18.0 by @dependabot[bot] in #1254
- Update golangci lint to v2.x by @elevran in #1256
- Add nightly benchmarking documentation by @kaushikmitr in #1234
- normalize score to make sure it is always in the range of [0,1] by @nirrozenbaum in #1236
- updated metrics and logging for plugins by @nirrozenbaum in #1235
- fix(flowcontrol): Prevent panic on nil item during shard shutdown by @LukeAVanDrie in #1257
- chore(deps): bump github.com/elastic/crd-ref-docs from 0.1.0 to 0.2.0 by @dependabot[bot] in #1250
- cleanup of config from scheduling package by @nirrozenbaum in #1263
- Add support for multi platform image by @adarshagrawal38 in #1010
- epp: add more error integration test cases by @zhengkezhou1 in #1074
- fix: split EPP RBAC into cluster and namespaced scoped permission by @chewong in #1071
- Renaming InferenceModel to InferenceObjectives by @kfswain in #1255
- cleanup: refactor PodList calls to prepare for making pod metrics staleness configurable by @nayihz in #1046
- refactor(conformance): restructure tests and resources by @zetxqx in #1232
- Refactor(conformance): merge similar helper functions and make the condition check on inferencePool stricter by @zetxqx in #1261
- Update lora affinity to be a scorer. by @rlakhtakia in #1121
- fix image not build issue by @zetxqx in #1286
- Fixes for make fmt-imports by @elevran in #1287
- Docs: fixed InferenceObjective in docs by @capri-xiyue in #1284
- adding fairness-id header to be used in flow control by @kfswain in #1282
- update release template to include patch release by @nirrozenbaum in #1270
- Switch to the new default scheduler plugins in integration test by @liu-cong in #1291
- Revert "fix image not build issue" by @danehans in #1295
- Rename lora affinity plugin to lora-affinity-scorer to be consistent with others by @liu-cong in #1297
- revert #1010 to resume new main EPP image build by @zetxqx in #1300
- chore(deps): bump github.com/prometheus/client_golang from 1.22.0 to 1.23.0 by @dependabot[bot] in #1303
- fix(conformance): Reduce flakiness by using service selector modification in EppUnAvailableFailOpen by @zetxqx in #1265
- Promote plugin v2 config to be the default by @liu-cong in #1290
- feat: changed to support both v1 and v1a2 ip in EPP by @capri-xiyue in #1277
- Deprecate legacy filters by @liu-cong in #1305
- Updating tabs to spaces by @kfswain in #1311
- Pluggable metrics collection by @elevran in #1237
- Removing concurrency issues from Random Picker by @kfswain in #1314
- Docs: Bumps kgateway Version in Quickstart by @danehans in #1318
- Conformance: Adds Report for kgateway by @danehans in #1317
- Refactor the configuration defaults handling code by @shmuelk in #1294
- Select InferenceObjective by header by @kfswain in #1307
- refactor: 👷 clean unused randomGenerator parameter. by @yafengio in #1322
- Docs: Updates kgateway in Implementations Guide by @danehans in #1325
- remove protocol specifics from cmd-line flags by @nirrozenbaum in #1296
- Adds envoy-ai-gateway conformance report by @Xunzhuo in #1320
- Filter inference objectives based on inference pool group by @nicolexin in #1306
- Correcting title name by @kfswain in #1334
- added plugin state that can be used to share data between different extension points of a plugin by @nirrozenbaum in #1299
- Updating model name rewrite to be done by header key by @kfswain in #1331
- docs: Enable GIE in Istio installation command by @zhengkezhou1 in #1345
- InferenceObjective: Updates PoolRef Group Version by @danehans in #1346
- Modifying Criticality from string to int by @kfswain in #1348
- explicit return from test after nil check by @elevran in #1350
- fix: change the inferenceobjective to use v1 inferencepool by @capri-xiyue in #1338
- [bug] Fix datalayer Collector test flake by @elevran in #1342
- cleanup: fix typo and delete useless parameter by @nayihz in #1310
- refactor(flowcontrol): Adopt Composite FlowKey as Primary Identifier by @LukeAVanDrie in #1340
- Conformance: Adds InferenceObjective Request Header by @danehans in #1353
- test: enhance plugin state test coverage and readability by @yankay in #1349
- refactor(conformance): Relocate constants to minimize package dependencies by @zetxqx in #1355
- [BBR] perf: optimize model name extraction with selective JSON unmarshaling using struct tags by @pierDipi in #1359
- chore(deps): bump google.golang.org/protobuf from 1.36.6 to 1.36.7 by @dependabot[bot] in #1358
- Rename criticality to priority by @ahg-g in #1363
- Affiliate ready state with leader election with a flag ha-enable-leader-election by @yangligt2 in #1337
- cleanup: simplify endpointpickerconfig by @capri-xiyue in #1324
- Apply shedding upon saturation for priority below 0 by @ahg-g in #1361
- Add agentgateway as implementation by @howardjohn in #1321
- Upgrade the inferencePool selector to a struct from a map. by @zetxqx in #1330
- fix: make extensionRef to be optional by @capri-xiyue in #1365
- fix: make v1a2 remove inline and change conversion logic to match the convention by @capri-xiyue in #1368
- Pluggable data layer: transition backend/metrics to use type aliases from datalayer package by @elevran in #1351
- docs: match Inference Extension CRDs by @zhengkezhou1 in #1377
- refactor: prevent double logging of NamespacedName across reconcilers by @chewong in #1379
- Enable kubeapilinter for GIE APIs by @rikatz in #1366
- feat: TargetPortNumber int32 to become TargetPorts []Port by @capri-xiyue in #1354
- Update doc on sglang models support. by @ReneeZhuGG in #1369
- feat: added shortname as alias by @capri-xiyue in #1375
- Tooling: Adds PR Template by @danehans in #1385
- docs: add prometheus + grafana deployment guide by @EyalPazz in #1019
- docs: how to debug integration tests by @zhengkezhou1 in #1067
- doc: update the release-quickstart.sh to include the image tag for lora-syncer by @Ruoyu-y in #1080
- Fixes Quickstart Script by @danehans in #1388
- gitignore macOS generated files. by @bexxmodd in #1378
- feat: added env var for pool group by @capri-xiyue in #1328
- updated env var example in helm chart by @nirrozenbaum in #1390
- remove InferenceModel section from EPP protocol by @nirrozenbaum in #1389
- fix: make port number become pointer by @capri-xiyue in #1400
- Fix: Handle empty string healthcheck as readiness healthcheck by @yangligt2 in #1402
- fix: updated listtype for targetports by @capri-xiyue in #1401
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1403
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.23.4 to 2.24.0 by @dependabot[bot] in #1404
- deprecate post cycle from scheduling framework by @nirrozenbaum in #1392
- Conformance: Updates InferencePoolInvalidEPPService Test Manifest by @danehans in #1417
- Conformance: Fixes Secondary InferencePool Selector by @danehans in #1419
- Fix Makefile syntax and version consistency by @ErikJiang in #1413
- Enable pluggable datalayer as experimental feature by @elevran in #1391
- fix(cleanup): change the naming to be endpointpickerref by @capri-xiyue in #1420
- fix api-ref-docs make target and generate result by @kfswain in #1422
- trace logging for scores per pod by @nirrozenbaum in #1395
- remove env vars for cmd-line args by @nirrozenbaum in #1397
- Ensure EPP flags are configurable via Helm chart by @rahulgurnani in #1302
- feat(flowcontrol): Implement the FlowRegistry by @LukeAVanDrie in #1319
- fix(cleanup): change spec doc by @capri-xiyue in #1434
- Cleanup helm flags which have default values from values yaml by @rahulgurnani in #1429
- Fix epp startup error due to missing plugin config file flag by @liu-cong in #1439
- Update inference gateway public docs to use helm charts by @rahulgurnani in #1370
- Fix typo. by @zetxqx in #1437
- Final bits of v1.0 API cleanup by @robscott in #1441
- remove wg serving leads from owners file by @nirrozenbaum in #1445
- fix: first hash of prefix cache with same model name by @livelxw in #1341
- cleanup: final clean up of pointer by @capri-xiyue in #1444
- adding mutex and contention profile gathering by @kfswain in #1448
- remove empty condition list when doing v1 and v1alpha2 conversion. by @zetxqx in #1447
- Updates InferencePool API Conversion Code by @danehans in #1451
New Contributors
- @whzghb made their first contribution in #1127
- @AndresGuedez made their first contribution in #1143
- @nekomeowww made their first contribution in #1193
- @vMaroon made their first contribution in #1211
- @pierDipi made their first contribution in #1190
- @sats-23 made their first contribution in #1221
- @yangligt2 made their first contribution in #1337
- @rikatz made their first contribution in #1366
- @ReneeZhuGG made their first contribution in #1369
- @Ruoyu-y made their first contribution in #1080
- @bexxmodd made their first contribution in #1378
- @ErikJiang made their first contribution in #1413
- @livelxw made their first contribution in #1341
Full Changelog: v0.5.1...v1.0.0