Releases: kubernetes-sigs/gateway-api-inference-extension
v0.5.1
v0.5.1-rc.1
This patch release resolves a few bugs. Justification and breakdown: #1215
v0.5.0
Overview
Major Highlights
- Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
- New Config API: Configure plugins through a config file without touching core code.
- Helm Charts: The Helm charts are updated to make reusing the Config API easy.
What's Changed
- Add scripts for running e2es by @keithmattix in #978
- fix: istio example destination rule by @EyalPazz in #970
- Bump Istio tag reference by @keithmattix in #974
- adds New functions to the scorers for consistency by @nirrozenbaum in #975
- feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
- e2e makefile comment fix by @nirrozenbaum in #976
- API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
- feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
- moved the creation of the context to main.go. by @nirrozenbaum in #995
- doc: fix dead links by @caozhuozi in #989
- feat: add health check for epp cluster by @zhengkezhou1 in #966
- test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
- Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
- refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
- feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
- feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
- adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
- feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
- fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
- feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
- refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
- fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
- bump cpu deployment version by @nirrozenbaum in #1016
- fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
- Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
- Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
- profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
- cleanup after config api PR was merged by @nirrozenbaum in #1012
- Making inferenceModel optional by @kfswain in #1024
- Adding Design Principles by @robscott in #596
- Adding Nir as a maintainer! by @kfswain in #1026
- [Fix] Missing property "apiGroup" error by @yafengio in #1015
- API: Adds default status condition to InferencePool by @danehans in #830
- feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
- update sim deployment tag to latest by @nirrozenbaum in #1041
- refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
- docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
- added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
- feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
- feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
- Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
- chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
- Only create LOCALBIN directory when it does not exist by @elevran in #1054
- remove datastore dependency from the scheduler by @nirrozenbaum in #1049
- add e2e test for epp metrics by @delavet in #938
- refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
- Reintroduce Plugin.Name() by @elevran in #1057
- Extensible/Pluggable data layer proposal by @elevran in #1023
- Add subsetting logic for epp by @rlakhtakia in #981
- docs: added gke clean up instructions by @capri-xiyue in #1064
- feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
- refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
- feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
- Adding pprof endpoints to metrics port by @kfswain in #1069
- version in README by @nirrozenbaum in #1072
- feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
- Update model server protocol with prefix cache reuse by @liu-cong in #1077
- Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
- refactor(conformance) merge similar utility functions. by @zetxqx in #1055
- fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
- e2e cleanup by @nirrozenbaum in #988
- fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
- API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in https://github.com/kubernetes-sigs/gateway-api-infe...
v0.5.0-rc.3
Overview
Major Highlights
- Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
- New Config API: Configure plugins through a config file without touching core code.
- Helm Charts: The Helm charts are updated to make reusing the Config API easy.
What's Changed
- Add scripts for running e2es by @keithmattix in #978
- fix: istio example destination rule by @EyalPazz in #970
- Bump Istio tag reference by @keithmattix in #974
- adds New functions to the scorers for consistency by @nirrozenbaum in #975
- feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
- e2e makefile comment fix by @nirrozenbaum in #976
- API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
- feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
- moved the creation of the context to main.go. by @nirrozenbaum in #995
- doc: fix dead links by @caozhuozi in #989
- feat: add health check for epp cluster by @zhengkezhou1 in #966
- test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
- Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
- refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
- feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
- feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
- adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
- feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
- fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
- feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
- refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
- fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
- bump cpu deployment version by @nirrozenbaum in #1016
- fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
- Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
- Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
- profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
- cleanup after config api PR was merged by @nirrozenbaum in #1012
- Making inferenceModel optional by @kfswain in #1024
- Adding Design Principles by @robscott in #596
- Adding Nir as a maintainer! by @kfswain in #1026
- [Fix] Missing property "apiGroup" error by @yafengio in #1015
- API: Adds default status condition to InferencePool by @danehans in #830
- feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
- update sim deployment tag to latest by @nirrozenbaum in #1041
- refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
- docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
- added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
- feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
- feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
- Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
- chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
- Only create LOCALBIN directory when it does not exist by @elevran in #1054
- remove datastore dependency from the scheduler by @nirrozenbaum in #1049
- add e2e test for epp metrics by @delavet in #938
- refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
- Reintroduce Plugin.Name() by @elevran in #1057
- Extensible/Pluggable data layer proposal by @elevran in #1023
- Add subsetting logic for epp by @rlakhtakia in #981
- docs: added gke clean up instructions by @capri-xiyue in #1064
- feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
- refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
- feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
- Adding pprof endpoints to metrics port by @kfswain in #1069
- version in README by @nirrozenbaum in #1072
- feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
- Update model server protocol with prefix cache reuse by @liu-cong in #1077
- Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
- refactor(conformance) merge similar utility functions. by @zetxqx in #1055
- fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
- e2e cleanup by @nirrozenbaum in #988
- fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
- API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in https://github.com/kubernetes-sigs/gateway-api-infe...
v0.5.0-rc.2
Overview
Major Highlights
- Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
- New Config API: Configure plugins through a config file without touching core code.
- Helm Charts: The Helm charts are updated to make reusing the Config API easy.
What's Changed
- Add scripts for running e2es by @keithmattix in #978
- fix: istio example destination rule by @EyalPazz in #970
- Bump Istio tag reference by @keithmattix in #974
- adds New functions to the scorers for consistency by @nirrozenbaum in #975
- feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
- e2e makefile comment fix by @nirrozenbaum in #976
- API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
- feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
- moved the creation of the context to main.go. by @nirrozenbaum in #995
- doc: fix dead links by @caozhuozi in #989
- feat: add health check for epp cluster by @zhengkezhou1 in #966
- test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
- Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
- refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
- feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
- feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
- adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
- feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
- fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
- feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
- refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
- fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
- bump cpu deployment version by @nirrozenbaum in #1016
- fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
- Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
- Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
- profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
- cleanup after config api PR was merged by @nirrozenbaum in #1012
- Making inferenceModel optional by @kfswain in #1024
- Adding Design Principles by @robscott in #596
- Adding Nir as a maintainer! by @kfswain in #1026
- [Fix] Missing property "apiGroup" error by @yafengio in #1015
- API: Adds default status condition to InferencePool by @danehans in #830
- feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
- update sim deployment tag to latest by @nirrozenbaum in #1041
- refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
- docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
- added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
- feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
- feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
- Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
- chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
- Only create LOCALBIN directory when it does not exist by @elevran in #1054
- remove datastore dependency from the scheduler by @nirrozenbaum in #1049
- add e2e test for epp metrics by @delavet in #938
- refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
- Reintroduce Plugin.Name() by @elevran in #1057
- Extensible/Pluggable data layer proposal by @elevran in #1023
- Add subsetting logic for epp by @rlakhtakia in #981
- docs: added gke clean up instructions by @capri-xiyue in #1064
- feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
- refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
- feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
- Adding pprof endpoints to metrics port by @kfswain in #1069
- version in README by @nirrozenbaum in #1072
- feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
- Update model server protocol with prefix cache reuse by @liu-cong in #1077
- Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
- refactor(conformance) merge similar utility functions. by @zetxqx in #1055
- fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
- e2e cleanup by @nirrozenbaum in #988
- fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
- API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in https://github.com/kubernetes-sigs/gateway-api-infe...
v0.5.0-rc.1
Overview
Major Highlights
- Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
- New Config API: Configure plugins through a config file without touching core code.
What's Changed
- Add scripts for running e2es by @keithmattix in #978
- fix: istio example destination rule by @EyalPazz in #970
- Bump Istio tag reference by @keithmattix in #974
- adds New functions to the scorers for consistency by @nirrozenbaum in #975
- feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
- e2e makefile comment fix by @nirrozenbaum in #976
- API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
- feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
- moved the creation of the context to main.go. by @nirrozenbaum in #995
- doc: fix dead links by @caozhuozi in #989
- feat: add health check for epp cluster by @zhengkezhou1 in #966
- test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
- Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
- refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
- feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
- feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
- adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
- feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
- fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
- feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
- refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
- fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
- bump cpu deployment version by @nirrozenbaum in #1016
- fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
- Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
- Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
- profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
- cleanup after config api PR was merged by @nirrozenbaum in #1012
- Making inferenceModel optional by @kfswain in #1024
- Adding Design Principles by @robscott in #596
- Adding Nir as a maintainer! by @kfswain in #1026
- [Fix] Missing property "apiGroup" error by @yafengio in #1015
- API: Adds default status condition to InferencePool by @danehans in #830
- feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
- update sim deployment tag to latest by @nirrozenbaum in #1041
- refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
- docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
- added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
- feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
- feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
- Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
- chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
- Only create LOCALBIN directory when it does not exist by @elevran in #1054
- remove datastore dependency from the scheduler by @nirrozenbaum in #1049
- add e2e test for epp metrics by @delavet in #938
- refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
- Reintroduce Plugin.Name() by @elevran in #1057
- Extensible/Pluggable data layer proposal by @elevran in #1023
- Add subsetting logic for epp by @rlakhtakia in #981
- docs: added gke clean up instructions by @capri-xiyue in #1064
- feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
- refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
- feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
- Adding pprof endpoints to metrics port by @kfswain in #1069
- version in README by @nirrozenbaum in #1072
- feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
- Update model server protocol with prefix cache reuse by @liu-cong in #1077
- Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
- refactor(conformance) merge similar utility functions. by @zetxqx in #1055
- fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
- e2e cleanup by @nirrozenbaum in #988
- fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
- API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in #1070
- Tidy up Data Layer documentation by @elevran in https:...
v0.4.0
Overview
We are thrilled to announce the v0.4.0 release—our biggest update yet! This version brings powerful new Endpoint Picker (EPP) scheduler capabilities, performance improvements, and initial Gateway conformance tests.
Major Highlights
- Modular Endpoint Picker (EPP) Scheduler: A kube-scheduler-style plugin API lets you build custom routing logic, filter and score backends, or swap in new picker strategies without touching core code.
- Prefix-Cache-Aware Routing: Dramatically lower tail latency by routing requests based on cached network prefixes, improving response times under load.
- Richer Metrics: Gain deeper insights with new metrics including:
  - NTPOT (Normalized Time Per Output Token)
  - Scheduler latency
  - Per-pod queue depth
  - Build and version info
- Optional vLLM Simulator Backend: Spin up a lightweight simulator for local development and testing, with no real model servers required.
- Initial Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool, InferenceModel, HTTPRoute, and more.
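To make the filter, score, and pick flow from the modular scheduler highlight more concrete, here is a minimal Go sketch. It is not the project's actual plugin API: every type and function name below (`Pod`, `Filter`, `Scorer`, `WeightedScorer`, `Schedule`, `MaxQueueFilter`, `KVCacheScorer`, `QueueScorer`) is hypothetical and chosen only for illustration, and the scoring formulas are assumptions.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of a candidate backend.
type Pod struct {
	Name       string
	QueueDepth int     // pending requests on the model server
	KVCacheUse float64 // fraction of KV cache in use, 0.0 to 1.0
}

// Filter drops candidates that must not receive the request.
type Filter func(pods []Pod) []Pod

// Scorer rates a single pod; higher is better.
type Scorer func(p Pod) float64

// WeightedScorer pairs a scorer with its relative weight.
type WeightedScorer struct {
	Score  Scorer
	Weight float64
}

// Schedule runs filter -> score -> pick (max weighted score).
// It returns false when no pod survives filtering.
func Schedule(pods []Pod, filters []Filter, scorers []WeightedScorer) (Pod, bool) {
	for _, f := range filters {
		pods = f(pods)
	}
	if len(pods) == 0 {
		return Pod{}, false
	}
	var best Pod
	var bestScore float64
	for i, p := range pods {
		total := 0.0
		for _, ws := range scorers {
			total += ws.Weight * ws.Score(p)
		}
		if i == 0 || total > bestScore {
			best, bestScore = p, total
		}
	}
	return best, true
}

// MaxQueueFilter keeps pods whose queue depth is at most limit.
func MaxQueueFilter(limit int) Filter {
	return func(pods []Pod) []Pod {
		kept := make([]Pod, 0, len(pods))
		for _, p := range pods {
			if p.QueueDepth <= limit {
				kept = append(kept, p)
			}
		}
		return kept
	}
}

// KVCacheScorer prefers pods with more free KV cache.
func KVCacheScorer(p Pod) float64 { return 1.0 - p.KVCacheUse }

// QueueScorer prefers shorter queues: depth 0 scores 1.0, deeper queues score less.
func QueueScorer(p Pod) float64 { return 1.0 / float64(1+p.QueueDepth) }

func main() {
	pods := []Pod{
		{Name: "pod-a", QueueDepth: 12, KVCacheUse: 0.10},
		{Name: "pod-b", QueueDepth: 1, KVCacheUse: 0.40},
		{Name: "pod-c", QueueDepth: 3, KVCacheUse: 0.90},
	}
	best, ok := Schedule(pods,
		[]Filter{MaxQueueFilter(5)},
		[]WeightedScorer{
			{Score: KVCacheScorer, Weight: 1.0},
			{Score: QueueScorer, Weight: 2.0},
		},
	)
	fmt.Println(best.Name, ok) // prints "pod-b true"
}
```

In this sketch, swapping the picker strategy would mean replacing the max-score loop in `Schedule`, and adding routing logic would mean appending another `Filter` or `WeightedScorer`; the real EPP plugin interfaces may be shaped differently.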
What's Changed
- Adding larger logo by @robscott in #630
- Minor fixes to the user guide by @nicolexin in #633
- Add istio to implementations.md by @LiorLieberman in #631
- Update e2e test config by @kfswain in #636
- Fix parsing issue in BBR helm by @rramkumar1 in #638
- fixed bug - sleep is expecting to get a string by @nirrozenbaum in #618
- #632 Add favicon for doc site by @Conor0Callaghan in #634
- Move integration test utils to central package by @rramkumar1 in #626
- BBR readme fixes by @rramkumar1 in #640
- Add integration tests to exercise streaming mode in BBR by @rramkumar1 in #627
- Adding 2 new reviewers to the reviewers alias by @kfswain in #644
- Add initial implementer's guide by @nicolexin in #635
- Update BBR istio.yaml to use FULL_DUPLEX_STREAMED mode by @rramkumar1 in #629
- Docs: Bumps Kgateway to v2.0.0 by @danehans in #646
- remove deprecated v1alpha2.AddToScheme and use v1alpha2.Install instead by @nirrozenbaum in #649
- removed time.sleep and using ticker instead by @nirrozenbaum in #648
- update release version in README by @nirrozenbaum in #653
- fix some issues in e2e tests by @nirrozenbaum in #621
- Refactor scheduler to make it more readable by @liu-cong in #645
- Getting started docs version bump by @SachinVarghese in #654
- expose "Normalized Time Per Output Token" (NTPOT) metric by @kaushikmitr in #643
- Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.23.4 by @dependabot in #657
- Bump google.golang.org/grpc from 1.71.0 to 1.71.1 by @dependabot in #658
- Fix links and description in implementations.md by @xiaolin593 in #650
- fix manifests and description in the user guides by @cr7258 in #652
- Bump github.com/onsi/gomega from 1.36.3 to 1.37.0 by @dependabot in #659
- adjust the gpu deployment to increase max batch size by @ahg-g in #642
- Cleaning up config pkg by @ahg-g in #663
- Rename pkg/body-based-routing to pkg/bbr by @rramkumar1 in #664
- deploy: Enable logging for GKE gateway by default by @smarterclayton in #666
- moved IsPodReady func to podutils by @nirrozenbaum in #662
- removed double loop on docs in hermetic test by @nirrozenbaum in #668
- fix bbr dockerfile that was broken in PR #664 by @nirrozenbaum in #669
- Use dedicated namespace for e2e test code by @rramkumar1 in #661
- cleaning up inferencePool helm docs by @ahg-g in #665
- move inf model IsCritial func out of datastore by @nirrozenbaum in #670
- Consolidating down to FULL_DUPLEX_STREAMED supported ext-proc server by @kfswain in #672
- Document model server compatibility and config options by @liu-cong in #537
- Bump github.com/prometheus/client_model from 0.6.1 to 0.6.2 by @dependabot in #687
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 by @dependabot in #688
- added badges to README by @nirrozenbaum in #682
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.6.0 to 4.7.0 by @dependabot in #686
- docs(gateways): fix Envoy AI Gateway link by @maxbrunet in #700
- minor changes in few places by @nirrozenbaum in #702
- Docs: Adds Kgateway Cleanup to Quickstart by @danehans in #701
- using namespaced name by @nirrozenbaum in #707
- EPP Architecture proposal by @kfswain in #683
- removed unused Fake struct by @nirrozenbaum in #723
- epp: return correct response for trailers by @howardjohn in #726
- Refactor scheduler to run plugins by @liu-cong in #677
- Complete the InferencePool documentation by @nicolexin in #673
- reduce log level in metrics logger not to trash the log by @nirrozenbaum in #708
- few updates in datastore by @nirrozenbaum in #713
- scheduler restructuring by @nirrozenbaum in #730
- filter irrelevant pods in pod controller by @nayihz in #696
- EPP: Update GetRandomPod() to return nil if no pods exist by @danehans in #731
- Move filter and scorer plugins registration to a separate file by @mayabar in #729
- Update issue templates by @kfswain in #738
- docs: add concepts and definitions to README.md by @shaneutt in #734
- Add unit tests for pod APIs under pkg/datastore by @rlakhtakia in #712
- added a target dedicated for running unit-test only by @nirrozenbaum in #739
- Updating proposal directories to match their PR number by @kfswain in #741
- Fixing errors in new template & disabling the default blank template by @kfswain in #742
- fixed broken link to implementations by @nirrozenbaum in https://githu...
v0.4.0-rc.1
TL;DR
- We have made a major refactor of the EPP, allowing for a more modular and maintainable system.
- As part of this overhaul, we have implemented a pluggable, extensible scheduler system that lets users create their own custom, sophisticated routing logic.
- We have also included native support for Prefix Cache Aware Routing.
What's Changed
- Adding larger logo by @robscott in #630
- Minor fixes to the user guide by @nicolexin in #633
- Add istio to implementations.md by @LiorLieberman in #631
- Update e2e test config by @kfswain in #636
- Fix parsing issue in BBR helm by @rramkumar1 in #638
- fixed bug - sleep is expecting to get a string by @nirrozenbaum in #618
- #632 Add favicon for doc site by @Conor0Callaghan in #634
- Move integration test utils to central package by @rramkumar1 in #626
- BBR readme fixes by @rramkumar1 in #640
- Add integration tests to exercise streaming mode in BBR by @rramkumar1 in #627
- Adding 2 new reviewers to the reviewers alias by @kfswain in #644
- Add initial implementer's guide by @nicolexin in #635
- Update BBR istio.yaml to use FULL_DUPLEX_STREAMED mode by @rramkumar1 in #629
- Docs: Bumps Kgateway to v2.0.0 by @danehans in #646
- remove deprecated v1alpha2.AddToScheme and use v1alpha2.Install instead by @nirrozenbaum in #649
- removed time.sleep and using ticker instead by @nirrozenbaum in #648
- update release version in README by @nirrozenbaum in #653
- fix some issues in e2e tests by @nirrozenbaum in #621
- Refactor scheduler to make it more readable by @liu-cong in #645
- Getting started docs version bump by @SachinVarghese in #654
- expose "Normalized Time Per Output Token" (NTPOT) metric by @kaushikmitr in #643
- Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.23.4 by @dependabot in #657
- Bump google.golang.org/grpc from 1.71.0 to 1.71.1 by @dependabot in #658
- Fix links and description in implementations.md by @xiaolin593 in #650
- fix manifests and description in the user guides by @cr7258 in #652
- Bump github.com/onsi/gomega from 1.36.3 to 1.37.0 by @dependabot in #659
- adjust the gpu deployment to increase max batch size by @ahg-g in #642
- Cleaning up config pkg by @ahg-g in #663
- Rename pkg/body-based-routing to pkg/bbr by @rramkumar1 in #664
- deploy: Enable logging for GKE gateway by default by @smarterclayton in #666
- moved IsPodReady func to podutils by @nirrozenbaum in #662
- removed double loop on docs in hermetic test by @nirrozenbaum in #668
- fix bbr dockerfile that was broken in PR #664 by @nirrozenbaum in #669
- Use dedicated namespace for e2e test code by @rramkumar1 in #661
- cleaning up inferencePool helm docs by @ahg-g in #665
- move inf model IsCritial func out of datastore by @nirrozenbaum in #670
- Consolidating down to FULL_DUPLEX_STREAMED supported ext-proc server by @kfswain in #672
- Document model server compatibility and config options by @liu-cong in #537
- Bump github.com/prometheus/client_model from 0.6.1 to 0.6.2 by @dependabot in #687
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 by @dependabot in #688
- added badges to README by @nirrozenbaum in #682
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.6.0 to 4.7.0 by @dependabot in #686
- docs(gateways): fix Envoy AI Gateway link by @maxbrunet in #700
- minor changes in few places by @nirrozenbaum in #702
- Docs: Adds Kgateway Cleanup to Quickstart by @danehans in #701
- using namespaced name by @nirrozenbaum in #707
- EPP Architecture proposal by @kfswain in #683
- removed unused Fake struct by @nirrozenbaum in #723
- epp: return correct response for trailers by @howardjohn in #726
- Refactor scheduler to run plugins by @liu-cong in #677
- Complete the InferencePool documentation by @nicolexin in #673
- reduce log level in metrics logger not to trash the log by @nirrozenbaum in #708
- few updates in datastore by @nirrozenbaum in #713
- scheduler restructuring by @nirrozenbaum in #730
- filter irrelevant pods in pod controller by @nayihz in #696
- EPP: Update GetRandomPod() to return nil if no pods exist by @danehans in #731
- Move filter and scorer plugins registration to a separate file by @mayabar in #729
- Update issue templates by @kfswain in #738
- docs: add concepts and definitions to README.md by @shaneutt in #734
- Add unit tests for pod APIs under pkg/datastore by @rlakhtakia in #712
- added a target dedicated for running unit-test only by @nirrozenbaum in #739
- Updating proposal directories to match their PR number by @kfswain in #741
- Fixing errors in new template & disabling the default blank template by @kfswain in #742
- fixed broken link to implementations by @nirrozenbaum in #750
- Weighted scorers by @nirrozenbaum in #737
- add max score picker by @nirrozenbaum in #752
- Add GetEnvString helper function by @liu-cong in #758
- Bump the kubernetes group with 6 updates by @dependabot in #754
- extract pod representation from backend/metrics to backend by @nirrozenbaum in #751
- Request for adding Alibaba Cloud Container Service for Kubernetes...
v0.3.0
tl;dr
- FULL_DUPLEX_STREAMED is on by default
- We have helm charts published for InferencePool
- Many smaller polish items resolved
What's Changed
- Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
- Fix: Updates Docs for Sidecar Requirements by @danehans in #484
- Switch default serving and health check ports for bbr by @rramkumar1 in #487
- Fix: e2e test dir and manifest naming by @danehans in #488
- Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
- Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
- Improve response handling issues. by @kfswain in #494
- Add metrics for BBR extension by @rramkumar1 in #468
- [Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
- added support for testing cpu example in e2e tests by @nirrozenbaum in #485
- Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
- Update GO version to 1.24 by @BenjaminBraunDev in #501
- Fixing image build and adding image building to test runs by @kfswain in #502
- Create inference model/pool objects in memory instead of reading them from files by @ahg-g in #505
- Refactor the integration tests setup by @ahg-g in #506
- fix log line by @ahg-g in #509
- update release version by @nirrozenbaum in #512
- Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
- switch to using formal vllm-cpu image by @nirrozenbaum in #511
- cleanup logging by @kfswain in #514
- Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
- Bump the kubernetes group with 6 updates by @dependabot in #520
- Update extension-policy to match the new epp service name by @ahg-g in #522
- Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
- Refactor beforeSuite in integration tests by @ahg-g in #508
- Split the extension policy since it is envoy specific by @ahg-g in #524
- Docs: Uses tabs for quickstart model server options by @danehans in #527
- Add instructions to run benchmarks by @liu-cong in #480
- add helm template by @Kuromesi in #416
- bump vllm-cpu image to latest by @nirrozenbaum in #530
- removed hf token from cpu based example by @nirrozenbaum in #464
- Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
- Move benchmark under tools by @liu-cong in #534
- fixed rbac in helm chart by @ahg-g in #531
- Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
- Simplifying EPP-side buffer by @kfswain in #538
- integration test stability improvements by @kfswain in #541
- Add inferencepool chart push mechanics by @ahg-g in #540
- Updated the image used for cloudbuild by @ahg-g in #542
- setting gotoolchain to auto by @ahg-g in #543
- Simplify body streaming for BBR by @rramkumar1 in #544
- Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
- [Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
- Add some more unit tests for BBR by @rramkumar1 in #545
- Tag the main version of the helm chart with v0 by @ahg-g in #547
- Default to streaming mode by @ahg-g in #552
- Initial helm chart for bbr by @rramkumar1 in #546
- Add makefile configs for bbr helm chart by @rramkumar1 in #553
- Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
- Allow bodyless requests to passthrough EPP by @kfswain in #555
- remove controller-runtime dependency from API by @kfswain in #565
- Swapping out flow image by @kfswain in #562
- Update boilerplate template by @kfswain in #566
- Allow partial metric updates by @liu-cong in #561
- Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
- Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
- Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
- Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
- Configure the vllm deployment with best practices for startup by @smarterclayton in #550
- Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
- Cleanup logging in the request scheduling path by @ahg-g in #583
- minor update to Makefile by @nirrozenbaum in #588
- Adding printer columns to inference model by @kfswain in #574
- Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
- helm-improvements by @LiorLieberman in #590
- Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
- Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
- Renaming resources to better mirror how names are expected to be used by @kfswain in #592
- update algorithm parameters from env variables by @kaushikmitr in #580
- update benchmarking guide with latest results wi...
v0.3.0-rc.1
What's Changed
- Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
- Fix: Updates Docs for Sidecar Requirements by @danehans in #484
- Switch default serving and health check ports for bbr by @rramkumar1 in #487
- Fix: e2e test dir and manifest naming by @danehans in #488
- Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
- Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
- Improve response handling issues. by @kfswain in #494
- Add metrics for BBR extension by @rramkumar1 in #468
- [Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
- added support for testing cpu example in e2e tests by @nirrozenbaum in #485
- Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
- Update GO version to 1.24 by @BenjaminBraunDev in #501
- Fixing image build and adding image building to test runs by @kfswain in #502
- Create inference model/pool objects in memory instead of reading them from files by @ahg-g in #505
- Refactor the integration tests setup by @ahg-g in #506
- fix log line by @ahg-g in #509
- update release version by @nirrozenbaum in #512
- Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
- switch to using formal vllm-cpu image by @nirrozenbaum in #511
- cleanup logging by @kfswain in #514
- Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
- Bump the kubernetes group with 6 updates by @dependabot in #520
- Update extension-policy to match the new epp service name by @ahg-g in #522
- Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
- Refactor beforeSuite in integration tests by @ahg-g in #508
- Split the extension policy since it is envoy specific by @ahg-g in #524
- Docs: Uses tabs for quickstart model server options by @danehans in #527
- Add instructions to run benchmarks by @liu-cong in #480
- add helm template by @Kuromesi in #416
- bump vllm-cpu image to latest by @nirrozenbaum in #530
- removed hf token from cpu based example by @nirrozenbaum in #464
- Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
- Move benchmark under tools by @liu-cong in #534
- fixed rbac in helm chart by @ahg-g in #531
- Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
- Simplifying EPP-side buffer by @kfswain in #538
- integration test stability improvements by @kfswain in #541
- Add inferencepool chart push mechanics by @ahg-g in #540
- Updated the image used for cloudbuild by @ahg-g in #542
- setting gotoolchain to auto by @ahg-g in #543
- Simplify body streaming for BBR by @rramkumar1 in #544
- Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
- [Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
- Add some more unit tests for BBR by @rramkumar1 in #545
- Tag the main version of the helm chart with v0 by @ahg-g in #547
- Default to streaming mode by @ahg-g in #552
- Initial helm chart for bbr by @rramkumar1 in #546
- Add makefile configs for bbr helm chart by @rramkumar1 in #553
- Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
- Allow bodyless requests to passthrough EPP by @kfswain in #555
- remove controller-runtime dependency from API by @kfswain in #565
- Swapping out flow image by @kfswain in #562
- Update boilerplate template by @kfswain in #566
- Allow partial metric updates by @liu-cong in #561
- Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
- Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
- Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
- Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
- Configure the vllm deployment with best practices for startup by @smarterclayton in #550
- Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
- Cleanup logging in the request scheduling path by @ahg-g in #583
- minor update to Makefile by @nirrozenbaum in #588
- Adding printer columns to inference model by @kfswain in #574
- Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
- helm-improvements by @LiorLieberman in #590
- Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
- Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
- Renaming resources to better mirror how names are expected to be used by @kfswain in #592
- update algorithm parameters from env variables by @kaushikmitr in #580
- update benchmarking guide with latest results with vllm v1 by @kaushikmitr in #559
- Added provider support to InferencePool helm chart by @ahg-g in #595
- make dynamic lora sidecar health check parameters configurable and for...