Releases: kubernetes-sigs/gateway-api-inference-extension

v0.5.1

23 Jul 20:04

This patch release fixes a few bugs. Justification and breakdown: #1215

v0.5.1-rc.1

22 Jul 23:20
Pre-release

This patch release fixes a few bugs. Justification and breakdown: #1215

v0.5.0

21 Jul 18:20
38577e6

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make reusing the Config API easy.

v0.5.0-rc.3

20 Jul 07:08
bbe9dda
Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make reusing the Config API easy.

v0.5.0-rc.2

16 Jul 16:56
7fa8fc0
Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make reusing the Config API easy.

v0.5.0-rc.1

15 Jul 21:05
73fd266
Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

What's Changed

  • Add scripts for running e2es by @keithmattix in #978
  • fix: istio example destination rule by @EyalPazz in #970
  • Bump Istio tag reference by @keithmattix in #974
  • adds New functions to the scorers for consistency by @nirrozenbaum in #975
  • feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
  • e2e makefile comment fix by @nirrozenbaum in #976
  • API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
  • feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
  • moved the creation of the context to main.go. by @nirrozenbaum in #995
  • doc: fix dead links by @caozhuozi in #989
  • feat: add health check for epp cluster by @zhengkezhou1 in #966
  • test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
  • Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
  • refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
  • feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
  • feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
  • adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
  • feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
  • fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
  • feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
  • refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
  • fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
  • bump cpu deployment version by @nirrozenbaum in #1016
  • fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
  • Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
  • Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
  • profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
  • cleanup after config api PR was merged by @nirrozenbaum in #1012
  • Making inferenceModel optional by @kfswain in #1024
  • Adding Design Principles by @robscott in #596
  • Adding Nir as a maintainer! by @kfswain in #1026
  • [Fix] Missing property "apiGroup" error by @yafengio in #1015
  • API: Adds default status condition to InferencePool by @danehans in #830
  • feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
  • update sim deployment tag to latest by @nirrozenbaum in #1041
  • refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
  • docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
  • added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
  • feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
  • feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
  • Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
  • chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
  • chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
  • Only create LOCALBIN directory when it does not exist by @elevran in #1054
  • remove datastore dependency from the scheduler by @nirrozenbaum in #1049
  • add e2e test for epp metrics by @delavet in #938
  • refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
  • Reintroduce Plugin.Name() by @elevran in #1057
  • Extensible/Pluggable data layer proposal by @elevran in #1023
  • Add subsetting logic for epp by @rlakhtakia in #981
  • docs: added gke clean up instructions by @capri-xiyue in #1064
  • feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
  • refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
  • feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
  • Adding pprof endpoints to metrics port by @kfswain in #1069
  • version in README by @nirrozenbaum in #1072
  • feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
  • Update model server protocol with prefix cache reuse by @liu-cong in #1077
  • Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
  • refactor(conformance) merge similar utility functions. by @zetxqx in #1055
  • fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
  • e2e cleanup by @nirrozenbaum in #988
  • fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
  • API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in #1070
  • Tidy up Data Layer documentation by @elevran in https:...

v0.4.0

23 Jun 04:32
2b5b337

Overview

We are thrilled to announce the v0.4.0 release—our biggest update yet! This version brings powerful new Endpoint Picker (EPP) scheduler capabilities, performance improvements, and initial Gateway conformance tests.

Major Highlights

  • Modular Endpoint Picker (EPP) Scheduler: A kube-scheduler–style plugin API lets you build custom routing logic,
    filter and score backends, or swap in new picker strategies without touching core code.

  • Prefix-Cache-Aware Routing: Dramatically lower tail latency by routing requests to the model servers most likely to
    have the prompt prefix already cached, improving response times under load.

  • Richer Metrics: Gain deeper insights with new metrics including:

    • NTPOT (Normalized Time Per Output Token)
    • Scheduler latency
    • Per-pod queue depth
    • Build and version info

  • Optional vLLM Simulator Backend: Spin up a lightweight simulator for local development and testing, with no real
    model servers required.

  • Initial Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.
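
To make the filter, score, and pick flow of a kube-scheduler-style plugin pipeline concrete, here is a minimal sketch. It is written in Python for brevity (the project itself is written in Go) and is not the EPP's actual API: `Endpoint`, `schedule`, `queue_depth`, and the example plugins are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical view of a backend; the fields are illustrative only."""
    name: str
    queue_depth: int      # assumed metric: requests waiting on this backend
    healthy: bool = True


# A filter keeps or drops an endpoint; a scorer rates it (higher is better).
Filter = Callable[[Endpoint], bool]
Scorer = Callable[[Endpoint], float]


def schedule(
    endpoints: List[Endpoint],
    filters: List[Filter],
    scorers: List[Tuple[Scorer, float]],
) -> Endpoint:
    """Run the filter, score, and pick stages and return the chosen endpoint."""
    # Filter stage: an endpoint must pass every filter to stay a candidate.
    candidates = [e for e in endpoints if all(f(e) for f in filters)]
    if not candidates:
        raise LookupError("no endpoint passed the filters")
    # Score stage: weighted sum of all scorer outputs per candidate.
    totals = {e.name: sum(w * s(e) for s, w in scorers) for e in candidates}
    # Pick stage: highest total wins (first candidate wins ties).
    return max(candidates, key=lambda e: totals[e.name])


# Example plugins: drop unhealthy backends, prefer short queues.
def is_healthy(e: Endpoint) -> bool:
    return e.healthy


def short_queue(e: Endpoint) -> float:
    return 1.0 / (1 + e.queue_depth)


pool = [
    Endpoint("pod-a", queue_depth=5),
    Endpoint("pod-b", queue_depth=1),
    Endpoint("pod-c", queue_depth=0, healthy=False),
]
chosen = schedule(pool, [is_healthy], [(short_queue, 1.0)])
print(chosen.name)
```

In this sketch, swapping the routing strategy only means passing a different list of filters or scorers; `schedule` itself stays unchanged, which is the property the highlight describes as not touching core code.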

v0.4.0-rc.1

13 Jun 15:28
Pre-release

TL;DR

  • We have made a major refactor of the EPP, making the system more modular and maintainable.
    • As part of this overhaul, we have implemented a pluggable, extensible scheduler system that allows users to create their own custom, sophisticated routing logic.
  • We have also included native support for Prefix Cache Aware Routing.
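
As a rough illustration of the idea behind prefix-cache-aware routing, the sketch below scores each server by how much of a prompt's leading text it has already served. This is an assumption-laden toy in Python, not the project's implementation: `BLOCK_SIZE`, `block_hashes`, `prefix_score`, `pick_server`, and the in-memory `cache_index` are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set

BLOCK_SIZE = 4  # characters per block; an illustrative value only


def block_hashes(prompt: str) -> List[str]:
    """Hash the prompt in fixed-size blocks, chaining each hash to the previous
    one so that a block hash identifies the whole prefix up to that block."""
    hashes: List[str] = []
    prev = ""
    for i in range(0, len(prompt) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = prompt[i:i + BLOCK_SIZE]
        prev = hashlib.sha256((prev + block).encode()).hexdigest()
        hashes.append(prev)
    return hashes


def prefix_score(prompt: str, seen: Set[str]) -> float:
    """Fraction of the prompt's blocks, counted from the start, that a server
    is assumed to have cached. Stops at the first block it has not seen."""
    hashes = block_hashes(prompt)
    if not hashes:
        return 0.0
    matched = 0
    for h in hashes:
        if h not in seen:
            break
        matched += 1
    return matched / len(hashes)


def pick_server(prompt: str, cache_index: Dict[str, Set[str]]) -> str:
    """Return the server with the longest matching cached prefix."""
    return max(cache_index, key=lambda s: prefix_score(prompt, cache_index[s]))


# pod-a has previously served the shared system prompt; pod-b has served nothing.
system_prompt = "You are a helpful assistant."
cache_index = {
    "pod-a": set(block_hashes(system_prompt)),
    "pod-b": set(),
}
request = system_prompt + " Hi!"
print(pick_server(request, cache_index))
```

A real scheduler would combine such a score with load signals (for example queue depth) so that a popular prefix does not overload a single server.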

v0.3.0

02 Apr 23:01
d172cbb

tl;dr

  • FULL_DUPLEX_STREAMED is on by default
  • We have helm charts published for InferencePool
  • Many smaller polish items resolved

v0.3.0-rc.1

01 Apr 03:45
Pre-release
