New and noteworthy
This release focuses on sharing the experimental features we are developing and letting users try them:
- Flow Control is available as an experimental feature! To enable it, set the ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER env var to true (this can be done from the helm chart). Docs are WIP and coming soon!
- Multi-port support is available with GW implementations that also support it. This enables sophisticated features like Wide EP. GW provider support is forthcoming.
- Multi-Cluster support: the API surface has been extended to experimentally support multi-cluster. Docs are WIP and coming soon!
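The Flow Control toggle above is an ordinary container environment variable. As a minimal sketch, assuming it is set directly on the EPP container (these notes do not say which helm values key exposes it, so that part is left out), the standard Kubernetes env entry would look roughly like this:

```yaml
# Fragment of a container spec in standard Kubernetes syntax.
# Only the variable name comes from these notes; where this fragment sits
# in your manifests or chart values is an assumption to verify.
env:
  - name: ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER
    value: "true"   # env values must be strings, so the boolean is quoted
```

Removing the entry, or setting it to "false", should leave the experimental layer disabled, though the exact parsing rules are not documented here.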
What's Changed
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.24.0 to 2.25.1 by @dependabot[bot] in #1467
- adding warning labels to site while we update docs by @kfswain in #1466
- removed datastore dependency from saturation detector by @nirrozenbaum in #1293
- Add initial troubleshooting guide by @nicolexin in #1430
- fix: Pin to vllm v0.8.5 by @capri-xiyue in #1453
- chore(deps): bump google.golang.org/protobuf from 1.36.7 to 1.36.8 by @dependabot[bot] in #1471
- Updating proposal statuses by @kfswain in #1472
- chore(deps): bump google.golang.org/grpc from 1.74.2 to 1.75.0 by @dependabot[bot] in #1468
- fix(conformance): remove the inferenceObjective dependency completely. by @zetxqx in #1477
- chore(deps): bump github.com/onsi/gomega from 1.38.0 to 1.38.1 by @dependabot[bot] in #1469
- chore(deps): bump github.com/stretchr/testify from 1.10.0 to 1.11.0 by @dependabot[bot] in #1470
- Add Alibaba Cloud ack-gie conformance report for v0.5.1 by @delavet in #1478
- pin vllm gpu image by @nirrozenbaum in #1479
- [docs] Updating the FAQ by @kfswain in #1474
- Fix troubleshooting guide by @nicolexin in #1485
- Update provider name in helm chart for GKE to be not case sensitive by @rahulgurnani in #1486
- refactor(registry): Replace event-driven GC with a lease-based lifecycle by @LukeAVanDrie in #1476
- use context in indexer go routine instead of context.TODO by @nirrozenbaum in #1491
- cleanup: make port definitions symmetric and clean endpointPickerRef.port by @capri-xiyue in #1484
- Adding conformance report for Kubvernor by @dawid-nowak in #1316
- fix(conformance) add endpointConfig port in conformance test for v1 as it's required for service kind now. by @zetxqx in #1499
- feat(conformance): Use CRD annotation to populate the ConformanceReport GatewayAPIInferenceExtensionVersion by @zetxqx in #1214
- if request id was not supplied in header, generate uuid by @nirrozenbaum in #1490
- prefix state temp fix - write state to both plugin state and cycle state by @nirrozenbaum in #1509
- Fix(datastore): Correct inverted log messages in podResyncAll by @fyuan1316 in #1511
- Add WeightedRandomPicker by @Jooho in #1412
- allow setting custom plugins file through helm by @nirrozenbaum in #1508
- feat(helm): add affinity and tolerations to epp-deployment by @hhk7734 in #1504
- typo: add vLLM Prefix Cache & LoRA Adapters links by @zhengkezhou1 in #1280
- docs: update BBR guide by @chewong in #1517
- Docs: updated docs to include network service api enable for gke by @capri-xiyue in #1435
- cleanup modelName from inferenceObjective. by @zetxqx in #1521
- fix: fixed helm by @capri-xiyue in #1522
- Perf updates by @kfswain in #1523
- fix serve multiple genai models md file by @learner0810 in #1527
- update guides docs to fix miss guide by @Frapschen in #1532
- minor updates and godoc to weighted random picker by @nirrozenbaum in #1514
- follow up - improving logging perf issues in few more places by @nirrozenbaum in #1528
- Fixing test flake by @kfswain in #1534
- Update vllm image version for CPU deployment by @rahulgurnani in #1526
- adding elevran as code reviewer instead of shaneutt by @nirrozenbaum in #1533
- Update helm chart Readme with custom plugin config by @rahulgurnani in #1516
- Bumps k8s.io Deps to v0.34.0 by @danehans in #1537
- Helm fix by @nirrozenbaum in #1540
- Make apiVersion configurable for inferencePool in the helm charts by @rahulgurnani in #1542
- Update guide for better clarity and to avoid errors by @rlakhtakia in #1475
- fix helm chart support for gke v1alpha2. by @zetxqx in #1551
- chore: bump sim model server version by @nirrozenbaum in #1555
- remove duplicated section in quickstart guide by @nirrozenbaum in #1553
- Merge shuffle score pods logic by @learner0810 in #1552
- Update getting started guide with 1.0 release by @rahulgurnani in #1557
- Added envoy proxy ai-gateway by @learner0810 in #1554
- fix-import-groups by @learner0810 in #1560
- chore(deps): bump golang.org/x/sync from 0.16.0 to 0.17.0 by @dependabot[bot] in #1549
- chore(deps): bump sigs.k8s.io/controller-tools from 0.18.0 to 0.19.0 by @dependabot[bot] in #1548
- fix flake in weighted random picker by @nirrozenbaum in #1561
- Main uniquely name crbac by @Gregory-Pereira in #1564
- Update priority in EPP flow control from uint to int by @rahulgurnani in #1518
- rename inference_model metrics to inference_objective by @JeffLuoo in #1567
- add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #1570
- Refactor LLMRequest: Structured RequestData for Completions & Chat-Completions by @vMaroon in #1446
- epp servicemonitor by @sallyom in #1425
- remove scheduler epp flowchart by @kaushikmitr in #1573
- Fixes to overview.md and inferencemodel.md by @DamianSawicki in #1281
- fix prefix plugin unit-test by @vMaroon in #1575
- Update the endpoint picker diagram by @ahg-g in #1572
- Docs: Updates Intro Diagram by @danehans in #1577
- Conformance: Adds v1.0.0 Gateway Conformance Report for Kgateway by @danehans in #1579
- Updating the doc site by @kfswain in #1500
- feat: Add top-level Flow Controller by @LukeAVanDrie in #1525
- chore: remove unneeded logging before returning error by @phuhung273 in #1544
- update istio version to v1.28 in guide document by @flpanbin in #1582
- docs: dashboards README metrics link fix by @JaredTan95 in #1589
- fix(deps): update prometheus/common to v0.66.1 and prometheus/client_golang to v1.23.1 by @JeffLuoo in #1580
- chore(deps): bump github.com/prometheus/client_golang from 1.23.0 to 1.23.2 by @dependabot[bot] in #1550
- One liner: add COPY api ./api command to resolve import dependency in… by @davidbreitgand in #1585
- Enhance pool namespace resolution: use flag if set, else NAMESPACE en… by @jyizheng in #1578
- Scrubbing out more InferenceModel references by @kfswain in #1592
- cleanup of quickstart and readme by @nirrozenbaum in #1588
- Conformance: Adds Report for Kgateway with Agentgateway by @danehans in #1587
- refactor: flow control config by @LukeAVanDrie in #1581
- chore(deps): bump google.golang.org/grpc from 1.75.0 to 1.75.1 by @dependabot[bot] in #1596
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.25.1 to 2.25.3 by @dependabot[bot] in #1597
- Deprecate inferencepool-resources.yaml by @rahulgurnani in #1586
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1594
- chore(deps): bump google.golang.org/protobuf from 1.36.8 to 1.36.9 by @dependabot[bot] in #1595
- Add makefile entries for api-lint by @rikatz in #1384
- test: Refactor end to end test code to enable reuse downstream by @shmuelk in #1515
- Add BBR docs, example deployment by @srampal in #1498
- fix(conformance): Use pinned version of EPP for conformance test instead of main. by @zetxqx in #1262
- Updates gateway manifests for v1 InferencePool by @danehans in #1603
- docs: remove duplication in multimodel serving guide by @sagiahrac in #1605
- Docs: Updates kgateway and agentgateway Implementations for v1.0.0 by @danehans in #1607
- Fix some markdown formatting errors by @srampal in #1609
- helm: add NAMESPACE env var to EPP deployment for pool namespace by @jyizheng in #1610
- chore: bump sim version by @nirrozenbaum in #1612
- add gke monitoring helm support. by @zetxqx in #1600
- Rename prefix scorer HashBlockSize to BlockSize by @Frapschen in #1613
- Remove duplicate gcpbackendpolicy and healthcheckpolicy config by @liu-cong in #1618
- Consolidate ha config into a single enableLeaderElection, also fix rolling update stuck bug by @liu-cong in #1620
- improve the readability of the test-unit make target by @Frapschen in #1614
- feat: Adapt flow control to per-request saturation by @LukeAVanDrie in #1622
- fix missing GCPBackendPolicy. by @zetxqx in #1623
- use replicas field in helm to decide if EPP should run in HA mode by @nirrozenbaum in #1628
- Explicitly set pool-namespace flag for wider compatibility by @liu-cong in #1633
- Bug fix - error calling gt: incompatible types for comparison: float64 and int by @liu-cong in #1630
- enable istio as a provider + configuring destinationRule by @Gregory-Pereira in #1381
- Update to v1.0.1-rc1 helm chart which fixed many bugs by @liu-cong in #1641
- [docs] Fix indentation for MkDocs Admonitions display issue by @yankay in #1643
- update automatically the helm chart version used in the quickstart guide by @learner0810 in #1645
- [chore] Fix serve-multiple-genai-models.md mistake by @Frapschen in #1647
- remove istio destinationrule creation from quickstart by @nirrozenbaum in #1648
- Adds WeightedRandomPicker plugin description by @learner0810 in #1657
- chore(deps): bump github.com/prometheus/prometheus from 0.305.0 to 0.306.0 by @dependabot[bot] in #1634
- Loosen validation for inference pool crd to support up to 8 ports per inference pool by @syw14 in #1653
- Adding simple helm CI by @kfswain in #1635
- Updating proposal process to reflect how we are currently operating by @kfswain in #1660
- chore: update quickstart to use latest release instead of rc by @nirrozenbaum in #1662
- support vLLM cache salting in prefix aware scorer by @Frapschen in #1646
- Quick doc update by @kfswain in #1659
- update make target naming in helm chart push commands by @kfswain in #1666
- Proposal for Multi-Cluster InferencePools by @bexxmodd in #1374
- Update inference_gateway.json by @mnmehta in #1676
- Adding Kubvernor to the list of implementors by @dawid-nowak in #1313
- Multi-Cluster: Adds v1alpha1 API Types and Docs by @danehans in #1658
- chore(deps): bump google.golang.org/protobuf from 1.36.9 to 1.36.10 by @dependabot[bot] in #1684
- chore(deps): bump github.com/envoyproxy/go-control-plane/envoy from 1.32.4 to 1.35.0 by @dependabot[bot] in #1686
- chore(deps): bump google.golang.org/grpc from 1.75.1 to 1.76.0 by @dependabot[bot] in #1685
- Use actual platform architecture when building images by @shmuelk in #1681
- chore: bump controller-tools version in Makefile by @nirrozenbaum in #1694
- Dep: Bumps Gateway API to v1.4.0 by @danehans in #1691
- updated link to the updated recordings youtube channel by @nirrozenbaum in #1692
- docs: add epp version history by @zhengkezhou1 in #1360
- [traces] init the trace sdk by @Frapschen in #1638
- feat: Add GetActivePods to handle/datastore and remove deleted pod from prefix-cache scorer by @kfirtoledo in #1376
- Conformance: Adds Weight-Based Traffic Splitting Test by @danehans in #1669
- feat(fc): Initial wiring of the flow control layer by @LukeAVanDrie in #1701
- chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.36.0 to 1.38.0 by @dependabot[bot] in #1709
- chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.1 to 0.22.3 by @dependabot[bot] in #1711
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.26.0 by @dependabot[bot] in #1712
- chore(deps): bump github.com/prometheus/common from 0.66.1 to 0.67.1 by @dependabot[bot] in #1710
- conformance: Rename infra namespace by @davidjumani in #1667
- Fix LabelSelector validation markers for map field by @KillianGolds in #1679
- feat: Flow Control context refactor by @LukeAVanDrie in #1702
- feat: Add initial Flow Control metrics by @LukeAVanDrie in #1714
- Docs: added migration guide by @capri-xiyue in #1558
- Fix prefix-cache-scorer benchmark panic by @Frapschen in #1664
- Docs: Versions the quickstart guide by @danehans in #1604
- Auto-configure prefix-cache-scorer parameters from engine metrics by @learner0810 in #1629
- Adds PATCH Variable to Release Script by @danehans in #1723
- chore: Improve Flow Control docs and logging by @LukeAVanDrie in #1726
- Add manifest outputs that split v1 and experimental by @learner0810 in #1644
- Fixed epp pod starting but not working when using multiple schedulingProfiles by @learner0810 in #1698
- Add missing yq dependency for `make artifacts` by @yankay in #1728
- Fix(test): resolve data race in StreamedRequest by @LukeAVanDrie in #1727
- Docs: Adds k8s supported versions by @danehans in #1731
- Break PostResponse `requestcontrol` plugin into 3 separate plugins to add streamed request functionality by @BenjaminBraunDev in #1661
- Conformance: use gateway api utility functions from v1.4.0 and minor fix typos. by @zetxqx in #1704
- Adding a flag to control whether auth is added to the EPP metrics server by @Frapschen in #1639
- Support for vLLM Data parallel by @shmuelk in #1663
- Docs: Bumps Quickstart to v1.0.2 Release by @danehans in #1745
- Docs: Updates Kgateway in Quickstart by @danehans in #1740
- Align datalayer/metrics with PR #1580 by @elevran in #1749
- Add Install Gateway Section in Getting Started guide by @dharaneeshvrd in #1673
- chore(deps): bump github.com/prometheus/prometheus from 0.306.0 to 0.307.1 by @dependabot[bot] in #1753
- support multi modal inputs by @learner0810 in #1617
New Contributors
- @dawid-nowak made their first contribution in #1316
- @fyuan1316 made their first contribution in #1511
- @Jooho made their first contribution in #1412
- @hhk7734 made their first contribution in #1504
- @Frapschen made their first contribution in #1532
- @sallyom made their first contribution in #1425
- @DamianSawicki made their first contribution in #1281
- @phuhung273 made their first contribution in #1544
- @flpanbin made their first contribution in #1582
- @JaredTan95 made their first contribution in #1589
- @davidbreitgand made their first contribution in #1585
- @jyizheng made their first contribution in #1578
- @srampal made their first contribution in #1498
- @sagiahrac made their first contribution in #1605
- @syw14 made their first contribution in #1653
- @mnmehta made their first contribution in #1676
- @davidjumani made their first contribution in #1667
- @KillianGolds made their first contribution in #1679
- @dharaneeshvrd made their first contribution in #1673
Full Changelog: v1.0.2...v1.1.0