feat(flowcontrol): Implement registry shard #1187


Conversation

LukeAVanDrie
Contributor

This work tracks #674

This pull request introduces the operational layer of the new sharded Flow Registry architecture. It delivers the concurrent, high-performance "data plane" that each FlowController worker will interact with, paving the way for the top-level administrative control plane (FlowRegistry) to be built in a subsequent PR.

The core architectural principle here is the separation of concerns between the global registry's administrative functions and the per-worker operational view. This PR focuses exclusively on the latter by implementing the registryShard.

The key components introduced are:

  • registry.Config: A new configuration object that defines the master layout for the entire registry. It includes robust validation of compatibility between policies and queues, defaulting logic, and a partition() method that distributes capacity allocations across all shards (see the first sketch after this list).

  • registry.registryShard: This is the heart of the PR. It is a concrete, concurrency-safe implementation of the contracts.RegistryShard interface. It provides a FlowController worker with its partitioned view of the system state, managing all flow queues and policies within its boundary. Concurrency is managed via an RWMutex for structural map changes and lock-free atomic operations for all statistics, ensuring high performance on the hot path.

  • registry.managedQueue: A critical decorator that wraps a framework.SafeQueue. It serves two essential functions in the architecture:

    1. Atomic Statistics: It maintains its own counters and uses a callback to atomically reconcile its state with its parent shard. This is the key to enabling lock-free, aggregated statistics at the shard and global levels (see the second sketch after this list).
    2. Lifecycle Enforcement: It tracks the queue's state (active vs. draining). This is a forward-looking feature that is essential for enabling graceful administrative actions in the future, such as changing a flow's priority or decommissioning a shard without losing requests.
  • Contracts and Errors: New sentinel errors (ErrFlowInstanceNotFound, ErrPriorityBandNotFound, ErrPolicyQueueIncompatible) have been added to the contracts package to create a stable and predictable API for consumers of the registry.
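
To make the partitioning idea concrete, here is a minimal sketch of how a partition() method can split a global capacity limit across shards. The Config shape shown here (a single MaxBytes field) and the remainder handling are illustrative assumptions, not the actual configuration:

```go
// Sketch only: illustrates the capacity-partitioning idea described
// above. The MaxBytes field is an assumed stand-in for the real
// capacity configuration.
package registry

// Config is a simplified stand-in for the registry's master layout.
type Config struct {
	// MaxBytes is the global capacity limit to split across shards.
	MaxBytes uint64
}

// partition returns the slice of configuration owned by shard `index`
// out of `total` shards, spreading any remainder over the first few
// shards so the per-shard limits still sum to the global limit.
func (c *Config) partition(index, total int) *Config {
	base := c.MaxBytes / uint64(total)
	remainder := c.MaxBytes % uint64(total)

	shardBytes := base
	if uint64(index) < remainder {
		shardBytes++ // earlier shards absorb the remainder
	}
	return &Config{MaxBytes: shardBytes}
}
```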
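
And a minimal sketch of the managedQueue statistics reconciliation. The statsCallback signature, field names, and onAdd/onRemove hooks are illustrative assumptions; only the decorator-plus-callback pattern reflects the design described above.

```go
// Sketch only: a simplified view of the managedQueue decorator.
// Field names and the callback signature are assumptions.
package registry

import "sync/atomic"

// statsCallback lets a queue report length/byte deltas to its parent
// shard, which aggregates them with its own atomic counters, so no
// lock is taken on the hot path.
type statsCallback func(lenDelta, byteDelta int64)

type managedQueue struct {
	// queue framework.SafeQueue // the wrapped queue, omitted here
	length   atomic.Int64
	byteSize atomic.Int64
	draining atomic.Bool // lifecycle state: active vs. draining
	onStats  statsCallback
}

// onAdd reconciles local counters and notifies the parent shard after
// an item of `bytes` size is enqueued.
func (mq *managedQueue) onAdd(bytes int64) {
	mq.length.Add(1)
	mq.byteSize.Add(bytes)
	mq.onStats(1, bytes)
}

// onRemove mirrors onAdd after a dequeue.
func (mq *managedQueue) onRemove(bytes int64) {
	mq.length.Add(-1)
	mq.byteSize.Add(-bytes)
	mq.onStats(-1, -bytes)
}
```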

Testing

All new components are accompanied by comprehensive unit tests, which can be found in pkg/epp/flowcontrol/registry/. The tests cover:

  • Configuration validation and partitioning logic.
  • managedQueue behavior, including statistics reconciliation and concurrency safety.
  • registryShard accessors, error paths, and statistics aggregation.

This commit introduces the core operational layer of the Flow Registry's
sharded architecture. It deliberately focuses on implementing the
`registryShard`—the concurrent, high-performance data plane for a single
worker—as opposed to the top-level `FlowRegistry` administrative
control plane, which will be built upon this foundation.

The `registryShard` provides the concrete implementation of the
`contracts.RegistryShard` port, giving each `FlowController` worker a
safe, partitioned view of the system's state. This design is
fundamental to achieving scalability by minimizing cross-worker
contention on the hot path.

The key components are:

- **`registry.Config`**: The master configuration blueprint for the
  entire `FlowRegistry`. It is validated once and then partitioned,
  with each shard receiving its own slice of the configuration,
  notably for capacity limits.

- **`registry.registryShard`**: The operational heart of this commit. It
  manages the lifecycle of queues and policies within a single shard,
  providing the read-oriented access needed by a `FlowController`
  worker. It ensures concurrency safety through a combination of mutexes
  for structural changes and lock-free atomics for statistics.

- **`registry.managedQueue`**: A stateful decorator that wraps a raw
  `framework.SafeQueue`. Its two primary responsibilities are to enable
  the sharded model by providing atomic, upwardly reconciled statistics,
  and to enforce lifecycle state (active vs. draining), which is
  essential for the graceful draining of flows during future
  administrative updates.

- **Contracts and Errors**: New sentinel errors are added to the
  `contracts` package to create a clear, stable API boundary between
  the registry and its consumers.

This work establishes the robust, scalable, and concurrent foundation
upon which the top-level `FlowRegistry` administrative interface will be
built.

netlify bot commented Jul 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | a58d7b3 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/gateway-api-inference-extension/deploys/687987c743a27b0008c60c2d |
| 😎 Deploy Preview | https://deploy-preview-1187--gateway-api-inference-extension.netlify.app |

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 17, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 17, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 17, 2025
@LukeAVanDrie
Contributor Author

/assign @ahg-g

/assign @kfswain

@ahg-g
Contributor

ahg-g commented Jul 18, 2025

/ok-to-test

How are we sharding exactly?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 18, 2025
@LukeAVanDrie
Contributor Author

LukeAVanDrie commented Jul 18, 2025

> /ok-to-test
>
> How are we sharding exactly?

Great question. Here’s the high-level overview of the sharding strategy.

The Short Answer

We shard by running multiple parallel worker loops (shardProcessors) within a single Flow Controller instance. Each logical flow (e.g., for a tenant) gets a dedicated queue on every shard.

When a request arrives, a top-level distributor sends it to the shard with the lightest load for that specific flow. This prevents a central bottleneck and allows us to process requests in parallel.

Think of it as having multiple, independent dispatchers instead of just one.

How it Works: Join-the-Shortest-Queue-by-Bytes (JSQ-Bytes)

The distributor uses a Join-the-Shortest-Queue-by-Bytes (JSQ-Bytes) algorithm.

  1. A request arrives for Flow F.
  2. The distributor inspects the queue for Flow F on every shard (q_F_1, q_F_2, ... q_F_n).
  3. It sends the request to the shard whose queue for F currently has the fewest total bytes.

This balances the load (in terms of memory pressure and queuing capacity) across the shards in real-time.
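
A minimal sketch of that selection step, assuming a placeholder shard interface with a flowQueueBytes accessor (neither is the actual contract):

```go
// Sketch only: the JSQ-Bytes choice for a single request. The shard
// interface and flowQueueBytes method are illustrative placeholders.
package flowcontrol

type shard interface {
	// flowQueueBytes returns the current byte size of this shard's
	// queue for the given flow.
	flowQueueBytes(flowID string) uint64
}

// selectShard returns the shard whose queue for flowID currently
// holds the fewest bytes (nil only if shards is empty).
func selectShard(shards []shard, flowID string) shard {
	var best shard
	var bestBytes uint64
	for _, s := range shards {
		b := s.flowQueueBytes(flowID)
		if best == nil || b < bestBytes {
			best, bestBytes = s, b
		}
	}
	return best
}
```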

Current Status & Next Steps

This PR implements the registryShard structure, which makes this parallel processing possible. Subsequent PRs will introduce the shardProcessor (worker) and the top-level distributor that implements the JSQ-Bytes logic (FlowController). For now, the system is configured to run with a single shard, but the architecture is ready for expansion.


Deeper Dive on Design Rationale

The design of the Flow Controller is built on this sharded architecture to enable parallel processing and prevent the central dispatch loop from becoming a bottleneck at high request rates.

Critical Assumption: Workload Homogeneity Within Flows

The effectiveness of the sharded model hinges on a critical assumption: while the system as a whole manages a heterogeneous set of flows, the traffic within a single logical flow is assumed to be roughly homogeneous. A logical flow represents a single workload or tenant, so we expect request characteristics to be statistically similar within that flow.

The JSQ-Bytes distribution makes this assumption robust by:

  1. Reflecting True Load: It distributes work based on each shard's current queue size in bytes—a direct measure of its memory and capacity congestion.
  2. Adapting to Real-Time Congestion: It adaptively steers new work away from momentarily struggling workers.
  3. Hedging Against Assumption Violations: This adaptive, self-correcting nature makes it a powerful hedge. It doesn't just distribute; it actively load balances based on the most relevant feedback available.

Stateful Policies in a Sharded Registry

Sharding means that when the critical assumption holds, the independent actions of policies on each shard result in emergent, approximate global fairness. Achieving true, deterministic global fairness would require policies to depend on an external, globally-consistent state store, which adds significant complexity. The shard_count will be a key operational lever to tune this behavior, especially for low-QPS flows.

@ahg-g
Contributor

ahg-g commented Jul 18, 2025

> The shard_count will be a key operational lever to tune this behavior, especially for low-QPS flows.

worth considering a waterfall algorithm, meaning create multiple shards, but the requests should be added to one shard until it is at capacity and then spill to the next one.

@ahg-g
Contributor

ahg-g commented Jul 18, 2025

This PR is fine, but again, hard to reason about without the full picture.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2025
@LukeAVanDrie
Contributor Author

> The shard_count will be a key operational lever to tune this behavior, especially for low-QPS flows.
>
> worth considering a waterfall algorithm, meaning create multiple shards, but the requests should be added to one shard until it is at capacity and then spill to the next one.

That's a great point, and it's a valid architectural pattern for certain types of systems. I considered a "fill-and-spill" (waterfall) approach during the design phase, but I concluded that a "spread" approach using JSQ-Bytes is a better fit for the specific goals of the Flow Controller.

The core trade-off comes down to what we are optimizing for.

A waterfall algorithm is excellent for systems that need to optimize for resource packing or locality. However, the primary goals of our Flow Controller are high throughput, low latency, and fairness between concurrent flows. The waterfall model presents challenges for all three of these goals in our specific context:

  1. It Re-introduces a Single-Threaded Bottleneck: The main motivation for sharding is to parallelize the dispatch loop. A waterfall model effectively serializes all requests through a single "hot" shard until it reaches capacity. This would re-introduce the very bottleneck we're trying to eliminate, capping our throughput at the speed of a single worker.

  2. It Makes Emergent Fairness Impossible: This is the most critical point. Our design relies on emergent, approximate global fairness, where fair policies running on each of the N shards produce a globally fair outcome over time. This only works if each shard receives a statistically representative sample of the overall traffic. The waterfall model breaks this prerequisite. If Shard 1 receives 100% of the traffic, the other N-1 shards have a completely stale and unrepresentative view of the traffic mix. When traffic finally spills over, their policies—especially stateful ones—will make decisions based on a completely outdated context, undermining any global fairness objective.

  3. It Can Create Instability: The waterfall model can lead to oscillating behavior. The system would operate with one hot shard and N-1 cold shards. When the hot shard finally drains, all traffic would suddenly "thunder" over to the next shard in line, making it the new hot spot. This is less stable and predictable than the JSQ-Bytes approach, which gracefully and continuously balances load across all available workers.
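
For contrast, here is a minimal sketch of what the suggested fill-and-spill selection would look like, using the same placeholder style as the JSQ-Bytes sketch earlier in the thread (capacityShard and hasCapacity are illustrative assumptions):

```go
// Sketch only: the fill-and-spill (waterfall) alternative. Types and
// method names are illustrative placeholders.
package flowcontrol

type capacityShard interface {
	// hasCapacity reports whether this shard can accept another
	// request of the given size without exceeding its limit.
	hasCapacity(bytes uint64) bool
}

// selectShardWaterfall walks shards in a fixed order and returns the
// first one with room, so all traffic funnels through shard 0 until
// it saturates and spills over to shard 1, and so on.
func selectShardWaterfall(shards []capacityShard, bytes uint64) capacityShard {
	for _, s := range shards {
		if s.hasCapacity(bytes) {
			return s
		}
	}
	return nil // every shard is at capacity
}
```

The fixed iteration order is exactly what serializes traffic through one hot shard and leaves the other shards with a stale view of the traffic mix, which is the problem the points above describe.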

To summarize the trade-off:

  • Waterfall (Fill-and-Spill) Model:

    • Optimizes For: Resource Packing, Locality.
    • Behavior: Creates a single "hot" shard, serializing requests.
    • Impact on Fairness: Undermines it. Creates skewed, unrepresentative views for policies, rendering stateful policies ineffective.
    • Best For: Batch systems, systems with expensive, stateful workers.
  • JSQ-Bytes (Spread) Model:

    • Optimizes For: Parallelism, Throughput, Fairness.
    • Behavior: Spreads load across all shards based on real-time congestion.
    • Impact on Fairness: Enables it. Ensures all shards see a representative traffic mix, allowing local policies to produce an emergent, globally fair outcome.
    • Best For: High-throughput, low-latency, fairness-sensitive systems like ours.

For these reasons, I strongly believe that our current JSQ-Bytes "spread" approach is the correct choice to meet the core requirements of this component.

@k8s-ci-robot k8s-ci-robot merged commit 6cf8f31 into kubernetes-sigs:main Jul 18, 2025
9 checks passed
@LukeAVanDrie
Contributor Author

> This PR is fine, but again, hard to reason about without the full picture.
>
> /lgtm /approve

Thanks for the LGTM and approval, Abdullah!

I agree, this registryShard component is abstract on its own. To give you that "full picture," this PR establishes the core data plane for the FlowController. It's the high-performance, concurrency-safe container that holds all the flow state.

The next PR I'm preparing introduces the execution engine that will consume this state. The reason for the registryShard's specific, read-oriented API is to safely and efficiently serve those parallel flow control workers.

Once you see how a worker uses the contracts.RegistryShard interface in the next review, the design choices here will feel much more concrete.
