Refactor: Decouple logic for max concurrency with worker and in flight request #19100

@yongkangc

Description

Right now the max concurrency variable is only used to limit in-flight requests. Previously it was used for both workers and in-flight requests. We should make this explicit and remove any remaining coupling between the two.

What we need to do

  1. Rename the max concurrency variable so that it clearly refers to the in-flight limit
  2. Remove max concurrency as a CLI arg
  3. Benchmark different values of the in-flight limit
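
As a rough illustration of step 1, here is a minimal sketch of an in-flight limit that is tracked separately from the worker count. All names here (`ProofTaskConfig`, `worker_count`, `inflight_limit`, `InflightLimiter`) are hypothetical and chosen for this example only; they are not taken from the codebase, and the actual implementation may look quite different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical config: the two knobs are independent of each other.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProofTaskConfig {
    /// Number of workers (hypothetical field name).
    pub worker_count: usize,
    /// Maximum number of requests allowed in flight at once
    /// (hypothetical field name replacing "max concurrency").
    pub inflight_limit: usize,
}

/// Hypothetical counter that enforces only the in-flight limit.
/// It knows nothing about how many workers exist.
pub struct InflightLimiter {
    limit: usize,
    in_flight: AtomicUsize,
}

impl InflightLimiter {
    pub fn new(limit: usize) -> Self {
        Self { limit, in_flight: AtomicUsize::new(0) }
    }

    /// Reserves one in-flight slot. Returns `false` if the limit is reached.
    pub fn try_acquire(&self) -> bool {
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.limit).then_some(n + 1)
            })
            .is_ok()
    }

    /// Frees one in-flight slot. Does nothing if no slot is held.
    pub fn release(&self) {
        let _ = self
            .in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
    }

    /// Current number of reserved slots.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

With the limit isolated like this, step 3 amounts to constructing the limiter with different `inflight_limit` values while holding `worker_count` fixed, and comparing the results.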

Motivation:
Removing the max_concurrency CLI arg reduced the max_concurrency default from 256 to 32, and this reduction could cause perf degradation. This happens because of how we make use of the in-flight allowance with workers.

Context:
#18872

Metadata

Assignees

Labels

A-engine: Related to the engine implementation
C-debt: A clean up/refactor of existing code
S-needs-benchmark: This set of changes needs performance benchmarking to double-check that they help

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
