6 changes: 6 additions & 0 deletions config/charts/inferencepool/Chart.yaml
@@ -7,3 +7,9 @@ type: application
version: 0.0.0

appVersion: "0.0.0"

dependencies:
- name: jaeger
version: "2.11.0"
repository: "https://jaegertracing.github.io/helm-charts"
condition: jaeger.enabled
87 changes: 87 additions & 0 deletions config/charts/inferencepool/README.md
@@ -237,6 +237,93 @@ inferenceExtension:
Make sure that `otelExporterEndpoint` points to your OpenTelemetry collector endpoint.
Currently, only the `parentbased_traceidratio` sampler is supported. You can adjust the base sampling ratio with `samplerArg` (e.g., `0.1` means 10% of traces will be sampled).
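For example, to sample 10% of traces, the relevant values would look like:

```yaml
inferenceExtension:
  tracing:
    enabled: true
    sampling:
      sampler: "parentbased_traceidratio"
      samplerArg: "0.1"  # sample 10% of traces
```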

#### Jaeger Tracing Backend

GAIE provides an opt-in Jaeger all-in-one deployment as a sub-chart for easy trace collection and visualization. This is particularly useful for development, testing, and understanding how inference requests are processed (filtered, scored) and forwarded to vLLM models.

**Quick Start with Jaeger:**

To install the InferencePool with Jaeger tracing enabled:

```bash
# Update Helm dependencies to fetch Jaeger chart
helm dependency update ./config/charts/inferencepool

# Install with Jaeger enabled
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool \
--set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
--set inferenceExtension.tracing.enabled=true \
--set jaeger.enabled=true
```

Or using a `values.yaml` file:

```yaml
inferenceExtension:
tracing:
enabled: true
sampling:
sampler: "parentbased_traceidratio"
samplerArg: "1.0" # 100% sampling for development

jaeger:
enabled: true
```

Then install:

```bash
helm dependency update ./config/charts/inferencepool
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
```

**Accessing Jaeger UI:**

Once deployed, you can access the Jaeger UI to visualize traces:

```bash
# Port-forward to access Jaeger UI
kubectl port-forward svc/vllm-llama3-8b-instruct-jaeger-query 16686:16686

# Open browser to http://localhost:16686
```

In the Jaeger UI, you can:
- Search for traces by service name (`gateway-api-inference-extension`)
- View detailed span information showing filter and scorer execution
- Analyze request routing decisions and latency
- Understand the complete inference request flow

**Configuration Options:**

The Jaeger sub-chart supports the following configuration:

| **Parameter Name** | **Description** | **Default** |
|---------------------------------------|-----------------------------------------------------------------------------------------------------|----------------------------------|
| `jaeger.enabled` | Enable Jaeger all-in-one deployment | `false` |
| `jaeger.allInOne.enabled` | Enable all-in-one deployment mode | `true` |
| `jaeger.allInOne.image.repository` | Jaeger all-in-one image repository | `jaegertracing/all-in-one` |
| `jaeger.allInOne.image.tag` | Jaeger image tag | `2.11` |
| `jaeger.allInOne.resources.limits` | Resource limits for Jaeger pod | `cpu: 500m, memory: 512Mi` |
| `jaeger.allInOne.resources.requests` | Resource requests for Jaeger pod | `cpu: 100m, memory: 128Mi` |
| `jaeger.query.service.type` | Jaeger UI service type | `ClusterIP` |
| `jaeger.query.service.port` | Jaeger UI port | `16686` |
| `jaeger.collector.service.otlp.grpc.port` | OTLP gRPC collector port | `4317` |
| `jaeger.storage.type` | Storage backend type (memory, elasticsearch, cassandra, etc.) | `memory` |

**Important Notes:**

1. **Development vs Production**: The all-in-one deployment uses in-memory storage and is suitable for development and testing. For production use, consider:
- Using a persistent storage backend (Elasticsearch, Cassandra, etc.)
- Deploying Jaeger components separately for better scalability
- Refer to [Jaeger Production Deployment](https://www.jaegertracing.io/docs/latest/deployment/) for best practices

2. **Automatic Configuration**: When `jaeger.enabled=true`, the OTLP exporter endpoint is automatically configured to point to the Jaeger collector. You don't need to manually set `inferenceExtension.tracing.otelExporterEndpoint`.

3. **Sampling Rate**: For development, you may want to set `samplerArg: "1.0"` to capture all traces. For production, use a lower value like `"0.1"` (10%) to reduce overhead.

4. **Resource Requirements**: Adjust the resource limits based on your trace volume and cluster capacity.
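To build intuition for the sampling rate in note 3: GAIE relies on the OpenTelemetry SDK's sampler, and the snippet below is an illustrative sketch of how trace-ID-ratio sampling is commonly implemented in OTel SDKs, not this project's actual code. Part of the (random) trace ID is interpreted as an integer and compared against `ratio * 2**63`, so roughly `ratio` of all traces are kept.

```python
import os

def trace_id_ratio_sampled(trace_id: bytes, ratio: float) -> bool:
    """Sketch of a trace-ID-ratio sampling decision: interpret the low
    8 bytes of the 16-byte trace ID as an integer (dropping the sign bit)
    and sample when it falls below ratio * 2**63."""
    value = int.from_bytes(trace_id[8:16], "big") >> 1
    return value < int(ratio * (1 << 63))

# With uniformly random trace IDs, about `ratio` of traces are sampled.
sampled = sum(trace_id_ratio_sampled(os.urandom(16), 0.1) for _ in range(100_000))
print(sampled / 100_000)  # close to 0.1
```

Because the decision depends only on the trace ID, every service observing the same trace makes the same decision, which is what makes a parent-based ratio sampler consistent across the request path.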

## Notes

This chart deploys only an InferencePool and its corresponding EndpointPicker extension. Before installing the chart, make sure that the Inference Extension CRDs are installed in the cluster. For more details, see the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
4 changes: 4 additions & 0 deletions config/charts/inferencepool/templates/epp-deployment.yaml
@@ -114,7 +114,11 @@ spec:
- name: OTEL_SERVICE_NAME
value: "gateway-api-inference-extension"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
{{- if .Values.jaeger.enabled }}
value: "http://{{ .Release.Name }}-jaeger-collector:4317"
{{- else }}
value: {{ .Values.inferenceExtension.tracing.otelExporterEndpoint | quote }}
{{- end }}
- name: OTEL_TRACES_EXPORTER
value: "otlp"
- name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
43 changes: 42 additions & 1 deletion config/charts/inferencepool/values.yaml
@@ -58,6 +58,8 @@ inferenceExtension:
enabled: false
tracing:
enabled: false
# When jaeger.enabled is true, this will automatically point to the Jaeger collector
# Otherwise, you can specify your own OpenTelemetry collector endpoint
otelExporterEndpoint: "http://localhost:4317"
sampling:
sampler: "parentbased_traceidratio"
@@ -94,4 +96,43 @@ istio:
trafficPolicy: {}
# connectionPool:
# http:
# maxRequestsPerConnection: 256000

# Jaeger tracing backend configuration
# When enabled, deploys Jaeger all-in-one for trace collection and visualization
jaeger:
enabled: false
# Use the all-in-one deployment mode for simplicity
# For production, consider using a more robust deployment with separate components
allInOne:
enabled: true
image:
repository: jaegertracing/all-in-one
tag: "2.11"
pullPolicy: IfNotPresent
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
# Expose Jaeger UI service
query:
service:
type: ClusterIP
port: 16686
# Collector configuration for OTLP
collector:
service:
otlp:
grpc:
port: 4317
http:
port: 4318
# Storage configuration - use in-memory for simplicity
storage:
type: memory
# Agent configuration
agent:
enabled: false