gsoc: Integrate Kubeflow SDK with OpenTelemetry for GSoC 2026 (#4294)

dhanishaphadate · web-flow · commit 822043b0c52c · 2026-02-03T18:35:25.000Z
* Add GSoC 2026 project proposal for Kubeflow SDK OpenTelemetry integration

Signed-off-by: Dhanisha Phadate <dhanisha.6@gmail.com>

* chore: retrigger CI

Signed-off-by: Dhanisha Phadate <dhanisha.6@gmail.com>

* Clean the file

Signed-off-by: Dhanisha Phadate <dhanisha.6@gmail.com>

* Refactoring project scope

Signed-off-by: Dhanisha Phadate <dhanisha.6@gmail.com>

---------

Signed-off-by: Dhanisha Phadate <dhanisha.6@gmail.com>
diff --git a/content/en/events/upcoming-events/gsoc-2026.md b/content/en/events/upcoming-events/gsoc-2026.md
@@ -298,6 +298,70 @@ Tracking issue: https://github.com/kubeflow/sdk/issues/238
 - Understanding of the Kubeflow Ecosystem and basic Kubernetes concepts.
 - Engage and contribute to Kubeflow community on Slack and GitHub.
 
+### Project 7: Integrate Kubeflow SDK with OpenTelemetry
+
+**Components:** [kubeflow/sdk](https://www.github.com/kubeflow/sdk) 
+
+**Mentors:** [@kramaranya](https://github.com/kramaranya), [@dhanishaphadate](https://github.com/dhanishaphadate), [@jaiakash](https://www.github.com/jaiakash)
+
+**Contributor:** 
+
+**Details:** 
+
+The Kubeflow SDK enables users with limited Kubernetes knowledge to interact with the Kubeflow ecosystem using standard Python APIs. As AI/ML workloads become more complex and distributed, observability into pipeline execution, model training, and inference workflows becomes critical.
+
+This project aims to integrate the Kubeflow SDK with OpenTelemetry (OTel) to provide standardized, vendor-neutral telemetry for Kubeflow-based workloads. The integration will enable end-to-end visibility into SDK operations by capturing distributed traces, metrics, and logs across pipeline compilation, submission, execution, and training lifecycles.
+
+The project will also explore reusing existing OpenTelemetry Generative AI instrumentation patterns where applicable, such as span conventions for model execution, prompt handling, and inference steps.
+
+[Kubeflow SDK Documentation](https://sdk.kubeflow.org/en/latest/index.html) 
+
+[OpenTelemetry for Generative AI](https://opentelemetry.io/blog/2024/otel-generative-ai/)
+
+[Issue](https://github.com/kubeflow/sdk/issues/164)
+
+**Features Expected:**
+
+- Add OpenTelemetry instrumentation to key Kubeflow SDK components
+- Enable distributed tracing for pipeline execution and SDK operations
+- Collect and export metrics related to AI/ML workloads
+- Provide configurable OTel exporters and sampling options
+- Provide documentation and examples demonstrating observability setup and usage
+- Cover the SDK clients listed below:
+
+| SDK Client | Component | 
+|------------|-----------|
+| `TrainerClient` | Kubeflow Trainer |  
+| `PipelinesClient` | Kubeflow Pipelines |
+
+```
+kubeflow/
+├── trainer/       # TrainerClient - distributed training & fine-tuning
+├── optimizer/     # OptimizerClient - Katib AutoML & hyperparameter tuning  
+├── hub/           # ModelRegistryClient - model artifact management
+└── common/        # Shared utilities across clients
+```
+
+**Bonus (optional) scope:** also cover the following SDK clients:
+
+| SDK Client | Component |
+|------------|-----------|
+| `OptimizerClient` | Kubeflow Katib |
+| `ModelRegistryClient` | Model Registry |
+| `SparkClient` | Spark Operator |
+
+**Difficulty:** Intermediate to hard
+
+**Size:** 350 hours
+
+**Skills Required/Preferred:**
+
+- Python
+- Understanding of the Kubeflow Ecosystem (preferred)
+- OpenTelemetry (tracing, metrics, logging)
+- Distributed systems and observability concepts
+- Kubernetes and CRDs
+
 ### Project 10: Dynamic LLM Trainer Framework for Kubeflow
 
 **Components:**
@@ -399,6 +463,7 @@ Notebook workflows are commonly split across multiple files. Without visual comp
 - JavaScript / TypeScript (visual editor, JupyterLab extensions)
 - Familiarity with Jupyter notebooks and pipeline concepts
 - Experience or interest in working within established UI frameworks
+----
 
 ### Project 12: Kubeflow SDK/SparkClient - Batch Jobs, Observability & Production Readiness
 
@@ -473,4 +538,4 @@ Tryout reading and connecting to data warehouse and data lakehouse:
 - Kubernetes (CRDs, API, RBAC)
 - Apache Spark (Architecture, Configuration)
 - Testing (Unit, Integration, E2E)
-- Technical Writing (Documentation)
+- Technical Writing (Documentation)
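
Reviewer note on Project 7: the expected feature "Add OpenTelemetry instrumentation to key Kubeflow SDK components" could be approached roughly as in the sketch below. This is only an illustration under stated assumptions, not the SDK's actual design. `FakeTrainerClient`, the `traced` decorator, the tracer name `kubeflow.sdk.sketch`, and the attribute key `kubeflow.sdk.operation` are all hypothetical. The sketch assumes OpenTelemetry is an optional dependency, so it falls back to a no-op when the `opentelemetry` package is not installed.

```python
import functools
from contextlib import contextmanager

try:
    # Assumption: OpenTelemetry is optional; the real SDK may choose otherwise.
    from opentelemetry import trace

    _tracer = trace.get_tracer("kubeflow.sdk.sketch")  # hypothetical tracer name
except ImportError:
    _tracer = None


@contextmanager
def _span(name, attributes):
    """Open a span when OpenTelemetry is available, otherwise do nothing."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name, attributes=attributes) as span:
        yield span


def traced(operation):
    """Wrap a client method so each call runs inside one span."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hypothetical attribute key, not an official semantic convention.
            with _span(operation, {"kubeflow.sdk.operation": operation}):
                return func(*args, **kwargs)

        return wrapper

    return decorator


class FakeTrainerClient:
    """Hypothetical stand-in; not the real TrainerClient API."""

    @traced("trainer.train")
    def train(self, name):
        # A real client would submit a job to the cluster here.
        return f"job-{name}"


client = FakeTrainerClient()
job_id = client.train("demo")
```

Without a configured tracer provider the OpenTelemetry API hands out non-recording spans, so the decorator adds little overhead until the user opts in to an exporter. Exporter and sampling configuration, also listed under the expected features, is left out of this sketch.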