Project ideas for 2026


1. Build a GUI Agent with local LLM/VLM and OpenVINO

Short description: You will build an agent application that takes a graphical user interface as its input. It should be able to automatically operate your computer screen or the UI of a specific application based on user instructions and accomplish complex, multi-step goals. In this pipeline, at least one model must be deployed locally using OpenVINO. During this project, you will get free access to an AI PC cloud environment. You can refer to the following projects for ramp-up.
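
A minimal sketch of the perceive-plan-act loop such an agent might use, assuming a locally converted instruction-tuned model served through OpenVINO GenAI's `LLMPipeline` and `pyautogui` for mouse/keyboard control; the action schema and the `describe_screen` helper are illustrative assumptions, not a prescribed design:

```python
# Hedged sketch: plan-act loop for a local GUI agent (all names below are placeholders).
import json
import openvino_genai as ov_genai
import pyautogui

pipe = ov_genai.LLMPipeline("./local_llm", "GPU")   # local model, no cloud calls

def describe_screen() -> str:
    # Placeholder: a real agent would use a local VLM or a UI-tree/accessibility dump here.
    return "A login form with 'username' and 'password' fields and a 'Sign in' button."

def next_action(goal: str) -> dict:
    prompt = (
        "You control a desktop UI. Reply with exactly one JSON action, e.g. "
        '{"action": "click", "x": 100, "y": 200} or {"action": "type", "text": "..."}.\n'
        f"Goal: {goal}\nScreen: {describe_screen()}\nAction:"
    )
    return json.loads(pipe.generate(prompt, max_new_tokens=64))

action = next_action("Log in as the demo user")
if action["action"] == "click":
    pyautogui.click(action["x"], action["y"])
elif action["action"] == "type":
    pyautogui.write(action["text"])
```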

Expected outcomes: a desktop application that provides a native GUI Agent based on local models

Skills required/preferred: Python, OpenVINO, Prompt engineering, Agentic workflows

Mentors: Ethan Yang, Zhuo Wu

Size of project: 350 hours

Difficulty: Hard

2. OpenVINO Deep Search AI Assistant on Multimodal Personal Database for AIPC

Short description: Deep Search, as one of the core functions of a personal AI assistant, significantly enhances the user experience by providing information extraction capabilities for various file types (such as Word, PowerPoint, PDF, images, and videos) and supporting multi-dimensional information queries. The localized personal knowledge base not only improves the accuracy and relevance of answers but also protects data security and provides personalized search results based on the user's private data. This project aims to develop a desktop AI localized personal knowledge base search assistant for AI PCs. By building a multimodal personal database and using Retrieval Augmented Generation (RAG) technology, this project leverages this private multimodal data to enhance local large language models (LLMs). Users can interact with the OpenVINO instant messaging AI assistant, ask questions, and perform fuzzy searches using multimodal data.
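
The retrieval-augmented answering step could look roughly like the sketch below, assuming document chunks have already been extracted from the user's files; the `embed` helper is a stand-in for whichever local embedding model is chosen:

```python
# Hedged sketch: retrieve the closest chunks and ground a local LLM answer on them.
import numpy as np
import openvino_genai as ov_genai

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Placeholder embedding (hashed bag-of-words); swap in a real local embedding model.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

chunks = ["text extracted from a PDF ...", "caption generated for an image ...", "slide notes ..."]
index = np.stack([embed(c) for c in chunks])            # (num_chunks, dim)
pipe = ov_genai.LLMPipeline("./local_llm", "GPU")

def answer(question: str, top_k: int = 2) -> str:
    scores = index @ embed(question)                    # cosine similarity (vectors are unit-length)
    context = "\n".join(chunks[i] for i in np.argsort(scores)[::-1][:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return pipe.generate(prompt, max_new_tokens=256)
```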

Expected outcomes:

  • A standalone desktop application capable of building a personal knowledge base from multimodal data (Word documents, images, videos) in specified directories, and supporting information retrieval and summarization via API/application.
  • Localized deployment using OpenVINO: a local multimodal personal knowledge base built with local multimodal LLMs and Retrieval Augmented Generation (RAG).
  • Deployment on Intel AI PC, with flexible switching between GPU/NPU hardware based on task load.
  • The application will have a user interface allowing users to interact with the local LLM, perform fuzzy searches using multimodal information, and generate valuable output.

Skills required/preferred: Python or C++, OpenVINO, OpenCV, Ollama, llama.cpp, LLMs, RAG, OCR, UI

Mentors: Hongbo Zhao, Kunda Xu

Size of project: 350 hours

Difficulty: Hard

3. Object Tracking in MediaPipe with OpenVINO Inference

Short description: Tracking objects in a video stream is an important use case. It combines an object detection model with a tracking algorithm that analyzes a whole sequence of images. The current state-of-the-art algorithm is ByteTrack.

The goal of the project is to implement the ByteTrack algorithm as a MediaPipe graph that could delegate inference execution to the OpenVINO inference calculator. This graph could be deployed in OpenVINO Model Server and exposed for serving. A sample application using the KServe API would send the stream of images and get back information about the tracked objects in the stream.
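
A client along these lines could feed frames to the deployed graph over the KServe API; the `tritonclient` package works against OVMS's KServe-compatible endpoints, but the graph name, tensor names, and dtype below are assumptions:

```python
# Hedged sketch: stream video frames to an OVMS-hosted MediaPipe graph over gRPC (KServe API).
import cv2
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
cap = cv2.VideoCapture("input.mp4")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    inp = grpcclient.InferInput("image", list(frame.shape), "UINT8")        # assumed input name
    inp.set_data_from_numpy(frame)
    result = client.infer(model_name="bytetrack_graph", inputs=[inp])       # assumed graph name
    print(result.as_numpy("tracked_objects"))          # assumed output: id, box, and score per track
```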

Expected outcomes: MediaPipe graphs with a calculator implementation for the ByteTrack algorithm with YOLO models.

Skills required/preferred: C++ (for writing the calculator), Python (for writing the client), MediaPipe

Mentors: Adrian Tobiszewski, Dariusz Trawinski

Size of project: 175 hours

Difficulty: Medium

4. OpenVINO GenAI: Add Image-to-Video Support to LTX Video Generation Pipeline

Short description: OpenVINO GenAI is a library of popular Generative AI pipelines, optimized execution methods, and samples built on top of the high-performance OpenVINO Runtime, focused on efficient deployment and easy integration. Currently, OpenVINO GenAI provides a text-to-video generation pipeline based on the LTX model - a diffusion-based video generator that creates videos from a text prompt via iterative denoising in latent space. This project extends the LTX pipeline with image-to-video (I2V) generation, enabling users to create short videos conditioned on an input image combined with a text prompt, running on Intel CPU and GPU. Adding image conditioning provides a strong visual anchor, improving control over composition and style. The project output includes C++ and Python API updates, runnable samples, validation tool updates (OpenVINO GenAI WWB and LLM Benchmarking), and basic tests to validate functionality.
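
Since the public API is itself a deliverable, its final shape is open; a hypothetical Python usage of the extended pipeline might look like the following (the `image` conditioning argument and the generation parameters are illustrative proposals, not existing OpenVINO GenAI API):

```python
# Hypothetical target usage for the image-to-video extension (illustrative only).
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

first_frame = np.array(Image.open("scene.png").convert("RGB"))

pipe = ov_genai.Text2VideoPipeline("./ltx-video-ov", "GPU")   # existing text-to-video pipeline
video = pipe.generate(
    "a slow pan over a snowy forest",
    image=ov.Tensor(first_frame),        # proposed conditioning input (does not exist yet)
    num_frames=49,                       # illustrative generation parameters
    num_inference_steps=30,
)
```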

Expected outcomes: Pull request implementing image-to-video support in the OpenVINO GenAI API, including:

  1. Pipeline Architecture: Extension of the Text2VideoPipeline class to support image-to-video execution paths with minimal memory overhead.
  2. API Parity: Full C++ and Python API support for image conditioning inputs.
  3. Infrastructure: Updates to OpenVINO GenAI benchmarking tools to measure I2V throughput and latency.
  4. Reproducibility: A comprehensive test suite ensuring output consistency between Python and C++ implementations.

Skills required/preferred: C++, Python, good understanding of Stable Diffusion architectures, experience with the Hugging Face and Diffusers libraries, experience with PyTorch (OpenVINO is a plus), Git.

Mentors: Anna Likholat, Stanislav Gonorovskii

Size of project: 350 hours

Difficulty: Medium

5. Optimize Quantized Model Inference Performance on ARM Devices with OpenVINO

Short description: The goal of this project is to design and implement a set of optimizations in the OpenVINO runtime focused on improving inference performance of quantized neural network models on ARM-based devices. The work will target commonly used quantization schemes and model types, with an emphasis on reducing inference latency, increasing throughput, improving compilation time, and minimizing memory footprint. Special attention will be given to efficiently leveraging ARM-specific features such as NEON and ARM Compute Library integrations.
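
A starting point for producing the quantized workloads under test could be a plain NNCF post-training quantization run, roughly as sketched below (the model path, input shape, and single-input assumption are placeholders):

```python
# Hedged sketch: INT8 post-training quantization with NNCF, then compile for the ARM CPU plugin.
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")                    # assumes a single-input IR model

calibration_items = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(100)]
calib = nncf.Dataset(calibration_items, lambda item: item)

quantized = nncf.quantize(model, calib)
ov.save_model(quantized, "model_int8.xml")

compiled = core.compile_model(quantized, "CPU")         # ARM CPU plugin (NEON / ACL paths)
# Latency and throughput can then be compared with e.g.:  benchmark_app -m model_int8.xml -d CPU
```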

Expected outcomes:

  • Improved adoption of quantized models in OpenVINO on ARM platforms
  • Reduced inference latency and increased throughput for quantized workloads
  • Faster model compilation and initialization times
  • Lower memory consumption for deploying quantized models on resource-constrained ARM devices

Skills required/preferred: C++; a Mac device with an ARM chip is a must-have

Mentors: Aleksandr Voron, Vladislav Golubev

Size of project: 350 hours

Difficulty: Medium

6. Develop an OpenVINO-Domain Specialized Coder Model with SFT/GRPO/RAG

Short description: OpenVINO is a critical toolkit for optimizing and deploying AI models on Intel hardware, but developing high-quality OpenVINO-related code (e.g., model inference, quantization, deployment tuning) requires deep domain expertise. This project aims to train a specialized coder model for the OpenVINO ecosystem using Supervised Fine-Tuning (SFT), GRPO, and Retrieval Augmented Generation (RAG) technologies. The model will be optimized for OpenVINO-specific scenarios: generating executable OpenVINO code, debugging deployment issues, providing performance optimization suggestions, and designing solutions for common development challenges. By integrating RAG with a curated OpenVINO knowledge base, the model will retrieve accurate domain knowledge in real time, while SFT/GRPO will refine its ability to produce high-relevance, correct OpenVINO code. The final model will be deployed locally via OpenVINO Runtime to ensure low latency on Intel CPU/GPU/NPU.
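
The SFT stage could start from something like the sketch below, assuming a recent TRL version and a JSONL file of prompt/completion pairs mined from OpenVINO documentation and samples (the file name and base model are placeholders):

```python
# Hedged sketch: supervised fine-tuning on OpenVINO-specific code pairs with TRL.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each line: {"prompt": "How do I compile a model for GPU?", "completion": "import openvino as ov\n..."}
dataset = load_dataset("json", data_files="openvino_code_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-1.5B-Instruct",   # example base coder model, not a requirement
    train_dataset=dataset,
    args=SFTConfig(output_dir="ov-coder-sft"),
)
trainer.train()
# The tuned checkpoint can then be exported with optimum-intel and served locally via OpenVINO.
```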

Expected outcomes:

  • A specialized OpenVINO-domain coder model trained with SFT/GRPO/RAG, capable of generating accurate, executable OpenVINO code (e.g., inference pipeline construction, quantized model optimization, OpenVINO Runtime integration), debugging code errors, and providing targeted performance tuning suggestions for OpenVINO deployments.
  • A localized deployment pipeline for the coder model based on OpenVINO Runtime, optimized for Intel AI PC and PC hardware (CPU/GPU/NPU) to achieve low latency (<100 ms per code generation request) and high throughput.

Skills required/preferred: Python, OpenVINO, LLMs, SFT/GRPO/RAG, experience with PyTorch

Mentors: Tao Zhou, Quan Yin

Size of project: 350 hours

Difficulty: Hard

7. Run Clawdbot End-to-End with Local Models (LLM + RAG + Safe Tooling)

Short description: Clawdbot is a personal AI assistant designed to run on your own infrastructure while interacting through the chat surfaces you already use (e.g., WhatsApp/Telegram/Slack/Discord/Signal/iMessage/Teams/WebChat), with the Gateway acting as the long-lived local control plane. This project proposes an entirely local model stack for Clawdbot: replace hosted LLM dependencies with on-device inference (a local LLM server and local embeddings), add a local RAG knowledge base exposed via Clawdbot “skills”, and ship a hardened configuration for real-world messaging inputs with sandboxed tool execution. The result is a privacy-preserving, low-latency Clawdbot deployment in which sensitive prompts, tool calls, and retrievals remain local.
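
Before wiring a local server into Clawdbot's `models.providers` configuration, a quick sanity check of the OpenAI-compatible endpoint could look like this (the URL and model name are assumptions for a typical local server such as LM Studio or vLLM):

```python
# Hedged sketch: verify a local OpenAI-compatible endpoint before pointing Clawdbot at it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")   # local server
reply = client.chat.completions.create(
    model="local-model",                 # whatever identifier the local server exposes
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(reply.choices[0].message.content)
```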

Expected outcomes:

  • Clawdbot running with a local LLM provider (no hosted inference required), configured via models.providers with a local baseUrl, plus an optional failover strategy.
  • Local RAG capability delivered as Clawdbot skills, with a clear precedence/override model and reproducible ingestion.
  • Hardened operational profile for real messaging surfaces:
    • secure DM posture;
    • sandboxed risky contexts/tools;
    • prompt-injection regression tests aligned with Clawdbot’s own warning that local/smaller models can increase risk.
  • Deployment kit: reference configs, scripts, and a runbook matching Clawdbot’s gateway architecture and operational constraints.

Skills required/preferred:

  • Node.js/TypeScript (Clawdbot runtime + gateway integration)
  • LLM serving (OpenAI-compatible endpoints; optionally LM Studio/vLLM/LiteLLM patterns)
  • RAG engineering (chunking, embeddings, vector DB, evaluation)
  • Security engineering for agentic systems (sandboxing, tool policy, prompt injection mitigation)
  • DevOps: system services, containerization, observability

Mentors: Quan Yin, Tao Zhou

Size of project: 350 hours

Difficulty: Hard

8. Continuous Face-Detection with Automatic Device Switching on AI PCs using OpenVINO AUTO feature

Short description: AI PCs incorporate multiple devices/inference engines for different machine-learning applications. Based on performance, latency, or power-consumption requirements, an application may choose to use an NPU, GPU, or CPU for inference tasks. Usually, an application utilizes a single engine/device for the entire lifetime of the process/inference, and the machine-learning model being used by the application is compiled for only one device. However, it is important for the application to switch between different inference devices during runtime based on user preference, application behavior, and the load/stress on the current device in use. Through this project, we want to build a face-detection application that continuously runs on the AI PC while switching between different inference devices during runtime, based on user preference or by evaluating the stress on the current engine. The inference should not end/pause while switching devices, and should not lead to BSODs, system hangs, or device crashes that cause other applications to fail.
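
The device-selection part leans on OpenVINO's AUTO plugin; a minimal sketch of compiling the detector with an ordered device priority list (the model path and priority order are illustrative):

```python
# Hedged sketch: compile a face-detection model with the AUTO plugin and a device priority list.
import openvino as ov

core = ov.Core()
model = core.read_model("face-detection.xml")

# AUTO selects from the priority list and handles fallback if a device is unavailable.
compiled = core.compile_model(model, "AUTO:NPU,GPU,CPU",
                              {"PERFORMANCE_HINT": "LATENCY"})
print("Running on:", compiled.get_property("EXECUTION_DEVICES"))
```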

Expected outcomes:

  1. Implement a low-latency face-detection application that runs on multiple devices/engines within AI PCs
  2. Utilize the OpenVINO AUTO feature to demonstrate runtime switching between devices
  3. Create a GUI to prompt the user to change the device during runtime based on user preference
  4. Analyze the device load and recommend that the user switch to the most appropriate device to continue inference

Skills required/preferred: Python or C++, Basic ML knowledge

Mentors: Shivam Basia, Aishwarye Omer

Size of project: 175 hours

Difficulty: Easy

9. Demonstrating integration of OpenHands with OpenVINO Model Server

Short description: OpenHands is a popular component that provides a local GUI for AI coding agents. It supports integration with serving solutions compatible with the OpenAI API.

The goal of this project is to integrate OpenHands with OpenVINO Model Server. It would include instructions for deploying the serving with a set of models and configuring OpenHands to delegate tasks to the serving endpoints.
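
The key integration point is OVMS's OpenAI-compatible API; a hedged check that the serving side responds before configuring OpenHands against it (the port and model name are assumptions):

```python
# Hedged sketch: query OVMS's OpenAI-compatible chat endpoint directly.
import requests

resp = requests.post(
    "http://localhost:8000/v3/chat/completions",         # OVMS chat completions endpoint
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",      # example model name served by OVMS
        "messages": [{"role": "user", "content": "Write a hello-world in Python."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
# OpenHands can then be pointed at the same base URL as a custom OpenAI-compatible provider.
```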

Expected outcomes: A recipe for deploying OpenHands with an instance of OpenVINO Model Server, along with a report on the usability experience and a gap analysis.

Skills required/preferred: Python, LLMs

Mentors: Michal Kulakowski, Milosz Zeglarski

Size of project: 90 hours

Difficulty: Easy to medium

10. OpenVINO Profiling using VTune & GTPin

Short description: Profiling AI model performance is a tedious and time-consuming task. Intel's VTune provides a great level of detail at a high level, while Intel's GTPin tool provides kernel- and instruction-level details. This project focuses on developing a GTPin plugin to find correlations between instruction-level metrics and GPU stats, to identify hotspots in the kernel and provide guidance on improving kernel-level performance.

Expected outcomes: Leverage Intel OpenVINO and Intel tools with a custom-developed plugin to automatically identify bottlenecks in GPU kernels in LLM, VLM, and VLA models

Skills required/preferred: Strong C/C++, GPU programming, GPU kernels, exposure to GPU profiling tools, AI software execution pipelines; compiler experience is a plus

Mentors: Selvakumar Panneer, Pramit Biswas

Size of project: 175 hours

Difficulty: Hard

11. Create Interactive OpenVINO Jupyter Notebooks for Trending AI Models

Short description: This project involves developing comprehensive Jupyter notebooks that showcase how to deploy and optimize trending AI models using OpenVINO toolkit. The contributor will research and identify the most popular and emerging models in computer vision, natural language processing, and multimodal AI, then create step-by-step tutorials demonstrating model conversion, optimization, and inference with OpenVINO. Each notebook will include practical examples, performance benchmarking, and comparison between different hardware targets (CPU, GPU, NPU). The notebooks will serve as educational resources for the OpenVINO community, helping developers quickly adopt new model architectures and understand optimization techniques. The notebooks will be merged to https://github.com/openvinotoolkit/openvino_notebooks.
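
Each notebook typically follows the same skeleton; a minimal sketch of the conversion-and-inference step for a PyTorch model (torchvision ResNet-50 is used purely as an example):

```python
# Hedged sketch: convert a PyTorch model to OpenVINO IR and run it on CPU.
import numpy as np
import torch
import torchvision.models as models
import openvino as ov

model = models.resnet50(weights="DEFAULT").eval()
ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 224, 224))
ov.save_model(ov_model, "resnet50.xml")

compiled = ov.Core().compile_model(ov_model, "CPU")     # swap "CPU" for "GPU"/"NPU" when benchmarking
logits = compiled(np.random.rand(1, 3, 224, 224).astype(np.float32))[0]
print(logits.shape)                                     # (1, 1000) class scores
```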

Expected outcomes:

  • At least one comprehensive Jupyter notebook covering trending models
  • Each notebook includes model conversion from popular frameworks (PyTorch, TensorFlow, ONNX) to OpenVINO IR format.
  • Each notebook includes a rebuilt model pipeline based on OpenVINO runtime.
  • Performance optimization examples including quantization, model compression, and hardware-specific optimizations.
  • Interactive visualizations and demos showcasing model capabilities.
  • Benchmarking results across different Intel hardware (CPU, integrated GPU, discrete GPU, NPU where applicable)
  • Documentation and best practices guide for model deployment patterns

Skills required/preferred:

  • Python programming and Jupyter notebook development
  • Experience with popular ML frameworks (PyTorch, TensorFlow, Hugging Face)
  • Basic understanding of computer vision and/or NLP concepts
  • Familiarity with model optimization techniques (quantization, pruning)
  • OpenVINO toolkit experience (preferred but not required)
  • Technical writing and documentation skills
  • Git/GitHub workflow knowledge

Mentors: Aleksandr Mokrov, Ethan Yang

Size of project: 175 hours

Difficulty: Medium 

12. Automated Agent System for Adding New Model Support to Optimum-Intel

Short description: Optimum-Intel provides high-performance inference and export pipelines for Hugging Face PyTorch models using the OpenVINO backend. Adding support for new models currently requires significant manual effort: understanding the model architecture and writing model-specific patching, tests, and documentation. This project proposes the development of an autonomous agent system that automatically generates high-quality code to add support for new Hugging Face models to the Optimum-Intel repository. The system will analyze the model config, generate model patching, create appropriate tests, and generate a tiny variation of the model with reduced parameters based on the model config. The project aims to speed up the development process, automate the most repetitive parts, and reduce manual effort when adding support for models to Optimum-Intel.
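
The acceptance criterion for each generated patch is essentially that the standard Optimum-Intel entry point works for the new architecture; a sketch of that end-state check (the model ID is a placeholder):

```python
# Hedged sketch: the check each generated support patch should ultimately make pass.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "org/new-model"                              # placeholder Hugging Face model ID
model = OVModelForCausalLM.from_pretrained(model_id, export=True)   # convert to OpenVINO on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("def relu(x):", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```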

Expected outcomes:

  • Multi-Agent system that analyzes a given model by Hugging Face ID, generates the code for the model support, tests, documentation, and (optionally) runs local validation and generates a tiny model based on the model config with reduced parameters for use in tests.
  • The result of the agent system workflow is a code patch and a folder containing a tiny model.

Skills required/preferred: OpenVINO, Python, PyTest, LangGraph or similar agent orchestration framework, Docker (optionally).

Mentors: Anastasiia Pnevskaia, Roman Kazantsev

Size of project: 350 hours

Difficulty: Hard 

13. No-Code AI Workflow Automation with n8n and OpenVINO Model Server for GPU/NPU

Short description: This project integrates n8n (no-code workflow automation platform) with OpenVINO Model Server to enable visual creation of AI-powered workflows that leverage Intel GPU and NPU hardware acceleration. The focus is on building a concrete "Smart Document Processing Pipeline" as the primary demonstration: a workflow that monitors local folders for incoming documents (PDFs, images, scanned forms), uses OpenVINO Model Server with document understanding models running on NPU for efficient text/table extraction, routes data through an LLM on GPU for classification and entity extraction, and stores results locally with notifications. The system showcases OpenVINO AUTO plugin for intelligent device switching—document analysis on NPU (power efficient), heavy LLM processing on GPU (performance), with CPU fallback when devices are busy. Additional workflow templates may include customer service chatbots with multimodal inputs, or automated content generation pipelines.

Expected outcomes:

  • A production-ready custom n8n node that connects to OpenVINO Model Server endpoints with support for GPU/NPU device selection
  • Podman Compose deployment package bundling n8n, OpenVINO Model Server, and supporting services
  • Implementation of the "Smart Document Processing Pipeline" demonstrating: folder monitoring, document understanding on NPU, LLM processing on GPU, local storage, and notifications
  • OpenVINO AUTO plugin integration for intelligent runtime device switching based on system load and performance requirements
  • At least 2-3 additional pre-built workflow templates (e.g., conversational AI with RAG, or multimodal customer service)
  • Comprehensive documentation including tutorials, workflow design patterns, and installation guide for AI PCs

Skills required/preferred: Node.js/TypeScript, Python, OpenVINO, Podman, REST/gRPC APIs, basic ML/AI knowledge

Mentors: Praveen Kundurthy, Max Domeika

Size of project: 350 hours

Difficulty: Medium

14. AIPlayerInsight: Next-Gen Multi-Sport Analytics Platform

Short description: AIPlayerInsight is a post-match analysis system that turns ordinary game footage into rich player telemetry and tactical insights. Using computer vision and local AI, the platform automatically tracks players and the ball, stores motion data in a time-series database, and lets coaches ask natural language questions like "Show me every time #11 hits > 20 mph in the fourth quarter." The goal is to make elite-level analytics accessible without expensive proprietary tracking hardware, while giving students hands-on experience building a full-stack, AI-powered sports product.
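
The ingestion step could start from something like the sketch below, using Ultralytics' built-in tracker as a stand-in for the StrongSORT stage and emitting rows ready for a time-series table (file names and the row schema are assumptions):

```python
# Hedged sketch: video -> per-frame (track_id, position) rows for the spatiotemporal store.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")         # detector; can later be exported to OpenVINO for Intel GPUs
rows = []

for frame_idx, result in enumerate(model.track(source="match.mp4", stream=True, persist=True)):
    if result.boxes.id is None:    # no confirmed tracks in this frame
        continue
    for (x, y, w, h), track_id in zip(result.boxes.xywh.tolist(), result.boxes.id.int().tolist()):
        rows.append({"frame": frame_idx, "player": track_id, "cx": x, "cy": y})

# rows can then be bulk-inserted into TimescaleDB/PostgreSQL for the analytics layer.
```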

What students will learn:

  • Building end-to-end AI systems from raw video to structured data.
  • Designing data models for high-frequency spatial data.
  • Optimizing inference for real hardware constraints.
  • Shipping a user-facing product with real analytics value.

Expected outcomes:

  • 10x faster analysis by reducing time from whistle to data availability.
  • Surface hidden performance metrics like player fatigue (speed decay), off-ball separation, and defensive coverage patterns.
  • Lower cost by using standard HD/4K cameras instead of proprietary tracking hardware.
  • A working pipeline that ingests video, runs detection and tracking, and produces structured spatiotemporal data.
  • A queryable analytics layer that answers natural language questions and returns charts or visual overlays.
  • A polished demo that students can showcase as real-world AI + data engineering experience.

Skills required/preferred:

Required:

  • Python for backend pipelines and data processing.
  • OpenVINO for model optimization on Intel hardware.
  • YOLOv11 and TrackNet for detection and ball tracking.
  • StrongSORT for multi-object tracking and ID persistence.

Preferred:

  • React for the tactical dashboard.
  • TimescaleDB/PostgreSQL for spatiotemporal schema design.
  • Local LLM serving (Ollama, Hugging Face) and Text-to-SQL prompt engineering.
  • Intel Arc GPU AV1/HEVC video handling.

Mentors: Ben Odom

Size of project: 350 hours

Difficulty: Hard (ambitious but very rewarding for students who want real-world AI and systems experience)

15. GGUF Reader in OpenVINO for direct GGUF Execution

Short description: Currently, OpenVINO GenAI’s GGUF reader manually reconstructs model architectures by parsing metadata and building the OpenVINO model layer-by-layer. This project will add/replace the current static graph generation approach in OpenVINO GenAI with an alternative mechanism that traverses the GGML computation graph and dynamically translates it into an OpenVINO model by leveraging llama.cpp APIs and the existing OpenVINO backend implementation in llama.cpp. The student will utilize the GgmlOvDecoder and ov::frontend::ggml::FrontEnd logic, which translates GGML computation graphs directly into OpenVINO IR, allowing for dynamic graph conversion rather than static architecture reconstruction. This new GGUF Reader will act as a generic reader, automatically supporting a wider range of architectures supported by GGML without requiring manual C++ implementation for every new topology. The project involves integrating the translation logic with GenAI’s read_model pipeline, and ensuring feature parity with the existing backend execution (including quantization support and NPU specificities).

Expected outcomes:

  1. GGUF-Reader-v2: A functional GGUFReaderV2 class in OpenVINO GenAI that utilizes the llama.cpp libraries and graph translation logic in OpenVINO backend in llama.cpp to load GGUF files.
    1. Generate GGML Computation Graph: Update OpenVINO GenAI GGUF reader to generate GGML computation graph by using llama.cpp APIs and relevant source code components.
    2. Model Translation: Utilize the existing GgmlOvDecoder and ov::frontend::ggml::FrontEnd implementations to translate the GGML computation graph into an OpenVINO model.
  2. Integration: Seamless integration into the read_model API, allowing users to load GGUF models via the new GGUF-Reader-v2 mechanism.
  3. Test Suite: A set of regression tests comparing the output of the new GGUF-Reader-v2 against the existing reader and the original llama.cpp backend to ensure accuracy.
  4. Documentation: comprehensive documentation on the architecture of the new GGUF-Reader-v2 and instructions for adding support for new GGML operations.

Skills required/preferred: C++, llama.cpp (GGUF/GGML), OpenVINO

Mentors: Mustafa Cavus, Ravi Panchumarthy

Size of project: 350 hours

Difficulty: Hard

16. Optimizing OpenVINO GPU Performance across ExecuTorch and LiteRT frameworks through Vulkan Backend Comparative Analysis

Short description: The goal of this project is to benchmark and analyze Vulkan backend performance on Intel GPUs across multiple AI inference frameworks (ExecuTorch and LiteRT), identify performance gaps compared to the OpenVINO GPU backend, and provide actionable optimization recommendations to enhance OpenVINO's GPU performance. Currently, ExecuTorch's and LiteRT's Vulkan backends focus only on Android, with no benchmarking on Intel AI PCs. This project will establish comprehensive performance baselines for the Vulkan backend on Intel hardware, conduct comparative analysis against OpenVINO GPU execution, and deliver specific optimization strategies for improving the OpenVINO GPU backend based on insights from Vulkan's performance across computer vision and LLM workloads.
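
For the OpenVINO half of the comparison, a rough per-device latency baseline can be collected with a loop like the one below (the model path is a placeholder and a static input shape is assumed; `benchmark_app` gives the same numbers with more rigor):

```python
# Hedged sketch: rough per-device latency baseline for comparison against the Vulkan backend.
import time
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")                    # assumes a static input shape

for device in ("GPU", "CPU"):
    compiled = core.compile_model(model, device)
    inp = np.random.rand(*compiled.input(0).shape).astype(np.float32)
    compiled(inp)                                       # warm-up
    start = time.perf_counter()
    for _ in range(100):
        compiled(inp)
    print(device, (time.perf_counter() - start) / 100 * 1000, "ms/inference")
```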

Expected outcomes:

  • Ability to execute models using Vulkan backend on Intel AI PCs.
  • Establish baseline performance of Vulkan backend on various workloads.
  • Perform comparative performance analysis of Vulkan backend and OpenVINO backend on Intel integrated and discrete GPUs.
  • Identify any performance gaps between Vulkan backend and OpenVINO backend.
  • Propose and implement solutions to bridge the gap between Vulkan and OpenVINO backends.
  • Validate the solutions with proper test cases.
  • Enable fallback to Vulkan GPU backend when OpenVINO GPU backend has any missing operators.

Skills required/preferred: Python, C++, PyTorch, TensorFlow. Good to have: OpenVINO, ExecuTorch, LiteRT, Vulkan SDK

Mentors: Surya Siddharth Pemmaraju, Anisha Dattatraya Kulkarni

Size of project: 350 hours

Difficulty: Medium to hard

17. Validation and Optimization of llama-server Configurations for the Llama.cpp OpenVINO Backend

Short description: Llama.cpp is an open-source C++ project that lets you run large language models locally on your own machine. Llama-server is a server application built on top of llama.cpp that exposes the model through an HTTP API, making it easy to use from other programs or services. The OpenVINO backend in llama.cpp allows llama.cpp to run models using Intel's OpenVINO runtime, which provides optimized inference on Intel hardware (CPUs, GPUs, NPUs). While the llama.cpp OpenVINO backend enables efficient inference on Intel CPUs and accelerators, not all llama-server configurations are fully supported or function correctly when OpenVINO is enabled. These gaps limit usability and deployment flexibility of llama-server with the OpenVINO backend. This project aims to systematically examine the available configuration options of llama-server, identify incompatibilities or failures when using the OpenVINO backend, and implement robust solutions to close those gaps.
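
Configuration sweeps can be scripted; below is a hedged sketch of a single test case, assuming llama-server was built with the OpenVINO backend enabled and a GGUF model is on disk (the flag combination would be parameterized across the full sweep):

```python
# Hedged sketch: launch llama-server with one flag combination and smoke-test the chat endpoint.
import subprocess
import time
import requests

server = subprocess.Popen(
    ["./llama-server", "-m", "model.gguf", "--port", "8080", "-c", "4096", "--parallel", "2"]
)
try:
    for _ in range(120):                                # wait for the model to load
        try:
            if requests.get("http://localhost:8080/health").status_code == 200:
                break
        except requests.ConnectionError:
            pass
        time.sleep(1)
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 8},
    )
    assert resp.status_code == 200, resp.text          # a failing combination gets logged here
finally:
    server.terminate()
```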

Expected outcomes:

  • Perform a configuration coverage analysis to document all supported llama.cpp-server runtime and build-time options.
  • Classify configurations by functionality (e.g., model loading, batching, KV cache, threading, streaming, parallelism).
  • Test each configuration against the OpenVINO backend and identify configurations that fail at runtime or produce incorrect results.
  • Determine whether issues originate from backend feature gaps, model graph constraints, memory layout or tensor shape assumptions, KV cache or scheduling logic, API mismatches between llama.cpp and OpenVINO.
  • Propose and implement fixes for unsupported or broken configurations. Ensure solutions are consistent with llama.cpp design principles and OpenVINO best practices.
  • Validate fixes with functional and performance tests. Clearly document supported, unsupported, and conditionally supported configurations.

Skills required/preferred: C++. Good to have: OpenVINO Toolkit, Llama.cpp, LLM architectures and inference pipelines

Mentors: Mustafa Cavus, Zijun Yu

Size of project: 350 hours

Difficulty: Medium to hard

18. Agentic Toolkit for AI PC with OpenVINO

Short description: Modern Intel AI PCs open the possibility of running highly capable agents fully on device. This extends the accessibility of agents to end users for whom connectivity, privacy, cost, and latency may be concerns. This project's goal is to provide an easy-to-use Toolkit that helps application developers bootstrap agents that run locally on Intel AI PCs, taking full advantage of platform compute devices (CPU, GPU, NPU). The Toolkit provides APIs for agent creation and management, tool use, model query/management, model serving, local data context management, and agent session management. The Toolkit must be integrated with at least one popular editor (e.g. VSCode) and support C/C++ and Python.

Expected outcomes:

  • Provides the ability to query/use existing agentic frameworks where an OpenVINO backend is integrated (e.g. LangChain, LlamaIndex, etc.)
  • Uses OpenVINO libraries or OpenVINO-integrated libraries/frameworks – OpenVINO Model Server, OpenVINO GenAI, ORT/WinML OpenVINO EP, etc.
  • Sample agent(s) created with the Toolkit must run on an AI PC, e.g. an agent for local data query and summarization with a tool-use extension.
  • Sample agent(s) created with the Toolkit interact with off-device agents through A2A, showcasing agent app extensions and integration of cloud-edge experiences

Skills required/preferred: Python, Bash/PowerShell scripting, familiarity with REST APIs, familiarity with Agentic systems

Mentors: Freddy Chiu, Ravi Panchumarthy

Size of project: 350 hours

Difficulty: Medium to hard

19. OpenVINO Model Server: One-Command Installer & Developer Friendly CLI

Short description: OpenVINO Model Server (OVMS) is a high-performance inference serving solution widely used for deploying optimized deep learning models across CPU, GPU, and NPU. OVMS already provides powerful backend capabilities: HuggingFace model pulling, OpenAI-compatible APIs with streaming, runtime config management, and model add/remove/list. However, bare-metal installation requires multiple manual steps. This project adds a one-command bootstrap installer that automates installation across Ubuntu, RHEL, and Windows, and a Python CLI wrapper built on top of the existing OVMS binary and its CLI flags that unifies model interaction into “ovms run <model>” - downloading the model, starting the server, and opening an interactive terminal chat session. Additionally, it should provide ovms models subcommands for discovering and managing models, and an ovms init wizard for first-time users.
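
The heart of `ovms run` is spawn, poll, chat; a minimal sketch of that loop, assuming the OVMS binary is on PATH and that the model/pull flags are filled in per platform (the endpoints are the ones named in this description):

```python
# Hedged sketch: core of an "ovms run" wrapper - spawn the server, wait for readiness, chat once.
import subprocess
import time
import requests

PORT = 8000
server = subprocess.Popen(["ovms", "--rest_port", str(PORT)])   # plus model/pull flags per platform

while True:                                                     # poll the KServe readiness endpoint
    try:
        if requests.get(f"http://localhost:{PORT}/v2/health/ready").status_code == 200:
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)

resp = requests.post(
    f"http://localhost:{PORT}/v3/chat/completions",
    json={"model": "my-model", "messages": [{"role": "user", "content": "hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
server.terminate()
```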

Expected outcomes:

  • One-Command Installer (install-ovms.sh / install-ovms.ps1): Bootstrap script that detects OS/architecture, downloads the correct OVMS package, extracts it, installs dependencies, configures environment variables persistently, and verifies with a health check. Supports Ubuntu, RHEL, and Windows.
  • ovms run <model>: Single command that spawns the existing OVMS binary, polls readiness via /v2/health/ready, and starts a streaming chat REPL against /v3/chat/completions with slash commands (/help, /set system, /clear, /exit).
  • ovms models list|pull|search|info|rm: Formatted model management CLI built on top of the existing --list_models and --pull CLI flags, adding HuggingFace Hub search for model discovery.
  • ovms init: Interactive first-run wizard for task selection, model choice, and device detection.
  • Tests & documentation: Unit/integration tests for Linux and Windows, an updated quick-start guide, the installer packaged for hosting, and the CLI packaged as pip install ovms-cli.

Skills required/preferred: Python, Bash/PowerShell scripting, familiarity with REST APIs and SSE streaming, C++ (for understanding OVMS internals), familiarity with LLM serving concepts, HuggingFace ecosystem.

Mentors: Freddy Chiu, Ravi Panchumarthy

Size of project: 350 hours

Difficulty: Medium to hard
