VQA information

Progress Report

\textbf{Scene Understanding via VQA}
We implemented \texttt{/vision/vqa}, a service for open-ended scene understanding backed by the vision-language model \texttt{qwen3-vl:8b}, which runs locally via Ollama. The service accepts a natural-language prompt and returns the model's response based on the live RGB camera feed.
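
As an illustration of the model call behind such a service, the following is a minimal sketch, not the actual \texttt{/vision/vqa} implementation. It assumes Ollama is listening on its default local port and is queried through its \texttt{/api/generate} endpoint with a base64-encoded image. The helper names \texttt{build\_vqa\_payload} and \texttt{ask\_vqa} are hypothetical, and how a frame is grabbed from the RGB camera is left out.

```python
import base64
import json
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3-vl:8b"


def build_vqa_payload(prompt: str, image_bytes: bytes, model: str = MODEL) -> dict:
    """Build the JSON body for a single, non-streaming VQA request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_vqa(prompt: str, image_bytes: bytes,
            url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send one encoded camera frame plus a prompt and return the model's text answer."""
    body = json.dumps(build_vqa_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

A caller would pass the encoded bytes of one camera frame (for example a JPEG) together with a question such as ``What objects are on the table?''; the returned string is the model's free-form answer. Keeping the payload construction separate from the network call makes the request format easy to check without a running model.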