Clara: An agentic multimodal AI assistant that can see through your webcam, listen to your voice, think with Gemini, and speak back using ElevenLabs. Built with LangGraph, OpenCV, Groq, and Gradio.


👧🏼 Clara - Your Agentic AI Assistant

Clara is a real-time multimodal AI assistant capable of seeing, listening, understanding, and speaking back. It combines voice recognition, computer vision, large language models (Gemini), and real-time speech synthesis into a seamless and natural human-AI interaction system.


🌎 Features

  • 🎤 Voice Input - Continuous voice capture using speech_recognition + Groq Whisper.
  • 📷 Computer Vision - Analyzes live webcam feed with OpenCV and Gemini tools.
  • 🧠 AI Reasoning - Gemini-powered LangGraph agent that chooses when to see or speak.
  • 🎧 Voice Output - Real-time TTS with ElevenLabs and pydub.
  • 💬 Gradio Interface - Live chatbot window + webcam + voice loop.

⚙️ Technologies Used

  • LangChain + LangGraph + Gemini
  • OpenCV for webcam streaming and vision
  • speech_recognition + Groq Whisper for transcription
  • ElevenLabs API + pydub for high-quality speech synthesis
  • Gradio for user interface
  • dotenv, subprocess, asyncio, ffmpeg
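The vision path presumably hands webcam frames to Gemini as base64-encoded inline images, a common pattern for multimodal LangChain messages. A minimal sketch of that encoding step, assuming JPEG bytes already produced by cv2.imencode; the helper names and the exact message shape are illustrative, not Clara's actual code:

```python
import base64

def frame_to_base64(jpeg_bytes: bytes) -> str:
    """Encode raw JPEG bytes (e.g. from cv2.imencode) as base64 text
    suitable for embedding in a multimodal LLM message."""
    return base64.b64encode(jpeg_bytes).decode("utf-8")

def image_message(b64: str) -> dict:
    """Build a LangChain-style inline-image content part (illustrative shape)."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    }
```

A tool like the one in tools.py would call frame_to_base64 on the captured frame and append image_message(...) to the prompt content before invoking the model.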

📂 Folder Structure

Agentic-Assistant-Clara/
├── main.py                   # Gradio UI & orchestrator
├── ai_agent.py              # LangGraph agent + Gemini config
├── tools.py                 # Image analysis tool
├── speech_to_text.py        # Voice capture + transcription
├── text_to_speech.py        # ElevenLabs TTS + playback
├── .env                     # API keys and config
└── requirements.txt         # All dependencies

⚡ How It Works

  1. User speaks → audio is captured + transcribed via Groq Whisper.
  2. The LangGraph-driven Gemini agent decides whether to answer directly or to call the vision tool.
  3. If needed, the agent analyzes the current webcam frame using OpenCV and a custom image analysis tool.
  4. The response is converted to speech using ElevenLabs and played back to the user.
  5. All interactions are shown in a chat-style window (Gradio).
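The decision loop above can be sketched in plain Python. In the real app, LangGraph and Gemini handle the routing and the replies; here a keyword check and stub functions stand in for both, so every name below is illustrative rather than taken from Clara's source:

```python
from typing import Optional

def needs_vision(user_text: str) -> bool:
    """Stand-in for the agent's tool-choice step: the real LangGraph
    agent lets Gemini decide whether to call the vision tool."""
    cues = ("see", "look", "webcam", "camera", "frame", "behind me")
    return any(cue in user_text.lower() for cue in cues)

def analyze_frame() -> str:
    # Stub for the image analysis tool: grab and describe the latest frame.
    return "description of the current webcam frame"

def answer(user_text: str, vision_context: Optional[str]) -> str:
    # Stub for the Gemini call, with optional vision context.
    if vision_context:
        return f"Based on the camera: {vision_context}"
    return f"Answer to: {user_text}"

def handle_turn(user_text: str) -> str:
    """One conversational turn: route through the vision tool if needed,
    then return the reply that would be sent on to TTS."""
    context = analyze_frame() if needs_vision(user_text) else None
    return answer(user_text, context)
```

With this sketch, "What's behind me in the frame?" routes through the vision stub, while "What is the capital of Italy?" is answered directly.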

Setup Instructions

  1. Clone the repo
     git clone https://github.com/AbhaySingh71/Agentic-Assistant-Clara
  2. Install dependencies
     pip install -r requirements.txt
  3. Create a .env file with your keys:
     GROQ_API_KEY=your-groq-api-key
     ELEVENLABS_API_KEY=your-elevenlabs-key
     GOOGLE_API_KEY=your-gemini-api-key
  4. Launch the app
     python main.py
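The app presumably reads these keys with python-dotenv (load_dotenv() followed by os.getenv). If you want to sanity-check your .env before launching, here is a stdlib-only sketch of the same KEY=value parsing; the function names are ours, not part of the project:

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; blanks and # comments are ignored.
    This mirrors the .env format shown in the setup steps above."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(env: dict) -> list:
    """Return the names of required keys that are absent or empty."""
    required = ("GROQ_API_KEY", "ELEVENLABS_API_KEY", "GOOGLE_API_KEY")
    return [k for k in required if not env.get(k)]
```

If missing_keys(parse_env(open(".env").read())) returns a non-empty list, the corresponding services will fail at runtime.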

💡 Example Questions to Ask

  • "What’s behind me in the frame?"
  • "Do I look sleepy today?"
  • "What is the capital of Italy?"
  • "Describe what you see through the webcam."
  • "How many people are visible in the camera?"

📈 Potential Use Cases

  • AI-based visual companions
  • Vision-aware voice agents for people with visual impairments
  • Educational AI for children ("Dora AI")
  • Live event narrators

🚀 Future Improvements

  • Browser-based deployment (Streamlit, Hugging Face Spaces)
  • Better face & emotion analysis
  • Multi-language voice support
  • Memory retention for contextual conversations

✍️ Author

Made with passion by Abhay Singh
GitHub | LinkedIn | Email
