SOVA (Sight Over Voice Ally)

SOVA is a smart, assistive wearable designed to empower blind and visually impaired individuals by transforming visual information into audible cues. By leveraging computer vision, OCR, text-to-speech technologies, and an integrated hardware system on a Raspberry Pi 4 platform, SOVA provides real-time situational awareness and independent navigation.

Overview

SOVA (Sight Over Voice Ally) bridges the accessibility gap for blind and visually impaired (BVI) individuals. The system integrates a webcam-equipped smart glasses setup with a Raspberry Pi 4 to capture, process, and interpret visual data in real time. It then converts recognized text and objects into spoken audio, allowing users to navigate public spaces safely and independently.

Abstract

Visual impairment affects more than 285 million people worldwide, who face daily challenges due to reduced or lost sight. SOVA addresses these challenges by combining:

Real-Time Image Processing: Capturing live images through smart glasses.
Object & Text Recognition: Using computer vision and OCR (Tesseract) to interpret surroundings.
Audio Feedback: Converting recognized information into speech using a text-to-speech API.

This innovative approach enables BVI individuals to access crucial environmental information on the go.

Features

Real-Time Object and Text Recognition:
Processes live video streams to detect and identify objects and printed text.
Audio Feedback:
Converts visual cues into speech, allowing users to understand their surroundings without needing to see.
Wearable and Portable:
Compact design integrated into smart glasses for comfortable, continuous use.
Cost-Effective and Open-Source:
Built on affordable hardware and open-source software (OpenCV, Tesseract, mimic API).
Raspberry Pi 4 Integration:
Provides robust processing power for real-time data analysis.

Figure: 3D-printed smart glasses built to help blind users in their daily lives.

System Architecture

SOVA comprises two primary subsystems:

Hardware Components:

Smart Glasses:
Equipped with an integrated webcam to capture the user's environment.

Raspberry Pi 4:
Acts as the central processing unit for image capture and processing.

Power, Mic, Headphone, and Connectivity Modules:

Mic: Captures user voice commands.

Headphone: Delivers audio feedback.

Camera & Flash: Capture real-time photos and video under various lighting conditions.

Push Buttons & Ultrasonic Sensor: Facilitate mode switching and measure distances.

Software Pipeline:

Image Acquisition & Preprocessing:
Captures images via the webcam and enhances them for analysis.

Computer Vision & OCR:
Uses OpenCV and Tesseract to detect objects and extract text.

Audio Synthesis:
Converts recognized text into speech via a text-to-speech engine.

Software Components

Machine Learning and Computer Vision

SOVA implements machine learning and computer vision techniques to boost recognition accuracy. Key aspects include:

Object Detection:
Identifies obstacles and objects in real time.

Image Preprocessing and Annotation:
Enhances images for optimal processing.

Evaluation Metrics:
Uses tools like confusion matrices and mean average precision (mAP) for performance evaluation.

Figure: Comparison of AI, Machine Learning, and Deep Learning methods.

Figure: Overview of computer vision tasks for environmental analysis.

OCR and Text-to-Speech

Optical Character Recognition (OCR):
Tesseract extracts text from captured images.

Text-to-Speech (TTS):
A TTS engine (e.g., mimic API) converts text into audible speech.
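
The OCR-to-speech path described above could look roughly like the following. This is a minimal sketch, not the project's actual code: it assumes the `opencv-python` and `pytesseract` packages are installed, that a `mimic` executable is on the PATH, and that Otsu binarization is an acceptable preprocessing step. The helper names (`clean_ocr_text`, `read_image_text`, `speak`) are hypothetical.

```python
import re
import subprocess


def clean_ocr_text(raw: str) -> str:
    """Collapse whitespace and drop empty lines so the TTS engine reads smoothly."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return " ".join(line for line in lines if line)


def read_image_text(image_path: str) -> str:
    """Binarize an image with OpenCV, then run Tesseract on the result."""
    # Imported lazily so clean_ocr_text works without these packages installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding picks the black/white cut-off automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return clean_ocr_text(pytesseract.image_to_string(binary))


def speak(text: str) -> None:
    """Speak text through the mimic command-line tool (assumed to be on PATH)."""
    if text:
        subprocess.run(["mimic", "-t", text], check=True)
```

A caller would chain the two steps, for example `speak(read_image_text("capture.jpg"))`, where the file name is only a placeholder for a frame saved from the webcam.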

Figure: Process of annotating images for OCR training.

Integration and Operation Modes

The integration script coordinates various hardware components and models to operate in two primary modes: Power-Saving Mode and SOVA Mode. This ensures efficient performance while providing comprehensive assistance.

Hardware Components Utilized

Raspberry Pi 4: Real-time processing and model execution.

Microphone: Continuously captures voice commands.

Headphone: Provides clear audio feedback.

Camera and Flash: Capture high-quality images and videos in all lighting conditions.

Push Buttons: Allow mode switching from power-saving to full SOVA operation.

Ultrasonic Sensor: Measures distances in power-saving mode to inform the user.

Operation Modes

Power-Saving Mode

Purpose:
Conserve power and reduce CPU load.

Functionality:

The ultrasonic sensor measures distances at regular intervals.

Provides periodic audio feedback about nearby objects.

Monitors a push button; when pressed, switches to SOVA mode.
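
As an illustration of the distance step, the sketch below converts an ultrasonic echo time into a distance and builds the sentence to be spoken. It assumes a sensor that reports the round-trip echo time (as HC-SR04-style modules do); the sensor model, the function names, and the 100 cm warning threshold are assumptions, not details taken from the project.

```python
SPEED_OF_SOUND_CM_PER_S = 34300.0  # approximate, in air at about 20 degrees C


def echo_to_distance_cm(echo_seconds: float) -> float:
    """Convert a round-trip echo time into a one-way distance in centimetres."""
    if echo_seconds < 0:
        raise ValueError("echo time cannot be negative")
    # The pulse travels to the obstacle and back, so halve the path length.
    return echo_seconds * SPEED_OF_SOUND_CM_PER_S / 2


def distance_message(distance_cm: float, warn_below_cm: float = 100.0) -> str:
    """Build the sentence to speak; the 100 cm threshold is a hypothetical default."""
    rounded = round(distance_cm)
    if distance_cm < warn_below_cm:
        return f"Obstacle ahead, {rounded} centimetres"
    return f"Path clear for {rounded} centimetres"
```

In a real loop, the echo time would come from timing the sensor's echo pin, and the message would be passed to the text-to-speech engine at each interval.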

SOVA Mode

Purpose:
Run comprehensive assistance models based on voice commands.

Functionality:

Voice Command Detection:
Uses an offline voice recognition model (Vosk) running in the background.

Model Activation:
Depending on voice commands:

Face Recognition:

Announces known faces (e.g., a friend named Mohamed) or prompts the user to add unknown faces.

Color Detection:

Identifies and announces colors when requested.

Wallet & Keys Detection:

Guides the user to locate keys or wallet.

OCR:

Reads out text when commanded.

The script (see main.py) listens for voice commands, processes audio in real time, and launches the corresponding models via subprocesses. It also manages mode switching through push button events, ensuring a balance between energy efficiency and full functionality.
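
The interplay between the push button and the subprocess launches can be sketched as a small state machine. This is an illustrative outline under stated assumptions, not the contents of main.py: the script paths are hypothetical, and toggling back from SOVA mode to power-saving mode with the same button is an assumption.

```python
import subprocess
import sys
from enum import Enum


class Mode(Enum):
    POWER_SAVING = "power-saving"
    SOVA = "sova"


# Hypothetical script paths; the real layout of the model scripts may differ.
MODEL_SCRIPTS = {
    "face": "FaceRecognition/run.py",
    "color": "ColorDetection/run.py",
    "keys": "KeysDetection/run.py",
    "wallet": "WalletDetection/run.py",
    "ocr": "OCR/run.py",
}


def launch_model(name: str) -> subprocess.Popen:
    """Start a model script as a child process using the current interpreter."""
    return subprocess.Popen([sys.executable, MODEL_SCRIPTS[name]])


class ModeController:
    """Tracks the current mode and launches models only in SOVA mode."""

    def __init__(self, launcher=launch_model):
        self.mode = Mode.POWER_SAVING
        self.launcher = launcher

    def on_button_press(self) -> Mode:
        """Toggle between the two modes (toggling back is an assumption)."""
        if self.mode is Mode.POWER_SAVING:
            self.mode = Mode.SOVA
        else:
            self.mode = Mode.POWER_SAVING
        return self.mode

    def on_command(self, model_name: str):
        """Launch a known model while in SOVA mode; ignore the command otherwise."""
        if self.mode is not Mode.SOVA or model_name not in MODEL_SCRIPTS:
            return None
        return self.launcher(model_name)
```

Passing the launcher in as a parameter keeps the mode logic testable on a development machine without the model scripts or the hardware present.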

Yocto-based Linux Image

To optimize performance and reduce resource usage, we built a custom Linux image using the Yocto Project. Unlike the standard GUI-based Raspberry Pi image, our Yocto image is lightweight and tailored specifically for SOVA, providing several benefits:

Optimized Performance:
A minimal Linux distribution that loads faster and uses fewer system resources.

Reduced Footprint:
By including only the essential components, the image occupies less storage space and reduces memory usage.

Customization:
The Yocto Project's layered architecture (using OpenEmbedded and BitBake) allowed us to create a custom distribution that meets the specific needs of SOVA, including better power management and streamlined application integration.

This custom Linux image enhances the overall responsiveness of the system and extends battery life, ensuring that SOVA operates efficiently in real time.

Usage

Navigation Assistance:
In power-saving mode, the ultrasonic sensor measures distances and provides periodic audio feedback about nearby objects.

Advanced Assistance:
In SOVA mode, the voice recognition system continuously listens for commands. For example:

"Who is here?" triggers the face recognition model.

"What is this color?" triggers the color detection model.

"Where are my keys?" or "Where is my wallet?" triggers the corresponding detection model.

"Can you read this?" activates the OCR model to read text aloud.

All analysis runs offline in real time. The user is not limited to one fixed question per model: any question can be asked, and the system runs the model that matches the request.

Interactive Model Training:
If an unknown face is detected, the system can prompt the user to add the new face by capturing headshots and training the recognition model.
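
One simple way to support free-form phrasing like the Advanced Assistance examples above is keyword matching on the recognized transcript. The sketch below assumes that approach; the source does not say how main.py maps utterances to models, and the keyword sets and the `route_command` name are hypothetical.

```python
import re

# Hypothetical keyword sets; the real trigger words in main.py may differ.
COMMAND_KEYWORDS = {
    "face": {"who", "face"},
    "color": {"color", "colour"},
    "keys": {"key", "keys"},
    "wallet": {"wallet"},
    "ocr": {"read", "text"},
}


def route_command(utterance: str):
    """Return the model whose keywords best match the utterance, or None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_model, best_hits = None, 0
    for model, keywords in COMMAND_KEYWORDS.items():
        hits = len(words & keywords)
        # Strictly greater, so the first model listed wins a tie.
        if hits > best_hits:
            best_model, best_hits = model, hits
    return best_model
```

With these keyword sets, both "What is this color?" and a looser phrasing such as "please tell me what colour my shirt is" route to the color model, while an utterance with no known keyword returns `None` and can be ignored.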

Check the project video:

Results and Performance Evaluation

Performance evaluation includes:

Confusion Matrix:
Displays the accuracy of object and text recognition modules.

Mean Average Precision (mAP):
Assesses the accuracy of object detection within the user’s field of view.

These metrics help verify that SOVA delivers reliable assistance under various conditions.
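
For readers unfamiliar with the confusion matrix, the following sketch shows how one can be tallied from paired true and predicted labels, and how per-class precision and recall follow from it. It is a generic illustration in plain Python, not the project's evaluation script, and it does not compute mAP, which additionally requires ranked detections with confidence scores.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def precision_recall(matrix, label):
    """Per-class precision and recall derived from the pair counts."""
    true_pos = matrix[(label, label)]
    predicted = sum(n for (_, pred), n in matrix.items() if pred == label)
    actual = sum(n for (true, _), n in matrix.items() if true == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

Each key of the returned counter is a `(true, predicted)` pair, so off-diagonal entries such as `("keys", "wallet")` show directly which classes the detector confuses.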

Future Work

Planned enhancements include:

Advanced Object Recognition:
Integration of more sophisticated deep learning models for improved accuracy.

Enhanced Audio Feedback:
Customization of voice modulation and support for multiple languages.

Battery Optimization:
Extending operational time through further power management improvements.

User-Centered Design:
Refining ergonomics and interface based on feedback from BVI users.

Acknowledgements

This project is dedicated to the blind community, whose resilience inspires us daily. Special thanks to:

Dr. Fatma Mazen and Dr. Sara Ashry for their invaluable supervision.

Friends and colleagues (Merihan Shaban, Maysson Khalaf) for their contributions.

Eng. Ramy Adel and Eng. Mazen Osama for technical guidance and support.

Authors and Supervisors

Authors:

Hosam Ayoub Bayoumi

Hesham Yasser Ahmed

Shehab Emad Abd-ElTawwab

Amr Hosam Yassin

Supervisor:

Dr. Fatma Mazen
