Refactor codebase with improved error handling, validation, and testing#63

Open
GodsBoy wants to merge 1 commit into apurvsinghgautam:main from GodsBoy:overall-improvements

GodsBoy commented Nov 27, 2025

New Files

  • constants.py - Centralized configuration eliminating code duplication
  • tests/ - Unit test suite with pytest (constants, search, scrape, CLI)
  • pytest.ini - Test configuration

Key Improvements

Security & Validation

  • Added input validation for CLI parameters (query length, thread count)
  • Added path traversal protection for output file paths
  • Removed blanket warning suppression
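A minimal sketch of what the validation and path checks described above might look like. The function names, the limits (`MAX_QUERY_LENGTH`, `MAX_THREADS`), and the idea of a fixed base directory are assumptions for illustration; the PR description does not state the actual values or signatures.

```python
from pathlib import Path

# Hypothetical limits; the PR does not state the real values.
MAX_QUERY_LENGTH = 500
MAX_THREADS = 16


def validate_query(query: str) -> str:
    """Strip whitespace and reject empty or oversized queries."""
    cleaned = query.strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise ValueError(f"query exceeds {MAX_QUERY_LENGTH} characters")
    return cleaned


def validate_threads(threads: int) -> int:
    """Accept only thread counts between 1 and MAX_THREADS."""
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError(f"threads must be between 1 and {MAX_THREADS}")
    return threads


def safe_output_path(filename: str, base_dir: Path) -> Path:
    """Resolve filename under base_dir and refuse anything that escapes it."""
    base = base_dir.resolve()
    candidate = (base / filename).resolve()
    try:
        # relative_to raises ValueError when candidate is outside base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}") from None
    return candidate
```

Resolving both paths before comparing them means that `..` segments and absolute paths are normalised first, so a name such as `../secrets.txt` is rejected rather than silently written outside the output directory.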

Error Handling

  • Replaced bare except: clauses with specific exceptions (Timeout, ConnectionError, RequestException)
  • Added proper error messages and graceful degradation
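As an illustration of the pattern, the sketch below catches the three `requests` exception classes named above, from most to least specific, and degrades to `None` instead of crashing. The function name `fetch_text` and the default timeout are hypothetical and not taken from the PR.

```python
import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


def fetch_text(url: str, timeout: float = 30.0) -> Optional[str]:
    """Return the response body, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.exceptions.Timeout:
        logger.warning("Timed out after %.0fs fetching %s", timeout, url)
    except requests.exceptions.ConnectionError:
        logger.warning("Could not connect to %s", url)
    except requests.exceptions.RequestException as exc:
        # Base class: covers HTTP errors, invalid URLs, and similar failures.
        logger.error("Request to %s failed: %s", url, exc)
    return None
```

Order matters here: `Timeout` and `ConnectionError` both derive from `RequestException`, so the base class has to come last or the more specific handlers would never run.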

Code Quality

  • Added type hints to all functions
  • Added comprehensive logging throughout the codebase
  • Removed duplicate USER_AGENTS and Tor config (now in constants.py)
  • Removed unused function parameters in scrape.py
  • Fixed duplicate import re in llm.py

Configuration

  • Externalized Tor settings to environment variables
  • Added configurable timeouts, worker counts, and limits
  • Ollama model discovery now cached (5-minute TTL)
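The 5-minute cache for model discovery could be sketched roughly as follows. The `TTLCache` class and the `loader` callable are hypothetical stand-ins; the PR description does not show how the cache is implemented or how Ollama is queried.

```python
import time
from typing import Callable, List, Optional, Tuple

CACHE_TTL_SECONDS = 300.0  # 5 minutes, as stated in the PR description


class TTLCache:
    """Cache one loader result and refresh it once the TTL has elapsed."""

    def __init__(
        self,
        loader: Callable[[], List[str]],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock  # injectable so tests need not sleep
        self._entry: Optional[Tuple[float, List[str]]] = None

    def get(self) -> List[str]:
        now = self._clock()
        if self._entry is not None and now - self._entry[0] < self._ttl:
            return self._entry[1]  # still fresh: skip the loader call
        value = self._loader()
        self._entry = (now, value)
        return value
```

Using `time.monotonic` rather than wall-clock time keeps the expiry check stable if the system clock is adjusted while the process is running.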

Dependencies

  • Pinned version ranges in requirements.txt
  • Added pytest for testing

Environment Variables Now Supported

  • TOR_PROXY_HOST, TOR_PROXY_PORT, TOR_CONTROL_PORT
  • ROBIN_SEARCH_TIMEOUT, ROBIN_SCRAPE_TIMEOUT
  • ROBIN_MAX_SCRAPE_CHARS, ROBIN_MAX_WORKERS
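One way these variables might be read with fallbacks is sketched below. The variable names come from the list above; the helper `env_int`, the `load_settings` function, and every default value are assumptions (9050 and 9051 are Tor's conventional SOCKS and control ports, while the `ROBIN_*` defaults are placeholders).

```python
import os
from typing import Dict, Mapping, Union


def env_int(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting, falling back to default when unset or blank."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Collect all supported settings; defaults here are illustrative only."""
    return {
        "TOR_PROXY_HOST": env.get("TOR_PROXY_HOST", "127.0.0.1"),
        "TOR_PROXY_PORT": env_int("TOR_PROXY_PORT", 9050, env),
        "TOR_CONTROL_PORT": env_int("TOR_CONTROL_PORT", 9051, env),
        "ROBIN_SEARCH_TIMEOUT": env_int("ROBIN_SEARCH_TIMEOUT", 30, env),
        "ROBIN_SCRAPE_TIMEOUT": env_int("ROBIN_SCRAPE_TIMEOUT", 45, env),
        "ROBIN_MAX_SCRAPE_CHARS": env_int("ROBIN_MAX_SCRAPE_CHARS", 2000, env),
        "ROBIN_MAX_WORKERS": env_int("ROBIN_MAX_WORKERS", 5, env),
    }
```

Failing loudly on a non-integer value, instead of silently using the default, surfaces typos such as `ROBIN_MAX_WORKERS=ten` at startup.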
