Bug Description
In development mode, the hiring agent caches resume extraction results to avoid re-processing the same PDF. However, if the initial extraction fails or produces invalid data, these corrupted cache files persist and cause all subsequent runs to return null resume data, even after the underlying issues are fixed.
Expected Behavior
- If cache files contain valid resume data, use them to speed up processing
- If cache files are corrupted/invalid, automatically detect this and reprocess the PDF
- Users should get valid resume data regardless of cache state
Actual Behavior
- System loads corrupted cache files with all null values
- Resume data remains null even after fixing environment/configuration issues
- No automatic detection or recovery from stale cache
Reproduction Steps
- Setup: Ensure DEVELOPMENT_MODE = True in config.py
- Create corrupted cache: Run the system with invalid configuration (e.g., wrong LLM settings) to generate a failed cache file
- Fix configuration: Update environment variables, install missing dependencies, etc.
- Run system: Execute python score.py data/resume.pdf
- Observe: Resume data shows as null despite valid configuration
Root Cause Analysis
The issue occurs in score.py around lines 210-213:
if DEVELOPMENT_MODE and os.path.exists(cache_filename):
    print(f"Loading cached data from {cache_filename}")
    cached_data = json.loads(Path(cache_filename).read_text())
    resume_data = JSONResume(**cached_data)
The code checks only that the cache file exists. It never verifies that the cached data contains usable resume content, so a file in which every field is null is loaded as if it were a successful extraction.
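The missing check could be sketched as below. This is a minimal illustration, not the project's code: the helper name cache_has_content is hypothetical, and it assumes that a cache whose top-level sections are all null or empty should count as a cache miss.

```python
from typing import Any, Mapping


def cache_has_content(cached: Any) -> bool:
    """Return True if at least one top-level section holds real data.

    Hypothetical helper: assumes a usable cache is a JSON object with at
    least one section that is not null or empty.
    """
    if not isinstance(cached, Mapping):
        return False
    # None, "", [] and {} all count as "nothing was extracted".
    return any(value not in (None, "", [], {}) for value in cached.values())


corrupted = {"basics": None, "work": None, "skills": None}
usable = {"basics": {"name": "Ada"}, "work": None, "skills": None}

print(cache_has_content(corrupted))  # False
print(cache_has_content(usable))     # True
```

With a check like this, the condition in score.py could require both that the file exists and that its parsed contents pass validation before constructing JSONResume.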
Impact
- Severity: High - Completely blocks resume processing in development mode
- User Experience: Confusing - users may think their PDF is corrupted or system is broken
- Debugging Difficulty: Hard to diagnose without inspecting cache files manually
Example of Corrupted Cache File
{
  "basics": null,
  "work": null,
  "volunteer": null,
  "education": null,
  "awards": null,
  "certificates": null,
  "publications": null,
  "skills": null,
  "languages": null,
  "interests": null,
  "references": null,
  "projects": null
}
Proposed Solution
- Cache Validation: Add validation logic to detect corrupted/empty cache files
- Automatic Recovery: If cache is invalid, automatically delete it and reprocess
- CLI Options: Add command-line options to manage cache (clear, validate, force refresh)
- Better Logging: Improve cache-related log messages for better debugging
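One way the validation, recovery, and CLI items above might fit together is sketched here. Everything in it is an assumption for illustration: the function names, the --force-refresh flag, and the extract callback (a stand-in for the real PDF extraction) are hypothetical and do not exist in score.py today.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Optional


def _has_content(data: object) -> bool:
    """True if data is a JSON object with at least one non-empty section."""
    return isinstance(data, dict) and any(
        value not in (None, "", [], {}) for value in data.values()
    )


def read_valid_cache(cache_path: Path) -> Optional[dict]:
    """Return cached data, or None if the file is missing, unreadable, or empty."""
    if not cache_path.exists():
        return None
    try:
        data = json.loads(cache_path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    return data if _has_content(data) else None


def load_or_extract(cache_path: Path, extract: Callable[[], dict],
                    force_refresh: bool = False) -> dict:
    """Use the cache when it is valid; otherwise delete it and re-extract."""
    if not force_refresh:
        cached = read_valid_cache(cache_path)
        if cached is not None:
            print(f"Loading cached data from {cache_path}")
            return cached
    if cache_path.exists():
        print(f"Discarding stale or invalid cache {cache_path}")
        cache_path.unlink()
    data = extract()
    # Persist only results that contain data, so a failed run cannot
    # leave behind another all-null cache file.
    if _has_content(data):
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(data))
    return data


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: a positional PDF path plus a refresh flag."""
    parser = argparse.ArgumentParser(description="Score a resume PDF")
    parser.add_argument("pdf_path")
    parser.add_argument("--force-refresh", action="store_true",
                        help="ignore any cached result and reprocess the PDF")
    return parser
```

Refusing to write an empty result addresses the cause as well as the symptom: a run with broken configuration would no longer create the corrupted file in the first place.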
Environment
- OS: macOS
- Python: 3.x
- Development Mode: Enabled
- LLM Provider: Google Gemini
Workaround
Manually delete cache files: rm cache/resumecache_*.json cache/githubcache_*.json
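For shells where rm and glob expansion are unavailable, the same cleanup can be sketched in Python. The function name clear_cache is hypothetical; the directory and file patterns are taken from the command above.

```python
from pathlib import Path
from typing import List


def clear_cache(cache_dir: str = "cache") -> List[str]:
    """Delete resume and GitHub cache files; return the file names removed."""
    removed = []
    for pattern in ("resumecache_*.json", "githubcache_*.json"):
        for path in Path(cache_dir).glob(pattern):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


if __name__ == "__main__":
    for name in clear_cache():
        print(f"Removed {name}")
```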