DeVerify-backend — Hackathon project scanner & evaluator

This repository contains a backend-only scanner that:

Starts from a hackathon URL (gallery or event page).
Discovers project detail pages (gallery-first, with heuristics for sites like Devfolio).
Extracts GitHub repository links from each project page.
Fetches repository metadata: README, commit timeline and a file/folder tree.
Calls an LLM to evaluate the README and separately to evaluate repository folder organization.
Emits a clean JSON results file per run under results/.

The project is intended as a server-side tool (no frontend) for programmatic audit/triage of hackathon submissions.

Highlights / features

Playwright-based rendering for JS-heavy gallery pages (app/renderer.py).
Gallery-first discovery and site heuristics for Devpost/Devfolio (scripts/run_eval_from_gallery.py).
GitHub helpers to fetch README, list commits, and retrieve repo tree (app/github.py).
LLM adapters supporting Groq (OpenAI-compatible) and Hugging Face (app/llm.py).
Structure evaluation (folder-organization scoring) via LLM (evaluate_structure).
Output: per-repo JSON records with README and structure evaluations.

Requirements

Python 3.10 or newer
Install runtime deps:

python -m pip install -r requirements.txt

Configuration (important environment variables)

Create a .env (copy .env.example) and set the provider credentials you plan to use.

EVAL_PROVIDER (optional): groq or hf. Selects the provider explicitly; if unset, the provider is detected automatically.
EVAL_API_URL (for Groq/OpenAI-compatible endpoints): e.g. https://api.groq.com/openai/v1/chat/completions.
EVAL_API_KEY or GROQ_API_KEY: API key for the eval provider.
EVAL_MODEL: model id, e.g. llama-3.1-8b-instant (Groq) or hf:google/flan-t5-small (HF prefix).
HF_API_TOKEN: Hugging Face inference token (if using HF).
GITHUB_TOKEN (optional): GitHub personal access token to increase API rate limits and access private repos you own.
RENDER_JS: set to 1 to enable Playwright rendering for pages that require JS.
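
As a quick sanity check before a run, the variables above can be inspected from Python. This is a minimal sketch and not part of the repository: the helper name summarize_eval_config and the fallback order (an explicit EVAL_PROVIDER first, then a Groq-style key, then HF_API_TOKEN) are assumptions for illustration and may differ from the detection logic in app/llm.py.

```python
import os


def summarize_eval_config(env=None):
    """Describe which provider settings are present in the environment.

    Hypothetical helper: the precedence below is an assumption, not the
    detection logic used by app/llm.py.
    """
    env = os.environ if env is None else env
    explicit = (env.get("EVAL_PROVIDER") or "").strip().lower()
    has_groq_key = bool(env.get("EVAL_API_KEY") or env.get("GROQ_API_KEY"))
    has_hf_token = bool(env.get("HF_API_TOKEN"))

    # Assumed precedence: explicit setting, then whichever credential exists.
    if explicit in ("groq", "hf"):
        provider = explicit
    elif has_groq_key:
        provider = "groq"
    elif has_hf_token:
        provider = "hf"
    else:
        provider = None

    if provider == "groq":
        has_credentials = has_groq_key
    elif provider == "hf":
        has_credentials = has_hf_token
    else:
        has_credentials = False

    return {
        "provider": provider,
        "has_credentials": has_credentials,
        "github_token_set": bool(env.get("GITHUB_TOKEN")),
        "render_js": env.get("RENDER_JS") == "1",
    }
```

Calling summarize_eval_config() with no arguments reads os.environ; passing a plain dict makes the check easy to exercise in tests.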

Important notes about rate limits:

Provider rate limits (tokens per minute or requests per minute) can cause the evaluation or structure_evaluation field to contain an error object instead of a result. If you see rate_limit_exceeded, reduce parallelism, add retries with backoff, or upgrade your provider plan.
GitHub unauthenticated requests are also rate-limited; set GITHUB_TOKEN to increase limits.
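
The retry/backoff advice above can be sketched as a small wrapper. This is an illustration under stated assumptions, not code from the repository: call_with_backoff and looks_rate_limited are hypothetical names, and the error shape (a dict with an "error" object carrying "code", or a top-level "code") is assumed from the rate_limit_exceeded symptom rather than taken from app/llm.py.

```python
import random
import time


def looks_rate_limited(result):
    """Assumed error shape: {'error': {'code': ...}} or {'code': ...}."""
    if not isinstance(result, dict):
        return False
    error = result.get("error")
    code = error.get("code") if isinstance(error, dict) else result.get("code")
    return code == "rate_limit_exceeded"


def call_with_backoff(call, is_rate_limited=looks_rate_limited, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()` until the result is not rate-limited or attempts run out.

    `call` is any zero-argument function supplied by the caller; nothing here
    is specific to the repository's LLM adapters.
    """
    result = None
    for attempt in range(max_attempts):
        result = call()
        if not is_rate_limited(result):
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus up to 10% jitter so parallel workers do not retry in step.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
    return result  # still rate-limited after the final attempt
```

A caller would wrap each LLM request in a zero-argument function (for example with functools.partial) and pass it as call. The last error object is returned unchanged when every attempt fails, so the JSON output still records the failure.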

Common usage (CLI scripts)

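Quick one-repo structure test (debug). The examples in this section use PowerShell syntax; the two arguments are the repository owner and name: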

$env:PYTHONPATH='.'
$env:EVAL_API_URL='https://api.groq.com/openai/v1/chat/completions'
$env:EVAL_API_KEY='your_key_here'
python .\scripts\debug_structure_eval.py Ananya-R2004 E-Gram-Panchayat

Run a gallery-first scan (Devfolio/Devpost style) and write clean JSON results:

$env:PYTHONPATH='.'
$env:RENDER_JS='1'          # enable Playwright rendering
$env:EVAL_API_URL='https://api.groq.com/openai/v1/chat/completions'
python .\scripts\run_eval_from_gallery.py "https://bruteforce.devfolio.co/overview" --mode full --max 50 --out results\my_run.json

Test README-only evaluation for a single repo:

$env:PYTHONPATH='.'
$env:EVAL_API_KEY='your_key_here'
python .\scripts\run_one_eval.py

Enrich saved results with commit-based metadata (human summary, last commit dates):

python .\scripts\enrich_results.py --input results\my_run.json

Output format

Each run writes a JSON array where each element is an object with fields similar to:

repo: canonical GitHub URL (https://github.com/{owner}/{repo})
raw_repo_url: original link discovered on the project page
readme_length: character count of the fetched README (0 if not found)
commits: object with exists, commit_count, pre_cutoff_commits (list), repo_created_at, error
evaluation: LLM JSON evaluating the README (see app/llm.evaluate_readme schema)
structure_evaluation: LLM JSON evaluating folder structure (see app/llm.evaluate_structure) or an error object if provider failed
repo_tree_meta: { count: number_of_paths, error: optional }
source_project: the project page URL where the repo was discovered

Example entry (abridged):

{
	"repo": "https://github.com/example/repo",
	"raw_repo_url": "https://github.com/example/repo/tree/main",
	"readme_length": 1200,
	"commits": { ... },
	"evaluation": { ... },
	"structure_evaluation": { ... },
	"repo_tree_meta": { "count": 120 }
}
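
To triage a saved run, the array can be loaded and summarized with a few lines of Python. This is a minimal sketch that relies only on the fields listed above; the function names are hypothetical, and the test for a failed evaluation (a dict with a non-empty "error" or "code" key) is an assumption about the error object's shape.

```python
import json
from pathlib import Path


def load_results(path):
    """Load a results file: a JSON array of per-repo objects."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of per-repo records")
    return data


def has_error(section):
    """Assumption: a failed evaluation is a dict with a non-empty 'error' or 'code'."""
    return isinstance(section, dict) and bool(section.get("error") or section.get("code"))


def summarize(records):
    """Count records and list the repos that need a re-run or manual review."""
    return {
        "total": len(records),
        "missing_readme": [r.get("repo") for r in records
                           if r.get("readme_length", 0) == 0],
        "structure_errors": [r.get("repo") for r in records
                             if has_error(r.get("structure_evaluation"))],
    }
```

For example, summarize(load_results("results/my_run.json")) returns the total record count together with the repos that had no README or a failed structure evaluation.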

Troubleshooting & recommendations

Rate limits from the eval provider (Groq/OpenAI/HF):
- Symptoms: structure_evaluation contains an error object with code: "rate_limit_exceeded" and a message indicating TPM limits.
- Fixes: throttle LLM calls, add retries/exponential backoff, reduce prompt size (fewer paths), or upgrade the provider plan.
GitHub 404s or missing READMEs:
- Ensure that URLs normalized by canonicalize_github_url are used (the orchestrator already canonicalizes links).
- Set GITHUB_TOKEN to improve rate limits and access private repos you own.
Playwright hangs/timeouts:
- Set RENDER_JS=1 only when needed. Increase timeouts in the renderer if pages are slow.
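
When debugging GitHub 404s, it can help to see what canonicalization is expected to do to a discovered link. The function below is an illustrative stand-in written for this README, not the repository's canonicalize_github_url; its name and exact rules are assumptions. It reduces a link such as https://github.com/example/repo/tree/main to the https://github.com/{owner}/{repo} form used in the repo field.

```python
from urllib.parse import urlparse


def canonical_repo_url(raw_url):
    """Reduce a GitHub link to https://github.com/{owner}/{repo}, or return None.

    Illustrative stand-in; not the repository's canonicalize_github_url.
    """
    parsed = urlparse(raw_url.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None
    # Keep only the first two path segments: owner and repository name.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None  # profile or bare-domain link, not a repository
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        return None
    return f"https://github.com/{owner}/{repo}"
```

This sketch requires a scheme (https://...) and does not filter non-repository paths such as /orgs/... or /topics/..., so links of that kind would still need to be excluded separately.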

Implementation notes (for maintainers)

app/renderer.py — Playwright helpers used to render pages and extract anchors.
app/scraper.py — gallery-first discovery and heuristics.
app/github.py — README fetch, commit listing, repo tree fetch, canonicalization utilities.
app/llm.py — evaluate_readme, evaluate_structure, extract_repos_from_text. Supports Groq (OpenAI-compatible endpoint) and Hugging Face inference paths.
scripts/run_eval_from_gallery.py — CLI orchestrator that discovers projects and evaluates them. Use --out to save a clean JSON file.

Next improvements (planned / suggested)

Add robust retry/backoff and rate-limit-aware throttling for LLM calls (already implemented in later branches).
Add CSV or summary export for quick human review.
Add unit tests for canonicalize_github_url and the GitHub helpers.
Add a lightweight local heuristic fallback for structure scoring to reduce LLM calls.
