arya232006/DeVerify-backend


DeVerify-backend — Hackathon project scanner & evaluator

This repository contains a backend-only scanner that:

  • Starts from a hackathon URL (gallery or event page).
  • Discovers project detail pages (gallery-first, with heuristics for sites like Devfolio).
  • Extracts GitHub repository links from each project page.
  • Fetches repository metadata: README, commit timeline and a file/folder tree.
  • Calls an LLM to evaluate the README and separately to evaluate repository folder organization.
  • Emits a clean JSON results file per run under results/.

The project is intended as a server-side tool (no frontend) for programmatic audit/triage of hackathon submissions.

Highlights / features

  • Playwright-based rendering for JS-heavy gallery pages (app/renderer.py).
  • Gallery-first discovery and site heuristics for Devpost/Devfolio (scripts/run_eval_from_gallery.py).
  • GitHub helpers to fetch README, list commits, and retrieve repo tree (app/github.py).
  • LLM adapters supporting Groq (OpenAI-compatible) and Hugging Face (app/llm.py).
  • Structure evaluation (folder-organization scoring) via LLM (evaluate_structure).
  • Output: per-repo JSON records with README and structure evaluations.
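The three GitHub lookups listed above (README, commits, repo tree) correspond to public GitHub REST API endpoints. The following is a minimal sketch of how such requests could be built with the standard library. The helper names (`github_request`, `readme_request`, `commits_request`, `tree_request`) are hypothetical and are not taken from app/github.py, which may be organized differently.

```python
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def github_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the GitHub REST API (hypothetical helper)."""
    req = urllib.request.Request(API_ROOT + path)
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        # An authenticated request gets a higher rate limit than an anonymous one.
        req.add_header("Authorization", f"Bearer {token}")
    return req


def readme_request(owner: str, repo: str, token: Optional[str] = None) -> urllib.request.Request:
    """Request the repository's preferred README."""
    return github_request(f"/repos/{owner}/{repo}/readme", token)


def commits_request(owner: str, repo: str, token: Optional[str] = None,
                    per_page: int = 100) -> urllib.request.Request:
    """Request one page of the commit list."""
    return github_request(f"/repos/{owner}/{repo}/commits?per_page={per_page}", token)


def tree_request(owner: str, repo: str, ref: str,
                 token: Optional[str] = None) -> urllib.request.Request:
    """Request the full file tree for a commit SHA or branch name."""
    return github_request(f"/repos/{owner}/{repo}/git/trees/{ref}?recursive=1", token)
```

Each request object can be passed to `urllib.request.urlopen`, with the token read from `GITHUB_TOKEN`. Responses are JSON; the README body is returned base64-encoded by default.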

Requirements

  • Python 3.10 or newer
  • Install runtime deps:
python -m pip install -r requirements.txt

Configuration (important environment variables)

Create a .env file (copy .env.example) and set the credentials for the provider you plan to use.

  • EVAL_PROVIDER (optional): groq or hf. Selects the provider explicitly; when unset, the provider is detected automatically.
  • EVAL_API_URL (for Groq/OpenAI-compatible endpoints): e.g. https://api.groq.com/openai/v1/chat/completions.
  • EVAL_API_KEY or GROQ_API_KEY: API key for the eval provider.
  • EVAL_MODEL: model id, e.g. llama-3.1-8b-instant (Groq) or hf:google/flan-t5-small (HF prefix).
  • HF_API_TOKEN: Hugging Face inference token (if using HF).
  • GITHUB_TOKEN (optional): GitHub personal access token to increase API rate limits and access private repos you own.
  • RENDER_JS: set to 1 to enable Playwright rendering for pages that require JS.
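To illustrate how automatic provider detection could work with these variables, here is a minimal sketch. The function name `detect_provider` and the order of the checks are assumptions made for illustration; the actual selection logic lives in app/llm.py and may differ.

```python
from typing import Mapping


def detect_provider(env: Mapping[str, str]) -> str:
    """Pick 'groq' or 'hf' from environment-style settings (illustrative only)."""
    # 1. An explicit EVAL_PROVIDER wins.
    explicit = env.get("EVAL_PROVIDER", "").strip().lower()
    if explicit in ("groq", "hf"):
        return explicit
    # 2. A model id with the hf: prefix implies Hugging Face.
    if env.get("EVAL_MODEL", "").startswith("hf:"):
        return "hf"
    # 3. Otherwise fall back to whichever credentials are present.
    if env.get("EVAL_API_URL") or env.get("EVAL_API_KEY") or env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("HF_API_TOKEN"):
        return "hf"
    raise RuntimeError("no eval provider credentials configured")
```

Calling `detect_provider(os.environ)` after loading the .env file would apply the same rules to the real environment.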

Important notes about rate limits:

  • Provider rate limits (tokens per minute or requests per minute) can cause evaluate_readme or evaluate_structure to return an error object, which then appears in the JSON output. If you see rate_limit_exceeded, reduce parallelism, add retries with backoff, or upgrade your provider plan.
  • GitHub unauthenticated requests are also rate-limited; set GITHUB_TOKEN to increase limits.
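The retry/backoff advice above can be sketched as a small wrapper around any LLM call. This is an illustration, not the repository's implementation: `RateLimitError` and `call_with_backoff` are hypothetical names, and the caller is assumed to raise `RateLimitError` when the provider reports rate_limit_exceeded.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised when the provider reports a rate limit."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter of up to 50% keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.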

Common usage (CLI scripts)

  1. Quick one-repo structure test (debug). The commands in this section use PowerShell syntax:
$env:PYTHONPATH='.'
$env:EVAL_API_URL='https://api.groq.com/openai/v1/chat/completions'
$env:EVAL_API_KEY='your_key_here'
python .\scripts\debug_structure_eval.py Ananya-R2004 E-Gram-Panchayat
  2. Run a gallery-first scan (Devfolio/Devpost style) and write clean JSON results:
$env:PYTHONPATH='.'
$env:RENDER_JS='1'          # enable Playwright rendering
$env:EVAL_API_URL='https://api.groq.com/openai/v1/chat/completions'
python .\scripts\run_eval_from_gallery.py "https://bruteforce.devfolio.co/overview" --mode full --max 50 --out results\my_run.json
  3. Test README-only evaluation for a single repo:
$env:PYTHONPATH='.'
$env:EVAL_API_KEY='your_key_here'
python .\scripts\run_one_eval.py
  4. Enrich saved results with commit-based metadata (human summary, last commit dates):
python .\scripts\enrich_results.py --input results\my_run.json

Output format

Each run writes a JSON array where each element is an object with fields similar to:

  • repo: canonical GitHub URL (https://github.com/{owner}/{repo})
  • raw_repo_url: original link discovered on the project page
  • readme_length: character count of the fetched README (0 if not found)
  • commits: object with exists, commit_count, pre_cutoff_commits (list), repo_created_at, error
  • evaluation: LLM JSON evaluating the README (see app/llm.evaluate_readme schema)
  • structure_evaluation: LLM JSON evaluating folder structure (see app/llm.evaluate_structure) or an error object if provider failed
  • repo_tree_meta: { count: number_of_paths, error: optional }
  • source_project: the project page URL where the repo was discovered

Example entry (abridged):

{
	"repo": "https://github.com/example/repo",
	"raw_repo_url": "https://github.com/example/repo/tree/main",
	"readme_length": 1200,
	"commits": { ... },
	"evaluation": { ... },
	"structure_evaluation": { ... },
	"repo_tree_meta": { "count": 120 }
}
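The example shows a raw_repo_url ending in /tree/main reduced to the canonical repo URL. A minimal sketch of that normalization follows. The function name `canonical_repo_url` is hypothetical; the repository's own canonicalize_github_url may handle more cases.

```python
from typing import Optional
from urllib.parse import urlparse


def canonical_repo_url(raw_url: str) -> Optional[str]:
    """Reduce a GitHub link to https://github.com/{owner}/{repo}, or None.

    Illustrative only: it does not filter non-repo paths such as /orgs/... .
    """
    parsed = urlparse(raw_url.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None
    # Keep only the first two path segments: owner and repo.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        return None
    return f"https://github.com/{owner}/{repo}"
```

Links without a scheme (for example github.com/example/repo) are rejected by this sketch because `urlparse` does not report a hostname for them.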

Troubleshooting & recommendations

  • Rate limits from the eval provider (Groq/OpenAI/HF):

    • Symptoms: structure_evaluation contains an error object with code: "rate_limit_exceeded" and a message indicating TPM limits.
    • Fixes: throttle LLM calls, add retries/exponential backoff, reduce prompt size (fewer paths), or upgrade the provider plan.
  • GitHub 404s or missing READMEs:

    • Ensure that URLs normalized by canonicalize_github_url are used (the orchestrator already canonicalizes links).
    • Set GITHUB_TOKEN to improve rate limits and access private repos you own.
  • Playwright hangs/timeouts:

    • Set RENDER_JS=1 only when needed. Increase timeouts in the renderer if pages are slow.
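To find which entries in a saved results file were affected by the rate-limit symptom described above, a short script can scan the evaluation fields. This is a sketch: the exact shape of the error object is not specified here, so `is_rate_limited` (a hypothetical helper) accepts a code at the top level, a code nested under an "error" key, or an error string.

```python
import json
from typing import Any, Dict, List


def is_rate_limited(evaluation: Any) -> bool:
    """Return True if an evaluation field looks like a rate-limit error object."""
    if not isinstance(evaluation, dict):
        return False
    # Assumption: the code sits either at the top level or under "error".
    err = evaluation.get("error", evaluation)
    if isinstance(err, dict):
        return err.get("code") == "rate_limit_exceeded"
    return isinstance(err, str) and "rate_limit_exceeded" in err


def rate_limited_repos(records: List[Dict[str, Any]]) -> List[Any]:
    """List the repo URLs whose README or structure evaluation was rate-limited."""
    return [
        r.get("repo")
        for r in records
        if is_rate_limited(r.get("structure_evaluation"))
        or is_rate_limited(r.get("evaluation"))
    ]


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load the JSON array written by a run."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `rate_limited_repos(load_records("results/my_run.json"))` returns the repos worth re-running once the limit has reset.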

Implementation notes (for maintainers)

  • app/renderer.py — Playwright helpers used to render pages and extract anchors.
  • app/scraper.py — gallery-first discovery and heuristics.
  • app/github.py — README fetch, commit listing, repo tree fetch, canonicalization utilities.
  • app/llm.py — evaluate_readme, evaluate_structure, extract_repos_from_text. Supports Groq (OpenAI-compatible endpoint) and Hugging Face inference paths.
  • scripts/run_eval_from_gallery.py — CLI orchestrator that discovers projects and evaluates them. Use --out to save a clean JSON file.

Next improvements (planned / suggested)

  • Bring robust retry/backoff and rate-limit-aware throttling for LLM calls into this branch (already implemented in later branches).
  • Add CSV or summary export for quick human review.
  • Add unit tests for canonicalize_github_url and the GitHub helpers.
  • Add a lightweight local heuristic fallback for structure scoring to reduce LLM calls.
