
Issue: Inaccurate Open Source Contribution Detection #131

@akarsh-jain-790

Background

Recently, I was working on a tool to track and showcase my open source contributions as a widget for my GitHub profile. During development, I came across the hiring-agent project, a really great tool for evaluating resumes! However, while testing it with my own GitHub profile, I discovered some significant issues with how the system detects and evaluates open source contributions.

Problem Description

The current GitHub integration has several issues that lead to inaccurate evaluation of candidates' open source work:

1. Inaccurate Merged PR Detection 🚨

The system doesn't properly distinguish between merged PRs and closed (but not merged) PRs. This significantly underreports successful contributions.

Example: For my Zulip contributions:

  • Expected: 15 merged PRs out of 29 total
  • Actual: System reported only 1 merged PR

This is because the GitHub Search Issues API doesn't include the merged status in the response - it only returns state: "closed" for both merged and closed PRs.
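One possible workaround, sketched here rather than taken from hiring-agent, is to let GitHub's search qualifiers do the filtering: `is:merged` and `is:unmerged` are documented search qualifiers, so each status can be counted with its own query. The helper name `build_pr_queries` and the sample username are assumptions for illustration only.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def build_pr_queries(author, repo=None):
    """Return one Search Issues URL per PR status for `author` (hypothetical helper)."""
    base = f"is:pr author:{author}"
    if repo:
        base += f" repo:{repo}"
    # `is:unmerged` also matches open PRs, so pair it with `is:closed`.
    statuses = {
        "merged": "is:merged",
        "closed_unmerged": "is:closed is:unmerged",
        "open": "is:open",
    }
    urls = {}
    for name, qualifier in statuses.items():
        # per_page=1 keeps the payload small; the count is read from
        # the `total_count` field of each search response.
        params = {"q": f"{base} {qualifier}", "per_page": 1}
        urls[name] = f"{SEARCH_URL}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    for status, url in build_pr_queries("octocat", repo="zulip/zulip").items():
        print(status, url)
```

Reading `total_count` from each of the three responses would give the merged, closed-unmerged, and open counts separately, at the cost of three search requests per repository.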

2. Weak Open Source Classification

Current logic (github.py:191-192):

project_type = (
    "open_source" if contributor_count > 1 else "self_project"
)

This simplistic approach misses many legitimate open source contributions:

  • Doesn't account for fork contributions with significant work
  • Ignores community engagement indicators (stars, forks, topics)
  • Doesn't consider repository activity or maintenance status
  • Treats all single-contributor repos as "self projects" even if they're actively used open source projects
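A multi-criteria check along these lines could look like the sketch below. The `RepoSignals` fields, the thresholds (10 stars, 3 forks), and the "at least two signals" rule are all placeholder assumptions, not values taken from hiring-agent. Maintenance status and fork analysis are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class RepoSignals:
    """Hypothetical container for the repository signals listed above."""
    contributor_count: int
    stars: int = 0
    forks: int = 0
    topics: list = field(default_factory=list)
    has_license: bool = False


def classify_project(repo, min_signals=2):
    """Classify a repo as open source when enough independent signals agree."""
    signals = [
        repo.contributor_count > 1,  # the only criterion the current logic uses
        repo.stars >= 10,            # placeholder threshold
        repo.forks >= 3,             # placeholder threshold
        bool(repo.topics),
        repo.has_license,
    ]
    return "open_source" if sum(signals) >= min_signals else "self_project"
```

With this rule, a single-maintainer library with 250 stars and a license is classified as `open_source`, while an unstarred personal repository stays a `self_project`.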

3. No External PR Analysis

The system only analyzes repositories owned by the user, completely missing the most important indicator of open source involvement: pull requests to repositories they don't own.

True open source contributors often:

  • Submit PRs to popular projects
  • Contribute to organizational repositories
  • Fix bugs in libraries they use
  • Add features to community projects

None of this is detected by the current system.
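As a rough sketch of how external PRs might be separated out: items returned by the Search Issues API carry a `repository_url` of the form `https://api.github.com/repos/OWNER/REPO`, so the owner can be compared against the candidate's username. The function names below are assumptions for illustration.

```python
from urllib.parse import urlparse


def repo_owner(repository_url):
    """Extract OWNER from a URL shaped like .../repos/OWNER/REPO."""
    parts = urlparse(repository_url).path.strip("/").split("/")
    return parts[1]  # parts == ["repos", OWNER, REPO]


def external_prs(items, username):
    """Keep only PRs opened against repositories `username` does not own."""
    return [
        item for item in items
        if repo_owner(item["repository_url"]).lower() != username.lower()
    ]
```

The comparison is case-insensitive because GitHub usernames are. PRs to organization repositories count as external here, which matches the list above.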

4. Missing PR Metadata

The system doesn't track:

  • PR merge status
  • Contributions to popular projects (1K+ stars)
  • Contributions to major projects (10K+ stars)
  • PR quality or impact
  • Contribution patterns over time
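To make the missing fields concrete, here is a minimal sketch of a per-PR record with a summary that reports merge rate and star-based tiers. The 1K and 10K cut-offs come from the list above. The record layout and the names `PullRequestRecord`, `popularity_tier`, and `summarize` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PullRequestRecord:
    """Hypothetical per-PR record; field names are illustrative."""
    repo: str
    merged: bool
    repo_stars: int


def popularity_tier(stars):
    """Bucket a repository by star count using the 1K / 10K cut-offs."""
    if stars >= 10_000:
        return "major"
    if stars >= 1_000:
        return "popular"
    return "other"


def summarize(records):
    """Report total PRs, merged PRs, merge rate, and merged PRs per tier."""
    total = len(records)
    merged = sum(r.merged for r in records)
    merged_by_tier = {"major": 0, "popular": 0, "other": 0}
    for r in records:
        if r.merged:
            merged_by_tier[popularity_tier(r.repo_stars)] += 1
    return {
        "total": total,
        "merged": merged,
        "merge_rate": merged / total if total else 0.0,
        "merged_by_tier": merged_by_tier,
    }
```

For the 15-of-29 case described earlier, `summarize` would report a merge rate of roughly 0.52.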

5. Undefined Variable Bug

File: github.py:240

elif response.status_code == 404:  # 'response' is undefined

This causes a runtime error when the GitHub API returns a 404 status.
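The surrounding code in github.py is not shown here, so the following is only a sketch of the general fix pattern: bind `response` once, before any branch reads it. The function name and the injected `http_get` callable are assumptions. In practice `http_get` could be something like `requests.get`.

```python
def fetch_repo_data(http_get, url):
    """Fetch JSON from `url`, returning None on 404 (hypothetical helper)."""
    # Assign the response before branching so every branch can read it.
    response = http_get(url)
    if response.status_code == 200:
        return response.json()
    elif response.status_code == 404:
        return None  # resource missing; the caller decides how to proceed
    raise RuntimeError(f"Unexpected status {response.status_code} for {url}")
```

Passing the HTTP function in as a parameter also makes the 404 path easy to exercise in a test without hitting the network.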

Impact

These issues lead to:

  • Unfair evaluations: Candidates with strong open source profiles get low scores
  • False negatives: Merged PRs counted as unsuccessful contributions
  • Missed contributions: External PRs to popular projects completely ignored
  • Runtime errors: System crashes when the GitHub API returns a 404

Proposed Solution

I've developed a comprehensive fix that includes:

  1. Accurate PR Status Detection: Separate API queries for merged/open/closed PRs
  2. Enhanced Project Classification: Multi-criteria analysis for better open source detection
  3. External PR Analysis: Fetch and analyze PRs to repositories the user doesn't own
  4. Quality Scoring: Assess contribution impact based on project popularity
  5. Bug Fixes: Fix undefined variable and other issues
  6. Better Insights: Show detailed contribution metrics with merge rates

Expected Outcome

After the fix:

  • ✅ Accurate merged PR detection (100% accuracy)
  • ✅ Proper recognition of external contributions
  • ✅ Fair evaluation of open source involvement
  • ✅ No runtime errors
  • ✅ Comprehensive contribution insights

I have a working implementation ready for PR that fully addresses these issues.
Would love to contribute this fix to help make hiring-agent even better! 🚀
