Fix fork contribution detection in GitHub analysis #149

@Mohd-Mursaleen

Issue Description

Problem

The current GitHub analysis has a major flaw in detecting open source contributions. Here's what's happening:

The Standard Open Source Workflow:

  1. Developer finds a project they want to contribute to (e.g., React, Django, TensorFlow)
  2. Developer forks the repository to their account
  3. Developer makes changes in their fork
  4. Developer creates a Pull Request back to the original repository
  5. The PR gets merged, and the contribution is now part of the original project

What the Current Code Does Wrong:
The system looks at the developer's repositories and finds their fork. But then it checks:

if repo.get("fork") and repo.get("forks_count", 0) < 5:
    continue  # SKIPS the entire repository

Why This is Broken:

  • The developer's fork typically has 0 forks (a personal fork is rarely forked again)
  • The code skips it entirely, treating it as unimportant
  • But this fork can represent real contributions to major open source projects!
  • The actual contribution is in the original repository, not the fork

Real Example:

  • Developer contributes to React (86k stars, 18k forks)
  • Their fork: developer/react has 0 forks
  • Current system: "Skip this, it's not significant"
  • Reality: This represents a major open source contribution
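
To make the failure concrete, here is a minimal sketch that wraps the check quoted above in a function and runs it on two sample records. The function name `is_skipped_by_current_filter` and the sample data (including `developer/todo-app`) are hypothetical, used only for illustration; only the condition itself comes from the existing code.

```python
def is_skipped_by_current_filter(repo: dict) -> bool:
    """Mirror the quoted check: a fork with fewer than 5 forks is skipped."""
    return bool(repo.get("fork")) and repo.get("forks_count", 0) < 5


# Hypothetical records shaped like GitHub repository objects.
developer_fork = {"full_name": "developer/react", "fork": True, "forks_count": 0}
personal_repo = {"full_name": "developer/todo-app", "fork": False, "forks_count": 0}

print(is_skipped_by_current_filter(developer_fork))  # True: the React fork is dropped
print(is_skipped_by_current_filter(personal_repo))   # False: the personal repo is kept
```

The fork that stands for upstream work is discarded, while a personal repository with the same fork count passes through.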

Expected Behavior

The system should:

  1. Recognize when a repository is a fork
  2. Check if the developer actually contributed to the original repository
  3. If yes, count it as legitimate open source work
  4. Use the original repository's significance (stars/forks) for scoring

Current Impact

This bug means the system:

  • Completely misses contributions to major open source projects
  • Undervalues developers who follow the standard open source workflow
  • Gives unfair scores by ignoring one of the most important types of contribution
  • Rewards people who create personal repos with fake collaborators over real open source contributors

Proposed Solution

Instead of skipping forks, the system should:

  1. Detect when a repository is a fork
  2. Get the original repository information
  3. Check the original repository's contributors list
  4. If the developer appears in the contributors list, count it as an open source contribution
  5. Use the original repository's stats (stars, forks) for significance scoring

This way, a developer who contributed to React gets credit for contributing to a project with 86k stars, instead of being penalized for having a fork with 0 forks.
