Govinda-Fichtner/capstone_eda

Predicting Developer Productivity Outcomes from AI Coding Assistants

Berkeley Professional Certificate in Machine Learning and Artificial Intelligence

Capstone — Module 20.1: Initial Report and EDA


Problem Overview

Can we predict which developers will experience actual productivity gains from AI coding assistants based on observable characteristics, and what factors explain the significant gap between perceived and measured productivity improvements?

Recent research reveals a striking paradox: experienced developers using AI tools complete tasks 19% slower while believing they are 20% faster, a roughly 40-percentage-point perception-reality gap. Meanwhile, industry surveys report ~90% adoption, with over 80% of developers perceiving productivity increases. This project will use machine learning to identify the developer profiles, organizational contexts, and usage patterns that predict genuine productivity outcomes, enabling organizations to optimize their AI tool investments and adoption strategies.


Datasets

| Data Source | Key Variables | Purpose |
| --- | --- | --- |
| Stack Overflow Developer Survey 2025 (~49,000 respondents) (https://www.kaggle.com/datasets/edoardogalli/stack-overflow-annual-developer-survey-2025) | AI tools used, perceived productivity, trust ratings, frustration indicators, experience level, languages, job role, company size | Primary dataset for modeling perception vs. outcome proxies |
| METR 2025 RCT (246 tasks, 16 developers) | Predicted vs. actual completion times, AI treatment assignment, task familiarity | Ground truth for the perception-reality gap |

Proposed Techniques

1. Logistic Regression with Regularization (L1/L2)

  • Classify developers into "likely to benefit" vs. "unlikely to benefit" categories based on survey-reported outcomes and observable characteristics
  • Use regularization to handle high-dimensional feature space and identify the most predictive factors
  • Interpretable coefficients support clear business recommendations
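A minimal sketch of the contrast between the two penalties, on synthetic stand-in data rather than the actual survey features: L1 drives uninformative coefficients to exactly zero (implicit feature selection), while L2 only shrinks them.

```python
# Sketch only: synthetic data stands in for the encoded survey features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 30))   # many features, only two informative
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=500) > 0).astype(int)

# liblinear supports the L1 penalty; class_weight="balanced"
# compensates for class imbalance in both fits.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                        class_weight="balanced").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1,
                        class_weight="balanced", max_iter=1000).fit(X, y)

n_zero_l1 = int((l1.coef_ == 0).sum())  # L1 zeroes out noise features
n_zero_l2 = int((l2.coef_ == 0).sum())  # L2 shrinks but rarely zeroes
print(n_zero_l1, n_zero_l2)
```

The surviving non-zero L1 coefficients point directly at the most predictive factors, which is what makes this baseline easy to translate into business recommendations.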

2. Random Forest / Gradient Boosting Classification

  • Feature importance analysis to rank which developer and organizational characteristics most strongly predict the perception-reality gap
  • Handle non-linear relationships and interactions between variables (e.g., experience level × codebase complexity)
  • Compare performance against regularized logistic regression baseline
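The ensemble comparison could look like the following sketch, again on synthetic data with a deliberately non-linear interaction (standing in for something like experience level × codebase complexity) that a linear baseline cannot capture.

```python
# Sketch only: synthetic interaction data, not the actual survey features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 8))
# Label depends on the *product* of two features: a pure interaction.
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances rank which features drive predictions.
ranking = np.argsort(rf.feature_importances_)[::-1]

# 5-fold CV F1 puts both ensembles on the same footing as the
# logistic regression baseline.
rf_f1 = cross_val_score(rf, X, y, cv=5, scoring="f1").mean()
gb_f1 = cross_val_score(gb, X, y, cv=5, scoring="f1").mean()
print(ranking[:3], round(rf_f1, 3), round(gb_f1, 3))
```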

3. K-Means / Hierarchical Clustering

  • Segment developers into distinct personas based on AI usage patterns, experience profiles, and reported outcomes
  • Identify natural groupings that may reveal "AI power users" vs. "AI-slowed developers" vs. "neutral" archetypes
  • Support targeted adoption recommendations for different developer segments
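A segmentation sketch under the assumption of three personas, using synthetic two-dimensional usage/experience features; standardizing first matters because K-Means is distance-based.

```python
# Sketch only: three synthetic "personas" stand in for real survey profiles.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
centers = np.array([[2.0, 2.0],    # e.g. "AI power users"
                    [-2.0, -1.0],  # e.g. "AI-slowed developers"
                    [0.0, 0.0]])   # e.g. "neutral"
X = np.vstack([c + rng.normal(scale=0.4, size=(100, 2)) for c in centers])

# Scale features so no single dimension dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X_scaled)

labels = km.labels_
print("Cluster sizes:", np.bincount(labels))
```

In practice the number of clusters would be chosen via silhouette scores or the elbow method rather than fixed at three in advance.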

Expected Business Value

The analysis will produce actionable recommendations for technology leaders:

  • Targeting: Which developer profiles should be prioritized for AI tool licenses?
  • ROI Estimation: What productivity impact should organizations realistically expect by developer segment?
  • Adoption Strategy: What organizational factors (training, platform maturity, team practices) amplify or diminish AI tool effectiveness?
  • Risk Mitigation: Which contexts show negative productivity impact, suggesting caution before widespread rollout?

Key Findings (Module 20.1 — EDA & Baseline)

1. The Perception-Reality Gap Is Confirmed

METR RCT analysis shows developers consistently overestimate AI benefits. The average developer predicted ~24% speedup but actually experienced a slowdown (negative speedup) across most task types and familiarity levels.

2. AI Adoption Is Near-Universal but Trust Lags

  • ~49% of active developers use AI tools daily
  • Only ~33% express trust in AI output accuracy
  • Senior developers (20+ years) show higher rates of non-adoption and unfavorable sentiment

3. Baseline Model Performance

Logistic Regression with L2 regularization and balanced class weights:

| Metric | Score |
| --- | --- |
| F1-Score (test) | 0.508 |
| ROC-AUC (test) | 0.858 |
| 5-Fold CV F1 | 0.520 ± 0.007 |

The model achieves 84% recall on the positive class (genuine AI impact) with 36% precision, indicating good discrimination but room for improvement in reducing false positives.
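The reported metrics could be computed along these lines; the sketch uses synthetic imbalanced data, so the numbers it produces are illustrative, not the values in the table above.

```python
# Sketch only: synthetic imbalanced data stands in for the survey model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (f1_score, roc_auc_score,
                             precision_score, recall_score)

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + rng.normal(size=2000) > 1.0).astype(int)  # ~24% positive

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)
clf = LogisticRegression(penalty="l2", class_weight="balanced",
                         max_iter=1000).fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]   # scores for ROC-AUC
metrics = {
    "f1": f1_score(y_te, pred),
    "roc_auc": roc_auc_score(y_te, proba),
    "recall": recall_score(y_te, pred),
    "precision": precision_score(y_te, pred),
    "cv_f1": cross_val_score(clf, X_tr, y_tr, cv=5, scoring="f1").mean(),
}
print({k: round(v, 3) for k, v in metrics.items()})
```

With balanced class weights, recall on the minority positive class tends to rise at the expense of precision, which matches the 84%/36% trade-off described above.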

4. Top Predictive Features (Logistic Regression Coefficients)

| Feature | Coefficient | Direction |
| --- | --- | --- |
| AI Sentiment (encoded) | +1.30 | Favorable sentiment predicts impact |
| Daily AI Usage | +0.82 | Frequent use predicts impact |
| AI Trust (encoded) | +0.36 | Higher trust predicts impact |
| Uses Cursor IDE | +0.22 | AI-native tooling predicts impact |
| Lost Confidence | +0.19 | Surprising positive association |
| Years Coding | -0.17 | More experience slightly reduces predicted impact |

Notebook

The full analysis is in capstone_eda_baseline.ipynb, containing:

  • Data loading, inspection, and cleaning (27,852 active developers retained)
  • Feature engineering (58 encoded features + 10 model features)
  • 14 visualizations across METR RCT and SO datasets
  • Logistic Regression baseline with cross-validation

Next Steps (Module 24)

  • Random Forest and Gradient Boosting for comparison
  • K-Means clustering for developer persona segmentation
  • Hyperparameter tuning with GridSearchCV
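The tuning step could be sketched as follows, on synthetic stand-in data; the parameter grid here is a hypothetical starting point, not the one that will be used in Module 24.

```python
# Sketch only: synthetic data and an illustrative parameter grid.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

param_grid = {"n_estimators": [100, 200], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=5),
    param_grid,
    cv=5,
    scoring="f1",   # same metric as the baseline, for comparability
    n_jobs=-1,
).fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Scoring with F1 keeps the tuned models directly comparable to the logistic regression baseline's cross-validated F1.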

Repository Structure

capstone-eda/
├── README.md
├── capstone_eda_baseline.ipynb    # Main analysis notebook
└── data/
    ├── Stack-Overflow-2025.zip    # SO 2025 survey data
    └── metr_data_complete.csv     # METR RCT data

Key Research References

  • METR (2025). "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity"
  • Stack Overflow (2025). "2025 Developer Survey"
  • GitClear (2025). "AI Copilot Code Quality Research 2025"
  • Pragmatic Engineer (2025). "Cursor makes developers less effective?"
