Berkeley Professional Certificate in Machine Learning and Artificial Intelligence
Capstone — Module 20.1: Initial Report and EDA
Can we predict which developers will experience actual productivity gains from AI coding assistants based on observable characteristics, and what factors explain the significant gap between perceived and measured productivity improvements?
Recent research reveals a striking paradox: in a randomized controlled trial, experienced developers using AI tools completed tasks 19% slower while believing they were 20% faster, a roughly 39-percentage-point perception-reality gap. Meanwhile, industry surveys report roughly 90% adoption, with over 80% of developers perceiving productivity increases. This project will use machine learning to identify the developer profiles, organizational contexts, and usage patterns that predict genuine productivity outcomes, helping organizations optimize their AI tool investments and adoption strategies.
| Data Source | Key Variables | Purpose |
|---|---|---|
| Stack Overflow Developer Survey 2025 (~49,000 respondents) https://www.kaggle.com/datasets/edoardogalli/stack-overflow-annual-developer-survey-2025 | AI tools used, perceived productivity, trust ratings, frustration indicators, experience level, languages, job role, company size | Primary dataset for modeling perception vs. outcome proxies |
| METR 2025 RCT (246 tasks, 16 developers) | Predicted vs. actual completion times, AI treatment assignment, task familiarity | Ground truth for the perception-reality gap |
A regularized Logistic Regression will serve as the baseline classifier:
- Classify developers into "likely to benefit" vs. "unlikely to benefit" categories based on survey-reported outcomes and observable characteristics
- Use L2 regularization to handle the high-dimensional feature space and identify the most predictive factors
- Interpretable coefficients support clear business recommendations
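A minimal sketch of this baseline, assuming scikit-learn; the feature matrix, column count, and target rule here are synthetic stand-ins, not the actual survey encoding:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))  # stand-in for the encoded survey features
# imbalanced synthetic target: only a minority of developers "benefit"
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# L2 penalty shrinks coefficients across a high-dimensional encoding;
# class_weight="balanced" reweights the minority "benefit" class
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, class_weight="balanced"),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

The pipeline standardizes features first so the L2 penalty treats all coefficients on a comparable scale.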
Tree ensembles (Random Forest, Gradient Boosting) will be evaluated as non-linear alternatives:
- Feature importance analysis to rank which developer and organizational characteristics most strongly predict the perception-reality gap
- Handle non-linear relationships and interactions between variables (e.g., experience level × codebase complexity)
- Compare performance against the regularized logistic regression baseline
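A small illustration of why the ensemble comparison matters, using a synthetic interaction between hypothetical `experience` and `complexity` variables (not the real survey columns): a Random Forest can capture an interaction that a linear baseline misses, and its importances down-rank a pure-noise feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 800
experience = rng.normal(size=n)   # hypothetical stand-in features
complexity = rng.normal(size=n)
noise_feat = rng.normal(size=n)   # pure noise, should rank last in importance
X = np.column_stack([experience, complexity, noise_feat])
# target driven by an interaction (experience x complexity):
# invisible to a linear model, learnable by trees
y = (experience * complexity > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
lr = LogisticRegression(class_weight="balanced")
rf_f1 = cross_val_score(rf, X, y, cv=5, scoring="f1").mean()
lr_f1 = cross_val_score(lr, X, y, cv=5, scoring="f1").mean()

rf.fit(X, y)
importances = rf.feature_importances_  # ranks the predictive features
```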
K-Means clustering will complement the supervised models with unsupervised segmentation:
- Segment developers into distinct personas based on AI usage patterns, experience profiles, and reported outcomes
- Identify natural groupings that may reveal "AI power users" vs. "AI-slowed developers" vs. "neutral" archetypes
- Support targeted adoption recommendations for different developer segments
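A sketch of the segmentation step, assuming scikit-learn; the three synthetic blobs stand in for the hypothesized personas and are not derived from the survey:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# three synthetic blobs on (usage intensity, reported benefit) axes,
# standing in for "power user", "AI-slowed", and "neutral" personas
power = rng.normal(loc=[2.0, 2.0], scale=0.4, size=(100, 2))
slowed = rng.normal(loc=[2.0, -1.5], scale=0.4, size=(100, 2))
neutral = rng.normal(loc=[-1.0, 0.0], scale=0.4, size=(100, 2))
X = np.vstack([power, slowed, neutral])

# scale before K-Means so both axes contribute equally to distances
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = km.labels_
```

In practice the number of clusters would be chosen with an elbow or silhouette analysis rather than fixed at three.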
The analysis will produce actionable recommendations for technology leaders:
- Targeting: Which developer profiles should be prioritized for AI tool licenses?
- ROI Estimation: What productivity impact should organizations realistically expect by developer segment?
- Adoption Strategy: What organizational factors (training, platform maturity, team practices) amplify or diminish AI tool effectiveness?
- Risk Mitigation: Which contexts show negative productivity impact, suggesting caution before widespread rollout?
METR RCT analysis shows developers consistently overestimate AI benefits: the average developer predicted a ~24% speedup but was actually slowed down across most task types and familiarity levels.
The Stack Overflow 2025 survey paints a complementary picture:
- ~49% of active developers use AI tools daily
- Only ~33% express trust in AI output accuracy
- Senior developers (20+ years) show higher rates of non-adoption and unfavorable sentiment
Logistic Regression with L2 regularization and balanced class weights:
| Metric | Score |
|---|---|
| F1-Score (test) | 0.508 |
| ROC-AUC (test) | 0.858 |
| 5-Fold CV F1 | 0.520 ± 0.007 |
The model achieves 84% recall on the positive class (genuine AI impact) with 36% precision, indicating good discrimination but room for improvement in reducing false positives.
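One plausible way to trade some recall for precision (a sketch on synthetic data, not the notebook's actual code) is to move the decision threshold using the precision-recall curve instead of the default 0.5 cutoff:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 4))
# ~25% positives: an imbalanced stand-in for "genuine AI impact"
y = (X[:, 0] + rng.normal(scale=1.5, size=n) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_te, probs)
# pick the lowest threshold reaching at least 60% precision,
# accepting whatever recall remains at that operating point
target = 0.60
ok = np.where(precision[:-1] >= target)[0]
chosen = thresholds[ok[0]]
tuned_precision = precision_score(y_te, (probs >= chosen).astype(int))
```

A held-out validation split (not the test set) should be used to pick the threshold in a real analysis.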
| Feature | Coefficient | Direction |
|---|---|---|
| AI Sentiment (encoded) | +1.30 | Favorable sentiment predicts impact |
| Daily AI Usage | +0.82 | Frequent use predicts impact |
| AI Trust (encoded) | +0.36 | Higher trust predicts impact |
| Uses Cursor IDE | +0.22 | AI-native tooling predicts impact |
| Lost Confidence | +0.19 | Surprising positive association |
| Years Coding | -0.17 | More experience slightly reduces predicted impact |
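A table like the one above can be produced directly from a fitted model's coefficients. This sketch uses hypothetical feature names and synthetic data whose true effect signs merely mirror the table, not the actual encoded survey features:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# hypothetical feature names for illustration only
feature_names = ["ai_sentiment", "daily_ai_usage", "ai_trust", "years_coding"]
X = rng.normal(size=(500, 4))
# synthetic target: three positive effects, one negative, as in the table
y = (1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.4 * X[:, 2] - 0.6 * X[:, 3]
     + rng.normal(size=500) > 0).astype(int)

clf = LogisticRegression(penalty="l2", class_weight="balanced").fit(X, y)
coef_table = (
    pd.DataFrame({"feature": feature_names, "coefficient": clf.coef_[0]})
    .assign(direction=lambda d: np.where(
        d["coefficient"] > 0, "predicts impact", "reduces predicted impact"))
    .sort_values("coefficient", key=np.abs, ascending=False)
    .reset_index(drop=True)
)
```

Sorting by absolute value keeps the strongest effects, positive or negative, at the top of the table.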
The full analysis is in capstone_eda_baseline.ipynb, containing:
- Data loading, inspection, and cleaning (27,852 active developers retained)
- Feature engineering (58 encoded features + 10 model features)
- 14 visualizations across METR RCT and SO datasets
- Logistic Regression baseline with cross-validation
- Random Forest and Gradient Boosting for comparison
- K-Means clustering for developer persona segmentation
- Hyperparameter tuning with GridSearchCV
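As a sketch of the tuning step, assuming scikit-learn's GridSearchCV over the regularization strength `C`; the data and grid here are illustrative, not the notebook's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(size=600) > 0).astype(int)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # inverse regularization strength
search = GridSearchCV(
    LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
    param_grid,
    scoring="f1",  # match the headline metric reported for the baseline
    cv=5,
)
search.fit(X, y)
best_C = search.best_params_["C"]
best_f1 = search.best_score_
```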
capstone-eda/
├── README.md
├── capstone_eda_baseline.ipynb # Main analysis notebook
└── data/
├── Stack-Overflow-2025.zip # SO 2025 survey data
└── metr_data_complete.csv # METR RCT data
- METR (2025). "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity"
- Stack Overflow (2025). "2025 Developer Survey"
- GitClear (2025). "AI Copilot Code Quality Research 2025"
- Pragmatic Engineer (2025). "Cursor makes developers less effective?"