Adding something like weight_col (and possibly weight_type) to allow per-observation weights on the regression estimand, distinct from the frequency weights duckreg uses for compression.
Point estimates are the same regardless of how the weights are interpreted: compute SUM(w), SUM(w * y), and SUM(w * y^2) per stratum. The interpretation does affect the analytic SEs:
- Correcting for heteroskedasticity: weights are inversely proportional to the conditional outcome variance. The sandwich has w_i e_i^2, which you can get from the statistics already collected (SUM(w), SUM(w * y), SUM(w * y^2)), and the DoF correction uses total weight.
- Targeting a weighted estimand: weights define the average you want and/or correct for non-representative sampling. The sandwich has w_i^2 e_i^2, so you need additional statistics (SUM(w^2), SUM(w^2 * y), SUM(w^2 * y^2)), and the DoF correction uses the observation count.
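
To make the two cases concrete, here is a minimal NumPy sketch of how both sandwich "meats" could be rebuilt from per-stratum sums alone. It is not duckreg's implementation: the toy data, the agg helper, and all variable names are assumptions made for illustration, and the DoF scaling is left out (only the two candidate sample sizes are computed). The key step is expanding (y - yhat)^2 inside each stratum, where yhat is constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): one binary covariate, so there are two strata.
n = 200
x = rng.integers(0, 2, size=n)
w = rng.uniform(0.5, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

strata = np.unique(x)

def agg(v):
    """Sum v within each stratum (stands in for a SQL GROUP BY)."""
    return np.array([v[x == s].sum() for s in strata])

# Per-stratum sufficient statistics.
n_g = agg(np.ones(n))
sw, swy, swy2 = agg(w), agg(w * y), agg(w * y**2)
sw2, sw2y, sw2y2 = agg(w**2), agg(w**2 * y), agg(w**2 * y**2)

# Compressed design matrix: one row per stratum (intercept + x).
M = np.column_stack([np.ones(len(strata)), strata])

# WLS point estimate from compressed statistics only.
bread = np.linalg.inv(M.T @ (sw[:, None] * M))
beta = bread @ (M.T @ swy)
yhat = M @ beta

# Within a stratum yhat is constant, so
# SUM(w * e^2) = SUM(w*y^2) - 2*yhat*SUM(w*y) + yhat^2*SUM(w),
# and the same expansion holds with w^2 in place of w.
sum_w_e2 = swy2 - 2 * yhat * swy + yhat**2 * sw
sum_w2_e2 = sw2y2 - 2 * yhat * sw2y + yhat**2 * sw2

# Sandwich variants, before any DoF correction.
V_w = bread @ (M.T @ (sum_w_e2[:, None] * M)) @ bread    # w_i e_i^2 meat
V_w2 = bread @ (M.T @ (sum_w2_e2[:, None] * M)) @ bread  # w_i^2 e_i^2 meat

# The two candidate "sample sizes" for the DoF correction.
N_total_weight = sw.sum()
N_obs = n_g.sum()
```

Under this sketch the only extra cost of the weighted-estimand case is three more aggregates per stratum; the point estimate and the bread are shared between both interpretations.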