Commit 18e17d3

Update to StatsModels 0.7 (#220)
* Update FixedEffectModels.jl
* Update FixedEffectModels.jl
* Update FixedEffectModels.jl
* Update Project.toml
* update
* Update Project.toml
* Update FixedEffectModels.jl
* update benchmarks
* Update partial_out.jl
* Update README.md
* update stata too
* Update README.md
* Update README.md
* Update Project.toml
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* better printing
* update version to 1.9.0
* update tests
* Update FixedEffectModel.jl
* use snoopcompile
* precompile
* Update FixedEffectModel.jl
* Update FixedEffectModel.jl
* Update runtests.jl
* Update formula.jl
* Update fit.jl
* update to Julia 1.6
1 parent cfdef87 commit 18e17d3

21 files changed

Lines changed: 1237 additions & 1257 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@ jobs:
1515
fail-fast: false
1616
matrix:
1717
version:
18-
- '1.3' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.
18+
- '1.6' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.
1919
- '1' # Leave this line unchanged. '1' will automatically expand to the latest stable 1.x release of Julia.
2020
os:
2121
- ubuntu-latest

Project.toml

Lines changed: 7 additions & 5 deletions
@@ -1,6 +1,6 @@
11
name = "FixedEffectModels"
22
uuid = "9d5cd8c9-2029-5cab-9928-427838db53e3"
3-
version = "1.8.1"
3+
version = "1.9.0"
44

55
[deps]
66
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
@@ -15,15 +15,17 @@ StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
1515
StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
1616
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
1717
Vcov = "ec2bfdc2-55df-4fc9-b9ae-4958c2cf2486"
18+
SnoopPrecompile = "66db9d55-30c0-4569-8b51-7e840670fc0c"
1819

1920
[compat]
20-
DataFrames = "0.21, 0.22, 1.0"
21+
DataFrames = "0.21, 0.22, 1"
2122
FixedEffects = "2"
22-
Reexport = "0.1, 0.2, 1.0"
23+
Reexport = "0.1, 0.2, 1"
24+
SnoopPrecompile = "1"
2325
StatsAPI = "1"
2426
StatsBase = "0.33"
2527
StatsFuns = "0.9, 1"
26-
StatsModels = "0.6"
28+
StatsModels = "0.7"
2729
Tables = "1"
2830
Vcov = "0.7"
29-
julia = "1.3"
31+
julia = "1.6"
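
The switch from `1.0` to `1` in the `[compat]` entries above should not change which versions resolve: under Julia's default caret semantics both forms stand for the range `[1.0.0, 2.0.0)`. The sketch below is one way to check this. It assumes the internal helper `Pkg.Types.semver_spec`, which is not a documented public API and may change between Julia versions; the probe versions are arbitrary illustrations, not DataFrames releases.

```julia
using Pkg

# Assumption: `Pkg.Types.semver_spec` is an internal Pkg helper that parses
# a compat string into a version specification.
old_spec = Pkg.Types.semver_spec("0.21, 0.22, 1.0")  # entry before this commit
new_spec = Pkg.Types.semver_spec("0.21, 0.22, 1")    # entry after this commit

# Arbitrary probe versions on either side of each range boundary.
probes = [v"0.20.9", v"0.21.0", v"0.22.5", v"0.23.0", v"1.0.0", v"1.5.2", v"2.0.0"]

# The two entries agree if every probe is accepted or rejected by both.
same = all((v in old_spec) == (v in new_spec) for v in probes)
println("compat entries agree on all probes: ", same)
```

The `StatsModels = "0.7"` and `julia = "1.6"` changes, by contrast, do narrow the accepted versions.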

README.md

Lines changed: 6 additions & 9 deletions
@@ -6,12 +6,10 @@ This package estimates linear models with high dimensional categorical variables
66
The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with `] add FixedEffectModels`.
77

88
## Benchmarks
9-
The objective of the package is similar to the Stata command [`reghdfe`](https://github.com/sergiocorreia/reghdfe) and the R function [`felm`](https://cran.r-project.org/web/packages/lfe/lfe.pdf). The package tends to be much faster than these two options.
10-
11-
![benchmark](http://www.matthieugomez.com/files/fixedeffectmodels_benchmark.png)
9+
The objective of the package is similar to the Stata command [`reghdfe`](https://github.com/sergiocorreia/reghdfe) and the R packages [`lfe`](https://cran.r-project.org/web/packages/lfe/lfe.pdf) and [`fixest`](https://lrberge.github.io/fixest/). The package is much faster than `reghdfe` or `lfe`. It also tends to be a bit faster than the more recent `fixest` (depending on the exact command). For complicated models, `FixedEffectModels` can also run on Nvidia GPUs for even faster performance (see below).
1210

1311

14-
Performances are roughly similar to the newer R function [`feols`](https://cran.r-project.org/web/packages/fixest/fixest.pdf). The main difference is that `FixedEffectModels` can also run the demeaning operation on a GPU (with `method = :gpu`).
12+
![benchmark](http://www.matthieugomez.com/files/fixedeffectmodels_benchmark.png)
1513

1614
## Syntax
1715

@@ -99,14 +97,13 @@ You may use [RegressionTables.jl](https://github.com/jmboehm/RegressionTables.jl
9997

10098
## Performances
10199

102-
103100
### MultiThreads
104-
`FixedEffectModels` is multi-threaded. Use the option `nthreads` to select the number of threads to use in the estimation (defaults to `Threads.nthreads()`). That being said, multithreading does not usually make a big difference.
101+
`FixedEffectModels` is multi-threaded. Use the option `nthreads` to select the number of threads to use in the estimation (defaults to `Threads.nthreads()`).
105102

106-
### GPU
107-
The package has support for GPUs (Nvidia) (thanks to Paul Schrimpf). This can make the package an order of magnitude faster for complicated problems.
103+
### Nvidia GPU
104+
The package has support for Nvidia GPUs (thanks to Paul Schrimpf). This can make the package an order of magnitude faster for complicated problems.
108105

109-
To use GPU, run `using CUDA` before `using FixedEffectModels`. Then, estimate a model with `method = :gpu`. For maximum speed, set the floating point precision to `Float32` with `double_precision = false`.
106+
If you have an Nvidia GPU, run `using CUDA` before `using FixedEffectModels`. Then, estimate a model with `method = :gpu`. For maximum speed, set the floating point precision to `Float32` with `double_precision = false`.
110107

111108
```julia
112109
using CUDA, FixedEffectModels

benchmark/.sublime2Terminal.jl

Lines changed: 0 additions & 10 deletions
This file was deleted.

benchmark/benchmark.csv

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
Order,Command,Julia,R,Stata
1,simple,0.601445,1.843,1.2
2,1 hd fe,1.624446 ,14.831,15.51
3,2 hd fe,3.639817,10.626,49.38
4, 1 cluster,1.462648,9.255,11.15
5, 2 cluster,7.187382,96.958,118.67
1+
Order,Command,FixedEffectModels.jl (Julia),fixest (R),lfe (R),reghdfe (Stata)
1,simple,0.35,0.317,1.843, 0.61
2,1 hd fe,0.463 ,0.704 ,14.831, 4.64
3,2 hd fe,1.00,1.297 ,10.626, 22.99
4, 1 cluster se,0.38058,0.700 ,9.255, 8.28
5, 2 clusters se,0.765,1.803,96.958, 70.44
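
The benchmark plot reports each timing as a ratio to the FixedEffectModels.jl time, which `benchmark/result.jl` computes with DataFrames. As a rough illustration of that normalization, here is a dependency-free sketch using only Base Julia on the rows of the updated CSV. The helper name `ratios` and the dictionary layout are choices made for this example and do not appear in the repository.

```julia
# Rows of the updated benchmark.csv (times in seconds).
csv = """
Order,Command,FixedEffectModels.jl (Julia),fixest (R),lfe (R),reghdfe (Stata)
1,simple,0.35,0.317,1.843, 0.61
2,1 hd fe,0.463 ,0.704 ,14.831, 4.64
3,2 hd fe,1.00,1.297 ,10.626, 22.99
4, 1 cluster se,0.38058,0.700 ,9.255, 8.28
5, 2 clusters se,0.765,1.803,96.958, 70.44
"""

rows = split(strip(csv), '\n')
header = String.(strip.(split(rows[1], ',')))
tools = header[3:end]  # timing columns; the first one is the Julia baseline

# Hypothetical helper: returns (command, Dict(tool => time / Julia time)).
function ratios(row)
    fields = String.(strip.(split(row, ',')))
    times = parse.(Float64, fields[3:end])
    return fields[2], Dict(zip(tools, times ./ times[1]))
end

table = Dict(ratios(row) for row in rows[2:end])

for (command, r) in table
    println(command, ": reghdfe / Julia = ", round(r["reghdfe (Stata)"], digits = 1))
end
```

Since the `lfe (R)` column of the CSV keeps the older timings while `benchmark.md` lists newer ones, ratios for that column follow the CSV values.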

benchmark/benchmark.jl

Lines changed: 15 additions & 12 deletions
@@ -1,5 +1,6 @@
1-
using DataFrames, FixedEffectModels, Random, CategoricalArrays
2-
1+
using DataFrames, Random, CategoricalArrays
2+
@time using FixedEffectModels
3+
# 13s precompiling
34
# Very simple setup
45
N = 10000000
56
K = 100
@@ -11,17 +12,19 @@ y= 3 .* x1 .+ 5 .* x2 .+ cos.(id1) .+ cos.(id2).^2 .+ randn(N)
1112
df = DataFrame(id1 = id1, id2 = id2, x1 = x1, x2 = x2, y = y)
1213
# first time
1314
@time reg(df, @formula(y ~ x1 + x2))
14-
# 14s
15+
# 3.5s
1516
@time reg(df, @formula(y ~ x1 + x2))
16-
# 0.582029 seconds (852 allocations: 535.311 MiB, 18.28% gc time)
17+
# 0.497374 seconds (450 allocations: 691.441 MiB, 33.18% gc time)
18+
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id2))
19+
# 1.898018 seconds (7.10 M allocations: 1.220 GiB, 8.20% gc time, 4.46% compilation time)
1720
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id2))
18-
# 0.621690 seconds (693 allocations: 768.945 MiB, 7.69% gc time)
21+
# 0.605172 seconds (591 allocations: 768.939 MiB, 42.38% gc time)
1922
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)))
20-
# 1.143941 seconds (245.39 k allocations: 942.937 MiB, 12.93% gc time, 14.99% compilation time)
23+
# 0.893835 seconds (1.03 k allocations: 929.130 MiB, 54.19% gc time)
2124
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)), Vcov.cluster(:id1))
22-
# 1.242207 seconds (245.73 k allocations: 1022.348 MiB, 9.48% gc time, 14.10% compilation time)
25+
# 1.015078 seconds (1.18 k allocations: 1008.532 MiB, 56.50% gc time)
2326
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
24-
# 2.255812 seconds (351.74 k allocations: 1.076 GiB, 3.98% gc time, 12.93% compilation time)
27+
# 1.835464 seconds (4.02 k allocations: 1.057 GiB, 35.59% gc time)
2528

2629
# More complicated setup
2730
N = 800000 # number of observations
@@ -34,7 +37,7 @@ x2 = cos.(id1) + sin.(id2) + randn(N)
3437
y= 3 .* x1 .+ 5 .* x2 .+ cos.(id1) .+ cos.(id2).^2 .+ randn(N)
3538
df = DataFrame(id1 = id1, id2 = id2, x1 = x1, x2 = x2, y = y)
3639
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
37-
# 3.048292 seconds (422.51 k allocations: 114.317 MiB, 6.86% compilation time)
40+
# 2.504294 seconds (75.83 k allocations: 95.525 MiB, 0.23% gc time)
3841

3942

4043
+# fixest
@@ -48,8 +51,8 @@ X1 = rand(n)
4851
ln_y = 3 .* X1 .+ rand(n)
4952
df = DataFrame(X1 = X1, ln_y = ln_y, id1 = id1, id2 = id2, id3 = id3)
5053
@time reg(df, @formula(ln_y ~ X1 + fe(id1)), Vcov.cluster(:id1))
51-
# 0.869512 seconds (234.23 k allocations: 828.818 MiB, 18.95% compilation time)
54+
# 0.543996 seconds (873 allocations: 815.677 MiB, 34.15% gc time)
5255
@time reg(df, @formula(ln_y ~ X1 + fe(id1) + fe(id2)), Vcov.cluster(:id1))
53-
# 2.192262 seconds (300.08 k allocations: 985.534 MiB, 4.61% gc time, 9.42% compilation time)
56+
# 1.301908 seconds (3.03 k allocations: 968.729 MiB, 25.84% gc time)
5457
@time reg(df, @formula(ln_y ~ X1 + fe(id1) + fe(id2) + fe(id3)), Vcov.cluster(:id1))
55-
# 2.700051 seconds (406.80 k allocations: 1.117 GiB, 3.56% gc time, 10.41% compilation time)
58+
# 1.658832 seconds (4.17 k allocations: 1.095 GiB, 29.78% gc time)
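
The timing comments in `benchmark/benchmark.jl` separate the first call (3.5s, which includes compilation) from later calls (about 0.5s). The same pattern can be reproduced without the package. In this minimal sketch, `demean` is a hypothetical stand-in for an expensive call such as `reg(df, ...)` and is not a function from FixedEffectModels.

```julia
# Hypothetical stand-in for an expensive call such as `reg(df, ...)`.
demean(x) = x .- sum(x) / length(x)

x = randn(1_000_000)

# The first call pays for JIT compilation of `demean` for Vector{Float64}.
first_call = @elapsed demean(x)

# The second call reuses the compiled method, so it reflects run time only.
second_call = @elapsed demean(x)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

This is why the benchmark runs each command at least once before recording its time; the precompilation added in this commit aims to shrink the first-call cost.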

benchmark/benchmark.md

Lines changed: 53 additions & 23 deletions
@@ -3,31 +3,61 @@
33

44
Code to reproduce this graph:
55

6-
Julia
6+
FixedEffectModels.jl v1.9.0 (Julia 1.9)
77
```julia
8-
using DataFrames, FixedEffectModels
8+
using DataFrames, CategoricalArrays, FixedEffectModels
99
N = 10000000
1010
K = 100
1111
id1 = rand(1:(N/K), N)
1212
id2 = rand(1:K, N)
1313
x1 = randn(N)
1414
x2 = randn(N)
1515
y= 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
16-
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, w = w, y = y)
16+
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
1717
@time reg(df, @formula(y ~ x1 + x2))
18-
#0.601445 seconds (1.05 k allocations: 535.311 MiB, 31.95% gc time)
18+
# 0.338749 seconds (450 allocations: 691.441 MiB, 2.30% gc time)
1919
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)))
20-
# 1.624446 seconds (1.21 k allocations: 734.353 MiB, 17.27% gc time)
20+
# 0.463058 seconds (1.00 k allocations: 929.129 MiB, 13.31% gc time)
2121
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
22-
# 3.639817 seconds (1.84 k allocations: 999.675 MiB, 11.25% gc time)
22+
# 1.006031 seconds (3.22 k allocations: 1.057 GiB, 1.68% gc time)
2323
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id1))
24-
# 1.462648 seconds (499.30 k allocations: 690.102 MiB, 15.92% gc time)
25-
@time reg(df, @formula(y ~ x1 + x2, Vcov.cluster(:id1, :id2)))
26-
# 7.187382 seconds (7.02 M allocations: 2.753 GiB, 24.19% gc time)
24+
# 0.380562 seconds (580 allocations: 771.606 MiB, 3.07% gc time)
25+
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id1, :id2))
26+
# 0.765847 seconds (719 allocations: 1.128 GiB, 2.01% gc time)
2727
````
2828

2929

30-
R (lfe package)
30+
fixest v0.8.4 (R 4.2.2)
31+
```R
32+
library(fixest)
33+
N = 10000000
34+
K = 100
35+
df = data.frame(
36+
id1 = as.factor(sample(N/K, N, replace = TRUE)),
37+
id2 = as.factor(sample(K, N, replace = TRUE)),
38+
x1 = runif(N),
39+
x2 = runif(N)
40+
)
41+
df[, "y"] = 3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)
42+
system.time(feols(y ~ x1 + x2, df))
43+
#> user system elapsed
44+
#> 0.280 0.036 0.317
45+
system.time(feols(y ~ x1 + x2|id1, df))
46+
#> user system elapsed
47+
#> 0.616 0.089 0.704
48+
system.time(feols(y ~ x1 + x2|id1 + id2, df))
49+
#> user system elapsed
50+
#> 1.181 0.120 1.297
51+
system.time(feols(y ~ x1 + x2, cluster = "id1", df))
52+
#> user system elapsed
53+
#> 0.630 0.071 0.700
54+
system.time(feols(y ~ x1 + x2, cluster = c("id1", "id2"), df))
55+
#> user system elapsed
56+
#> 1.570 0.197 1.803
57+
```
58+
59+
60+
lfe v2.8-8 (R 4.2.2)
3161
```R
3262
library(lfe)
3363
N = 10000000
@@ -42,22 +72,22 @@ Code to reproduce this graph:
4272
4373
system.time(felm(y ~ x1 + x2, df))
4474
#> user system elapsed
45-
#> 1.843 0.476 2.323
75+
#> 1.137 0.232 1.596
4676
system.time(felm(y ~ x1 + x2|id1, df))
4777
#> user system elapsed
48-
#> 14.831 1.342 15.993
78+
#> 7.08 0.41 7.46
4979
system.time(felm(y ~ x1 + x2|id1 + id2, df))
5080
#> user system elapsed
51-
#> 10.626 1.358 10.336
81+
#> 4.832 0.370 4.615
5282
system.time(felm(y ~ x1 + x2|0|0|id1, df))
5383
#> user system elapsed
54-
#> 9.255 0.843 10.110
84+
#> 3.712 0.287 3.996
5585
system.time(felm(y ~ x1 + x2|0|0|id1 + id2, df))
5686
#> user system elapsed
57-
#> 96.958 1.474 99.113
58-
```
87+
#> 59.119 0.889 59.946
88+
5989
60-
Stata (reghdfe version 5.2.9 06aug2018)
90+
reghdfe version 5.6.8 03mar2019 (Stata 16.1)
6191
```
6292
clear all
6393
local N = 10000000
@@ -72,13 +102,13 @@ Code to reproduce this graph:
72102
73103
set rmsg on
74104
reg y x1 x2
75-
#> r; t=1.20
76-
areg y x1 x2, a(id1)
77-
#>r; t=15.51
105+
#> r; t=0.61
106+
reghdfe y x1 x2, a(id1)
107+
#> r; t=4.64
78108
reghdfe y x1 x2, a(id1 id2)
79-
#> r; t=49.38
109+
#> r; t=22.99
80110
reg y x1 x2, cl(id1)
81-
#> r; t=11.15
111+
#> r; t=8.28
82112
ivreg2 y x1 x2, cluster(id1 id2)
83-
#> r; t=118.67
113+
#> r; t=70.44
84114
````

benchmark/result.jl

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
using DataFrames, CSV, Gadfly
df = CSV.read("/Users/Matthieu/Dropbox/Github/FixedEffectModels.jl/benchmark/benchmark.csv")
df.R = df.R ./ df.Julia
df.Stata = df.Stata ./ df.Julia
df.Julia = df.Julia ./ df.Julia
mdf = melt(df[!, [:Command, :Julia, :R, :Stata]], :Command)
mdf = rename(mdf, :variable => :Language)
p = plot(mdf, x = "Language", y = "value", color = "Command", Guide.ylabel("Time (Ratio to Julia)"), Guide.xlabel("Model"), Guide.yticks(ticks= [1, 5, 10, 15]))
draw(PNG("/Users/Matthieu/Dropbox/Github/FixedEffectModels.jl/benchmark/fixedeffectmodels_benchmark.png", 8inch, 5inch, dpi=300), p)
1+
using DataFrames, CSV, Gadfly
df = CSV.read("/Users/matthieugomez/Dropbox/Github/FixedEffectModels.jl/benchmark/benchmark.csv", DataFrame)
df."fixest (R)" = df."fixest (R)" ./ df."FixedEffectModels.jl (Julia)"
df."lfe (R)" = df."lfe (R)" ./ df."FixedEffectModels.jl (Julia)"
df."reghdfe (Stata)" = df."reghdfe (Stata)" ./ df."FixedEffectModels.jl (Julia)"
df."FixedEffectModels.jl (Julia)" = df."FixedEffectModels.jl (Julia)" ./ df."FixedEffectModels.jl (Julia)"
mdf = stack(df, Not([:Command, :Order]))
mdf = rename(mdf, :variable => :Language)
p = plot(mdf, x = "Command", y = "value", color = "Language", Guide.ylabel("Time (Ratio to Julia)"), Guide.xlabel("Command"), Scale.y_log10)
draw(PNG("/Users/matthieugomez/Dropbox/Github/FixedEffectModels.jl/benchmark/fixedeffectmodels_benchmark.png", 8inch, 5inch, dpi=300), p)

benchmark/result.png

-30.9 KB
Binary file not shown.
