 The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr.
-To achieve this, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented.
-Whenever a request cannot be handled by DuckDB, duckplyr falls back to dplyr.
+All operations, R functions, and data types that are supported by dplyr should work in an identical way with duckplyr.
+This is achieved in two ways:
+
+- A carefully selected subset of dplyr operations, R functions, and R data types are implemented in DuckDB, focusing on faithful translation.
+- When DuckDB does not support an operation, duckplyr falls back to dplyr, guaranteeing identical behavior.

 ## DuckDB operation

@@ -67,7 +70,12 @@ duckdb %>%
 explain()
 ```

-The plan shows three operations: a data frame scan (the input), a sort operation, and a projection (adding the `b` column and removing the `a` column).
+The plan shows three operations:
+
+- a data frame scan (the input),
+- a sort operation,
+- a projection (adding the `b` column and removing the `a` column).
+
 Because each operation is supported by DuckDB, the resulting object contains a plan for the entire pipeline.
 The plan is only executed when the data is needed.

@@ -110,7 +118,7 @@ fallback <-
 select(-a)
 ```

-The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined.
+The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is forwarded to dplyr and already executed (eagerly) when the pipeline is defined.
 This is confirmed by the `last_rel()` function:

 ```{r}
@@ -140,6 +148,26 @@ duckplyr::last_rel()

 The `last_rel()` function confirms that only the final `select()` is handled by DuckDB again.

+## Enforce DuckDB operation
+
+For any duck frame, one can control the automatic materialization.
+For fallbacks to dplyr, automatic materialization must be allowed for the frame at hand, as dplyr necessitate eager evaluation.
+
+Therefore, by making a data frame frugal, one can ensure a pipeline will error when a fallback to dplyr would have normally happened.
+See `vignette("prudence")` for details.
+
+By using operations supported by duckplyr and avoiding fallbacks as much as possible, your pipelines will be executed by DuckDB in an optimized way.
+
+## Configure fallbacks
+
+Using the `fallback_sitrep()` and `fallback_config()` functions you can examine and change settings related to fallbacks.
+
+- You can choose to make fallbacks verbose with `fallback_config(info = TRUE)`.
+
+- You can change settings related to logging and reporting fallback to duckplyr development team to inform their work.
+
+See `vignette("telemetry")` for details.
+
 ## Conclusion

 The fallback mechanism in duckplyr allows for a seamless integration of dplyr verbs and R functions that are not supported by DuckDB.
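
The two calls named in the added "Configure fallbacks" section could be tried as in the minimal sketch below. It assumes an installed duckplyr version that exports `fallback_sitrep()` and `fallback_config()`. Only the `info = TRUE` argument is taken from the vignette text. No output is shown because it depends on local settings.

```r
# Minimal sketch, assuming a duckplyr version that exports both functions.
library(duckplyr)

# Examine the current fallback-related settings.
fallback_sitrep()

# Make fallbacks verbose, as described in the "Configure fallbacks" section.
fallback_config(info = TRUE)
```

With verbose fallbacks enabled, a pipeline step that DuckDB cannot handle, such as the `mutate()` call using `verbose_plus_one()` in the vignette, should announce that it is being forwarded to dplyr rather than falling back silently.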