Commit 575def4

krlmlr and maelle authored
docs: improve fallbacks vignette (#596)
Co-authored-by: Maëlle Salmon <[email protected]>
1 parent 293ad41 commit 575def4

1 file changed

+32
-4
lines changed

vignettes/fallback.Rmd

Lines changed: 32 additions & 4 deletions
@@ -45,8 +45,11 @@ conflict_prefer("filter", "dplyr")
4545
## Introduction
4646

4747
The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr.
48-
To achieve this, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented.
49-
Whenever a request cannot be handled by DuckDB, duckplyr falls back to dplyr.
48+
All operations, R functions, and data types that are supported by dplyr should work in an identical way with duckplyr.
49+
This is achieved in two ways:
50+
51+
- A carefully selected subset of dplyr operations, R functions, and R data types is implemented in DuckDB, focusing on faithful translation.
52+
- When DuckDB does not support an operation, duckplyr falls back to dplyr, guaranteeing identical behavior.
5053

5154
## DuckDB operation
5255

@@ -67,7 +70,12 @@ duckdb %>%
6770
explain()
6871
```
6972

70-
The plan shows three operations: a data frame scan (the input), a sort operation, and a projection (adding the `b` column and removing the `a` column).
73+
The plan shows three operations:
74+
75+
- a data frame scan (the input),
76+
- a sort operation,
77+
- a projection (adding the `b` column and removing the `a` column).
78+
7179
Because each operation is supported by DuckDB, the resulting object contains a plan for the entire pipeline.
7280
The plan is only executed when the data is needed.
7381

@@ -110,7 +118,7 @@ fallback <-
110118
select(-a)
111119
```
112120

113-
The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined.
121+
The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is forwarded to dplyr and executed eagerly, as soon as the pipeline is defined.
114122
This is confirmed by the `last_rel()` function:
115123

116124
```{r}
@@ -140,6 +148,26 @@ duckplyr::last_rel()
140148

141149
The `last_rel()` function confirms that only the final `select()` is handled by DuckDB again.
142150

151+
## Enforce DuckDB operation
152+
153+
For any duck frame, one can control the automatic materialization.
154+
For fallbacks to dplyr, automatic materialization must be allowed for the frame at hand, because dplyr requires eager evaluation.
155+
156+
Therefore, making a data frame frugal ensures that a pipeline raises an error wherever a fallback to dplyr would otherwise have happened.
157+
See `vignette("prudence")` for details.
158+
159+
By using operations supported by duckplyr and avoiding fallbacks as much as possible, your pipelines will be executed by DuckDB in an optimized way.
160+
161+
## Configure fallbacks
162+
163+
Using the `fallback_sitrep()` and `fallback_config()` functions, you can examine and change settings related to fallbacks.
164+
165+
- You can choose to make fallbacks verbose with `fallback_config(info = TRUE)`.
166+
167+
- You can change settings related to logging fallbacks and reporting them to the duckplyr development team to inform their work.
168+
169+
See `vignette("telemetry")` for details.
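
The settings described above can be sketched as follows. This is a minimal illustration that uses only the two functions and the `info` argument named in this section. The printed output and any other arguments of `fallback_config()` are not shown in this diff, so none are assumed here.

```r
# Report the current fallback-related settings.
duckplyr::fallback_sitrep()

# Ask duckplyr to print a message whenever it falls back to dplyr.
duckplyr::fallback_config(info = TRUE)

# Check the settings again to confirm the change.
duckplyr::fallback_sitrep()
```

Calling `fallback_sitrep()` before and after the change is a simple way to check that the new setting took effect.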
170+
143171
## Conclusion
144172

145173
The fallback mechanism in duckplyr allows for a seamless integration of dplyr verbs and R functions that are not supported by DuckDB.
