
Commit 39175a2

maelle authored and krlmlr committed
docs: improve fallbacks vignette
1 parent 13f4dde commit 39175a2


1 file changed (+31, -10 lines changed)


vignettes/fallback.Rmd

Lines changed: 31 additions & 10 deletions
````diff
@@ -45,10 +45,10 @@ conflict_prefer("filter", "dplyr")
 ## Introduction

 The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr.
-To achieve this, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented.
+Currently, only a carefully selected subset of dplyr's operations, R functions, and R data types is implemented (see `vignette("limits")`).
 Whenever a request cannot be handled by DuckDB, duckplyr falls back to dplyr.

-## DuckDB operation
+## A pipeline directly supported by duckplyr

 The following operation is supported by duckplyr:

````
````diff
@@ -67,13 +67,18 @@ duckdb %>%
 explain()
 ```

-The plan shows three operations: a data frame scan (the input), a sort operation, and a projection (adding the `b` column and removing the `a` column).
-Because each operation is supported by DuckDB, the resulting object contains a plan for the entire pipeline.
-The plan is only executed when the data is needed.
+The plan shows three **operations**:

-## Relation objects
+- a data frame scan (the input),
+- a sort operation,
+- a projection (adding the `b` column and removing the `a` column).

-DuckDB accepts a tree of interconnected _relation objects_ as input.
+Because each operation is supported by DuckDB, the resulting object contains a **plan for the entire pipeline**.
+The plan is only executed when the data is needed, i.e. lazily (see `vignette("prudence")`).
+
+### Relation objects
+
+DuckDB accepts a tree of interconnected *relation objects* as input.
 Each relation object represents a logical step of the execution plan.
 The duckplyr package translates dplyr verbs into relation objects.

````
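A minimal sketch of a pipeline whose plan should resemble the one described in this hunk (an input scan, a sort, and a projection). The vignette's own input chunk lies outside this diff, so the sketch assumes the `duckdb_tibble()` constructor from the released duckplyr 1.0 API; the object name `pipeline` is hypothetical. The exact plan text printed by DuckDB may differ between versions.

```r
library(dplyr, warn.conflicts = FALSE)
library(duckplyr)

# Assumption: duckdb_tibble() (duckplyr 1.0 API) creates a duck frame.
# No data is computed here; each verb only extends the DuckDB plan.
pipeline <-
  duckdb_tibble(a = 1:3) %>%
  arrange(desc(a)) %>%
  mutate(b = a + 1) %>%
  select(-a)

# Print the plan DuckDB would run, without materializing the result
pipeline %>%
  explain()

# Accessing the data triggers execution of the whole plan
pull(pipeline, b)
```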
````diff
@@ -93,7 +98,7 @@ duckplyr::last_rel()

 The `last_rel()` function now shows a relation that describes the logical plan for executing the whole pipeline.

-## Functionality not supported by DuckDB
+## A pipeline with functionality not directly supported by duckplyr

 Using a custom function with a side effect is not supported by DuckDB and triggers a dplyr fallback:

````
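To look at the relation objects behind a fully supported pipeline, you can materialize it and then call `duckplyr::last_rel()`. This sketch assumes `duckdb_tibble()` from the duckplyr 1.0 API and uses `collect()` to trigger execution; both choices are assumptions rather than what the vignette's hidden chunks do.

```r
library(dplyr, warn.conflicts = FALSE)
library(duckplyr)

pipeline <-
  duckdb_tibble(a = 1:3) %>%
  arrange(desc(a)) %>%
  mutate(b = a + 1) %>%
  select(-a)

# Materialize: DuckDB executes the relation tree built from the dplyr verbs
result <- collect(pipeline)

# Show the relation describing the logical plan of the latest computation
duckplyr::last_rel()
```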
````diff
@@ -110,7 +115,7 @@ fallback <-
 select(-a)
 ```

-The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined.
+The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined, i.e. eagerly.
 This is confirmed by the `last_rel()` function:

 ```{r}
````
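The eager evaluation described in this hunk can be observed through the side effect itself. The vignette defines `verbose_plus_one()` in a chunk outside this diff; the version below is a hypothetical stand-in that prints a message and adds one. As before, `duckdb_tibble()` is assumed from the duckplyr 1.0 API.

```r
library(dplyr, warn.conflicts = FALSE)
library(duckplyr)

# Hypothetical stand-in for the vignette's verbose_plus_one():
# an R function with a side effect (a message) that DuckDB cannot translate
verbose_plus_one <- function(x) {
  message("Adding one to ", length(x), " values")
  x + 1
}

# mutate() falls back to dplyr, which evaluates eagerly,
# so the message is printed while `fallback` is being defined
fallback <-
  duckdb_tibble(a = 1:3) %>%
  mutate(b = verbose_plus_one(a)) %>%
  select(-a)

# Printing runs the remaining DuckDB step (the final select())
fallback
```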
````diff
@@ -140,10 +145,26 @@ duckplyr::last_rel()

 The `last_rel()` function confirms that only the final `select()` is handled by DuckDB again.

+## Configure fallbacks
+
+Using the `fallback_sitrep()` and `fallback_config()` functions, you can examine and change settings related to fallbacks.
+
+- You can make fallbacks verbose with `fallback_config(info = TRUE)`.
+
+- You can change settings related to logging fallbacks and reporting them to the duckplyr development team to inform their work. See `vignette("telemetry")`.
+
+### Enforcing DuckDB operation
+
+For any duck frame, you can control whether automatic materialization is allowed.
+Fallbacks to dplyr require automatic materialization to be allowed for the frame at hand, because dplyr requires eager evaluation.
+
+Therefore, by making a data frame frugal, you can ensure that a pipeline errors where it would otherwise have fallen back to dplyr. See `vignette("prudence")`.
+
+If you use operations supported by duckplyr and avoid fallbacks as much as possible, your pipelines are executed by DuckDB in an optimized way.
+
 ## Conclusion

 The fallback mechanism in duckplyr allows for a seamless integration of dplyr verbs and R functions that are not supported by DuckDB.
 It is transparent to the user and only triggers when necessary.
 With small or medium-sized data sets, it will not even be noticeable in most settings.

-See `vignette("large")` for techniques for working with large data, `vignette("limits")` for the currently implementated translations, `vignette("prudence")` for details on controlling fallback behavior, and `vignette("telemetry")` for the automatic reporting of fallback situations.
````
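The configuration calls named in the new "Configure fallbacks" section can be tried as follows. `fallback_sitrep()` and `fallback_config(info = TRUE)` come from the text above; the helper `plus_one_r()` is hypothetical and only exists to force a fallback, and `duckdb_tibble()` is assumed from the duckplyr 1.0 API. The sketch switches the announcements off again at the end.

```r
library(dplyr, warn.conflicts = FALSE)
library(duckplyr)

# Summarize the current fallback-related settings
fallback_sitrep()

# Ask duckplyr to announce each fallback to dplyr
fallback_config(info = TRUE)

# Hypothetical helper: an arbitrary R function that DuckDB cannot translate,
# so this mutate() is expected to fall back to dplyr and report it
plus_one_r <- function(x) x + 1
res <- duckdb_tibble(a = 1:3) %>%
  mutate(b = plus_one_r(a))

# Switch the announcements off again
fallback_config(info = FALSE)
```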
