Commit 42ce6ad

maelle authored and krlmlr committed
docs: further improve fallbacks vignette
1 parent de036ab commit 42ce6ad

1 file changed: +20 -27 lines changed

vignettes/fallback.Rmd

Lines changed: 20 additions & 27 deletions
@@ -45,13 +45,10 @@ conflict_prefer("filter", "dplyr")
 ## Introduction

 The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr.
-All operations, R functions, and data types that are supported by dplyr should work in an identical way with duckplyr.
-This is achieved in two ways:
+Currently, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented (see `vignette("limits")`).
+Whenever a request cannot be handled by DuckDB, duckplyr falls back to dplyr.

-- A carefully selected subset of dplyr operations, R functions, and R data types are implemented in DuckDB, focusing on faithful translation.
-- When DuckDB does not support an operation, duckplyr falls back to dplyr, guaranteeing identical behavior.
-
-## DuckDB mode
+## A pipeline directly supported by duckplyr

 The following operation is supported by duckplyr:

@@ -70,18 +67,18 @@ duckdb %>%
 explain()
 ```

-The plan shows three operations:
+The plan shows three **operations**:

-- a data frame scan (the input),
+- a data frame scan (the input),
 - a sort operation,
 - a projection (adding the `b` column and removing the `a` column).

-Each operation is supported by DuckDB.
-The resulting object contains a plan for the entire pipeline that is executed lazily, only when the data is needed.
+Because each operation is supported by DuckDB, the resulting object contains a **plan for the entire pipeline**.
+The plan is only executed when the data is needed, i.e. lazily (see `vignette("prudence")`).

-## Relation objects
+### Relation objects

-DuckDB accepts a tree of interconnected _relation objects_ as input.
+DuckDB accepts a tree of interconnected *relation objects* as input.
 Each relation object represents a logical step of the execution plan.
 The duckplyr package translates dplyr verbs into relation objects.

@@ -101,7 +98,7 @@ duckplyr::last_rel()

 The `last_rel()` function now shows a relation that describes logical plan for executing the whole pipeline.

-## Help from dplyr
+## A pipeline with functionality not directly supported by duckplyr

 Using a custom function with a side effect is not supported by DuckDB and triggers a dplyr fallback:

@@ -118,7 +115,7 @@ fallback <-
 select(-a)
 ```

-The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is forwarded to dplyr and already executed (eagerly) when the pipeline is defined.
+The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined, i.e. eagerly.
 This is confirmed by the `last_rel()` function:

 ```{r}
@@ -148,30 +145,26 @@ duckplyr::last_rel()

 The `last_rel()` function confirms that only the final `select()` is handled by DuckDB again.

-## Enforce DuckDB operation
-
-For any duck frame, one can control the automatic materialization.
-For fallbacks to dplyr, automatic materialization must be allowed for the frame at hand, as dplyr necessitates eager evaluation.
-
-Therefore, by making a data frame frugal, one can ensure a pipeline will error when a fallback to dplyr would have normally happened.
-See `vignette("prudence")` for details.
-
-By using operations supported by duckplyr and avoiding fallbacks as much as possible, your pipelines will be executed by DuckDB in an optimized way.
-
 ## Configure fallbacks

 Using the `fallback_sitrep()` and `fallback_config()` functions you can examine and change settings related to fallbacks.

 - You can choose to make fallbacks verbose with `fallback_config(info = TRUE)`.

-- You can change settings related to logging and reporting fallback to duckplyr development team to inform their work.
+- You can change settings related to logging and reporting fallback to duckplyr development team to inform their work. See `vignette("telemetry")`.
+
+### Enforcing DuckDB operation
+
+For any duck frame, one can control the automatic materialization.
+For fallbacks to dplyr, automatic materialization must be allowed for the frame at hand, as dplyr necessitate eager evaluation.
+
+Therefore, by making a data frame frugal, one can ensure a pipeline will error when a fallback to dplyr would have normally happened. See `vignette("prudence")`.

-See `vignette("telemetry")` for details.
+By using operations supported by duckplyr and avoiding fallbacks as much as possible, your pipelines will be executed by DuckDB in an optimized way.

 ## Conclusion

 The fallback mechanism in duckplyr allows for a seamless integration of dplyr verbs and R functions that are not supported by DuckDB.
 It is transparent to the user and only triggers when necessary.
 With small or medium-sized data sets, it will not even be noticeable in most settings.

-See `vignette("large")` for techniques for working with large data, `vignette("limits")` for the currently implementated translations, `vignette("prudence")` for details on controlling fallback behavior, and `vignette("telemetry")` for the automatic reporting of fallback situations.
