-
Notifications
You must be signed in to change notification settings - Fork 41
docs: Multiple fixes for Aho-Corasick functions #1660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
R/expr-string.R
Outdated
| #' literals. To use the same character vector for all rows, use | ||
| #' `list(c(...))` instead of `c(...)` (see Examples). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this sentence because it would seem natural to use e.g. c("a", "b") to look for patterns "a" and "b" in all rows (this is what we did in most tests), but this behaviour will change. It is quite counter-intuitive to wrap this in list() (I didn't get this to work on the first try), so I think it's helpful to mention this here.
R/expr-string.R
Outdated
|
|
||
| self$`_rexpr`$str_replace_many( | ||
| as_polars_expr(patterns, as_lit = TRUE)$`_rexpr`, | ||
| as_polars_expr(patterns, as_lit = FALSE)$`_rexpr`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R/expr-string.R
Outdated
| as_polars_expr(patterns, as_lit = TRUE)$`_rexpr`, | ||
| as_polars_expr(patterns, as_lit = FALSE)$`_rexpr`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dat <- pl$DataFrame(x = c("HELLO there", "hi there", "good bye", NA)) | ||
| expect_equal( | ||
| dat$with_columns(pl$col("x")$str$contains_any(c("hi", "hello"))), | ||
| dat$with_columns(pl$col("x")$str$contains_any(list(c("hi", "hello")))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the modified tests gave "invisible" warnings (i.e. they are not printed as warnings and don't even appear on the website), such as:
Deprecation: `str.contains_any` with a flat string datatype is deprecated.
Please use `implode` to return to previous behavior.
See [https://github.com/pola-rs/polars/issues/22149](https://github.com/pola-rs/polars/issues/22149) for more information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the Aho-Corasick string functions (str$contains_any(), str$replace_many(), str$extract_many(), and str$find_many()) to change how the patterns argument is handled. The key change is setting as_lit = FALSE for the patterns parameter in contains_any() and replace_many(), making them consistent with extract_many() and find_many(). This means strings are now parsed as column names by default, requiring users to wrap literal patterns in list(c(...)) instead of using bare vectors.
Key Changes:
- Changed
as_litparameter fromTRUEtoFALSEforpatternsinexpr_str_contains_any()andexpr_str_replace_many() - Updated documentation to clarify that patterns should be wrapped in
list(c(...))for literal values - Updated tests and examples to reflect the new syntax
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| R/expr-string.R | Changed as_lit = FALSE for patterns parameter; updated documentation and examples (contains bugs in examples) |
| tests/testthat/test-expr-string.R | Updated test cases to wrap patterns in list() (contains multiple bugs in test syntax) |
| tests/testthat/_snaps/expr-string.md | Updated error message snapshots to reflect new syntax |
| man/expr_str_replace_many.Rd | Updated parameter documentation and examples (contains bugs in examples) |
| man/expr_str_find_many.Rd | Updated parameter documentation and examples |
| man/expr_str_extract_many.Rd | Updated parameter documentation and examples |
| man/expr_str_contains_any.Rd | Updated parameter documentation and examples |
| NEWS.md | Added changelog entry explaining the behavior change |
|
Thanks for taking a look at this. |
eitsupi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This is labelled as "docs" but actually also touches tests and changes the value foras_litin some functions, so not sure what label it should have.This is another example why it would be nice to have #1641. Those messages were not marked as warnings by
testthat, didn't fail in examples, and don't even appear in the examples rendered on the website.