@etiennebacher (Collaborator) commented Nov 19, 2025

This is labelled as "docs" but it also touches tests and changes the value of `as_lit` in some functions, so I'm not sure what label it should have.

This is another example of why it would be nice to have #1641. The deprecation messages were not flagged as warnings by testthat, did not make the examples fail, and do not even appear in the examples rendered on the website.

Copilot AI review requested due to automatic review settings November 19, 2025 22:24
Copilot finished reviewing on behalf of etiennebacher November 19, 2025 22:28
R/expr-string.R Outdated
Comment on lines 1120 to 1121
#' literals. To use the same character vector for all rows, use
#' `list(c(...))` instead of `c(...)` (see Examples).
Collaborator Author

I added this sentence because it would seem natural to use e.g. `c("a", "b")` to look for the patterns "a" and "b" in all rows (this is what we did in most tests), but this behaviour will change. Wrapping the vector in `list()` is quite counter-intuitive (I didn't get it to work on the first try), so I think it's helpful to mention it here.
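
For illustration, here is a minimal sketch of the `list(c(...))` form, reusing the data from the tests in this PR. It assumes the polars R package is attached and that `contains_any()` treats a one-element list as a single set of patterns applied to every row; the column name `hit` is made up for the example.

```r
library(polars)

dat <- pl$DataFrame(x = c("HELLO there", "hi there", "good bye", NA))

# Wrapping the vector in list() passes one set of patterns, c("hi", "hello"),
# that is checked against every row of `x`.
# `hit` is a hypothetical output column name used only for illustration.
out <- dat$with_columns(
  hit = pl$col("x")$str$contains_any(list(c("hi", "hello")))
)
out
```

Matching is case-sensitive unless `ascii_case_insensitive = TRUE` is passed, so "HELLO there" is not expected to match "hello" here.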

R/expr-string.R Outdated

  self$`_rexpr`$str_replace_many(
-   as_polars_expr(patterns, as_lit = TRUE)$`_rexpr`,
+   as_polars_expr(patterns, as_lit = FALSE)$`_rexpr`,
Collaborator Author

@etiennebacher etiennebacher Nov 19, 2025

R/expr-string.R Outdated
Comment on lines 1141 to 1143
-   as_polars_expr(patterns, as_lit = TRUE)$`_rexpr`,
+   as_polars_expr(patterns, as_lit = FALSE)$`_rexpr`,
Collaborator Author

@etiennebacher etiennebacher Nov 19, 2025

  dat <- pl$DataFrame(x = c("HELLO there", "hi there", "good bye", NA))
  expect_equal(
-   dat$with_columns(pl$col("x")$str$contains_any(c("hi", "hello"))),
+   dat$with_columns(pl$col("x")$str$contains_any(list(c("hi", "hello")))),
Collaborator Author

All the modified tests gave "invisible" warnings (i.e. they are not printed as warnings and don't even appear on the website), such as:

Deprecation: `str.contains_any` with a flat string datatype is deprecated.
Please use `implode` to return to previous behavior.
See https://github.com/pola-rs/polars/issues/22149 for more information.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR updates the Aho-Corasick string functions (str$contains_any(), str$replace_many(), str$extract_many(), and str$find_many()) to change how the patterns argument is handled. The key change is setting as_lit = FALSE for the patterns parameter in contains_any() and replace_many(), making them consistent with extract_many() and find_many(). This means strings are now parsed as column names by default, requiring users to wrap literal patterns in list(c(...)) instead of using bare vectors.

Key Changes:

  • Changed as_lit parameter from TRUE to FALSE for patterns in expr_str_contains_any() and expr_str_replace_many()
  • Updated documentation to clarify that patterns should be wrapped in list(c(...)) for literal values
  • Updated tests and examples to reflect the new syntax
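
The effect of `as_lit` described above can be sketched as follows. This is a rough illustration, assuming `as_polars_expr()` is called directly on a length-1 character vector as in the diff; the data frame and the column name `a` are hypothetical.

```r
library(polars)

# Hypothetical data: a single string column named "a"
df <- pl$DataFrame(a = c("x", "y"))

# as_lit = FALSE: the string "a" is resolved as a column name
as_col <- df$select(as_polars_expr("a", as_lit = FALSE))

# as_lit = TRUE: the string "a" is kept as a literal value
as_value <- df$select(as_polars_expr("a", as_lit = TRUE))
```

With `as_lit = FALSE`, a bare `c("hi", "hello")` passed as `patterns` would therefore be looked up as column names, which is why the documentation recommends `list(c(...))` for literal patterns.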

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| R/expr-string.R | Changed as_lit = FALSE for patterns parameter; updated documentation and examples (contains bugs in examples) |
| tests/testthat/test-expr-string.R | Updated test cases to wrap patterns in list() (contains multiple bugs in test syntax) |
| tests/testthat/_snaps/expr-string.md | Updated error message snapshots to reflect new syntax |
| man/expr_str_replace_many.Rd | Updated parameter documentation and examples (contains bugs in examples) |
| man/expr_str_find_many.Rd | Updated parameter documentation and examples |
| man/expr_str_extract_many.Rd | Updated parameter documentation and examples |
| man/expr_str_contains_any.Rd | Updated parameter documentation and examples |
| NEWS.md | Added changelog entry explaining the behavior change |

@eitsupi (Collaborator) commented Nov 20, 2025

Thanks for taking a look at this.
Changing the internal `as_lit` value is a breaking change and should be postponed until 2.0.0 IMO (same as #1572).

Collaborator

@eitsupi eitsupi left a comment

Thanks!

@eitsupi eitsupi merged commit ff51354 into main Nov 22, 2025
22 of 23 checks passed
@eitsupi eitsupi deleted the aho-corasick-fixes branch November 22, 2025 01:37