-
Notifications
You must be signed in to change notification settings - Fork 176
Add replace command with Calcite #4248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@Override | ||
public LogicalPlan visitReplace(Replace node, AnalysisContext context) { | ||
throw new UnsupportedOperationException( | ||
"Replace is supported only when " + CALCITE_ENGINE_ENABLED.getKeyValue() + "=true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a util for this nowadays:
public static UnsupportedOperationException getOnlyForCalciteException(String feature) { |
String patternStr = pattern.toString(); | ||
if (patternStr.contains("+") | ||
|| patternStr.contains("-") | ||
|| patternStr.contains("*") | ||
|| patternStr.contains("/")) { | ||
throw new IllegalArgumentException( | ||
"Expression is not allowed in replace pattern. Only string literals are supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little confused by this logic, we don't allow strings that contain these characters? Or if we're trying to forbid any expressions at all, I'm not sure just the 4 arithmetic operators will be sufficient (e.g. if someone tries to boolean and
or something).
If we don't want an expression, the grammar should specify a stringLiteral
or something along those lines, instead of expression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for @Swiddis 's question, and I also left a comment in this validation method in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this! Yes the idea was to avoid any non-string literals for pattern text. I have updated the grammar token itself to string literal to handle this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @manasvinibs , thanks for taking this on. And I just left some comments.
} | ||
|
||
private void validate() { | ||
if (pattern == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation logic for the pattern expression hardcodes checks for mathematical operators (+
, -
, *
, /
) on L44-47. This is may miss other invalid expressions in my opinion. Maybe consider validating that pattern is a string literal by checking its type (e.g., Literal with DataType.STRING
)?
Something like this:
if (!(pattern instanceof Literal && ((Literal) pattern).getType() == DataType.STRING)) {
throw new IllegalArgumentException("Replace pattern must be a string literal.");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! Updated in the next revision.
String patternStr = pattern.toString(); | ||
if (patternStr.contains("+") | ||
|| patternStr.contains("-") | ||
|| patternStr.contains("*") | ||
|| patternStr.contains("/")) { | ||
throw new IllegalArgumentException( | ||
"Expression is not allowed in replace pattern. Only string literals are supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for @Swiddis 's question, and I also left a comment in this validation method in general.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wild card support for rename command is implemented in #4019
Let's utilize the logic when we implement wild card support for replace
.
docs/user/ppl/cmd/replace.rst
Outdated
|
||
Description | ||
============ | ||
| Using ``replace`` command to replace text in one or more fields in the search result. The replaced text appears in new fields with prefix *new_*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it our intended behavior to introduce new attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently with this implementation I'm reusing existing eval replace function which also exposes the replaced values into new fields. I'm mimicking the same behavior with direct command. But, if we think instead of adding new field, we should replace on the original one I'm open to explore that option. Let me know your thoughts.
docs/user/ppl/cmd/replace.rst
Outdated
============ | ||
replace '<pattern>' WITH '<replacement>' IN <field-name>[, <field-name>]... | ||
|
||
* pattern: mandatory. The text pattern you want to replace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clarify that we currently support only plain text pattern. (pattern
could indicate wildcard/regex/etc.)
docs/user/ppl/cmd/replace.rst
Outdated
fetched rows / total rows = 1/1 | ||
+-------------------------------+-------------------------------+ | ||
| message | new_message | | ||
|------------------------+--------------------------------------| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I see broken table format.
newFieldNames.add(fieldName); | ||
} | ||
|
||
// Then add new fields with replaced content using new_ prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if new_xxx
already exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this behavior and I see system is automatically handling the field name collision by appending a number ("new_country0") and resolving field name conflicts gracefully. I have documented this behavior.
ef65df6
to
2d17ce7
Compare
2d17ce7
to
be806e8
Compare
a20ff96
to
966b939
Compare
context.relBuilder.call( | ||
SqlStdOperatorTable.REPLACE, fieldRef, patternNode, replacementNode); | ||
projectList.add(replaceCall); | ||
newFieldNames.add(NEW_FIELD_PREFIX + fieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks it does not check existing field and add suffix number as written in the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes every replace command adds a new_
column and do not conflict with existing column names. Let me know if you think we should change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering the following doc description is right since this logic doesn't check existence of the field. (Is it something automatically done?)
* If a field with *new_* prefix already exists (e.g., 'new_country'), a number will be appended to create a unique field name (e.g., 'new_country0')
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Outdated
Show resolved
Hide resolved
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Outdated
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
Show resolved
Hide resolved
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Manasvini B S <[email protected]>
966b939
to
3621bc9
Compare
This PR is stalled because it has been open for 2 weeks with no activity. |
Description
Implement the replace command in PPL to replace text patterns in specified fields. This PR includes the grammar implementation and basic replacement functionality. This implementation reuses the existing replace functionality. Missing/new features for regex support is added in a separate PR.
Syntax
<source> | replace '<pattern>' WITH '<replacement>' IN field_list
pattern: The text pattern to search for (case-sensitive)
replacement: The text to replace matches with
field_list: Comma-separated list of fields to perform replacement in
Replace creates new fields with prefix 'new_' containing the replaced text, while preserving original fields.
Semantics
Expected Behavior:
Implementation Approach:
Example Queries
Output Schema
For each field specified in IN clause, a new field is created:
Example:
Input fields: message="error occurred", level="error"
After replace 'error' WITH 'ERROR' IN message, level:
Related Issues
#3975
Check List
--signoff
or-s
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.