feat(spark): Implement Spark functions url_encode and url_decode #17399
///
fn decode(value: &str) -> Result<String> {
    // Check if the string has valid percent encoding
    // TODO: Support `try_url_decode`
Is this TODO still valid? Should we open an issue and link to it if so?
}

/// Replace b'+' with b' '
fn replace_plus(input: &[u8]) -> Cow<'_, [u8]> {
Could we add a test case that covers this replacement?
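For illustration, a test along these lines might cover the replacement. This is a minimal, std-only sketch: only the `replace_plus` signature and doc comment come from the diff above, while the function body and the test names are assumptions and not the PR's actual code.

```rust
use std::borrow::Cow;

/// Replace b'+' with b' ' (body is an assumed sketch, not the PR's code).
fn replace_plus(input: &[u8]) -> Cow<'_, [u8]> {
    match input.iter().position(|&b| b == b'+') {
        // No '+' present: hand back the input without allocating.
        None => Cow::Borrowed(input),
        Some(first) => {
            // Copy once, then rewrite every '+' from the first hit onward.
            let mut replaced = input.to_owned();
            for byte in &mut replaced[first..] {
                if *byte == b'+' {
                    *byte = b' ';
                }
            }
            Cow::Owned(replaced)
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn replaces_every_plus_with_space() {
        assert_eq!(&*replace_plus(b"a+b+c"), &b"a b c"[..]);
    }

    #[test]
    fn borrows_input_when_no_plus_is_present() {
        assert!(matches!(replace_plus(b"abc"), Cow::Borrowed(_)));
    }
}
```

An end-to-end case such as decoding `a+b%2Bc` would additionally check that a literal `+` written as `%2B` survives, provided the replacement runs before percent-decoding.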
    );
}
match arg_types[0] {
    DataType::Utf8 | DataType::Utf8View => Ok(DataType::Utf8),
It looks like you're collecting to a StringView for input type of Utf8View, so shouldn't this be more like
DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8 => Ok(arg_types[0].clone())
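The suggested shape can be sketched without pulling in arrow. In the sketch below, `DataType` is a local stand-in enum and not arrow's type, and the `return_type` signature and `String` error are simplified assumptions chosen only to keep the example self-contained.

```rust
/// Local stand-in for arrow's `DataType`, limited to the variants discussed here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
}

/// Return the input string type unchanged, so Utf8View stays Utf8View and
/// LargeUtf8 stays LargeUtf8 (hypothetical simplified signature).
fn return_type(arg_types: &[DataType]) -> Result<DataType, String> {
    match arg_types {
        // Exactly one string-typed argument: echo its type back.
        [dt @ (DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8)] => Ok(dt.clone()),
        // Exactly one argument, but not a string type.
        [other] => Err(format!("expected a string type, got {other:?}")),
        // Wrong number of arguments.
        _ => Err(format!("expected exactly one argument, got {}", arg_types.len())),
    }
}
```

Whether this is right for the PR depends on what array type the function body actually builds for each input type; the declared return type has to match that.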
    );
}
match arg_types[0] {
    DataType::Utf8 | DataType::Utf8View => Ok(DataType::Utf8),
Same question as for decode
/// * `Ok(String)` - The encoded string
///
fn encode(value: &str) -> Result<String> {
    Ok(byte_serialize(value.as_bytes()).collect::<String>())
So for encoding you don't need to do the inverse operations you do in decoding? Like the + -> ' ' manipulation? I am unfamiliar with the specific needs here.
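One possible answer, assuming `byte_serialize` here is `form_urlencoded::byte_serialize` from the `url` crate (the diff does not show the import): that serializer follows the application/x-www-form-urlencoded rules and already writes a space as `+`, so the inverse step would happen inside it. The std-only sketch below illustrates those rules; `form_urlencode` is a hypothetical name and this is not the crate's implementation.

```rust
/// Percent-encode `input` using application/x-www-form-urlencoded rules
/// (illustrative sketch; `form_urlencode` is a hypothetical helper).
fn form_urlencode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        match byte {
            // ASCII alphanumerics and `*-._` pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(byte as char)
            }
            // A space becomes '+', the inverse of the decoder's '+' -> ' ' step.
            b' ' => out.push('+'),
            // Everything else, including a literal '+', becomes %XX in uppercase hex.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}
```

Under these rules `form_urlencode("a b+c")` yields `a+b%2Bc`, which is why the decoder has to turn `+` back into a space before resolving percent escapes.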
# For more information, please see:
# https://github.com/apache/datafusion/issues/15914
query T
SELECT url_decode('https%3A%2F%2Fspark.apache.org'::string);
I am unfamiliar with the slt files, but if there was a way to have these do all 3 types of input (utf8, utf8view, largeutf8) that would be nice.
Which issue does this PR close?
datafusion-spark: Spark Compatible Functions #15914
Rationale for this change
What changes are included in this PR?
Implement Spark functions url_encode and url_decode
Are these changes tested?
Yes
Are there any user-facing changes?
Yes