Skip to content

Conversation

anhvdq
Copy link

@anhvdq anhvdq commented Sep 4, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement Spark functions url_encode and url_decode

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Sep 4, 2025
///
fn decode(value: &str) -> Result<String> {
// Check if the string has valid percent encoding
// TODO: Support `try_url_decode`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO still valid? Should we open an issue and link to it if so?

}

/// Replace b'+' with b' '
fn replace_plus(input: &[u8]) -> Cow<'_, [u8]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test case that covers this replacement?

);
}
match arg_types[0] {
DataType::Utf8 | DataType::Utf8View => Ok(DataType::Utf8),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're collecting to a StringView for input type of Utf8View, so shouldn't this be more like

DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8 => Ok(arg_types[0].clone())

);
}
match arg_types[0] {
DataType::Utf8 | DataType::Utf8View => Ok(DataType::Utf8),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for decode

/// * `Ok(String)` - The encoded string
///
fn encode(value: &str) -> Result<String> {
Ok(byte_serialize(value.as_bytes()).collect::<String>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for encoding you don't need to do the inverse operations you do in decoding? Like the + -> ' ' manipulation? I am unfamiliar with the specific needs here.

# For more information, please see:
# https://github.com/apache/datafusion/issues/15914
query T
SELECT url_decode('https%3A%2F%2Fspark.apache.org'::string);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with the slt files, but if there was a way to have these do all 3 types of input (utf8, utf8view, largeutf8) that would be nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spark sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants