
fix: preserve column names with spaces in wr.redshift.copy()#3298

Open
hirenkumar-n-dholariya wants to merge 1 commit intoaws:mainfrom
hirenkumar-n-dholariya:hirenkumar-n-dholariya-fix/redshift-copy-column-space-rename

Conversation

@hirenkumar-n-dholariya

Problem

wr.redshift.copy() silently renames columns that contain spaces (e.g. "my col" → "my_col")
because the internal s3.to_parquet call defaults to pyarrow flavor='spark',
which sanitizes column names by replacing spaces (and other characters Spark disallows) with underscores.

Fix

Explicitly pass pyarrow_additional_kwargs={"flavor": None} in the internal
s3.to_parquet call to preserve original column names.

Fixes #3293

Passes flavor=None to internal s3.to_parquet call to prevent pyarrow spark flavor from sanitizing column names (spaces → underscores). Fixes aws#3293
@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: fdccae4
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: fdccae4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@hirenkumar-dholariya

hirenkumar-dholariya commented Apr 10, 2026

@kukushking Could you confirm this failure is pre-existing and unrelated to the fix? Happy to address any other feedback!
The CI failure is unrelated to this fix. The GitHubDistributedCodeBuild failure is caused by a pre-existing incompatibility between modin==0.37.1 and pandas==3.0.1: pandas.read_gbq was removed in pandas 3.x, which causes an AttributeError when loading modin.

The GitHubCodeBuild (non-distributed) pipeline passed successfully.
This issue exists independently in the main branch and is not introduced by this PR.

@hirenkumar-n-dholariya
Author

@kukushking Could you confirm this failure is pre-existing and unrelated to the fix? Happy to address any other feedback!
The CI failure is unrelated to this fix. The GitHubDistributedCodeBuild failure is caused by a pre-existing incompatibility between modin==0.37.1 and pandas==3.0.1 — pandas.read_gbq was removed in pandas 3.x, which causes an AttributeError when loading modin.

The GitHubCodeBuild (non-distributed) pipeline passed successfully.
This issue exists independently in the main branch and is not introduced by this PR.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wr.redshift.copy() silently renames columns with spaces due to pyarrow defaulting to flavor='spark' in internal s3.to_parquet call

3 participants