Skip to content

Conversation

@nyoungstudios
Copy link
Contributor

@nyoungstudios nyoungstudios commented Jul 11, 2025

Please add a meaningful description for your change here

Adds .editorconfig
Modifies .pre-commit-config.yaml
All other changes where modified by running pre-commit run -all-files

For more details, keep reading:

Adds a pre-commit hook to standardize whitespaces, so we don't have stray new lines or missing new line at the end of the file, or stray spaces at the end of a line
Adds an .editorconfig file with these same settings. Previously, I found it harder to contribute to the Python codebase since the default Python indent is 4 spaces, but this project uses 2 spaces. So, adding the EditorConfig will let your code editor like VS Code or PyCharm automatically pick up this project's settings. VS Code you need to install the code extension, but PyCharm does have this built in.

It seems that we also don't need the sdks/python/scripts/run_whitespacelint.sh anymore with this change. But I wasn't quite sure how to remove it from the build.gradle yet. And for what it is worth, it is using https://github.com/bendikro/whitespacelint which doesn't look maintained.

Also, I wasn't exactly sure how the .pre-commit-config.yaml was being run in the GitHub actions or if it was at all since there were comments in the .pre-commit-config.yaml referencing to keep the yapf and pylint versions synced elsewhere.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@nyoungstudios nyoungstudios marked this pull request as ready for review July 11, 2025 03:12
@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@nyoungstudios
Copy link
Contributor Author

I don't think the GitHub action failures are related to my code changes.

@nyoungstudios
Copy link
Contributor Author

assign set of reviewers

@github-actions
Copy link
Contributor

Assigning reviewers:

R: @jrmccluskey for label python.
R: @m-trieu for label java.
R: @shunping for label go.
R: @Abacn for label build.
R: @kennknowles for label website.
R: @johnjcasey for label kafka.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@jrmccluskey
Copy link
Contributor

You've impacted 1,775 files here, I'm assuming by applying the whitespace linting to the entire code base (although I have no clue how you managed to also impact things like SVGs for the website.) I would suggest narrowing the scope significantly, as A) I'd prefer to not impact git blame on all of these files and B) actually reviewing this PR is a nightmare to ensure that all of the changes are legitimately scoped to pure whitespace changes.

@nyoungstudios
Copy link
Contributor Author

You've impacted 1,775 files here, I'm assuming by applying the whitespace linting to the entire code base (although I have no clue how you managed to also impact things like SVGs for the website.) I would suggest narrowing the scope significantly, as A) I'd prefer to not impact git blame on all of these files and B) actually reviewing this PR is a nightmare to ensure that all of the changes are legitimately scoped to pure whitespace changes.

Thanks Jack for your feedback and perspective here (also enjoyed your session when you spoke at Beam Summit too).

For SVGs, they are ultimately text files too, not binary files. So, applying the whitespace rules to them seems fair in my opinion. Fwiw, there were many SVGs that did not require any changes which means that they already follow these new linting rules. So, in my opinion, enforcing consistency here is a good thing.

If you prefer my changes to not effect the existing git blame, I could create a separate pr after this one to add the git hash for this merged change to an ignore file. You can see an example from another Apache project, Apache Airflow, here.

Finally, I do not think any human should try to manually review the whitespace changes in this pr. To make it easier to, I have pushed 4 separate commits that should allow you to review the non-whitespace changes separately. Let's go through each of them.

2f6d47d (adds editorconfig and pre-commit config)

  • First, this commit adds the .editorconfig file which makes it easier for your editor to pick up the correct indentation for the file extension. Mainly, most editors and projects use 4 spaces for Python indents where this project uses 2. Let me know if I missed any extensions though. So, this allows the new contributor to start contributing with less hassle. This .editorconfig also removes trailing whitespaces at the end of life and makes sure you have a new line at the end of the file when you save it.
  • Secondly, this commit also adds the changes in the .pre-commit-config.yaml file to enforce the whitespace linting.

be0f635 (comments out these pre-commit hooks temporarily)

  • This commit comments out the yapf and pylint pre-commit hooks in the .pre-commit-config.yaml temporarily. No need to run these pre-commit hooks if we didn't modify any Python code. And mainly to save some runtime when running the next step.

0029f5f (applies pre-commit changes)

  • This commit is the result of running pre-commit run --all-files on top of the previous commit.
  • To verify this is all I did, you can checkout my branch to the previous commit (be0f635) and run pre-commit run --all-files after installing it if you don't already have it setup locally. Then, you can do a git diff to see that the result is the same.

b38afdc (Revert "comments out these pre-commit hooks temporarily")

  • This commit reverts our commit from earlier when we commented out the yapf and pylint pre-commit hooks.

Okay, after walking through each commit and running the pre-commit command locally yourself, I think you can have more confidence in my change. But how do we know that the pre-commit hook I am adding does what it says.


Finally, in order for this to be beneficial, I think we need we need to be running the .pre-commit-config.yaml in the GitHub Actions. I am not sure where this is happening, but since I am new to this repo, I may just be missing where it is being called currently. But if not, I'll be happy to add a GitHub action that runs the pre-commit config (I've done this before, here is an example in another public repo that I made a GitHub action workflow to run the pre-commit config).

Separately, I did this script sdks/python/scripts/run_whitespacelint.sh which runs https://github.com/bendikro/whitespacelint Python library to check whitespaces if I am understanding how the CI process works. I do think this pre-commit hook I am adding is more robust and more popular than the existing tool. But any guidance on the existing process would be appreciated.

@github-actions
Copy link
Contributor

Reminder, please take a look at this pr: @jrmccluskey @m-trieu @shunping @Abacn @kennknowles @johnjcasey

@damccorm
Copy link
Contributor

Hey @nyoungstudios I think we already have tooling to cover linting and whitespace in our repo (specifically for python this is covered via tox + some other linting checks - https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks)

I'm not sure we need more linting rules in the repo, and I'm hesitant to ask people to update their regular workflows to accomodate this when there's not much of a value add.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 2, 2025

Reminder, please take a look at this pr: @jrmccluskey @m-trieu @shunping @Abacn @kennknowles @johnjcasey

@damccorm
Copy link
Contributor

damccorm commented Aug 4, 2025

Closing per my last comment - feel free to comment if you disagree

@damccorm damccorm closed this Aug 4, 2025
@nyoungstudios
Copy link
Contributor Author

Thanks @damccorm for the review. Sorry for the delayed response as I got busy with other things.

Obviously, I am not an active contributor here; so, if you guys don't think this is worth adding then that is fine with me. I will say that even though this repo does have linting for Python, my updates in this pr showed that the current tools being used still miss some whitespace. If I were a code reviewer where there could be inconsistency in whitespace, I would fine that annoying. But I don't really mind that much since I am not a code reviewer here.

I will say that my biggest complaint when first contributing Python code to this repo is that it uses 2 spaces for a tab opposed to most Python projects and Python IDEs that default to 4 spaces. With that in mind, do you think I could make this pr (or open another pr) for just the .editorconfig file change?

Lots of other popular repos on GitHub use an .editorconifg file like Apache Airflow, React, Django, and VS Code.

@damccorm
Copy link
Contributor

Sure, adding this seems fine, thanks

@nyoungstudios
Copy link
Contributor Author

Thanks, I've opened the updated pr here: #35956

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment