@vvolkgang vvolkgang commented Nov 11, 2025

🎟️ Tracking

PM-14880

📔 Objective

Automate pull request labelling based on changed file paths and PR title patterns following conventional commits format, as a first step towards release notes with automated ✨ categories ✨ !

  • JSON configuration: Label patterns are defined in .github/label-pr.json. We initially planned to use YAML, but that would require an external dependency, introducing the need for a Python environment and dependency management, whereas Python supports JSON deserialization out of the box. File-based config will allow us to centralise these scripts in our gh-actions repo later down the line.
  • Labelling modes (add or replace): To keep labels up to date in our PRs, we'll replace them whenever a PR is updated. The exceptions are Community PRs and our automated PRs, where we'll add instead, preserving labels set by other automations.
  • Raw python script without additional dependencies - faster workflows and easier local testing.
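
As a rough illustration of the dependency-free configuration described above, the following sketch loads a JSON config with only the standard library. The `path_patterns` key is the one the script reads; the `title_patterns` key name, the `load_config` helper and the example patterns are assumptions made for this sketch, not the PR's actual file contents.

```python
import json
import tempfile
from pathlib import Path


def load_config(path: str) -> dict:
    """Load label patterns from a JSON file using only the standard library."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    # "title_patterns" is an assumed key name; only "path_patterns" appears in the PR.
    for key in ("path_patterns", "title_patterns"):
        if key not in config:
            raise KeyError(f"missing '{key}' in {path}")
    return config


# Hypothetical config contents, written to a temporary file for the demo.
example = {
    "path_patterns": {"t:ci": [".github/"], "t:docs": ["docs/"]},
    "title_patterns": {"t:bug": ["fix"], "t:new-feature": ["feat"]},
}
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "label-pr.json"
    config_file.write_text(json.dumps(example), encoding="utf-8")
    config = load_config(str(config_file))

print(sorted(config["path_patterns"]))
```

Because nothing beyond `json` is imported, a workflow can run such a script with the runner's stock `python3` and no install step.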

Note

  • pull_request workflow target will be added in a future PR after testing with workflow_dispatch.
  • The catch-all label is t:misc, chosen after much deliberation because (1) it is concise and (2) it is only a fallback for unhandled exceptions, which we should monitor and address; we don't expect to use it much.

Labels

App Labels (from file paths):

  • app:password-manager - Password manager app changes
  • app:authenticator - Authenticator app changes

Change Type Labels:

  • t:new-feature - New product features
  • t:enhancement - Existing product feature enhancements
  • t:bug - Bug fixes
  • t:tech-debt - Technical debt, refactoring, cleanup
  • t:docs - Documentation changes
  • t:ci - CI/CD changes
  • t:deps - Dependency updates
  • t:breaking-change - Breaking changes
  • t:misc - Miscellaneous (catch-all)

⏰ Reminders before review

  • Contributor guidelines followed
  • All formatters and local linters executed and passed
  • Written new unit and / or integration tests where applicable
  • Protected functional changes with optionality (feature flags)
  • Used internationalization (i18n) for all UI strings
  • CI builds passed
  • Communicated to DevOps any deployment requirements
  • Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

  • 👍 (:+1:) or similar for great changes
  • 📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
  • ❓ (:question:) for questions
  • 🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
  • 🎨 (:art:) for suggestions / improvements
  • ❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
  • 🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
  • ⛏ (:pick:) for minor or nitpick changes

@vvolkgang vvolkgang changed the title from "[PM-14880] ci: Add automated PR labeling based on file paths and title patterns" to "[PM-14880] ci: Add automated PR labelling based on file paths and title patterns" Nov 11, 2025
claude bot commented Nov 11, 2025

Claude finished @vvolkgang's task.

Code Review Complete

Overall Assessment: APPROVE

This PR introduces a well-structured automated PR labeling system for CI/CD. The implementation is clean, uses appropriate tooling (Python + GitHub CLI), and includes sensible configuration management. The code quality is solid for an infrastructure change.

Review Findings:

I've identified a few areas for improvement in inline comments:

Warnings (⚠️):

  • Finding 1: Documentation states --add is default, but workflow uses --replace for normal PRs - clarify intended behavior
  • Finding 2: Silent failures in gh_get_changed_files() and gh_get_pr_title() return empty values instead of failing explicitly

Improvements (🎨):

  • Finding 3: Title pattern matching could produce false positives (e.g., "defeat:" matching "feat" pattern)

Questions (💭):

  • Finding 4: parse_known_args() is a workaround for shell quoting - consider fixing at workflow level
  • Finding 5: gradle/ appears in both app:shared and t:deps patterns - confirm if intentional
  • Finding 6: Workflow only supports manual dispatch - automated triggers noted as future work

Previous Review Status:
Multiple previous Claude reviews flagged similar issues. Most remain unaddressed but are minor quality improvements rather than blockers. The human approval from @fedemkr suggests these are acceptable for the current testing phase.

PR Description Quality:
Excellent - clear objective, implementation details, label definitions, and explicit note about future work (automated triggers). Well-structured for reviewers.

See inline comments for detailed recommendations.


github-actions bot commented Nov 11, 2025

Checkmarx One – Scan Summary & Details 66f71a75-9d7e-449f-8e2d-f90b52e31533

Great job! No new security vulnerabilities introduced in this pull request

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.97%. Comparing base (21afa81) to head (51677ba).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6157   +/-   ##
=======================================
  Coverage   84.97%   84.97%           
=======================================
  Files         735      735           
  Lines       53087    53087           
  Branches     7678     7678           
=======================================
  Hits        45113    45113           
  Misses       5291     5291           
  Partials     2683     2683           

LABEL_PATH_PATTERNS = config["path_patterns"]

pr_number = args.pr_number
mode = "replace" if args.replace else "add"

⚠️ Default mode differs between the script and the workflow

Details

The implementation defaults to "add" when neither flag is provided:

mode = "replace" if args.replace else "add"

This matches the documentation at lines 11-12:

-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labels

However, the workflow at .github/workflows/sdlc-label-pr.yml:68-69 uses --replace as the default for normal PRs, which creates confusion about which default is intended.

Recommendation:
If replace is the intended default, update the documentation to say so:

-a, --add: Add labels without removing existing ones
-r, --replace: Replace all existing labels (default for normal PRs)

and update the DEFAULT_MODE constant at line 30 to match. Otherwise, keep the script as is and note in the documentation that the workflow passes --replace explicitly for normal PRs.
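
One way to keep the constant, the help text and the runtime behaviour from drifting apart is to derive all three from `DEFAULT_MODE`. The sketch below is an assumption about how that could look, not the script's actual parser; `build_parser` is a hypothetical helper, and only the flag names and help strings come from the documentation quoted above.

```python
import argparse

DEFAULT_MODE = "add"  # single source of truth for the default mode


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Label a PR (illustrative sketch)")
    parser.add_argument("pr_number", help="Pull request number")
    # --add and --replace write to the same destination and cannot be combined.
    group = parser.add_mutually_exclusive_group()
    for flags, mode, text in (
        (("-a", "--add"), "add", "Add labels without removing existing ones"),
        (("-r", "--replace"), "replace", "Replace all existing labels"),
    ):
        # The "(default)" marker follows DEFAULT_MODE, so the help cannot go stale.
        suffix = " (default)" if mode == DEFAULT_MODE else ""
        group.add_argument(*flags, dest="mode", action="store_const",
                           const=mode, default=DEFAULT_MODE, help=text + suffix)
    return parser


print(build_parser().parse_args(["1234"]).mode)
print(build_parser().parse_args(["1234", "--replace"]).mode)
```

With this shape, changing the default is a one-line edit, and the workflow can still pass `--replace` explicitly for normal PRs.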

"network/",
"ui/",
"authenticatorbridge/",
"gradle/"

💭 Pattern overlap: gradle/ appears in both app:shared and t:deps

Details

The gradle/ directory pattern is defined in:

  • app:shared (line 22): Triggers app:password-manager + app:authenticator labels
  • t:deps (line 43): Triggers t:deps label

When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.

This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.
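
To make the overlap concrete, here is a minimal sketch of path-based labelling in which `app:shared` expands to both app labels. The prefix matching, the `label_filepaths` helper and the reduced pattern table are assumptions for illustration; the real script's matching rules may differ.

```python
SHARED_LABEL = "app:shared"
SHARED_EXPANSION = ["app:password-manager", "app:authenticator"]

# Reduced, hypothetical pattern table containing only the entries discussed above.
path_patterns = {
    "app:shared": ["network/", "ui/", "authenticatorbridge/", "gradle/"],
    "t:deps": ["gradle/"],
}


def label_filepaths(changed_files, patterns):
    """Return the sorted labels whose path prefixes match any changed file."""
    labels = set()
    for label, prefixes in patterns.items():
        if any(path.startswith(prefix) for path in changed_files for prefix in prefixes):
            if label == SHARED_LABEL:
                labels.update(SHARED_EXPANSION)  # shared code touches both apps
            else:
                labels.add(label)
    return sorted(labels)


labels = label_filepaths(["gradle/libs.versions.toml"], path_patterns)
print(labels)  # ['app:authenticator', 'app:password-manager', 't:deps']
```

Removing `"gradle/"` from the `app:shared` list in this sketch leaves only `t:deps` for the same input.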

    )
    changed_files = result.stdout.strip().split("\n")
    return list(filter(None, changed_files))
except subprocess.CalledProcessError as e:

⚠️ Silent failure: Returns empty list on error without clear indication

Details

When gh pr diff fails, the function prints an error message but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.

The caller at line 212 doesn't distinguish between:

  1. API failure (should potentially fail the workflow)
  2. Legitimately no files changed (should continue with title-based labeling)

Current behavior:

changed_files = gh_get_changed_files(pr_number)  # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n")  # Prints empty list

Recommendation:
Either:

  1. Exit the script on critical failures:
def gh_get_changed_files(pr_number: str) -> list[str]:
    try:
        # ... existing code ...
    except subprocess.CalledProcessError as e:
        print(f"::error::Error getting changed files: {e}")
        sys.exit(1)  # Critical failure
  2. Or add explicit validation after the call:
changed_files = gh_get_changed_files(pr_number)
if changed_files is None:  # Return None on error instead of []
    print("::error::Failed to retrieve changed files")
    sys.exit(1)

Same applies to gh_get_pr_title() at line 93.
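
The difference between "no files changed" and "the call failed" can be exercised without the GitHub CLI by injecting the process runner. This is a sketch under assumptions: the `run` parameter and the fake runners are hypothetical test aids, and the `--name-only` flag is a guess at how the script invokes `gh pr diff`.

```python
import subprocess
from typing import Callable, Optional


def gh_get_changed_files(pr_number: str, run: Callable = subprocess.run) -> Optional[list[str]]:
    """Return changed paths, [] for a PR with no files, or None if gh failed."""
    try:
        result = run(
            ["gh", "pr", "diff", pr_number, "--name-only"],  # flag assumed for this sketch
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError as e:
        print(f"::error::Error getting changed files: {e}")
        return None  # distinct from [] so the caller can tell failure from "no files"
    return [line for line in result.stdout.strip().split("\n") if line]


# Fake runners standing in for subprocess.run, so the sketch runs without gh installed.
def fake_ok(cmd, **kwargs):
    return subprocess.CompletedProcess(cmd, 0, stdout="app/Main.kt\nREADME.md\n", stderr="")


def fake_empty(cmd, **kwargs):
    return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")


def fake_fail(cmd, **kwargs):
    raise subprocess.CalledProcessError(1, cmd, stderr="HTTP 502")


for runner in (fake_ok, fake_empty, fake_fail):
    print(runner.__name__, "->", gh_get_changed_files("1234", run=runner))
```

A caller can then exit with a non-zero status when the result is `None` and continue with title-based labelling when it is an empty list.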

GH_TOKEN: ${{ github.token }}
_PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
_LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }}
_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}

⚠️ Empty string passed as positional argument when dry-run is false

Details

When inputs.dry-run is false, the _DRY_RUN variable is set to an empty string '':

_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}

This empty string is then passed as a positional argument to the Python script:

python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"

When _DRY_RUN="", the shell expands this to:

python3 .github/scripts/label-pr.py "1234" "--replace" ""

The Python script receives an empty string as the third argument, which argparse treats as an unknown argument. The script handles this with parse_known_args() at line 191, but this is a workaround for a shell quoting issue.

Recommendation:
Only pass _DRY_RUN if it's not empty:

- name: Label PR based on changed files
  env:
    GH_TOKEN: ${{ github.token }}
    _PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
    _LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }}
    _DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}
  run: |
    if [ -n "$_DRY_RUN" ]; then
      python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
    else
      python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
    fi

This eliminates the need for the parse_known_args() workaround.
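
The argparse behaviour described above can be reproduced in isolation. The parser below only approximates the script's arguments (the exact argument set is an assumption), and filtering empty strings before strict parsing is an alternative added here for illustration, not something proposed in the PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("pr_number")
parser.add_argument("-r", "--replace", action="store_true")
parser.add_argument("--dry-run", action="store_true")

# What the workflow effectively passes when dry-run is false.
argv = ["1234", "--replace", ""]

# parse_known_args() tolerates the stray "" by returning it as an unknown argument.
args, unknown = parser.parse_known_args(argv)
print(args.pr_number, args.replace, unknown)

# Alternative sketch: drop empty strings, then parse strictly so that genuine
# typos in flags still raise an error instead of being silently ignored.
strict_args = parser.parse_args([arg for arg in argv if arg])
print(strict_args.dry_run)
```

Strict parsing is the main benefit of removing the empty argument: with `parse_known_args()`, a misspelled flag such as `--dryrun` would also be ignored silently.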

title_labels = label_title(pr_title, LABEL_TITLE_PATTERNS)
all_labels = set(filepath_labels + title_labels)

if not any(label.startswith("t:") for label in all_labels):

💭 Catch-all label logic may produce unexpected results

Details

The catch-all label (t:misc) is added only when there are no t: prefixed labels:

if not any(label.startswith("t:") for label in all_labels):
    all_labels.add(CATCH_ALL_LABEL)

Scenario: A PR that:

  • Modifies app/MainActivity.kt → gets app:password-manager
  • Has title "Update MainActivity layout" → no title pattern match

Result: Only app:password-manager label, no type label. Then t:misc is added.

Question: Should the catch-all only apply when both filepath and title labeling fail to produce type labels, or is the current behavior (adding t:misc whenever no type label exists) correct?

Consider if this edge case needs explicit documentation or if the logic should check:

# Only add catch-all if BOTH filepath and title labeling found no type labels
if not title_labels and not any(label.startswith("t:") for label in filepath_labels):
    all_labels.add(CATCH_ALL_LABEL)

Current behavior seems reasonable, but worth confirming intent.
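
A small, self-contained sketch of the current fallback rule makes the scenario easy to check. The `with_catch_all` helper is hypothetical and simply wraps the condition quoted above; the label values come from the PR description.

```python
CATCH_ALL_LABEL = "t:misc"


def with_catch_all(filepath_labels, title_labels):
    """Merge both label sources and add the catch-all if no type label is present."""
    all_labels = set(filepath_labels + title_labels)
    if not any(label.startswith("t:") for label in all_labels):
        all_labels.add(CATCH_ALL_LABEL)
    return sorted(all_labels)


# App label only, no title match: the catch-all is added.
print(with_catch_all(["app:password-manager"], []))
# A type label from the title suppresses the catch-all.
print(with_catch_all(["app:password-manager"], ["t:bug"]))
# A type label from file paths also suppresses it.
print(with_catch_all(["t:ci"], []))
```

As long as title patterns only ever yield `t:` labels, the alternative condition shown in the comment gives the same results as this one.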

for label, patterns in title_patterns.items():
    for pattern in patterns:
        # Check for pattern with : or ( suffix (conventional commits format)
        if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

🎨 Title pattern matching could be more precise

Details

Current logic checks if pattern appears anywhere in the title with : or ( suffix:

if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

This could produce false positives. Examples:

  • Pattern "fix" matches title "Add prefix: bug" (contains "fix:" in "prefix:")
  • Pattern "feat" matches title "defeat: the enemy" (contains "feat:")

Recommendation:
Use more precise matching to ensure pattern is at start of title or after whitespace:

# Check for pattern at beginning or after whitespace
import re
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
    print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
    labels_to_apply.add(label)
    break

Or simpler string-based approach:

# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
   f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:

Current implementation works for well-formatted conventional commits, but could be more robust.
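
The two matching strategies can be compared side by side on the example titles. The helper names `matches_naive` and `matches_anchored` are hypothetical; the expressions inside them are the ones quoted in this comment.

```python
import re


def matches_naive(title: str, pattern: str) -> bool:
    """Substring check, as in the current implementation."""
    title_lower = title.lower()
    return f"{pattern}:" in title_lower or f"{pattern}(" in title_lower


def matches_anchored(title: str, pattern: str) -> bool:
    """Require the pattern at the start of the title or right after whitespace."""
    regex = rf"(^|\s){re.escape(pattern)}[:(]"
    return re.search(regex, title.lower()) is not None


cases = [
    ("defeat: the enemy", "feat"),
    ("Add prefix: bug", "fix"),
    ("feat(auth): add login", "feat"),
    ("[PM-14880] ci: Add automated PR labelling", "ci"),
]
for title, pattern in cases:
    print(f"{title!r} / {pattern!r}: naive={matches_naive(title, pattern)}, "
          f"anchored={matches_anchored(title, pattern)}")
```

The last case matters for this repository's title convention: a ticket prefix such as `[PM-14880]` precedes the type, so anchoring only to the start of the title would miss it, while allowing a preceding space still matches.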

name: SDLC / Label PR by Files

on:
  workflow_dispatch:

💭 Workflow only supports manual dispatch - PR events not yet configured

Details

The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:

pull_request workflow target will be added in a future PR after testing with workflow_dispatch.

This is fine for initial testing, but worth noting that automated labeling won't occur until the trigger is updated to something like:

on:
  pull_request:
    types: [opened, synchronize, reopened]
  pull_request_target:  # For fork PRs
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    # ... keep existing inputs for manual testing

Security consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running untrusted code, or limit to trusted users only.

Current implementation is safe for testing phase. Just flagging for future work.

import subprocess
import sys

DEFAULT_MODE = "add"

⚠️ Default mode differs between the script and the workflow.

Details

The constant DEFAULT_MODE = "add" and documentation at lines 11-12 state that --add is the default:

-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labels

The implementation at line 202 agrees: it uses "replace" only when --replace is provided, so when neither flag is given the mode is "add":

mode = "replace" if args.replace else "add"

However, the workflow at .github/workflows/sdlc-label-pr.yml:68-69 uses --replace as the default for normal PRs, creating confusion about intended behavior.

Recommendation: Clarify whether the default should be "add" or "replace" and update either the documentation or the workflow accordingly for consistency.

    )
    changed_files = result.stdout.strip().split("\n")
    return list(filter(None, changed_files))
except subprocess.CalledProcessError as e:

⚠️ Silent failure returns empty list on error, masking API failures.

Details

When gh pr diff fails, the function prints an error but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.

The caller at line 212 cannot distinguish between:

  1. API failure (should potentially fail the workflow)
  2. Legitimately no files changed (should continue with title-based labeling)

Current behavior:

changed_files = gh_get_changed_files(pr_number)  # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n")  # Prints empty list

Recommendation: Either exit on critical failures or return None on error and handle it explicitly:

def gh_get_changed_files(pr_number: str) -> list[str] | None:
    try:
        # ... existing code ...
    except subprocess.CalledProcessError as e:
        print(f"::error::Error getting changed files: {e}")
        return None  # Signal error explicitly

# At call site:
changed_files = gh_get_changed_files(pr_number)
if changed_files is None:
    print("::error::Failed to retrieve changed files")
    sys.exit(1)

Same issue exists in gh_get_pr_title() at line 92.

for label, patterns in title_patterns.items():
    for pattern in patterns:
        # Check for pattern with : or ( suffix (conventional commits format)
        if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

🎨 Title pattern matching could produce false positives.

Details

Current logic checks if pattern appears anywhere in the title:

if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

This could match incorrectly. Examples:

  • Pattern "fix" matches title "Add prefix: bug" (contains "fix:" within "prefix:")
  • Pattern "feat" matches title "defeat: the enemy" (contains "feat:" within "defeat:")

Recommendation: Use more precise matching to ensure pattern is at word boundary:

import re

# At function level, check pattern at start or after whitespace
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
    print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
    labels_to_apply.add(label)
    break

Or simpler string-based approach:

# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
   f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:

Current implementation works for well-formatted conventional commits, but could be more robust.

default=DEFAULT_CONFIG_PATH,
help=f"Path to JSON config file (default: {DEFAULT_CONFIG_PATH})"
)
args, unknown = parser.parse_known_args() # required to handle --dry-run passed as an empty string ("") by the workflow

💭 parse_known_args() is a workaround for shell quoting issue in workflow.

Details

The comment states this is "required to handle --dry-run passed as an empty string".

This happens because the workflow at .github/workflows/sdlc-label-pr.yml:76-79 passes $_DRY_RUN which can be an empty string when dry-run is false:

_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}

When _DRY_RUN="", the shell command becomes:

python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" ""

The empty string is passed as a positional argument, which argparse treats as unknown.

Recommendation: Fix at the workflow level instead of working around in Python:

run: |
  if [ -n "$_DRY_RUN" ]; then
    python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
  else
    python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
  fi

This eliminates the need for parse_known_args() and makes the interface cleaner.

"network/",
"ui/",
"authenticatorbridge/",
"gradle/"

💭 Pattern overlap: gradle/ appears in both app:shared and t:deps.

Details

The gradle/ directory pattern is defined in:

  • app:shared (line 22): Triggers expansion to app:password-manager + app:authenticator labels
  • t:deps (line 43): Triggers t:deps label

When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.

This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.

@@ -0,0 +1,80 @@
name: SDLC / Label PR by Files

on:

💭 Workflow only supports manual dispatch - automated triggers not yet configured.

Details

The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:

pull_request workflow target will be added in a future PR after testing with workflow_dispatch.

This is appropriate for initial testing. When adding automated triggers, consider:

on:
  pull_request:
    types: [opened, synchronize, reopened]
  pull_request_target:  # For fork PRs with write permissions
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    # ... keep existing inputs for manual testing

Security consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running any code from the PR, or validate that the Python script execution is safe for untrusted PRs.

Current implementation is safe for testing phase.

@vvolkgang vvolkgang added this pull request to the merge queue Nov 17, 2025
Merged via the queue into main with commit 4623a4f Nov 17, 2025
36 of 48 checks passed
@vvolkgang vvolkgang deleted the vvolkgang/label-prs branch November 17, 2025 20:58