[PM-14880] ci: Add automated PR labelling based on file paths and title patterns #6157

vvolkgang · 2025-11-11T20:23:49Z

🎟️ Tracking

PM-14880

📔 Objective

Automate pull request labelling based on changed file paths and PR title patterns following conventional commits format, as a first step towards release notes with automated ✨ categories ✨ !

JSON configuration: Label patterns are defined in .github/label-pr.json. We initially planned to use YAML, but that would require an external dependency, and with it a Python environment and dependency management. Python supports JSON deserialization out of the box. File-based config will also let us centralise these scripts in our gh-actions repo later down the line.
Labeling modes (add or replace): To keep labels up to date in our PRs, we replace them whenever they're updated. The exceptions are Community PRs and our automated PRs, where we add instead, preserving labels set by other automations.
Raw Python script without additional dependencies: faster workflows and easier local testing.
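As a rough illustration of the dependency-free JSON approach described above, the sketch below loads and validates a config using only the standard library. The `path_patterns` key appears in the script excerpt quoted in the review comments; the `title_patterns` key, the `load_config` and `write_sample_config` helpers, and all sample values are assumptions made for illustration, not the actual contents of .github/label-pr.json.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample config; the real .github/label-pr.json may differ.
SAMPLE_CONFIG = {
    "path_patterns": {"t:ci": [".github/"], "t:docs": ["docs/"]},
    "title_patterns": {"t:bug": ["fix"], "t:new-feature": ["feat"]},
}

# Assumed section names; only "path_patterns" is confirmed by the thread.
REQUIRED_KEYS = ("path_patterns", "title_patterns")


def load_config(path: str) -> dict:
    """Load the label config and check that the expected sections exist."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Config is missing required keys: {missing}")
    return config


def write_sample_config(directory: str) -> str:
    """Write SAMPLE_CONFIG to a file inside `directory` and return its path."""
    path = Path(directory) / "label-pr.json"
    path.write_text(json.dumps(SAMPLE_CONFIG), encoding="utf-8")
    return str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(load_config(write_sample_config(tmp))["path_patterns"])
```

Validating the keys at load time keeps a malformed config from surfacing later as a `KeyError` in the middle of a workflow run.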

Note

pull_request workflow target will be added in a future PR after testing with workflow_dispatch.
Catch-all label is set to t:misc after much deliberation, because (1) it is concise and (2) it is only a fallback for unhandled exceptions, which we should monitor and address; we don't expect to use it much.

Labels

App Labels (from file paths):

app:password-manager - Password manager app changes
app:authenticator - Authenticator app changes

Change Type Labels:

t:new-feature - New product features
t:enhancement - Existing product feature enhancements
t:bug - Bug fixes
t:tech-debt - Technical debt, refactoring, cleanup
t:docs - Documentation changes
t:ci - CI/CD changes
t:deps - Dependency updates
t:breaking-change - Breaking changes
t:misc - Miscellaneous (catch-all)

⏰ Reminders before review

Contributor guidelines followed
All formatters and local linters executed and passed
Written new unit and / or integration tests where applicable
Protected functional changes with optionality (feature flags)
Used internationalization (i18n) for all UI strings
CI builds passed
Communicated to DevOps any deployment requirements
Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

👍 (:+1:) or similar for great changes
📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
❓ (:question:) for questions
🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
🎨 (:art:) for suggestions / improvements
❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
⛏ (:pick:) for minor or nitpick changes

claude · 2025-11-11T20:26:12Z

Claude finished @vvolkgang's task

Code Review Complete

Overall Assessment: APPROVE

This PR introduces a well-structured automated PR labeling system for CI/CD. The implementation is clean, uses appropriate tooling (Python + GitHub CLI), and includes sensible configuration management. The code quality is solid for an infrastructure change.

Review Findings:

I've identified a few areas for improvement in inline comments:

Warnings (⚠️):

Finding 1: Documentation states --add is default, but workflow uses --replace for normal PRs - clarify intended behavior
Finding 2: Silent failures in gh_get_changed_files() and gh_get_pr_title() return empty values instead of failing explicitly

Improvements (🎨):

Finding 3: Title pattern matching could produce false positives (e.g., "defeat:" matching "feat" pattern)

Questions (💭):

Finding 4: parse_known_args() is a workaround for shell quoting - consider fixing at workflow level
Finding 5: gradle/ appears in both app:shared and t:deps patterns - confirm if intentional
Finding 6: Workflow only supports manual dispatch - automated triggers noted as future work

Previous Review Status:
Multiple previous Claude reviews flagged similar issues. Most remain unaddressed but are minor quality improvements rather than blockers. The human approval from @fedemkr suggests these are acceptable for the current testing phase.

PR Description Quality:
Excellent - clear objective, implementation details, label definitions, and explicit note about future work (automated triggers). Well-structured for reviewers.

See inline comments for detailed recommendations.

.github/scripts/label-pr.py

github-actions · 2025-11-11T20:32:44Z

Checkmarx One – Scan Summary & Details – 66f71a75-9d7e-449f-8e2d-f90b52e31533

Great job! No new security vulnerabilities introduced in this pull request

codecov · 2025-11-11T20:56:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.97%. Comparing base (21afa81) to head (51677ba).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #6157   +/-   ##
=======================================
  Coverage   84.97%   84.97%           
=======================================
  Files         735      735           
  Lines       53087    53087           
  Branches     7678     7678           
=======================================
  Hits        45113    45113           
  Misses       5291     5291           
  Partials     2683     2683

claude · 2025-11-12T23:09:13Z

.github/scripts/label-pr.py

+    LABEL_PATH_PATTERNS = config["path_patterns"]
+
+    pr_number = args.pr_number
+    mode = "replace" if args.replace else "add"


⚠️ Mode default differs between the script and the workflow

Details

The implementation defaults to "add" when neither flag is provided:

mode = "replace" if args.replace else "add"

This matches the documentation at lines 11-12, which states:

-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labels

However, the workflow at .github/workflows/sdlc-label-pr.yml:68-69 passes --replace for normal PRs, so the effective default there is replace. This creates confusion about which default is intended.

Recommendation:
Update documentation to reflect actual behavior:

-a, --add: Add labels without removing existing ones
-r, --replace: Replace all existing labels (default for normal PRs)

Or update DEFAULT_MODE constant at line 30 to match documentation.
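To make the default-mode behaviour concrete, here is a minimal sketch of how the quoted expression resolves the mode. The parser layout (a mutually exclusive `--add` / `--replace` pair) and the `build_parser` and `resolve_mode` helpers are assumptions for illustration; only the `mode = "replace" if args.replace else "add"` expression and the flag names come from the review.

```python
import argparse

DEFAULT_MODE = "add"  # assumed to mirror the constant mentioned in the review


def build_parser() -> argparse.ArgumentParser:
    """Build a simplified, hypothetical version of the script's parser."""
    parser = argparse.ArgumentParser(description="Label a PR (sketch).")
    parser.add_argument("pr_number")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-a", "--add", action="store_true")
    group.add_argument("-r", "--replace", action="store_true")
    return parser


def resolve_mode(argv: list[str]) -> str:
    """Return "replace" only when --replace is passed, else the default."""
    args = build_parser().parse_args(argv)
    return "replace" if args.replace else DEFAULT_MODE


if __name__ == "__main__":
    print(resolve_mode(["1234"]))               # no flag: script default
    print(resolve_mode(["1234", "--replace"]))  # what the workflow passes
```

Under these assumptions the script default is add, while a workflow that always passes `--replace` for normal PRs makes replace the effective default there.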

claude · 2025-11-12T23:09:29Z

.github/label-pr.json

+      "network/",
+      "ui/",
+      "authenticatorbridge/",
+      "gradle/"


💭 Pattern overlap: gradle/ appears in both app:shared and t:deps

Details

The gradle/ directory pattern is defined in:

app:shared (line 22): Triggers app:password-manager + app:authenticator labels

t:deps (line 43): Triggers t:deps label

When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.

This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.
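The overlap can be reproduced with a small sketch. The pattern lists below echo the entries quoted in this comment, but the prefix-matching rule, the `label_paths` helper, and the example file name `gradle/libs.versions.toml` are assumptions; the actual script may match paths differently.

```python
# Hypothetical excerpt of the path patterns; only the directory names and
# label names come from the review comment.
PATH_PATTERNS = {
    "app:shared": ["network/", "ui/", "authenticatorbridge/", "gradle/"],
    "t:deps": ["gradle/"],
}

# app:shared is described as expanding to both app labels.
SHARED_EXPANSION = ["app:password-manager", "app:authenticator"]


def label_paths(changed_files: list[str]) -> set[str]:
    """Return labels for the changed files, assuming prefix matching."""
    labels: set[str] = set()
    for label, prefixes in PATH_PATTERNS.items():
        if any(path.startswith(tuple(prefixes)) for path in changed_files):
            if label == "app:shared":
                labels.update(SHARED_EXPANSION)
            else:
                labels.add(label)
    return labels


if __name__ == "__main__":
    # A gradle-only change picks up both app labels plus t:deps.
    print(sorted(label_paths(["gradle/libs.versions.toml"])))
```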

claude · 2025-11-12T23:09:31Z

.github/scripts/label-pr.py

+        )
+        changed_files = result.stdout.strip().split("\n")
+        return list(filter(None, changed_files))
+    except subprocess.CalledProcessError as e:


⚠️ Silent failure: Returns empty list on error without clear indication

Details

When gh pr diff fails, the function prints an error message but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.

The caller at line 212 doesn't distinguish between:

API failure (should potentially fail the workflow)

Legitimately no files changed (should continue with title-based labeling)

Current behavior:

changed_files = gh_get_changed_files(pr_number)  # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n")  # Prints empty list

Recommendation:
Either:

Exit the script on critical failures:

def gh_get_changed_files(pr_number: str) -> list[str]:
    try:
        # ... existing code ...
    except subprocess.CalledProcessError as e:
        print(f"::error::Error getting changed files: {e}")
        sys.exit(1)  # Critical failure

Or add explicit validation after the call:

changed_files = gh_get_changed_files(pr_number)
if changed_files is None:  # Return None on error instead of []
    print("::error::Failed to retrieve changed files")
    sys.exit(1)

Same applies to gh_get_pr_title() at line 93.
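A self-contained version of the first option might look like the sketch below. It assumes the script shells out to `gh pr diff <number> --name-only`; the injectable `run` parameter is not part of the reviewed script and is added here only so the behaviour can be exercised without the GitHub CLI installed.

```python
import subprocess
import sys


def gh_get_changed_files(pr_number: str, run=subprocess.run) -> list[str]:
    """Return changed file paths, exiting instead of returning [] on failure.

    `run` is a hypothetical injection point used for testing this sketch.
    """
    try:
        result = run(
            ["gh", "pr", "diff", pr_number, "--name-only"],
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError as error:
        # Workflow command syntax surfaces this as an error annotation.
        print(f"::error::Error getting changed files: {error}")
        sys.exit(1)
    # An empty list now only ever means "the PR really has no changed files".
    return [line for line in result.stdout.splitlines() if line]
```

With this shape, the caller can treat an empty list as a legitimate result and continue with title-based labelling, while a CLI failure stops the run.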

claude · 2025-11-12T23:09:50Z

.github/workflows/sdlc-label-pr.yml

+          GH_TOKEN: ${{ github.token }}
+          _PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
+          _LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }}
+          _DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}


⚠️ Empty string passed as positional argument when dry-run is false

Details

When inputs.dry-run is false, the _DRY_RUN variable is set to an empty string '':

_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}

This empty string is then passed as a positional argument to the Python script:

python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"

When _DRY_RUN="", the shell expands this to:

python3 .github/scripts/label-pr.py "1234" "--replace" ""

The Python script receives an empty string as the third argument, which argparse treats as an unknown argument. The script handles this with parse_known_args() at line 191, but this is a workaround for a shell quoting issue.

Recommendation:
Only pass _DRY_RUN if it's not empty:

- name: Label PR based on changed files
  env:
    GH_TOKEN: ${{ github.token }}
    _PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
    _LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }}
    _DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}
  run: |
    if [ -n "$_DRY_RUN" ]; then
      python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
    else
      python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
    fi

This eliminates the need for the parse_known_args() workaround.
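If changing the workflow is not desirable, a possible script-side alternative is to drop empty arguments before strict parsing. The sketch below uses a simplified, hypothetical parser (`build_parser`) and a hypothetical `drop_empty_args` helper; neither is taken from the reviewed script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a simplified, hypothetical version of the script's parser."""
    parser = argparse.ArgumentParser(description="Label a PR (sketch).")
    parser.add_argument("pr_number")
    parser.add_argument("-r", "--replace", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def drop_empty_args(argv: list[str]) -> list[str]:
    """Remove empty strings, such as a quoted but unset "$_DRY_RUN"."""
    return [arg for arg in argv if arg != ""]


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse strictly, so unrecognized arguments are reported as errors."""
    return build_parser().parse_args(drop_empty_args(argv))


if __name__ == "__main__":
    # parse_known_args() tolerates the empty string and returns it as unknown.
    _, unknown = build_parser().parse_known_args(["1234", "--replace", ""])
    print(unknown)
    # Filtering first lets the strict parse_args() succeed on the same input.
    print(parse(["1234", "--replace", ""]))
```

The trade-off is that `parse_known_args()` also ignores genuinely mistyped arguments, whereas filtering only the empty string keeps strict validation for everything else.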

claude · 2025-11-12T23:09:52Z

.github/scripts/label-pr.py

+    title_labels = label_title(pr_title, LABEL_TITLE_PATTERNS)
+    all_labels = set(filepath_labels + title_labels)
+
+    if not any(label.startswith("t:") for label in all_labels):


💭 Catch-all label logic may produce unexpected results

Details

The catch-all label (t:misc) is added only when there are no t: prefixed labels:

if not any(label.startswith("t:") for label in all_labels):
    all_labels.add(CATCH_ALL_LABEL)

Scenario: A PR that:

Modifies app/MainActivity.kt → gets app:password-manager

Has title "Update MainActivity layout" → no title pattern match

Result: Only app:password-manager label, no type label. Then t:misc is added.

Question: Should the catch-all only apply when both filepath and title labeling fail to produce type labels, or is the current behavior (adding t:misc whenever no type label exists) correct?

Consider if this edge case needs explicit documentation or if the logic should check:

# Only add catch-all if BOTH filepath and title labeling found no type labels
if not title_labels and not any(label.startswith("t:") for label in filepath_labels):
    all_labels.add(CATCH_ALL_LABEL)

Current behavior seems reasonable, but worth confirming intent.
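The scenario above can be checked with a small sketch of the current logic. The `with_catch_all` wrapper is a hypothetical name; the condition inside it and the `t:misc` value follow the excerpt and the PR description.

```python
CATCH_ALL_LABEL = "t:misc"


def with_catch_all(filepath_labels: list[str], title_labels: list[str]) -> set[str]:
    """Combine labels and add the catch-all when no t: label is present."""
    all_labels = set(filepath_labels + title_labels)
    if not any(label.startswith("t:") for label in all_labels):
        all_labels.add(CATCH_ALL_LABEL)
    return all_labels


if __name__ == "__main__":
    # App label from the file path, no title match: t:misc is added.
    print(sorted(with_catch_all(["app:password-manager"], [])))
    # A type label from the title suppresses the catch-all.
    print(sorted(with_catch_all(["app:password-manager"], ["t:bug"])))
```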

claude · 2025-11-12T23:10:11Z

.github/scripts/label-pr.py

+    for label, patterns in title_patterns.items():
+        for pattern in patterns:
+            # Check for pattern with : or ( suffix (conventional commits format)
+            if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:


🎨 Title pattern matching could be more precise

Details

Current logic checks if pattern appears anywhere in the title with : or ( suffix:

if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

This could produce false positives. Examples:

Pattern "fix" matches title "Add prefix: bug" (contains "fix:" in "prefix:")

Pattern "feat" matches title "defeat: the enemy" (contains "feat:")

Recommendation:
Use more precise matching to ensure pattern is at start of title or after whitespace:

# Check for pattern at beginning or after whitespace
import re
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
    print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
    labels_to_apply.add(label)
    break

Or simpler string-based approach:

# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
   f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:

Current implementation works for well-formatted conventional commits, but could be more robust.
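For a side-by-side comparison, the sketch below wraps the current substring check and the suggested regex in two hypothetical helpers, `matches_loose` and `matches_strict`, and runs them on the example titles from this comment.

```python
import re


def matches_loose(title: str, pattern: str) -> bool:
    """Substring check, as in the reviewed excerpt."""
    title_lower = title.lower()
    return f"{pattern}:" in title_lower or f"{pattern}(" in title_lower


def matches_strict(title: str, pattern: str) -> bool:
    """Require the pattern at the start of the title or after whitespace."""
    regex = rf"(^|\s){re.escape(pattern)}[:(]"
    return re.search(regex, title.lower()) is not None


if __name__ == "__main__":
    for title, pattern in [
        ("defeat: the enemy", "feat"),
        ("Add prefix: bug", "fix"),
        ("feat(ui): add button", "feat"),
    ]:
        print(title, matches_loose(title, pattern), matches_strict(title, pattern))
```

The strict variant still accepts titles with a ticket prefix such as "[PM-14880] ci: ...", because the type token there follows a space.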

claude · 2025-11-12T23:10:12Z

.github/workflows/sdlc-label-pr.yml

+name: SDLC / Label PR by Files
+
+on:
+  workflow_dispatch:


💭 Workflow only supports manual dispatch - PR events not yet configured

Details

The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:

pull_request workflow target will be added in a future PR after testing with workflow_dispatch.

This is fine for initial testing, but worth noting that automated labeling won't occur until the trigger is updated to something like:

on:
  pull_request:
    types: [opened, synchronize, reopened]
  pull_request_target:  # For fork PRs
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    # ... keep existing inputs for manual testing

Security consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running untrusted code, or limit to trusted users only.

Current implementation is safe for testing phase. Just flagging for future work.

claude · 2025-11-17T15:57:42Z

.github/scripts/label-pr.py

+import subprocess
+import sys
+
+DEFAULT_MODE = "add"


⚠️ Default mode is inconsistent between the script documentation and the workflow.

Details

The constant DEFAULT_MODE = "add" and the documentation at lines 11-12 state that --add is the default:

-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labels

The implementation at line 202 uses "replace" only when --replace is explicitly provided, and otherwise falls back to "add":

mode = "replace" if args.replace else "add"

So when neither flag is provided, the mode is "add", which matches the documentation. However, the workflow at .github/workflows/sdlc-label-pr.yml:68-69 uses --replace as the default for normal PRs, creating confusion about intended behavior.

Recommendation: Clarify whether the default should be "add" or "replace" and update either the documentation or the workflow accordingly for consistency.

claude · 2025-11-17T15:57:43Z

.github/scripts/label-pr.py

+        )
+        changed_files = result.stdout.strip().split("\n")
+        return list(filter(None, changed_files))
+    except subprocess.CalledProcessError as e:


⚠️ Silent failure returns empty list on error, masking API failures.

Details

When gh pr diff fails, the function prints an error but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.

The caller at line 212 cannot distinguish between:

API failure (should potentially fail the workflow)

Legitimately no files changed (should continue with title-based labeling)

Current behavior:

changed_files = gh_get_changed_files(pr_number)  # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n")  # Prints empty list

Recommendation: Either exit on critical failures or return None on error and handle it explicitly:

def gh_get_changed_files(pr_number: str) -> list[str] | None:
    try:
        # ... existing code ...
    except subprocess.CalledProcessError as e:
        print(f"::error::Error getting changed files: {e}")
        return None  # Signal error explicitly
# At call site:
changed_files = gh_get_changed_files(pr_number)
if changed_files is None:
    print("::error::Failed to retrieve changed files")
    sys.exit(1)

Same issue exists in gh_get_pr_title() at line 92.

claude · 2025-11-17T15:57:45Z

.github/scripts/label-pr.py

+    for label, patterns in title_patterns.items():
+        for pattern in patterns:
+            # Check for pattern with : or ( suffix (conventional commits format)
+            if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:


🎨 Title pattern matching could produce false positives.

Details

Current logic checks if pattern appears anywhere in the title:

if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:

This could match incorrectly. Examples:

Pattern "fix" matches title "Add prefix: bug" (contains "fix:" within "prefix:")

Pattern "feat" matches title "defeat: the enemy" (contains "feat:" within "defeat:")

Recommendation: Use more precise matching to ensure pattern is at word boundary:

import re
# At function level, check pattern at start or after whitespace
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
    print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
    labels_to_apply.add(label)
    break

Or simpler string-based approach:

# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
   f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:

Current implementation works for well-formatted conventional commits, but could be more robust.

claude · 2025-11-17T15:57:46Z

.github/scripts/label-pr.py

+        default=DEFAULT_CONFIG_PATH,
+        help=f"Path to JSON config file (default: {DEFAULT_CONFIG_PATH})"
+    )
+    args, unknown = parser.parse_known_args() # required to handle --dry-run passed as an empty string ("") by the workflow


💭 parse_known_args() is a workaround for shell quoting issue in workflow.

Details

The comment states this is "required to handle --dry-run passed as an empty string".

This happens because the workflow at .github/workflows/sdlc-label-pr.yml:76-79 passes $_DRY_RUN which can be an empty string when dry-run is false:

_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}

When _DRY_RUN="", the shell command becomes:

python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" ""

The empty string is passed as a positional argument, which argparse treats as unknown.

Recommendation: Fix at the workflow level instead of working around in Python:

run: |
  if [ -n "$_DRY_RUN" ]; then
    python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
  else
    python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
  fi

This eliminates the need for parse_known_args() and makes the interface cleaner.

claude · 2025-11-17T15:57:48Z

.github/label-pr.json

+      "network/",
+      "ui/",
+      "authenticatorbridge/",
+      "gradle/"


💭 Pattern overlap: gradle/ appears in both app:shared and t:deps.

Details

The gradle/ directory pattern is defined in:

app:shared (line 22): Triggers expansion to app:password-manager + app:authenticator labels

t:deps (line 43): Triggers t:deps label

When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.

This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.

claude · 2025-11-17T15:57:49Z

.github/workflows/sdlc-label-pr.yml

@@ -0,0 +1,80 @@
+name: SDLC / Label PR by Files
+
+on:


💭 Workflow only supports manual dispatch - automated triggers not yet configured.

Details

The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:

pull_request workflow target will be added in a future PR after testing with workflow_dispatch.

This is appropriate for initial testing. When adding automated triggers, consider:

on:
  pull_request:
    types: [opened, synchronize, reopened]
  pull_request_target:  # For fork PRs with write permissions
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    # ... keep existing inputs for manual testing

Security consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running any code from the PR, or validate that the Python script execution is safe for untrusted PRs.

Current implementation is safe for testing phase.

vvolkgang added 20 commits November 11, 2025 13:38

Remove unused .husky folder

c79e985

Label PR script and workflow - first draft

1e58893

invert loops

d204171

Add catch-all label

125a2f6

Re-inverted loops, it's less verbose

d3a3285

Implement PR Title labeling

025a994

Check for conventional commit format

8cfa32d

Add script docs and cleanup

dbccb55

Implement argparse, refactor arguments and add dry-run option

18a4838

Workflow - Add workflow_dispatch. Removing pull_request for now ahead of testing

104d510

Workflow - determine label mode for pull requests and use the new script arguments

b6aae4d

Address zizmor feedback

e2279b3

Workflow - support retrieving PR data on workflow_dispatch triggers

bd15be2

Fix script typo

9e3c584

Cleanup

470158c

json config draft

51c539e

Invert mode check to use the default mode

3578b08

Move arg parsing to it's own method and

352a2e4

Move config file validation to the file loading method

bcf6981

Add comment

d339d5d

vvolkgang requested review from a team, brian-livefront and david-livefront as code owners November 11, 2025 20:23

vvolkgang changed the title ~~[PM-14880] ci: Add automated PR labeling based on file paths and title patterns~~ [PM-14880] ci: Add automated PR labelling based on file paths and title patterns Nov 11, 2025

Add workflow logs

a3b7def

claude bot reviewed Nov 11, 2025

.github/scripts/label-pr.py (outdated)

claude bot reviewed Nov 11, 2025

.github/scripts/label-pr.py (outdated)

claude bot reviewed Nov 11, 2025

.github/scripts/label-pr.py (outdated)

vvolkgang added 8 commits November 11, 2025 21:47

Merge branch 'main' into vvolkgang/label-prs

043021f

Invert default mode check

99983ca

Return empty list / string on error

c6505e3

Improve error logs

625d9a7

Add new line to changed files

7b32be4

fix: Script error when $_DRY_RUN is empty

ee72ba2

Add minimum python version comment

701c404

Add enhancement label

1e39e26

vvolkgang force-pushed the vvolkgang/label-prs branch from 1ba0d46 to 1e39e26 Compare November 12, 2025 23:05

claude bot reviewed Nov 12, 2025

fedemkr approved these changes Nov 17, 2025

Merge branch 'main' into vvolkgang/label-prs

51677ba

claude bot reviewed Nov 17, 2025

vvolkgang added this pull request to the merge queue Nov 17, 2025

Merged via the queue into main with commit 4623a4f Nov 17, 2025
36 of 48 checks passed

vvolkgang deleted the vvolkgang/label-prs branch November 17, 2025 20:58