Pull request overview
This PR implements automatic commit message generation using Claude AI by adding Git hooks that integrate with Anthropic's API. The system generates Commitizen-style conventional commit messages based on staged changes and validates them using commitlint.
Key changes:
- Adds a `prepare-commit-msg` hook that calls the Claude API to generate conventional commit messages from git diffs
- Adds a `commit-msg` hook for validating commit messages against conventional commit standards
- Configures commitlint to enforce the `@commitlint/config-conventional` ruleset
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| commitlint.config.mjs | Configures commitlint with conventional commit format validation |
| .husky/prepare-commit-msg | Implements the main logic for generating AI-powered commit messages using Claude API |
| .husky/commit-msg | Adds validation hook to ensure commit messages follow conventional commit format |
| .env.example | Documents the required ANTHROPIC_API_KEY environment variable |

    MAX_CHARS=8000
    if [ ${#DIFF} -gt $MAX_CHARS ]; then
      DIFF="${DIFF:0:$MAX_CHARS}... [truncated]"

[nitpick] The diff truncation logic uses Bash-specific substring expansion (`${DIFF:0:$MAX_CHARS}`), which is not available in all POSIX-compliant shells; the length expansion `${#DIFF}` is POSIX, and the shebang does specify Bash. Additionally, the truncation is done on character count rather than byte count, so the size of the payload varies with the characters in the diff, and neither count maps directly onto the API's token limit.
While the shebang correctly specifies #!/bin/bash, consider documenting this dependency or using more portable POSIX syntax if cross-shell compatibility is desired. Alternatively, consider truncating based on line count or using a more sophisticated tokenization approach that respects code boundaries.

Suggested change:

    - MAX_CHARS=8000
    - if [ ${#DIFF} -gt $MAX_CHARS ]; then
    -   DIFF="${DIFF:0:$MAX_CHARS}... [truncated]"
    + # Note: This script requires Bash (see shebang above). To avoid issues with multi-byte characters and API token limits,
    + # we truncate the diff by byte count using head -c, which is POSIX-compliant and more predictable.
    + MAX_BYTES=8000
    + if [ "$(printf "%s" "$DIFF" | wc -c)" -gt "$MAX_BYTES" ]; then
    +   DIFF="$(printf "%s" "$DIFF" | head -c "$MAX_BYTES")... [truncated]"
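
The review comment also mentions truncating by line count as an alternative. A minimal sketch of that idea, assuming a hypothetical helper named `truncate_diff_lines` (not part of the PR) and a caller-chosen line limit; cutting on line boundaries means a cut can never land inside a multi-byte character:

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): keep at most $2 lines of the
# text in $1, so a cut never lands in the middle of a line or character.
truncate_diff_lines() {
  local text="$1" max_lines="$2" total
  # printf '%s\n' guarantees a trailing newline so wc -l counts every line.
  # The arithmetic expansion strips any padding some wc versions emit.
  total=$(( $(printf '%s\n' "$text" | wc -l) ))
  if [ "$total" -gt "$max_lines" ]; then
    printf '%s\n' "$text" | head -n "$max_lines"
    printf '... [truncated]\n'
  else
    printf '%s\n' "$text"
  fi
}
```

In the hook this could be called as `DIFF=$(truncate_diff_lines "$DIFF" 200)`, where 200 is an illustrative value rather than one taken from the PR.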

    # Get list of changed files for context
    FILES=$(git diff --cached --name-only)

    # Truncate very large diffs to avoid token limits

[nitpick] The MAX_CHARS value of 8000 is a magic number without explanation. Given that Claude API has specific token limits (and the request sets max_tokens to 512 for the response), it would be helpful to document why 8000 characters was chosen and how it relates to the model's input token limits.
Consider adding a comment:

    # Truncate very large diffs to avoid token limits
    # Claude models typically support ~200k tokens input; 8000 chars ≈ 2000 tokens
    MAX_CHARS=8000

Suggested change:

    - # Truncate very large diffs to avoid token limits
    + # Truncate very large diffs to avoid token limits
    + # Claude models typically support ~200k tokens input; 8000 chars ≈ 2000 tokens

Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

    FILES=$(git diff --cached --name-only)

    # Truncate very large diffs to avoid token limits
    MAX_CHARS=8000

The hardcoded MAX_CHARS value of 8000 could be made more maintainable by defining it as a configurable constant at the top of the script or documenting why this specific value was chosen. This would make it easier to adjust based on actual token limits and API constraints.
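
One way to make the limit configurable is to read it from the environment with a default. This is only a sketch: the variable name `COMMIT_MSG_MAX_CHARS` and the helper `resolve_max_chars` are hypothetical and do not appear in the PR.

```shell
#!/bin/bash
# COMMIT_MSG_MAX_CHARS is a hypothetical variable name, not defined by the PR.
# ${VAR:-default} falls back to 8000 when the variable is unset or empty.
resolve_max_chars() {
  local value="${COMMIT_MSG_MAX_CHARS:-8000}"
  # Reject non-numeric overrides so a later -gt comparison cannot fail.
  case "$value" in
    ''|*[!0-9]*) value=8000 ;;
  esac
  printf '%s\n' "$value"
}

MAX_CHARS="$(resolve_max_chars)"
```

A developer could then run `COMMIT_MSG_MAX_CHARS=4000 git commit` to lower the limit for a single commit without editing the hook.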

    if [ -n "$MESSAGE" ]; then
      # Write generated message, keep any existing content below
      echo "$MESSAGE" > "$COMMIT_MSG_FILE.tmp"
      echo "" >> "$COMMIT_MSG_FILE.tmp"
      echo "# --- Generated by Claude (Commitizen format) ---" >> "$COMMIT_MSG_FILE.tmp"
      echo "# Types: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert" >> "$COMMIT_MSG_FILE.tmp"
      cat "$COMMIT_MSG_FILE" >> "$COMMIT_MSG_FILE.tmp"
      mv "$COMMIT_MSG_FILE.tmp" "$COMMIT_MSG_FILE"
    fi
    \ No newline at end of file

This condition is redundant because MESSAGE is already checked for emptiness at line 107, and the script exits if MESSAGE is empty. This if statement will always evaluate to true at this point. Consider removing this redundant check or restructuring the logic if there's a case where MESSAGE could become empty between lines 107 and 117.

Suggested change:

    - if [ -n "$MESSAGE" ]; then
    -   # Write generated message, keep any existing content below
    -   echo "$MESSAGE" > "$COMMIT_MSG_FILE.tmp"
    -   echo "" >> "$COMMIT_MSG_FILE.tmp"
    -   echo "# --- Generated by Claude (Commitizen format) ---" >> "$COMMIT_MSG_FILE.tmp"
    -   echo "# Types: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert" >> "$COMMIT_MSG_FILE.tmp"
    -   cat "$COMMIT_MSG_FILE" >> "$COMMIT_MSG_FILE.tmp"
    -   mv "$COMMIT_MSG_FILE.tmp" "$COMMIT_MSG_FILE"
    - fi
    + # Write generated message, keep any existing content below
    + echo "$MESSAGE" > "$COMMIT_MSG_FILE.tmp"
    + echo "" >> "$COMMIT_MSG_FILE.tmp"
    + echo "# --- Generated by Claude (Commitizen format) ---" >> "$COMMIT_MSG_FILE.tmp"
    + echo "# Types: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert" >> "$COMMIT_MSG_FILE.tmp"
    + cat "$COMMIT_MSG_FILE" >> "$COMMIT_MSG_FILE.tmp"
    + mv "$COMMIT_MSG_FILE.tmp" "$COMMIT_MSG_FILE"

    # Check if extraction succeeded
    if [ -z "$MESSAGE" ]; then
      echo "Warning: Failed to generate commit message from Claude API" >&2

The error message "Failed to generate commit message from Claude API" doesn't provide sufficient context for debugging. Consider including more details such as the HTTP status code or a hint about checking the API key validity, especially since this could fail for multiple reasons (network issues, invalid API key, rate limiting, etc.).
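
A more informative warning could map the HTTP status to a hint. The sketch below assumes a hypothetical helper `describe_http_status`. The status itself could be captured with curl's existing `--write-out '%{http_code}'` option, which reports `000` when no HTTP response was received. The hint wording is illustrative.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): turn an HTTP status code into a hint.
# The code could be captured with curl's --write-out option, for example:
#   STATUS=$(curl -s -o "$BODY_FILE" -w '%{http_code}' ...)
# where BODY_FILE is an assumed temporary file holding the response body.
describe_http_status() {
  case "$1" in
    000) echo "no HTTP response (network error or timeout)" ;;
    401|403) echo "authentication failed; check ANTHROPIC_API_KEY" ;;
    429) echo "rate limited; retry later" ;;
    5??) echo "server error; retry later" ;;
    2??) echo "request succeeded but no message could be extracted" ;;
    *) echo "unexpected HTTP status" ;;
  esac
}
```

The existing warning could then append `(HTTP $STATUS: $(describe_http_status "$STATUS"))` so the developer sees why generation failed.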

    RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
      -H "Content-Type: application/json" \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -d "$(jq -n \
        --arg prompt "$PROMPT" \
        '{
          model: "claude-sonnet-4-5-20250929",
          max_tokens: 512,
          messages: [{role: "user", content: $prompt}]
        }')")

The curl command lacks timeout settings, which could cause the git commit process to hang indefinitely if the API is slow or unresponsive. Consider adding a timeout using the --max-time flag (e.g., --max-time 30) to ensure the hook doesn't block the commit process for too long.
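
A minimal sketch of the timeout suggested here, assuming two hypothetical wrappers, `call_api` and `request_or_skip`, that are not in the PR. `--max-time` and `--connect-timeout` are existing curl options, and curl exits with status 28 when a time limit is reached. The values 30 and 5 are illustrative.

```shell
#!/bin/bash
# Hypothetical wrapper (not in the PR): every call gets an overall time limit
# and a connection time limit. The values 30 and 5 are illustrative.
call_api() {
  curl --silent --max-time 30 --connect-timeout 5 "$@"
}

# Hypothetical helper: run the request but never fail the hook.
# On any curl error (28 means a time limit was hit) it warns and prints nothing.
request_or_skip() {
  local response status=0
  response=$(call_api "$@") || status=$?
  if [ "$status" -ne 0 ]; then
    echo "Warning: API request failed (curl exit $status); skipping generation" >&2
    return 0
  fi
  printf '%s\n' "$response"
}
```

Returning success on failure keeps the commit from being blocked: the developer simply writes the message by hand when the API is slow or unreachable.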

      exit 0
    fi

    # Get the staged diff

Consider adding a comment explaining why --no-color is used in the git diff command. While it's clear to experienced developers that this prevents ANSI color codes from being included in the API request, documenting this helps with maintainability.

Suggested change:

    - # Get the staged diff
    + # Get the staged diff
    + # Use --no-color to prevent ANSI color codes from being included in the diff output,
    + # which ensures clean input for API requests and downstream processing.

    # Truncate very large diffs to avoid token limits
    MAX_CHARS=8000
    if [ ${#DIFF} -gt $MAX_CHARS ]; then
      DIFF="${DIFF:0:$MAX_CHARS}... [truncated]"

Truncating with Bash substring expansion can cut in the middle of a multi-byte UTF-8 character when the hook runs in a byte-oriented locale such as C (in a UTF-8 locale Bash counts whole characters), which could corrupt the diff content sent to the API. Consider using a more robust truncation method or at least documenting this limitation.

Suggested change:

    - # Truncate very large diffs to avoid token limits
    - MAX_CHARS=8000
    - if [ ${#DIFF} -gt $MAX_CHARS ]; then
    -   DIFF="${DIFF:0:$MAX_CHARS}... [truncated]"
    + # Truncate very large diffs to avoid token limits, ensuring we do not split multi-byte UTF-8 characters.
    + MAX_CHARS=8000
    + if [ $(printf "%s" "$DIFF" | wc -c) -gt $MAX_CHARS ]; then
    +   # Use head -c to truncate by bytes, then iconv -c to remove incomplete UTF-8 sequences.
    +   DIFF="$(printf "%s" "$DIFF" | head -c $MAX_CHARS | iconv -c -f utf-8 -t utf-8)... [truncated]"

    # Load environment variables from .env file if it exists
    if [ -f .env ]; then
      set -a
      source .env
      set +a
    fi

Loading environment variables directly from a .env file in a git hook could expose sensitive information if the .env file is accidentally committed. Consider adding .env to .gitignore if not already present, or use a more secure method like direnv (as mentioned in the PR description) which keeps environment variables separate from the repository.

Suggested change:

    - # Load environment variables from .env file if it exists
    - if [ -f .env ]; then
    -   set -a
    -   source .env
    -   set +a
    - fi
    + # Environment variables should be managed securely (e.g., with direnv).
    + # Do NOT source .env files directly in git hooks to avoid leaking secrets.
    + # See project documentation for recommended environment management.

This PR implements automatic formatting of commit messages using Claude. It requires setting `ANTHROPIC_API_KEY` as an environment variable and setting up tools like `jq` and `direnv`.
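
The prerequisites named in this description could be checked before the hook does any work. The following is a sketch under assumptions: the helper names `have_api_key`, `missing_tools`, and `preflight` are hypothetical, and checking `jq` and `curl` reflects the commands visible in the reviewed snippets rather than a documented requirement list.

```shell
#!/bin/bash
# Hypothetical helpers (not in the PR) for checking prerequisites up front.

# Succeeds only when ANTHROPIC_API_KEY is set and non-empty.
have_api_key() {
  [ -n "${ANTHROPIC_API_KEY:-}" ]
}

# Prints the name of each given command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Returns 0 when generation can proceed, 1 otherwise, with a reason on stderr.
preflight() {
  local missing
  if ! have_api_key; then
    echo "ANTHROPIC_API_KEY is not set; skipping message generation" >&2
    return 1
  fi
  missing=$(missing_tools jq curl)
  if [ -n "$missing" ]; then
    echo "Missing tools: $(printf '%s' "$missing" | tr '\n' ' ')" >&2
    return 1
  fi
  return 0
}
```

A hook could call `preflight || exit 0` near the top so that a missing key or tool skips generation instead of blocking the commit.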