-
Notifications
You must be signed in to change notification settings - Fork 544
Improve error message for duplicate pipeline run names #3701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. ✨ Finishing touches🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Documentation Link Check Results❌ Absolute links check failed |
b38ae4f
to
32b4a7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the user experience by providing clearer guidance when a pipeline run name conflict occurs, along with tests and documentation to support the change.
- Enhanced error handling in
create_placeholder_run
to catch duplicate run-name errors and surface a helpful, actionable message. - Added unit tests to verify the new error message and ensure other
EntityExistsError
cases remain unchanged. - Updated docs to warn about run-name uniqueness and suggest best practices.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/zenml/pipelines/run_utils.py | Improved duplicate run-name error detection and messaging |
tests/unit/pipelines/test_run_utils.py | Added tests for duplicate run-name error and non-duplicate behavior |
docs/book/how-to/steps-pipelines/yaml_configuration.md | Warn about unique run names and provide guidance in YAML examples |
run_alerter_tests.sh | New script to run alerter tests (scope seems unrelated) |
Comments suppressed due to low confidence (1)
run_alerter_tests.sh:1
- [nitpick] This script for running alerter tests appears unrelated to the pipeline run-name improvements. Consider moving it to a separate PR or isolating it under a more relevant feature grouping to keep this change focused.
#!/bin/bash
When users run a pipeline with a fixed `run_name` in their config.yaml and then rerun the same pipeline, they would get a confusing database error about entity existence. This change catches both EntityExistsError and RuntimeError (with IntegrityError) specifically for duplicate run names and provides a much more helpful error message. ## Changes - Add improved error handling in `create_placeholder_run()` to catch duplicate run name errors (both EntityExistsError and raw SQL IntegrityError) - Provide actionable guidance with 3 specific solutions: 1. Change the run_name to a unique value 2. Use dynamic placeholders like `run_name: "my_run_{date}_{time}"` 3. Remove the run_name to auto-generate unique names - Add comprehensive unit tests to verify the improved error message - Update documentation in yaml_configuration.md to warn about run name uniqueness ## User Experience Instead of seeing confusing database errors, users now get: ``` Pipeline run name 'my_run_name' already exists in this project. Each pipeline run must have a unique name. To fix this, you can: 1. Change the 'run_name' in your config file to a unique value 2. Use a dynamic run name with placeholders like: run_name: "my_run_name_{date}_{time}" 3. Remove the 'run_name' from your config to auto-generate unique names For more information on run naming, see: https://docs.zenml.io/concepts/steps_and_pipelines/yaml_configuration#run-name ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
32b4a7a
to
c28564c
Compare
As suggested in PR review, this commit adds clearer YAML comments to the run_name placeholder examples to make it more obvious what each example demonstrates.
- Use TYPE_CHECKING to handle the optional sqlalchemy import properly - Rename to SQLIntegrityError to avoid confusion with other exceptions - This ensures mypy doesn't complain about assigning None to a type
- Change broad Exception catch to specific RuntimeError - Add parentheses for clarity in boolean logic - Align documentation wording with error message 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
src/zenml/pipelines/run_utils.py
Outdated
run, _ = Client().zen_store.get_or_create_run(run_request) | ||
return run | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This should be handled in the SQLZenStore, not in this random place (which is only one occurence where a run is created).
- Specifying the run name in a config file is not the only one way to do it, the message can simply be generic and talk about configuration instead of files.
- Moved error handling from run_utils.py to sql_zen_store.py where it architecturally belongs - Database-specific error handling now stays in the database layer - Made error message more generic (removed specific mention of 'config file') - Simplified run_utils.py by removing 40+ lines of error handling code - Updated tests to reflect the new error handling location - All code paths that create runs now benefit from improved error messages 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Remove EntityExistsError from Raises section since this function no longer explicitly raises exceptions - they are now handled in SQLZenStore. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Previously the test only verified that mocking worked correctly. This adds a proper integration test that actually runs pipelines twice with the same name to verify the duplicate name detection behavior. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Wrap logs processing in session.no_autoflush to prevent premature IntegrityError during _get_reference_schema_by_id calls. This ensures duplicate name errors are properly caught by the try/except block and converted to helpful EntityExistsError messages. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Remove "when implemented" and "should now" comments since the improved error handling is already working. Simplify test to only expect EntityExistsError now that the fix is in place. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the error messaging for duplicate pipeline run names by handling IntegrityError exceptions in SQLZenStore and updating integration tests and documentation to demonstrate the enhanced behavior.
- Enhanced error handling in SQLZenStore to catch duplicate run names and present a user-friendly message
- Fixed the SQLAlchemy autoflush issue to ensure errors are raised at the correct time
- Updated integration tests and docs to reflect and validate the changes made
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/integration/functional/pipelines/test_pipeline_run.py | Added an integration test to verify improved error handling for duplicate pipeline run names |
src/zenml/zen_stores/sql_zen_store.py | Enhanced error message handling for duplicate run names and fixed the autoflush issue with session.no_autoflush |
src/zenml/pipelines/run_utils.py | Minor update (additional newline) |
docs/book/how-to/steps-pipelines/yaml_configuration.md | Added a warning hint to document the uniqueness requirement for run names and provide best practices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally approved, see my latest comment
Move user-friendly error message logic from _create_run to get_or_create_run method. This removes fragile database-specific error message parsing and follows the established pattern where get_or_create_run handles user-facing errors. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Remove code duplication by creating _get_duplicate_run_name_error_message helper method that generates the user-friendly error message for duplicate pipeline run names. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…or-message # Conflicts: # src/zenml/zen_stores/sql_zen_store.py
Summary
run_name
Problem
When users run a pipeline with a duplicate run name (whether from a config file, programmatically, or any other method), they get a confusing database error. The error can come in two forms:
Additionally, there was a subtle SQLAlchemy autoflush timing issue where IntegrityErrors were being raised during
_get_reference_schema_by_id
calls instead of during the explicitsession.commit()
, preventing proper error handling.Solution
This PR catches IntegrityError in SQLZenStore's
_create_run
method and provides a much more helpful error message with actionable solutions. It also fixes the SQLAlchemy autoflush issue to ensure errors are properly caught and handled.Before (Raw SQL Error)
After (User-Friendly Error)
Changes Made
_create_run()
method to catch IntegrityError and provide user-friendly messagessession.no_autoflush
to prevent premature IntegrityError during reference lookupsTest Plan
Technical Details
The key fix was addressing the SQLAlchemy autoflush behavior where
_get_reference_schema_by_id
calls would trigger early database flushes, causing IntegrityErrors to be raised before the explicitsession.commit()
and thus not being caught by the try/except block. Usingsession.no_autoflush
ensures the error handling works as intended.🤖 Generated with Claude Code