
Conversation

LucasG0
Contributor

@LucasG0 LucasG0 commented Jul 24, 2025

IFC-1874

This PR makes the current locks more granular: instead of locking only on object kind, we also lock on the values of the attributes contained in the uniqueness constraints. This allows concurrent creation/update of objects that have different values for these key attributes. We cannot do the same for relationships, as it would require fetching relationship peers before locking.

EDIT 2025-09-30: this PR also contains fixes for the resource managers to prevent duplicate allocations. Locking is now performed both during the mutation and during resource allocation, so the locks had to be made "re-entrant" to avoid deadlocks within a single request.
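The granular naming scheme can be sketched in a few lines (illustrative only; `build_lock_name` and the exact `global.object.<kind>` layout are assumptions for this example, not the PR's actual implementation):

```python
import hashlib

# Hypothetical sketch: a lock name combines the object kind with hashed
# values of the attributes that appear in its uniqueness constraints.
def build_lock_name(kind: str, attribute_values: list[str]) -> str:
    hashed = [hashlib.sha256(v.encode()).hexdigest()[:16] for v in attribute_values]
    return ".".join(["global", "object", kind, *hashed])

# Objects with different unique-attribute values get different lock names,
# so their mutations can run concurrently.
a = build_lock_name("TestCar", ["mercedes", "blue"])
b = build_lock_name("TestCar", ["bmw", "blue"])
print(a != b)  # → True
```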

Summary by CodeRabbit

  • New Features

    • Option to skip resource-pool allocation during object create/update.
    • Unified constraint-and-save flow that runs under managed multi-locks for atomic mutations.
  • Refactor

    • Centralized lock-name computation and multi-lock orchestration across mutation paths.
    • Preview-before-persist flow to compute validation fields and locks prior to saving.
  • Bug Fixes

    • Branch-aware locking so locks are emitted only when necessary.
    • Improved reentrant lock behavior for safer nested operations.
  • Tests

    • Added tests covering branch-specific and attribute-based locking.

Contributor

coderabbitai bot commented Jul 24, 2025

Warning

Rate limit exceeded

@fatih-acar has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 537cc7f and 3aa9ba8.

📒 Files selected for processing (6)
  • backend/infrahub/core/node/create.py (4 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/graphql/mutations/main.py (5 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)

Walkthrough

The changes add a centralized constraint/validation-and-save helper (run_constraints_and_save) and a lock-name generator (get_lock_names_on_object_mutation) plus use of InfrahubMultiLock. Node creation now uses a two‑phase preview/persist flow and _do_create_node accepts a pre‑initialized Node. Node pool handling gained a process_pools flag to optionally skip resource allocation. InfrahubLock now tracks recursion with a ContextVar. GraphQL mutation handlers were refactored to compute lock sets and perform lock-protected persistence. Unit tests were updated to reflect the new locking utilities.
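The re-entrant behavior can be sketched with an asyncio lock plus a ContextVar recursion counter (a minimal illustration; `ReentrantAsyncLock` is a hypothetical name, not InfrahubLock's actual code):

```python
import asyncio
from contextvars import ContextVar

class ReentrantAsyncLock:
    """Async lock that the same asyncio context can acquire more than once."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # Recursion depth is tracked per context, so nested acquisitions
        # within one request do not deadlock.
        self._depth: ContextVar[int] = ContextVar("depth", default=0)

    async def __aenter__(self) -> "ReentrantAsyncLock":
        if self._depth.get() == 0:
            await self._lock.acquire()
        self._depth.set(self._depth.get() + 1)
        return self

    async def __aexit__(self, *exc) -> None:
        depth = self._depth.get() - 1
        self._depth.set(depth)
        if depth == 0:
            self._lock.release()

async def main() -> bool:
    lock = ReentrantAsyncLock()
    async with lock:
        async with lock:  # nested acquisition: no deadlock
            pass
    return True

print(asyncio.run(main()))  # → True
```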

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 21.21%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title succinctly captures the core change, indicating that lock names now include attribute values and reflecting the move to more granular locking based on uniqueness-constraint attributes.


@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Jul 24, 2025

codspeed-hq bot commented Jul 24, 2025

CodSpeed Performance Report

Merging #6888 will not alter performance

Comparing lgu-lock-on-attributes (3aa9ba8) with stable (c17d0e0)¹

Summary

✅ 10 untouched

Footnotes

  1. No successful run was found on stable (3cd6a7b) during the generation of this report, so c17d0e0 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@LucasG0 LucasG0 force-pushed the lgu-lock-on-attributes branch 2 times, most recently from 6206bb8 to 2628683 on July 24, 2025 at 13:38
@LucasG0 LucasG0 marked this pull request as ready for review July 24, 2025 13:38
@LucasG0 LucasG0 requested a review from a team as a code owner July 24, 2025 13:38
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
backend/infrahub/core/node/save.py (1)

10-26: Add documentation and consider error handling.

The implementation looks good, but consider these improvements:

  1. Add a docstring explaining the function's purpose, parameters, and behavior
  2. Consider what happens if constraint checks fail - will the locks be properly released?

Add documentation:

 async def run_constraints_and_save(
     node: Node,
     node_constraint_runner: NodeConstraintRunner,
     fields_to_validate: list[str],
     fields_to_save: list[str],
     db: InfrahubDatabase,
     branch: Branch,
     skip_uniqueness_check: bool = False,
 ) -> None:
+    """Run constraint checks and save node within a locked context.
+    
+    This function acquires locks based on the node's schema and uniqueness constraints,
+    runs constraint validation, and saves the node to the database.
+    
+    Args:
+        node: The node to validate and save
+        node_constraint_runner: Runner for constraint checks
+        fields_to_validate: Fields to validate during constraint checking
+        fields_to_save: Fields to save (empty list saves all fields)
+        db: Database connection
+        branch: Branch context
+        skip_uniqueness_check: Whether to skip uniqueness constraint checks
+    
+    Raises:
+        Any exceptions from constraint checks or save operations
+    """
     schema_branch = db.schema.get_schema_branch(name=branch.name)
backend/infrahub/graphql/mutations/ipam.py (1)

122-131: Consider reducing code duplication in reconciliation calls.

The if/else pattern for conditional locking appears multiple times with nearly identical reconciliation calls. Consider extracting this pattern to reduce duplication.

Consider creating a helper method to encapsulate the conditional locking pattern:

+    @classmethod
+    async def _reconcile_with_optional_lock(
+        cls,
+        reconciler: IpamReconciler,
+        ip_value: IPv4Interface | ipaddress.IPv6Interface | ipaddress.IPv4Network | ipaddress.IPv6Network,
+        namespace: str,
+        node_uuid: str,
+        lock_name: str | None = None,
+        is_delete: bool = False,
+    ) -> Node:
+        """Run reconciliation with optional locking."""
+        reconcile_kwargs = {
+            "ip_value": ip_value,
+            "namespace": namespace,
+            "node_uuid": node_uuid,
+        }
+        if hasattr(reconciler.reconcile, 'is_delete'):
+            reconcile_kwargs["is_delete"] = is_delete
+            
+        if lock_name:
+            async with InfrahubMultiLock(lock_registry=lock.registry, locks=[lock_name]):
+                return await reconciler.reconcile(**reconcile_kwargs)
+        else:
+            return await reconciler.reconcile(**reconcile_kwargs)

     # Then use it in the methods:
-        if lock_name := cls._get_lock_name(namespace_id, branch):
-            async with InfrahubMultiLock(lock_registry=lock.registry, locks=[lock_name]):
-                reconciled_address = await reconciler.reconcile(
-                    ip_value=ip_address, namespace=namespace_id, node_uuid=address.get_id()
-                )
-        else:
-            reconciled_address = await reconciler.reconcile(
-                ip_value=ip_address, namespace=namespace_id, node_uuid=address.get_id()
-            )
+        lock_name = cls._get_lock_name(namespace_id, branch)
+        reconciled_address = await cls._reconcile_with_optional_lock(
+            reconciler=reconciler,
+            ip_value=ip_address,
+            namespace=namespace_id,
+            node_uuid=address.get_id(),
+            lock_name=lock_name,
+        )

Also applies to: 168-177

backend/infrahub/lock_utils.py (4)

47-47: Fix docstring to reference the correct constant name

The docstring references KINDS_TO_LOCK_ON_ANY_BRANCH but the actual constant is KINDS_CONCURRENT_MUTATIONS_NOT_ALLOWED.

-    Check whether kind or any kind generic is in KINDS_TO_LOCK_ON_ANY_BRANCH.
+    Check whether kind or any kind generic is in KINDS_CONCURRENT_MUTATIONS_NOT_ALLOWED.

16-16: Clarify the docstring about constraint override behavior

The docstring is misleading. It states "we only need to lock on the generic" but the implementation removes the node schema kind from the list while keeping both the node kind and generic kinds when appropriate.

-    it means node schema overrided this constraint, in which case we only need to lock on the generic.
+    it means node schema overrode this constraint, in which case we exclude the node schema kind from locking.

23-41: Potential issue with list manipulation logic

The use of pop(0) on line 40 assumes that kinds[0] is always node_schema.kind, but this is only guaranteed if node_schema.uniqueness_constraints is truthy (lines 23-25). If the logic changes in the future, this could lead to removing the wrong item.

Consider using explicit removal instead:

-                kinds.pop(0)
+                if node_schema.kind in kinds:
+                    kinds.remove(node_schema.kind)

66-66: Fix grammatical error in docstring

-    concurrent mutations are only allowed on non-main branch as objects validations will be performed at least when merging in main branch.
+    concurrent mutations are only allowed on non-main branch as object validations will be performed at least when merging in main branch.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99495f7 and 2628683.

📒 Files selected for processing (6)
  • backend/infrahub/core/node/create.py (2 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (8 hunks)
  • backend/infrahub/graphql/mutations/main.py (4 hunks)
  • backend/infrahub/lock_utils.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/node/create.py
  • backend/infrahub/core/node/save.py
  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/lock_utils.py
backend/tests/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: opsmill/infrahub#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-22T08:13:56.198Z
Learning: Backend implements the core graph-based infrastructure datastore, API, and business logic in `backend/infrahub/`
Learnt from: CR
PR: opsmill/infrahub#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-22T08:13:56.198Z
Learning: Infrahub offers a central hub to manage the data, templates and playbooks that powers your infrastructure by combining the version control and branch management capabilities similar to Git with the flexible data model and UI of a graph database
🔇 Additional comments (6)
backend/infrahub/core/node/create.py (1)

156-163: Good consolidation of constraint checking and saving logic.

The refactoring to use run_constraints_and_save improves code maintainability by encapsulating locking, constraint validation, and saving into a single atomic operation. This ensures better data integrity during node creation.

backend/tests/unit/core/test_get_kinds_lock.py (1)

39-83: Excellent test coverage for the new locking behavior.

The new tests comprehensively cover:

  • Branch-specific locking behavior (test_lock_other_branch)
  • Cross-branch locking requirements (test_lock_groups_on_other_branches)
  • Attribute-based uniqueness constraint lock names (test_lock_names_only_attributes)

The test structure is clear and covers important edge cases for the refactored locking logic.

backend/infrahub/graphql/mutations/ipam.py (2)

104-109: Good consolidation of lock name logic.

The _get_lock_name method properly encapsulates the logic for determining when locking is needed, returning None for non-default branches to avoid unnecessary locking.


259-264: Clear documentation of IP prefix locking requirements.

The comment explains why IP prefix mutations need locking on all branches due to cardinality-one relationships, which is helpful for understanding the different locking strategy compared to IP addresses.

backend/infrahub/graphql/mutations/main.py (2)

268-276: Excellent refactoring to use centralized constraint and save logic.

The replacement of separate constraint checking and saving with run_constraints_and_save improves code consistency and ensures that locking is properly handled at the core layer rather than in the GraphQL mutation layer.


146-151: Good removal of locking logic from wrapper methods.

The simplified wrapper methods now focus solely on their primary responsibility without concerning themselves with locking details. This improves separation of concerns and makes the code more maintainable.

Also applies to: 207-223

Contributor

@ajtmccarty ajtmccarty left a comment


could you please add a PR description that explains why the changes in _get_kind_lock_names_on_object_mutation will help with the locking slowness?

Contributor


Usually "utils" means a file contains a bunch of things that don't really belong together, and I don't think that's the case here. I think this file can be locks.py or even lock_getter.py.

return False


def _get_lock_names_on_object_mutation(node: Node, branch: Branch, schema_branch: SchemaBranch) -> list[str]:
Contributor


at least one of these should be a public method and I think it's this one

Contributor Author


Right

@fatih-acar
Contributor

fatih-acar commented Jul 24, 2025

Quick remark: we should be very careful with user inputs, in this case you are lucky that Redis keys are very permissive regarding character set. I'm not sure it would be the case with all implementations.

Instead of using the attribute value directly, we can use some kind of hex hash of it.

Edit: not saying we should do it right now, I don't even know the performance implications of such a change.

@LucasG0
Contributor Author

LucasG0 commented Jul 24, 2025

@fatih-acar good point, I'll update to use the hash. I don't expect performance issues there; hashing a string of 1 million characters takes less than 1 ms.
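That claim is easy to sanity-check (SHA-256 is used here as an example of a stable digest; the exact hash function is still an open question at this point in the thread):

```python
import hashlib
import time

# Hashing a 1-million-character string is cheap: typically around a
# millisecond or less for SHA-256 (exact timing varies by machine).
payload = "a" * 1_000_000
start = time.perf_counter()
digest = hashlib.sha256(payload.encode()).hexdigest()
elapsed_ms = (time.perf_counter() - start) * 1000
print(len(digest))  # → 64
```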

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
backend/infrahub/lock_getter.py (1)

63-109: ⚠️ Use a stable hash function instead of Python's built-in hash()

The implementation correctly uses hashing as discussed in the PR comments. However, Python's built-in hash() function is not stable across Python processes due to hash randomization. This could lead to different lock names for the same data across different service instances or restarts.

Consider using a cryptographic hash function like SHA-256 or a non-cryptographic but stable hash like xxhash.

+import hashlib
+
 def get_lock_names_on_object_mutation(node: Node, branch: Branch, schema_branch: SchemaBranch) -> list[str]:
     """
     Return lock names for object on which we want to avoid concurrent mutation (create/update). Except for some specific kinds,
     concurrent mutations are only allowed on non-main branch as objects validations will be performed at least when merging in main branch.
     Lock names include kind, some generic kinds, and values of attributes of corresponding uniqueness constraints.
     """
 
     if not branch.is_default and not _should_kind_be_locked_on_any_branch(node.get_kind(), schema_branch):
         return []
 
     lock_kinds = _get_kinds_to_lock_on_object_mutation(node.get_kind(), schema_branch)
     lock_names = []
     for kind in lock_kinds:
         schema = schema_branch.get(name=kind)
         ucs = schema.uniqueness_constraints
         if ucs is None:
             continue
 
         ucs_lock_names = []
         uc_attributes_names = set()
 
         for uc in ucs:
             uc_attributes_values = []
             # Keep only attributes constraints
             for field_path in uc:
                 # Some attributes may exist in different uniqueness constraints, we de-duplicate them
                 if field_path in uc_attributes_names:
                     continue
 
                 # Exclude relationships uniqueness constraints
                 schema_path = schema.parse_schema_path(path=field_path, schema=schema_branch)
                 if schema_path.related_schema is not None or schema_path.attribute_schema is None:
                     continue
 
                 uc_attributes_names.add(field_path)
-                value_hashed = str(hash(str(getattr(node, schema_path.attribute_schema.name).value)))
+                value_str = str(getattr(node, schema_path.attribute_schema.name).value)
+                value_hashed = hashlib.sha256(value_str.encode()).hexdigest()[:16]  # Use first 16 chars for brevity
                 uc_attributes_values.append(value_hashed)
 
             if uc_attributes_values:
                 uc_lock_name = ".".join(uc_attributes_values)
                 ucs_lock_names.append(uc_lock_name)
 
         partial_lock_name = kind + "." + ".".join(ucs_lock_names)
         lock_names.append(build_object_lock_name(partial_lock_name))
 
     return lock_names
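The instability the bot points out is easy to demonstrate: CPython seeds string hashing per process (controlled by PYTHONHASHSEED), so hash() of the same value differs across workers, while a SHA-256 digest does not. A standalone sketch:

```python
import hashlib
import os
import subprocess
import sys

# Built-in hash() of a string depends on the per-process hash seed,
# so the same value would produce different lock names across workers.
snippet = 'print(hash("mercedes"))'
results = {
    subprocess.run(
        [sys.executable, "-c", snippet],
        env={**os.environ, "PYTHONHASHSEED": seed},
        capture_output=True,
        text=True,
    ).stdout.strip()
    for seed in ("1", "2")
}
print(len(results))  # → 2 (two different hashes for the same string)

# A SHA-256 digest is deterministic regardless of process or seed.
stable = hashlib.sha256("mercedes".encode()).hexdigest()[:16]
print(stable == hashlib.sha256("mercedes".encode()).hexdigest()[:16])  # → True
```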
🧹 Nitpick comments (2)
backend/tests/unit/core/test_get_kinds_lock.py (1)

64-83: Verify the exclusion of owner relationship from lock name

The uniqueness constraint includes three paths: ["name__value", "color__value", "owner__name"], but the expected lock name only includes hashes for "mercedes" and "blue", excluding the owner relationship.

Based on the PR objectives stating "this approach is not applied to relationships", this appears intentional. However, consider adding a comment in the test to clarify this behavior for future maintainers.

 car_person_schema_unregistered.nodes[0].uniqueness_constraints = [
     ["name__value", "color__value", "owner__name"]
 ]
 registry.schema.register_schema(schema=car_person_schema_unregistered, branch=default_branch.name)
 
 schema_branch = registry.schema.get_schema_branch(name=default_branch.name)
 person = await create_and_save(db=db, schema="TestPerson", name="John")
 car = await create_and_save(db=db, schema="TestCar", name="mercedes", color="blue", owner=person)
+# Note: owner__name is excluded from lock names as relationships are not included in lock calculations
 assert get_lock_names_on_object_mutation(car, branch=default_branch, schema_branch=schema_branch) == [
     "global.object.TestCar." + str(hash("mercedes")) + "." + str(hash("blue"))
 ]
backend/infrahub/lock_getter.py (1)

11-43: Consider refactoring the list modification logic for better clarity

The function logic is correct, but using pop(0) while building the list could be clearer. Consider building the final list without modification.

 def _get_kinds_to_lock_on_object_mutation(kind: str, schema_branch: SchemaBranch) -> list[str]:
     """
     Return kinds for which we want to lock during creating / updating an object of a given schema node.
     Lock should be performed on schema kind and its generics having a uniqueness_constraint defined.
     If a generic uniqueness constraint is the same as the node schema one,
     it means node schema overrided this constraint, in which case we only need to lock on the generic.
     """
 
     node_schema = schema_branch.get(name=kind)
 
     schema_uc = None
-    kinds = []
+    include_node_kind = False
     if node_schema.uniqueness_constraints:
-        kinds.append(node_schema.kind)
+        include_node_kind = True
         schema_uc = node_schema.uniqueness_constraints
 
     if isinstance(node_schema, GenericSchema):
-        return kinds
+        return [node_schema.kind] if include_node_kind else []
 
     generics_kinds = node_schema.inherit_from
 
-    node_schema_kind_removed = False
+    kinds = []
     for generic_kind in generics_kinds:
         generic_uc = schema_branch.get(name=generic_kind).uniqueness_constraints
         if generic_uc:
             kinds.append(generic_kind)
-            if not node_schema_kind_removed and generic_uc == schema_uc:
+            if include_node_kind and generic_uc == schema_uc:
                 # Check whether we should remove original schema kind as it simply overrides uniqueness_constraint
                 # of a generic
-                kinds.pop(0)
-                node_schema_kind_removed = True
+                include_node_kind = False
+    
+    if include_node_kind:
+        kinds.insert(0, node_schema.kind)
+    
     return kinds
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2628683 and f488755.

📒 Files selected for processing (3)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/infrahub/core/node/save.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/tests/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
  • backend/infrahub/lock_getter.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.1.0
  • GitHub Check: backend-benchmark
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: validate-generated-documentation
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
🔇 Additional comments (6)
backend/tests/unit/core/test_get_kinds_lock.py (4)

1-12: Imports look good!

The new imports are appropriate for testing the refactored lock functionality.


16-38: Test coverage for kind locking logic is comprehensive!

The test properly covers different scenarios: inherited uniqueness constraints, generic relationships, and cases with no constraints.


39-51: Good test for branch-specific locking behavior!

This test correctly verifies that objects on non-default branches return empty lock names, which is the expected behavior.


52-63: No changes required: CoreGraphQLQueryGroup is correctly locked via inheritance

The CoreGraphQLQueryGroup schema definition inherits from InfrahubKind.GENERICGROUP, so _should_kind_be_locked_on_any_branch("CoreGraphQLQueryGroup", …) rightly returns True.
No update needed here.

backend/infrahub/lock_getter.py (2)

1-9: Imports and constant definition look good!

The constant clearly identifies kinds that require special locking behavior across all branches.


45-61: Clean implementation for branch-agnostic locking check!

The function efficiently checks both the kind and its inherited kinds.

@LucasG0 LucasG0 requested a review from ajtmccarty July 24, 2025 16:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/infrahub/lock_getter.py (2)

37-41: Consider using a more explicit approach for removing the node schema kind.

The use of kinds.pop(0) assumes that the node schema kind is always at index 0. While this is currently true due to the order of operations, it makes the code fragile and dependent on ordering.

Consider storing the node schema kind separately and removing it explicitly:

     node_schema_kind_removed = False
     for generic_kind in generics_kinds:
         generic_uc = schema_branch.get(name=generic_kind).uniqueness_constraints
         if generic_uc:
             kinds.append(generic_kind)
             if not node_schema_kind_removed and generic_uc == schema_uc:
                 # Check whether we should remove original schema kind as it simply overrides uniqueness_constraint
                 # of a generic
-                kinds.pop(0)
+                if node_schema.kind in kinds:
+                    kinds.remove(node_schema.kind)
                 node_schema_kind_removed = True

112-113: Handle edge case when all uniqueness constraints are relationship-based.

When all uniqueness constraints are relationship-based (and thus skipped), ucs_lock_names will be empty, resulting in lock names with consecutive dots like "kind..".

Consider handling this edge case:

-        partial_lock_name = kind + "." + ".".join(ucs_lock_names)
+        if ucs_lock_names:
+            partial_lock_name = kind + "." + ".".join(ucs_lock_names)
+        else:
+            partial_lock_name = kind

This ensures consistent lock name formatting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f488755 and 3a7793a.

📒 Files selected for processing (2)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/unit/core/test_get_kinds_lock.py
🧰 Additional context used
📓 Path-based instructions (1)
backend/**/*

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/lock_getter.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.1.0
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-tests-integration
🔇 Additional comments (2)
backend/infrahub/lock_getter.py (2)

1-9: LGTM!

Imports and constant definition are appropriate for the module's functionality.


45-61: LGTM!

The function correctly checks if a kind or its generics require locking on any branch.

@LucasG0 LucasG0 force-pushed the lgu-lock-on-attributes branch from e973da4 to 5f677ea on July 25, 2025 at 08:40
@LucasG0 LucasG0 requested a review from a team July 25, 2025 08:56
@fatih-acar
Contributor

fatih-acar commented Jul 25, 2025

I think there's an issue with objects depending on a resource manager such as NumberPool... I end up with duplicate numbers allocated.

I think we still need to lock exclusively on kind when the object uses a resource manager...

Edit: this issue is NOT related to #6866

@fatih-acar
Contributor

I think there's an issue with objects depending on a resource manager such as NumberPool... I end up with duplicate numbers allocated.

I think we still need to lock exclusively on kind when the object uses a resource manager...

Edit: this issue is NOT related to #6866

#7025 is not enough to fix the issue because the get_resource() call is done within a DB transaction when we're inside the Create/Upsert mutation path...
One solution would be to apply the same filtering we do for uniqueness constraints to attributes that depend on a resource manager as well.

@LucasG0 LucasG0 closed this Sep 17, 2025
@LucasG0 LucasG0 reopened this Sep 17, 2025
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 2 times, most recently from 80f493f to 40b213b on September 19, 2025 at 08:42
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 3 times, most recently from b8a059b to 1c1a372 on September 30, 2025 at 12:34
@fatih-acar
Contributor

Fixes #7254

@fatih-acar
Contributor

We can probably re-enable locking within branches now that they are more granular?

fatih-acar and others added 12 commits September 30, 2025 18:06
We should also lock regardless of the branch when allocating from a
resource manager.

Signed-off-by: Fatih Acar <[email protected]>
BROKEN: this does not handle double lock/deadlock issues when we acquire
a resource pool lock in run_constraints_and_save() and then in
get_resource(). We probably require a mechanism to check if the lock is
actually held by the current worker, if so it can proceed without
acquiring the lock.

This reverts commit 31d7940.
Make locks re-entrant within the same asyncio context.

Signed-off-by: Fatih Acar <[email protected]>
WIP: still need to fix the uniqueness constraint lock on update.

Signed-off-by: Fatih Acar <[email protected]>
Fixes "Lock names include attributes values"

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch from 5a949a9 to 746a0d1 on September 30, 2025 at 16:06
@fatih-acar fatih-acar marked this pull request as ready for review September 30, 2025 16:31
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f677ea and 746a0d1.

📒 Files selected for processing (9)
  • backend/infrahub/core/attribute.py (2 hunks)
  • backend/infrahub/core/node/__init__.py (9 hunks)
  • backend/infrahub/core/node/create.py (4 hunks)
  • backend/infrahub/core/node/save.py (1 hunks)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/graphql/mutations/main.py (5 hunks)
  • backend/infrahub/lock.py (3 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
  • backend/tests/unit/core/test_get_kinds_lock.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/core/attribute.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/core/node/save.py
  • backend/infrahub/lock.py
  • backend/infrahub/core/node/__init__.py
  • backend/infrahub/core/node/create.py
  • backend/tests/unit/core/test_get_kinds_lock.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

**/*.py: Use type hints for all function parameters and return values in Python code
Prefer asynchronous implementations when feasible (use Async whenever possible)
Define asynchronous functions with async def
Use await for asynchronous calls
Use Pydantic models instead of standard library dataclasses
All Python functions must include a docstring
Use Ruff and MyPy for linting and type checking

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Use async def for asynchronous functions
Use await for asynchronous calls within async functions
Use Pydantic models instead of dataclasses for data modeling
All Python functions must have docstrings
Docstrings must use triple quotes (""")
Follow Google-style docstring format
Docstrings should include, when applicable: brief description, detailed description, Args/Parameters (without types), Returns, Raises, and Examples
Validate and lint Python with ruff and mypy
Format Python code using the project’s formatter (invoke format)

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/core/attribute.py
  • backend/infrahub/graphql/mutations/main.py
  • backend/infrahub/graphql/mutations/ipam.py
  • backend/infrahub/core/node/save.py
  • backend/infrahub/lock.py
  • backend/infrahub/core/node/__init__.py
  • backend/infrahub/core/node/create.py
  • backend/tests/unit/core/test_get_kinds_lock.py
backend/tests/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/test_get_kinds_lock.py
🧬 Code graph analysis (9)
backend/infrahub/lock_getter.py (4)
backend/infrahub/core/schema/schema_branch.py (1)
  • SchemaBranch (72-2314)
backend/infrahub/lock.py (2)
  • build_object_lock_name (291-292)
  • get (262-268)
backend/infrahub/core/relationship/model.py (1)
  • RelationshipManager (718-1242)
backend/infrahub/core/schema/basenode_schema.py (1)
  • parse_schema_path (426-474)
backend/infrahub/core/attribute.py (1)
backend/infrahub/core/node/__init__.py (2)
  • from_graphql (966-994)
  • handle_pool (260-321)
backend/infrahub/graphql/mutations/main.py (6)
backend/infrahub/core/node/__init__.py (4)
  • save (814-824)
  • get_kind (89-91)
  • from_graphql (966-994)
  • Node (72-1062)
backend/infrahub/core/node/save.py (1)
  • run_constraints_and_save (12-57)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (49-73)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (75-146)
backend/infrahub/core/node/create.py (2)
  • get_profile_ids (116-120)
  • refresh_for_profile_update (123-144)
backend/infrahub/core/manager.py (4)
  • NodeManager (132-1360)
  • get_one_by_id_or_default_filter (869-882)
  • get_one_by_id_or_default_filter (886-899)
  • get_one_by_id_or_default_filter (902-949)
backend/infrahub/graphql/mutations/ipam.py (5)
backend/infrahub/core/node/create.py (1)
  • get_profile_ids (116-120)
backend/infrahub/lock.py (2)
  • InfrahubMultiLock (49-73)
  • build_object_lock_name (291-292)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (75-146)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1062)
  • from_graphql (966-994)
backend/infrahub/graphql/mutations/main.py (1)
  • mutate_update_object (282-356)
backend/infrahub/core/node/save.py (5)
backend/infrahub/core/constraint/node/runner.py (1)
  • NodeConstraintRunner (13-43)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1062)
  • save (814-824)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (49-73)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (75-146)
backend/infrahub/core/node/create.py (1)
  • _persist (198-212)
backend/infrahub/lock.py (1)
backend/infrahub/core/timestamp.py (1)
  • current_timestamp (33-34)
backend/infrahub/core/node/__init__.py (2)
backend/infrahub/core/protocols_base.py (3)
  • InfrahubDatabase (35-65)
  • new (84-84)
  • from_graphql (104-104)
backend/infrahub/core/attribute.py (2)
  • validate (194-203)
  • from_graphql (581-629)
backend/infrahub/core/node/create.py (5)
backend/infrahub/core/constraint/node/runner.py (1)
  • NodeConstraintRunner (13-43)
backend/infrahub/core/node/save.py (2)
  • run_constraints_and_save (12-57)
  • _persist (47-51)
backend/infrahub/dependencies/registry.py (1)
  • get_component_registry (61-62)
backend/infrahub/lock.py (1)
  • InfrahubMultiLock (49-73)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (75-146)
backend/tests/unit/core/test_get_kinds_lock.py (3)
backend/infrahub/lock_getter.py (4)
  • _get_kinds_to_lock_on_object_mutation (17-48)
  • _hash (69-72)
  • _should_kind_be_locked_on_any_branch (51-66)
  • get_lock_names_on_object_mutation (75-146)
backend/tests/node_creation.py (1)
  • create_and_save (8-12)
backend/tests/conftest.py (1)
  • car_person_schema_unregistered (497-563)
🔇 Additional comments (36)
backend/infrahub/lock.py (3)

135-136: LGTM! Re-entrant lock state initialization.

The new instance variables _owner_task and _recursion_depth are initialized correctly to support per-task ownership tracking and recursive locking semantics.


159-177: Verify metrics behavior on re-entrant acquisition.

The re-entrant locking logic correctly increments _recursion_depth and returns early when the current task already owns the lock. However, re-entrant acquisitions neither reset acquire_time nor record an acquisition metric; confirm that bypassing the instrumentation on this path is intended.


178-202: LGTM! Recursive release logic with ownership enforcement.

The release logic correctly enforces ownership, supports recursive release by decrementing _recursion_depth, and only performs final cleanup (metrics, underlying lock release, ownership clear, event signal) on the final release when recursion depth reaches zero.
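The ownership-plus-depth scheme the review describes can be sketched with a toy asyncio lock. The names `_owner_task` and `_recursion_depth` mirror the review text, but this is an illustrative sketch under those assumptions, not the Infrahub implementation:

```python
import asyncio


class ReentrantAsyncLock:
    """Async lock that the owning task may re-acquire without deadlocking."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._owner_task: asyncio.Task | None = None
        self._recursion_depth = 0

    async def acquire(self) -> None:
        """Acquire the lock, or bump the depth if this task already holds it."""
        current = asyncio.current_task()
        if self._owner_task is current:
            self._recursion_depth += 1
            return
        await self._lock.acquire()
        self._owner_task = current
        self._recursion_depth = 1

    def release(self) -> None:
        """Release one level; the underlying lock is freed only at depth zero."""
        if self._owner_task is not asyncio.current_task():
            raise RuntimeError("lock released by a task that does not own it")
        self._recursion_depth -= 1
        if self._recursion_depth == 0:
            self._owner_task = None
            self._lock.release()
```

Because ownership is tracked per `asyncio.Task` rather than per coroutine frame, a nested `get_resource()` call inside `run_constraints_and_save()` re-enters cleanly, while a different task still blocks as expected.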

backend/infrahub/core/node/__init__.py (4)

435-440: LGTM! Propagating process_pools parameter.

The _process_fields method now accepts process_pools: bool = True and correctly propagates it to _process_fields_attributes at line 494.


531-536: LGTM! process_pools parameter accepted and used.

The _process_fields_attributes method now requires process_pools: bool as a parameter (no default value), which is correctly provided by the caller _process_fields at line 494.


966-994: LGTM! from_graphql updated with process_pools parameter.

The from_graphql method signature now includes process_pools: bool = True, and the docstring (lines 972-981) correctly documents the new parameter. The parameter is correctly propagated to attribute.from_graphql at line 988.


561-569: Remove gating concern: validation always occurs later.

The initial gate around the first attribute.validate is safe because a second, unconditional call to attribute.validate at line 644 ensures all attributes are validated regardless of process_pools or from_pool.

Likely an incorrect or invalid review comment.

backend/infrahub/core/attribute.py (1)

598-602: Capture and propagate errors from handle_pool.
In backend/infrahub/core/attribute.py line 601, you pass an ephemeral errors=[] to await self.node.handle_pool(...), causing any allocation or validation errors to be lost. Either collect these errors into a local variable and handle or re-raise them, or switch to using the shared errors list pattern from the other call site.

backend/infrahub/core/node/create.py (4)

3-3: LGTM! Imports align with the new locking strategy.

The new imports support the centralized locking and constraint validation flow.

Also applies to: 5-5, 11-11, 14-15


198-220: LGTM! Multi-lock context properly guards persistence.

The nested _persist function and InfrahubMultiLock context manager ensure that:

  1. Locks are computed before acquisition (via preview object)
  2. Lock acquisition is atomic across all required locks
  3. Persistence happens within the locked region
  4. Transaction handling is correctly nested inside the lock context

This addresses the concurrency concerns raised in the PR objectives.


147-176: Approve empty fields_to_save for new nodes
The save method calls _create whenever self._existing is False—ignoring the fields argument—so passing an empty list does not prevent the node from being persisted.


193-196: Preview object with process_pools=False is implemented correctly
Node.new’s process_pools parameter is defined and honored, so lock‐name computation avoids premature resource allocation as intended.

backend/infrahub/lock_getter.py (6)

1-15: LGTM! Imports and constants are appropriate.

The constant KINDS_CONCURRENT_MUTATIONS_NOT_ALLOWED clearly documents which kinds require locking on any branch (not just the default branch).


17-48: LGTM! Optimization correctly avoids redundant locking.

The function correctly identifies when a node schema overrides a generic's uniqueness constraint and avoids locking both. The kinds.pop(0) operation is safe because the node kind is always added first (line 30) before iterating generics.


51-66: LGTM! Branch locking policy is correctly enforced.

The function properly checks both the node kind and its generics against the disallowed list. The early return for GenericSchema is correct since generics don't inherit from other types.


69-72: LGTM! SHA-256 ensures deterministic lock names across processes.

The comment clearly explains why the built-in hash() is avoided (randomization), and SHA-256 provides the necessary consistency for distributed locking.


84-100: LGTM! Resource pool locking prevents duplicate allocations.

This code directly addresses the issue raised in PR comments about objects depending on resource managers ending up with duplicate allocations. By acquiring locks on resource pool IDs during mutations, the code ensures exclusive access during allocation.

The deduplication check (if lock_name not in lock_names) prevents redundant locks when multiple attributes or relationships use the same pool.


102-103: LGTM! Branch-based locking optimization is sound.

This early return implements the PR's key optimization: allowing concurrent mutations on non-default branches (except for specific kinds) since validation will occur during merge to main. This reduces lock contention while maintaining safety.

backend/infrahub/core/node/save.py (4)

1-10: LGTM! Imports are clean and complete.

All necessary imports are present for the centralized constraint validation and persistence flow.


37-38: LGTM! Validation prevents misuse of the API.

The check ensures that callers who want to manage locks externally (manage_lock=False) must provide the lock names. The error message is clear.


40-45: LGTM! Lock resolution is correct.

The function correctly uses provided locks when available, or computes them via get_lock_names_on_object_mutation. Converting to a list ensures mutability for the lock registry.


47-57: LGTM! Dual-mode locking design is well-implemented.

The nested _persist function cleanly encapsulates the critical section, and the conditional lock acquisition supports both:

  1. Standalone usage where this function manages locks
  2. Nested usage where the caller has already acquired locks (e.g., when calling from _do_create_node)

This design enables re-entrant locking mentioned in the PR objectives.

backend/tests/unit/core/test_get_kinds_lock.py (5)

1-13: LGTM!

The imports align with the new centralized locking strategy and test helper usage. The deepcopy import is appropriately used to avoid mutating shared fixtures.


40-51: LGTM!

The test correctly validates that objects on non-default branches do not acquire locks (unless the kind specifically requires locking on any branch), which aligns with the PR's goal of enabling more granular locking.


53-63: LGTM!

The test correctly validates that specific kinds (CoreGraphQLQueryGroup) require locking on all branches, ensuring critical objects remain protected from concurrent mutations regardless of branch context.


65-83: LGTM!

The test correctly validates attribute-based lock name generation. Note that the uniqueness constraint includes a relationship field (owner__name), but the lock name correctly excludes it—this aligns with the implementation in lock_getter.py which skips relationship-based constraints.


85-100: LGTM!

The test correctly validates handling of optional attributes in lock names. When an attribute is part of a uniqueness constraint but has no value, it correctly contributes an empty hash (_hash("")) to the lock name.

backend/infrahub/graphql/mutations/main.py (4)

1-46: LGTM!

The import changes correctly introduce the centralized locking utilities (InfrahubMultiLock, get_lock_names_on_object_mutation) and the unified constraint-and-save helper (run_constraints_and_save), aligning with the PR's refactoring goals.


213-233: Excellent use of preview pattern to avoid side effects during lock computation.

The code correctly:

  • Fetches a clean copy of the object to preview the mutation
  • Calls from_graphql with process_pools=False to prevent resource pool allocations during lock name computation
  • Computes lock names before acquiring locks or starting transactions

This prevents the duplicate allocation issue mentioned in the PR objectives.
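The preview pattern can be illustrated with a toy node; the `FROM_POOL` marker and the lock-name format are hypothetical, standing in for the real `from_graphql(process_pools=False)` flow:

```python
class PreviewNode:
    """Toy node: applying a payload may trigger resource-pool allocation."""

    def __init__(self) -> None:
        self.values: dict[str, str] = {}
        self.allocations = 0

    def from_graphql(self, data: dict[str, str], process_pools: bool = True) -> None:
        """Apply payload fields; skip pool allocation when process_pools is False."""
        for key, value in data.items():
            if value == "FROM_POOL" and process_pools:
                self.allocations += 1  # the side effect we must avoid before locking
                value = "allocated-10.0.0.1"
            self.values[key] = value


def compute_lock_names(node: PreviewNode, kind: str) -> list[str]:
    """Derive lock names from the previewed field values (illustrative format)."""
    return [f"object.{kind}.{key}.{value}" for key, value in sorted(node.values.items())]


# Preview the mutation without side effects, compute locks, then mutate for real.
preview = PreviewNode()
preview.from_graphql({"name": "eth0", "address": "FROM_POOL"}, process_pools=False)
lock_names = compute_lock_names(preview, "InfraInterface")
```

The key property: the preview applies the payload fully enough to derive lock names, yet `allocations` stays at zero, so no pool value is consumed before the lock is held.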


235-258: LGTM! Correct lock and transaction nesting.

The implementation properly:

  • Acquires locks at the outermost scope via InfrahubMultiLock
  • Passes manage_lock=False to the inner mutation to avoid re-acquiring locks (preventing deadlock)
  • Handles both transactional and non-transactional database contexts
  • Maintains consistent lock ordering across the mutation lifecycle

281-356: LGTM! Well-structured mutation orchestration.

The refactored mutate_update_object correctly:

  • Expands the signature to support the unified constraint-and-save flow
  • Includes a comprehensive docstring following Google style (as per coding guidelines)
  • Uses conditional apply_data to support both fresh mutations and preview-based flows
  • Delegates constraint validation and persistence to run_constraints_and_save
  • Properly handles lock computation and profile refresh

The complexity is justified by the need to support multiple mutation paths (direct, preview-based, locked, unlocked) with a single implementation.

backend/infrahub/graphql/mutations/ipam.py (5)

1-29: LGTM!

The import changes correctly add profile tracking (get_profile_ids) and centralized lock computation (get_lock_names_on_object_mutation) utilities, aligning with the broader mutation refactoring.


228-242: LGTM! Consistent nested lock ordering.

The implementation correctly uses nested InfrahubMultiLock contexts with consistent ordering:

  1. Namespace-level lock (outer)
  2. Object-level lock (inner)
  3. Database transaction (innermost)

This pattern, combined with re-entrant locking (per PR objectives), prevents deadlocks while ensuring proper isolation for IPAM reconciliation.
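A sketch of that consistent ordering with a toy named-lock registry; the lock-name strings are illustrative, not Infrahub's actual format:

```python
import asyncio
from contextlib import asynccontextmanager

LOCKS: dict[str, asyncio.Lock] = {}


@asynccontextmanager
async def named_lock(name: str):
    """Toy named-lock registry, standing in for the real lock manager."""
    lock = LOCKS.setdefault(name, asyncio.Lock())
    await lock.acquire()
    try:
        yield
    finally:
        lock.release()


async def reconcile_prefix(namespace_id: str, prefix_id: str, events: list[str]) -> None:
    """Always lock the namespace first, then the object, to keep one global order."""
    async with named_lock(f"ipam.namespace.{namespace_id}"):
        async with named_lock(f"ipam.prefix.{prefix_id}"):
            # Innermost scope is where the database transaction would run.
            events.append(f"reconciled {prefix_id} in {namespace_id}")
```

Because every code path takes the namespace lock before any prefix lock, two concurrent reconciliations can never hold each other's outer lock while waiting on the inner one.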


392-402: Verify resource pool handling to prevent duplicate allocations.

Similar to IPAddress, this code applies GraphQL data directly to the prefix object (line 393) with default process_pools=True before computing lock names. If prefix attributes depend on resource managers, this could trigger allocation before the lock is acquired.

The same mitigation as suggested for IPAddress would apply here: use the preview pattern or verify that IPPrefix doesn't use resource pools.

The verification script from the previous comment will cover both IPAddress and IPPrefix schemas.


479-487: LGTM! Delete operation correctly uses locking.

The delete flow properly acquires the namespace-level lock before starting the transaction and performing IPAM reconciliation, ensuring consistency during concurrent operations.


215-225: No resource pools on IPAddress—no changes needed.

IPAddress objects don’t use any pool attributes (only IPAddressPool nodes are backed by pools), and from_graphql doesn’t support a process_pools flag. The mutation as written won’t trigger duplicate allocations.

Comment on lines +581 to 587
async def from_graphql(
self,
data: dict,
db: InfrahubDatabase,
process_pools: bool = True,
) -> bool:
"""Update attr from GraphQL payload"""
Contributor


🛠️ Refactor suggestion | 🟠 Major

Update docstring to document parameters.

The from_graphql method signature now includes process_pools: bool = True, but the docstring (line 587) is brief and does not document any parameters. Please update the docstring to include an Args section documenting the data, db, and process_pools parameters, as well as a Returns section.

As per coding guidelines

Apply this diff to update the docstring:

     async def from_graphql(
         self,
         data: dict,
         db: InfrahubDatabase,
         process_pools: bool = True,
     ) -> bool:
-        """Update attr from GraphQL payload"""
+        """Update attribute from a GraphQL payload.
+
+        Args:
+            data: GraphQL payload to apply.
+            db: Database connection used for related lookups.
+            process_pools: Whether resource pool allocations should be performed; when False, pool allocation is skipped.
+
+        Returns:
+            True if the attribute changed, otherwise False.
+        """
🤖 Prompt for AI Agents
In backend/infrahub/core/attribute.py around lines 581 to 587, the from_graphql
method docstring is too brief and missing parameter and return documentation;
update the docstring to include an Args section documenting: data (dict) —
GraphQL payload used to update the attribute; db (InfrahubDatabase) — database
instance used for lookups/updates; process_pools (bool, default True) — whether
to process pool-related data; and add a Returns section describing the boolean
return (True on successful update, False otherwise). Keep formatting consistent
with the project's docstring style and place the Args and Returns under the
existing one-line summary.

Comment on lines +260 to +266
async def handle_pool(
self,
db: InfrahubDatabase,
attribute: BaseAttribute,
errors: list,
allocate_resources: bool = True,
) -> None:
Contributor


🛠️ Refactor suggestion | 🟠 Major

Update docstring to document allocate_resources parameter.

The method signature now includes allocate_resources: bool = True, but the docstring (lines 267-271) does not document this new parameter. Please update the docstring to include an Args section with the allocate_resources parameter.

As per coding guidelines

Apply this diff to update the docstring:

     async def handle_pool(
         self,
         db: InfrahubDatabase,
         attribute: BaseAttribute,
         errors: list,
         allocate_resources: bool = True,
     ) -> None:
-        """Evaluate if a resource has been requested from a pool and apply the resource
+        """Evaluate if a resource has been requested from a pool and apply the resource.
+
+        Args:
+            db: Database connection.
+            attribute: Attribute to evaluate for pool allocation.
+            errors: List to append validation errors.
+            allocate_resources: Whether to perform pool allocation; when False, pool allocation is skipped.
 
         This method only works on number pools, currently Integer is the only type that has the from_pool
         within the create code.
         """
🤖 Prompt for AI Agents
In backend/infrahub/core/node/__init__.py around lines 260 to 266, the method
handle_pool now accepts an allocate_resources: bool = True parameter but the
docstring was not updated; add or update the docstring's Args section to include
allocate_resources with a brief description (type bool, default True) explaining
that when True resources will be allocated during handling and when False
allocation is skipped, and ensure the Args list includes the existing parameters
(db, attribute, errors) in the same style and formatting as the project's
docstring conventions.

Comment on lines +697 to +703
async def new(
self,
db: InfrahubDatabase,
id: str | None = None,
process_pools: bool = True,
**kwargs: Any,
) -> Self:
Contributor


🛠️ Refactor suggestion | 🟠 Major

Update docstring to document process_pools parameter.

The new method signature now includes process_pools: bool = True, but there is no docstring present for this method. Please add a docstring that includes an Args section documenting the process_pools parameter.

As per coding guidelines

Apply this diff to add a docstring:

     async def new(
         self,
         db: InfrahubDatabase,
         id: str | None = None,
         process_pools: bool = True,
         **kwargs: Any,
     ) -> Self:
+        """Create a new node instance with the provided data.
+
+        Args:
+            db: Database connection.
+            id: Optional UUID for the node; generated if not provided.
+            process_pools: Whether to perform resource pool allocations; when False, pool allocation is skipped.
+            **kwargs: Field values for the node.
+
+        Returns:
+            The initialized node instance.
+        """
         if id and not is_valid_uuid(id):
🤖 Prompt for AI Agents
In backend/infrahub/core/node/__init__.py around lines 697 to 703, the async
new(...) method lacks a docstring and must document the new process_pools: bool
= True parameter; add a concise docstring above the method including a short
description of the factory behavior and an Args section that documents db, id,
process_pools (explain that process_pools controls whether pool-related
processing is performed, default True), and **kwargs, and mention the return
type in a Returns section (Self or instance). Ensure formatting follows existing
docstring style in the file.

Comment on lines +12 to +35
async def run_constraints_and_save(
node: Node,
node_constraint_runner: NodeConstraintRunner,
fields_to_validate: list[str],
fields_to_save: list[str],
db: InfrahubDatabase,
branch: Branch,
skip_uniqueness_check: bool = False,
lock_names: Sequence[str] | None = None,
manage_lock: bool = True,
) -> None:
"""Validate a node and persist it, optionally reusing an existing lock context.
Args:
node: The node instance to validate and persist.
node_constraint_runner: Runner executing node-level constraints.
fields_to_validate: Field names that must be validated.
fields_to_save: Field names that must be persisted.
db: Database connection or transaction to use for persistence.
branch: Branch associated with the mutation.
skip_uniqueness_check: Whether to skip uniqueness constraints.
lock_names: Precomputed lock identifiers to reuse when ``manage_lock`` is False.
manage_lock: Whether this helper should acquire and release locks itself.
"""
Contributor


🛠️ Refactor suggestion | 🟠 Major

Add Raises section to docstring.

The docstring is missing a Raises section to document the ValueError raised on line 38 when manage_lock is False but lock_names is None.

Apply this diff:

     Args:
         node: The node instance to validate and persist.
         node_constraint_runner: Runner executing node-level constraints.
         fields_to_validate: Field names that must be validated.
         fields_to_save: Field names that must be persisted.
         db: Database connection or transaction to use for persistence.
         branch: Branch associated with the mutation.
         skip_uniqueness_check: Whether to skip uniqueness constraints.
         lock_names: Precomputed lock identifiers to reuse when ``manage_lock`` is False.
         manage_lock: Whether this helper should acquire and release locks itself.
+
+    Raises:
+        ValueError: When manage_lock is False but lock_names is not provided.
     """
🤖 Prompt for AI Agents
In backend/infrahub/core/node/save.py around lines 12 to 35, the function
docstring is missing a Raises section documenting that a ValueError is raised
when manage_lock is False but lock_names is None; add a "Raises" section to the
docstring that specifies ValueError is raised in that exact condition (include
the parameter names and a brief message such as "manage_lock is False but
lock_names is None"), so callers understand the precondition and error behavior.

Previous implementation was wrong due to asyncio context switching.

Signed-off-by: Fatih Acar <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 746a0d1 and 537cc7f.

📒 Files selected for processing (3)
  • backend/infrahub/graphql/mutations/ipam.py (10 hunks)
  • backend/infrahub/lock.py (3 hunks)
  • backend/infrahub/lock_getter.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/ipam.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

**/*.py: Use type hints for all function parameters and return values in Python code
Prefer asynchronous implementations when feasible (use Async whenever possible)
Define asynchronous functions with async def
Use await for asynchronous calls
Use Pydantic models instead of standard library dataclasses
All Python functions must include a docstring
Use Ruff and MyPy for linting and type checking

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Use async def for asynchronous functions
Use await for asynchronous calls within async functions
Use Pydantic models instead of dataclasses for data modeling
All Python functions must have docstrings
Docstrings must use triple quotes (""")
Follow Google-style docstring format
Docstrings should include, when applicable: brief description, detailed description, Args/Parameters (without types), Returns, Raises, and Examples
Validate and lint Python with ruff and mypy
Format Python code using the project’s formatter (invoke format)

Files:

  • backend/infrahub/lock_getter.py
  • backend/infrahub/lock.py
  • backend/infrahub/graphql/mutations/ipam.py
🧬 Code graph analysis (3)
backend/infrahub/lock_getter.py (4)
backend/infrahub/core/schema/schema_branch.py (1)
  • SchemaBranch (72-2314)
backend/infrahub/lock.py (2)
  • build_object_lock_name (283-284)
  • get (254-260)
backend/infrahub/core/relationship/model.py (2)
  • RelationshipManager (718-1242)
  • pop (678-684)
backend/infrahub/core/schema/basenode_schema.py (1)
  • parse_schema_path (426-474)
backend/infrahub/lock.py (1)
backend/infrahub/core/timestamp.py (1)
  • current_timestamp (33-34)
backend/infrahub/graphql/mutations/ipam.py (5)
backend/infrahub/core/node/create.py (1)
  • get_profile_ids (116-120)
backend/infrahub/lock.py (2)
  • InfrahubMultiLock (50-74)
  • build_object_lock_name (283-284)
backend/infrahub/lock_getter.py (1)
  • get_lock_names_on_object_mutation (54-125)
backend/infrahub/core/node/__init__.py (2)
  • Node (72-1062)
  • from_graphql (966-994)
backend/infrahub/graphql/mutations/main.py (1)
  • mutate_update_object (282-356)
🪛 GitHub Actions: CI
backend/infrahub/lock_getter.py

[error] 54-54: Ruff: ARG001 Unused function argument: branch.

🔇 Additional comments (11)
backend/infrahub/lock.py (4)

8-8: LGTM!

The ContextVar import is necessary for implementing re-entrant locking semantics.


136-136: LGTM!

The ContextVar initialization correctly provides per-context recursion tracking with a unique name to avoid collisions between different lock instances.


159-172: LGTM!

The re-entrant acquire logic is correct: it increments the recursion depth if the lock is already held in the current context and returns early without performing a duplicate acquisition. The outermost acquire initializes the depth to 1 and records metrics appropriately.


174-194: LGTM!

The re-entrant release logic correctly handles:

  • Error case: raises RuntimeError if releasing without ownership
  • Nested releases: decrements depth without releasing the underlying lock
  • Final release: records metrics, releases the underlying lock, and clears the recursion context

The implementation ensures that metrics are only recorded for the outermost acquisition/release pair.
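The ContextVar-based re-entrancy described in this comment can be sketched with a plain asyncio.Lock. This is a minimal illustration of the pattern, not Infrahub's actual lock class (which also records metrics and supports distributed backends).

```python
import asyncio
from contextvars import ContextVar


class ReentrantAsyncLock:
    """Asyncio lock that can be re-acquired within the same task context."""

    def __init__(self, name: str) -> None:
        self._lock = asyncio.Lock()
        # Per-context recursion depth; a unique name avoids collisions
        # between different lock instances.
        self._depth: ContextVar[int] = ContextVar(f"lock_depth_{name}", default=0)

    async def acquire(self) -> None:
        depth = self._depth.get()
        if depth > 0:
            # Already held in this context: just track the nesting,
            # do not touch the underlying lock (this avoids self-deadlock).
            self._depth.set(depth + 1)
            return
        await self._lock.acquire()
        self._depth.set(1)

    def release(self) -> None:
        depth = self._depth.get()
        if depth == 0:
            raise RuntimeError("release() called without holding the lock")
        if depth > 1:
            # Nested release: only decrement the depth.
            self._depth.set(depth - 1)
            return
        # Outermost release: clear the context and free the underlying lock.
        self._depth.set(0)
        self._lock.release()
```

Because each asyncio task gets its own copy of the context, the depth counter is naturally scoped to the task that acquired the lock, which is what makes the nested acquire during resource allocation safe within a single request.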

backend/infrahub/lock_getter.py (3)

14-45: LGTM!

The function correctly identifies kinds requiring locks by examining uniqueness constraints on both the node schema and its generics. The logic to remove the node kind when it simply overrides a generic's constraint (lines 40-44) prevents duplicate locking and is well-documented.


48-51: LGTM!

The use of SHA-256 ensures consistent lock names across different processes, which is essential for distributed locking. The comment clearly explains why Python's built-in hash() is unsuitable.
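The cross-process concern is easy to demonstrate: Python's built-in hash() is salted per interpreter (PYTHONHASHSEED), so two workers would compute different lock names for the same value, while a SHA-256 digest is identical everywhere. A hypothetical helper along these lines:

```python
import hashlib


def hash_lock_part(value: str) -> str:
    """Return a stable, process-independent digest for a lock-name component.

    Note: the built-in hash() is randomized per process and therefore
    unsuitable for lock names shared across workers.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
```

Truncating the hex digest keeps lock names short while leaving collisions astronomically unlikely for this purpose.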


81-125: LGTM!

The uniqueness constraint processing correctly:

  • Deduplicates attributes across multiple constraints (lines 96-97)
  • Excludes relationship constraints as intended per PR objectives (lines 100-102)
  • Handles optional/None attributes gracefully (lines 106-109)
  • Skips kinds with no qualifying attributes to prevent malformed lock names (lines 119-120)
  • Builds hierarchical lock names from hashed attribute values (lines 122-123)

The implementation addresses previous review feedback and handles edge cases appropriately.

backend/infrahub/graphql/mutations/ipam.py (4)

131-152: LGTM, pending signature fix.

The refactored mutate_create correctly:

  • Applies @retry_db_transaction decorator for resilience
  • Acquires locks before opening the transaction (lines 145-146)
  • Performs reconciliation within the transaction scope (lines 147-149)

The structure is sound once the signature inconsistency on line 144 is resolved.


292-332: LGTM!

The IPPrefix mutation correctly:

  • Documents the cardinality-one relationship locking requirement (lines 293-295)
  • Acquires namespace-level lock before the transaction (line 325)
  • Performs reconciliation within the transaction scope (lines 327-329)
  • Applies @retry_db_transaction decorator for resilience

The implementation is consistent and well-structured.


366-484: LGTM!

Both mutate_update and mutate_delete correctly implement the centralized locking pattern:

  • mutate_update: nested locks (namespace → object) wrap the transaction and reconciliation (lines 402-417)
  • mutate_delete: namespace lock wraps the transaction and reconciliation (lines 477-481)

The lock acquisition and transaction boundaries are properly aligned, ensuring data consistency during concurrent operations.
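The nested ordering described above (namespace lock, then object lock, then transaction) can be sketched with plain asyncio locks. This is a simplification of InfrahubMultiLock; the key property is that every mutation acquires the locks in the same order, which is what prevents deadlocks between concurrent updates.

```python
import asyncio


async def update_prefix(namespace_lock: asyncio.Lock, object_lock: asyncio.Lock) -> str:
    """Sketch of the nested lock ordering used by the IPAM update mutation.

    All mutation paths must take the namespace lock before the object lock;
    the database transaction and reconciliation run inside both.
    """
    async with namespace_lock:  # outer: namespace-level lock
        async with object_lock:  # inner: object-level lock
            # transaction + reconciliation would happen here
            return "reconciled"
```

If one code path took object-then-namespace while another took namespace-then-object, two concurrent requests could each hold one lock and wait forever on the other; a single global ordering rules that out.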


189-242: Ensure _get_lock_names(namespace_id, branch) signature matches its call. Lock ordering across IPAM update mutations is consistent (namespace → object → transaction).

@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch 3 times, most recently from 763f5cc to 7b6f018 on September 30, 2025 at 20:22
We can enable locking for consistency on branches now that locks are more
granular and avoid a performance penalty.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar fatih-acar force-pushed the lgu-lock-on-attributes branch from 7b6f018 to 3aa9ba8 on September 30, 2025 at 20:24
Labels
group/backend Issue related to the backend (API Server, Git Agent)
3 participants