Contributor

@valeriy42 valeriy42 commented Jul 30, 2025

A follow-up to #131990. This PR ensures that only assigned allocations and not current allocations are used in the memory requirements calculation in AssignmentPlan.

This change simplifies the code in ZoneAwareAssignmentPlanner and TrainedModelAssignmentRebalancer.

This PR also improves readability by adding comments, code documentation, renaming variables, and making the flow of if statements more straightforward.

Marking as a non-issue since the bug was already documented in #131990.

@valeriy42 valeriy42 added >bug :ml Machine learning v9.2.0 labels Jul 30, 2025
@elasticsearchmachine
Collaborator

Hi @valeriy42, I've created a changelog YAML for you.

valeriy42 and others added 4 commits July 30, 2025 12:45
# Conflicts:
#	x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentRebalancer.java
#	x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/planning/AssignmentPlan.java
#	x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/assignment/planning/ZoneAwareAssignmentPlanner.java
@valeriy42 valeriy42 added >non-issue and removed >bug labels Jul 30, 2025
@valeriy42 valeriy42 added v9.1.0 v8.19.1 v9.0.5 v8.18.5 auto-backport Automatically create backport pull requests when merged labels Jul 30, 2025
@valeriy42 valeriy42 changed the title from "[ML] Fix double-accounting for allocations" to "[ML] Make AssignmentPlan to consider only assigned allocations" Jul 30, 2025
Map<AssignmentPlan.Node, Integer> sourceNodeAssignments = source.assignments(deployment).orElse(Map.of());
for (Map.Entry<AssignmentPlan.Node, Integer> sourceAssignment : sourceNodeAssignments.entrySet()) {
    AssignmentPlan.Node node = originalNodeById.get(sourceAssignment.getKey().id());
    dest.assignModelToNode(deployment, node, sourceAssignment.getValue());
Contributor Author

Now, calling assignModelToNode is enough to correctly account for allocated memory.

Member

Please add a unit test for the copyAssignments method in TrainedModelAssignmentRebalancerTests
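
As a rough illustration of what such a test might check, the sketch below models the copy step from the snippet above with hypothetical stand-in types. `CopyAssignmentsSketch` and its nested `Node` are assumptions made for this example and are not Elasticsearch classes; an actual test would exercise `AssignmentPlan` and its builder instead. The idea is that each assignment in the source plan is re-attached to the original node with the same id, keeping the allocation count unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in types for illustration only; not the Elasticsearch API.
final class CopyAssignmentsSketch {

    static final class Node {
        private final String id;
        private final long availableMemoryBytes;

        Node(String id, long availableMemoryBytes) {
            this.id = id;
            this.availableMemoryBytes = availableMemoryBytes;
        }

        String id() {
            return id;
        }
    }

    /** Copies allocation counts from planning nodes onto the original nodes with matching ids. */
    static Map<Node, Integer> copyAssignments(Map<Node, Integer> source, Map<String, Node> originalNodeById) {
        Map<Node, Integer> dest = new HashMap<>();
        for (Map.Entry<Node, Integer> sourceAssignment : source.entrySet()) {
            Node original = originalNodeById.get(sourceAssignment.getKey().id());
            if (original == null) {
                throw new IllegalStateException("unknown node id: " + sourceAssignment.getKey().id());
            }
            dest.put(original, sourceAssignment.getValue());
        }
        return dest;
    }

    public static void main(String[] args) {
        Node original = new Node("n1", 350L);
        Node planningCopy = new Node("n1", 100L); // same id, different instance
        Map<Node, Integer> copied = copyAssignments(Map.of(planningCopy, 2), Map.of("n1", original));
        System.out.println(copied.get(original)); // prints 2
    }
}
```

A unit test along these lines would assert that the destination contains the original node rather than the planning copy, and that the allocation count survives the copy.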

);
planBuilder.accountMemory(m, originalNode, requiredMemory);
}
finalPlanBuilder.assignModelToNode(originalDeployment, originalNode, assignment.getValue());
Contributor Author

Now, calling assignModelToNode is enough to correctly account for allocated memory.

Member

Please make AssignmentPlan::accountMemory() a private method so it is not accidentally called by one of the planners
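
The encapsulation being requested could look roughly like the following. This is a minimal sketch with a hypothetical `PlanBuilderSketch` class, assuming a much simpler memory model than the real `AssignmentPlan.Builder`; the method signatures here are invented for the example. Memory is deducted only inside `assignModelToNode`, and the accounting helper is private, so a planner cannot deduct memory a second time by mistake.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified builder; the real AssignmentPlan.Builder differs.
final class PlanBuilderSketch {
    private final Map<String, Long> remainingMemory = new HashMap<>();
    private final Map<String, Integer> remainingCores = new HashMap<>();

    void addNode(String nodeId, long memoryBytes, int cores) {
        remainingMemory.put(nodeId, memoryBytes);
        remainingCores.put(nodeId, cores);
    }

    /** Single entry point: assigning allocations also accounts for their memory. */
    void assignModelToNode(String nodeId, long requiredMemoryBytes, int allocations, int threadsPerAllocation) {
        accountMemory(nodeId, requiredMemoryBytes);
        remainingCores.merge(nodeId, -allocations * threadsPerAllocation, Integer::sum);
    }

    // Private so callers outside the builder cannot double-account memory.
    private void accountMemory(String nodeId, long requiredMemoryBytes) {
        Long current = remainingMemory.get(nodeId);
        if (current == null) {
            throw new IllegalArgumentException("unknown node: " + nodeId);
        }
        if (current < requiredMemoryBytes) {
            throw new IllegalArgumentException("not enough memory on node: " + nodeId);
        }
        remainingMemory.put(nodeId, current - requiredMemoryBytes);
    }

    long getRemainingMemory(String nodeId) {
        return remainingMemory.get(nodeId);
    }

    int getRemainingCores(String nodeId) {
        return remainingCores.get(nodeId);
    }

    public static void main(String[] args) {
        PlanBuilderSketch builder = new PlanBuilderSketch();
        builder.addNode("n", 350L, 4);
        builder.assignModelToNode("n", 300L, 1, 2);
        System.out.println(builder.getRemainingMemory("n")); // prints 50
        System.out.println(builder.getRemainingCores("n"));  // prints 2
    }
}
```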

@valeriy42 valeriy42 requested a review from Copilot July 30, 2025 13:45
@valeriy42 valeriy42 marked this pull request as ready for review July 30, 2025 13:45
@valeriy42 valeriy42 requested a review from jan-elastic July 30, 2025 13:45
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes the AssignmentPlan to consider only assigned allocations (not current allocations) in memory requirement calculations. The main change eliminates the need for manual memory accounting of current allocations since AssignmentPlan.Builder now properly calculates memory requirements based only on newly assigned allocations.

Key changes:

  • Memory calculation logic updated to exclude current allocations from requirements
  • Simplified code in ZoneAwareAssignmentPlanner and TrainedModelAssignmentRebalancer by removing manual memory accounting
  • Improved code readability with better documentation and variable naming

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| AssignmentPlanTests.java | Updated test expectations to reflect corrected memory calculations |
| ZoneAwareAssignmentPlanner.java | Removed manual memory accounting for current allocations and improved documentation |
| AssignmentPlanner.java | Enhanced documentation and improved code flow with clearer variable names |
| AssignmentPlan.java | Added comprehensive documentation and simplified allocation tracking methods |
| TrainedModelAssignmentRebalancer.java | Removed manual memory accounting logic and improved variable naming |

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jul 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

valeriy42 and others added 2 commits July 30, 2025 15:52
…rence/assignment/planning/AssignmentPlan.java

Co-authored-by: Copilot <[email protected]>
…rence/assignment/planning/AssignmentPlanner.java

Co-authored-by: Copilot <[email protected]>
@@ -131,7 +132,7 @@ public void testAssignModelToNode_GivenNewPlanSatisfiesCurrentAssignment() {
         builder.assignModelToNode(m, n, 1);
 
         assertThat(builder.getRemainingCores(n), equalTo(2));
-        assertThat(builder.getRemainingMemory(n), equalTo(ByteSizeValue.ofMb(350).getBytes()));
+        assertThat(builder.getRemainingMemory(n), equalTo(ByteSizeValue.ofMb(50).getBytes()));
Contributor

Why 50?

If I understand correctly, the node has 350MB and you've assigned a 30MB deployment to it?

Contributor

I see it consists of:

  • 240MB of memory overhead
  • 2 x 30MB (=model size)

These tests could use some explanation of the numbers.
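
The arithmetic behind the expected value can be spelled out as a small worked example. It takes the reviewer's figures at face value (a 350MB node, 240MB of overhead, and twice the 30MB model size); the helper class `RemainingMemorySketch` is hypothetical and does not reproduce the actual memory estimation in Elasticsearch.

```java
// Hypothetical helper that reproduces the reviewer's breakdown; not Elasticsearch code.
final class RemainingMemorySketch {
    static final long MB = 1024L * 1024L;

    /** Remaining bytes after subtracting a fixed overhead plus a multiple of the model size. */
    static long remainingBytes(long nodeBytes, long overheadBytes, long modelBytes, int modelSizeMultiplier) {
        return nodeBytes - (overheadBytes + modelSizeMultiplier * modelBytes);
    }

    public static void main(String[] args) {
        // 350 - (240 + 2 * 30) = 50
        long remaining = remainingBytes(350 * MB, 240 * MB, 30 * MB, 2);
        System.out.println(remaining / MB + " MB"); // prints "50 MB"
    }
}
```

A comment of this kind next to the assertion would make the test's numbers easier to follow.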

Contributor

@jan-elastic jan-elastic left a comment

LGTM, see the one comment.

(Disclaimer: this code is pretty hard to read, so not 100% sure. Another pair of eyes might be worth it...)

@valeriy42 valeriy42 requested a review from davidkyle July 30, 2025 14:32
@valeriy42 valeriy42 merged commit 80c47f3 into elastic:main Jul 31, 2025
33 checks passed
@valeriy42 valeriy42 deleted the bug/allocations-5 branch July 31, 2025 14:25
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
| --- | --- | --- |
| | 9.1 | |
| | 8.19 | Commit could not be cherrypicked due to conflicts |
| | 9.0 | |
| | 8.18 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 132170

valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Jul 31, 2025
…ic#132170)

valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Jul 31, 2025
…ic#132170)

@valeriy42
Contributor Author

💚 All backports created successfully

| Status | Branch | Result |
| --- | --- | --- |
| | 8.19 | |
| | 8.18 | |

Questions?

Please refer to the Backport tool documentation

valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Jul 31, 2025
…ic#132170)

(cherry picked from commit 80c47f3)
valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Jul 31, 2025
…ic#132170)

(cherry picked from commit 80c47f3)
elasticsearchmachine pushed a commit that referenced this pull request Jul 31, 2025
…) (#132271)

elasticsearchmachine pushed a commit that referenced this pull request Jul 31, 2025
…) (#132270)

smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Jul 31, 2025
…ic#132170)

elasticsearchmachine pushed a commit that referenced this pull request Aug 1, 2025
…#132170) (#132275)

* [ML] Make AssignmentPlan to consider only assigned allocations (#132170)

(cherry picked from commit 80c47f3)

* Fix backport errors

* Fix unit test
Labels
auto-backport Automatically create backport pull requests when merged backport pending :ml Machine learning >non-issue Team:ML Meta label for the ML team v8.18.5 v8.19.1 v9.0.5 v9.1.0 v9.2.0
4 participants