
Conversation

ChrisHegarty
Contributor

This commit upgrades hppc to 0.9.1.

The motivation for this upgrade is that the yet-to-be-released Lucene 9.11 has a new dependency from the org.apache.lucene.join module to hppc, and that dependency uses the module name com.carrotsearch.hppc. hppc has added an explicit Automatic-Module-Name entry to its manifest, which effectively changes the automatic module name from the plain hppc to com.carrotsearch.hppc. Without this change, the org.apache.lucene.join module will fail to resolve during startup when we upgrade to Lucene 9.11.
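To make the module-name mismatch concrete, here is a rough, hypothetical sketch of the kind of module declaration involved (the actual org.apache.lucene.join declaration in Lucene may differ; the requires on org.apache.lucene.core is assumed here purely for illustration):

// Illustrative module-info.java for org.apache.lucene.join (sketch only).
// The requires directive must match hppc's module name. With the
// Automatic-Module-Name: com.carrotsearch.hppc manifest entry in hppc 0.9.1,
// that name is com.carrotsearch.hppc; the older jar has no such entry, so its
// automatic module name is derived from the jar filename (plain hppc) and the
// directive below cannot be satisfied, failing module resolution at startup.
module org.apache.lucene.join {
    requires org.apache.lucene.core; // assumed here for illustration
    requires com.carrotsearch.hppc;
}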

@ChrisHegarty ChrisHegarty added :Core/Infra/Core Core issues without another label >upgrade Team:Core/Infra Meta label for core/infra team v8.15.0 labels May 24, 2024
@ChrisHegarty ChrisHegarty requested a review from a team as a code owner May 24, 2024 11:04
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Collaborator

Hi @ChrisHegarty, I've created a changelog YAML for you.

@ChrisHegarty
Contributor Author

Linking the Lucene change that introduced this dependency: apache/lucene#13392

@ChrisHegarty
Contributor Author

A number of failures are being triggered in CI, likely all with the same root cause:

./gradlew ':x-pack:qa:core-rest-tests-with-security:yamlRestTest' --tests "org.elasticsearch.xpack.security.CoreWithSecurityClientYamlTestSuiteIT.test {yaml=msearch/10_basic/Search with new response format}" -Dtests.seed=42F259C5B6119E57 -Dtests.locale=ga-IE -Dtests.timezone=Pacific/Apia -Druntime.java=22
[2024-05-25T00:05:25,427][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-0] fatal error in thread [elasticsearch[test-cluster-0][masterService#updateTask][T#1]], exiting java.lang.AssertionError: RoutingNodes [routing_nodes:
-----node_id[CqzlFnXeQL-S7taJxG6h_A][V]
--------[index_1][0], node[CqzlFnXeQL-S7taJxG6h_A], [P], s[STARTED], a[id=Dcm-EX5LSGu4NEnsorI8kw], failed_attempts[0]
--------[index_2][0], node[CqzlFnXeQL-S7taJxG6h_A], [P], recovery_source[new shard recovery], s[INITIALIZING], a[id=ItkC4QfoTXqPuEUB-Z5x_g], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:25.356Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
---- unassigned
--------[index_1][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:24.902Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
--------[index_2][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:25.356Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
] are not consistent with this cluster state [routing_nodes:
-----node_id[CqzlFnXeQL-S7taJxG6h_A][V]
--------[index_2][0], node[CqzlFnXeQL-S7taJxG6h_A], [P], recovery_source[new shard recovery], s[INITIALIZING], a[id=ItkC4QfoTXqPuEUB-Z5x_g], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:25.356Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
--------[index_1][0], node[CqzlFnXeQL-S7taJxG6h_A], [P], s[STARTED], a[id=Dcm-EX5LSGu4NEnsorI8kw], failed_attempts[0]
---- unassigned
--------[index_2][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:25.356Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
--------[index_1][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2024-05-24T11:05:24.902Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]
]
	at [email protected]/org.elasticsearch.cluster.ClusterState.assertConsistentRoutingNodes(ClusterState.java:246)
	at [email protected]/org.elasticsearch.cluster.ClusterState.<init>(ClusterState.java:231)
	at [email protected]/org.elasticsearch.cluster.ClusterState$Builder.build(ClusterState.java:962)
	at [email protected]/org.elasticsearch.cluster.service.MasterService.patchVersions(MasterService.java:516)
	at [email protected]/org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:231)
	at [email protected]/org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1656)
	at [email protected]/org.elasticsearch.action.ActionListener.run(ActionListener.java:433)
	at [email protected]/org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1653)
	at [email protected]/org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1248)
	at [email protected]/org.elasticsearch.action.ActionListener.run(ActionListener.java:433)
	at [email protected]/org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1227)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)

Hmm... My initial intuition here is that the assertion/equality is dependent on list/iteration order, or something similar, that may have changed in the hppc implementation.
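To illustrate that intuition, here is a minimal hypothetical sketch (not Elasticsearch code), assuming hppc's IntHashSet/IntCursor API: an equality check that walks two hash containers in iteration order holds or fails depending on the containers' internal ordering, which can change between hppc releases.

import com.carrotsearch.hppc.IntHashSet;
import com.carrotsearch.hppc.cursors.IntCursor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an order-sensitive comparison: it collects
// elements in iteration order and compares the resulting lists, so the result
// depends on the containers' internal ordering rather than on their contents.
class IterationOrderEquality {

    static boolean equalInIterationOrder(IntHashSet a, IntHashSet b) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (IntCursor c : a) {
            left.add(c.value); // iteration order is an implementation detail of the hash container
        }
        for (IntCursor c : b) {
            right.add(c.value);
        }
        return left.equals(right); // may flip after an hppc upgrade even with identical contents
    }

    public static void main(String[] args) {
        IntHashSet a = IntHashSet.from(1, 2, 3);
        IntHashSet b = IntHashSet.from(3, 2, 1);
        // Same elements; whether this prints true depends on internal ordering.
        System.out.println(equalInIterationOrder(a, b));
    }
}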

@rjernst
Member

rjernst commented May 24, 2024

I just closed the same PR that I've been off and on trying to get in for 2 years! :)
#84168

I think the issue is inconsistency in entry ordering. It seemed quite pervasive, which is why I eventually gave up.

@ChrisHegarty
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Contributor Author

OK, so there is more than just an issue with UnassignedShards. There seem to be other areas that depend upon ordering, as can be seen by subsequent test failures (even after working around the equality in UnassignedShards).
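For reference, the order-insensitive shape of such a check (again a hypothetical sketch, not the actual UnassignedShards code) compares size plus membership instead of iteration order:

import com.carrotsearch.hppc.IntHashSet;
import com.carrotsearch.hppc.cursors.IntCursor;

// Hypothetical sketch of an order-insensitive equality check: size plus
// membership, so the result no longer depends on hppc's iteration order.
class OrderInsensitiveEquality {

    static boolean equalAsSets(IntHashSet a, IntHashSet b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (IntCursor c : a) {
            if (!b.contains(c.value)) {
                return false;
            }
        }
        return true;
    }
}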

@ChrisHegarty
Contributor Author

Lucene has removed its dependency on hppc completely, so the upgrade to 0.9.1 is no longer required to facilitate the Lucene upgrade.
