
Conversation

@Pulkitg64 (Contributor) commented Jul 29, 2025

Description

This is a draft PR to optimize HNSW graph merging during singleton merges. When merging a single segment with deletions, the current implementation reconstructs the entire graph with only live nodes, which is a time-consuming process. This PR avoids full graph reconstruction by dropping deleted nodes and renumbering the remaining live nodes.
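
To illustrate the renumbering idea, here is a minimal sketch (not the actual patch; `liveOrds` and the method name are hypothetical) of how live graph ordinals could be compacted:

```java
import org.apache.lucene.util.Bits;

// Hypothetical sketch: build a compact old-ordinal -> new-ordinal map that skips
// deleted nodes; -1 marks a dropped node.
static int[] buildOrdMap(Bits liveOrds, int oldGraphSize) {
  int[] oldToNewOrd = new int[oldGraphSize];
  int next = 0;
  for (int oldOrd = 0; oldOrd < oldGraphSize; oldOrd++) {
    oldToNewOrd[oldOrd] = liveOrds.get(oldOrd) ? next++ : -1;
  }
  return oldToNewOrd;
}
```

The neighbor lists of the surviving nodes are then rewritten through this map, and edges that point at deleted nodes are dropped.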

TODOs:

  • Add specific unit tests
  • Benchmarks (luceneutil)

@Pulkitg64 Pulkitg64 changed the title Avoid reconstructing HNSW graph during singleton merging Avoid reconstructing HNSW graph during singleton merges Jul 29, 2025
github-actions bot

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz (Contributor) commented Aug 1, 2025

I don't feel qualified to do the review, but I agree with the motivation. I wonder if this optimization could also be applied when there is more than one segment to merge, by first applying deletions to the biggest segment and then adding vectors from the other segments?

@msokolov (Contributor) left a comment

This seems like a promising direction! I left a bunch of comments. My main one is about whether we should do this on-heap to make it more flexible (e.g. so we could use it when merging multiple graphs, too).

@@ -69,6 +70,8 @@ public final class Lucene99HnswVectorsWriter extends KnnVectorsWriter {

private static final long SHALLOW_RAM_BYTES_USED =
RamUsageEstimator.shallowSizeOfInstance(Lucene99HnswVectorsWriter.class);
static final int DELETE_THRESHOLD_PERCENT = 30;
Contributor

I'm curious whether we have done any testing to motivate this choice? I guess as the number of gaps left in the neighborhoods by removing the deleted nodes increases, we would expect to see a drop-off in recall, or maybe performance? But I don't have a good intuition about whether there is a knee in the curve, or how strong the effect is.

* @throws IOException If an error occurs while writing to the vector index
*/
private HnswGraph deleteNodesWriteGraph(
Lucene99HnswVectorsReader.OffHeapHnswGraph graph,
Contributor

Could we change the signature to accept an HnswGraph?

// Count and collect valid nodes
int validNodeCount = 0;
for (int node : sortedNodes) {
if (docMap.get(node) != -1) {
Contributor

We might be able to pass in the size of the new graph? At least in the main case of merging we should know it (I think?).

}

// Special case for top level with no valid nodes
if (level == numLevels - 1 && validNodeCount == 0 && level > 0) {
Contributor

If level == 0 and validNodeCount == 0, the new graph should be empty. I'm not sure how that case will get handled here?

Contributor

In this case (the top level being empty), isn't it also possible that a lower level is empty?

Contributor Author

> If level == 0 and validNodeCount == 0, the new graph should be empty. I'm not sure how that case will get handled here?

That means 100% of the nodes are deleted, right? I think we will never reach this case, since the entry condition for this function checks that deletes are less than 30%.
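
For reference, a minimal sketch of that entry condition (assuming the DELETE_THRESHOLD_PERCENT = 30 constant from the diff above; the surrounding variable names are made up):

```java
// Hypothetical guard: only take the drop-and-renumber path when the delete
// percentage is below the threshold; otherwise rebuild the graph from scratch.
int deletedDocCount = segmentMaxDoc - liveDocCount; // assumed inputs
long deletePercent = 100L * deletedDocCount / segmentMaxDoc;
if (deletePercent >= DELETE_THRESHOLD_PERCENT) {
  // too many deletes: fall back to full graph reconstruction
} else {
  // safe to drop deleted nodes and renumber the survivors
}
```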

validNodeCount = 1; // We'll create one connection to lower level
}

validNodesPerLevel[level] = new int[validNodeCount];
Contributor

I wonder if we could avoid the up-front counting, allocate a full-sized array, and then use only the part of it that we fill up.
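
Something along these lines, reusing the variables from the snippet above and trimming with org.apache.lucene.util.ArrayUtil.copyOfSubArray (a sketch only):

```java
// Sketch: skip the up-front counting pass by over-allocating to the graph size
// and trimming afterwards to the portion actually filled.
int[] valid = new int[graph.size()]; // upper bound: every node survives
int validNodeCount = 0;
for (int node : sortedNodes) {
  if (docMap.get(node) != -1) {      // same liveness test as in the diff above
    valid[validNodeCount++] = node;
  }
}
validNodesPerLevel[level] = ArrayUtil.copyOfSubArray(valid, 0, validNodeCount);
```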

Contributor Author

Sure

Math.toIntExact(vectorIndex.getFilePointer() - offsetStart);
}

// Special case for empty top level
Contributor

Perhaps we should special-case the first empty level we find and make that the top level, unless it is the bottom level, in which case the whole graph is empty.


/** Writes neighbors with delta encoding to the vector index. */
private void writeNeighbors(
Lucene99HnswVectorsReader.OffHeapHnswGraph graph,
Contributor

Can we delegate to an existing method (maybe with a refactor) to ensure we write in the same format? E.g. what if we switch to GroupVarInt encoding? We want to make sure this method tracks that change.

Contributor Author

Sure.

@Override
public void mergeOneField(FieldInfo fieldInfo, MergeState mergeState) throws IOException {
CloseableRandomVectorScorerSupplier scorerSupplier =
flatVectorWriter.mergeOneFieldToIndex(fieldInfo, mergeState);
try {
long vectorIndexOffset = vectorIndex.getFilePointer();

if (mergeState.liveDocs.length == 1
Contributor

Have you seen IncrementalHnswGraphMerge and MergingHnswGraphBuilder? They select the biggest graph with no deletions and merge the other segments' graphs into it. Could we expose a utility method here for rewriting a graph (in memory) to drop deletions, and then use it there?

Here we are somewhat mixing the on-disk graph format with the logic of dropping deleted nodes, which I think we could abstract out into the util.hnsw realm?
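
To make the suggestion concrete, here is a rough sketch of what such a util.hnsw helper could look like, written against the HnswGraph read API (numLevels/getNodesOnLevel/seek/nextNeighbor) as I understand it; the plain adjacency-list output and the oldToNewOrd array are placeholders for whatever structure (e.g. an OnHeapHnswGraph) the real utility would build:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.hnsw.HnswGraph;

/** Illustrative only: rewrite a graph in memory, dropping deleted ordinals. */
final class DropDeletedNodesSketch {

  /** oldToNewOrd[oldOrd] == -1 marks a deleted node, otherwise its compacted ordinal. */
  static List<int[][]> rewrite(HnswGraph graph, int[] oldToNewOrd) throws IOException {
    List<int[][]> levels = new ArrayList<>();
    for (int level = 0; level < graph.numLevels(); level++) {
      List<int[]> nodesOnLevel = new ArrayList<>();
      HnswGraph.NodesIterator it = graph.getNodesOnLevel(level);
      while (it.hasNext()) {
        int oldOrd = it.nextInt();
        if (oldToNewOrd[oldOrd] == -1) {
          continue; // deleted node: drop it and all of its edges
        }
        // Keep only the surviving neighbors, remapped into the new ordinal space.
        graph.seek(level, oldOrd);
        List<Integer> kept = new ArrayList<>();
        for (int nbr = graph.nextNeighbor();
            nbr != DocIdSetIterator.NO_MORE_DOCS;
            nbr = graph.nextNeighbor()) {
          if (oldToNewOrd[nbr] != -1) {
            kept.add(oldToNewOrd[nbr]);
          }
        }
        nodesOnLevel.add(kept.stream().mapToInt(Integer::intValue).toArray());
      }
      // A real utility would also record which new ordinals live on each level and
      // pick a new entry node once an upper level turns out to be empty.
      levels.add(nodesOnLevel.toArray(new int[0][]));
    }
    return levels;
  }
}
```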

Contributor Author

Just saw that class. I think this is a good idea. Will do it in the next revision.

@Pulkitg64 (Contributor Author)

> I wonder if this optimization could also be applied when there is more than one segment to merge, by first applying deletions to the biggest segment and then adding vectors from the other segments?

@jpountz Yes, good idea, let me try doing that in this PR itself.

> This seems like a promising direction! I left a bunch of comments. My main one is about whether we should do this on-heap to make it more flexible (e.g. so we could use it when merging multiple graphs, too).

Thanks @msokolov. Yes, I think that would be the best way forward for this optimization. Working on it.


// Process nodes at this level
for (int node : sortedNodes) {
if (docMap.get(node) == -1) {
@Pulkitg64 (Contributor Author) Aug 5, 2025

This is incorrect. The graph does not store doc IDs; it stores ordinals, whereas docMap maps old doc IDs to new doc IDs.
The correct implementation is to create a map which maps old ordinals to new ordinals.

Will fix this in the next revision.

github-actions bot commented Aug 9, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@Pulkitg64 (Contributor Author) commented Aug 11, 2025

The failing test runs fine on my macOS desktop, and I have not changed anything in the related classes. Even with the same failing seed I am unable to reproduce the issue. Not sure why this test is failing in the CI check.

TestBPReorderingMergePolicy > testReorderOnAddIndexes FAILED
    java.lang.AssertionError: Called on the wrong instance
        at org.apache.lucene.tests.codecs.asserting.AssertingKnnVectorsFormat$AssertingKnnVectorsReader.getFloatVectorValues(AssertingKnnVectorsFormat.java:140)
        at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.getFloatVectorValues(PerFieldKnnVectorsFormat.java:289)
        at org.apache.lucene.index.CodecReader.getFloatVectorValues(CodecReader.java:244)
        at org.apache.lucene.index.SlowCompositeCodecReaderWrapper$SlowCompositeKnnVectorsReaderWrapper.getFloatVectorValues(SlowCompositeCodecReaderWrapper.java:842)
        at org.apache.lucene.index.CodecReader.getFloatVectorValues(CodecReader.java:244)
        at org.apache.lucene.misc.index.BpVectorReorderer.computeDocMap(BpVectorReorderer.java:590)
        at org.apache.lucene.misc.index.BPReorderingMergePolicy$1.reorder(BPReorderingMergePolicy.java:138)
        at org.apache.lucene.index.IndexWriter.addIndexesReaderMerge(IndexWriter.java:3426)
        at org.apache.lucene.index.IndexWriter$AddIndexesMergeSource.merge(IndexWriter.java:3334)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:664)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:726)

@Pulkitg64 Pulkitg64 changed the title Avoid reconstructing HNSW graph during singleton merges Avoid reconstructing HNSW graphs during segment merging. Aug 11, 2025
@Pulkitg64 (Contributor Author)

I created a pull request against my own repository, and the tests pass there: Pulkitg64#1

This issue looks transient; the next commit should fix it.

@Pulkitg64 (Contributor Author) commented Aug 13, 2025

Adding some KnnPerfTestResults, where I tried to simulate deletes while indexing docs. We are seeing a consistent improvement in indexing time and indexing rate (except one odd case where we deleted 40% of the docs) without impacting recall.

Num Docs: 1MM
Max-Conn: 32
Beam-Width: 250
Quantize Bits: 32
Topk: 100

| % Deletes | Baseline Recall | Baseline Indexing Time (s) | Baseline Indexing Rate (docs/s) | Candidate Recall | Candidate Indexing Time (s) | Candidate Indexing Rate (docs/s) | % Change Indexing Time | % Change Indexing Rate |
|---|---|---|---|---|---|---|---|---|
| 25 | 0.952 | 692 | 1443 | 0.955 | 576 | 1734 | -17% | 20% |
| 30 | 0.952 | 581 | 1719 | 0.958 | 517 | 1932 | -11% | 12% |
| 40 | 0.951 | 560 | 1782 | 0.945 | 553 | 1805 | -1% | 1% |
| 50 | 0.96 | 446 | 2241 | 0.953 | 421 | 2371 | -6% | 6% |
| 60 | 0.974 | 234 | 4265 | 0.972 | 208 | 4804 | -11% | 13% |

@msokolov (Contributor)

I am confused! This PR suddenly got so much simpler, which is great, but I feel like it dropped a few things that seemed important. E.g. we are no longer checking the largest graph to see if its delete % is below a threshold? Also, I think we are now ignoring the various edge cases around upper-level graph layers possibly becoming empty?

@Pulkitg64 (Contributor Author)

With maxConn = 16, I am seeing much better results. But in one odd case, with 25% deletes, I am seeing a regression in indexing rate. Trying maxConn = 8 in the next benchmark run.

| % Deletes | Baseline Recall | Baseline Indexing Time (s) | Baseline Indexing Rate (docs/s) | Candidate Recall | Candidate Indexing Time (s) | Candidate Indexing Rate (docs/s) | % Change Indexing Time | % Change Indexing Rate |
|---|---|---|---|---|---|---|---|---|
| 25 | 0.922 | 453 | 2205 | 0.914 | 484 | 2063 | 7% | -6% |
| 30 | 0.918 | 470 | 2125 | 0.94 | 279 | 3581 | -41% | 69% |
| 40 | 0.903 | 494 | 2022 | 0.942 | 258 | 3867 | -48% | 91% |
| 50 | 0.915 | 421 | 2372 | 0.946 | 223 | 4466 | -47% | 88% |
| 60 | 0.934 | 301 | 3303 | 0.947 | 214 | 4658 | -29% | 41% |

@benwtrent (Member)

@Pulkitg64 what exactly are you benchmarking? It seems like the latest version of this PR does nothing to actually correct the graph nodes?

We should handle:

  • If layers get completely removed (do we promote new nodes?)
  • Removing deleted nodes and reconnecting the neighbors to their nearest non-deleted nodes
  • Completely throwing away the graph if the deletion percentage is above a certain threshold (the first commit of this PR had that at 30%; I think it could maybe be as high as 50%).

@benwtrent (Member)

Ah, maybe I don't fully grok the current impl. It seems like it's doing the "largest graph" thing, but now it's more clever and doing the initialized-graph thing, and that is where the deletes are being removed?

@Pulkitg64 (Contributor Author)

> I am confused! This PR suddenly got so much simpler, which is great,

Yeah, the initGraph implementation in InitializedHnswGraphBuilder.java simplifies a lot of things for us, since it already supports creating an OnHeapHnswGraph by passing in the off-heap graph from the older segment.

> we are no longer checking the largest graph to see if its delete % is below a threshold?

Yes, in the first revision I added an arbitrary percentage without doing any testing. But this time I wanted to see the impact of the merge policy that kicks in when the delete % is higher than a certain threshold. I thought we may not need an explicit check of the largest graph's delete %, because the merge policy will automatically take care of this.

> Also, I think we are now ignoring the various edge cases around upper-level graph layers possibly becoming empty?

The initGraph implementation takes care of it. In that implementation we start with the top level, and if there is no live node in that level the new entry node is never set; when we iterate down to the next level that has some live nodes, we set the new entry node there. In this way we remove the risk of an empty upper layer in the graph. On the other hand, there is still a risk of a middle layer being completely deleted, which I believe we need to take care of.
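
Roughly speaking, the behavior described above is something like the following schematic (not the actual InitializedHnswGraphBuilder code; oldGraph and oldToNewOrd are assumed to be available):

```java
// Walk levels from the top down; the first live node encountered becomes the new
// entry node, so a fully deleted top layer is effectively skipped rather than
// being left empty in the new graph.
int newEntryNode = -1;
for (int level = oldGraph.numLevels() - 1; level >= 0 && newEntryNode == -1; level--) {
  HnswGraph.NodesIterator it = oldGraph.getNodesOnLevel(level);
  while (it.hasNext()) {
    int oldOrd = it.nextInt();
    if (oldToNewOrd[oldOrd] != -1) { // first live node seen top-down
      newEntryNode = oldToNewOrd[oldOrd];
      break;
    }
  }
}
```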

@Pulkitg64 (Contributor Author)

> Ah, maybe I don't fully grok the current impl. It seems like it's doing the "largest graph" thing, but now it's more clever and doing the initialized-graph thing, and that is where the deletes are being removed?

That's right @benwtrent, we are skipping deleted nodes from the largest graph in the initGraph implementation.

@benwtrent (Member)

@Pulkitg64 pretty damn clever ;). I gotta think through this. Intuitively, it SHOULD work, even for singleton merges

@msokolov (Contributor)

It's fascinating that we actually see recall improving in many cases! Intuitively, I think when we merge more segments in we have an opportunity to patch up the holes left by the deleted docs, and maybe we somehow end up doing that in an even better way the second time around?

I do wonder what recall will look like for graphs with high deletion rates that are singleton-merged only? I wonder if we could test that with luceneutil by creating a single-segment index (with force-merge), deleting 50% of the docs, and then force-merging again?

@Pulkitg64 (Contributor Author)

Based on @msokolov's suggestion, I ran the benchmarks simulating singleton merging. For this I indexed 1M docs, force-merged the segments, deleted documents, and then force-merged the segment again.

I am seeing a consistent improvement (about a 50x speedup) in force-merge time after deletes, but also a degradation in recall (about 10%). It's probably because of the disconnectedness issue (let me try to measure the connectedness of these graphs as well).

| Delete Pct | Baseline Recall | Baseline Force Merge Time (s) | Candidate Recall | Candidate Force Merge Time (s) | Recall Change | Force Merge Speedup |
|---|---|---|---|---|---|---|
| 50% | 0.892 | 417.52 | 0.763 | 8.43 | -14% | 50x |
| 40% | 0.887 | 505.74 | 0.799 | 9.91 | -10% | 50x |
| 30% | 0.88 | 585 | 0.822 | 10.98 | -7% | 53x |
| 20% | 0.878 | 677 | 0.802 | 12.4 | -9% | 54x |
| 10% | 0.874 | 772.42 | 0.856 | 13.5 | -2% | 59x |

@benwtrent (Member)

> It's probably because of the disconnectedness issue (let me try to measure the connectedness of these graphs as well).

I would think so. My gut is that we don't actually go through and "fixup" anything when there is just one graph. We just pick the biggest one, and since there are no more vectors to add, we just drop connections on the ground.

I would expect us to have to iterate through the graph and, for every vector that is significantly disconnected, attempt to reconnect it with NN-descent starting at its original place in the graph (initializing with its neighbors' neighbors if all of its connections were removed).
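
If we go that route, the repair pass might look roughly like this. It is purely illustrative: it works on plain adjacency lists and raw vectors rather than the real graph classes, only handles fully orphaned nodes, and does a single neighbor-of-neighbor seeding pass rather than full NN-descent:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative repair pass over level 0 for nodes orphaned by dropping deletes. */
final class ReconnectSketch {

  /**
   * @param oldNeighbors level-0 adjacency of the pre-delete graph, by old ordinal
   * @param newNeighbors level-0 adjacency after deletes were dropped, by old ordinal;
   *     null for deleted nodes
   * @param vectors the vectors, by old ordinal (assumed normalized, so dot = similarity)
   * @param maxConn maximum number of connections to restore per orphaned node
   */
  static void reconnectOrphans(
      int[][] oldNeighbors, List<List<Integer>> newNeighbors, float[][] vectors, int maxConn) {
    for (int node = 0; node < oldNeighbors.length; node++) {
      List<Integer> current = newNeighbors.get(node);
      if (current == null || !current.isEmpty()) {
        continue; // node was deleted, or still has connections
      }
      // Seed candidates from the surviving neighbors of this node's former neighbors.
      Set<Integer> candidates = new LinkedHashSet<>();
      for (int formerNbr : oldNeighbors[node]) {
        if (newNeighbors.get(formerNbr) == null) { // former neighbor was deleted
          for (int c : oldNeighbors[formerNbr]) {
            if (c != node && newNeighbors.get(c) != null) {
              candidates.add(c);
            }
          }
        } else {
          candidates.add(formerNbr); // former neighbor survived, reconnect directly
        }
      }
      // Keep the maxConn most similar candidates.
      final int target = node;
      List<Integer> ranked = new ArrayList<>(candidates);
      ranked.sort(Comparator.comparingDouble(c -> -dotProduct(vectors[target], vectors[c])));
      for (int i = 0; i < Math.min(maxConn, ranked.size()); i++) {
        current.add(ranked.get(i));
      }
    }
  }

  private static float dotProduct(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

A real implementation would presumably run more rounds (NN-descent style) and also keep the links symmetric and diverse, the way the graph builder does.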

@Pulkitg64 (Contributor Author) commented Sep 2, 2025

Thanks @benwtrent for the suggestion. For now, I am thinking that we can keep a threshold of 10% deletes, i.e. we will only reuse the graph (instead of building it from scratch) for segments whose delete % is less than or equal to 10%.

I can create a separate issue/PR for fixing up the graph (reconnecting nodes) and then try to raise the delete threshold above 10%. Please let me know your thoughts.

I re-ran the benchmark with delete percentages up to 15%, and the results are similar.

| Delete Pct | Baseline Recall | Baseline Force Merge Time (s) | Candidate Recall | Candidate Force Merge Time (s) | Recall Change | Force Merge Speedup |
|---|---|---|---|---|---|---|
| 0% | 0.872 | 0 | 0.873 | 0 | | |
| 2% | 0.871 | 831 | 0.866 | 13 | -1% | 64x |
| 5% | 0.873 | 810 | 0.863 | 13 | -1% | 62x |
| 8% | 0.874 | 783 | 0.861 | 13 | -1% | 60x |
| 10% | 0.874 | 773 | 0.857 | 13 | -2% | 60x |
| 15% | 0.876 | 730 | 0.848 | 12 | -3% | 60x |

Also ran with different maxConn values, keeping the delete % threshold at 10%:

| Max Conn | Delete Pct | Baseline Recall | Baseline Force Merge Time (s) | Candidate Recall | Candidate Force Merge Time (s) | Recall Change | Force Merge Speedup |
|---|---|---|---|---|---|---|---|
| 32 | 10% | 0.874 | 773 | 0.857 | 13 | -2% | 60x |
| 16 | 10% | 0.811 | 550 | 0.793 | 12 | -2% | 45x |
| 8 | 10% | 0.696 | 360 | 0.675 | 12 | -3% | 30x |

Raising a new revision with the threshold limit.
