[NoSQL] Ensure index elements with non-null values#4018
[NoSQL] Ensure index elements with non-null values#4018snazy wants to merge 5 commits intoapache:mainfrom
null values#4018Conversation
Within the current implementation, it is possible that the iterators of the `Index` API return elements with `null` values. This may cause NPEs. This change enforces that index elements returned by the `Index` API iterators can never yield `null` values. The actual fix is in `IndexSpi` using the new iterator base class. The only place where `null` values could "escape" from the index API was in the `.(reverse)iterator()` functions in `IndexSpi`. The iterator implementation has been updated to skip elements with `null` values. While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level. Chosing `Map.Entry` as the type for index elements seemed to be logical at the time of writing. As both `Map.Entry.getKey()` and `Map.Entry.getValue()` are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, the `Map.Entry` interface has been replaced in the API with a new `Index.Element` interface with `@Nonnull` annotated functions. The implementation internal and package-private interface `IndexElement` was renamed to `InternalIndexElement`, now implementing `Index.Element`. Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes `null`. Such removed elements must be visible within the index implementation, but never be exposed via the API.
2da3604 to
8565b48
Compare
.../persistence/api/src/main/java/org/apache/polaris/persistence/nosql/api/index/IndexElem.java
Outdated
Show resolved
Hide resolved
...pl/src/main/java/org/apache/polaris/persistence/nosql/impl/indexes/AbstractIndexElement.java
Show resolved
Hide resolved
...pl/src/main/java/org/apache/polaris/persistence/nosql/impl/indexes/InternalIndexElement.java
Outdated
Show resolved
Hide resolved
...e/impl/src/main/java/org/apache/polaris/persistence/nosql/impl/indexes/IndexElementIter.java
Outdated
Show resolved
Hide resolved
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.valueNullable()); | ||
| } | ||
| if (o instanceof Index.Element<?> other) { | ||
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.value()); |
There was a problem hiding this comment.
If the comparison is reversed and other happens to be an IndexElem, then it will use value() from this AbstractIndexElement... so it looks like comparison is not commutative 🤔
There was a problem hiding this comment.
Do we actually need polymorphic comparison for these classes?
Why not consider different sub-types as "not equal" all the time?
There was a problem hiding this comment.
The comparison is commutative. Yes, the "intentionally broken contract" by InternalIndexElement makes it a bit difficult to reason about.
Internally, the indexes implementation works only with instances of InternalIndexElement, aka AbstractIndexElement. You cannot put() an *Element instance to the index.
With this, call sites can get instances of the internal representation, but as Index.Element and never with a null value.
So from a call-site the comparison is commutative.
| var u = embedded.getElement(key); | ||
| if (u != null) { | ||
| return u.getValue() != null; | ||
| return u.valueNullable() != null; |
There was a problem hiding this comment.
nit: maybe add u.hasValue() and avoid nullable getters?
There was a problem hiding this comment.
Hm, I'd like to avoid any unnecessary calls with in this performance sensitive part.
…laris/persistence/nosql/impl/indexes/InternalIndexElement.java Co-authored-by: Alexandre Dutra <adutra@apache.org>
Within the current implementation, it is possible that the iterators of the
IndexAPI return elements withnullvalues. This may cause NPEs.This change enforces that index elements returned by the
IndexAPI iterators can never yieldnullvalues. The actual fix is inIndexSpiusing the new iterator base class. The only place wherenullvalues could "escape" from the index API was in the.(reverse)iterator()functions inIndexSpi. The iterator implementation has been updated to skip elements withnullvalues.While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level.
Chosing
Map.Entryas the type for index elements seemed to be logical at the time of writing. As bothMap.Entry.getKey()andMap.Entry.getValue()are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, theMap.Entryinterface has been replaced in the API with a newIndex.Elementinterface with@Nonnullannotated functions. The implementation internal and package-private interfaceIndexElementwas renamed toInternalIndexElement, now implementingIndex.Element.Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes
null. Such removed elements must be visible within the index implementation, but never be exposed via the API.