Clarify the lifecycle of indexes with the lifecycle of a deal #1002

@jacobheun

The purpose of this issue is to clarify and discuss the various potential state changes for indexing over the lifecycle of a deal that has been marked for indexing. This is not intended to cover the existing lifecycle of indexing, but the desired lifecycle. Once solidified, we can create accompanying issues to resolve discrepancies in the current implementation to match the desired state.

Legend

  • 🚧 - Needs discussion & decision
  • 🍏 - Alignment reached

The Lifecycle of Indexing

🚧 Indexing a new Deal

Note: Not covering how deals are identified for indexing in this section as there is a separate effort to solidify requirements around that. See #689 and filecoin-project/notary-governance#666 for more details.

For discussion purposes, let’s assume that once the above issues are complete, there will be some way to identify if a specific deal should be indexed or not, and that there will be a mechanism to account for this for existing deals.

When a new deal has been successfully published, if an unsealed copy exists and the deal is marked for indexing, it should be immediately registered with the index provider/marked for indexing.
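As a minimal sketch of the rule above, the check at publish time could look like the following. The `Deal` type and its field names are hypothetical and used only for illustration; they are not taken from the current implementation.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    # Hypothetical fields, named here only for illustration.
    published: bool            # deal was successfully published
    has_unsealed_copy: bool    # an unsealed copy exists locally
    marked_for_indexing: bool  # deal has been identified as indexable


def should_register_for_indexing(deal: Deal) -> bool:
    """Return True only when all three conditions from the proposal hold."""
    return deal.published and deal.has_unsealed_copy and deal.marked_for_indexing
```

A deal that is published and marked for indexing but has no unsealed copy would not be registered under this rule.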

🚧 Deletion of an unsealed copy

When an unsealed copy is deleted today, indexes are not removed. There is currently support in the Network Indexers to include metadata on whether or not the data is unsealed, but it’s not being leveraged correctly today (all announced indexes are being marked as unsealed).

[Screenshot: cid.contact displaying unsealed status]

We need a mechanism to detect the removal of unsealed copies (as they can be rm’d manually). The section on Repairing Indexes below speaks to how this might be accomplished. Upon detecting a deletion, we can perform one of the following actions (we need to decide between the options):

Option 1 - Remove the indexes (recommended): Because unsealing is a non-trivial operation, when an unsealed copy is removed we should assume the copy will not become available again in a short time frame. As such, the local indexes should be removed and we should announce the deletions to the network indexers. This frees up space both locally and on indexers. If unsealed copies were expected to be created/deleted often, then this option might be less reasonable, but this is not the case today.

Option 2 - Update Index Metadata: When an unsealed copy is removed, we can update the metadata of the indexes for that deal to specify that no unsealed copy exists. This would still allow discovery of the SP who has the content, but retrieval would not function without an unseal. The advantage of this option is that a client could pay the Unseal price to get the data, knowing who has it. However, it’s worth noting that retrieval flows requiring unsealing are not particularly clear and would likely need further work to become viable.

If this option is chosen, we may want to change the indexing logic of sector expiration and will definitely need to change how removed sectors are handled.
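To make the difference between the two options concrete, here is a rough sketch of what the handler for a detected deletion might decide. The policy enum and the action names are hypothetical placeholders, not existing APIs; the function only returns a plan and does not talk to any indexer.

```python
from enum import Enum
from typing import List


class DeletionPolicy(Enum):
    REMOVE_INDEXES = 1   # Option 1 in the text above
    UPDATE_METADATA = 2  # Option 2 in the text above


def on_unsealed_copy_deleted(policy: DeletionPolicy) -> List[str]:
    """Return the (hypothetical) actions to take when an unsealed copy disappears."""
    if policy is DeletionPolicy.REMOVE_INDEXES:
        # Free local space, then tell the network indexers the entries are gone.
        return ["delete_local_indexes", "announce_removal_to_indexers"]
    # Keep the indexes so the SP stays discoverable, but stop claiming
    # that an unsealed copy exists.
    return ["announce_metadata_unsealed_false"]
```

Under Option 2 the local indexes stay in place, which is why sector removal later needs its own cleanup step.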

🚧 A sector is unsealed

When we detect that a sector has been unsealed, and that sector is eligible for indexing, it should be registered with the index provider for reindexing (assuming unsealed deletion option 1 is selected).

🚧 Expiration of a sector

As long as the unsealed copy of the sector exists, the indexes should also exist. No changes should occur until the unsealed copy is removed.

🚧 Removal of a sector

Same as sector expiration, this should be a no-op, as index changes would be triggered from changes to the unsealed copy only.

If unsealed deletion option 2 is selected, removal of a sector when there is no unsealed copy will require deletion of indexes and announcement to the network indexers.
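The sector expiration and removal rules can be sketched as two small functions. This is an illustration under the assumptions stated in this issue; the policy enum and action names are hypothetical.

```python
from enum import Enum
from typing import List


class DeletionPolicy(Enum):
    REMOVE_INDEXES = 1   # unsealed deletion option 1
    UPDATE_METADATA = 2  # unsealed deletion option 2


def on_sector_expired() -> List[str]:
    """Expiration alone never changes indexes; only the unsealed copy does."""
    return []


def on_sector_removed(policy: DeletionPolicy, has_unsealed_copy: bool) -> List[str]:
    """Return the (hypothetical) index actions for a removed sector."""
    if has_unsealed_copy:
        # Indexes follow the unsealed copy, so removal of the sector is a no-op.
        return []
    if policy is DeletionPolicy.UPDATE_METADATA:
        # Under option 2 the indexes were kept with "no unsealed copy" metadata,
        # so they must be cleaned up now that the sector itself is gone.
        return ["delete_local_indexes", "announce_removal_to_indexers"]
    # Under option 1 the indexes were already removed with the unsealed copy.
    return []
```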

🚧 Repairing Indexes

One of the issues facing retrieval reliability is index metadata getting out of sync with unsealed copies (or the lack thereof). There are several reasons this may be occurring, but it often requires manual intervention by SPs to repair, and there is little visibility into when this needs to happen. A proposal that has been discussed recently is to have an automatic repair job for indexing, which ensures that unsealed copies eligible for indexing receive an integrity check and are repaired if there is an issue. This would NOT include automatic unsealing of data, as this is a resource-intensive process.

An extension of this proposal, given that unsealed copies may be deleted or created manually by SPs, is to have new index creation and repair both belong to this “repair” service. This service could be a background process that is continually repairing/registering/removing indexes with limited resource consumption. This could remove some operational overhead for common errors reported with retrievals. Specifics of how this could/should work can be fleshed out in a follow-up issue.
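One way to picture a single pass of such a repair service is as a reconciliation between the desired state (eligible deals with an unsealed copy) and what is currently indexed. The sketch below assumes unsealed deletion option 1, uses hypothetical names throughout, and deliberately leaves out the integrity check of existing indexes and any rate limiting. It never triggers an unseal.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class DealState:
    # Hypothetical per-deal view assembled by the repair service.
    eligible: bool           # deal is marked for indexing
    has_unsealed_copy: bool  # an unsealed copy was found on disk


def plan_repair(
    deals: Dict[str, DealState], indexed: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (deals to register, deals whose indexes should be removed)."""
    desired = {
        deal_id
        for deal_id, state in deals.items()
        if state.eligible and state.has_unsealed_copy
    }
    to_register = sorted(desired - indexed)  # unsealed copy exists, index missing
    to_remove = sorted(indexed - desired)    # index exists, unsealed copy gone
    return to_register, to_remove
```

For example, with deals `a` and `c` unsealed and eligible, deal `b` missing its unsealed copy, and only `b` and `c` currently indexed, the plan registers `a` and removes `b`.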

Related Issues & Discussions
