
Commit 3a9a346

Merge pull request #2012 from dandi/enh-doi-draft
Design document for the Zenodo like DOI per dandiset
2 parents 17874de + 24ea6eb commit 3a9a346


1 file changed (+254, -0 lines)


doc/design/doi-generation-2.md

# DOI for Draft Dandisets

Authors: Yaroslav O. Halchenko, Dorota Jarecka, Austin Macdonald

## Overview

This document describes an updated strategy for DOI management within the Dandi Archive.
Upon creation, every public Dandiset will receive a **Dandiset DOI** that represents the current draft and all future versions.
Every public published version of a Dandiset will receive a **Version DOI**.

For example:
- Dandiset DOI: `https://doi.org/10.48324/dandi.000027`
- Version DOI: `https://doi.org/10.48324/dandi.000027/0.210831.2033`

The Dandiset DOI will always resolve to the Dandiset Landing Page (DLP).
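
For illustration, the DOI and URL patterns above can be written as a few string helpers. This is a minimal Python sketch, not the archive's code. The function names are hypothetical. The `10.48324` prefix is taken from the examples; in a deployment it would presumably come from `DANDI_DOI_API_PREFIX`. The landing-page URL follows the DLP pattern given in the proposed solution.

```python
DOI_PREFIX = "10.48324"  # prefix used in the examples above
DLP_BASE_URL = "https://dandiarchive.org/dandiset"


def dandiset_doi(dandiset_id: str, prefix: str = DOI_PREFIX) -> str:
    """Dandiset DOI: one per Dandiset, covering the draft and all versions."""
    return f"{prefix}/dandi.{dandiset_id}"


def version_doi(dandiset_id: str, version: str, prefix: str = DOI_PREFIX) -> str:
    """Version DOI: one per published version, nested under the Dandiset DOI."""
    return f"{dandiset_doi(dandiset_id, prefix)}/{version}"


def dlp_url(dandiset_id: str) -> str:
    """Dandiset Landing Page that the Dandiset DOI resolves to."""
    return f"{DLP_BASE_URL}/{dandiset_id}"


def resolver_url(doi: str) -> str:
    """Public doi.org resolver link for a DOI."""
    return f"https://doi.org/{doi}"


print(resolver_url(dandiset_doi("000027")))  # https://doi.org/10.48324/dandi.000027
print(resolver_url(version_doi("000027", "0.210831.2033")))
```

Because both DOIs are derived only from the Dandiset identifier and the version string, they can be computed before anything is registered with DataCite.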

At creation, the `Dandiset DOI` will be a DataCite `Draft DOI`.
The `Dandiset DOI` will remain a `Draft DOI` until there is a published version, at which point we will "promote" it to a DataCite `Findable DOI`.
For each published version, a `Version DOI` will be created as a `Findable DOI`.

## The current approach

- [initial design doc](./doi-generation-1.md)
- overall:
  - leave DOI absent upon dandiset creation
  - upon publication
    - inject fake DOI (but do not save) and validate
    - after validation, create a new `Version DOI` (function `create_doi`)
    - publish dandiset

### Issues with the Existing Approach

- [Stop injecting "fake" DOIs into draft dandisets](https://github.com/dandi/dandi-archive/issues/1709)
- [Unpublished Dandisets display a DOI under `Cite As`](https://github.com/dandi/dandi-archive/issues/1932)

## Background

This was initially proposed and discussed in

- [Create and maintain a "Findable" DOI for the Dandiset as a whole](https://github.com/dandi/dandi-archive/issues/1319)

and boils down to adopting Zenodo's approach of having a DOI that always points to the latest version of the record.
[Zenodo uses the term](https://support.zenodo.org/help/en-gb/1-upload-deposit/97-what-is-doi-versioning) `Concept DOI` for a top-level DOI that references all versions; we will refer to it as the `Dandiset DOI`.

A DataCite DOI can be in one of three states ([DataCite](https://support.datacite.org/docs/what-does-the-state-of-the-doi-mean)):

- `Draft`. Not used by the current approach.
  *Can be deleted, and they require only the identifier itself in order to be created or saved. They can be updated to either Registered or Findable DOIs. Registered and Findable DOIs may not be returned to the Draft state, which means that changing the state of a Draft is final.*
- `Registered`. Like `Findable`, but not indexed for search, so the current approach does not use them.
- `Findable`. The state we use for published dandisets.
  A DOI must be valid (i.e., pass validation against the DataCite schema) to be created in this state.
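
The state rules can be summarized as a small transition table. The sketch below is illustrative Python, and its names are hypothetical. The Draft rules come from the DataCite text quoted above, and Findable to Registered is the "hide" operation this document relies on later. Registered to Findable reflects our reading of the linked DataCite page, so please verify it there before relying on it.

```python
from enum import Enum


class DoiState(Enum):
    DRAFT = "draft"
    REGISTERED = "registered"
    FINDABLE = "findable"


# Allowed state changes: a Draft can become Registered or Findable, nothing
# ever returns to Draft, and Findable/Registered can be switched with each
# other ("hide" goes Findable -> Registered, "publish" goes the other way).
ALLOWED_TRANSITIONS = {
    DoiState.DRAFT: {DoiState.REGISTERED, DoiState.FINDABLE},
    DoiState.REGISTERED: {DoiState.FINDABLE},
    DoiState.FINDABLE: {DoiState.REGISTERED},
}


def can_transition(current: DoiState, target: DoiState) -> bool:
    """True if DataCite allows moving a DOI from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS[current]


def can_delete(state: DoiState) -> bool:
    """Only Draft DOIs can be deleted; Registered/Findable ones persist."""
    return state is DoiState.DRAFT
```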

## Proposed Solution

- For **public dandisets**:
  - Upon creation:
    - Mint a `Dandiset DOI` (a DataCite `Draft DOI`) `10.48324/dandi.{dandiset.id}` with the *minimal metadata* entered in the creation request (title, description, license).
    - The URL should point to the DLP: `https://dandiarchive.org/dandiset/{dandiset.id}`.
    - If minting the DOI fails, report the error so that developers are informed of the issue, but proceed with the creation of the dandiset.
  - Upon updates to draft dandiset metadata **prior to first publication**:
    - Update the DataCite metadata of the `Draft DOI` (leave it as a draft).
    - If validation fails, log the error and continue.
  - Upon deletion of a dandiset **prior to first publication**:
    - Delete the `Dandiset DOI` (Draft) from DataCite.
  - Upon **first publication** of a dandiset:
    - Mint a new `Version DOI` (Findable), i.e. `10.48324/dandi.{dandiset.id}/{version}` (this is already done currently).
    - Update the `Dandiset DOI` metadata to match the published version.
    - Promote the `Dandiset DOI` from `Draft` to `Findable`.
  - Upon updates to draft dandiset metadata **after the first publication**:
    - No-op. The `Dandiset DOI` metadata will match the most recent publication.
  - Upon deletion of a published dandiset version (`VersionViewSet.destroy`):
    - "Hide" the `Version DOI` by converting it from `Findable` to `Registered`.
  - Upon deletion of a dandiset (`DandisetViewSet.destroy`):
    - "Hide" the `Dandiset DOI` if it is `Findable`, or delete it if it is a `Draft`.
  - Upon **subsequent publications** of a dandiset:
    - Mint a new `Version DOI`.
    - Update the `Dandiset DOI` metadata to match the published version.
- For **embargoed dandisets**:
  - Upon creation, no DOI is created.
  - Upon changes to the embargoed dandiset metadata record, do nothing.
  - Upon deletion of an embargoed dandiset, do nothing.
  - Upon unembargoing the dandiset:
    - Mint a `Dandiset DOI` (Draft) with the latest metadata.
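
To make the per-event rules above easier to check, here is a hypothetical Python sketch. It maps the current `Dandiset DOI` state and an event to the resulting state, plus the DataCite calls the list implies. Event names and action strings are invented for illustration. Publishing when no `Dandiset DOI` exists yet is not covered by this document, so minting it directly as `Findable` in that case is only an assumption.

```python
from enum import Enum
from typing import List, Optional, Tuple


class DoiState(Enum):
    DRAFT = "draft"
    REGISTERED = "registered"
    FINDABLE = "findable"


Plan = Tuple[Optional[DoiState], List[str]]


def plan_dandiset_doi(
    state: Optional[DoiState], event: str, embargoed: bool = False
) -> Plan:
    """Return (new Dandiset DOI state, DataCite actions) for a dandiset event.

    `state` is None when no Dandiset DOI exists. Events (hypothetical names):
    "create", "update", "publish", "unembargo", "delete".
    """
    if embargoed and event != "unembargo":
        return state, []  # embargoed dandisets never contact DataCite
    if event in ("create", "unembargo"):
        if state is None:
            return DoiState.DRAFT, ["mint Dandiset DOI (Draft)"]
        return state, []
    if event == "update":
        if state is None:  # an earlier mint failed, so retry it
            return DoiState.DRAFT, ["mint Dandiset DOI (Draft)"]
        if state is DoiState.DRAFT:
            return state, ["update Dandiset DOI metadata"]
        return state, []  # after the first publication: no-op
    if event == "publish":
        actions = ["mint Version DOI (Findable)"]
        if state is None:
            # Assumption: not covered by the design; mint directly as Findable.
            actions.append("mint Dandiset DOI (Findable)")
        else:
            actions.append("update Dandiset DOI metadata")
            if state is not DoiState.FINDABLE:
                actions.append("promote Dandiset DOI to Findable")
        return DoiState.FINDABLE, actions
    if event == "delete":
        if state is DoiState.DRAFT:
            return None, ["delete Dandiset DOI"]
        if state is DoiState.FINDABLE:
            return DoiState.REGISTERED, ["hide Dandiset DOI (Findable -> Registered)"]
        return state, []
    raise ValueError(f"unknown event: {event!r}")
```

Keeping the decision separate from the DataCite client makes each rule in the list easy to unit test in isolation.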

### Cautions

If `DANDI_DOI_PUBLISH` is false (the default):
- creation as `Findable` should be disabled
- updates to the `Findable` and `Registered` states should be disabled

If any of the required DOI configuration options is not set:
- required options:
  - `DANDI_DOI_API_URL`
  - `DANDI_DOI_API_USER`
  - `DANDI_DOI_API_PASSWORD`
  - `DANDI_DOI_API_PREFIX`
- DOI CRUD operations through the DataCite API should be entirely disabled
- the DOI (the string) should not be added to the version
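
Here is a minimal sketch of how the two cautions could be enforced. It is written in Python against a plain settings mapping; how the archive actually loads these settings is not specified here, and the helper names are hypothetical. `None` stands for a Draft DOI, while `"publish"` and `"hide"` are the DataCite events that yield `Findable` and `Registered` DOIs.

```python
from typing import Mapping, Optional, Set

REQUIRED_DOI_SETTINGS = (
    "DANDI_DOI_API_URL",
    "DANDI_DOI_API_USER",
    "DANDI_DOI_API_PASSWORD",
    "DANDI_DOI_API_PREFIX",
)


def doi_api_enabled(settings: Mapping[str, str]) -> bool:
    """DataCite CRUD (and storing the DOI string) needs every required setting."""
    return all(settings.get(name) for name in REQUIRED_DOI_SETTINGS)


def doi_publish_enabled(settings: Mapping[str, str]) -> bool:
    # Assumption: the flag is a case-insensitive "true"; it defaults to off.
    return settings.get("DANDI_DOI_PUBLISH", "false").strip().lower() == "true"


def allowed_doi_events(settings: Mapping[str, str]) -> Set[Optional[str]]:
    """DataCite `event` values the archive may send (None = Draft DOI)."""
    if not doi_api_enabled(settings):
        return set()  # no DataCite calls at all
    events: Set[Optional[str]] = {None}  # Draft DOIs stay allowed
    if doi_publish_enabled(settings):
        events |= {"publish", "hide"}  # Findable / Registered
    return events


settings = {name: "..." for name in REQUIRED_DOI_SETTINGS}
print(allowed_doi_events(settings))  # {None}
```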

### Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant DandiArchive as Dandi Archive
    participant DataCite

    Note over User,DataCite: Public Dandiset Creation
    User->>DandiArchive: Create public dandiset
    DandiArchive->>DataCite: Mint Dandiset DOI (Draft) (10.48324/dandi.{id})
    Note right of DataCite: Minimal metadata + DLP URL
    alt DOI minting successful
        DataCite-->>DandiArchive: Return Dandiset DOI (Draft)
    else DOI minting fails
        DataCite-->>DandiArchive: Error
        Note right of DandiArchive: Log error but continue dandiset creation
    end
    DandiArchive-->>User: Return dandiset with or without Dandiset DOI

    Note over User,DataCite: Metadata Updates (Non-embargoed, Prior to First Publication)
    User->>DandiArchive: Update dandiset metadata
    alt No Dandiset DOI exists (previous mint failed)
        DandiArchive->>DataCite: Mint Dandiset DOI (Draft)
        DataCite-->>DandiArchive: Return Dandiset DOI (Draft)
    else Dandiset DOI (Draft) exists
        DandiArchive->>DataCite: Update metadata of Dandiset DOI (Draft)
        alt Update successful
            DataCite-->>DandiArchive: Confirm update
        else Update fails
            DataCite-->>DandiArchive: Error
            Note right of DandiArchive: Log error but continue
        end
    end
    DandiArchive-->>User: Return updated dandiset

    Note over User,DataCite: Embargoed Dandiset Handling
    User->>DandiArchive: Create embargoed dandiset
    DandiArchive-->>DandiArchive: No DOI created for embargoed dandisets
    DandiArchive-->>User: Return dandiset without DOI

    User->>DandiArchive: Update embargoed dandiset
    DandiArchive-->>DandiArchive: No DOI updates for embargoed dandisets
    DandiArchive-->>User: Return updated dandiset

    User->>DandiArchive: Unembargo dandiset
    DandiArchive->>DataCite: Mint Dandiset DOI (Draft)
    Note right of DataCite: DLP URL + current metadata
    DataCite-->>DandiArchive: Return Dandiset DOI (Draft)
    DandiArchive-->>User: Return unembargoed dandiset with DOI

    Note over User,DataCite: Dandiset Publication
    User->>DandiArchive: Publish dandiset
    DandiArchive->>DataCite: Mint Version DOI (Findable) (10.48324/dandi.{id}/{version})
    DataCite-->>DandiArchive: Return Version DOI (Findable)

    alt Dandiset DOI is Draft (first publication)
        DandiArchive->>DataCite: Update Dandiset DOI with version metadata
        DandiArchive->>DataCite: Promote Dandiset DOI to Findable
        DataCite-->>DandiArchive: Confirm update
    else Already Findable (already published at least once)
        DandiArchive->>DataCite: Update Dandiset DOI metadata
        DataCite-->>DandiArchive: Confirm update
    end
    Note right of DandiArchive: Dandiset DOI keeps URL pointing to DLP
    DandiArchive-->>User: Return published dandiset with both DOIs

    Note over User,DataCite: Dandiset Deletion
    User->>DandiArchive: Delete dandiset
    alt Dandiset DOI is Draft
        DandiArchive->>DataCite: Delete Draft DOI
        DataCite-->>DandiArchive: Confirm deletion
    else Dandiset DOI is Findable
        DandiArchive->>DataCite: "Hide" DOI (Convert to "Registered")
        DandiArchive->>DataCite: Point DOI to tombstone page
        DataCite-->>DandiArchive: Confirm update
    end
    DandiArchive-->>User: Confirm deletion
```

### Migration

A django-admin script should be created and executed to create a `Dandiset DOI` for all existing dandisets.

No DB migration will be needed, as no new field will be added to the `Dandiset` model; instead, the `Dandiset DOI` will be stored in the "draft" `Version`.
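
The backfill could look roughly like the following framework-free Python sketch. The dataclasses stand in for the Django models, and `mint` stands in for the DataCite call. Several choices here are assumptions, not decisions recorded in this document:

- embargoed dandisets are skipped, matching the embargo rules above;
- dandisets with a published version get a `Findable` DOI;
- the DOI is kept under a hypothetical `doi` key of the draft version's metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DraftVersion:  # stand-in for the "draft" Version
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Dandiset:  # stand-in for the Dandiset model
    identifier: str
    embargoed: bool
    published: bool  # has at least one published version
    draft: DraftVersion = field(default_factory=DraftVersion)


def backfill_dandiset_dois(
    dandisets: List[Dandiset],
    mint: Callable[[Dandiset, str], Optional[str]],  # DOI, or None on failure
) -> List[str]:
    """Create a Dandiset DOI for each dandiset lacking one; return failures."""
    failures: List[str] = []
    for ds in dandisets:
        if ds.embargoed or ds.draft.metadata.get("doi"):
            continue  # embargoed: no DOI; already migrated: skip (re-runnable)
        state = "findable" if ds.published else "draft"
        doi = mint(ds, state)
        if doi is None:
            failures.append(ds.identifier)  # report and keep going
        else:
            ds.draft.metadata["doi"] = doi  # stored on the draft Version
    return failures
```

Skipping dandisets that already have a DOI makes the script safe to re-run after partial failures.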

### Dandi Schema Changes

The `dandi-schema` function `to_datacite` can currently only create either a `Draft DOI` (`publish=False`) or a `Findable DOI` (`publish=True`).

It will need to be extended to:
- "publish" a `Draft DOI` to a `Findable DOI`
- "hide" a `Findable DOI` as a `Registered DOI`
- produce both a `Dandiset DOI` and a `Version DOI` (it currently only produces the `Version DOI`)

We will keep (and deprecate) the `publish` parameter, and add a new parameter `event` which is one of:
- `None`: Draft DOI
- `publish`: Findable DOI
- `hide`: Registered DOI
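
Here is a hypothetical sketch of the proposed `event` parameter alongside the deprecated `publish` flag. It is written as a stand-alone Python function rather than the real `to_datacite`, whose signature may differ. It builds a request body in the JSON:API shape used by the DataCite REST API (`data.type` set to `"dois"`, with an optional `attributes.event`). The descriptive metadata attributes are left to the caller.

```python
import warnings
from typing import Any, Dict, Optional

VALID_EVENTS = (None, "publish", "hide")


def datacite_payload(
    doi: str,
    url: str,
    attributes: Dict[str, Any],
    event: Optional[str] = None,
    publish: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for an extended `to_datacite`.

    event=None -> Draft DOI, "publish" -> Findable, "hide" -> Registered.
    """
    if publish is not None:
        warnings.warn(
            "`publish` is deprecated; use event='publish'",
            DeprecationWarning,
            stacklevel=2,
        )
        if publish and event is None:
            event = "publish"  # an explicit `event` wins over the old flag
    if event not in VALID_EVENTS:
        raise ValueError(f"unsupported event: {event!r}")
    attrs = {**attributes, "doi": doi, "url": url}
    if event is not None:
        # Without an event, DataCite creates a Draft (an existing DOI keeps its state).
        attrs["event"] = event
    return {"data": {"type": "dois", "attributes": attrs}}
```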

In the current implementation, only published dandisets are given a DOI, so we use the pydantic validation for `PublishedDandiset`.
This is too restrictive for our case.
Instead, we will try `PublishedDandiset` validation first, then fall back to unvalidated metadata.
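
The try-then-fall-back logic can be sketched as follows. `MetadataValidationError` and `validate_published` are hypothetical stand-ins. Assuming pydantic v2, the strict step would be `PublishedDandiset.model_validate(meta)` and the caught exception would be `pydantic.ValidationError`. The required fields checked here are illustrative, not the real `PublishedDandiset` requirements.

```python
from typing import Any, Callable, Dict, Tuple


class MetadataValidationError(Exception):
    """Stand-in for pydantic.ValidationError in this sketch."""


def validate_published(meta: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical, heavily reduced stand-in for PublishedDandiset validation.
    missing = [k for k in ("name", "description", "license", "doi") if not meta.get(k)]
    if missing:
        raise MetadataValidationError(f"missing required fields: {missing}")
    return meta


def metadata_for_datacite(
    meta: Dict[str, Any],
    validate: Callable[[Dict[str, Any]], Dict[str, Any]] = validate_published,
) -> Tuple[Dict[str, Any], bool]:
    """Try strict validation first; fall back to the unvalidated metadata.

    Returns (metadata, validated) so callers can log when the fallback was used.
    """
    try:
        return validate(meta), True
    except MetadataValidationError:
        return meta, False
```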

## Alternatives Explored

### Prevent Findable DOIs when not validated

If we fall back to unvalidated metadata, we could prevent the DOI from becoming `Findable`.
Instead, though, we have opted to just try to update the DOI via DataCite anyway and handle the API failure if it happens.
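
Handling the API failure instead of pre-checking could look like the Python sketch below. `send` stands in for whatever client performs the DataCite update, and `DataCiteError` stands in for the error it raises when DataCite rejects the request; both are hypothetical. On failure, the DOI is left in its current state and the error is logged.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


class DataCiteError(Exception):
    """Stand-in for the error a DataCite client raises on a rejected request."""


def try_make_findable(
    doi: str,
    attributes: Dict[str, Any],
    send: Callable[[str, Dict[str, Any]], None],
) -> bool:
    """Ask DataCite to publish `doi`; return False (and log) if it is rejected."""
    payload = {"data": {"type": "dois", "attributes": {**attributes, "event": "publish"}}}
    try:
        send(doi, payload)
    except DataCiteError as exc:
        # e.g. incomplete metadata failing DataCite's checks: the DOI keeps
        # its current state and the archive operation carries on.
        logger.error("DataCite refused to make %s Findable: %s", doi, exc)
        return False
    return True
```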


### Creating DOIs for Embargoed Dandisets

We opted not to create DOIs for embargoed Dandisets because:
- We own the prefix, so there is no need to "reserve" DOIs.
- We should avoid sending any potentially secret metadata to a third party, even if it is not publicly searchable.
- A DOI created with fake metadata would probably have no value at all.
- The DOIs that will eventually be assigned upon publication follow a fixed naming scheme, so their values can be used even before they are "real".

We might reconsider if a decision is made to expose the metadata of embargoed Dandisets for the purpose of discovery.

### Promoting Draft DOIs to Findable for Draft Dandisets

There might be some value in having a `Findable DOI` (Version DOI and/or Dandiset DOI) that points to the draft version of a Dandiset,
because a `Draft` DOI is not visible to or usable by users.

However, if we promote the `Draft DOI` to `Findable` as soon as the metadata is valid, and the user then changes the metadata so that it is invalid again, the DOI metadata will be wrong.
We discussed annotating the DOI, i.e. "potentially incorrect metadata", but ultimately decided that the messiness is not worth the value.

How Findable DOIs for Draft Dandisets would work upon deletion of a dandiset:
- If the DOI was a `Draft` DOI, just delete it as well.
- If the DOI was a `Findable` DOI, convert it to a `Registered` DOI (following [DataCite best practices](https://support.datacite.org/docs/tombstone-pages)).
  - At the level of the DANDI archive itself, we should also provide a tombstone page so the URL still "works" (#3211).
  - If tombstone page support is not added, just adjust the URL in the DataCite record to point to https://www.datacite.org/invalid.html

## Concerns to keep in mind/address

- **Question to clear up**: what happens to a `Draft DOI` if the metadata record is invalid?
  - It seems to create one with no metadata, but does it update only the fields it knows about?
- **Question to clear up**: if we add validation against the DataCite metadata record to the dandiset update procedures, we can report errors to the user so they can be addressed before attempting publication. Maybe we should run this validation only if no other errors (from our schema validation) were detected, to reduce noise, or just give a summary such as "Metadata does not satisfy the DataCite model; fix known metadata errors first."
- **TODO: figure out how to annotate the Draft version so that it always says it is a draft version, and thus, if possible, should not be used for citation**
  - We do not need to annotate `Draft DOI` metadata since it is not visible.
  - If the `Dandiset DOI` is visible on the Draft Dandiset page, we should consider changing "Cite As" or adding an additional field.
    - Zenodo's "Concept DOIs" are presented as "Cite all versions", but we did not think this was clear enough.
- We may want to include the `Dandiset DOI` somewhere on published versions too, in addition to the `Version DOI` which we currently use.
  - The "Draft Dandiset" Version will be populated with the `Dandiset DOI`, so this may not be necessary.
- Should we somehow reflect interactions with DataCite in the Audit log? Possible things to log:
  - `Dandiset DOI`
    - Success/failure of `Draft DOI` creation
    - Success/failure of promoting the `Draft DOI` to a `Findable DOI` (expected to fail if metadata is incomplete)
  - `Version DOI`
    - Success/failure of `Findable DOI` creation
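
The second question above mentions one option: surface DataCite problems only once our own schema errors are fixed. That option amounts to a simple rule. Here is a hypothetical Python sketch of it, with the summary text taken from that question.

```python
from typing import List

SUMMARY = "Metadata does not satisfy the DataCite model; fix known metadata errors first."


def datacite_feedback(schema_errors: List[str], datacite_errors: List[str]) -> List[str]:
    """Show DataCite errors only when our own schema validation is clean."""
    if not datacite_errors:
        return []
    if schema_errors:
        return [SUMMARY]  # collapse DataCite noise into a single hint
    return datacite_errors
```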
