@MyPyDavid
Member

@MyPyDavid MyPyDavid commented Nov 4, 2024

  • This PR adds the publication workflow with file upload to Zenodo. The export format and view can be selected in the form.
  • The record id is stored in a project value under a configurable attribute (e.g. https://rdmorganiser.github.io/terms/project/metadata/publication/zenodo/concept_record_id)
    • setting keys: publish_record_id_attribute_prefix and publish_record_id_attribute_key
  • This record_id is used to create new versions of the Zenodo record (an existing publication will be updated with newer versions).
  • Added support for the InvenioRDM API
    • setting keys: zenodo_url (the instance URL), zenodo_auth_scope (for InvenioRDM the scope needs to be set to user:email)
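
As a rough illustration of how the two attribute settings could combine into the attribute URI mentioned above, here is a minimal sketch. The helper `build_record_id_attribute_uri` and the plain `settings` dict are assumptions made for illustration and are not the plugin's actual code; only the setting keys and their default values come from this PR.

```python
# Hypothetical helper: joins the configurable prefix and key into one attribute URI.
DEFAULT_PREFIX = 'https://rdmorganiser.github.io/terms'
DEFAULT_KEY = 'project/metadata/publication/zenodo/concept_record_id'


def build_record_id_attribute_uri(settings: dict) -> str:
    prefix = settings.get('publish_record_id_attribute_prefix', DEFAULT_PREFIX)
    key = settings.get('publish_record_id_attribute_key', DEFAULT_KEY)
    # Normalize slashes so the two parts are joined by exactly one '/'.
    return f"{prefix.rstrip('/')}/{key.lstrip('/')}"


# With no overrides, the defaults yield the URI quoted in the description.
uri = build_record_id_attribute_uri({})
print(uri)
```

Splitting the URI into prefix and key would let an instance keep the key while pointing the prefix at its own terms namespace.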

refactoring

  • moved classes into metadata, forms.py and methods into utils.py

@MyPyDavid MyPyDavid requested a review from cpfaff November 4, 2024 18:37
@MyPyDavid
Member Author

Hi @cpfaff, do you want to check out this branch maybe?
I've included your changes but also did a lot of refactoring again. I hope it's clear enough; we could discuss it in a meeting.

@MyPyDavid MyPyDavid requested review from jochenklar and removed request for cpfaff December 11, 2024 13:07
@MyPyDavid
Member Author

Maybe it would be better to try to merge this PR before adding other parts for the REST API to it.
I've asked @jochenklar for a review.

@MyPyDavid
Member Author

can we merge this? @jochenklar

Member

@jochenklar jochenklar left a comment

Hey, I finally got some time to look at the plugin, sorry for the long wait. I guess it works (almost), but I have some remarks regarding the architecture.

from rdmo.projects.exports import Export


class ZenodoMetadataExport(Export):
Member

The name is a bit misleading since this is not an export plugin, or at least it is not supposed to be used independently, right? I also think it should not inherit from Export. Maybe ZenodoMetadataBuilder, since this is how it is used. It could also be a mixin for the export plugins or part of the base class, depending on why exactly this pattern was chosen.

Member Author

Yes, it is more like a mixin and does not need Export; it only uses the self.get_text and self.get_values methods. I think the base class was getting too big, which is why I placed it in a new one.

Member Author

I've now refactored the metadata code into a metadata package with some builder classes. These should have no external dependencies besides the settings; the project- or snapshot-related values are provided by the caller of these builders in the get_post_data method.
It has some duplication with all these field names, but maybe it's cleaner and clearer now?
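
To make the decoupling discussed here a bit more concrete, the following is a minimal sketch of a builder that receives its lookups from the caller instead of inheriting from Export. The class body, the simplified `get_text`/`get_values` signatures, and the attribute paths are all assumptions for illustration; the actual builders in the PR may look different.

```python
from typing import Any, Callable, Dict, List


class ZenodoMetadataBuilder:
    """Hypothetical builder: knows nothing about Export, projects or snapshots."""

    def __init__(self, get_text: Callable[[str], str],
                 get_values: Callable[[str], List[str]]) -> None:
        # The caller injects the lookups (e.g. bound methods of an export class).
        self.get_text = get_text
        self.get_values = get_values

    def build(self) -> Dict[str, Any]:
        # The paths below are made up for this sketch.
        return {
            'title': self.get_text('example/title'),
            'keywords': self.get_values('example/keywords'),
        }


# Stand-in answers so the sketch runs without RDMO.
answers = {'example/title': ['Example dataset'], 'example/keywords': ['a', 'b']}
builder = ZenodoMetadataBuilder(
    get_text=lambda path: next(iter(answers.get(path, [])), ''),
    get_values=lambda path: list(answers.get(path, [])),
)
metadata = builder.build()
```

Because the builder only depends on two callables, it could be exercised in tests with plain dictionaries, as done here.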

'metadata': self._filter_empty_values(metadata)
}

def _filter_empty_values(self, metadata: Dict[str, Any]) -> Dict[str, Any]:
Member

I don't think this is more readable than:

a = settings.A
if a:
    metadata['a'] = a

but ok...
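
For comparison, the two styles under discussion might look roughly like this. Both functions are hypothetical stand-ins written for this sketch (the PR's `_filter_empty_values` body is not shown in the thread), and the field names are illustrative.

```python
from typing import Any, Dict


def filter_empty_values(metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Drop keys whose value is None, an empty string, or an empty container.
    # Values such as 0 or False are kept.
    return {k: v for k, v in metadata.items() if v not in (None, '', [], {})}


def build_filtered(title: str, description: str) -> Dict[str, Any]:
    # Style 1: collect everything, then filter once at the end.
    return filter_empty_values({'title': title, 'description': description})


def build_explicit(title: str, description: str) -> Dict[str, Any]:
    # Style 2: the explicit per-field check suggested in the review.
    metadata: Dict[str, Any] = {}
    if title:
        metadata['title'] = title
    if description:
        metadata['description'] = description
    return metadata
```

The filtering variant keeps the field list in one place, while the explicit variant makes each condition visible; which one reads better is the judgment call raised above.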

@@ -0,0 +1,83 @@
from django.http import HttpResponse
Member

There are some missing newlines between the methods in this file.


def render_project_views(project, snapshot, attachments_format, view=None):

if view is None:
Member

Was this for debugging? If so, it should be removed.

Member

OK, it seems I need the view for the plugin to work. Maybe this should be a template within the plugin?

Member Author

OK, so that the user can customize it with a custom template? Maybe the view should also be selected by the user in the initial form, to make it easier to customize?

Member Author

I've added a selection for the view and export_format to the form.

@MyPyDavid MyPyDavid self-assigned this Jul 24, 2025

record_versions_url = self.validate_record_id_from_project_value_at_zenodo()
# TODO, currently the authentication can get stuck when trying out the dataset export
# first and this one afterwards, a 403 needs to be handled in the Export class.
Member Author

@MyPyDavid MyPyDavid Jul 28, 2025

I still sometimes have a problem with the authentication. When I first export the dataset and afterwards try to publish the snapshot, I get a 403 on the POST call (and the OAuth error template with None as the message). Could this be because different callback URLs are set for each export provider, so that the token is then invalid? I can only get out of it by closing the browser.
Should these plugins not just have a single authentication and callback URI? Or does it need to be handled in https://github.com/rdmorganiser/rdmo/blob/79917de8dfdb8be0c988bc72f1352b307ce03cb1/rdmo/services/providers.py#L62 ?

Member Author

Maybe this issue zenodo/zenodo#2168 is related somehow?

Member Author

@MyPyDavid MyPyDavid Oct 10, 2025

Now I'm getting another authentication error, ERROR: callback error: b'{"error": "invalid_grant"}' (400), after clicking "Authorize application" on the https://sandbox.zenodo.org/oauth/authorize website. Maybe it's similar to the issue reported in zenodo/zenodo-rdm#878.
After clicking around in the debug tools and browsing back and forth, I now get a different error: WARNING: post error: b'{"status": 400, "message": "Invalid value e."}' (400). It seems that I had a mistake in my metadata after all and was sending, e.g., 'languages': [{'id': 'e'}].
After fixing that, the request works again.
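
The thread does not say how the `{'id': 'e'}` entry came about. One common way such a payload can arise, shown here purely as an assumption, is iterating over a language code string instead of a list of codes. The helper `build_languages` and the code `'eng'` are hypothetical.

```python
from typing import Dict, Iterable, List, Union


def build_languages(codes: Union[str, Iterable[str]]) -> List[Dict[str, str]]:
    # Wrap a bare string so it is not iterated character by character.
    if isinstance(codes, str):
        codes = [codes]
    return [{'id': code} for code in codes]


# Iterating over the string itself yields one entry per character,
# the first of which has the same shape as the rejected payload.
broken = [{'id': ch} for ch in 'eng']
fixed = build_languages('eng')
print(broken[0], fixed)
```

Guarding against a bare string at the point where the payload is built keeps this kind of mistake from reaching the API.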

Member Author

> I still sometimes have a problem with the authentication. When I first export the dataset and afterwards try to publish the snapshot, I get a 403 on the POST call (and the OAuth error template with None as the message). Could this be because different callback URLs are set for each export provider, so that the token is then invalid? I can only get out of it by closing the browser. Should these plugins not just have a single authentication and callback URI? Or does it need to be handled in https://github.com/rdmorganiser/rdmo/blob/79917de8dfdb8be0c988bc72f1352b307ce03cb1/rdmo/services/providers.py#L62 ?

I've added a post_with_retry method that pops the access_token and retries self.post(...) after it has failed with an "OAuth error". The user just needs to re-authenticate, but it seems to work.
I think an extra callback post_failure should be added to https://github.com/rdmorganiser/rdmo/blob/79917de8dfdb8be0c988bc72f1352b307ce03cb1/rdmo/services/providers.py#L67, in parallel to post_success, so that it can be implemented and adapted by a plugin. Or was there a problem with storing/retrieving the access token?
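
The retry idea described above could be sketched as follows. This is a standalone illustration, not the method from the PR: `OAuthError`, the `session` dict, the `post` callable, and `fake_post` are stand-ins invented here so that the example runs without RDMO or a network connection.

```python
from typing import Any, Callable, Dict


class OAuthError(Exception):
    """Stand-in for whatever signals an OAuth failure (e.g. a 403)."""


def post_with_retry(post: Callable[[str, Dict[str, Any]], Any],
                    session: Dict[str, Any],
                    url: str,
                    data: Dict[str, Any],
                    token_key: str = 'access_token') -> Any:
    try:
        return post(url, data)
    except OAuthError:
        # Discard the possibly stale token, then try exactly once more.
        session.pop(token_key, None)
        return post(url, data)


session = {'access_token': 'stale'}
seen_tokens = []


def fake_post(url: str, data: Dict[str, Any]) -> Dict[str, int]:
    seen_tokens.append(session.get('access_token'))
    if session.get('access_token') == 'stale':
        raise OAuthError('403')
    # Simulates the user re-authenticating and a fresh token being stored.
    session.setdefault('access_token', 'fresh')
    return {'status': 201}


result = post_with_retry(fake_post, session, 'https://example.org/api', {})
```

Retrying only once avoids an endless loop if the second attempt also fails; in that case the error propagates to the caller.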

@MyPyDavid MyPyDavid requested a review from jochenklar July 28, 2025 15:38
@jochenklar
Member

There is still a problem. For some reason, the stored Zenodo id is one lower than the actual id.

@jochenklar
Member

Also, I am not sure creating the attribute in the plugin is a good idea. In any case, the URI prefix should not be hardcoded.

@MyPyDavid
Member Author

> There is still a problem. For some reason, the stored zenodoid is one lower than the actual id.

This is not a problem but actually a feature. After initially creating a record and uploading a file, Zenodo returns a conceptrecid (https://developers.zenodo.org/#quickstart-upload) or in InvenioRDM the internal Concept version PID (https://inveniordm.docs.cern.ch/reference/metadata/#internal-pids-id-pid-parentid).

From https://inveniordm.docs.cern.ch/operate/customize/dois/#parent-or-concept-dois:

By default InvenioRDM will create two DOIs when an initial record is published, and create one DOI each time a new version of the record is published. The first DOI is the version DOI, which represents the specific record that is published. The second DOI is the parent DOI, which represents the concept of the record and will always resolve to the latest version. This feature has been implemented in Zenodo for many years, and the concept DOI enables researchers to cite something that won't change when they make changes to their records.

That concept id is then used when publishing another RDMO snapshot to the previously created record, which creates a new version of that record.
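
A small sketch of the distinction between the record id and the concept id, and of how a stored concept id could steer the workflow. The helper names, the fallback to a nested `parent` id, and the example response with made-up numbers are assumptions for illustration; the real API payloads should be checked against the Zenodo and InvenioRDM documentation linked above.

```python
from typing import Any, Dict, Optional


def extract_concept_record_id(response: Dict[str, Any]) -> Optional[str]:
    # Prefer a top-level 'conceptrecid'; otherwise fall back to a nested
    # parent id. Both key layouts are assumptions in this sketch.
    concept_id = response.get('conceptrecid') or response.get('parent', {}).get('id')
    return str(concept_id) if concept_id is not None else None


def choose_action(stored_concept_id: Optional[str]) -> str:
    # Without a stored concept id a new record is created; with one,
    # the next publication becomes a new version of that record.
    return 'new_version' if stored_concept_id else 'new_record'


# Made-up response: the concept id differs from the id of the first version.
example = {'id': 1001, 'conceptrecid': '1000'}
stored = extract_concept_record_id(example)
print(stored, choose_action(stored))
```

Storing the concept id rather than the version id means the stored value keeps pointing at the record as a whole, regardless of how many versions are published later.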

ctpfaff and others added 15 commits August 6, 2025 10:48
* Collects metadata for the project snapshot
* Creates snapshot questions based document
* Creates zenodo deposit, adds metadata, adds the file
  and publishes the deposit.
Refactored the Zenodo Export Provider to improve code quality, reduce
duplication, and enhance error handling.
Signed-off-by: David Wallace <[email protected]>
@MyPyDavid MyPyDavid force-pushed the add-publication-and-versioning branch from d6acdf2 to 627f8f5 Compare August 6, 2025 08:54
Signed-off-by: David Wallace <[email protected]>
@MyPyDavid
Member Author

MyPyDavid commented Aug 6, 2025

> Also, I am not sure creating the attribute in the Plugin is a good idea. In any case the URI prefix should not be hardcoded.

Yes, I've made it configurable. I've also tested against the sandbox https://inveniordm.web.cern.ch/ instance and made it work with a configurable setting for the zenodo_auth_scope.

Now in the readme:

    'zenodo_url': 'https://zenodo.org',  # optional, default https://zenodo.org , or your own InvenioRDM instance url
    'zenodo_auth_scope': 'deposit:write',  # optional, default 'deposit:write' or 'user:email' for InvenioRDM
    'publish_record_id_attribute_prefix': 'https://rdmorganiser.github.io/terms',  # optional, default is shown here
    'publish_record_id_attribute_key': 'project/metadata/publication/zenodo/concept_record_id',  # optional, default is shown here    

Now I've rebased and it's ready I think!

@MyPyDavid MyPyDavid requested a review from jochenklar August 6, 2025 08:57
Member

@jochenklar jochenklar left a comment

Sorry, we need to talk about this again. The current classes in metadata prevent reuse and customization.



@dataclass
class ZenodoMetadata:
Member Author

This model can be kept, maybe add a function get_metadata instead of get_post_data
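
A `get_metadata` method on such a model might look roughly like the sketch below. The field names, the empty-value filter, and the `CustomMetadata` subclass are illustrative assumptions and do not reproduce the model in the PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ZenodoMetadata:
    # Field names are illustrative; the real model may differ.
    title: Optional[str] = None
    description: Optional[str] = None
    keywords: List[str] = field(default_factory=list)

    def get_metadata(self) -> Dict[str, Any]:
        # Return only the fields that carry a value.
        return {k: v for k, v in asdict(self).items() if v not in (None, '', [])}


class CustomMetadata(ZenodoMetadata):
    # A subclass can adjust the output without touching the base model.
    def get_metadata(self) -> Dict[str, Any]:
        metadata = super().get_metadata()
        metadata.setdefault('upload_type', 'other')
        return metadata


base = ZenodoMetadata(title='T').get_metadata()
custom = CustomMetadata(title='T').get_metadata()
```

Keeping the model as plain data with one overridable method is one way to allow the reuse and customization asked for in the review.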

@MyPyDavid
Member Author

> Sorry, we need to talk about this again. The current classes in metadata prevent reuse and customization.

I went further down the metadata rabbit hole and tried to implement the complete schemas for the payload (incl. metadata) sent to the APIs with attrs classes. Is it customizable enough like this?

identifier: str | None = None

@attrs.define
class InvenioMetadataV6:
Member Author

@MyPyDavid MyPyDavid Oct 13, 2025

this is the Invenio metadata

@MyPyDavid MyPyDavid requested a review from jochenklar October 13, 2025 12:34
3 participants