Conversation

@juliaflanders juliaflanders commented Oct 20, 2023

Made changes to the ODD file to accommodate the use of <biblStruct> in DHQ:

  • reinstate <biblStruct> and its children
  • add namesdates module
  • add necessary attributes

I have put a test file temporarily at articles/999998/test_with_biblStructs.xml, which can be deleted once we're happy with the schema changes.

@juliaflanders juliaflanders requested a review from sydb October 20, 2023 21:43
@sydb sydb left a comment

The @who and @level comments should be addressed before merge. Do you want me to do that, or will you?

<moduleRef key="textstructure"
except="argument byline closer div1 div2 div3 div4 div5 div6 div7 docAuthor docDate docEdition docImprint docTitle imprimatur opener titlePage titlePart"/>
<moduleRef key="figures"/>
<moduleRef key="namesdates" include="persName orgName placeName surname forename"/>

Obviously necessary (given that our <biblStruct>s use <persName> with <surname> and <forename> children). But I suspect you don’t want these elements showing anywhere except inside <biblStruct>. If so, we need to either

  1. Remove these elements from their common classes (e.g., <persName> from model.nameLike.agent) and add them back to the content models of the places they are actually needed (e.g., <author>); OR
  2. Add a Schematron constraint that whines if one of these is outside a <biblStruct>.

(Does not have to be done on this PR, of course.)

Per @juliaflanders, go with #2; just a <constraintSpec> will do, as we do not expect this to happen much, if at all.
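
A minimal sketch of what option 2 might look like as a <constraintSpec>. It assumes the `tei:` prefix is already bound for Schematron in the ODD build, that a warning rather than an error is wanted, and that the installed TEI release accepts `schematron` as the @scheme value (older releases spell it `isoschematron`). The @ident and the message wording are placeholders, not taken from the DHQ ODD.

```xml
<constraintSpec ident="names-only-in-biblStruct" scheme="schematron"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <!-- Context: every element admitted by the namesdates <moduleRef> -->
    <sch:rule context="tei:persName | tei:orgName | tei:placeName
                       | tei:surname | tei:forename">
      <!-- Passes only when some ancestor is a <biblStruct> -->
      <sch:assert test="ancestor::tei:biblStruct" role="warning">
        <sch:name/> is only expected inside a biblStruct.
      </sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Where this sits in the ODD (at the top level of the schema specification or inside a particular <elementSpec>) is left open here.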

<elementSpec ident="note" mode="change" module="core">
<classes mode="change">
<memberOf key="model.common" mode="add"/>
<memberOf key="model.imprintPart" mode="add"/>

Given that in DHQ <note> is not a member of model.global, and thus is not already in the content model of <imprint>, I think this is the right way to do this. But my instinct is that this information would be better served by a <ptr> (or perhaps <idno>) that is a direct child of <monogr>, instead. That allows the URL itself to be on the @target attribute, and thus be checked for syntactic correctness (although to be fair, that does not say much), and allows a @type to classify what the pointer is to. Furthermore it is what TEI-in-Libraries recommends as a best practice.
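
For illustration, the shape being suggested might look roughly like this. The title, imprint details, and URL are invented, the @type value is a placeholder, and the position of <ptr> directly after <title> is an assumption about what the <monogr> content model permits; it has not been checked against the DHQ schema.

```xml
<biblStruct>
  <monogr>
    <title level="m">An Invented Title</title>
    <!-- The URL sits on @target; @type classifies what the pointer leads to -->
    <ptr type="url" target="https://example.org/invented-item"/>
    <imprint>
      <publisher>Invented Press</publisher>
      <date>2020</date>
    </imprint>
  </monogr>
</biblStruct>
```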

<desc>Provides a classification of the bibliographic item</desc>
<datatype>
<dataRef key="teidata.enumerated"/>
</datatype>

I presume we are planning to add a controlled vocabulary at a later date, yes?
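
If a controlled vocabulary is added later, it could take the form of a closed <valList> on the same attribute definition. The sketch below is only a guess at the shape: the three values are illustrative placeholders rather than an agreed DHQ list, and the @mode values assume the attribute is being changed rather than newly added.

```xml
<attDef ident="type" mode="change">
  <desc>Provides a classification of the bibliographic item</desc>
  <datatype>
    <dataRef key="teidata.enumerated"/>
  </datatype>
  <!-- type="closed" makes any value outside this list a validation error -->
  <valList type="closed" mode="add">
    <valItem ident="book"/>
    <valItem ident="bookSection"/>
    <valItem ident="journalArticle"/>
  </valList>
</attDef>
```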

juliaflanders commented Oct 21, 2023

Syd, thank you for the swift review! One thing that may not have been clear here is that all of our <biblStruct>s will be autogenerated by Zotero, and we can't control their encoding. So the test file I included with this branch is sort of a reference format that the schema needs to match. I'm not certain, but I think one or two of your suggestions above (on ll. 1707 and 1998) may have presumed that we could choose a more elegant encoding solution.

With that proviso--i.e. as long as the resulting schema will validate the biblStructs in the test file--please do feel free to go ahead and make any modifications to the ODD that you think are best! And thanks again.

 * typo fix: @when-iso → @when
 * restore title/@level by deleting the deletion, rather than adding new attr

sydb commented Oct 22, 2023

Minor mods made (and pushed to this branch).

As for the format of the <biblStruct> itself, with respect, I don’t think the assertion that our data format is from Zotero is quite correct. I think the Zotero-generated bibliographic citations are tucked (by TEIGarage) into the XML as JSON inside a processing instruction. Our program common/xslt/convert_tei2dhq.xsl then converts that JSON into a <biblStruct>.[1] The string "<note" only occurs in that file once, and it is the spot that generates the URL as a <note>. Just moving the entire <xsl:if> a few lines down (after the </imprint> but before the </monogr>) and changing the <note> to a <ptr type="original" target="{$zotero-item-map?URL}"/> would do the trick.

Note
[1] Quite cleverly, BTW, @amclark42 did a really nice job. It’s a bit of a pain because all of the information is included at every citation, but we want just a small snippet of info at the citation point, and the whole thing in the back matter.

juliaflanders commented Oct 23, 2023

I should have put it more precisely (and I also realize this is something to coordinate with @amclark42): in many cases, our <biblStruct>s will come directly as an export from Zotero (authors will export from their Zotero library as TEI and send us the results to be pasted in). So we would need to do one of the following:

  • have our own generated <biblStruct>s match what Zotero exports
  • be prepared to alter the Zotero-exported <biblStruct>s to match the ones we generate (i.e. following some encoding we prefer for some reason)
  • be prepared to accommodate two different <biblStruct> encodings

I think @amclark42's thoughts on which makes most sense are likely more relevant than mine. The third option seems inelegant on principle, but the differences might be very small. I prefer to avoid option 2 because it would add a step for the encoders.

@amclark42

@sydb Echoing @juliaflanders, the approach we're taking in the XSLT is to use Zotero's processing instructions to produce <biblStruct>s that look like the ones the author would give us if they exported TEI <biblStruct>s from Zotero. This way, regardless of whether the metadata came from an export or the Word plug-in, Biblio and the DHQ display XSLTs should be able to parse them.

To be honest, I'm right there with you — I'd prefer <idno>s or <ptr>s too, and I'd really love it if those <note>s were placed outside the <monogr>. When Julia and I first started planning out this workflow, I cared enough about this to look into Zotero's export process, to see if I could suggest changes to it. We totally can! Zotero's TEI translator is right there on GitHub. The code is in Javascript, so I don't feel it's worth my time to try to fix things and submit a PR. But submitting a GitHub issue is still an option! I'm not up to taking the lead on that but I'll be happy to cosign if you decide to.

@amclark42

(That said, I think I've now spotted a few places where my translation isn't matching up with Zotero. Sigh. Back to it.)

sydb commented Oct 25, 2023

More than happy to take the lead on a Zotero issue!
So I think this PR can be merged as it is now, with the realization that we may want further updates down the line.
@juliaflanders: Let me know what your thoughts are on <persName> (whether the schema should disallow it except within <biblStruct>, and if so, whether that restriction should be in RELAX NG or Schematron).
@amclark42: Let me know when you have finished updating the translation; can you update the articles/999998/test_with_biblStructs.xml file, too?

sydb commented Oct 25, 2023

See zotero/translators#3171, if interested.

@juliaflanders juliaflanders changed the base branch from main to encoding_workflow September 29, 2025 14:34

jawalsh commented Sep 29, 2025

Some outstanding issues:

  1. persName/orgName/placeName, etc. issue. Syd is writing Schematron to alert us if these appear outside biblStruct
  2. Waiting on Zotero, to see if they are going to fix note[@type = 'url']. If they won't fix, we will write XSLT to convert these note elements to ptr.
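
Should the fallback in item 2 be needed, the conversion could be sketched along these lines. This is an assumption-laden outline, not the project's stylesheet: it assumes an XSLT 3.0 processor, that the URL is the text content of the <note>, that `type="original"` (the value floated earlier in this thread) is the wanted classification, and that appending the <ptr> at the end of the enclosing <biblStruct> is acceptable to the DHQ schema.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Copy everything unchanged unless a template below says otherwise -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Copy each biblStruct, then append one <ptr> per URL note it owns -->
  <xsl:template match="tei:biblStruct">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <!-- The ancestor test skips notes that belong to a nested biblStruct -->
      <xsl:for-each select="descendant::tei:note[@type = 'url']
                            [ancestor::tei:biblStruct[1] is current()]">
        <ptr type="original" target="{normalize-space(.)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the original URL notes; their content now lives on ptr/@target -->
  <xsl:template match="tei:biblStruct//tei:note[@type = 'url']"/>

</xsl:stylesheet>
```

Notes outside a <biblStruct> and notes with any other @type pass through untouched.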
