Architecture
The newton_chymistry web application is an XProc pipeline hosted by XProc-Z, a Java Servlet, which in turn runs inside Apache Tomcat. The web application also uses an instance of Apache Solr as its search engine.
When the application receives an HTTP request from a browser, Tomcat invokes the XProc-Z Servlet to handle the request. The Servlet in turn invokes an XProc pipeline and passes it the details of the HTTP request. The pipeline is responsible for generating each HTTP response.
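As an illustration of this request/response contract, a pipeline of this kind has roughly the following shape, assuming (as a simplification) that the request arrives as a `c:request` document and the response is returned as a `c:response` document; the port names and the fixed response below are placeholders, not a rendering of the actual `xproc-z.xpl`:

```xml
<!-- Minimal sketch only: accept the HTTP request and return a fixed
     HTML response. Real sub-pipelines inspect the request and build
     the response dynamically. -->
<p:declare-step version="1.0" name="hello"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">
  <p:input port="source"/>   <!-- the incoming HTTP request, as a c:request document -->
  <p:output port="result"/>  <!-- the HTTP response, as a c:response document -->
  <p:identity>
    <p:input port="source">
      <p:inline>
        <c:response status="200">
          <c:body content-type="text/html">
            <html xmlns="http://www.w3.org/1999/xhtml">
              <head><title>Hello</title></head>
              <body><p>Hello from the pipeline</p></body>
            </html>
          </c:body>
        </c:response>
      </p:inline>
    </p:input>
  </p:identity>
</p:declare-step>
```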
- XProc pipelines are stored in files with the extension `.xpl`, in the `xproc` folder.
- XSLT transformations are stored in the `xslt` folder.
- Figure images from the manuscripts are stored (broken down by MS identifier) in the `figure` folder. These files are transmitted directly to the browser without being transformed.
- Other static resources, including icons, other images, and JavaScript libraries, are stored in the `static` folder. These files are transmitted directly to the browser without being transformed.
- The `schema` folder contains a TEI ODD file derived from the TEI corpus, and a RelaxNG schema derived from the ODD.
- The `p4` folder contains TEI P4 files downloaded from the Xubmit P4 repository, along with several external entity files.
- The `p5` folder contains only TEI P5 files, either derived from the P4 files in the `p4` folder or downloaded directly from the Xubmit P5 repository.
- The root folder also contains a metadata schema definition called `search-fields.xml`, which defines the Solr schema and the search and browse interface, as well as a `menus.json` file which defines the site menus and the "site index" page.
In the chymistry web application the main XProc pipeline, called `main`, is defined in the file `xproc-z.xpl`. See the installation page for details on how the pipeline is specified.
The main pipeline examines each HTTP request and delegates it to one of a number of sub-pipelines, each of which handles a particular class of request.
As well as dispatching the requests to the sub-pipelines, the main pipeline is responsible for adding the global navigation and branding banners to HTML responses.
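Schematically, the dispatch is a `p:choose` over the request URI; in the sketch below, the step types and URL patterns are invented for illustration rather than taken from `xproc-z.xpl`:

```xml
<!-- Fragment only: choose a sub-pipeline by matching the request URI.
     The chym: step types and the URL patterns are made up for illustration;
     the XPath context is the incoming c:request document. -->
<p:choose xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:chym="tag:example.com,2024:chymistry">
  <p:when test="contains(/c:request/@href, '/search/')">
    <chym:search/>
  </p:when>
  <p:when test="contains(/c:request/@href, '/p5/')">
    <chym:p5-as-html/>
  </p:when>
  <p:otherwise>
    <chym:html-page/>
  </p:otherwise>
</p:choose>
```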
In the case of manuscript HTML pages, the pipeline calls several sub-pipelines and integrates the results: converting the P5 into HTML, performing hit-highlighting using Solr, inserting the image viewer, and converting annotations into popup HTML details elements.
The `add-site-navigation` pipeline is used as the last step of any pipeline which produces HTML. This pipeline transforms the output HTML by adding a global header and footer, including the menus, and finally inserts the IU institutional page header.
The site menus are generated from the `menus.json` file.
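In XProc terms this is essentially an XSLT step appended to the end of each HTML-producing pipeline; a sketch of the pattern, in which the stylesheet and parameter names are assumptions for illustration:

```xml
<!-- Fragment only: the final step of an HTML-producing pipeline.
     It reads the HTML produced by the preceding step (the default
     readable port) and wraps it in the global navigation.
     The stylesheet file name and parameter name are placeholders. -->
<p:xslt name="add-site-navigation"
    xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="stylesheet">
    <p:document href="../xslt/add-site-navigation.xsl"/>
  </p:input>
  <p:with-param name="menus" select="'../menus.json'"/>
</p:xslt>
```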
This XProc file contains several generic and low-level utility pipelines, for serving static files, making HTTP responses, etc.
This XProc file contains pipelines responsible for converting TEI files from P4 to P5.
- `download-p4` downloads the TEI P4 corpus from Xubmit to the `p4` folder
- `convert-to-p5` converts all the P4 files in the `p4` folder into P5 and saves them in the `p5` folder
- `transform-p4-to-p5` transforms a single P4 file into P5, through a series of XSLT transformations (see the sketch after this list)
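The conversion of a single file is essentially a chain of `p:xslt` steps; a sketch of the pattern, in which the stylesheet names and the step type are invented rather than taken from this library:

```xml
<!-- Sketch only: convert one TEI P4 document to P5 by running it
     through two XSLT passes in sequence. The stylesheet names and
     the chym: type are placeholders. -->
<p:declare-step version="1.0" type="chym:transform-p4-to-p5"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:chym="tag:example.com,2024:chymistry">
  <p:input port="source"/>   <!-- a single TEI P4 document -->
  <p:output port="result"/>  <!-- the same document, converted to P5 -->
  <p:xslt name="first-pass">
    <p:input port="stylesheet">
      <p:document href="../xslt/p4-to-p5-structure.xsl"/>
    </p:input>
  </p:xslt>
  <p:xslt name="second-pass">
    <p:input port="stylesheet">
      <p:document href="../xslt/p4-to-p5-cleanup.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```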
This XProc file contains pipelines for site administration.
- `admin-form` generates an administrative user interface, containing buttons and links for invoking other pipelines to download TEI, perform format conversions, reindex, etc.
- `download-p5` downloads the TEI P5 corpus from Xubmit to the `p5` folder
- `download-bibliography` downloads the bibliography file from Xubmit to the `p5` folder
This XProc file contains pipelines for analyzing the TEI corpus.
- `list-classification-attributes` lists the values of "classification" attributes (`rend`, `type`, and `place`) used in the TEI corpus
- `sample-xml-text` generates a "representative" sample TEI file by extracting one of every distinct piece of markup from the entire corpus
- `list-attributes-by-element` generates a list of all the attributes used for a given element type
- `list-elements` generates a list of all the elements used (see the sketch after this list)
- `list-metadata` generates a list of the document id and title metadata
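As an indication of how simple these analyses can be, a pipeline in the spirit of `list-elements` can be little more than a single XSLT step. The sketch below handles one document at a time, whereas the real pipeline runs over the whole corpus:

```xml
<!-- Sketch only: report the distinct element names used in one TEI document. -->
<p:declare-step version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source"/>   <!-- a TEI document -->
  <p:output port="result"/>  <!-- a simple report of element names -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:inline>
        <xsl:stylesheet version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="/">
            <elements>
              <xsl:for-each select="distinct-values(//*/local-name())">
                <xsl:sort select="."/>
                <element name="{.}"/>
              </xsl:for-each>
            </elements>
          </xsl:template>
        </xsl:stylesheet>
      </p:inline>
    </p:input>
  </p:xslt>
</p:declare-step>
```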
This XProc file contains the bulk of the application: mostly pipelines responsible for processing TEI P5 files in different ways.
- `update-schema` pushes a new schema definition (from `search-fields.xml`) to the Solr search engine
- `reindex` reindexes the TEI corpus as metadata records in Solr (see the indexing sketch after this list)
- `generate-indexer` converts the `search-fields.xml` metadata definition file into an XSLT transformation which can then be used to convert a TEI document into a Solr metadata record
- `p5-as-solr` extracts the search fields defined in `search-fields.xml` from a single TEI document into a Solr metadata record
- `convert-p5-to-solr` converts a single TEI document into a Solr metadata record, including the search fields defined in `search-fields.xml` as well as the full-text fields `introduction`, `diplomatic`, and `normalized`, and the search result field `metadata-summary`
- `p5-as-iiif` converts a single TEI document into an IIIF manifest
- `iiif-annotation-list` generates an IIIF annotation list for a particular folio in a TEI P5 file
- `bibliogaphy-as-html` converts the TEI bibliography file to HTML
- `p5-as-html` converts a TEI P5 manuscript file to HTML
- `p5-as-xml` serves a TEI P5 file verbatim, as XML
- `list-p5` generates a page listing of the TEI P5 files
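However the metadata record is produced, indexing ultimately means posting it to Solr's update handler. A sketch of that final step, assuming a local Solr instance; the URL, core name, field names, and values below are placeholders rather than the definitions in `search-fields.xml`:

```xml
<!-- Sketch only: post one metadata record to Solr's XML update handler
     and commit. Everything in this request body is a placeholder. -->
<p:http-request xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <c:request xmlns:c="http://www.w3.org/ns/xproc-step"
          method="post"
          href="http://localhost:8983/solr/chymistry/update?commit=true">
        <c:body content-type="text/xml">
          <add>
            <doc>
              <field name="id">example-manuscript-id</field>
              <field name="title">Example manuscript title</field>
              <field name="normalized">full normalized text of the manuscript</field>
            </doc>
          </add>
        </c:body>
      </c:request>
    </p:inline>
  </p:input>
</p:http-request>
```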
Several pages in the site are specified as plain XHTML pages, stored in the `html` folder. The sub-pipeline `html-page` is used to display the contents of these pages. That pipeline attempts to load the requested page and, if the page is not found, returns a 404 response.
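A sketch of that behaviour using `p:try`/`p:catch`; the page path is hard-coded here, whereas the real pipeline derives it from the request URI:

```xml
<!-- Sketch only: try to load a static XHTML page; if loading fails,
     fall back to a 404 response. The file path is a placeholder, and
     in practice the successfully loaded page would also be wrapped in
     a c:response by a later step. -->
<p:try xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">
  <p:group>
    <p:load>
      <p:with-option name="href" select="'../html/about.html'"/>
    </p:load>
  </p:group>
  <p:catch>
    <p:identity>
      <p:input port="source">
        <p:inline>
          <c:response status="404">
            <c:body content-type="text/html">
              <html xmlns="http://www.w3.org/1999/xhtml">
                <head><title>Not found</title></head>
                <body><p>Page not found</p></body>
              </html>
            </c:body>
          </c:response>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
```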
This XProc file contains pipelines which perform queries against the Solr search engine.
This pipeline is invoked when a user either clicks the "search" button or clicks on a facet value in the search form.
The facet values which appear on the search form are submit buttons, each of which has its own target URL containing the currently selected set of facets; this allows the user to incrementally refine a query by clicking a facet value, which is then added to the set. However, this also means that the form must be submitted using the HTTP POST method (when a form is submitted with GET, any query parameters already present in the target URL are discarded). In order to retain a bookmarkable and shareable URL at each stage of the browse process, the search pipeline includes a sub-pipeline which redirects these POST requests to equivalent GET requests in which the parameters are encoded in the URL.
When the pipeline receives a GET request, it parses the parameters in the request URL and uses the field definitions in `search-fields.xml` to generate a query to Solr, using Solr's JSON Facet API. The pipeline then formats the result of the Solr query into an HTML page which includes the results alongside the search and browse interface, in which the search fields and facet values are set to the currently selected values.
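For illustration, the kind of request this produces looks roughly like the following, posting a JSON query from a `p:http-request` step; the core name, query, and facet fields are placeholders rather than the definitions in `search-fields.xml`:

```xml
<!-- Sketch only: a faceted query sent to Solr's JSON Request API,
     using JSON Facet API "terms" facets. All names and values here
     are placeholders. -->
<p:http-request xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <c:request xmlns:c="http://www.w3.org/ns/xproc-step"
          method="post"
          href="http://localhost:8983/solr/chymistry/query">
        <c:body content-type="application/json">
{
  "query": "normalized:mercury",
  "limit": 10,
  "facet": {
    "hand": { "type": "terms", "field": "hand" },
    "language": { "type": "terms", "field": "language" }
  }
}
        </c:body>
      </c:request>
    </p:inline>
  </p:input>
</p:http-request>
```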
This pipeline is used to add hit highlighting to the HTML renditions of the P5 manuscripts. The pipeline is invoked from the main pipeline to post-process the HTML renditions of the P5. If the page URL does not include a highlight parameter, the pipeline simply copies the HTML unchanged. If a highlight parameter is present in the URL, it is interpreted as the text to highlight. The pipeline queries Solr to generate a list of "snippets" of the text, in which the highlighted text appears in context. The pipeline then searches the HTML page to find each snippet, generating highlights using HTML `mark` elements, and adding hyperlinks that link each `mark` element to the next and previous one.
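The snippets come from Solr's highlighting feature; a sketch of the kind of request involved, with placeholder core and field names (whether the real pipeline uses exactly these parameters is an assumption):

```xml
<!-- Sketch only: ask Solr for highlighted snippets of the query text.
     The hl.simple.pre/post parameters (URL-encoded <mark> and </mark>)
     delimit the hit within each snippet; the core name, field names,
     and query are placeholders. -->
<p:http-request xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <c:request xmlns:c="http://www.w3.org/ns/xproc-step"
          method="get"
          href="http://localhost:8983/solr/chymistry/select?q=normalized:mercury&amp;hl=true&amp;hl.fl=normalized&amp;hl.snippets=100&amp;hl.simple.pre=%3Cmark%3E&amp;hl.simple.post=%3C%2Fmark%3E"/>
    </p:inline>
  </p:input>
</p:http-request>
```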
This pipeline does not perform Latent Semantic Analysis itself; it simply delegates all LSA requests to a back-end server, and reformats the resulting HTML to include the site's global navigation.