Releases · NCEAS/metadig-engine

23 Oct 20:14

jeanetteclark

v.3.2.1

577ce5e

v.3.2.1 Latest

Patch release to fix bug when retrieving old run documents.

Full Changelog: v.3.2.0...v.3.2.1

22 Oct 18:11

jeanetteclark

v.3.2.0

2126816

v.3.2.0

Minor release with the not-so-minor new ability to parse JSON metadata. This enables metadig-engine to run quality checks on JSON metadata (notably, schema.org).

Schema Updates

Implementing this required some major, but non-breaking, changes to metadig. The metadig schema now includes a more general expression element alongside xpath in the selector field. This saves us from having to shoehorn the jq expressions needed to extract information from JSON metadata into the xpath field. The expression element has a syntax attribute that states which syntax the expression is written in. The more flexible expression element could therefore (and eventually should) replace xpath entirely.

<xs:complexType name="selector">
  <xs:sequence>
    <xs:element name="name" type="xs:string" />
    <xs:element name="xpath" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
    <xs:element name="subSelector" type="tns:selector" minOccurs="0" />
    <xs:element name="namespaces" minOccurs="0">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="namespace" type="tns:namespace" nillable="true" minOccurs="0"
            maxOccurs="unbounded" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="expression" type="tns:expression" minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
  <xs:attribute name="namespaceAware" type="xs:boolean" />
</xs:complexType>

Note that this change is backwards compatible with previous versions of the schema: xpath is optional and is still handled as before. Because of this backwards compatibility, and because it is difficult to get the JAXB class model to support multiple versions of the schema, this version of metadig-engine automatically migrates any older schema versions it finds to the newest version.

Document Processing

To process JSON documents, we added a MetadataDialect interface to the existing suite of processing tools, along with a MetadataDialectFactory. This let us make minimal changes to the calling code, which now instead of running:

XMLDialect xml = new XMLDialect(IOUtils.toInputStream(metadataContent, "UTF-8"));

runs:

MetadataDialect docDialect = MetadataDialectFactory.createDialect(sysMeta,
        IOUtils.toInputStream(metadataContent, "UTF-8"));

createDialect then instantiates the appropriate interface implementation, XMLDialect or JSONDialect, and the remaining document processing steps proceed from there. This infrastructure also allows additional metadata formats to be added if needed.
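As a rough illustration of the dispatch described above, the sketch below shows a factory choosing between two implementations of a shared interface. It is not the metadig-engine source. The `*Sketch` type names, the `syntax()` method, the string `formatId` argument, and the rule of matching "json" in the format identifier are all assumptions made for the example; the real createDialect takes the system metadata object shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical stand-in for the MetadataDialect interface.
interface DialectSketch {
    // Expression syntax this dialect evaluates (assumed method, for illustration only).
    String syntax();
}

// Hypothetical stand-in for XMLDialect; a real implementation would parse the stream.
final class XmlDialectSketch implements DialectSketch {
    XmlDialectSketch(InputStream content) { }
    @Override public String syntax() { return "xpath"; }
}

// Hypothetical stand-in for JSONDialect; a real implementation would parse the stream.
final class JsonDialectSketch implements DialectSketch {
    JsonDialectSketch(InputStream content) { }
    @Override public String syntax() { return "jq"; }
}

final class DialectFactorySketch {
    // Assumption: JSON documents are recognised by "json" appearing in the format id.
    static DialectSketch createDialect(String formatId, InputStream content) {
        if (formatId != null && formatId.toLowerCase(Locale.ROOT).contains("json")) {
            return new JsonDialectSketch(content);
        }
        return new XmlDialectSketch(content);
    }

    public static void main(String[] args) {
        InputStream doc = new ByteArrayInputStream("{}".getBytes(StandardCharsets.UTF_8));
        DialectSketch dialect = createDialect("application/ld+json", doc);
        System.out.println(dialect.syntax());
    }
}
```

The calling code only ever sees the interface type, which is why supporting another format would mean adding one implementation and one branch in the factory.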

What's Changed

Feature support json by @jeanetteclark in #496

Full Changelog: v.3.1.4...v.3.2.0

Contributors

jeanetteclark

22 Aug 17:12

jeanetteclark

v.3.1.4

715514c

v.3.1.4

What's Changed

Fixed a minor bug that prevented metadata-only reports from being run

Full Changelog: v.3.1.3...v.3.1.4

18 Aug 18:18

jeanetteclark

v.3.1.3

1476e13

v.3.1.3

What's Changed

Update helm charts to use bitnami legacy: DataONEorg/k8s-cluster#65
Run docker containers as the metadig user: #390

Full Changelog: v.3.1.2...v.3.1.3

16 Jul 22:41

jeanetteclark

v.3.1.2

96fee8c

v.3.1.2

What's Changed

Patch release for data quality suite by @jeanetteclark in #503

A couple of changes here:

Fix a bug where large objects returned from a solr query were not read into memory because of a check on how many bytes could be read without blocking. For large results this check returns 0 (we are still not sure why), but we found that we did not need it, so we removed it.
Don't run the data quality suite if there are no data pids. This reduces memory use and execution time, since no Python sub-processes are spun up just to return "no data objects found."
Add memory limits to all of the deployments (scorer, worker, scheduler, controller). This will help prevent us from taking down the whole k8s cluster.
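The first fix in the list above can be illustrated with a small sketch. It assumes the removed check resembled `InputStream.available()`, which only estimates how many bytes can be read without blocking and can return 0 while more data is still to come; the notes do not name the actual call. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class StreamReadSketch {
    // Reads until read() reports end of stream (-1) instead of consulting available().
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Test double: a stream that always claims 0 bytes are available,
    // standing in for a large response whose bytes have not arrived yet.
    static InputStream zeroAvailable(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) {
        byte[] result = readFully(zeroAvailable(new byte[20000]));
        System.out.println(result.length);
    }
}
```

A guard such as "skip the read when available() is 0" would discard this stream entirely, whereas looping on read() consumes all of it.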

Full Changelog: v.3.1.1...v.3.1.2

Contributors

jeanetteclark

02 Jul 23:03

jeanetteclark

v.3.1.1

a01bcb8

v3.1.1

Patch release with minor updates for the data quality suite.

What's Changed

Bump org.postgresql:postgresql from 42.7.4 to 42.7.7 by @dependabot in #498
Make some improvements to worker dockerfile for data quality suite by @jeanetteclark in #500

Full Changelog: v.3.1.0...v.3.1.1

Contributors

jeanetteclark and dependabot

02 May 17:01

jeanetteclark

v.3.1.0

2124c40

v.3.1.0

This release enables metadig-engine to run data quality checks by:

giving metadig-engine configurable access to a metacat hashstore
passing data pids to the check dispatcher

Other bug fixes and improvements are listed below.

What's Changed

Fix dependabot warnings by @jeanetteclark in #441
Feature-464: HashStore-java Library Import & Junit5 Refactor by @doulikecookiedough in #467
Support data quality checks by integrating hashstore support by @jeanetteclark in #468
Bug-475: metadig-controller CORS Access-Control-Allow-Origin Issue by @doulikecookiedough in #476
Bug-473: Unable to Run Quality Suite Due to Incorrect Config by @doulikecookiedough in #477
Bump commons-io:commons-io from 2.15.1 to 2.17.0 by @dependabot in #452
Bump com.rabbitmq:amqp-client from 5.18.0 to 5.22.0 by @dependabot in #451
Bump org.postgresql:postgresql from 42.7.2 to 42.7.4 by @dependabot in #449
Bump org.renjin:renjin-script-engine from 0.8.2567 to 0.9.2726 by @dependabot in #446

New Contributors

@doulikecookiedough made their first contribution in #467

Full Changelog: v.3.0.2...v.3.1.0

Contributors

doulikecookiedough, jeanetteclark, and dependabot

06 May 17:11

jeanetteclark

v.3.0.1

95ce5a5

v.3.0.1

What's Changed

This is a patch release to improve performance and stability for postgres and rabbitmq.

Improve Postgres stability - #420

Shortly after 3.0.0 was deployed, we found a bug in which Postgres connections became overloaded when more than 100 workers were deployed. To fix this, we adjusted the pgbouncer configuration so that the maximum number of user connections equals the maximum number of database connections, with a few extra database connections reserved for superuser processes. This limit is currently set to 200, so no more than 200 workers should be deployed for now.

This release also includes a slight refactor that closes database connections from the Java client using a try-with-resources pattern, so that connections are not left stranded.
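A minimal sketch of the try-with-resources pattern follows. To stay runnable without a database, it uses a hypothetical `FakeConnection` in place of a real `java.sql.Connection`; the counter and method names are assumptions for illustration and do not come from the metadig-engine code.

```java
final class TryWithResourcesSketch {
    // Hypothetical stand-in for a database connection taken from a pool.
    static final class FakeConnection implements AutoCloseable {
        static int open = 0; // number of connections currently held

        FakeConnection() {
            open++;
        }

        void query(boolean fail) {
            if (fail) {
                throw new IllegalStateException("query failed");
            }
        }

        @Override
        public void close() {
            open--;
        }
    }

    // The connection is closed automatically on both the success and failure paths.
    static boolean runQuery(boolean fail) {
        try (FakeConnection conn = new FakeConnection()) {
            conn.query(fail);
            return true;
        } catch (IllegalStateException e) {
            // By the time this block runs, conn has already been closed.
            return false;
        }
    }

    public static void main(String[] args) {
        runQuery(false);
        runQuery(true);
        System.out.println("open connections: " + FakeConnection.open);
    }
}
```

Because the close happens even when the query throws, a failing check cannot leave a connection counted against the pooler's limit.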

Recover RabbitMQ dropped connections

This was a minor but important change. An exception was sometimes thrown during connection recovery because the channel was already closed. In the catch block that performs the recovery, we removed the channel.close() call and now close only the connection before reopening both the connection and the channel.

Other minor improvements

#416
#419

08 Mar 23:18

jeanetteclark

v.3.0.0

896ec0c

v.3.0.0

What's Changed

Replace Jython with Jep to enable use of modern python in checks by @jeanetteclark in #399

Previously, the quality engine used the Java ScriptEngine to execute Python check scripts, with a script engine instance based on a Jython interpreter. Although this worked, Jython remains limited to Python 2.7, which the Python Software Foundation stopped supporting in 2020. Jython also does not support CPython libraries such as pandas and numpy. Being limited to Python 2.7 without access to CPython libraries severely restricted what any Python check run by the engine could do.

Although several options were considered, ultimately we decided to use Jep, since it supports CPython libraries and works with a standard Java install. Along with replacing Jython with Jep in this release, the rest of the metadig ecosystem was also upgraded to support Python 3.x. This included releases for:

With the new support for CPython checks and a modern python version, this release paves the way for data quality checks to be implemented in metadig.

Note that this is a breaking change since metadig can no longer run Python 2.7, and mismatched versions of the various metadig components may result in unexpected errors.

Full Changelog: v.2.5.0...v.3.0.0

Contributors

jeanetteclark

15 Aug 21:21

jeanetteclark

v.2.5.0

6bd5e1f

MetaDIG Quality Engine 2.5.0

In this release:

upgrade to Java 17
fix a small bug in the stuck job monitor (#361)
resolve dependabot alerts
add new DataONE hosted repositories to quality and scoring tasks
update documentation and diagrams
