Skip to content

BasePwCpInputGenerator: Improve parameters input validation#676

Draft
mbercx wants to merge 2 commits intoaiidateam:mainfrom
mbercx:fix/675/pwcp-parameter-validation
Draft

BasePwCpInputGenerator: Improve parameters input validation#676
mbercx wants to merge 2 commits intoaiidateam:mainfrom
mbercx:fix/675/pwcp-parameter-validation

Conversation

@mbercx
Copy link
Copy Markdown
Member

@mbercx mbercx commented Apr 18, 2021

Fixes #675

First working example of how we can add validation of the input
parameters of the Quantum ESPRESSO calculation jobs by converting the
INPUT-{}.XML files that are generated for the QE documentation into JSON
schemas that are then used in the validator of the parameters input.

The convert_to_json_schema() function in the calculations.schemas.utils.py
module is used to generate the calculations.schemas.pw.parameters.yaml
file, based on the calculations.helpers.INPUT_PW-6.4.xml file. This
yaml file is then loaded in the validator for the parameters input and
used to check if the parameters adhere to the prescriptions of the JSON
schema.

First working example of how we can add validation of the input
parameters of the Quantum ESPRESSO calculation jobs by converting the
INPUT-{}.XML files that are generated for the QE documentation into JSON
schemas that are then used in the validator of the `parameters` input.

The `convert_to_json_schema()` function in the `calculations.schemas.utils.py`
module is used to generate the `calculations.schemas.pw.parameters.yaml`
file, based on the `calculations.helpers.INPUT_PW-6.4.xml` file. This
yaml file is then loaded in the validator for the `parameters` input and
used to check if the parameters adhere to the prescriptions of the JSON
schema.
@mbercx
Copy link
Copy Markdown
Member Author

mbercx commented Apr 18, 2021

@sphuber this is a first working example of converting an INPUT_PW-{}.xml file into a JSON schema in .yaml format that can then be used to validate the input parameters. Here's the code that I used to generate the .yaml file:

import xml, yaml

from aiida_quantumespresso.calculations.schemas.utils import convert_to_json_schema

xml_path = 'INPUT_PW-6.4.xml'

with open(xml_path, 'r') as file:
    json_schema = convert_to_json_schema(xml.dom.minidom.parse(file))
    
with open('parameters.yaml', 'w') as file:
    file.write(yaml.safe_dump(json_schema, default_flow_style=False))

Currently the validation checks:

  • If the input namelist and tag are in the list of those of the JSON schema (additionalProperties: false).
  • If the input is of the correct type.
  • In case there are only a limited number of options: if the input is in the list of options (enum).

Example [1] from #675, where:

builder.pw.parameters['CONTROL']['calculation'] = 'GME stock'

Now gives:

ValidationError: 'GME stock' is not one of ['scf', 'nscf', 'bands', 'relax', 'md', 'vc-relax', 'vc-md']

Failed validating 'enum' in schema['properties']['CONTROL']['properties']['calculation']:
    {'enum': ['scf', 'nscf', 'bands', 'relax', 'md', 'vc-relax', 'vc-md'],
     'type': 'string'}

On instance['CONTROL']['calculation']:
    'GME stock'

Example [2] from #675, where:

builder.pw.parameters['CONTROL']['wf_collect'] = 'at the moon'

Now gives:

ValidationError: 'at the moon' is not of type 'boolean'

Failed validating 'type' in schema['properties']['CONTROL']['properties']['wf_collect']:
    {'type': 'boolean'}

On instance['CONTROL']['wf_collect']:
    'at the moon'

@mbercx
Copy link
Copy Markdown
Member Author

mbercx commented Apr 18, 2021

And deeper down the validation rabbit hole I go...

To do some basic testing of which schema accepts which parameters, I've added the generate_params_from_schema() function in the calculations.schemas.utils.py module. This is a rather rudimentary approach at generating an example dictionary that fills in every input for the schema with an acceptable value. So, while these parameters most likely don't make sense, they are "complete", and hence good for testing the jsonschema validation when additionalProperties: false.

Unfortunately, it turned out that e.g. taking the schema that corresponds to the latest version (6.7) doesn't validate the generated parameters by previous schemas (since inputs/options were removed) or vice versa (since inputs/options were added). So, I wanted to look into merging the JSON schemas into one that has all the inputs/options, that can be used as a default.

This lead me to the genson package, which can be used for exactly that purpose. By using the SchemaBuilder, one can simply add multiple schemas and genson will try to merge them into one that accepts data that conforms to each schema. However, the 'enum' keyword was not yet supported, so I took a stab at adding it, and it seems to be working nicely for my use case.

Merging the pw.x JSON schemas for major versions (5 and 6) works fine (i.e. there are no conflicts), but not between schemas that correspond to different major versions. This seems to be in part related to the fact that the 5.X versions of the XML input files for pw.x do not have the option (opt) variables defined, so the generated JSON schemas are limited to checking the type, since the enums are generated from the opt. This is similar for all cp.x XML input files, i.e. in QE v6.X the XML files also do not have options defined. The advantage here is that for the cp.x schemas I can generate one schema that successfully validates generated parameters from all schemas.

So far I've just added a few schemas, and chosen the all-encompassing schema for cp.x, and the one generated for major version 6.X for pw.x as the defaults used in the validate_parameters method. This seems to be working nicely, however someone trying to run v5.X for pw.x could run into validation errors.

A couple more points to hash out/work on:

  • Although I'm partial to only providing support to v6.0, I think it makes more sense in the end to move towards including version information in the Code (set as an extra, for now). This will not only allow us to make the validation more strict (and correct, in the case of running v5.X), but also come in handy when we want to indicate that the Code is SIRIUS-enabled (e.g. version == 'sirius-6.5'. This will need to be documented properly, though, or maybe enforced and added to the code setup step. Can we maybe demand more information based on the calculation plugin during the code setup?
  • Remove obsolete code from the BasePwCpInputGenerator._generate_PWCPinputdata() method. An example:
    # I put the first-level keys as uppercase (i.e., namelist and card names)
    # and the second-level keys as lowercase
    # (deeper levels are unchanged)
    input_params = _uppercase_dict(parameters.get_dict(), dict_name='parameters')
    input_params = {k: _lowercase_dict(v, dict_name=k) for k, v in input_params.items()}

    This code served no purpose anymore, since errors in the case with be caught by the validation.
  • The documentation also indicates which inputs are REQUIRED, so I can also add this to the JSON schema.
  • Add input validation that relies on other inputs to a top-level input validator. This might also include the JSON schema validation in case we need to check the version of the input Code.

@mbercx mbercx added this to the v4.0.0 milestone Jan 27, 2022
@mbercx mbercx removed this from the v4.0.0 milestone May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BasePwCpInputGenerator: Improve parameters input validation

1 participant