dicom-curate

Organize and de-identify DICOM header values and file hierarchies based on a provided configuration object.

⚠️ Disclaimer

This project is currently in a pre-1.0.0 state. APIs and behavior may change at any time without notice.

You're welcome to open issues, but please only do so if you're also willing to contribute a pull request.

Why

This provides an open configuration language and a ready-to-use library for modifying DICOM headers for the purpose of de-identification and organization.

The library can be used in a toolkit-agnostic way because it exposes functions that modify decoded DICOM headers in "DICOM JSON" format.

Usage

Consuming Dicom-Curate

The build output includes:

  • An ESM build, generated by esbuild with proper CommonJS dependency handling
  • A UMD and a minified UMD build, generated by Rollup

They can be consumed as follows:

| File | Used by | How to import / include |
| --- | --- | --- |
| dist/esm/index.js | Modern bundlers, ESM-aware tools | import ... from 'dicom-curate' |
| dicom-curate.umd.js | CommonJS, Node.js | require('dicom-curate') or require('dicom-curate/umd') |
| dicom-curate.umd.min.js | Browsers via CDN or <script> | <script src=".../dicom-curate.umd.min.js"></script> |

The unminified UMD build (/umd) is primarily intended for demos and debugging.

Examples

Converting a nested input folder structure containing DICOM files to a cleaned output folder destination (note: this uses a browser API only supported in Chrome and Edge browsers):

import { curateMany, OrganizeOptions } from 'dicom-curate'

const options: OrganizeOptions = {
  inputType: 'directory',
  inputDirectory, // input folder directory handle
  outputDirectory, // output folder directory handle
  curationSpec, // DICOM curation specification
  columnMappings, // csv file handle to add csv-based mapping
}

// Read input, map headers, write to well-structured output.
curateMany(options, onProgressCallback)

Alternatively, a list of Files is accepted:

const options: OrganizeOptions = {
  inputType: 'files',
  inputFiles, // list of `File` objects
  outputDirectory, // output folder directory handle
  curationSpec, // DICOM curation specification
  columnMappings, // csv file handle to add csv-based mapping
}

If outputDirectory is omitted, output Blobs will be passed to the onProgressCallback function instead.

You can also call curateOne directly and receive a promise with the mapped blob:

import { curateOne, extractColumnMappings } from 'dicom-curate'

// Preparing the optional mapping table is the caller's responsibility
const columnMappings = extractColumnMappings([
  { subjectID: 'SubjectID1', blindedID: 'BlindedID1' },
  { subjectID: 'SubjectID2', blindedID: 'BlindedID2' },
])

curateOne({
  fileInfo, // path, name, size, kind, blob
  mappingOptions: { curationSpec, columnMappings },
})
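
Because preparing the mapping table is left to the caller, here is a minimal sketch of how row objects like the ones above could be built from CSV text. parseCsvRows is a hypothetical helper and not part of dicom-curate. It assumes a header row, comma separators and no quoted fields.

```typescript
// Hypothetical helper, NOT part of dicom-curate: turns simple CSV text
// (header row first, comma-separated, no quoted fields) into row objects.
type Row = { [column: string]: string }

function parseCsvRows(csvText: string): Row[] {
  const lines = csvText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
  if (lines.length === 0) return []

  // The first non-empty line names the columns.
  const header = lines[0].split(',').map((cell) => cell.trim())
  return lines.slice(1).map((line) => {
    const cells = line.split(',').map((cell) => cell.trim())
    const row: Row = {}
    header.forEach((column, i) => {
      // Missing trailing cells become empty strings.
      row[column] = cells[i] ?? ''
    })
    return row
  })
}

const rows = parseCsvRows(
  'subjectID,blindedID\nSubjectID1,BlindedID1\nSubjectID2,BlindedID2\n',
)
```

The resulting rows have the same shape as the array passed to extractColumnMappings in the example above.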

An example DICOM curation function:

import type { TCurationSpecification } from 'dicom-curate'

/*
 * Curation specification for batch-curating DICOM files.
 */
export function sampleBatchCurationSpecification(): TCurationSpecification {
  const hostProps = {
    protocolNumber: 'Sample_Protocol_Number',
    activityProviderName: 'Sample_CRO',
    centerSubjectId: /^[A-Z]{2}\d{2}-\d{3}$/,
    timepointNames: ['Visit 1', 'Visit 2', 'Visit 3'],
    // Folder "scan": the trial-specific/provider-assigned series name
    scanNames: ['3DT1 Sagittal', 'PET-Abdomen'],
  }

  return {
    // Review the required input folder structure (all DICOM files need minimally this folder depth)
    // This configuration depends on correct centerSubjectId, timepoint, scan folder names.
    inputPathPattern:
      'protocolNumber/activityProvider/centerSubjectId/timepoint/scan',

    additionalData: {
      // collect from a csv file. A client can use regex to validate the input.
      type: 'load',
      collect: {
        CURR_ID: hostProps.centerSubjectId,
        StudyDescription: hostProps.timepointNames,
        MAPPED_ID: /BLIND_\d+/,
      },
      // With this, can refer to mappings as parser.getMapping('blindedId')
      mapping: {
        // Using the CSV
        blindedId: {
          value: (parser) => parser.getDicom('PatientID'),
          lookup: (row) => row['CURR_ID'],
          replace: (row) => row['MAPPED_ID'],
        },
      },
    },

    version: '3.0',
    hostProps,

    // This specifies the standardized DICOM de-identification
    dicomPS315EOptions: {
      cleanDescriptorsOption: true,
      cleanDescriptorsExceptions: ['SeriesDescription'],
      retainLongitudinalTemporalInformationOptions: 'Full',
      retainPatientCharacteristicsOption: [
        'PatientWeight',
        'PatientSize',
        'PatientAge',
        'PatientSex',
        'SelectorASValue',
      ],
      retainDeviceIdentityOption: true,
      retainUIDsOption: 'Hashed',
      retainSafePrivateOption: 'Quarantine',
      retainInstitutionIdentityOption: true,
    },

    modifyDicomHeader(parser) {
      const scan = parser.getFilePathComp('scan')
      const centerSubjectId = parser.getFilePathComp('centerSubjectId')

      return {
        // Align the PatientID DICOM header with the centerSubjectId folder name.
        PatientID: centerSubjectId,
        // This example maps PatientIDs based on the mapping CSV file.
        // PatientID: parser.getMapping('blindedId'),
        PatientName: centerSubjectId,
        // Align the StudyDescription DICOM header with the timepoint folder name.
        StudyDescription: parser.getFilePathComp('timepoint'),
        // The party responsible for assigning a standard ClinicalTrialSeriesDescription
        ClinicalTrialCoordinatingCenterName: hostProps.activityProviderName,
        // Align the ClinicalTrialSeriesDescription DICOM header with the scan folder name.
        ClinicalTrialSeriesDescription: scan,
      }
    },

    outputFilePathComponents(parser) {
      const scan = parser.getFilePathComp('scan')
      const centerSubjectId = parser.getFilePathComp('centerSubjectId')

      return [
        parser.getFilePathComp('protocolNumber'),
        parser.getFilePathComp('activityProvider'),
        centerSubjectId,
        parser.getFilePathComp('timepoint'),
        scan + '=' + parser.getDicom('SeriesNumber'),
        parser.getFilePathComp(parser.FILEBASENAME) + '.dcm',
      ]
    },

    // This section defines the validation rules for the input DICOMs.
    // The processing continues on errors, but errors will have to be fixed
    // or reviewed between the parties.
    errors(parser) {
      return [
        // File path
        [
          'Invalid study folder name',
          parser.getFilePathComp('protocolNumber') !== hostProps.protocolNumber,
        ],
        // DICOM header
        ['Missing Modality', parser.missingDicom('Modality')],
        ['Missing SOP Class UID', parser.missingDicom('SOPClassUID')],
      ]
    },
  }
}
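
To illustrate how inputPathPattern relates to parser.getFilePathComp, here is a minimal sketch that pairs the pattern names with the folder names of one file path and validates the subject folder with the regex from hostProps. matchPathPattern is a hypothetical function and not dicom-curate's implementation. It assumes the pattern is matched from the start of the path.

```typescript
// Hypothetical sketch, NOT dicom-curate's implementation: pairs the names in
// an inputPathPattern with the folder names of a single file path.
type PathComponents = { [name: string]: string }

function matchPathPattern(
  pattern: string,
  filePath: string,
): PathComponents | null {
  const names = pattern.split('/')
  const parts = filePath.split('/').filter((part) => part.length > 0)
  // The last part is the file name; everything before it is a folder.
  const folders = parts.slice(0, -1)
  // Assumption: the path needs at least the folder depth of the pattern.
  if (folders.length < names.length) return null

  const components: PathComponents = {}
  names.forEach((name, i) => {
    components[name] = folders[i]
  })
  return components
}

const centerSubjectIdPattern = /^[A-Z]{2}\d{2}-\d{3}$/
const comps = matchPathPattern(
  'protocolNumber/activityProvider/centerSubjectId/timepoint/scan',
  'Sample_Protocol_Number/Sample_CRO/AB12-123/Visit 1/PET-Abdomen/0001.dcm',
)
const subjectIsValid =
  comps !== null && centerSubjectIdPattern.test(comps.centerSubjectId)
```

A path that is shallower than the pattern yields null in this sketch, which mirrors the note that all DICOM files need at least this folder depth.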

DICOM Conformance Notes

dicom-curate

  • does not use an Encrypted Attributes Sequence
  • does not anonymize burnt-in information or modify PixelData
  • populates the PatientIdentityRemoved attribute with YES
  • populates the LongitudinalTemporalInformationModified attribute per DICOM PS3.15E
  • populates the DeidentificationMethod attribute with information about this README
  • populates the DeidentificationMethodCodeSequence with the CID7050 codes of provided options, per PS3.15E
  • keeps only the following in File Meta Information: 'FileMetaInformationVersion', 'ImplementationClassUID', 'ImplementationVersionName', 'MediaStorageSOPClassUID', and sets 'TransferSyntaxUID' to 'Explicit VR Little Endian' and 'MediaStorageSOPInstanceUID' to the correct SOP Instance UID.
  • cleans sequences ('SQ') by recursively applying the de-identification rules to each Dataset in each Item of the Sequence.
  • uses an allow-list approach, by removing everything not defined in PS3.06 or handled in PS3.15E1.1.
  • identifies and removes additional ID attributes beyond PS3.15E1.1 by parsing PS3.06 and finding all attributes whose names end in "ID(s)" (but not "UID(s)") and that are not defined in PS3.15E. This ID list is defined in "src/config/dicom/retainAdditionalIds.ts", and a few of its entries are manually annotated to be retained if the "retain device identifier option" is activated.
  • keeps the 'EncapsulatedDocument' attribute if modality is "DOC", unless overridden
  • keeps the 'VerifyingObserverSequence' if modality is SR, unless overridden
  • allows the users to describe all cleaning configurations in the curationSpec file
  • implements the following PS3.15E options:
    • 'retainDeviceIdentityOption': Keeps the attributes marked as 'K' and performs the default action on all other attributes
    • 'cleanDescriptorsOption': removes all description and comment attributes except those explicitly listed in the cleanDescriptorsExceptions list.
    • 'retainLongitudinalTemporalInformationOptions': considers all temporal attributes (DA, TM, DT), which PS3.15E describes as a possible approach. Possible values are 'Full' (keep all temporal information intact), 'Off' (remove all temporal attributes or add defaults per PS3.15E), or 'Offset' (shift all temporal attributes by a duration; an ISO-8601 compliant duration must be passed as the dateOffset parameter).
    • 'retainDeviceIdentityOption': true or false. If true, overrides retainLongitudinalTemporalInformationOptions for the respective attributes to keep.
    • 'retainUIDsOption': 'On', 'Hashed'.
      • If 'On', maintain all UIDs.
      • If 'Off', replaces instance UIDs with arbitrary new UIDs, maintaining referential integrity within a single run.
        • maximum protection
        • only maintains referential integrity within a run
        • do not use for de-identifying data in multiple batches
      • If 'Hashed', creates a new UID using a decentrally repeatable, hash-based method.
        • maintains referential integrity even if de-identifying data in separate, or decentralized, batches
        • use if the risk of re-identifying by UID is not bigger than the risk of re-identifying by PixelData
        • do not use if you want to specifically protect UIDs from an auxiliary knowledge attack, e.g. an attacker that knows possible input UIDs
      • For compatibility, the 'Off' option is now treated the same way as 'Hashed'.
      • There are more instance UIDs in PS3.06 than PS3.15E lists for protection. This option therefore protects the following UIDs: 1. all instance UIDs per PS3.15E, 2. any additional UIDs whose value is not well-known in DICOM, per table PS3.06A (Registry of DICOM Unique Identifiers). This protects instance UIDs but also private class UIDs, which is intentional.
    • 'retainSafePrivateOption': 'Quarantine' or 'Off'. If 'Quarantine', keeps all private tags but creates a quarantine log for manual review
    • 'retainInstitutionIdentityOption': true or false
  • does not currently clean structured content
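
The rule for finding additional ID attributes (names ending in "ID(s)" but not "UID(s)", and not already covered by PS3.15E) can be sketched as a simple keyword filter. This is only an illustration. isAdditionalIdKeyword and the sample keyword lists are hypothetical, and the actual list lives in src/config/dicom/retainAdditionalIds.ts.

```typescript
// Hypothetical sketch of the naming rule, NOT the library's code.
function isAdditionalIdKeyword(
  keyword: string,
  handledByPS315E: string[],
): boolean {
  const endsWithId = /IDs?$/.test(keyword)
  const endsWithUid = /UIDs?$/.test(keyword)
  const alreadyHandled = handledByPS315E.indexOf(keyword) !== -1
  return endsWithId && !endsWithUid && !alreadyHandled
}

// Illustrative inputs only. The "Example..." keywords are made up and
// do not come from the DICOM data dictionary.
const handled = ['PatientID']
const keywords = [
  'PatientID', // skipped: already handled per PS3.15E
  'SOPInstanceUID', // skipped: ends in "UID"
  'ExampleScannerID', // flagged
  'ExampleLabelIDs', // flagged
  'Modality', // skipped: does not end in "ID(s)"
]
const additionalIds = keywords.filter((keyword) =>
  isAdditionalIdKeyword(keyword, handled),
)
```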

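As an illustration of the 'Offset' value of retainLongitudinalTemporalInformationOptions, here is a minimal sketch that shifts a DICOM DA value (YYYYMMDD) by an ISO-8601 duration. shiftDicomDate is a hypothetical function and not the library's implementation. It only handles whole-day durations such as 'P30D', and accepting a leading minus sign is an assumption made here.

```typescript
// Hypothetical sketch, NOT dicom-curate's implementation.
function pad(value: number, width: number): string {
  let text = String(value)
  while (text.length < width) text = '0' + text
  return text
}

// Shifts a DA value (YYYYMMDD) by a day-based duration like 'P30D' or '-P7D'.
function shiftDicomDate(da: string, duration: string): string {
  const match = /^(-?)P(\d+)D$/.exec(duration)
  if (!match || !/^\d{8}$/.test(da)) {
    throw new Error('Unsupported input: ' + da + ' / ' + duration)
  }
  const days = Number(match[2]) * (match[1] === '-' ? -1 : 1)
  const year = Number(da.slice(0, 4))
  const month = Number(da.slice(4, 6))
  const day = Number(da.slice(6, 8))
  // Date.UTC normalizes out-of-range day values, so months and years roll over.
  const shifted = new Date(Date.UTC(year, month - 1, day + days))
  return (
    pad(shifted.getUTCFullYear(), 4) +
    pad(shifted.getUTCMonth() + 1, 2) +
    pad(shifted.getUTCDate(), 2)
  )
}

const shiftedForward = shiftDicomDate('20240131', 'P1D') // '20240201'
const shiftedBack = shiftDicomDate('20240301', '-P1D') // '20240229'
```

Applying the same duration to every temporal attribute of a subject keeps the intervals between dates intact, which is the point of an offset compared to removing the dates.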