Skip to content

SIARD spec SQL to XML possible flaw on unsigned datatypes #48

@solfeggietto

Description

@solfeggietto

Referring to eCH-0165 SIARD spec. v2.1 (02.07.2018) as is the latest published one from DILCISBoard\SIARD readme startpage. Section P_4.3-3 lists the SQL:2008 built-in data types converted to XML. As i understand it neither SQL:2008/SQL:1999 nor MSSQL supports/declares unsigned datatypes (=>0 in values).

However XML 1.0 and databases in Oracle and MySQL do support implementation and usage of unsigned datatypes. This leaves the SIARD 2.1 spec uncovered in how to define the SIARD extraction of such datatypes.

As a result the SIARD extraction may be successful or fail. For example may unsigned integer datatypes be transformed to the signed equivalient and the data may be correct still. However if an unsigned int or unsigned bigint is transformed to its corresponding signed int or signed bigint, there may be a data value overflow if the positive value is above the max signed valid because the unsigned datataype have twice as high range as the signed one as the absense of negative values adds to the unsigned positive range.

A fast test showed that SIARD Suite 2.1.127 transforms an unsigned bigint into a SQL:2008 signed bigint => XML xs:integer, while DBPTK 2.1 (2.6.4) [edited] transform an unsigned bigint into a SQL:2008 datatype "NUMERIC (20)" => XML xs:decimal.

From a preservation perspective we also need look into the SIARD producing tools, on what data tests are done while doing the actual transformation. Will a too large data value be detected and reported/logged as an error, or will data distortion or deletion occur (data wrong or missing in the resulting row of this table[n].xml

In general its better to have an extraction stop/skip data whil reporting the talbe or field/valus as an error than to silently do what seems to be a successful extraction, but that may be distorting or deleting data.

Metadata

Metadata

Labels

SIARD2.1Issues that relates to SIARD2.1 and not the CIT Specificationtools / implementationsIssues that relate to implementations of the SIARD format and not to the standard itself

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions