
Always make the last separators mandatory #2

@cipriancraciun


First of all, given that there is no clear specification, I interpret the current USV format as described in issue #1.

Thus, my suggestion is to make the unit / record / group / file separators mandatory at the end of each such block.

The reasons are:

  • many parsers would most likely be lenient and simply ignore an RS that is immediately followed by a GS or FS (this feeds into the next points);
  • in security there is the rule of "canonical representation"; in particular, if one were to sign a USV file, there should be a single canonical representation (enforced by the parser), so that an attacker can't pad the file with separators that a lenient parser would silently ignore;
  • at the moment the empty string is a valid USV; how should it be interpreted? As a single file, with a single group, with a single record, with an empty field (this is what follows from my interpretation described in issue #1, "Provide an actual specification (i.e. BNF or equivalent form)")? As a single file, with a single group, but with no records? Perhaps as an empty list of no files?
  • also, given that separators are not mandatory, any valid UTF-8 file that contains no separators is a valid USV file with a single file/group/record/unit;
  • detecting a truncated file: at the moment a lone value-1<US>value-2 is valid USV, yet it might also be the prefix of a longer file that contained more records but was truncated; making the last separators mandatory makes the truncation detectable (granted, a stream truncated exactly at an <FS> boundary would go undetected, but given that most USV files would contain only one file, that is an acceptable trade-off).
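
To make the canonical-form and truncation points concrete, here is a minimal sketch of a strict parser in which every block must end with its own separator. It is only an illustration under assumptions: the names `split_terminated` and `parse_strict` are hypothetical, and the separator characters are assumed to be the Unicode control pictures U+241F / U+241E / U+241D / U+241C, which no specification quoted here confirms.

```python
# Assumed separator characters (Unicode control pictures); not confirmed by a spec.
US, RS, GS, FS = "\u241f", "\u241e", "\u241d", "\u241c"


def split_terminated(text, sep, what):
    """Split `text` into blocks, each of which must be terminated by `sep`."""
    if text == "":
        return []  # no blocks at all at this level
    if not text.endswith(sep):
        raise ValueError(f"truncated or non-canonical input: missing final {what} separator")
    # Drop the final terminator, then split on the remaining ones.
    return text[: -len(sep)].split(sep)


def parse_strict(text):
    """Return nested lists: files -> groups -> records -> units."""
    return [
        [
            [split_terminated(record, US, "unit")
             for record in split_terminated(group, RS, "record")]
            for group in split_terminated(file_, GS, "group")
        ]
        for file_ in split_terminated(text, FS, "file")
    ]
```

Under these rules the empty string can only mean "no files", a record with zero units (`RS GS FS`) differs from a record with one empty unit (`US RS GS FS`), and `value-1<US>value-2` without its trailing separators is rejected with a `ValueError` instead of being silently accepted.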

And, if those are not convincing enough, here is a practical reason: it's simpler to write the formatter, because one can just print a separator after every item without checking whether it was indeed the last item in its block:

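# US, RS, GS, FS are the four separator characters;
# end="" keeps print() from adding its own newline.
for f in files:
  for g in f.groups:
    for r in g.records:
      for u in r.units:
        print(u.value, end="")
        print(US, end="")
      print(RS, end="")
    print(GS, end="")
  print(FS, end="")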

(I'll leave to others to think about the implementation where the last separator is not mandatory.) :)
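
For anyone who wants to compare the two styles side by side, the following is a rough, self-contained sketch rather than a reference implementation. It assumes the data is plain nested lists (files -> groups -> records -> units) and that the separators are the Unicode control pictures; `write_terminated`, `write_separated`, and `render` are hypothetical names introduced only for this illustration.

```python
import io

# Assumed separator characters (Unicode control pictures); not confirmed by a spec.
US, RS, GS, FS = "\u241f", "\u241e", "\u241d", "\u241c"


def write_terminated(out, files):
    """Mandatory trailing separators: every item is followed by its separator."""
    for groups in files:
        for records in groups:
            for units in records:
                for unit in units:
                    out.write(unit)
                    out.write(US)
                out.write(RS)
            out.write(GS)
        out.write(FS)


def write_separated(out, files):
    """Separators only between siblings: every loop has to track its position."""
    for fi, groups in enumerate(files):
        if fi > 0:
            out.write(FS)
        for gi, records in enumerate(groups):
            if gi > 0:
                out.write(GS)
            for ri, units in enumerate(records):
                if ri > 0:
                    out.write(RS)
                for ui, unit in enumerate(units):
                    if ui > 0:
                        out.write(US)
                    out.write(unit)


def render(writer, files):
    """Run a writer against an in-memory buffer and return the text."""
    buf = io.StringIO()
    writer(buf, files)
    return buf.getvalue()
```

With `write_separated`, the inputs `[]`, `[[]]`, `[[[]]]`, `[[[[]]]]` and `[[[[""]]]]` all render to the empty string, which is exactly the ambiguity raised in the list above; `write_terminated` gives each of them a distinct output.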
