First of all, given there is no clear specification, I interpret the current USV as described in issue #1.
Thus, my suggestion is to make the unit / record / group / file separators mandatory at the end of each such block.
The reasons are:
many parsers would most likely be lenient and simply ignore an RS that is immediately followed by a GS or FS (this plays into the next points);
in security there is the rule of "canonical representation": especially if one were to sign a USV file, there should be exactly one canonical representation (enforced by the parser), so that an attacker can't just pad the file with separators that a lenient parser would silently ignore;
at the moment the empty string is a valid USV; how should it be interpreted? as a single file, with a single group, with a single record, with an empty field (this is what results from my interpretation described in issue #1, "Provide an actual specification (i.e. BNF or equivalent form)")? as a single file, with a single group, but with no records? or perhaps as an empty list of files?
also, given that separators are not mandatory, any file that is valid UTF-8 and doesn't contain separators is a valid USV file with a single file/group/record/unit;
detecting a truncated file: at the moment a single value-1<US>value-2 is a valid USV, however it might also be the prefix of a longer file that contained more records but was truncated; making the last separators mandatory makes the truncation detectable (granted, the stream might get truncated exactly at an <FS> boundary and go undetected, but given that most USV files would contain only one file, that would be an acceptable trade-off).
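To make the canonical-form and truncation points concrete, here is a minimal sketch of a strict parser under the proposed rule (every unit ends in US, every record in RS, every group in GS, every file in FS). The separator code points (U+241F, U+241E, U+241D, U+241C), the names `parse_strict` and `UsvError`, and the nested-list result shape are assumptions made for illustration; none of them come from a specification.

```python
# Assumed separator characters; substitute whatever the specification settles on.
US, RS, GS, FS = "\u241F", "\u241E", "\u241D", "\u241C"


class UsvError(ValueError):
    """Raised when the input is not in the strict (terminated) form."""


def parse_strict(text):
    """Parse text into files -> groups -> records -> units (nested lists).

    Every block must be closed by its own separator, so a truncated or
    unterminated input is rejected instead of being silently accepted.
    """
    files, groups, records, units = [], [], [], []
    value = []  # characters of the unit currently being read
    for ch in text:
        if ch == US:
            units.append("".join(value))
            value = []
        elif ch == RS:
            if value:
                raise UsvError("unit not terminated by US before RS")
            records.append(units)
            units = []
        elif ch == GS:
            if value or units:
                raise UsvError("record not terminated by RS before GS")
            groups.append(records)
            records = []
        elif ch == FS:
            if value or units or records:
                raise UsvError("group not terminated by GS before FS")
            files.append(groups)
            groups = []
        else:
            value.append(ch)
    if value or units or records or groups:
        raise UsvError("truncated input: missing trailing separator(s)")
    return files
```

In this sketch the empty string parses unambiguously as zero files, `value-1<US>value-2` with nothing after it is rejected as truncated, and a stray extra RS is not ignored but shows up as an additional empty record, so two different byte sequences never yield the same structure.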
And, if those are not convincing enough, here is a practical reason: it's simpler to write the formatter, because one can just print the last separator without checking if this was indeed the last item in its block:
    for f in files:
        for g in f.groups:
            for r in g.records:
                for u in r.units:
                    print(u.value)
                    print(US)
                print(RS)
            print(GS)
        print(FS)
(I'll leave it to others to think about the implementation where the last separator is not mandatory.) :)
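For completeness, a runnable version of the loop above might look like the following. It is only a sketch: the input is assumed to be plain nested lists (files, then groups, then records, then unit strings), the separator code points are assumed to be U+241F, U+241E, U+241D and U+241C, and `format_usv` is a hypothetical name.

```python
import io

# Assumed separator characters; substitute whatever the specification settles on.
US, RS, GS, FS = "\u241F", "\u241E", "\u241D", "\u241C"


def format_usv(files):
    """Serialize files -> groups -> records -> units with mandatory trailing separators.

    No "is this the last item?" check is needed at any level: each block
    simply writes its own separator once it is done.
    Note: this sketch does not check or escape separators inside unit values.
    """
    out = io.StringIO()
    for groups in files:
        for records in groups:
            for units in records:
                for value in units:
                    out.write(value)
                    out.write(US)
                out.write(RS)
            out.write(GS)
        out.write(FS)
    return out.getvalue()
```

For example, `format_usv([[[["a", "b"], ["c"]]]])` writes `a`, US, `b`, US, RS, `c`, US, RS, GS, FS in that order, and an empty list of files produces the empty string.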