|
Action: |
Update
Description changed from 1. codepoints outside BMP, as literals and in data. If I put in a value that requires use of a high/low surrogate pair,
is that an error, does it require me to put in two separate %#...; thingys, one for each of the surrogates (in which
case these are not really code points in ISO10646). If I put in a codepoint for one of the supplemental characters and
the schema itself is written in UTF-16 then that has to translate into literal surrogate pair. Ok, but I’m very
uncertain about all this stuff.
Tracker Issue: illegal character encodings for parsing and unparsing. TBD: how do these
make it into the infoset, or are they replaced, and if so how? TBD: can one represent these in the infoset for output?
Ideally not, but…
2. "round trip" for infoset. Should we omit the whole point?
3. Versions everywhere - infoset document element dfdlversion - should we specify that this matches the dfdl namespace
URL suffix, and the string used in <xs:appinfo source="…" >. Consistency here is best.
4. Infoset string dataValue left-to-right order? How can this be correct? Does this mean a right-to-left language will
have its strings where the first character someone would read is NOT at index 1 in the infoset strings? I guess to me
the string should be in "reading" order for the language it is written in.
5. Infoset [schema] is an absolute or relative SCD. Why bother allowing absolute?
6. means to indicate that an implementation supports non-standard extensions.
7. ES Entity - what is output? What definition of empty space does ES map to? WSP, WSP+, or WSP*?
8. some annotations take a 'name' attribute for reference from the 'ref' attribute of other annotations. Isn't this
'name' also part of the syntax of format annotations?
9. semantics of expressions containing relative paths that are inherited via ref to a dfdl:defineFormat. (also section
10.3)
10. XPath term - we are not consistent about using the term XPath, or "expression" when referring to our expression
language. I prefer to call it our expression language, and then in the section that defines it state that it is a strict
subset of XPath 2.0.
11. defineVariable name attribute should be an NCName, not a QName.
12. fn:position is unclear given that we've just said we don't support sequences in the expression language.
13. what are the units for dfdl:representationLength? Bits, bytes, or characters? (and similarly dfdl:unpaddedLength)
14. using a non-negative valued xs:long as the return value for dfdl:representationLength and dfdl:unpaddedLength would
make a Java-based implementation easier, and it has plenty of magnitude.
15. \n in regular expressions - clarify relationship of this to entities like NL entity. Also, if I include an entity
like WSP* in a regular expression (can I?) does it then match accordingly? It appears that some of our multi-valued
entities like WSP+ create conditional "matching" behavior without having to use regular expressions, e.g., when WSP+ is
used as a separator. But can you use entities like WSP+ in a regular expression? It seems you should be able to use
regular "single valued" entities in a regular expression; it's these multi-valued ones that have tricky semantics.
16. order of sections. Scoping rules section should come before variables section, which uses these concepts.
17. remove 10.2 (multiple annotations at the same point). Redundant.
18. 10.4 - break up Rule 3. Separate rules for when no duplicates are allowed and those which allow overriding. Don't
mix the two in the same rule.
19. (search for "supplement") the bcd, zoned, packed supplement has been merged. Does the grammar need to grow more
terminals as this paragraph suggests?
20. Grammar for Sequence - Where is FinalUnusedRegion? It is in the pictures but not the grammar.
21. Figure 8 - These pictures are nice, but should we keep them, given that they must be maintained? The grammar is
sufficient, though not as pretty.
22. Move to the section on the Built-in specifications: "The namespace URIs which identify these standard format
definitions contain version identification so that future versions of this standard can provide new versions of these
definitions which define properties or change their definitions.
These built-in format definitions are complete in that they provide a self-consistent definition for all relevant
representation properties. Their intended use is as a base for extension. By extending from one of these provided
definitions a DFDL schema author can be assured that there is a base of suitable representation properties from which to
start."
23. 12 - properties aren't organized the way listed in the bullet list. Adjust this section to match subsequent
presentation.
24. 12 - TBD: Need some words to cover use of XPath expressions as property values, including the case where the result
must evaluate to a Boolean.
25. big-Endian, little-Endian - strange inconsistent naming convention. We should use one standard convention. So far it
has been bigEndian, and littleEndian. I.e., camel case. Most other identifiers work this way.
26. Case sensitivity of enum names - did we say whether this is case sensitive or not? I believe it should be case
sensitive.
27. dfdl:ignoreCase - should this be a Boolean? In general we're trying to use enums for these. Suggest
caseSensitivity="strict/lax"
28. 14.1 Alignment - zero-based thinking here. But all the bits stuff and everything else in DFDL uses 1-based reasoning.
Need to revisit to make this sensible for a 1-based world.
29. finalTerminatorCanBeMissing - spec is not clear. Also, is there a finalSeparatorCanBeMissing?
30. dfdl:length - Issue: when can an expression return length value of zero? Ever? Is this illegal? We say that there
must be at least 1 bit in the rep of any required item, or optional item that is present. (I suppose a string that is
required and has no default value can have length 0 in lengthUnits of characters.)
31. dfdl:lengthKind 'endOfParent' Issue: unclear. This inductive relationship about when an element can have lengthKind
end of parent needs to be explained better. (perhaps with a picture?) The right thing is not to try to repeat the
explanation both here and in the later section about this. Just put it in one place. I suggest this section should be
shorter and xref to the other one.
32. move notes about OMG/CAM model relationship to an appendix. There are many other such sections - we have a doc
somewhere which had multiple columns and related a number of product-specific format spec language keywords to DFDL. I
suggest this OMG/CAM stuff (which is closely related to an IBM product also) goes in that other document.
33. binaryMilliSeconds should be binaryMilliseconds. Milli is not a word in English.
34. 14.3.5 - fix wording " For example the last is the parent is a known fixed length"
35. 14.3.6.3 - Byte Order Mark - add quick note that a required or optional BOM can be modeled as a separate element
that appears before the string. (I put suggested wording into the document itself already.)
36. dfdl:escapeKind - need to reword "On parsing 'escapeCharacter', unless escaped by the escapeEscapeCharacter, and
escapeEscapeCharacter when it precedes the 'escapeCharacter' are removed from the data."
37. escapeKind - for escapeBlock - must the open and close be at the beginning and end of the element, or can the
escapeBlockStart and End appear interior to the data?
38. 15.2 - bidi properties apply only to the value, not the delimiters - is this right?
39. dfdl:textBidiNumeralShapes - if nominal and national are the enums, and they mean Arabic shapes or Arabic-Indic
shapes for numerals then why aren't the enum values arabicShapeNumerals and arabicIndicShapeNumerals instead?
40. textStringPadCharacter textNumberPadCharacter - did we agree that this character must be a "minimum width" character
if the char set encoding is variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)
41. numberPattern - consider moving the example out of the box/table to a following discussion.
42. numberPattern - "The number of '0' characters must match the number of digits in the representation otherwise it is
a schema definition error." Issue: what if lengthUnits is bytes for this particular underlying string. E.g., it's a
prefixed string, the prefix gives the number of bytes, and then the string has a particular number format.
43. numberExponentCharacter - Issue: allow a list of characters? Restrict to EFG or efg only? Many libraries tolerate
many exponent characters. Perhaps we should just tolerate any of EFGefg on input, and have this be used only to decide
what to generate when unparsing?
44. numberInfinityRep numberNanRep - Is this applicable only to xs:double and xs:float? Also, what I've seen requires a
distinction of sign. I.e., there are positive and negative infinities often printing as -inf and +inf.
45. numberRoundingMode - "the rounding increment is specified as part of the pattern" - I did not find any such
discussion of how to specify it. What the heck is a rounding increment anyway?
46. 15.5 - Heading - just list the types it is applicable to: xs:decimal and all sub-types.
47. binaryDecimalVirtualPoint - does it make sense for this to be zero? Or is it an error (schema def) to specify an
xs:decimal but have the virtual decimal point be at position zero? Also - is this zero-based indexing or 1-based? I.e., is
000.0 position 1 or position 2 for the decimal point?
48. calendarStrictChecking - out-of-band and in-band - these are formal terms that need formal defs/glossary entries.
49. calendarTimeZone - should there be standard variables for locale-oriented things like this? E.g., we have a standard
variable named dfdl:byteOrder which defaults to bigEndian. Some of our built-in format specifications that we get via
include/import will specify dfdl:byteOrder="{ $dfdl:byteOrder }". Some of these locale-oriented things for numbers and
calendars might want to work similarly.
50. 15.11 - hexBinary - the 2nd table needs explanation.
51. nilKind - literalCharacter and literalValue both require that representation='text' or type is xs:string based.
52. nilValue - this word is singular, but takes a list. Do our other properties that take lists also obey this
convention, i.e., singular English words in the names even though a list is allowed? (dfdl:extraEscapedCharacters - is
plural…. So there's at least one property different from this convention)
53. 15.13 Properties for default value control - should we suggest: Warning appropriate for DFDL schemas with non-zero
minOccurs and no default value.
54. 15.14.1.2 - Initiators and Output - formatting of table needs to be cleaned up. Fonts are wrong, lines don’t line
up vertically, and width of lines doesn't match the rest of the spec. Ideally, it should be rejiggered so that it fits
all on one page if there is a page break before the heading.
55. 16.1 empty sequences - need to clarify that a sequence with content length zero has dfdl:lengthKind="explicit" and
dfdl:length="0". That is, if we're still allowing sequences to have dfdl:length properties.
56. separatorPosition - semantics of final separator can be missing - right now it says that the last postfix separator
is always optional.
57. Sequence groups (16.3.1) Do we allow children of an unordered sequence with initiators to be anything other than
elements? (I think we decided the children must be elements in this case - but the TBD is still here.)
58. 18.3 Arrays with XPath expressions - unclear - need to reword.
59. InputValueCalc and arrays - Let's rule it out on anything variable-occurrence (arrays or optional). If an array is
fixed length then I don't see why this is a problem. But we could just say scalar only and I'd be happy.
60. 20 Non-primitive DFDL Schema Constructs - what's left here of this material should be merged into the scoping
section (and fixed).
to 1. codepoints outside BMP, as literals and in data. If I put in a value that requires use of a high/low surrogate pair,
is that an error, does it require me to put in two separate %#...; thingys, one for each of the surrogates (in which
case these are not really code points in ISO10646). If I put in a codepoint for one of the supplemental characters and
the schema itself is written in UTF-16 then that has to translate into literal surrogate pair. Ok, but I’m very
uncertain about all this stuff
2. illegal character byte sequences for parsing and unparsing. How do these make it into the infoset, or are they
replaced, and if so how? Can one represent these in the infoset for output? (array of xs:byte?, or hexBinary? - is
that the solution?)
3. "round trip" for infoset. Should we omit the whole point?
4. Versions everywhere - infoset document element dfdlversion - should we specify that this matches the dfdl namespace
URL suffix, and the string used in <xs:appinfo source="…" >. Consistency here is best.
5. Infoset string dataValue left-to-right order? How can this be correct? Does this mean a right-to-left language will
have its strings where the first character someone would read is NOT at index 1 in the infoset strings? I guess to me
the string should be in "reading" order for the language it is written in.
6. Infoset [schema] is an absolute or relative SCD. Why bother allowing absolute?
7. means to indicate that an implementation supports non-standard extensions.
8. ES Entity - what is output? What definition of empty space does ES map to? WSP, WSP+, or WSP*?
9. some annotations take a 'name' attribute for reference from the 'ref' attribute of other annotations. Isn't this
'name' also part of the syntax of format annotations?
10. semantics of expressions containing relative paths that are inherited via ref to a dfdl:defineFormat. (also section
10.3)
11. XPath term - we are not consistent about using the term XPath, or "expression" when referring to our expression
language. I prefer to call it our expression language, and then in the section that defines it state that it is a strict
subset of XPath 2.0.
12. defineVariable name attribute should be an NCName, not a QName.
13. fn:position is unclear given that we've just said we don't support sequences in the expression language.
14. what are the units for dfdl:representationLength? Bits, bytes, or characters? (and similarly dfdl:unpaddedLength)
15. using a non-negative valued xs:long as the return value for dfdl:representationLength and dfdl:unpaddedLength would
make a Java-based implementation easier, and it has plenty of magnitude.
16. \n in regular expressions - clarify relationship of this to entities like NL entity. Also, if I include an entity
like WSP* in a regular expression (can I?) does it then match accordingly? It appears that some of our multi-valued
entities like WSP+ create conditional "matching" behavior without having to use regular expressions, e.g., when WSP+ is
used as a separator. But can you use entities like WSP+ in a regular expression? It seems you should be able to use
regular "single valued" entities in a regular expression; it's these multi-valued ones that have tricky semantics.
17. order of sections. Scoping rules section should come before variables section, which uses these concepts.
18. remove 10.2 (multiple annotations at the same point). Redundant.
19. 10.4 - break up Rule 3. Separate rules for when no duplicates are allowed and those which allow overriding. Don't
mix the two in the same rule.
20. (search for "supplement") the bcd, zoned, packed supplement has been merged. Does the grammar need to grow more
terminals as this paragraph suggests?
21. Grammar for Sequence - Where is FinalUnusedRegion? It is in the pictures but not the grammar.
22. Figure 8 - These pictures are nice, but should we keep them, given that they must be maintained? The grammar is
sufficient, though not as pretty.
23. Move to the section on the Built-in specifications: "The namespace URIs which identify these standard format
definitions contain version identification so that future versions of this standard can provide new versions of these
definitions which define properties or change their definitions.
These built-in format definitions are complete in that they provide a self-consistent definition for all relevant
representation properties. Their intended use is as a base for extension. By extending from one of these provided
definitions a DFDL schema author can be assured that there is a base of suitable representation properties from which to
start."
24. 12 - properties aren't organized the way listed in the bullet list. Adjust this section to match subsequent
presentation.
25. 12 - TBD: Need some words to cover use of XPath expressions as property values, including the case where the result
must evaluate to a Boolean.
26. big-Endian, little-Endian - strange inconsistent naming convention. We should use one standard convention. So far it
has been bigEndian, and littleEndian. I.e., camel case. Most other identifiers work this way.
27. Case sensitivity of enum names - did we say whether this is case sensitive or not? I believe it should be case
sensitive.
28. dfdl:ignoreCase - should this be a Boolean? In general we're trying to use enums for these. Suggest
caseSensitivity="strict/lax"
29. 14.1 Alignment - zero-based thinking here. But all the bits stuff and everything else in DFDL uses 1-based reasoning.
Need to revisit to make this sensible for a 1-based world.
30. finalTerminatorCanBeMissing - spec is not clear. Also, is there a finalSeparatorCanBeMissing?
31. dfdl:length - Issue: when can an expression return length value of zero? Ever? Is this illegal? We say that there
must be at least 1 bit in the rep of any required item, or optional item that is present. (I suppose a string that is
required and has no default value can have length 0 in lengthUnits of characters.)
32. dfdl:lengthKind 'endOfParent' Issue: unclear. This inductive relationship about when an element can have lengthKind
end of parent needs to be explained better. (perhaps with a picture?) The right thing is not to try to repeat the
explanation both here and in the later section about this. Just put it in one place. I suggest this section should be
shorter and xref to the other one.
33. move notes about OMG/CAM model relationship to an appendix. There are many other such sections - we have a doc
somewhere which had multiple columns and related a number of product-specific format spec language keywords to DFDL. I
suggest this OMG/CAM stuff (which is closely related to an IBM product also) goes in that other document.
34. binaryMilliSeconds should be binaryMilliseconds. Milli is not a word in English.
35. 14.3.5 - fix wording " For example the last is the parent is a known fixed length"
36. 14.3.6.3 - Byte Order Mark - add quick note that a required or optional BOM can be modeled as a separate element
that appears before the string. (I put suggested wording into the document itself already.)
37. dfdl:escapeKind - need to reword "On parsing 'escapeCharacter', unless escaped by the escapeEscapeCharacter, and
escapeEscapeCharacter when it precedes the 'escapeCharacter' are removed from the data."
38. escapeKind - for escapeBlock - must the open and close be at the beginning and end of the element, or can the
escapeBlockStart and End appear interior to the data?
39. 15.2 - bidi properties apply only to the value, not the delimiters - is this right?
40. dfdl:textBidiNumeralShapes - if nominal and national are the enums, and they mean Arabic shapes or Arabic-Indic
shapes for numerals then why aren't the enum values arabicShapeNumerals and arabicIndicShapeNumerals instead?
41. textStringPadCharacter textNumberPadCharacter - did we agree that this character must be a "minimum width" character
if the char set encoding is variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)
42. numberPattern - consider moving the example out of the box/table to a following discussion.
43. numberPattern - "The number of '0' characters must match the number of digits in the representation otherwise it is
a schema definition error." Issue: what if lengthUnits is bytes for this particular underlying string. E.g., it's a
prefixed string, the prefix gives the number of bytes, and then the string has a particular number format.
44. numberExponentCharacter - Issue: allow a list of characters? Restrict to EFG or efg only? Many libraries tolerate
many exponent characters. Perhaps we should just tolerate any of EFGefg on input, and have this be used only to decide
what to generate when unparsing?
45. numberInfinityRep numberNanRep - Is this applicable only to xs:double and xs:float? Also, what I've seen requires a
distinction of sign. I.e., there are positive and negative infinities often printing as -inf and +inf.
46. numberRoundingMode - "the rounding increment is specified as part of the pattern" - I did not find any such
discussion of how to specify it. What the heck is a rounding increment anyway?
47. 15.5 - Heading - just list the types it is applicable to: xs:decimal and all sub-types.
48. binaryDecimalVirtualPoint - does it make sense for this to be zero? Or is it an error (schema def) to specify an
xs:decimal but have the virtual decimal point be at position zero? Also - is this zero-based indexing or 1-based? I.e., is
000.0 position 1 or position 2 for the decimal point?
49. calendarStrictChecking - out-of-band and in-band - these are formal terms that need formal defs/glossary entries.
50. calendarTimeZone - should there be standard variables for locale-oriented things like this? E.g., we have a standard
variable named dfdl:byteOrder which defaults to bigEndian. Some of our built-in format specifications that we get via
include/import will specify dfdl:byteOrder="{ $dfdl:byteOrder }". Some of these locale-oriented things for numbers and
calendars might want to work similarly.
51. 15.11 - hexBinary - the 2nd table needs explanation.
52. nilKind - literalCharacter and literalValue both require that representation='text' or type is xs:string based.
53. nilValue - this word is singular, but takes a list. Do our other properties that take lists also obey this
convention, i.e., singular English words in the names even though a list is allowed? (dfdl:extraEscapedCharacters - is
plural…. So there's at least one property different from this convention)
54. 15.13 Properties for default value control - should we suggest: Warning appropriate for DFDL schemas with non-zero
minOccurs and no default value.
55. 15.14.1.2 - Initiators and Output - formatting of table needs to be cleaned up. Fonts are wrong, lines don’t line
up vertically, and width of lines doesn't match the rest of the spec. Ideally, it should be rejiggered so that it fits
all on one page if there is a page break before the heading.
56. 16.1 empty sequences - need to clarify that a sequence with content length zero has dfdl:lengthKind="explicit" and
dfdl:length="0". That is, if we're still allowing sequences to have dfdl:length properties.
57. separatorPosition - semantics of final separator can be missing - right now it says that the last postfix separator
is always optional.
58. Sequence groups (16.3.1) Do we allow children of an unordered sequence with initiators to be anything other than
elements? (I think we decided the children must be elements in this case - but the TBD is still here.)
59. 18.3 Arrays with XPath expressions - unclear - need to reword.
60. InputValueCalc and arrays - Let's rule it out on anything variable-occurrence (arrays or optional). If an array is
fixed length then I don't see why this is a problem. But we could just say scalar only and I'd be happy.
61. 20 Non-primitive DFDL Schema Constructs - what's left here of this material should be merged into the scoping
section (and fixed).
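
Illustration for item 1 (code points outside the BMP): a minimal sketch of how a single supplementary code point maps to a high/low surrogate pair in UTF-16. This is the standard UTF-16 encoding form only; it says nothing about how DFDL should treat such values in literals or data, which is the open question. The function name utf16_code_units is made up for this illustration.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration; not part of any DFDL API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not characters themselves.
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # Inside the BMP: one code unit, equal to the code point.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    # U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
    print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
    # Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
    print(chr(0x1D11E).encode("utf-16-be").hex())  # d834dd1e
```

So one code point such as U+1D11E is a single ISO 10646 value, while its UTF-16 form is two code units; whether a schema author writes the former or the latter is exactly what item 1 asks the spec to settle.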
Title changed from 037 - consolidated list of 60 smaller items from MikeB review to 037 - consolidated list of 61 smaller items from MikeB review
|