Description |
1. codepoints outside BMP, as literals and in data. If I put in a value that
requires use of a high/low surrogate pair, is that an error, or does it require me
to put in two separate %#...; thingys, one for each of the surrogates (in which
case these are not really code points in ISO 10646)? If I put in a codepoint for
one of the supplementary characters and the schema itself is written in UTF-16
then that has to translate into a literal surrogate pair. Ok, but I’m very
uncertain about all this stuff. Tracker Issue: illegal character encodings for
parsing and unparsing. TBD: how do these make it into the infoset, or are they
replaced, and if so how? TBD: can one represent these in the infoset for output?
Ideally not, but…
2. "round trip" for infoset. Should we omit the whole point?
3. Versions everywhere - infoset document element dfdlversion - should we
specify that this matches the dfdl namespace URL suffix, and the string used in
<xs:appinfo source="…" >? Consistency here is best.
4. Infoset string dataValue left-to-right order? How can this be correct? Does
this mean a right-to-left language will have its strings where the first
character someone would read is NOT at index 1 in the infoset strings? I guess
to me the string should be in "reading" order for the language it is written in.
5. Infoset [schema] is an absolute or relative SCD. Why bother allowing
absolute?
6. means to indicate that an implementation supports non-standard extensions.
7. ES Entity - what is output? What definition of empty space does ES map to?
WSP, WSP+, or WSP*?
8. some annotations take a 'name' attribute for reference from the 'ref'
attribute of other annotations. Isn't this 'name' also part of the syntax of
format annotations?
9. semantics of expressions containing relative paths that are inherited via ref
to a dfdl:defineFormat. (also section 10.3)
10. XPath term - we are not consistent about using the term XPath, or
"expression" when referring to our expression language. I prefer to call it our
expression language, and then in the section that defines it state that it is a
strict subset of XPath 2.0.
11. defineVariable name attribute should be an NCName, not a QName.
12. fn:position is unclear given that we've just said we don't support sequences
in the expression language.
13. what are the units for dfdl:representationLength? Bits, bytes, or
characters? (and similarly dfdl:unpaddedLength)
14. using a non-negative valued xs:long as the return value for
dfdl:representationLength and dfdl:unpaddedLength would make a Java-based
implementation easier, and it has plenty of magnitude.
15. \n in regular expressions - clarify the relationship of this to entities like
the NL entity. Also, if I include an entity like WSP* in a regular expression
(can I?) does it then match accordingly? It appears that some of our multi-valued
entities like WSP+ create conditional "matching" behavior without having to use
regular expressions, e.g., when WSP+ is used as a separator. But can you use
entities like WSP+ in a regular expression? It seems you should be able to use
regular "single valued" entities in a regular expression; it's these multi-valued
ones that have tricky semantics.
16. order of sections. Scoping rules section should come before variables
section, which uses these concepts.
17. remove 10.2 (multiple annotations at the same point). Redundant.
18. 10.4 - break up Rule 3. Separate rules for when no duplicates are allowed
and those which allow overriding. Don't mix the two in the same rule.
19. (search for "supplement") the bcd, zoned, packed supplement has been merged.
Does the grammar need to grow more terminals as this paragraph suggests?
20. Grammar for Sequence - Where is FinalUnusedRegion? It is in the pictures but
not the grammar.
21. Figure 8 - These pictures are nice, but should we keep them, given that they
must be maintained? The grammar is sufficient, though not as pretty.
22. Move to the section on the Built-in specifications: "The namespace URIs
which identify these standard format definitions contain version identification
so that future versions of this standard can provide new versions of these
definitions which define properties or change their definitions.
These built-in format definitions are complete in that they provide a self-
consistent definition for all relevant representation properties. Their intended
use is as a base for extension. By extending from one of these provided
definitions a DFDL schema author can be assured that there is a base of suitable
representation properties from which to start."
23. 12 - properties aren't organized the way listed in the bullet list. Adjust
this section to match subsequent presentation.
24. 12 - TBD: Need some words to cover use of XPath expressions as property
values, including the case where the result must evaluate to a Boolean.
25. big-Endian, little-Endian - strange inconsistent naming convention. We
should use one standard convention. So far it has been bigEndian, and
littleEndian. I.e., camel case. Most other identifiers work this way.
26. Case sensitivity of enum names - did we say whether this is case sensitive
or not? I believe it should be case sensitive.
27. dfdl:ignoreCase - should this be a Boolean? In general we're trying to use
enums for these. Suggest caseSensitivity="strict/lax"
28. 14.1 Alignment - zero-based thinking here. But all the bits stuff and
everything else in DFDL uses 1-based reasoning. Need to revisit to make this
sensible for a 1-based world.
29. finalTerminatorCanBeMissing - spec is not clear. Also is there a
finalSeparatorCanBeMissing?
30. dfdl:length - Issue: when can an expression return a length value of zero?
Ever? Is this illegal? We say that there must be at least 1 bit in the rep of
any required item, or optional item that is present. (I suppose a string that is
required and has no default value can have length 0 in lengthUnits of
characters.)
31. dfdl:lengthKind 'endOfParent' Issue: unclear. This inductive relationship
about when an element can have lengthKind end of parent needs to be explained
better. (perhaps with a picture?) The right thing is not to try to repeat the
explanation both here and in the later section about this. Just put it in one
place. I suggest this section should be shorter and xref to the other one.
32. move notes about OMG/CAM model relationship to an appendix. There are many
other such sections - we have a doc somewhere which had multiple columns and
related a number of product-specific format spec language keywords to DFDL. I
suggest this OMG/CAM stuff (which is closely related to an IBM product also)
goes in that other document.
33. binaryMilliSeconds should be binaryMilliseconds. Milli is not a word in
English.
34. 14.3.5 - fix wording " For example the last is the parent is a known fixed
length"
35. 14.3.6.3 - Byte Order Mark - add quick note that a required or optional BOM
can be modeled as a separate element that appears before the string. (I put
suggested wording into the document itself already.)
36. dfdl:escapeKind - need to reword " On parsing 'escapeCharacter’, unless
escaped by the escapeEscapeCharacter, and escapeEscapeCharacter when it precedes
the 'escapeCharacter’ are removed from the data."
37. escapeKind - for escapeBlock - must the open and close be at the beginning
and end of the element, or can the escapeBlockStart and End appear interior to
the data?
38. 15.2 - bidi properties apply only to the value, not the delimiters - is this
right?
39. dfdl:textBidiNumeralShapes - if nominal and national are the enums, and
they mean Arabic shapes or Arabic-Indic shapes for numerals then why aren't the
enum values arabicShapeNumerals and arabicIndicShapeNumerals instead?
40. textStringPadCharacter textNumberPadCharacter - did we agree that this
character must be a "minimum width" character if the char set encoding is
variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)
41. numberPattern - consider moving the example out of the box/table to a
following discussion.
42. numberPattern - "The number of '0' characters must match the number of
digits in the representation otherwise it is a schema definition error." Issue:
what if lengthUnits is bytes for this particular underlying string? E.g., it's a
prefixed string, the prefix gives the number of bytes, and then the string has
a particular number format.
43. numberExponentCharacter - Issue: allow a list of characters? Restrict to EFG
or efg only? Many libraries tolerate many exponent characters. Perhaps we
should just tolerate any of EFGefg on input, and have this be used only to
decide what to generate when unparsing?
44. numberInfinityRep numberNanRep - Is this applicable only to xs:double and
xs:float? Also, what I've seen requires a distinction of sign. I.e., there are
positive and negative infinities often printing as -inf and +inf.
45. numberRoundingMode - "the rounding increment is specified as part of the
pattern" - I did not find any such discussion of how to specify it. What the
heck is a rounding increment anyway?
46. 15.5 - Heading - just list the types it is applicable to: xs:decimal and all
sub-types.
47. binaryDecimalVirtualPoint - does it make sense for this to be zero? Or is it
an error (schema def) to specify an xs:decimal but have the virtual decimal
point be at position zero? Also - is this zero based indexing or 1 based? I.e.,
is 000.0 position 1 or position 2 for the decimal point?
48. calendarStrictChecking - out-of-band and in-band - these are formal terms
that need formal defs/glossary entries.
49. calendarTimeZone - should there be standard variables for locale-oriented
things like this? E.g., we have a standard variable named dfdl:byteOrder which
defaults to bigEndian. Some of our built-in format specifications that we get
via include/import will specify dfdl:byteOrder="{ $dfdl:byteOrder }". Some of
these locale-oriented things for numbers and calendars might want to work
similarly.
50. 15.11 - hexBinary - the 2nd table needs explanation.
51. nilKind - literalCharacter and literalValue both require that
representation='text' or type is xs:string based.
52. nilValue - this word is singular, but takes a list. Do our other properties
that take lists also obey this convention, i.e., singular English words in the
names even though a list is allowed? (dfdl:extraEscapedCharacters - is plural….
So there's at least one property different from this convention)
53. 15.13 Properties for default value control - should we suggest: Warning
appropriate for DFDL schemas with non-zero minOccurs and no default value.
54. 15.14.1.2 - Initiators and Output - formatting of table needs to be cleaned
up. Fonts are wrong, lines don’t line up vertically, and width of lines doesn't
match the rest of the spec. Ideally, it should be rejiggered so that it fits
all on one page if there is a page break before the heading.
55. 16.1 empty sequences - need to clarify that a sequence with content length
zero has dfdl:lengthKind="explicit" and dfdl:length="0". That is, if we're still
allowing sequences to have dfdl:length properties.
56. separatorPosition - semantics of final separator can be missing - right now
it says that the last postfix separator is always optional.
57. Sequence groups (16.3.1) Do we allow children of an unordered sequence with
initiators to be anything other than elements? (I think we decided the children
must be elements in this case - but the TBD is still here.)
58. 18.3 Arrays with XPath expressions - unclear - need to reword.
59. InputValueCalc and arrays - Let's rule it out on anything variable-
occurrence (arrays or optional). If an array is fixed length then I don't see
why this is a problem. But we could just say scalar only and I'd be happy.
60. 20 Non-primitive DFDL Schema Constructs - what's left here of this material
should be merged into the scoping section (and fixed).
|
1. codepoints outside BMP, as literals and in data. If I put in a value that
requires use of a high/low surrogate pair, is that an error, or does it require me
to put in two separate %#...; thingys, one for each of the surrogates (in which
case these are not really code points in ISO 10646)? If I put in a codepoint for
one of the supplementary characters and the schema itself is written in UTF-16
then that has to translate into a literal surrogate pair. Ok, but I’m very
uncertain about all this stuff.
2. illegal character byte sequences for parsing and unparsing. How do these make
it into the infoset, or are they replaced, and if so how? And can one represent
these in the infoset for output? (array of xs:byte?, or hexBinary? - is that the
solution?)
3. "round trip" for infoset. Should we omit the whole point?
4. Versions everywhere - infoset document element dfdlversion - should we
specify that this matches the dfdl namespace URL suffix, and the string used in
<xs:appinfo source="…" >? Consistency here is best.
5. Infoset string dataValue left-to-right order? How can this be correct? Does
this mean a right-to-left language will have its strings where the first
character someone would read is NOT at index 1 in the infoset strings? I guess
to me the string should be in "reading" order for the language it is written in.
6. Infoset [schema] is an absolute or relative SCD. Why bother allowing
absolute?
7. means to indicate that an implementation supports non-standard extensions.
8. ES Entity - what is output? What definition of empty space does ES map to?
WSP, WSP+, or WSP*?
9. some annotations take a 'name' attribute for reference from the 'ref'
attribute of other annotations. Isn't this 'name' also part of the syntax of
format annotations?
10. semantics of expressions containing relative paths that are inherited via
ref to a dfdl:defineFormat. (also section 10.3)
11. XPath term - we are not consistent about using the term XPath, or
"expression" when referring to our expression language. I prefer to call it our
expression language, and then in the section that defines it state that it is a
strict subset of XPath 2.0.
12. defineVariable name attribute should be an NCName, not a QName.
13. fn:position is unclear given that we've just said we don't support sequences
in the expression language.
14. what are the units for dfdl:representationLength? Bits, bytes, or
characters? (and similarly dfdl:unpaddedLength)
15. using a non-negative valued xs:long as the return value for
dfdl:representationLength and dfdl:unpaddedLength would make a Java-based
implementation easier, and it has plenty of magnitude.
16. \n in regular expressions - clarify the relationship of this to entities like
the NL entity. Also, if I include an entity like WSP* in a regular expression
(can I?) does it then match accordingly? It appears that some of our multi-valued
entities like WSP+ create conditional "matching" behavior without having to use
regular expressions, e.g., when WSP+ is used as a separator. But can you use
entities like WSP+ in a regular expression? It seems you should be able to use
regular "single valued" entities in a regular expression; it's these multi-valued
ones that have tricky semantics.
17. order of sections. Scoping rules section should come before variables
section, which uses these concepts.
18. remove 10.2 (multiple annotations at the same point). Redundant.
19. 10.4 - break up Rule 3. Separate rules for when no duplicates are allowed
and those which allow overriding. Don't mix the two in the same rule.
20. (search for "supplement") the bcd, zoned, packed supplement has been merged.
Does the grammar need to grow more terminals as this paragraph suggests?
21. Grammar for Sequence - Where is FinalUnusedRegion? It is in the pictures but
not the grammar.
22. Figure 8 - These pictures are nice, but should we keep them, given that they
must be maintained? The grammar is sufficient, though not as pretty.
23. Move to the section on the Built-in specifications: "The namespace URIs
which identify these standard format definitions contain version identification
so that future versions of this standard can provide new versions of these
definitions which define properties or change their definitions.
These built-in format definitions are complete in that they provide a self-
consistent definition for all relevant representation properties. Their intended
use is as a base for extension. By extending from one of these provided
definitions a DFDL schema author can be assured that there is a base of suitable
representation properties from which to start."
24. 12 - properties aren't organized the way listed in the bullet list. Adjust
this section to match subsequent presentation.
25. 12 - TBD: Need some words to cover use of XPath expressions as property
values, including the case where the result must evaluate to a Boolean.
26. big-Endian, little-Endian - strange inconsistent naming convention. We
should use one standard convention. So far it has been bigEndian, and
littleEndian. I.e., camel case. Most other identifiers work this way.
27. Case sensitivity of enum names - did we say whether this is case sensitive
or not? I believe it should be case sensitive.
28. dfdl:ignoreCase - should this be a Boolean? In general we're trying to use
enums for these. Suggest caseSensitivity="strict/lax"
29. 14.1 Alignment - zero-based thinking here. But all the bits stuff and
everything else in DFDL uses 1-based reasoning. Need to revisit to make this
sensible for a 1-based world.
30. finalTerminatorCanBeMissing - spec is not clear. Also is there a
finalSeparatorCanBeMissing?
31. dfdl:length - Issue: when can an expression return a length value of zero?
Ever? Is this illegal? We say that there must be at least 1 bit in the rep of
any required item, or optional item that is present. (I suppose a string that is
required and has no default value can have length 0 in lengthUnits of
characters.)
32. dfdl:lengthKind 'endOfParent' Issue: unclear. This inductive relationship
about when an element can have lengthKind end of parent needs to be explained
better. (perhaps with a picture?) The right thing is not to try to repeat the
explanation both here and in the later section about this. Just put it in one
place. I suggest this section should be shorter and xref to the other one.
33. move notes about OMG/CAM model relationship to an appendix. There are many
other such sections - we have a doc somewhere which had multiple columns and
related a number of product-specific format spec language keywords to DFDL. I
suggest this OMG/CAM stuff (which is closely related to an IBM product also)
goes in that other document.
34. binaryMilliSeconds should be binaryMilliseconds. Milli is not a word in
English.
35. 14.3.5 - fix wording " For example the last is the parent is a known fixed
length"
36. 14.3.6.3 - Byte Order Mark - add quick note that a required or optional BOM
can be modeled as a separate element that appears before the string. (I put
suggested wording into the document itself already.)
37. dfdl:escapeKind - need to reword " On parsing 'escapeCharacter’, unless
escaped by the escapeEscapeCharacter, and escapeEscapeCharacter when it precedes
the 'escapeCharacter’ are removed from the data."
38. escapeKind - for escapeBlock - must the open and close be at the beginning
and end of the element, or can the escapeBlockStart and End appear interior to
the data?
39. 15.2 - bidi properties apply only to the value, not the delimiters - is this
right?
40. dfdl:textBidiNumeralShapes - if nominal and national are the enums, and
they mean Arabic shapes or Arabic-Indic shapes for numerals then why aren't the
enum values arabicShapeNumerals and arabicIndicShapeNumerals instead?
41. textStringPadCharacter textNumberPadCharacter - did we agree that this
character must be a "minimum width" character if the char set encoding is
variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)
42. numberPattern - consider moving the example out of the box/table to a
following discussion.
43. numberPattern - "The number of '0' characters must match the number of
digits in the representation otherwise it is a schema definition error." Issue:
what if lengthUnits is bytes for this particular underlying string? E.g., it's a
prefixed string, the prefix gives the number of bytes, and then the string has
a particular number format.
44. numberExponentCharacter - Issue: allow a list of characters? Restrict to EFG
or efg only? Many libraries tolerate many exponent characters. Perhaps we
should just tolerate any of EFGefg on input, and have this be used only to
decide what to generate when unparsing?
45. numberInfinityRep numberNanRep - Is this applicable only to xs:double and
xs:float? Also, what I've seen requires a distinction of sign. I.e., there are
positive and negative infinities often printing as -inf and +inf.
46. numberRoundingMode - "the rounding increment is specified as part of the
pattern" - I did not find any such discussion of how to specify it. What the
heck is a rounding increment anyway?
47. 15.5 - Heading - just list the types it is applicable to: xs:decimal and all
sub-types.
48. binaryDecimalVirtualPoint - does it make sense for this to be zero? Or is it
an error (schema def) to specify an xs:decimal but have the virtual decimal
point be at position zero? Also - is this zero based indexing or 1 based? I.e.,
is 000.0 position 1 or position 2 for the decimal point?
49. calendarStrictChecking - out-of-band and in-band - these are formal terms
that need formal defs/glossary entries.
50. calendarTimeZone - should there be standard variables for locale-oriented
things like this? E.g., we have a standard variable named dfdl:byteOrder which
defaults to bigEndian. Some of our built-in format specifications that we get
via include/import will specify dfdl:byteOrder="{ $dfdl:byteOrder }". Some of
these locale-oriented things for numbers and calendars might want to work
similarly.
51. 15.11 - hexBinary - the 2nd table needs explanation.
52. nilKind - literalCharacter and literalValue both require that
representation='text' or type is xs:string based.
53. nilValue - this word is singular, but takes a list. Do our other properties
that take lists also obey this convention, i.e., singular English words in the
names even though a list is allowed? (dfdl:extraEscapedCharacters - is plural….
So there's at least one property different from this convention)
54. 15.13 Properties for default value control - should we suggest: Warning
appropriate for DFDL schemas with non-zero minOccurs and no default value.
55. 15.14.1.2 - Initiators and Output - formatting of table needs to be cleaned
up. Fonts are wrong, lines don’t line up vertically, and width of lines doesn't
match the rest of the spec. Ideally, it should be rejiggered so that it fits
all on one page if there is a page break before the heading.
56. 16.1 empty sequences - need to clarify that a sequence with content length
zero has dfdl:lengthKind="explicit" and dfdl:length="0". That is, if we're still
allowing sequences to have dfdl:length properties.
57. separatorPosition - semantics of final separator can be missing - right now
it says that the last postfix separator is always optional.
58. Sequence groups (16.3.1) Do we allow children of an unordered sequence with
initiators to be anything other than elements? (I think we decided the children
must be elements in this case - but the TBD is still here.)
59. 18.3 Arrays with XPath expressions - unclear - need to reword.
60. InputValueCalc and arrays - Let's rule it out on anything variable-
occurrence (arrays or optional). If an array is fixed length then I don't see
why this is a problem. But we could just say scalar only and I'd be happy.
61. 20 Non-primitive DFDL Schema Constructs - what's left here of this material
should be merged into the scoping section (and fixed).
|
01/06/2010 4:49 PM EST |
Michael J Beckerle |
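
Note on item 1 above (codepoints outside BMP): as a point of reference only, UTF-16 represents any code point above U+FFFF as one high and one low surrogate, so a single supplementary code point written in a UTF-16 schema corresponds to two 16-bit code units. A minimal Python sketch of that arithmetic is below. The function name utf16_code_units is made up for illustration; it is not part of DFDL or of any library, and the sketch says nothing about what the spec should decide for character entities.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration; not a DFDL API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate values are reserved for UTF-16 and are not characters.
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit code unit.
        return [code_point]
    # Supplementary code points: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
units = utf16_code_units(0x1D11E)
print([hex(u) for u in units])

# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
encoded = chr(0x1D11E).encode("utf-16-be")
print(encoded.hex())
```

The two surrogate values are an artifact of the UTF-16 encoding form, not code points in their own right, which is the distinction item 1 is asking the spec to settle.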