Description: |
This section (2.3.1 in this draft) is problematic. We're trying to allow simple DFDL implementations to skip a
bunch of static checking, yet the second paragraph says that Schema Definition errors detected at run time are
converted to processing errors. If implementations differ on when they detect Schema Definition errors, this lets them
do very different things in terms of how the speculative parser backtracks.
Consider two cases: type checking and grammar ambiguity.
Grammar ambiguity is a very tricky case. Unless a DFDL implementation can prove a grammar to be unambiguous, it is
very hard to say that any particular combination of delimiters makes up a legal DFDL schema definition. If the parser
simply fails because the grammar was ambiguous, there's no way to tell the difference between this and just broken data
without proving the grammar is unambiguous. In general it is formally undecidable whether a grammar is ambiguous or
unambiguous. (http://books.google.com/books?id=lIuu53IcKWoC&pg=PT217&lpg=PT217&dq=proving+a+grammar+is+unambiguous&source=bl&ots=wie8TAt-MT&sig=ZSD7tIwnXZIT8Ic91BWMH2H2dKg&hl=en&ei=hAQ5S5vPOIri7APc37CKBg&sa=X&oi=book_result&ct=result&resnum=10&ved=0CDAQ6AEwCQ#v=onepage&q=proving%20a%20grammar%20is%20unambiguous&f=false)
Since DFDL v1.0 doesn't allow recursive declarations/definitions, it may be possible to prove the ambiguity or unambiguity of a DFDL schema (or rather, of the data syntax grammar described by it, if you want to bother to distinguish the two). But recursion isn't something we want to rule out for the future, so I believe the right thing is for grammar ambiguity NOT to be checked at all, i.e., DFDL implementations will just fail to parse (a processing error, not a schema definition error).
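To make the ambiguity concern concrete, here is a minimal Python sketch. It is not DFDL: it assumes a hypothetical format of exactly two text fields separated by ",", with no escaping, where field content may itself contain ",". The names `all_parses` and `first_parse` are made up for illustration.

```python
def all_parses(data, sep=","):
    """Return every way to split data into two fields around one separator occurrence."""
    parses = []
    start = 0
    while True:
        i = data.find(sep, start)
        if i == -1:
            break
        # Treat this occurrence as THE separator; everything else is field content.
        parses.append((data[:i], data[i + len(sep):]))
        start = i + 1
    return parses


def first_parse(data, sep=","):
    """What a parser that commits to the first alternative would return."""
    parses = all_parses(data, sep)
    if not parses:
        # Hypothetical stand-in for a processing error.
        raise ValueError("processing error: separator not found")
    return parses[0]
```

Under these assumptions, `all_parses("a,b,c")` yields two legal splits, while `first_parse("a,b,c")` quietly returns just one of them. Nothing the parser observes at run time tells it that the grammar, rather than the data, is the problem.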
Type checking is decidable in DFDL's expression language, so we could always detect type errors before run time. However, if we allow a simplistic DFDL implementation to
just check types at run time, then by the definition in this section (2.3.1) it would issue processing errors when it
detects type errors at run time, thereby allowing backtracking of the speculative parser to be driven off of type-checks in
the expression language.
This feels like a schema def error that, while caught at run time, should NOT be turned into a processing error, but
should cause the DFDL processor to "fail" in some implementation-specific way (i.e., implementations might provide ways
to recover; e.g., an implementation would likely move on to the next unit of data, hoping it won't need the part of the
schema that had the type-check problem).
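The difference between the two treatments can be sketched in a few lines of Python. This is an illustrative model only, not any implementation's actual behavior: the exception classes, `parse_choice`, the `convert_sde_to_pe` flag, and both branch functions are hypothetical names invented for this sketch.

```python
class ProcessingError(Exception):
    """The data did not match; a speculative parser may backtrack."""


class SchemaDefinitionError(Exception):
    """The schema itself is wrong; proposed here to be fatal."""


def parse_choice(branches, data, convert_sde_to_pe):
    """Try each branch in order, backtracking on processing errors."""
    for name, branch in branches:
        try:
            return name, branch(data)
        except ProcessingError:
            continue  # normal speculation: try the next alternative
        except SchemaDefinitionError:
            if convert_sde_to_pe:
                continue  # current 2.3.1 wording: treated like bad data
            raise  # proposed: the processor "fails"
    raise ProcessingError("no branch of the choice matched")


def broken_branch(data):
    # Stand-in for a run-time type check of an expression like 'abc' + 1.
    left, right = "abc", 1
    if type(left) is not type(right):
        raise SchemaDefinitionError("type mismatch: string + int")
    return left + right


def fallback_branch(data):
    return data.upper()


BRANCHES = [("broken", broken_branch), ("fallback", fallback_branch)]
```

With `convert_sde_to_pe=True` the choice silently falls through to the second branch, so the schema bug is masked and an implementation that type-checks statically would behave differently. With `convert_sde_to_pe=False` the error propagates out of the choice and the caller decides how to recover.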
So, the language that says schema def errors detected at run time are converted to processing errors should be changed.
"Schema def error" is a term that should be reserved for things that cause the processor to "fail" and that do not cause
speculative parse backtracking through alternatives.
Processing errors should be used for all other errors. |