06/20/2007 11:02 AM
post5834
Comments on the draft spec
Background: We've been developing clients for GridSAM (http://www.realitygrid.org/AHE), and are likely to be
users of other BES implementations, hence our interest in the spec. I have some minor comments on the draft, and have
spotted some typos and a few points that may need clarifying now or considering in future versions of the
spec.
Page 9, 4.2.1: "a data staging operations" should read "a data staging operation"
Page 10, paragraph 2: I think "BESs MUST not" should read "BESs MUST NOT"
Page 15, table: In the description column for LocalResourceManagerType and BESExtension, "URI's" should read "URIs"
Page 15, table: Regarding the CPUArchitecture attribute. Has the BES WG considered the situation where a BES will be
managing job submission on a machine made up of multiple processor architectures (for example a parallel machine with
both scalar and vector processors)? Would this necessitate separate BESs to submit to the different parts of the
machine?
Page 15, table: Regarding CPUCount - how will a BES implementation differentiate between CPU cores and actual CPUs? It
would be useful if a user could distinguish between processors and cores.
Page 15, table: CPUSpeed - related to the points above, some parallel machines might be composed of CPUs of different
speeds. How would the BES deal with this?
--> I guess that these attribute definitions are constrained by their equivalents in JSDL, but it might be worth
considering how they are addressed in future versions of the spec.
On a related note, although I don't think this is covered in the text: when submitting jobs to a parallel machine with some
local resource manager, it can often be useful to specify the queue that you want to submit to. As far as I'm aware
there is no way to specify queue details in JSDL. It would be useful if the BES provided a way to specify a queue when
submitting a job, otherwise we might end up with a situation where a separate BES instance needs to be run for each
queue on the machine.
Page 16, 6.1.4: Will all failed activities be reported by TotalNumberOfActivities? From looking at the spec, I'm unsure
how the lifetime of failed activities is managed. A user won't be able to call TerminateActivities, since a state
transition between Failed and Terminated can't occur. Failed jobs need to persist for some time, as the user will want
to query the job to find its state, but there is no need for failed jobs to persist for ever. Is the mechanism for
managing failed jobs left to the BES implementor to decide? If so, it might be worth mentioning this.
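To make the concern concrete, here is a rough sketch of how I read the basic state model, together with one way an implementation might expire failed activities. The transition table is my own reading of the draft rather than text taken from it, and the `expired` helper and its retention window are purely hypothetical; nothing like them appears in the spec.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    TERMINATED = "Terminated"
    FAILED = "Failed"


# Assumed transition table (my reading of the draft, not quoted from it).
# Finished, Terminated and Failed are treated as terminal states.
ALLOWED = {
    State.PENDING: {State.RUNNING, State.TERMINATED, State.FAILED},
    State.RUNNING: {State.FINISHED, State.TERMINATED, State.FAILED},
    State.FINISHED: set(),
    State.TERMINATED: set(),
    State.FAILED: set(),
}


def can_transition(src: State, dst: State) -> bool:
    """Return True if the assumed model allows moving from src to dst."""
    return dst in ALLOWED[src]


def expired(state: State, seconds_in_state: float, retention_seconds: float) -> bool:
    """Hypothetical retention policy: an activity in a terminal state is
    dropped once it has been in that state for the retention window."""
    is_terminal = not ALLOWED[state]
    return is_terminal and seconds_in_state >= retention_seconds
```

Under these assumptions, `can_transition(State.FAILED, State.TERMINATED)` is false, so terminating a failed activity cannot be what removes it. Something like the retention check would have to be chosen by each implementation unless the spec says otherwise.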
Page 17: Numbering incorrect from 1.1.3
Page 19, 1.2.2.1: I think that "from which we require status information" would sound better as "from which status
information is required".
Page 22, 3.1, paragraph 2: I think "BES MUST not" should read "BES MUST NOT"
Page 23, 3.3, paragraph 2: '"absolute time", That said time' should read '"absolute time". That said, time'
Hope this is useful. Keep up the good work.