This is a static archive of the previous Open Grid Forum GridForge content management system saved from host forge.ogf.org file /sf/wiki/do/viewPage/projects.pgi-wg/wiki/ExecutionServicePoints at Thu, 03 Nov 2022 00:04:36 GMT SourceForge : View Wiki Page: ExecutionServicePoints

wiki2229: ExecutionServicePoints

Open Issues


Issues posted by Balazs Konya in a mail dated 27 January 2010:

Q1)  Service Vector Limits

  • Work out the details of how the vector limits of operations are published, and decide on the granularity of these limits (per portType, per operation, etc.). We agreed to expose vector limits per operation via a GLUE2 schema extension (TODO: Balazs).
  • Define the FAULT mechanism (TODO: Morris).
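A minimal sketch of how per-operation vector limits could be enforced. It assumes the limits are published as a simple operation-to-integer map (the GLUE2 extension is still a TODO, so `VECTOR_LIMITS` and its values are hypothetical) and models the still-undefined fault as a Python exception (`VectorLimitFault` is an illustrative name, not the agreed FAULT mechanism). The client splits an over-sized request into batches; the service rejects anything above the limit.

```python
# Hypothetical per-operation vector limits; in PGI these would be read from
# the (still to be defined) GLUE2 schema extension, not hard-coded.
VECTOR_LIMITS = {"CreateActivity": 100, "CancelActivity": 500}


class VectorLimitFault(Exception):
    """Illustrative stand-in for the not-yet-defined PGI fault."""

    def __init__(self, operation: str, limit: int, requested: int):
        super().__init__(f"{operation}: {requested} items requested, limit is {limit}")
        self.operation = operation
        self.limit = limit
        self.requested = requested


def check_vector_limit(operation: str, items: list) -> None:
    """Service side: reject a vector that exceeds the operation's limit."""
    limit = VECTOR_LIMITS[operation]
    if len(items) > limit:
        raise VectorLimitFault(operation, limit, len(items))


def split_into_batches(operation: str, items: list) -> list:
    """Client side: split a request so that no batch exceeds the published limit."""
    limit = VECTOR_LIMITS[operation]
    return [items[i:i + limit] for i in range(0, len(items), limit)]
```

With these placeholder limits, 250 job descriptions for CreateActivity would be sent as batches of 100, 100 and 50.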

DONE Q2)  Which of the Execution portType operations are mandatory?

What about the CreateValidatedActivity? (Note: in Lund there was an initial agreement that all the operations are mandatory.)
  • On portType level: Execution and Information are mandatory; Delegation is optional.
  • On operation level, mandatory Execution portType: all operations are mandatory.
  • On operation level, mandatory Information portType: all operations are mandatory.
  • On operation level, optional Delegation portType: all operations are mandatory (when the portType is offered).
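The Q2 agreement can be read as a simple conformance rule. The sketch below encodes it; only the portType names and requirement levels come from the text, while the operation names are hypothetical placeholders.

```python
# Requirement levels agreed for Q2. Operation names are placeholders,
# not the specification's actual operation list.
PORT_TYPES = {
    "Execution": {"required": True, "operations": {"CreateActivity", "CancelActivity"}},
    "Information": {"required": True, "operations": {"GetActivityInfo"}},
    "Delegation": {"required": False, "operations": {"PutDelegation"}},
}


def conformance_problems(offered: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the service conforms.

    `offered` maps each portType a service exposes to its set of operations.
    """
    problems = []
    for name, spec in PORT_TYPES.items():
        if name not in offered:
            # Only mandatory portTypes may not be left out.
            if spec["required"]:
                problems.append(f"missing mandatory portType {name}")
            continue
        # Once a portType is offered, every one of its operations is mandatory.
        for op in sorted(spec["operations"] - offered[name]):
            problems.append(f"{name}: missing operation {op}")
    return problems
```

A service that offers Delegation at all must therefore offer all Delegation operations, while a service without Delegation still conforms.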

Q3)  Activity ID: any special requirement (should be specified in conjunction with EPR)?

Currently the spec only states "service creates an instance of each activity that is identified by a global unique ID assigned by the service". What should the return type of Create be, and how is it reused in other vector operations?

What about the UNICORE Gateway issue? Find out why it is required: maybe the address has to be there, or maybe the extension mechanism is needed.

  • OPTION 1 (URI): hostname:8080/GATEWAYSITEX/BESService/besinstance?273890471238904701234
  • OPTION 1, variant (URI): hostname:8080/BESService/SharingOne/besinstance?273890471238904701234
  • OPTION 1, variant (URI): hostname:8080/BESService/SharingTwo/besinstance?273890471238904701234
  • OPTION 2: reference parameter in the EPR (WS-Addressing)
  • UNICORE: OK, with lots of positive experience.
  • gLite: jobid created by the user, serviceinstanceid created by the service; the EPR is used as the return type of the job submission, but not in other operations (only the jobid itself). This is not valid according to the specification, since an EPR, once created, is opaque...
  • ARC: lightweight EPRs are OK, but where to stop? Publishing big EPRs in GLUE2-based information systems is not really 'lovely'.
  • GENESIS: WS-Naming, RNS et al.

Standard addressing scheme as suggested by Morris:

<wsa:EndpointReference>
  <wsa:Address>xs:anyURI - PGI SERVICE + besinstance?GUID</wsa:Address>
  <wsa:ReferenceParameters>xs:any* - GUID </wsa:ReferenceParameters> ?
</wsa:EndpointReference>

Example:
<wsa:EndpointReference>
  <wsa:Address>A-REX service uri + GUID</wsa:Address>
  <wsa:ReferenceParameters>GUID </wsa:ReferenceParameters> ?
  <!-- opaque to our clients, but should not be too big: limit the use of this somehow?! -->
</wsa:EndpointReference>
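A sketch of building such an activity EPR with Python's standard `xml.etree.ElementTree`, following Morris's scheme: the address is the service URI plus `?GUID`, and the optional reference parameter repeats the GUID. The `pgi` namespace and the `ActivityGUID` element name are hypothetical; only the WS-Addressing 1.0 namespace is standard. Checking `len()` of the serialized result is one simple way to keep an eye on the "should not be too big" concern.

```python
import xml.etree.ElementTree as ET

WSA = "http://www.w3.org/2005/08/addressing"  # WS-Addressing 1.0 namespace
PGI = "urn:example:pgi"                       # hypothetical PGI namespace

ET.register_namespace("wsa", WSA)
ET.register_namespace("pgi", PGI)


def make_activity_epr(service_uri: str, guid: str, with_ref_param: bool = True) -> str:
    """Build a minimal activity EPR and return it as an XML string."""
    epr = ET.Element(f"{{{WSA}}}EndpointReference")
    # Address: URI of the managing service plus the service-generated GUID.
    ET.SubElement(epr, f"{{{WSA}}}Address").text = f"{service_uri}?{guid}"
    if with_ref_param:
        # Optional reference parameter, opaque to clients.
        params = ET.SubElement(epr, f"{{{WSA}}}ReferenceParameters")
        ET.SubElement(params, f"{{{PGI}}}ActivityGUID").text = guid
    return ET.tostring(epr, encoding="unicode")
```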

ChangeActivityStatus needs an EPR[] as input in this context.

What you gain with EPR:

  1. Tooling support
  2. It is the standard way of doing it
  3. Extensions for other non-PGI specs
  4. A lot of other infrastructures put information into the EPR
  5. A minimalistic EPR helps interoperability with these infrastructures (and is not a performance problem)
  6. Agreeing on EPRs would broaden the range of supporting clients
  7. Scientists can use one single client with two services in two Grids

Problems with EPRs (TBD by ARC and gLite):

  1. gLite: OK with an EPR and a server-generated ID.
  2. Etienne is concerned that EPRs are suitable for Services, NOT for Jobs (see chapter 3.D below).

Inside an Activity EPR, the Service MUST be the Service which is managing this Activity.

Agreed:

  1. ActivityID should define the address of the managing service of the corresponding activity
  2. JobID = ActivityID
  3. GUID = unique number generated by the service

ActivityID -> GUID + URI of service
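A minimal data-structure sketch of the agreed points. It assumes the GUID is a random RFC 4122 UUID generated by the service and that "URL rewriting" (service URI + `?` + GUID) is the serialization; both choices are still open, as listed under "To be discussed". `ActivityID` and its methods are hypothetical names.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityID:
    """Hypothetical ActivityID: URI of the managing service + service-generated GUID."""
    service_uri: str
    guid: str

    @classmethod
    def new(cls, service_uri: str) -> "ActivityID":
        # The service, never the client, generates the GUID.
        return cls(service_uri, str(uuid.uuid4()))

    def as_url(self) -> str:
        # One candidate serialization ("URL rewriting"); an EPR is the other.
        return f"{self.service_uri}?{self.guid}"
```

Because the managing service's URI is part of the ID, a Client can always tell where to send follow-up requests for the activity.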

To be discussed:

  1. The structure of an ActivityID is open for discussion.
  2. This might be an EPR...
  3. ... or just URL rewriting.

Suggestions from Etienne, which should be quite easy to accept or reject:

3.A) PGI Terminology

  • Identifier with Global Uniqueness = anything which uniquely identifies something on a global level. It MAY be implemented by a GUID (128 bits), and MAY also be implemented by other means.
  • UUID = Universally Unique Identifier, as defined by RFC 4122 (IETF). This is a specification for a 128-bit Identifier with Global Uniqueness (represented as 32 hexadecimal characters), often implemented as a GUID.
  • Job = Activity, therefore
    Job ID = Activity ID (it must be an Identifier with Global Uniqueness)
  • Client = Job Submitter
  • Job migration = Job delegation
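For illustration, Python's standard `uuid` module produces RFC 4122 UUIDs. A version-4 (random) UUID is one common way to obtain an Identifier with Global Uniqueness; note that 6 of its 128 bits are fixed by the version and variant fields, so 122 bits are random.

```python
import uuid

# RFC 4122 UUID, version 4 (random): 128 bits in total.
job_id = uuid.uuid4()

print(job_id.version)                   # 4
print(job_id.variant == uuid.RFC_4122)  # True
print(len(job_id.hex))                  # 32: hexadecimal digits only
print(len(str(job_id)))                 # 36: canonical form adds four hyphens
print(job_id.int < 2 ** 128)            # True: the value fits in 128 bits
```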

3.B) Local and Global Uniqueness

  1. As agreed before, IDs generated by a client can NOT be trusted to be unique at all.
  2. IDs generated by a server are trusted:
    - to be unique for this server,
    - but NOT to be globally unique (among all servers)
  3. If we really want an ID to be globally unique, we need something like a GUID.
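Points 1 and 2 can be illustrated with two hypothetical servers that number their jobs with a local counter: each ID is unique per server, yet both servers issue the same values. Prefixing the server's own endpoint (assumed to be unique) is one way to get global uniqueness without a GUID; using a GUID, as in point 3, is the other.

```python
import itertools


class ExecutionServer:
    """Hypothetical server that numbers its jobs with a local counter."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self._counter = itertools.count(1)

    def local_id(self) -> str:
        # Unique for this server only (point 2).
        return str(next(self._counter))

    def qualified_id(self) -> str:
        # Globally unique as long as the endpoint itself is unique.
        return f"{self.endpoint}?{self.local_id()}"


a = ExecutionServer("https://a.example.org/pgi")
b = ExecutionServer("https://b.example.org/pgi")
print(a.local_id() == b.local_id())          # True: both servers issue "1"
print(a.qualified_id() == b.qualified_id())  # False: the endpoints differ
```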

3.C) Human traceability of Jobs

    In case of problems, the Job ID MUST be suitable for manual transfer by a human. Therefore, the Job ID MUST:
  1. contain only 7-bit non-blank printable characters (such as MIME Base64 or a GUID), or be 7-bit XML with non-meaningful blanks and newlines;
  2. NOT be too long. I suggest specifying a maximum length of 1000 characters.
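A sketch of a validator for these two rules, assuming Etienne's suggested limit of 1000 characters (not an agreed value) and treating space, CR and LF as the only blanks tolerated in the XML form. `is_human_transferable` is a hypothetical helper name.

```python
MAX_JOB_ID_LENGTH = 1000  # Etienne's suggestion, not an agreed value


def is_human_transferable(job_id: str, xml_form: bool = False) -> bool:
    """Check rule 1 (7-bit printable, non-blank characters; blanks and
    newlines tolerated only in the XML form) and rule 2 (maximum length)."""
    if not job_id or len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    tolerated = {" ", "\r", "\n"} if xml_form else set()
    # 0x21..0x7E are the printable, non-blank 7-bit ASCII characters.
    return all(0x21 <= ord(ch) <= 0x7E or ch in tolerated for ch in job_id)
```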

3.D) EPRs are suitable for Services, NOT for Jobs

  • As far as I know, an EPR is a URL of a Service to which a Client MAY submit requests.
  • Therefore, anything pointed by an EPR is a Service instance which MUST be able to receive requests from Clients.
  • A Job is NOT a Service, but an Activity managed by an Execution Service. Therefore, the Job itself does NOT receive requests from Clients, and the Job ID MUST NOT be an EndPoint.
  • In particular, PGI can decide that a Job ID MAY have an XML syntax, or PGI can decide that a Job ID MUST have an XML syntax, but in any case, the Job ID MUST NOT begin with <wsa:EndpointReference>.

3.E) Execution Service responsible for the management of a particular Job

  • The specification says: 'service creates an instance of each activity that is identified by a global unique ID assigned by the service'
  • As soon as a Job is created, there is an Execution Service responsible for the management of this Job.
  • This responsible Execution Service MAY delegate the execution of the Job to a second Execution Service (and so on ...).
  • The responsible Execution Service MAY return to the Client some information about the second Execution Service, and the Client MAY be able to use this information.
  • But the Client MAY also be unable to use this information.
  • Therefore, the responsible Execution Service MUST stay wholly responsible for the Job during the Job lifetime.
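A toy model of this rule, with hypothetical class and state names ("Delegated" borrows the state discussed in Q14): the responsible service records where it delegated a Job and forwards status queries, so the Client keeps talking to the same service for the whole Job lifetime.

```python
class ExecutionService:
    """Hypothetical execution service that may delegate execution but
    remains the single point of contact for the Jobs it created."""

    def __init__(self, name: str):
        self.name = name
        self.jobs = {}          # Job ID -> local state
        self.delegated_to = {}  # Job ID -> service actually executing it

    def create(self, job_id: str) -> str:
        self.jobs[job_id] = "Pending"
        return job_id

    def delegate(self, job_id: str, other: "ExecutionService") -> None:
        # Execution moves, responsibility does not: keep a forwarding entry.
        other.create(job_id)
        self.delegated_to[job_id] = other
        self.jobs[job_id] = "Delegated"

    def status(self, job_id: str) -> str:
        # Clients always ask the responsible service; it forwards if needed.
        if job_id in self.delegated_to:
            return self.delegated_to[job_id].status(job_id)
        return self.jobs[job_id]


site_a, site_b = ExecutionService("A"), ExecutionService("B")
site_a.create("job-1")
site_a.delegate("job-1", site_b)
site_b.jobs["job-1"] = "Running"  # progress happens at the second service
print(site_a.status("job-1"))     # Running
```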

3.F) Endpoint(s) of Execution Service(s)

    At the beginning, a Client has picked a (hopefully) suitable EndPoint of an Execution Service from a GLUE-based Information Service, and has submitted a vector of Job descriptions to this EndPoint.

3.F.A) If the Execution Service receiving a vector of Job descriptions is designed so that the Client has to submit ALL subsequent requests to the ORIGINAL Endpoint, then:

  1. The Client does NOT need to extract any Endpoint from the Job ID, and
  2. The Job ID can be completely opaque (such as GUID).

3.F.B) If the Execution Service receiving a vector of Job descriptions is NOT designed so that the Client has to submit ALL subsequent requests to the ORIGINAL Endpoint, then:

  1. Each Job ID must be built so that the Client easily extracts the EndPoint of the Execution Service responsible for the Job,
  2. This EndPoint must stay the SAME for the whole Job lifetime, as explained in chapter 3.E above.
  3. PGI has to specify a limited number of Job ID syntaxes, which MUST be understood by ALL Clients. For example:
    - URL + '?' + GUID
    - <pgi:jobid>
          <wsa:Address>xs:anyURI - PGI SERVICE URL + '?' + GUID
          </wsa:Address>
          <wsa:ReferenceParameters>xs:any* - GUID
          </wsa:ReferenceParameters> ?
        </pgi:jobid>
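A sketch of how a Client could extract the responsible EndPoint and the GUID from either of the two example syntaxes. It assumes `pgi:jobid` lives in a hypothetical `urn:example:pgi` namespace and that the GUID follows the last `?` in the address; `endpoint_and_guid` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

WSA = "http://www.w3.org/2005/08/addressing"  # WS-Addressing 1.0 namespace


def endpoint_and_guid(job_id: str) -> tuple:
    """Return (endpoint, GUID) for a Job ID in either example syntax."""
    text = job_id.strip()
    if text.startswith("<"):
        # XML form: the address lives in <wsa:Address> inside <pgi:jobid>.
        address = ET.fromstring(text).findtext(f"{{{WSA}}}Address")
        if address is None:
            raise ValueError("pgi:jobid without wsa:Address")
        text = address.strip()
    # Plain form (or extracted address): URL + '?' + GUID.
    endpoint, sep, guid = text.rpartition("?")
    if not sep or not endpoint or not guid:
        raise ValueError(f"not of the form URL?GUID: {text!r}")
    return endpoint, guid
```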

3.G) Instances and Shares

  • A Client MUST NOT have to bother with Instances and Shares of an Execution Service.
  • The Execution Service MUST store any information about Instances, Shares, ... inside the Job ID in such a way that:
    A Client willing to issue a request for the Job just has to send the Request containing the whole Job ID to the URL of the responsible Execution Service as explained in chapter 3.E above.
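One possible way for a service to carry Instance and Share information inside an opaque Job ID is to serialize it and Base64-encode the result, which also keeps the ID within the 7-bit rules of 3.C. This is an assumption for illustration, not a proposed format: the field names are hypothetical, and a real service would probably also want to protect the token against tampering.

```python
import base64
import json


def encode_job_id(guid: str, instance: str, share: str) -> str:
    """Service side: pack internal routing details into an opaque,
    7-bit-safe token (URL-safe Base64 of a compact JSON document)."""
    payload = json.dumps({"g": guid, "i": instance, "s": share}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("ascii")).decode("ascii")


def decode_job_id(job_id: str) -> dict:
    """Service side only: Clients never need to look inside the token."""
    return json.loads(base64.urlsafe_b64decode(job_id.encode("ascii")))
```

The Client simply sends the whole token, unchanged, to the responsible Execution Service.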

Analysis:

  1. How do you address a remote execution service in order to submit a job (1 hop, execution service address)?
  2. What do you get back (list all variants; terminology is not needed, but examples are): numbers, URIs, EPRs, etc., collectively defined as the 'entity'.
  3. Consider reusing the 'entity' to do something remotely with the execution service (e.g. cancelactivity/canceljob).

gLite, UNICORE, GENESIS, ARC, ...


Q4)  Finalize the holdpoint specification within the job description



Q5)  Decide whether and how state changes are tightly coupled to operations. Is an operation allowed to aggregate multiple state changes?



Q6)  Elaborate on how data staging can be monitored via the state model.



Q7)  Decide whether some kind of "LEASE" feature is to be introduced within Execution portType.




Comments and questions by Bernd Schuller; mail dated 14 January 2010

After reading through the current draft 0.38 from http://forge.gridforum.org/sf/go/doc15839?nav=1 and listening to a presentation by Morris, I want to make a few comments.
I'll try to focus on compute functionality, to keep this mail reasonably short.
Overall, I think the PGI looks very promising and I really appreciate your hard work! Having been present in the UMD/EMI project preparation, I know exactly how hard it can be ;)

Q8)  Requirements Doc

The requirements doc mentioned in the introduction is not accessible to lesser mortals: on https://forge.gridforum.org/sf/go/doc15590 I get "permission denied".
Maybe you could copy it to the pgi-wg area?

Q9)  CreateActivity

Since the validation steps can take some time, it is impractical to wait for these steps to finish before assigning activity IDs and returning the response.
  • Clients or intermediaries will run into timeouts.
  • The system should create the activities immediately, and assign them a state like "new" or "validating".
  • IMO every remote operation that can take more than a couple of seconds to generate the response should be made asynchronous.
  • Just think of held locks and shared resources like DB connections together with concurrent access by many clients... we've been there with UNICORE and have been forced to keep web service processing times as low as possible.
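A minimal sketch of the asynchronous behaviour suggested here, using Python's `concurrent.futures.ThreadPoolExecutor`: activity IDs are assigned and returned immediately in a "Validating" state, and a bounded worker pool later moves each activity to "Pending" or "Failed". The class, the state names and the `validate` callback are all hypothetical.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncExecutionService:
    """Hypothetical service whose CreateActivity returns at once while
    validation runs in the background."""

    def __init__(self, validate, max_workers: int = 4):
        self._validate = validate            # slow check: description -> bool
        self._lock = threading.Lock()        # guards self.states
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.states = {}                     # activity ID -> state name

    def create_activities(self, descriptions: list) -> list:
        ids = []
        for desc in descriptions:
            activity_id = str(uuid.uuid4())  # ID assigned before validation
            with self._lock:
                self.states[activity_id] = "Validating"
            self._pool.submit(self._run_validation, activity_id, desc)
            ids.append(activity_id)
        return ids  # returned without waiting for any validation

    def _run_validation(self, activity_id: str, desc) -> None:
        new_state = "Pending" if self._validate(desc) else "Failed"
        with self._lock:
            self.states[activity_id] = new_state

    def shutdown(self) -> None:
        # Wait for outstanding validations (useful in demos and tests).
        self._pool.shutdown(wait=True)
```

The bounded pool reflects the concern about held locks and shared resources: the number of concurrent validations stays fixed no matter how many clients submit at once.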

Q10)  Change activity state

I don't really see a reason for all this generic stuff.
  • In reality you want to start, abort, hold, resume, etc. the processing of an activity, so why not make this more explicit?
  • A compromise might be to do something like requestActivityStateChange("Hold"), etc, and define the mandatory list of "target states" supported by this operation.
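A sketch of this compromise, with hypothetical names: one generic request operation that only accepts a fixed, mandatory list of target states, and explicit operations (hold, resume, abort) that are thin wrappers around it. The target-state values are placeholders, since the list has not been defined.

```python
from enum import Enum


class TargetState(str, Enum):
    # Placeholder mandatory target states; the real list is still to be agreed.
    RUNNING = "Running"
    HELD = "Held"
    CANCELLED = "Cancelled"


class ActivityClient:
    """Hypothetical client API: one generic request plus explicit helpers."""

    def __init__(self, send):
        self._send = send  # transport callback: (activity_id, TargetState) -> None

    def request_activity_state_change(self, activity_id: str, target: str) -> None:
        # TargetState(target) raises ValueError for anything outside the list.
        self._send(activity_id, TargetState(target))

    # Explicit operations are thin wrappers around the generic one.
    def hold(self, activity_id: str) -> None:
        self.request_activity_state_change(activity_id, "Held")

    def resume(self, activity_id: str) -> None:
        self.request_activity_state_change(activity_id, "Running")

    def abort(self, activity_id: str) -> None:
        self.request_activity_state_change(activity_id, "Cancelled")
```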

Q11)  Cancel activity / Wipe

Isn't this a special case of "Change activity state" ?

Q12)  Delegation port type

Nice idea. However, you should also support SAML assertions here (proxy certs are so 1995!)

Q13)  Automatic Re-Submission

In Section 5.1.2, what does "automatic resubmission" mean?
  • Resubmission to the batch system?
  • Or do you possibly see the PGI execution service as something "above" a normal execution service (like e.g. a gLite WMS?).
Section 5.1.7 seems to support this view.
IMO resubmitting a failed job to the batch system makes no sense, it will probably just fail again ;) So what is the idea?

Q14)  "Delegated" state (Section 5.1.4)

  • Allowing delegation to an off-site execution service (like a different Grid middleware) adds complexity and messes up a lot of things, like credential delegation, state, working directory access, etc.
  • Should "PGI execute" not focus on a simple, practical service for job execution?
  • This forwarding business seems to be quite out of scope…
  • How shall manual data staging be done if the session directory is off-site?
  • The introduction lists "request routing" as a requirement, but I'd reconsider that.

Q15)  "Output sandbox"

  • I'd try to avoid gLite specific terms :)
  • Maybe the "directory containing the output files produced by the job".
  • At least define the term "output sandbox" somewhere.

Q16)  JSDL Considerations

I fully support Steven's statements regarding the reuse of JSDL.
In some places you duplicate parts that already exist in JSDL and JSDL-POSIX, sometimes with less functionality.

Some examples:

  • 7.2 executable name, path, arguments. This can be done by a JSDL-Posix element, which covers even more, such as environment, stdout/err/in.
  • 7.3.1.4 UserTag can be replaced by JSDL JobAnnotation
  • 7.3.6.2 Input,output,error,environment -> JSDL-Posix

IMO JSDL-Posix (possibly with extensions) can be used in all places where you need to directly specify the execution of a process.
Similarly the normal Application (ApplicationName, ApplicationVersion) (again possibly with extensions) can be used to define execution of a pre-installed software.
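To make the mapping concrete, here is a sketch that builds a JSDL 1.0 job description with a JSDL-POSIX application using `xml.etree.ElementTree`: executable and arguments (7.2), output and environment (7.3.6.2), and a JobAnnotation in place of UserTag (7.3.1.4). The helper name `posix_job` and the example values are hypothetical; the namespaces and element names are those of JSDL 1.0 and JSDL-POSIX.

```python
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def posix_job(executable: str, args: list, stdout: str, env: dict, annotation=None) -> str:
    """Build a JSDL JobDefinition; child elements follow the schema order."""
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    if annotation:
        # JobAnnotation covers what 7.3.1.4 calls UserTag.
        ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
        ET.SubElement(ident, f"{{{JSDL}}}JobAnnotation").text = annotation
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout
    for name, value in env.items():
        ET.SubElement(posix, f"{{{POSIX}}}Environment", name=name).text = value
    return ET.tostring(job, encoding="unicode")
```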

Q17)  Other JSDL Issues

7.3.2.9 LogDir :
  • In the interest of interoperability, I'd assume that the internals of how a middleware stores its "grid-specific diagnostics" are irrelevant to the job description.
  • E.g. UNICORE would store this in a database, not in a directory on the execution system.

Q18)  Which elements must be supported?

In general it is not clear to me which of these elements MUST be supported by a PGI implementation.

Q19)  Start time

7.3.2.14 Start time. This is reservation functionality which opens a new can of worms :-)
  • What happens if the RMS does not support this, or the request cannot be granted?
  • If you want to support reservation, you need to reflect this in the state model and in the possible errors a user might get.
  • Also reservation is not listed as a requirement in the Introduction.

Q20)  Notifications

7.3.2.15 Notifications
This should not be a "custom format" but a "comma separated list of e-mail addresses".
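With that wording, parsing becomes trivial. A deliberately rough sketch (hypothetical helper; it does not attempt full RFC 5322 address validation and does not support commas inside quoted display names):

```python
def parse_notification_list(value: str) -> list:
    """Split a comma-separated list of e-mail addresses; rough checks only."""
    addresses = [part.strip() for part in value.split(",") if part.strip()]
    for addr in addresses:
        local, at, domain = addr.partition("@")
        # Rough check: exactly one '@', a non-empty local part,
        # a dot in the domain and no blanks.
        if not at or not local or "@" in domain or "." not in domain or " " in addr:
            raise ValueError(f"not an e-mail address: {addr!r}")
    return addresses
```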

Summary

  • Summarizing: I like the port types and the basic data and execution model; data staging and credential delegation also look good.
  • You should re-consider the job description part and clearly identify the minimal set that has to be supported by every compliant implementation.
  • Also, I'd try to keep all implementation-specific behaviour out of the spec, like where logs are stored and what is purged by a "purge" operation.
  • What is important is the behaviour and session directory access that a user can expect of any PGI service in each activity state (maybe a table would be helpful).




Comments and questions from Steven Newhouse, mail dated 18 December 2009

Q21)  ChangeActivityState

This reminded me of discussions we had in the late stages of BES.
  • A similar operation was moved in & out of the draft before finally being removed.
  • There was discomfort as to the amount of influence an end-user agent could have on the underlying state machine.
  • A user could not force a queued job to execute if there were no processes (Mri: processors/cores?) available.
With the expanded state model that is now proposed there is more scope for user control - fine.
  • You have the notion of legal and illegal state change requests but I could not see these documented on the state diagram.
  • I would certainly propose changing this to 'RequestActivityStateChange' to make it clear that this is an initiation and that it may take some time for the underlying state engine to change.
You effectively acknowledge this in your 'estimated time to change' return value. I can see it being useful to have a 'this has already been done' option in the return body.
  • I'm not sure how useful any value other than zero effectively is here.
  • To me you are effectively saying this has either been done or you need to set up a notification or a poll to check back to see when it has completed.
The estimate itself does not seem to be that valuable...
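A sketch of the suggested semantics, with a hypothetical transition table standing in for the PGI state diagram: the request either reports that the activity is already in the target state, is accepted (the Client then polls or subscribes for completion), or is rejected as illegal. No time estimate is returned. There is deliberately no user-requested transition into "Running", reflecting the BES experience that a user cannot force a queued job to execute.

```python
from enum import Enum

# Hypothetical legal user-requested transitions (current -> allowed targets);
# the real table would come from the PGI state diagram.
LEGAL = {
    "Pending": {"Held", "Cancelled"},
    "Held": {"Pending", "Cancelled"},
    "Running": {"Held", "Cancelled"},
    "Finished": set(),
    "Cancelled": set(),
}


class Outcome(Enum):
    ALREADY_DONE = "already in the requested state"
    ACCEPTED = "change initiated; poll or subscribe for completion"
    ILLEGAL = "transition not allowed from the current state"


def request_activity_state_change(current: str, target: str) -> Outcome:
    """Only initiates a change, as the 'Request' prefix suggests."""
    if current == target:
        return Outcome.ALREADY_DONE
    if target in LEGAL.get(current, set()):
        return Outcome.ACCEPTED
    return Outcome.ILLEGAL
```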

Q22)  Partitioning

The three port types all seem sensibly scoped.
I have great concerns about the AGU-JSDL section.
  • I'm not sure if this is just a place holder at the moment.
  • I would be VERY concerned if this AGU-JSDL forks away from the JSDL activity.
  • JSDL has gained considerable traction and is being further developed in the JSDL 2.0 discussions.
  • I would have thought PGI would be better placed supporting a MINIMAL extension to JSDL to support its immediate needs, and feeding the other improvements it could see being needed into the broader JSDL discussions.

Q23)  Delegation & Information PortTypes

  • These port types must have broader use beyond PGI.
  • If there are opportunities to do this, why not put them into separate specifications, even if they are still developed through the PGI-WG?

Q24)  The routing behind OGSA-BES - the figure!





Resolved Issues

Resolved questions will be listed here with their resolutions.

 



