SourceForge : Post

Project: Editor Discussion > REC: JSDL Spec v1.0 > Fitting JSDL to GRAM > List of Posts

Forum Topic - Fitting JSDL to GRAM: (10 Items)

08/11/2005 1:17 PM

post4653

Fitting JSDL to GRAM

In trying to lay out a hypothetical GRAM job description in JSDL, I've come up with some questions and comments that are
based on our requirements:

1) The Resources element is inadequate for our future plans of providing better support for heterogeneous clusters (i.e.
clusters that have different types of nodes). At the very least, the cardinality restriction is problematic.

2) The DataStaging elements don't support situations where both the target and the source are external to the execution
system. It's conceivable that there is some behind the scenes mechanism whereby data is retrieved from or pushed out to
a separate system. Unless this system is mounted as a shared file system on the execution system, staging to or from
that system appears to be impossible as a standard feature.

3) The spec indicates that a FileName path beginning with a slash is not allowed, but I found it desirable to specify,
for example, a user-specific scratch FileSystem (i.e. a base scratch directory with a user-specific sub-directory) and
then use '/' as a file name to copy an entire directory into that space. In general, can't a leading slash simply mean
the root of the associated FileSystem? Also, why can't an absolute path with no FileSystemName simply be with respect
to the default FS? This seems like a sane default that would save people annoying repetition.

4) I find being able to include both the Target and Source elements in a DataStaging element to be nonsensical. If you
already have the file to stage in, why do you need to stage it back out to the exact place you sent it from? It seems
to me that the staging directives would be clearer if, at the very least, the Target and Source elements were mutually
exclusive.

5) The mandatory CreationFlag seems inappropriate since it doesn't really make sense with directory staging. Why not
specify a default value and make this element optional? This also doesn't make sense if there is a standalone file
deletion DataStaging element (i.e. one that cleans up other files that were generated but not staged in).

6) In the Resources element, why is there a CandidateHosts section with nothing but a list of HostName elements when
there can be many FileSystem elements? It seems a bit inconsistent. Is there a reason for the different approaches?

7) This last comment is sort of a "spur on debate" comment. It's impossible to extend any of the complex types in JSDL
through the XSD extension mechanism since there's no way to distinguish between elements added through the XSD
extension mechanism and ones that match the xsd:any element. GRAM solved this by having an explicit extensions element
containing an xsd:any. The disadvantage is that the elements in the any are not "equal" to the elements
defined in the JSDL schema. So there are pros and cons to each. Just curious if anybody had thought about this issue.

None

09/20/2005 9:13 AM

post4654

Extension question

The schema previously had what you suggest, and we decided to move away from it to the current design, in which the
xsd:any is at the same level as the element that is being extended. We have decided to leave it as is.

-- The JSDL Team

Peter Lane

08/30/2005 1:12 PM

post4655

Clarification of #7

Say I want to extend JobDescription_Type so that I can specify requirements as per #1 since the existing Resources 
element is not adequate for my own purposes.  I want to extend it this way so that I can validate my extensions against 
a schema instead of risking typos when using xsd:any.  Thus I would like it to look like the following:

<xsd:complexType name="ExtendedJobDescription_Type">
    <xsd:sequence>
        <xsd:element ref="jsdl:JobIdentification" minOccurs="0"/>
        <xsd:element ref="jsdl:Application" minOccurs="0"/>
        <xsd:element ref="jsdl:Resources" minOccurs="0"/>
        <xsd:element ref="jsdl:DataStaging" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="gram:HeterogeneousResources" minOccurs="0"/>
        <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>

This is not possible by doing the following:

<xsd:complexType name="ExtendedJobDescription_Type">
  <xsd:complexContent>
    <xsd:extension base="jsdl:JobDescription_Type">
      <xsd:sequence>
        <xsd:element ref="gram:HeterogeneousResources" minOccurs="0"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

The problem is that the gram:HeterogeneousResources element is indistinguishable from the xsd:any element at the end of
JobDescription_Type, and so the parser doesn't know whether to assign it to the element in the extended type or to the
any element.  The solution GRAM implements is to have a specific element for extensions.  For example:

<xsd:complexType name="JobDescription_Type">
    <xsd:sequence>
        <xsd:element ref="jsdl:JobIdentification" minOccurs="0"/>
        <xsd:element ref="jsdl:Application" minOccurs="0"/>
        <xsd:element ref="jsdl:Resources" minOccurs="0"/>
        <xsd:element ref="jsdl:DataStaging" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="extensions" minOccurs="0" maxOccurs="1">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:sequence>
    <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>

The problem with this is that it makes all of the extended elements "inferior" to the elements defined in the schema,
since they are forced into a specific wrapping sub-element (i.e. it makes it blatantly clear that the extension elements
are not part of the original schema, and some people don't like this).  The advantage is that you can now extend the
type if you wish to.
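
To make the trade-off concrete, here is a rough sketch of an instance document under the wrapper approach. The gram
namespace URI and the content of HeterogeneousResources are hypothetical placeholders, not taken from GRAM or JSDL, and
JSDL namespace declarations are omitted, as in the other fragments in this thread.

```xml
<!-- Sketch only: the gram namespace URI below is a made-up placeholder -->
<JobDescription xmlns:gram="urn:example:gram-extensions">
    <Application/>
    <Resources/>
    <!-- Everything that is not core JSDL is pushed down one level -->
    <extensions>
        <gram:HeterogeneousResources>
            <!-- hypothetical content, validated against the gram schema -->
        </gram:HeterogeneousResources>
    </extensions>
</JobDescription>
```

With the xsd:any approach, gram:HeterogeneousResources would instead appear directly inside JobDescription, after any
DataStaging elements.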

Peter Lane

08/30/2005 12:42 PM

post4656

Response to answer #5

If service semantics are out of scope, I wonder why it's necessary to remove the option that the service creator has
for defining a default for the file deletion element.  There are a lot of optional arguments that could be interpreted
as having undetermined effects on the job execution if they are not specified.  Why make a restriction here especially
when it is nonsensical for a portion of the staging directives one might compose?  I think if there is a need to have a 
standard default here that some other standard like BES should be allowed to define it.  As it stands now that freedom 
does not exist.

Darren Pulsipher

08/30/2005 10:07 AM

post4657

Items with resolutions

The JSDL team has met and discussed the issues that you have brought up.

1) We have purposefully left out the ability to define heterogeneous cluster resource requirements at this time. We see
this as a future extension of JSDL.

2) Resolved by An's comments. File staging can be done using the DataStaging elements with virtual filesystems and no
application execution.

3) Resolved by An's comments. Absolute paths can be mimicked using the FileSystem as the root.

4) Resolved by An's clarification. There was a misunderstanding of StageIn and StageOut exclusivity. We will review the
documentation and examples for clarity.

5) The JSDL spec does not have implicit defaults. Everything is defined or the behavior is undetermined. The same rules
apply to directory staging as to file staging. Secondly, file deletion is not a primary use case and should not be a
behavior that you depend on. It is more a side effect of file staging.

6) CandidateHosts is a different case from FileSystem.  All of the FileSystems listed must be provided, whereas
any number (one or more) of the CandidateHosts listed may be chosen. The two are not related at all.

7) We have discussed this and would like you to give us an example of what you mean. 

-- The JSDL team

None

08/30/2005 7:09 AM

post4658

split this into separate threads?

Should these questions and their (partial?) answers be split into separate comment threads?  I'm having trouble figuring
out what has been resolved.

--Karl Czajkowski

Peter Lane

08/25/2005 7:08 PM

post4659

considerations

2) Actually, since MountPoint is optional, I'm thinking the way to solve this is to simply specify a "virtual"
FileSystem that is then mapped at the service to a specific base URL.  I.e. any local file path that is with respect to,
say, a "TRANSFER" FileSystem is converted to, say, a GridFTP URL by the service and transferred accordingly.

3) That works too.  I guess I'm too used to absolute paths.  ;-)

1) Thanks for the examples, but it still looks impossible to, for example, select 2 x86 machines and 4 OS X machines.  
You either have to select explicitly by HostName or select nodes of only one type.  What I need to be able to do is 
select specific quantities of multiple types of nodes.
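
A rough sketch of how such a request might be written if an extension element were available. The names
gram:HeterogeneousResources, gram:NodeGroup, and gram:Count are hypothetical, as is the namespace URI; only
CPUArchitecture and OperatingSystem are JSDL elements, and MACOS is assumed to be the JSDL enumeration value that covers
Mac OS X.

```xml
<!-- Hypothetical extension element; not part of JSDL or GRAM -->
<gram:HeterogeneousResources xmlns:gram="urn:example:gram-extensions">
    <!-- 2 nodes with x86 CPUs -->
    <gram:NodeGroup>
        <gram:Count>2</gram:Count>
        <CPUArchitecture>
            <CPUArchitectureName>x86</CPUArchitectureName>
        </CPUArchitecture>
    </gram:NodeGroup>
    <!-- 4 nodes running Mac OS X (MACOS is an assumed enumeration value) -->
    <gram:NodeGroup>
        <gram:Count>4</gram:Count>
        <OperatingSystem>
            <OperatingSystemType>
                <OperatingSystemName>MACOS</OperatingSystemName>
            </OperatingSystemType>
        </OperatingSystem>
    </gram:NodeGroup>
</gram:HeterogeneousResources>
```

The point of the grouping is that each node type carries its own count, which a single Resources element cannot express.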

An Ly

08/24/2005 10:23 PM

post4660

considerations

2) The "stage out" operation for data staging is defined to move local (or more accurately, locally available)
files/dirs from the execution system to the target, not from any arbitrary URI. The typical scenario is probably one
where the file/dir has been created/updated on the system (or again, more accurately, on a file/dir locally available
to the system) by a job that completed on the same system.

Considering the same scenario, if for some reason the execution of the job can directly produce or modify files at
these arbitrary (non-locally available) locations, then could you possibly skip the "stage out" operation, and somehow
instead "point" the production/modification directly to the "final" destination?

3) How about:

<FileSystem name="ROOT">
    <MountPoint>/</MountPoint>
</FileSystem>

<DataStaging>
    <FileSystemName>ROOT</FileSystemName>
    <FileName>.</FileName>
    ...
</DataStaging>

1) Here are a few examples for consideration. Suppose we have:

cluster1:

    node1 (x86, Windows NT)
    node2 (x86, Windows XP)
    node3 (SPARC, NetBSD)    

cluster2:

    node4 (x86, NetBSD)
    node5 (SPARC, SUN Solaris)
    node6 (PowerPC, Mac OS X)

For specific hosts:

<Resources>
    <CandidateHosts>
        <HostName>node1</HostName>
        <HostName>node4</HostName>
    </CandidateHosts>
</Resources>

For x86 hosts in cluster1 only:

<Resources>
    <CandidateHosts>
        <HostName>cluster1</HostName>
    </CandidateHosts>
    <CPUArchitecture>
        <CPUArchitectureName>x86</CPUArchitectureName>
    </CPUArchitecture>
</Resources>

For hosts in cluster2 only:

<Resources>
    <CandidateHosts>
        <HostName>cluster2</HostName>
    </CandidateHosts>
</Resources>

For NetBSD hosts in both clusters:

<Resources>
    <CandidateHosts>
        <HostName>cluster1</HostName>
        <HostName>cluster2</HostName>
    </CandidateHosts>
    <OperatingSystem>
        <OperatingSystemType>
            <OperatingSystemName>NetBSD</OperatingSystemName>
        </OperatingSystemType>
    </OperatingSystem>
</Resources>

For any SPARC hosts (possibly including hosts known to the system that are outside of the clusters):

<Resources>
    <CPUArchitecture>
        <CPUArchitectureName>sparc</CPUArchitectureName>
    </CPUArchitecture>
</Resources>

For any host:

<Resources/>

Peter Lane

08/23/2005 1:22 PM

post4661

considerations

2) The problem is that Source and Target are not taken together but as part of two separate staging operations.  So 
while I acknowledge that Source and Target can take URIs, this doesn't help the situation where I want to, for example, 
stage out a file from one URI to some other URI.  That said, I can always use extensions, but redoing something because
it isn't flexible enough seems bad, and it isn't really an "extension" so much as a hack to get it to work right.

3) Right, I read this.  And in 6.5.2.1 it says FileName cannot start with a '/', so do I assume that
<FileName></FileName> (i.e. element specified with no text) with no FileSystemName element would essentially be
equivalent to a Unix path of "/"?  I'm simply trying to figure out how to indicate the root directory of the file
system, whether default or specified in FileSystemName.

4) Right, this makes sense.  I'm not sure what I was thinking anymore for this one.

An Ly

08/22/2005 7:31 PM

post4662

considerations

Here are some considerations for the points that you've brought up:

2) The Source and Target elements may use a URI to point to a location that is external to the system. Please refer to 
examples in 6.5.6.6 and 6.5.8.6. It is also possible to use extensions under the FileSystem or DataStaging element to 
specify an external location referencing scheme for the underlying system. See 6.4.4.5 and 6.5.1.5.

3) If no FileSystemName is defined, then the FileName is relative to the WorkingDirectory, if specified. Otherwise, the 
system will determine the base location. See 6.5.3.1.

4) If both Source and Target are specified, there is no requirement they must have the same value. It is quite possible 
that a desired "stage in" location is entirely different from a "stage out" location. Source and Target are not 
mandatory either.

5) Will have to check the JSDL-WG mailing list archives for the discussions on CreationFlag. IIRC, the conclusion may
have been to try to reduce ambiguity (where it actually applies, I suppose).
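
As a rough sketch of points 4 and 5, a single DataStaging element can stage a file in from one location and out to a
different one, with CreationFlag stated explicitly. The file name and both URIs are placeholders, and JSDL namespace
declarations are omitted.

```xml
<DataStaging>
    <FileName>data.txt</FileName>
    <!-- Mandatory; the values assumed here are overwrite, append, and dontOverwrite -->
    <CreationFlag>overwrite</CreationFlag>
    <!-- Stage in before the job runs (placeholder URI) -->
    <Source>
        <URI>http://input.example.org/data.txt</URI>
    </Source>
    <!-- Stage out after the job completes, to a different location (placeholder URI) -->
    <Target>
        <URI>ftp://results.example.org/data.txt</URI>
    </Target>
</DataStaging>
```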
