SourceForge : View Wiki Page: GLUE20CommentsOnConceptualModel


wiki2055: GLUE20CommentsOnConceptualModel

Items marked as DONE are included in a new GLUE draft. The others are in the queue.

C.1(DONE):

Improve and consolidate specification of DNs: in the public comment version of GLUE 2.0, two kinds of DNs (Distinguished Names) with different delimiters are specified.

Section 16.3.8 defines as DNs: "X509 uses a X500 namespace represented as several Relative Domain-Names (RDNs) concatenated by forward-slashes". A slash-separated DN notation is also used in the examples throughout the document. I was not able to find such a definition in the X509 spec. As X509 stays rather general, are you sure it implements a forward-slash notation?

Section 17.4., in contrast, defines a DataType DN_T as a RFC 4515 Distinguished name. RFC 4515 says "There is zero or more relative distinguished names, separated by, for a distinguished name."

I propose to either
- specify both delimiters, fix the X509 citation and state clearly in which cases which notation is to be used, or
- decide for the RFC4515 notation (comma separated), which seems to be (better) standardized, and rewrite the examples.

Also at the beginning of section 16.3.8, the sentence "It must start ..." (state ?) should be improved.

A: the referenced RFC is actually RFC 4514; we choose to rely on IETF RFC 4514, with only one format to ease comparison

C.2(DONE):

Heterogeneous systems: The current schema doesn't seem to include any way in which a system can be heterogeneous, i.e. with different types of worker node. We have two clusters which are arranged like this. The other point is that at the moment it appears rather messy to cover multiple different queues within a single compute resource.

A: clarified during OGF24 that we already address heterogeneity

C.3(DONE):

Relationship between User Domain and Admin Domain: In the main entities diagram, I would like to suggest a relationship between the User Domain and Admin Domain. This relationship is of the form of a Service Level agreement.

A: talked to Laurence, we do not address this issue in the current version of the document

C.4(DONE):

Section 7: I think the first sentence referring to "specializations" could be reworded - it's ambiguous for me what the word means in this context.

A: asked Matt for clarification, explained the meaning; no change needed anymore

C.5(DONE):

Section 17.23 AppEnvState_t: For the pending removal description, isn't "is due to be removed" more appropriate than "as soon as possible"? I think perhaps giving indications of time is out of context here.

A: ACCEPT the suggested change: "is due to be removed"

C.6(DONE):

Extending the Entities with OtherInfo: OtherInfo in the form of a String is a possibility for particular implementations to easily extend the information of an Entity. But unfortunately not all Entities have that field.

I would suggest including the OtherInfo field in all Entities, as it is in some of them, in the form of a placeholder to publish info that does not fit in any other attribute. Free-form string, comma-separated tags, (name, value) pair are all examples of valid syntax.

A: ACCEPT: we move otherInfo in the Entity class and we remove from all other classes
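
The effect of moving OtherInfo into the Entity class can be sketched as follows. This is a simplified illustration in Python, not the GLUE class definitions: the attribute set is reduced to Id and OtherInfo, the Contact subclass shows only its Type attribute, and the sample values and the default for Type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    # Root class: OtherInfo is declared once here and inherited everywhere.
    Id: str
    OtherInfo: List[str] = field(default_factory=list)


@dataclass
class Contact(Entity):
    # Stand-in for the class-specific attributes; the default is hypothetical.
    Type: str = "general"


contact = Contact(Id="urn:example:contact:1", Type="security")
# Information with no dedicated attribute goes into the inherited placeholder.
contact.OtherInfo.append("Name=Site security officer")
print(contact)
```

Declaring the attribute once in the root class means every derived class gets the same extension point without repeating the definition.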

Miscellaneous observations

First, thank you very much for your work:

Our EDGeS project will have to publish information on entities of 'Desktop Grids' (Grids of Computer Scavenging), where resources are volatile, and GLUE schema 2.0 seems flexible enough and has adequate place-holders for unknown data.

C.7(DONE):

Chapter 5.3 Contact I suggest to add a 'Name' (String, 0..1) property : Even if this property will not always contain the name of a real person, it can contain the detailed name of a responsibility.

A: the role is considered by the Contact.Type attribute; for the Name of a real person, we suggest to use the OtherInfo attribute or Extension class because the use case is not considered to be relevant by other participants

C.8(DONE):

Chapter 5.6 Endpoint Attribute 'TrustedCA' : Typo in the description : 'issues' --> 'issued'

A: ACCEPT

C.9(DONE): Chapter 5.11 Policy - Attribute 'Rule' : Typo in the description : 'is provide' --> 'is provided'

A: ACCEPT

C.10(DONE):

- Last paragraph : 'then these policy instances SHOULD be expected to be consumed independently' : Could you be more explicit ?

A: Sergio will propose a clearer description

C.11(DONE):

Chapter 6.1 ComputingService End of the second paragraph : 'as part of the computing service' : I suggest to add 'same' before 'computing service'.

A: ACCEPT

C.12(DONE): Chapter 6.6 ExecutionEnvironment Entity 'ExecutionEnvironment' : Typo in the description : 'envonrment' --> 'environment'

A: ACCEPT

C.13(DONE)

Chapter 6.10 ToStorageService Entity 'ToStorageService' : This entity is not at all symmetrical to the 'ToComputingService' entity. In order to avoid confusion, I therefore suggest to rename 'ToStorageService' as 'ToPosixStorageService'.

A: we prefer to keep the name as it is; adding the term POSIX does not clarify that much; the name strategy is to enforce the fact that it is a directed relationship to a service and the meaning of the association should be gathered by the definition

C.14(DONE)

Chapter 7. Conceptual Model of the Storage Service Throughout this chapter, you use both words 'capacity' and 'extent' as if they are synonyms. - If yes, please state it clearly at the beginning of the chapter. - If not, please explain the difference.

A: action on Sergio

C.15(DONE)

Chapter 8. Relationship to OGF Reference Model - Can you provide more explanations : For example, what is the meaning of the arrow between 'Entity' and 'GridComponent' ? - Is it possible for you to show examples ?

A: Hiro suggested putting a reference to the OGF ref model doc; Sergio to add a few more words to explain inheritance

C.16(DONE)

Chapter 9. Security Considerations I suggest to write, at least, that concrete data models must ensure availability and reliability of the published data. Therefore : - Resiliency to DoS attacks is mandatory, - Resiliency to intrusion and counterfeits is mandatory, - Dynamic redundancy can help.

A: concerns which are more pertinent: authenticity, accuracy, privacy, ...; Paul to send a draft paragraph

C.17(DONE)

Chapter 17. Appendix B: Data Types Is it possible to sort the enumeration types alphabetically ? That would permit a reader in a hurry to find an enumeration type quicker.

A: ACCEPT

C.18(DONE)

Chapter 17.5 Capability_t Value 'executionmanagement.candidatesetgenerator' : Typos in the description : 'a nit of workcan' --> 'a unit of work can'

A: ACCEPT

C.19(DONE)

Chapter 17.9 EndpointHealthState_t I would suggest to add following value : 'compromised' : It was possible to check that there are security issues

A: we consider this to be covered by critical, for more details, we suggest to use the HealthStateInfo

C.20(DONE)

Chapter 17.19 Platform_t I suggest to be consistent with 'draft-ggf-jsdl-spec-28.doc' describing JSDL, and to use the values listed in Table 5-2 'Processor Architectures' of Chapter 5.2.1 of the JSDL document. http://www.ogf.org/documents/GFD.136.pdf

A: keep our list, email JSDL group to advise that x86_64 was superseded by amd64, also IA64 was superseded by Itanium

C.21(DONE)

Chapter 17.21 OSFamily_t I suggest to be consistent with 'draft-ggf-jsdl-spec-28.doc' describing JSDL, and to use the values listed in Table 5-4 'Operating System Types' of Chapter 5.2.3 of the JSDL document.

A: what we defined was driven by our use cases; we don't find the JSDL enumeration useful for our working scenario

C.22(DONE):

Chapter 5.10 Activity - There are 2 association ends named 'Activity.Id'. In order to avoid ambiguity, I suggest to rename them:

One as 'ReferredActivity.Id'
One as 'ReferredByActivity.Id'

A: the association end is identified by className.attributeName, therefore the naming is fine; the association is bi-directional, therefore there is no need to express referring and referred

C.23(DONE) ipService/ipHost/ipPort instead of endpoint? The document assumes that all services are accessible via a URI - this leads to us 'making up' URI schemes for services to just express their ip hostname/port/service e.g. lfc://host:port

RFC 2307 (http://www.ietf.org/rfc/rfc2307.txt) gives a standard mapping for entities related to TCP/IP networking and services (e.g. hostname/port/service combinations) to LDAP. Perhaps a similar model could be used in GLUE 2.0?

This mapping is already used by some common services for dynamic lookup, e.g. Apache ActiveMQ :

A: we consider the URI flexible and extensible enough; add motivations for URI-based ID of endpoints; James to send more info
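
To illustrate why a URI can carry the same information as separate host, port and service fields, the sketch below splits an endpoint URL with Python's standard library. The host name and port number are made-up examples, and the `lfc` scheme is taken from the comment above rather than from any registered scheme list.

```python
from urllib.parse import urlsplit


def endpoint_parts(url: str):
    """Return (service scheme, host, port) extracted from an endpoint URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port


# Hypothetical endpoint: the scheme names the service, the authority
# part carries host name and port.
print(endpoint_parts("lfc://lfc.example.org:5010"))
```

A consumer that needs the individual values can therefore derive them from the URL, while the schema keeps a single extensible attribute.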

C.24(DONE)

ComputingActivity and Usage Records: The information in a ComputingActivity seems to have overlap with that in the Usage Records. Perhaps an effort should be made to standardize between these two efforts

A: we are already aware of this; there was a contributed comparison: http://forge.ogf.org/sf/docman/do/viewDocument/projects.glue-wg/docman.root.background/doc15252/

No equivalent concept in GLUE 2.0 and not needed
- Record Identity
- Record Create Time
- Process Identity

Mapping to be verified
- UR.JobName-> ComputingActivity.RequestedApplicationEnvironment
  - should map to ComputingActivity.Name
- UR.State -> ComputingActivity.State (what about values, do they go into profile?)
  - in UR, the state is the final state, while we track states during the whole life of the job

Missing in GLUE 2.0
- UR.Charge
  - no mapping in GLUE 2.0 and no change for now; if needed, people should use extensibility to gain experience
- UR.Network
  - no mapping in GLUE 2.0 and no change for now; if needed, people should use extensibility to gain experience
- UR.Processors
  - it should map to RequestedSlots, there should be more investigation on comparing UR.Processor definition with GLUE2.Slot definition

Thanks to Michael Parkin for the useful analysis

C.25(DONE)

typographical errors Table of contents: All of the page numbers are incorrect. Sec. 6.3, p21, MinWallTime entry: "than" -> "then". Sec. 6.4, p24, CacheTotal and CacheFree entries: "consequent" -> "subsequent".

A: OK

C.26(DONE)

disk space types and sizes There is a general need for users to be able to specify the minimum amount of free disk space for a job. For more complicated jobs, this requirement also entails specifying the amount of free disk space "locally" (e.g. on the worker node itself) and the amount of free disk space on a "shared" area, particularly for parallel jobs. This draft moves closer to satisfying these requirements and is appreciated. However, I do think that some changes are needed to satisfy fully those requirements.

In Sec. 6.4 for the ComputingManager entity, I have a couple comments related to the WorkingArea* attributes:

1) On my site, we typically give normal and MPI jobs different current working directories (WorkingAreas). For the normal jobs, we create a temporary area in /var for the job. This area is removed at the end of the job. For MPI jobs, we set the working directory to the shared home directory so that all of the processes in the MPI job see the same files. For this case, are we expected to publish multiple ComputingManager entities? Probably this is really a question on whether this information is placed correctly in the overall model.

2) Actually for all jobs both the temporary area in /var and the shared home are always visible. We set the current working directory as appropriate; however, a job technically has access to both of these areas. I imagine that there may be jobs (esp. parallel ones) that would like to take advantage of both. Perhaps you should consider being able to publish multiple "WorkingAreas" for a ComputingManager with a mechanism to identify the default for a particular job.

A: using multiple computing manager instances is not the right solution; we refined the attributes to address the above comments

C.27(DONE)

The "Cache*" attributes imply that some caching mechanism will be made available to users. However, this brings lots of questions about who the cache is open to, who manages contention for the cache between multiple jobs, how the existence of files in the cache is published to users, etc. These two attributes provide too little information for end-users to know what to do and I question whether including them is useful.

A: we improved definitions in order to clarify the goal

C.28

For the "ScratchDir" attribute, the description says this is a shared area. However, the extent of that sharing isn't clear. Is it shared between all grid jobs (from the same user) or all processes in a job? For a normal job, would a dedicated area on the local disk count as a "ScratchDir"? It is important to clearly distinguish the "TmpDir", "ScratchDir", and "ApplicationDir" properties so that system administrators publish something consistent and these values are meaningful to users.

For the three attributes "TmpDir", "ScratchDir", and "ApplicationDir", the values are described as absolute paths or paths. However, all of these are very likely to be different depending on the user running a job. Are environmental variables allowed to be published for these values? If not, in cases where a unique value cannot be given, what should be published?

The "*Dir" attributes indicate that at least three different types of storage can potentially be made available to users. There are certainly applications that will want to take advantage of the differences between those storage resources. However, this also means that users will want to specify how much space they need in those various areas. Currently, there are no attributes to actually indicate how much space is available (to a job) in those areas. Having attributes that indicate the existence of these areas without giving parameters such as the size is probably not terribly helpful to users.

A: ask JP about their view/requirements

C.29(DONE) The last comment raises a similar problem with Sec. 6.3 "Computing Share". There is only one attribute ("MaxDiskSpace") to specify the maximum disk space policy. If multiple types of storage areas are advertised (as in the ComputingManager), then the policy should also contain attributes corresponding to each type.

In addition, the description of MaxDiskSpace implies that the value corresponds to the "WorkingArea" of the ComputingManager. If that is the case, then explicitly saying "WorkingArea" would make it clear what the limit applies to.

A: clarified relationship between MaxDiskSpace and WorkingArea

NB: JSDL has a DiskSpace param, to evaluate to what GLUE param it compares

C.30(DONE)

WallTime and CPUTime specifications Consistently throughout the specification WallTime and CPUTime attributes are specified per slot. However, this is likely to make it very difficult to publish reliable values for queue limits when parallel jobs are permitted on a site.

Many batch systems enforce overall wall times and total CPU times; hence, those are the values that a system administrator will set when configuring her batch system. These will correspond to the "MaxTotal*Time" attributes (Sec. 6.3 Computing Share). However, what will she publish for the "Max*Time" attributes? If only normal jobs are accepted there is no problem; they are the same as "MaxTotal*Time". But if the site accepts parallel jobs with up to 100 slots, what is the correct value to publish for "Max*Time"? The actual limit depends on the number of slots requested; a single correct number cannot be published.

No doubt one can configure the batch system for per slot limits, but this is certainly not the usual case nor the most straightforward. I suspect that there will be many sites that publish incorrect values for the "per slot" attributes, diminishing the utility of these values for scheduling.

The "per slot" values are also likely to be much less interesting for users. The typical case is that one has a parallelized application and one increases the number of CPUs to find the most efficient scale of the application. With the "per slot" values, the user must recalculate the CPU and Wall limits every time she resubmits the job with a different number of slots. However, the total CPU consumed is approximately the same and the wall clock time diminishes. This means that it would be much more convenient to specify the wall and total CPU limits once and only have to change one parameter in the job description.

Overall, I would suggest revisiting the decision to use "per slot" values. For users and for system administrators the "total" values are likely to be more consistent and useful.

A: MaxTotalWallTime: suggestion to change it into a MaxMultiSlotWallTime, which is the max wall time for a multi-slot job counted considering a job as a whole (no sum of single slot wall time)
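
The commenter's point that no single "per slot" number can be published can be shown with a small calculation. The sketch below assumes that the batch system enforces only a total CPU time limit and that the CPU time of a parallel job is spread evenly over its slots; the function name and the limit of 6000 time units are illustrative.

```python
def per_slot_cpu_limit(max_total_cpu_time: int, slots: int) -> float:
    """Per-slot CPU limit implied by a total limit, assuming an even split."""
    if slots < 1:
        raise ValueError("a job needs at least one slot")
    return max_total_cpu_time / slots


# The same total limit implies a different per-slot limit for each job size.
for slots in (1, 10, 100):
    print(slots, per_slot_cpu_limit(6000, slots))
```

Under these assumptions the implied per-slot value changes with every requested job size, whereas the total limit stays fixed.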

C.31

serviceType_t

The second elements of the proposed serviceTypes in chapter 17.6 are not consistent with the definition of the second element in the same chapter (which was defined as middleware name).

There should be a clearly defined distinction in that type tree between middleware and grid organisations to avoid the usage of different strings for the same service by different grids.

For globus, e.g. org.globus.ws-gram could possibly be better than org.teragrid.ws-gram because the former notation would be consistent with the definition of org.glite.wms

It could also be an improvement to define the second element as the name of the service implementation provider (like globus-alliance, EGEE, etc.)

A: ask JP and Balazs for clarification

C.32(PARTIALLY DONE; MISSING EXAMPLES AND REVISION ON ENUM)

ComputingService / serviceType_t: Given that the computing service represents the abstract functionality of a system, the serviceType_t as defined in the document doesn't make sense. It must be compute service or data service or something similar.

A: change definition of Service_t to be namespace-based classification; namespace can be middleware name, organization, or something else; reserve namespace org.glue, org.ogf; revise values accordingly; add examples for storage services

C.33(DONE)

ExecutionEnvironment: It is not obvious from the definition/description of the ExecutionEnvironment what the definition of Instances is. That is, it is the number of homogeneous nodes in a cluster which can be requested by a ComputeManager. The concept allows modelling heterogeneous clusters - it needed some discussion before it was understood.

A: in the introduction of the ExecEnv class, explained how it can be used for heterogeneous cluster modeling

C.34(DONE) number of jobs per middleware: We were looking for a way to include the number of total/running/waiting jobs submitted by a specific middleware. Could the ComputingEndpoint be extended by these values?

A: we agree to add the following attributes in the computingEndpoint: TotalJobs, RunningJobs, WaitingJobs, StagingJobs, SuspendedJobs, PreLRMSWaitingJobs; they are optional; to clarify in the text that they may not always be meaningful (e.g., stateless computing endpoint) or easy to measure

C.35(DONE) There are quite a lot of typos to eliminate - and even words that a spell checker should pick up.

A: we'll double check

C.36(DONE) In the introduction you refer to a conceptual model but then in section 3 you refer to an information provider - this does not sound conceptual.

A: "In Appendix 16, we provide guidelines for place-holder values that MUST be used when the attributes have no good default value or when the information provider is unable to obtain a dynamic value." can be changed to "In Appendix A, we provide guidelines for place-holder values that MUST be used when the attributes have no good default value or when the attribute cannot be measured for some reason."

C.37(DONE) The last paragraph of section does not add anything. Are you allowed to add and delete from GLUE and still call it GLUE?

A: add at the end of last paragraph the following sentence: "Such extensions MUST NOT be considered part of the GLUE specification, nevertheless we RECOMMEND submitting them to the GLUE WG for consideration."

C.38 In chapter 5, the terminology gets confused. You use the term entity (as in ER I presume) where you have previously used class and you also have an entity called "entity". I would change, for example, at the top of page 7 to read "This entity is the root class from which all the GLUE classes inherit ...". Your UML diagram then has classes and one class is called Entity.

A: remove all terms "entities" when used as common term and not referring to the GLUE Entity class Stephen: in general, it is difficult to remove all common terms which are also class names; suggestion to adopt a style for making explicit when the word should be interpreted as class name vs. common term

C.39(DONE) Entity - creation time - is this the creation time of this description of the entity or of its real world counterpart? In either case it cannot be optional - if we ignore theological debates - everything was created at some time. This is supposed to be a conceptual model as it says in the introduction.

A: we address this issue by clarifying what we mean by multiplicity; in Section 4, we have "The second part refers to the properties of the class; for each of them, the following characteristics are described: the property name, the data type, the multiplicity concerning how many values are allowed (* means zero or more), the unit of measurement and a description. For easy of reading, the properties that are inherited from a parent class are also listed." can be rephrased to: "The second part refers to the properties of the class; for each of them, the following characteristics are described: the property name, the data type, the multiplicity concerning how many values are allowed (* means zero or more), the unit of measurement and a description. For easy of reading, the properties that are inherited from a parent class are also listed. As regards the multiplicity, the value of zero means that it is allowed to refrain from publishing a value for the related property even though this can be measured."

C.40(DONE) I see, reading ahead a bit, in Appendix A that you explain about place holders for unknown data. Again this is not conceptual unless some information is inherently unknowable. All of appendix A should be in the particular bindings. For example if it is relational and you don't know the value then you can use NULL. It might be the responsibility of a specific profile to define what is actually meant by missing information - e.g. "Don't know", "not applicable", "I am not going to tell you" etc.

A: the aim of GLUE is to let people create representations of Grid entities which can be exchanged across multiple Grid infrastructures; therefore we need to also address the case that values cannot be determined for some reason; we need to define how to exchange such information in an interoperable way; Appendix A addresses this issue by defining reserved values which carry this semantics

C.41(DONE) Location - latitude and longitude - this cannot be optional for a geographical location. On the contrary there are many geographical locations that have no human readable name.

A: Location can be associated to an AdminDomain; and AdminDomain can be aggregated, for instance to represent a national Grid infrastructure; therefore we need the location entity to be able to capture different granularity; if we make the name optional, then just the LocalID is mandatory, which is opaque; we prefer to mandate the presence of a name which can always be defined/found for a location regardless of the granularity of the geographical position

Change the description of location from "A geographical position" to "A geographical region where the granularity can vary from an exact position to spanning different countries not necessary connected"

C.42(DONE) AdminDomain - Distributed - I don't like the definition very much. More of a problem is implementing the optional status. Many information systems only have 2 values for booleans.

A: the comment can be rewritten as "how to express that an optional attribute whose type is boolean is not used in an instance of the related class"; boolean offers TRUE/FALSE; in GLUE 2.0, at the moment, we have 7 optional attributes whose type is boolean; a possibility is to define a default value to be used (to discuss with Paul)

- option 1: we make all boolean mandatory
- option 2: delegate to profile how the UNDEFINED can be represented
- option 3: have a default to be used when the value can't be retrieved

Solution: to replace base Boolean data type with an ExtendedBoolean = (TRUE, FALSE, UNDEFINED) - ternary logic; first two values map to the base boolean meaning; UNDEFINED is the place-holder to be used as defined in Appendix A (extend the appendix to consider this);

In Appendix A, make explicit that the way not to publish values for optional attributes is defined in the realization document
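
The ternary data type described in the solution can be sketched as a Python enumeration. The string values and the two helper functions are assumptions made for illustration; the concrete representation is left to the realization document.

```python
from enum import Enum
from typing import Optional


class ExtendedBoolean(Enum):
    # TRUE and FALSE keep the base boolean meaning;
    # UNDEFINED is the place-holder value.
    TRUE = "true"
    FALSE = "false"
    UNDEFINED = "undefined"


def to_extended(value: Optional[bool]) -> ExtendedBoolean:
    """Map a two-valued boolean (or a missing value) onto the ternary type."""
    if value is None:
        return ExtendedBoolean.UNDEFINED
    return ExtendedBoolean.TRUE if value else ExtendedBoolean.FALSE


def to_bool(value: ExtendedBoolean) -> Optional[bool]:
    """Map back; UNDEFINED has no two-valued counterpart and becomes None."""
    if value is ExtendedBoolean.UNDEFINED:
        return None
    return value is ExtendedBoolean.TRUE


print(to_extended(None), to_extended(True), to_bool(ExtendedBoolean.FALSE))
```

An information system that only supports two-valued booleans still needs a convention for the third value, which is why the mapping back returns a missing value rather than guessing TRUE or FALSE.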

C.43(DONE) I suggest that you are very clear about what this document is, how it relates to the bindings and whether or not you are going to use profiles to define what information should actually be published and how to interpret missing/null information.

A: we will extend the introduction to state that mapping to concrete data models are in the realization document (other docs with new mappings may appear); also, profiles SHOULD appear to define how to generate/use the information in production scenarios (e.g., a profile can decide that an attribute which is optional in the schema, is considered mandatory in a certain grid infrastructure; another case, optional attributes are never published);

C.44(DONE) RAM per job slot: MainMemorySize/CPUMultiplicity could give RAM per job but it still doesn't allow for the Max RAM of an execution queue to be set. A soft and hard limit is often configured by prudent admins. As I watch ATLAS reco jobs being butchered by various Batch Systems, this is the quantity I'm most interested in.

A: the MaxMemory per job is a batch system policy, therefore, this should go into the ComputingShare; what kind of hard limit on memory we can define in batch systems? On RAM and/or virtual memory? Maarten: we can set limit on virtual memory Can we also have GuaranteedMainMemory?

In ComputingShare we have MaxMemory, which partially covers this comment; we refine as follows:

remove MaxMemory
add MaxMainMemory: if the limit is hit, then the LRMS could kill the job
add GuaranteedMainMemory
add MaxVirtualMemory: if the limit is hit, then the LRMS could kill the job
add GuaranteedVirtualMemory
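
The refined attribute set can be sketched as a small data structure with a check for the "could kill the job" condition. This is an illustrative sketch only: the class name, the helper function, the use of a missing value for unpublished attributes and the sample figures are assumptions, and the units are whatever the schema defines for these attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShareMemoryPolicy:
    # All four attributes are treated as optional here; None means "not published".
    MaxMainMemory: Optional[int] = None
    GuaranteedMainMemory: Optional[int] = None
    MaxVirtualMemory: Optional[int] = None
    GuaranteedVirtualMemory: Optional[int] = None


def may_be_killed(policy: ShareMemoryPolicy, main_used: int, virtual_used: int) -> bool:
    """True if the job exceeds a published Max* limit, so the LRMS could kill it."""
    over_main = policy.MaxMainMemory is not None and main_used > policy.MaxMainMemory
    over_virtual = (
        policy.MaxVirtualMemory is not None and virtual_used > policy.MaxVirtualMemory
    )
    return over_main or over_virtual


policy = ShareMemoryPolicy(MaxMainMemory=2048, MaxVirtualMemory=4096)
print(may_be_killed(policy, main_used=1500, virtual_used=5000))
```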
