SourceForge : artf5887: Queries using WS-Enumeration

Artifact artf5887 : Queries using WS-Enumeration

Tracker:	Doc Change Request
Title:	Queries using WS-Enumeration
Description:	Notes to slide 9 (add an operation to query using WS-Enumeration): OGSA-DAI does a very similar thing by buffering the result set in memory and allowing clients to retrieve part of it at a time. This does not, however, help performance, since the same amount of data is returned in the end. The main reason for this proposal was not performance but scalability, especially the size of the SOAP messages that are returned for large result sets. Some DBs allow you to use cursors to return results; WS-Enumeration would fit in nicely with that. WS-Enumeration is part of the WS-Transfer stack; in WS-RF you would create a new resource and use that to return the result in smaller packets. Can WS-Enumeration be used outside the WS-Transfer stack? A RUSReplyTooBig fault is a good thing to have, but it can be difficult to use: most implementations simply behave badly when they run out of memory (e.g. they go offline, or reply with authentication/authorization faults). One caution is that this is a very dynamic fault, so even a query returning only one record could be too big if another very large query is being processed at the same time. The operation could be tied together with the query operation by allowing an extra parameter that specifies the maximum size of the result, with the server returning either the result or a WS-Enumeration context if the result size exceeds the maximum. The important point here is that the server cannot make this decision unilaterally; the client must also be able to specify its limitations.
Submitted By:	Gilbert Netzer
Submitted On:	05/28/2007 4:35 AM EDT
Last Modified:	07/12/2007 11:00 PM EDT

Status
Group: *
Status:*	Open
Category: *
Customer: *
Priority: *	3
Assigned To: *	None
Reported in Release: *
Fixed in Release: *
Estimated Hours: *	0
Actual Hours: *	0

Comments

Xiaoyu Chen: 07/12/2007 11:01 PM EDT

Comment:

First of all, I would like to clarify and extend the requirements on the RUS query operation a little by classifying the RUS operations in a more
generic way.
The RUS operations can be roughly classified by two main criteria:
1) Batch processing vs. atomic processing
Most RUS operations are defined for batch processing of multiple usage record instances. However, this does not stop an implementation from
restricting itself to atomic processing of a single usage record per transaction.
2) Synchronous vs. asynchronous processing
To conform to the WS-I profile, the RUS operations are essentially stateless and only allow synchronous processing. This causes potential problems
for implementations, in particular when querying large amounts of usage records/data. Asynchronous processing is being introduced specifically for
this problem.
_But essentially, the potential issue can affect any RUS operation._
For example, RUS::insertUsageRecords allows batch insertion of multiple usage records, and the batch could be huge as well. The WS-Enumeration
specification does not help with insertion, nor with other operations such as update and deletion. Luckily, these operations do not really have any
meaningful output. But the client side does expect operation results and a "RUSRecordIdList" as defined in the current spec (version 1.7). We can
easily add another status code, PENDING, to _RUS:operationResult_ and a session id to the _RUSRecordIdList_ element corresponding to each usage
record sequence, as immediate returns for the insert, update and deletion operations. Then we might need another operation to get the real execution
status back to the client, _RUS::getOperationResult_?
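
A minimal sketch of the PENDING idea, in Python, might look as follows. All names here (OperationStatus, InMemoryRus, insert_usage_records, process_pending, get_operation_result, and the session id format) are hypothetical and are not taken from any version of the RUS specification; the in-memory dictionaries merely stand in for a real backend.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class OperationStatus(Enum):
    PENDING = "PENDING"   # hypothetical extra status code
    SUCCESS = "SUCCESS"


@dataclass
class OperationResult:
    status: OperationStatus
    session_id: str
    record_ids: list = field(default_factory=list)


class InMemoryRus:
    """Hypothetical RUS stand-in that defers the work of a batch insert."""

    def __init__(self):
        self._pending = {}   # session id -> records still to be processed
        self._results = {}   # session id -> final OperationResult
        self._store = {}     # record id -> stored record

    def insert_usage_records(self, records):
        # Reply immediately with PENDING plus a session id for later lookup.
        session_id = uuid.uuid4().hex
        self._pending[session_id] = list(records)
        return OperationResult(OperationStatus.PENDING, session_id)

    def process_pending(self):
        # Stand-in for the server doing the real work in the background.
        for session_id, records in list(self._pending.items()):
            ids = []
            for record in records:
                self._store[record["id"]] = record
                ids.append(record["id"])
            self._results[session_id] = OperationResult(
                OperationStatus.SUCCESS, session_id, ids
            )
            del self._pending[session_id]

    def get_operation_result(self, session_id):
        # The additional operation: report the real execution status.
        if session_id in self._results:
            return self._results[session_id]
        if session_id in self._pending:
            return OperationResult(OperationStatus.PENDING, session_id)
        raise KeyError(session_id)
```

In this sketch the client keeps only the session id and polls get_operation_result until the status is no longer PENDING.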

Then coming back to WS-Enumeration and RUS query operations.
I basically agree with Gilbert's proposed solution of introducing the WS-Enumeration operations. Quoting from Gilbert's message:

>To sum up I want to suggest the following:
>1. Replace the extractUsageRecords operation with the <wsen:Enumerate> method.
Would it be better to keep the _RUS::extractUsageRecords_ operation (do you mean version 1.9? In version 1.7 it is "RUS::extractRUSUsageRecords") for
synchronous queries, while using _wsen:Enumerate_ and _wsen:Pull_ for asynchronous queries?

>2. State the servers SHOULD implement the <wsman:OptimizeEnumeration> extension.
I have looked at the WS-Management document, which covers many aspects of related specifications, including WS-Enumeration. I don't know whether this
document is approved or still a draft. But wouldn't it be easier to narrow the specification dependencies of the RUS specification, since
_RUS::extractUsageRecords_ already works for returning a small number of usage records?

>3. State the client MUST understand the result of a <wsman:OptimizeEnumeration> extension if they used this extension in their request.
If WS-Management is left out of scope, this point no longer applies (see my comment on point 2).

cheers!

Action:

Update

Gilbert Netzer: 07/10/2007 5:13 AM EDT

Comment:

After studying the WS-Enumeration specification (http://www.w3.org/Submission/WS-Enumeration/) more closely, I have a few thoughts on how to implement
a form of querying the RUS that can handle large result sets in a more memory- and resource-efficient way.

To sum up I want to suggest the following:

1. Replace the extractUsageRecords operation with the <wsen:Enumerate>
method.
2. State the servers SHOULD implement the <wsman:OptimizeEnumeration>
extension.
3. State the client MUST understand the result of a
<wsman:OptimizeEnumeration> extension if they used this extension in
their request.

First of all I think we should use WS-Enumeration for accessing a "large" result set in handy pieces. The advantage would be that existing enumeration
implementations (on both the server and the client side) can be used to do the actual data transfer. Such implementations will probably also show up
in middleware toolkits. For instance, the next version of the Globus Toolkit will contain a WS-Enumeration implementation. Besides the advantage for
implementors, this would also save us the trouble of specifying our own flavor of enumeration that essentially duplicates the functionality of an
existing specification. The downside is that WS-Enumeration is not part of WS-RF and therefore duplicates some of the lifecycle management
functionality of WS-RF (via the Renew, GetStatus, Release and EnumerationEnd operations). I do not, however, think that this is a big issue.

In WS-Enumeration the optional Enumerate operation is used to request a new enumeration from the data source, in our case this would be the RUS. In
this request a filter condition can be specified using the <wsen:Filter> element. This element can be of any given dialect, however currently only the
XPath dialect is specified. This basically is equivalent to the specification of the searchTerm currently used by the extraction methods (extract*)
so it could actually replace the extraction methods of the RUS.

One problem with the WS-Enumeration specification is that a client always has to use at least two requests (one Enumerate and one Pull operation) to
get the requested data, even if the result contains only a small number of data items. This problem was already addressed by the WS-Management
specification (http://www.dmtf.org/standards/published_documents/DSP0226.pdf), which provides an extension, <wsman:OptimizeEnumeration>, that uses the
WS-Enumeration extensibility points to accomplish this in a backward-compatible manner. The RUS specification could leverage this to extract
UsageRecord data in a way very similar to what is currently possible, while adding good support for large result sets.
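
To illustrate the message flow, here is a minimal in-memory sketch in Python. It models only the protocol idea (an Enumerate request followed by Pull requests, with an optional first batch returned directly by Enumerate), not SOAP and not any toolkit API. The class EnumerationSource, the helper fetch_all, and the parameters optimize and max_elements are hypothetical names chosen for illustration, and a Python predicate stands in for the XPath filter.

```python
import itertools


class EnumerationSource:
    """Hypothetical data source offering enumerate/pull over a record list."""

    def __init__(self, records):
        self._records = list(records)
        self._contexts = {}            # context id -> iterator over matches
        self._ids = itertools.count(1)

    def enumerate(self, predicate, optimize=False, max_elements=1):
        # Lazily filter the records; a real server might hold a DB cursor here.
        matches = (r for r in self._records if predicate(r))
        context = f"ctx-{next(self._ids)}"
        self._contexts[context] = matches
        if optimize:
            # Optimized variant: ship the first batch with the response.
            items, done = self._take(context, max_elements)
            return (None if done else context), items, done
        return context, [], False

    def pull(self, context, max_elements=1):
        return self._take(context, max_elements)

    def _take(self, context, max_elements):
        items = list(itertools.islice(self._contexts[context], max_elements))
        # A short (possibly empty) batch signals the end of the sequence.
        done = len(items) < max_elements
        if done:
            del self._contexts[context]
        return items, done


def fetch_all(source, predicate, batch_size):
    """Collect every match and count the requests that were needed."""
    context, items, done = source.enumerate(
        predicate, optimize=True, max_elements=batch_size
    )
    results, requests = list(items), 1
    while not done:
        items, done = source.pull(context, max_elements=batch_size)
        results.extend(items)
        requests += 1
    return results, requests
```

With the optimized variant a small result fits into the single Enumerate round trip, while a large result is still delivered in batches of at most batch_size items.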

The problem with this approach could be that it makes the RUS specification dependent on two other specifications (WS-Enumeration and parts of
WS-Management), but overall I would still consider this a worthwhile approach.

Action:

Update

Gilbert Netzer: 07/05/2007 10:59 AM EDT

Comment:

After having another quick look at the WS-Enumeration to refresh my memory,
I think the following things should be true:

WS-Enumeration defines all the operations that are needed to get the
results out of a given enumeration (Pull, GetStatus) and to manage the
lifetime of the enumeration (Renew, Release, EnumerationEnd). These
operations are already defined, so we should not have to redefine them.

The Enumerate method is optional and can be replaced by an equivalent
method that returns an Enumerate response message.

In the Enumerate response message, a <wsa:ReplyTo> element can be specified
which should then be used for any operations on that enumeration.

This gives two possibilities:

1 Use the same endpoint reference as the RUS.
In this case the implementor will have to also provide the WS-Enumeration
defined methods in the RUS service. However they should be in the wsen
namespace so the RUS spec should not be affected.

2 Use a different endpoint reference
Now the endpoint is different, so this should not affect RUS at all.

In any case, however, we need the Enumerate response from the WS-Enumeration
spec, and I would think that we should import it from their schema rather than
copy-paste it, if that is possible, to avoid inconsistencies.

The interesting question is whether we could define a query method that
can return either the whole result (if it is small) in a response message
that we define, or an Enumerate response message (if it is a large result set)
according to the WS-Enumeration specification. If that is possible I would
vote for using only a single method, although it makes life harder for the
client, which then needs to be able to handle the added complexity.
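
One way to picture such a single query method is as a response that carries either the records or an enumeration context, never both. The following Python sketch is only a model of that idea, assuming the client may also state its own limit as suggested in the artifact description. QueryResponse, QueryService, query_usage_records, server_max_items and client_max_items are hypothetical names, not part of RUS or WS-Enumeration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryResponse:
    # Exactly one of the two fields is set in any response.
    records: Optional[list] = None
    enumeration_context: Optional[str] = None


class QueryService:
    """Hypothetical service with one query method and two response shapes."""

    def __init__(self, records, server_max_items):
        self._records = list(records)
        self._server_max_items = server_max_items
        self._enumerations = {}   # context id -> matches kept for later pulls

    def query_usage_records(self, predicate, client_max_items=None):
        # For simplicity the matches are buffered; a real server would not
        # need to materialize them just to decide which shape to return.
        matches = [r for r in self._records if predicate(r)]
        # The effective limit respects both the server and the client.
        limit = self._server_max_items
        if client_max_items is not None:
            limit = min(limit, client_max_items)
        if len(matches) <= limit:
            return QueryResponse(records=matches)
        context = f"enum-{len(self._enumerations) + 1}"
        self._enumerations[context] = matches
        return QueryResponse(enumeration_context=context)
```

The extra complexity on the client side is visible here: every caller has to check which of the two fields is set before it can use the response.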

If that is not possible, I would suggest importing (referencing in the
specification) the Enumerate operation from the WS-Enumeration spec, as it
seems to provide the features we would like (XPath as search/filter
expression).

But then again, I am not sure whether it works the way I outlined above. I am
not that good with WSDLs, so please tell me if I am right or wrong about the
imports.

Action:

Update

Gilbert Netzer: 07/05/2007 10:00 AM EDT

Comment:

Copied from artf5934: Comment on 07/05/2007 by Xiaoyu Chen

2) Regarding RUS operations, the reason I got somewhat stuck is an important signal from stakeholders about the query operation:
"The RUS specification is hard to conform to unless the query operations provide reliable extraction of usage records, in the following sense:
first of all, as a specification, it should give implementation guidance to solve potential problems and make reliable recommendations. This means
the specification should protect the RUS server from crashing when returning large amounts of data."
We agree to propose a "RUSTooComplexFault". However, this never really solves the problem of returning huge amounts of data to the client. Some
implementations strongly require a "stream" type of return over a stateful connection to the database. They require complex queries to always return
what the user asked for, even if this takes longer. However, this would not work through a Web service, which is essentially stateless and will throw
a session timeout error if it takes too long to get the SOAP response message back.

So I reviewed OGSA-DAI, WS-DAI, WS-Enumeration, WS-Context and other relevant specifications and implementations. There is still no satisfactory
solution for this issue from the perspective of service interface definitions. OGSA-DAI provides an implementation-specific service interface, which
is more or less like the operations defined in the WS-Enumeration specification:

GetFully Operation
Inputs: 
The name of a session known to a data service resource. 
The name of an open output stream known to the session. 
The session name and stream name are expected to be conjoined with a colon (:). 
Outputs: 
Data - a string, a chunk of valid XML or a byte array. 

GetNBlocks Operation
Inputs: 
The name of a session known to a data service resource. 
The name of an open output stream known to the session. 
The session name and stream name are expected to be conjoined with a colon (:). 
The number of blocks of data to be retrieved. 
Outputs: 
A block of data - a string, a chunk of valid XML or a byte array. 
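
As a rough illustration of the two operations listed above, the following Python sketch serves blocks from named output streams addressed as "session:stream". It is not the OGSA-DAI API: BlockStreamService, open_stream, get_n_blocks and get_fully are hypothetical names, and the blocks are returned as plain Python lists.

```python
import itertools


class BlockStreamService:
    """Hypothetical stand-in for a data service with named output streams."""

    def __init__(self):
        self._streams = {}   # "session:stream" -> iterator over data blocks

    def open_stream(self, session, stream, blocks):
        self._streams[f"{session}:{stream}"] = iter(blocks)

    def _lookup(self, name):
        # The session name and stream name are joined with a colon.
        session, sep, stream = name.partition(":")
        if not (sep and session and stream):
            raise ValueError("expected '<session>:<stream>'")
        return self._streams[name]

    def get_n_blocks(self, name, count):
        # Return at most `count` further blocks from the stream.
        return list(itertools.islice(self._lookup(name), count))

    def get_fully(self, name):
        # Return everything that is still left in the stream.
        return list(self._lookup(name))
```

Because the stream position lives on the server between calls, this style of interface is stateful in the same way as a WS-Enumeration context.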

Gilbert proposed using WS-Enumeration and defining a new operation that returns the "wsen:EnumerationContext" type. That could be a solution. However,
how to integrate the WS-Enumeration data types and operations into the RUS operations is another issue.
* import WS-Enumeration WSDL within the RUS WSDL?
* define a new context data type for enumeration context within the RUS WSDL?

Action:

Update

Gilbert Netzer: 05/28/2007 4:58 AM EDT

Comment:

Comments by EMail from Xiaoyu Chen, 05/18/2007 05:43 PM

Again, for big returns there are many possible solutions, like OGSA-DAI, which keeps the results in memory and allows the client to query chunks of
data over many sessions. But do you think a separate "TooBigResultsFault" can be put into the specification? Some grid projects deploy RUS
implementations at site level and operate on roughly the same number of usage records, and some RUS implementations only allow operations on aggregate
usage records; this fault will never be thrown by such implementations. I can understand why people at OGF 20 proposed various solutions for this,
because they are thinking from an implementor's perspective. But in a specification we can only make recommendations to implementations; we are not
supposed to specify solutions.
 
So my proposed solutions are: 
Put the "RUSQueryTooComplexFault" into the RUS::InvalidFault category;
Put "TooBigResultsFault" into one of the defined fault categories (do you have any recommendations?);
Besides, do you remember that the Fault schema in the spec v1.9 proposal allows custom-defined RUS faults? This would undermine standardisation,
though, and it is the implementation's responsibility to consider how to extend the RUS faults for its custom needs.

Action:

Update

Gilbert Netzer: 05/28/2007 4:35 AM EDT
	Action:	Create
