This is a static archive of the previous Open Grid Forum GridForge content management system saved from host forge.ogf.org file /sf/wiki/do/viewPage/projects.ur-wg/wiki/StorageSensorSampling at Thu, 03 Nov 2022 00:26:30 GMT

wiki2526: StorageSensorSampling

Storage Accounting with Sampling/Monitoring

There has been much discussion about how to sample the quantity of stored data in order to obtain usage figures for Usage Record (UR) markup. We start from these axioms:

  • Data storage is a continuous usage, i.e. it has a start time and either an end time or a duration.
  • Two mechanisms for calculating this usage are being considered:
    • record all operations that write to or delete from the storage device, e.g. a GridFTP server modified to record inbound I/O and delete operations
    • measure usage periodically, e.g. by running du against a disk on a Linux system
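
As a rough illustration of how the two mechanisms differ, the sketch below reconstructs usage from a made-up operation log and then emulates a periodic measurement of the same store. The log format, the function names and the numbers are all assumptions for illustration; neither GridFTP nor du produces data in this form.

```python
# Hypothetical operation log: (time in seconds, signed byte delta).
# Writes are positive, deletes are negative.
operations = [(0, 1000), (40, 500), (70, -1000), (130, 2000)]


def usage_at(ops, t):
    """Bytes stored at time t, reconstructed from the full operation log."""
    return sum(delta for when, delta in ops if when <= t)


def sample_usage(ops, start, end, interval):
    """Emulate a periodic measurement taken every `interval` seconds."""
    return [(t, usage_at(ops, t)) for t in range(start, end + 1, interval)]


# One measurement per minute over three minutes.
samples = sample_usage(operations, 0, 180, 60)
print(samples)
```

The samples record the size at each measurement instant but not when, between two measurements, the size changed. The operation log keeps that timing, at the cost of instrumenting every write and delete.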

This led to the understanding that the sampling process would introduce inaccuracies, which resulted in two proposals:

  1. URs do not specify a time period
    • Any time period would make the record in itself inaccurate.
    • Resources instead publish status information (the total size of all files stored; total allocation was also discussed) to the accounting systems via URs
    • The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm it wished
  2. URs specify a time period
    • URs would contain a self-consistent usage value
    • URs would not reflect the dynamics of the usage
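
Under the first proposal, the consumer receives point-in-time size reports and chooses its own summary. A minimal sketch of one such choice follows, assuming snapshots arrive as (time, size in bytes) pairs; that format and the name summarise are hypothetical, and the plain arithmetic mean is only one of the algorithms a consumer might pick.

```python
def summarise(snapshots):
    """Derive min, max and unweighted mean size from (time, bytes) snapshots."""
    sizes = [size for _, size in snapshots]
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": sum(sizes) / len(sizes),
    }


# Hypothetical status reports published by a resource, one per minute.
snapshots = [(0, 1000), (60, 1500), (120, 500), (180, 2500)]
print(summarise(snapshots))
```

Because each snapshot carries no time period, two consumers applying different algorithms to the same snapshots can arrive at different accounting values.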

For details, see the email thread starting here: http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html

Agreed (Skype meeting 28/02/2012):

  • Storage UR will specify a start and end time
    • To be set by the sensor/Resource-Provider
    • and subject only to local policy decisions
    • (The resource provider or service software is best placed to determine sampling rates)
  • Storage UR will present a data size value, N (in bytes or another similarly specified unit, cf. UR v1)
    • i.e. a plain size, not a value integrated over time
    • The UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for users to consult
  • The Storage UR data size value will be interpreted by UR consumers as a constant average value across the time period
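
The agreed interpretation can be sketched from the consumer's side as follows. StorageRecord, its field names and average_size are hypothetical and do not come from any UR schema; the sketch only assumes that each record carries a start time, an end time and a size that is treated as constant between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRecord:
    start: float       # seconds since an arbitrary epoch
    end: float
    size_bytes: float  # treated as constant over [start, end]

    def byte_seconds(self, window_start, window_end):
        """Size multiplied by the overlap between this record and the window."""
        overlap = min(self.end, window_end) - max(self.start, window_start)
        return self.size_bytes * max(overlap, 0.0)


def average_size(records, window_start, window_end):
    """Time-weighted average size over a window covered by the records."""
    total = sum(r.byte_seconds(window_start, window_end) for r in records)
    return total / (window_end - window_start)


# Two consecutive one-hour records with different sizes.
records = [StorageRecord(0, 3600, 2000), StorageRecord(3600, 7200, 4000)]
print(average_size(records, 0, 7200))
```

A consumer can apply the same calculation to any accounting window, including one that cuts across record boundaries, because each record states the period its value applies to.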

Notes

  • Allowing resource providers to determine their UR start/end time could in principle lead to very many very short period URs in the system.
    • Note that granularity will determine performance, so we need to ask service providers to cut appropriately coarse-grained URs.
    • Sampling can still be done as often as accuracy requires, but the published UR should report the average usage over a suitably long time baseline.
  • Would it be sensible to provide metadata in the UR to indicate the sampling process used?
  • Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism?
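
The first note, sampling often but publishing coarse records, might look like the sketch below on the provider side. It assumes each sampled size holds until the next sample, which is only one possible convention; the function name and the dictionary keys are hypothetical.

```python
def publish_coarse_record(samples, end):
    """Collapse sorted (time, bytes) samples into one record ending at `end`.

    Each sample's size is assumed to hold until the next sample is taken.
    """
    start = samples[0][0]
    # Pair every sample with the time at which the following one was taken.
    next_times = [t for t, _ in samples[1:]] + [end]
    total = sum(size * (t_next - t)
                for (t, size), t_next in zip(samples, next_times))
    return {"start": start, "end": end, "size_bytes": total / (end - start)}


# Three unevenly spaced samples collapsed into a single one-hour record.
samples = [(0, 1000), (600, 3000), (900, 2000)]
print(publish_coarse_record(samples, end=3600))
```

Weighting each sample by how long it was in force keeps a short-lived peak from dominating the published average, which a plain mean of the samples would not do.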

 



