This is a static archive of the previous Open Grid Forum GridForge content management system saved from host forge.ogf.org file /sf/wiki/do/viewPage/projects.ogsa-wg/wiki/LifeScienceWorkflowUseCase?selectedTab=attachments at Sun, 06 Nov 2022 22:11:00 GMT SourceForge : View Wiki Page: LifeScienceWorkflowUseCase

wiki1753: LifeScienceWorkflowUseCase
Title Life Science

Keywords

Goal and description

With the increased reliance on computational biology, data mining, and statistical techniques, biological and behavioral science research efforts require ever more computational resources. A common paradigm in bioinformatics involves scanning a database and applying some function to each entry to generate a score or some other value. More formally, this procedure can be described as the set of results:

{ f(x, t) | x ∈ X }

or, alternatively:

For all x an element of X, compute f(x, t), where f(x, t) is some function such as BLAST, a docking code, or some other application.

In certain cases the function needs to be evaluated for the cross product of all values in one database across all of the values in another database:

{ f(x, y) | x ∈ X, y ∈ Y }

or, alternatively:

For all x an element of X, For all y an element of Y, compute f(x, y), where f(x, y) is once again some application.
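The two set definitions above can be sketched in a few lines of Python. This is only an illustration: `score` is a hypothetical stand-in for a real application such as BLAST or a docking code, the sample sequences are made up, and treating t as a single fixed query is an assumption, since the text does not define it.

```python
from itertools import product


def score(x: str, y: str) -> int:
    """Hypothetical stand-in for f: counts positions where two sequences agree."""
    return sum(a == b for a, b in zip(x, y))


def scan(database, target):
    """{ f(x, t) | x in X }: score every entry against one fixed target t (assumed)."""
    return {x: score(x, target) for x in database}


def cross_scan(db_x, db_y):
    """{ f(x, y) | x in X, y in Y }: score every pair drawn from two databases."""
    return {(x, y): score(x, y) for x, y in product(db_x, db_y)}


# Made-up example data, for illustration only.
X = ["ACGT", "ACGA", "TTTT"]
Y = ["ACGT", "TCGA"]

single = scan(X, "ACGT")   # one result per entry of X
pairs = cross_scan(X, Y)   # one result per (x, y) pair, |X| * |Y| in total
```

Note that the cross-product form grows as |X| × |Y|, which is why the number of evaluations rises so quickly for realistic databases.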

Typical times to evaluate the function for a single pair of entries range from a few seconds to several hours of CPU time, and the full set of evaluations can easily consume thousands of CPU hours. To make such tasks tractable, the work is parallelized: the full set of evaluations is broken into compute jobs, each performing a single evaluation or a small group of evaluations, and the jobs are run across multiple processors. This has become common practice within labs using clusters.
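The partitioning described above might look roughly like the following sketch. All names here (`evaluate`, `make_jobs`, `run_job`, `run_all`) are hypothetical, and a local thread pool merely stands in for the cluster's processors; a real lab would typically hand each job to a queuing system instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, product


def evaluate(pair):
    """Hypothetical stand-in for one f(x, y) evaluation."""
    x, y = pair
    return (x, y, sum(a == b for a, b in zip(x, y)))


def make_jobs(pairs, evals_per_job):
    """Split the stream of evaluations into jobs of at most evals_per_job each."""
    it = iter(pairs)
    while True:
        job = list(islice(it, evals_per_job))
        if not job:
            return
        yield job


def run_job(job):
    """One compute job: evaluate its small group of pairs sequentially."""
    return [evaluate(pair) for pair in job]


def run_all(db_x, db_y, evals_per_job=2, workers=4):
    """Run every job on a pool of workers and merge the per-job results in order."""
    jobs = list(make_jobs(product(db_x, db_y), evals_per_job))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = [r for job_result in pool.map(run_job, jobs) for r in job_result]
    return jobs, results
```

Because the jobs share no state, the choice of `evals_per_job` only trades scheduling overhead against load balance; it does not change the results.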

Another common paradigm involves large numbers of ensemble computations with either stochastic behavior or slightly different parameters. Either way, the result is a large number of essentially independent, non-communicating, computationally complex jobs which in aggregate require more power than one machine can provide. When spread across a large collection of machines, however, they are manageable.
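As a rough illustration of the ensemble paradigm, the sketch below enumerates one independent job per combination of parameter value and random seed. The `simulate` model, its `rate` parameter, and the chosen values are all hypothetical placeholders for a real stochastic application.

```python
import random
from itertools import product


def simulate(rate: float, seed: int, steps: int = 1000) -> float:
    """Hypothetical stochastic model: fraction of steps on which an event fires."""
    rng = random.Random(seed)  # private generator, so runs never share state
    return sum(rng.random() < rate for _ in range(steps)) / steps


def ensemble_jobs(rates, n_replicates):
    """One independent job description per (parameter, seed) combination."""
    return [{"rate": r, "seed": s} for r, s in product(rates, range(n_replicates))]


jobs = ensemble_jobs(rates=[0.1, 0.2, 0.3], n_replicates=4)

# Run sequentially here; since no job depends on another, each entry
# could equally be submitted to a different machine.
results = [simulate(**job) for job in jobs]
```

Giving every job its own seed keeps each run reproducible in isolation, which matters when jobs are retried or rescheduled on different machines.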

The need to accommodate increasing demands for computational power forces research teams to spend significant effort on managing the required computational resources. As computational biology becomes more ubiquitous, it is likely that a single research effort will require more computational power than the local institution can provide. Thus, it is reasonable to expect that local research efforts will need access to resources that are physically separated from the research team and controlled by a different organization.

Currently, access to computational resources is gained either by purchasing the required resources with the project budget or by individually negotiating for time on existing resources. Buying new computational resources for a project can prove an inefficient use of both time and money. New resources require expertise and effort to configure and may end up under-utilized or useless outside the project’s specialized demands. For borrowed or rented resources, there is often a significant up-front effort to provide the research team with the proper accounts and infrastructure software (such as queuing systems or databases) and to train research team members on unfamiliar infrastructure tools.

Actors/stakeholders

Assumptions

Preconditions

Main flow of events (Basic, alternate, exceptional)

Postconditions

Success requirements

Special Requirements

Issues

Sources/references

OGF sponsor/stakeholder/interested party

 



