
Researchers and computer scientists at UCLA’s Institute for Digital Research and Education (IDRE) are spearheading an advancement that, if successful, will lop years off the time it takes to run complex, high-performance computations using UC's massive computing network known as the UC Grid.
If all goes well, computation that now requires three years under manual supervision will be cut to only three months by software known as the Kepler Workflow.
The software does this through automated workflow, a sequence of operations that lets a lengthy and complicated research project run its computational course with little or no human intervention. Under development since 2002 by UC researchers, it has been doing so for the last five years for work performed on desktop computers.
Now, through a cooperative effort led by IDRE research scientist Prakashan Korambath, the software is being modified for the first time to work in a Grid environment.
If all goes according to plan, researchers at seven UC campuses, including UCLA, will be able to use the vast computing resources of the Grid as if they were dropping their shirts off at the laundry — they’ll upload their data into the Grid via their desktop computer and come back when the complex computation is finished.
"It's a major breakthrough for the high-performance computation field," said Bill Labate, director of Research Computing Technologies at UCLA. “The real difference here is, instead of just resources available on a desktop computer, you’re looking at resources available anywhere on the Grid.”
What does that mean for UCLA scientists who have had to babysit their research projects for years as they run their work through the Grid's supercomputer network?
A lot, they say. Currently, they must sit at a computer at various times while a job is running and manually enter directions at each stage of the lengthy computation to keep it progressing.
That’s what the quantum chemistry researchers of the Rosetta project at UCLA had to do as they ran their data using software that predicts how the body’s proteins fold. “It took their collaborators three years to publish their paper because of doing all of these steps with human intervention,” Labate said.
In fact, UCLA's Rosetta project has been chosen to serve as a guinea pig for the Grid-enabled Kepler Workflow software.
“We happen to have a real-life scenario we’re trying to automate,” Labate said. “If we’re successful, we will reduce the time tremendously for these people to do their research.”
The software is designed to make it easy for researchers to describe and direct complex data processing without being experts in computers and programming.
Here's how it would work: Using the software, researchers “will initiate the process with initial data inputs,” Korambath said. "The Grid will find the computer cluster (within the vast network of clusters) to run the first set of jobs, and it will monitor the status of the job." Once those jobs are finished, the Grid will find another available cluster. It will collect the data and migrate it to the cluster that's free for the next step of the process.
“In each case, the Grid is making decisions on behalf of the user. It’s looking for free space that is big enough for the data size,” Korambath explained.
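For readers who think in code, the hand-off Korambath describes can be pictured roughly as follows. This is only an illustrative sketch, not the Kepler software or any UC Grid interface: the `Cluster` and `Step` types, the `find_cluster` and `run_workflow` functions, and the cluster names and sizes are all hypothetical, and a real scheduler would also submit jobs, monitor them and move files. The sketch assumes each step produces output of a known size, and that the next step goes to the first idle cluster with enough free space to hold it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cluster:
    """A hypothetical compute cluster on the grid."""
    name: str
    free_space_gb: float
    busy: bool = False


@dataclass
class Step:
    """A hypothetical workflow step and the size of the data it produces."""
    name: str
    output_gb: float


def find_cluster(clusters: List[Cluster], data_gb: float) -> Optional[Cluster]:
    """Return the first idle cluster with room for the data, or None."""
    for cluster in clusters:
        if not cluster.busy and cluster.free_space_gb >= data_gb:
            return cluster
    return None


def run_workflow(
    steps: List[Step], clusters: List[Cluster], initial_gb: float
) -> List[Tuple[str, str]]:
    """Place each step on a suitable cluster, one after another.

    Returns (step name, cluster name) pairs in execution order.
    """
    placements = []
    data_gb = initial_gb
    for step in steps:
        cluster = find_cluster(clusters, data_gb)
        if cluster is None:
            raise RuntimeError(
                f"no idle cluster can hold {data_gb} GB for step {step.name!r}"
            )
        # Stand-in for migrating the data, submitting the job and
        # waiting for it to finish.
        placements.append((step.name, cluster.name))
        # The output of this step is the input to the next one.
        data_gb = step.output_gb
    return placements


if __name__ == "__main__":
    grid = [Cluster("alpha", 50.0), Cluster("beta", 200.0)]
    pipeline = [Step("prepare", 120.0), Step("fold", 30.0), Step("analyze", 5.0)]
    for step_name, cluster_name in run_workflow(pipeline, grid, initial_gb=10.0):
        print(f"{step_name} ran on {cluster_name}")
```

In this made-up run, the first step fits on the smaller cluster, its 120 GB of output forces the second step onto the larger one, and the third step moves back once the data shrinks again. The researcher supplies only the initial input; every placement decision is made by the loop.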
In addition to saving time, the software will ensure much more efficient use of computer power, because Kepler seeks out idle computer clusters within the Grid and puts them to use. It will also reduce the chance of human error and make it easier for researchers to change parameters as their projects progress and to replicate projects.
Right now, researchers in the Scientific Workflow Automation Technologies group at the San Diego Supercomputer Center at UC San Diego are writing software specific to the needs of the Rosetta project to add to the Grid-enabled Kepler Workflow software. Korambath said he is hoping to begin feeding the project’s data into the Grid sometime this summer.
“Once we are successful with the chemistry project, we can identify a lot of similar projects, and they will move even faster because there will be a basis for the customized software that is needed,” Korambath said.
Ultimately, researchers across the UC system will be able to enjoy higher productivity and greater ease of use.
“The mechanics of high performance computation shouldn’t drive the science,” Labate said. "It should be the other way around."