Craig Benham

New Detective in Bioinformatics

The Xserve cluster lets Benham arrange the algorithm so each of 39 processors can handle a piece of the calculation and send the results back to an Apple Xserve RAID for storage. Researchers download the results from the RAID disk to their PowerBooks for analysis.
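The split-and-gather pattern described above can be sketched in a few lines. This is only an illustration, not Benham's code: the function names (`split_work`, `process_chunk`, `run_distributed`) and the sum-of-squares workload are hypothetical, and a local thread pool stands in for the cluster's processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(n_items, n_workers):
    """Return (start, stop) index pairs covering range(n_items) as evenly as possible."""
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def process_chunk(bounds):
    """Placeholder per-piece calculation (assumption): sum of squares over the chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def run_distributed(n_items, n_workers=39):
    """Hand one chunk to each worker and collect the partial results in order."""
    chunks = split_work(n_items, n_workers)
    # Threads are a local stand-in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process_chunk, chunks))


partials = run_distributed(1000)
total = sum(partials)  # combine the pieces, as one would after collecting stored results
```

On a real cluster the pieces would be dispatched by a job manager and the partial results written to shared storage, but the bookkeeping (divide the index range, compute each piece independently, combine) has the same shape.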

Benham says he decided to switch from a shared-memory architecture system when he realized his algorithm could run more efficiently on a cluster. “Beyond that,” Benham says, “the decision to go with Apple didn’t actually take too long when I looked at our options.”

Other cluster systems then available, he says, “weren’t really tuned to large floating-point calculations. The 64-bit processors in Xserve units and Power Mac G5s are designed for intensive, big-iron type floating point calculations, which is what we’re doing — and which we’ll do a lot more in the Genome Center as time goes on.”

Streamlining the Algorithm

Benham originally wrote the algorithm for the research in Fortran, then spruced it up in C++. When he implemented the Xserve cluster, he ran the program through the CHUD tools in Apple’s developer toolkit, now included with Apple’s Xcode IDE.

“We got significant time savings on our algorithm by running it through Apple’s developer tools,” Benham says. “It showed us how much time we were spending at specific steps so we could concentrate on making the slow steps more efficient. We probably improved the speed of the algorithm by a factor of three.”

A Billion Exponentials

Even a rather small run, Benham explains, “could calculate a hundred million to a billion exponentials. Many of the exponentials that we are calculating involve energy and, in many cases, the energy in different parts of the calculation is the same.

“We made a table of energies, so every time we needed to calculate an exponential for one of them, we swapped out of that table the energy and replaced it with the exponential. Whenever we needed that exponential later in the calculation, we’d just use it. We didn’t have to recalculate.”
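A minimal sketch of the lookup-table idea Benham describes, assuming the exponential has the form exp(-energy / rt). That form, the `rt` constant, and the names `ExponentialTable` and `sum_exponentials` are illustrative assumptions, not details from his program.

```python
import math


class ExponentialTable:
    """Caches one exponential per distinct energy value (hypothetical helper)."""

    def __init__(self, rt=1.0):
        self.rt = rt          # assumed scaling constant, not taken from the article
        self._table = {}      # energy -> exponential computed earlier
        self.evaluations = 0  # number of times math.exp was actually called

    def lookup(self, energy):
        value = self._table.get(energy)
        if value is None:
            # First time this energy appears: compute once and store it.
            value = math.exp(-energy / self.rt)
            self._table[energy] = value
            self.evaluations += 1
        return value


def sum_exponentials(energies, table):
    """Sum the exponentials for a sequence of energies, reusing cached values."""
    return sum(table.lookup(e) for e in energies)


energies = [1.5, 2.0, 1.5, 1.5, 2.0, 0.5]
table = ExponentialTable()
total = sum_exponentials(energies, table)
```

Here six energies require only three calls to `math.exp`, because repeated values are read back from the table. The saving grows with the number of repeats, which is the situation Benham describes when the same energy recurs in different parts of the calculation.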

Researchers keep all of the development tools on their PowerBooks. “We do all of our algorithm development and programming on the PowerBooks,” says Benham, “and then just port it over to the big machine when we want to do a production run.”

Turnkey Solution

Benham also chose the Mac platform because he “wanted a turnkey solution, and Apple is really the only computer company that integrates everything, from the hardware to the software.”

“The Xserve cluster, together with the BioTeam iNquiry package,” he elaborates, “makes a turnkey solution for getting up and running very quickly. And that was something we really needed. We are not experts in programming clusters, and we don’t want to be.

“We want to see what biology is illuminating and not worry about the technicalities of the thing.”

Low-Cost Implementation

For Benham, the Xserve system “was also a very cost-effective implementation, and I think that was one of the determining differences between Apple and other cluster vendors, that and the fact that Apple takes responsibility for all aspects of their system. Other vendors don’t have the integration, the GUIs, or the general intelligent design of the whole system.”

The BioTeam consultants provided onsite training and installed both the Xserve cluster and the software for distributed job management. Benham also set up a maintenance contract with BioTeam so he always has access to expert technical support. “Now we’re down to phone and email consulting,” Benham says, “because we have a pretty good handle on how to run the system.”

Scalability, Versatility and Power

Right now, Benham’s work focuses on one aspect of gene regulation. “It’s very illuminating,” he says, “but I would like to integrate other structural and sequence features to develop a global view of the mechanics of gene regulation, which will need a lot of computational resources.”

Clusters, he believes, “represent a very flexible solution for many computing needs. You can do a lot of major calculations with clusters of 64-bit processors, with maybe two to four gigs of memory each, if your calculations are CPU intensive. If you’ve got more data-intensive calculations, you may need some fiber optic interconnects that can move the data.

“Either way,” Benham says, “the Xserve cluster gives us scalability, versatility and power, and it’s the most cost-effective way to go once you are running calculations that can use that architecture.

“I don’t see anything on the clusters of other vendors that I don’t see also on the Xserve system. In the other direction, there are a lot of things in the Xserve system that I don’t see on other clusters.”
