Craig Benham
New Detective in Bioinformatics
The Xserve cluster lets Benham arrange the algorithm so each of 39 processors can handle a piece of the calculation and send the results back to an Apple Xserve RAID for storage. Researchers download the results from the RAID disk to their PowerBooks for analysis.
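To make the split-and-gather pattern concrete, here is a minimal Python sketch. It is not Benham's code. `partition`, `compute_piece` and `run_distributed` are hypothetical names, and the per-piece work is a placeholder. A thread pool stands in for the 39 processors: on a real cluster each chunk would be dispatched to its own node by a job scheduler, and Python threads would not actually run the arithmetic in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

N_WORKERS = 39  # processor count mentioned in the article


def partition(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split range(n_items) into contiguous [start, stop) chunks, one per worker,
    with sizes differing by at most one. Workers with no work get no chunk."""
    if n_items < 0 or n_workers < 1:
        raise ValueError("need n_items >= 0 and n_workers >= 1")
    base, extra = divmod(n_items, n_workers)
    chunks: List[Tuple[int, int]] = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        if size == 0:
            break  # every remaining worker would also get zero items
        chunks.append((start, start + size))
        start += size
    return chunks


def compute_piece(chunk: Tuple[int, int]) -> List[float]:
    """Hypothetical placeholder for one processor's share of the calculation."""
    start, stop = chunk
    return [float(i) ** 2 for i in range(start, stop)]


def run_distributed(n_items: int, n_workers: int = N_WORKERS) -> List[float]:
    chunks = partition(n_items, n_workers)
    # Threads stand in for cluster nodes; pool.map returns results in chunk order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pieces = list(pool.map(compute_piece, chunks))
    # Gather step: in the article results go to shared storage (the Xserve RAID);
    # here they are simply concatenated in order.
    results: List[float] = []
    for piece in pieces:
        results.extend(piece)
    return results
```

For example, 100 work items spread over 39 workers give 22 chunks of three items and 17 chunks of two, so no processor carries more than one extra item.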
Benham says he decided to move away from a shared-memory system when he realized his algorithm could run more efficiently on a cluster. "Beyond that," he says, "the decision to go with Apple didn't actually take too long when I looked at our options."
Other cluster systems then available, he says, "weren't really tuned to large floating-point calculations. The 64-bit processors in Xserve units and Power Mac G5s are designed for intensive, big-iron type floating-point calculations, which is what we're doing and which we'll do a lot more in the Genome Center as time goes on."
Streamlining the Algorithm
Benham originally wrote the algorithm for the research in Fortran, then reworked it in C++. When he deployed the Xserve cluster, he ran the program through the CHUD tools in Apple's developer toolkit, now included with Apple's Xcode IDE.
"We got significant time savings on our algorithm by running it through Apple's developer tools," Benham says. "It showed us how much time we were spending at specific steps, so we could concentrate on making the slow steps more efficient. We probably improved the speed of the algorithm by a factor of three."
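The approach Benham describes, measuring where the time goes and then attacking the slowest steps, can be illustrated with a small manual timer. This is only a sketch: CHUD profiles native code, whereas the hypothetical `timed` helper below simply accumulates wall-clock time per named step using Python's `time.perf_counter`, and the step names are made up.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


@contextmanager
def timed(step: str, totals: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock time spent inside the with-block to totals[step]."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        totals[step] = totals.get(step, 0.0) + (time.perf_counter() - t0)


def slowest_steps(totals: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k steps with the largest accumulated time, slowest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    totals: Dict[str, float] = {}
    for _ in range(3):
        with timed("setup", totals):         # hypothetical step names
            time.sleep(0.001)
        with timed("exponentials", totals):
            time.sleep(0.01)
    for step, seconds in slowest_steps(totals):
        print(f"{step}: {seconds:.3f} s")
```

Once the ranking shows which step dominates, effort goes there first, which is the same reasoning Benham applied with Apple's tools.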
A Billion Exponentials
"Even a rather small run," Benham explains, "could calculate a hundred million to a billion exponentials. Many of the exponentials that we are calculating involve energy and, in many cases, the energy in different parts of the calculation is the same."
"We made a table of energies, so every time we needed to calculate an exponential for one of them, we swapped out of that table the energy and replaced it with the exponential. Whenever we needed that exponential later in the calculation, we'd just use it. We didn't have to recalculate."
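The table Benham describes is a form of memoization: each distinct energy is exponentiated once, and later requests reuse the stored value. The Python sketch below assumes the exponentials are Boltzmann-style factors of the form exp(-E/kT); that functional form, the `kT` parameter and the class name `ExpTable` are illustrative assumptions, since the article does not give the actual formula. Keying a dictionary on exact floating-point energies only pays off when energies recur bit-for-bit, which is plausible when they are built from a fixed set of parameter values.

```python
import math
from typing import Dict


class ExpTable:
    """Cache exp(-energy / kT) so each distinct energy is exponentiated only once.
    The exp(-E/kT) form is an assumption made for illustration."""

    def __init__(self, kT: float) -> None:
        if kT <= 0:
            raise ValueError("kT must be positive")
        self.kT = kT
        self._table: Dict[float, float] = {}
        self.exp_calls = 0  # how many times math.exp actually ran

    def weight(self, energy: float) -> float:
        w = self._table.get(energy)
        if w is None:  # first time this energy is seen: compute and store it
            w = math.exp(-energy / self.kT)
            self._table[energy] = w
            self.exp_calls += 1
        return w  # later requests reuse the stored exponential
```

Summing weights for the energies `[1.0, 2.0, 1.0, 1.0, 2.0, 3.0]` calls `math.exp` three times instead of six. In Python, `functools.lru_cache` on a plain function would give the same effect; the explicit class just makes the table and the call count visible.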
Researchers keep all of the development tools on their PowerBooks. "We do all of our algorithm development and programming on the PowerBooks," says Benham, "and then just port it over to the big machine when we want to do a production run."
Turnkey Solution
Benham also chose the Mac platform because he wanted a turnkey solution, and "Apple is really the only computer company that integrates everything, from the hardware to the software."
"The Xserve cluster, together with the BioTeam iNquiry package," he elaborates, "makes a turnkey solution for getting up and running very quickly. And that was something we really needed. We are not experts in programming clusters, and we don't want to be.
"We want to see what biology is illuminating and not worry about the technicalities of the thing."
Low-Cost Implementation
For Benham, the Xserve system was also "a very cost-effective implementation, and I think that was one of the determining differences between Apple and other cluster vendors: that, and the fact that Apple takes responsibility for all aspects of their system. Other vendors don't have the integration, the GUIs, or the general intelligent design of the whole system."
The BioTeam consultants provided onsite training and installed the Xserve clusters along with the software for distributed job management. Benham also set up a maintenance contract with BioTeam so he always has access to expert technical support. "Now we're down to phone and email consulting," Benham says, "because we have a pretty good handle on how to run the system."
Scalability, Versatility and Power
Right now, Benham's work focuses on one aspect of gene regulation. "It's very illuminating," he says, "but I would like to integrate other structural and sequence features to develop a global view of the mechanics of gene regulation, which will need a lot of computational resources."
Clusters, he believes, represent "a very flexible solution for many computing needs. You can do a lot of major calculations with clusters of 64-bit processors, with maybe two to four gigs of memory each, if your calculations are CPU-intensive. If you've got more data-intensive calculations, you may need some fiber-optic interconnects that can move the data."
"Either way," Benham says, "the Xserve cluster gives us scalability, versatility and power, and it's the most cost-effective way to go once you are running calculations that can use that architecture.
"I don't see anything on the clusters of other vendors that I don't see also on the Xserve system. In the other direction, there are a lot of things in the Xserve system that I don't see on other clusters."
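Benham's distinction between CPU-intensive and data-intensive work can be made concrete with a back-of-the-envelope comparison of compute time and data-transfer time per task. The hypothetical `bottleneck` function below is an illustration, not a sizing method from the article, and the rates in the usage note are made-up round numbers rather than Xserve or interconnect specifications.

```python
from typing import Tuple


def bottleneck(flops: float, flop_rate: float,
               bytes_moved: float, bandwidth: float) -> Tuple[str, float, float]:
    """Compare estimated compute time with data-transfer time for one task.

    flops:       floating-point operations the task performs
    flop_rate:   sustained operations per second on one processor
    bytes_moved: data the task must send or receive over the network
    bandwidth:   interconnect throughput in bytes per second
    Returns ("cpu" or "data", compute seconds, transfer seconds).
    """
    if min(flop_rate, bandwidth) <= 0:
        raise ValueError("rates must be positive")
    t_compute = flops / flop_rate
    t_transfer = bytes_moved / bandwidth
    kind = "data" if t_transfer > t_compute else "cpu"
    return kind, t_compute, t_transfer
```

With hypothetical figures, a task doing 10^12 operations at 10^10 per second while moving 1 MB over a 100 MB/s link is CPU-bound (100 s of compute versus 0.01 s of transfer). A task doing 10^9 operations but moving 10 GB over the same link is data-bound (0.1 s versus 100 s), which is the case where a faster interconnect would help.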
