Mike Thomas
Breaking into Bioinformatics at ISU
Many Tools, One Interface
"But what we particularly like about iNquiry, which we use frequently," Thomas says, "is the ability to package our own tools or other open-source tools from around the world in the same iNquiry wrapper."
Researchers can use the command-line interface for any of the tools, but many prefer a wrapper. "For instance," says Thomas, "one researcher might need a tool for a particular project. Someone else wants to run a program on the cluster because it runs too slowly or takes too much processor time on a desktop system. Someone else wants to put a graphical user interface on a favorite command-line application. We can wrap any program using XML so it has the same web interface as any other iNquiry tool."
"The advantage is enormous," Thomas says, "because anyone with an account can submit jobs to the cluster from any computer, anywhere in the world. The job runs on the cluster and automatically sends users the results."
Parallel Processing Accelerates Research
BioTeam iNquiry includes distributed resource management software, Sun Grid Engine, for scheduling jobs based on demand, job type and job priority. Thomas took advantage of the Workgroup Cluster's easy customizability by installing LAM/MPI and PVM parallel processing software to streamline cluster-computing tasks.
"The parallel processors," he says, "work seamlessly with parallel versions of BLAST, HMMer, ClustalW, MrBayes and other tools, so when a user submits a job using parallel versions of a tool, all nodes in the cluster work on the problem."
Evolutionary biologists at ISU use phylogenetic tools on the cluster to build evolutionary trees from molecular sequences. "The programs they use," Thomas says, "are also available as standalone applications, but it can take a desktop system several days to run 100 bootstrap replicates for maximum likelihood phylogenetic reconstructions."
"Running the same replicates on the cluster," he adds, "takes just minutes or a few hours depending on the data and the size of the data sets. Now they're running 1,000 data sets in parallel on the cluster and getting their jobs done much more quickly."
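Bootstrap replicates are independent of one another, which is why they divide so easily across cluster nodes. The following is only a minimal Python sketch of that arithmetic, not a description of how Sun Grid Engine or the ISU cluster actually assigns work. The function name `split_replicates` and the node count of 8 are hypothetical; the article does not state how many nodes the cluster has.

```python
from typing import List


def split_replicates(total_replicates: int, node_count: int) -> List[range]:
    """Divide replicate indices 0..total-1 into near-equal contiguous chunks.

    Hypothetical helper for illustration; a real scheduler makes its own
    placement decisions.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Every node gets `base` replicates; the first `extra` nodes get one more.
    base, extra = divmod(total_replicates, node_count)
    chunks = []
    start = 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Assumed figures: 100 replicates (from the article), 8 nodes (made up).
    for node, chunk in enumerate(split_replicates(100, 8)):
        print(f"node {node}: {len(chunk)} replicates")
```

Under these assumed numbers, no node handles more than 13 of the 100 replicates, so the wall-clock time is governed by the slowest chunk rather than by the full set.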
Cluster As Teaching Tool
Thomas also relies on the cluster when he teaches his graduate course in bioinformatics to students whose experience, he says, "varies from zero to students who have created novel research applications."
Along with Mitch Day, a graduate teaching assistant, Thomas teaches the fundamentals of bioinformatics (how different programs work and the algorithms within the software), but students spend much of their time solving problems through hands-on work with the applications.
For live class demonstrations, Thomas links his PowerBook to an LCD projector and to the cluster via an AirPort Extreme wireless network.
"We talk about the UNIX environment and how to implement these tools from the command line," Thomas explains. "We also teach students a little Perl programming so they can tie programs together with small Perl scripts. This way, students know how the tools work and can become power users later on."
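The idea of tying programs together with a small script can be sketched as follows. This example uses Python rather than the Perl taught in the course, and the two steps are stand-ins that run through the Python interpreter itself so the sketch works without any bioinformatics tools installed. The names `run_step`, `EMIT`, `COUNT` and `count_sequences` are hypothetical.

```python
import subprocess
import sys


def run_step(code: str, stdin_text: str = "") -> str:
    """Run one command-line step and return its standard output as text."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # stand-in for a real command-line tool
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the step exits with a non-zero status
    )
    return result.stdout


# Stand-in step 1: emit two FASTA-style records.
EMIT = "print('>seq1'); print('ACGT'); print('>seq2'); print('GGCC')"
# Stand-in step 2: count the header lines read from standard input.
COUNT = "import sys; print(sum(1 for line in sys.stdin if line.startswith('>')))"


def count_sequences() -> int:
    """Feed the output of the first step into the second step."""
    fasta_text = run_step(EMIT)
    return int(run_step(COUNT, fasta_text).strip())


if __name__ == "__main__":
    print(count_sequences())
```

In practice each stand-in would be replaced by a call to an installed tool, with the script doing only the glue work of passing one program's output to the next.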
Enormously Successful
The program has been so successful, so quickly, that several of Thomas's students are helping other students solve their own problems in the lab. "I think the carryover from students who are taking the course," Thomas says, "will provide a knowledge base for other students."
Even researchers at the University of Idaho, ISU's rival school, have asked Thomas to brief them about the Apple Workgroup Cluster.
"I think the cluster is going to have a huge effect in our research environment," Thomas says. "And I think it will help scientists here generate additional research funding, because now we're able to mention the cluster as a resource when we write grant proposals."
"Even though the size of our cluster is modest," he adds, "colleagues are impressed with the amount of use that it gets, how well it has been integrated into the classroom and how many different people are using it for different applications."
