Dr. Will Gilbert
Feats of Bioinfomagic
William Gilbert used BLAST to search the human genome, which, if printed, would fill 1,000 one-thousand-page phone books.
Piecing the Genetic Puzzle
To identify the genetic puzzle pieces unique to humans, Gilbert's group first looked at research conducted by the University of California, Santa Cruz. "Scientists there had found 192 places along the human genome where hereditary instructions might be stable: stable over the last 100,000 years or so," Gilbert says. "If our thinking is correct, these areas should contain genes that have pretty much settled down."
But there was a problem.
There was no way to tell from the Santa Cruz study which genes the 192 areas (or "bins," in biotech language) contained. At the other end of the country, near Washington, D.C., researchers at the National Center for Biotechnology Information (NCBI) had annotated the genome, which is estimated to contain 30,000 to 40,000 human genes that determine everything from gender to disease susceptibility. But the NCBI couldn't designate which genes populated the 192 regions of genetic stability.
What's more, the two databases used different coordinate systems. Gilbert knew that if he could line up the two genomes using their sequences, he'd have his answer. "They were the same genome," he explains, "so the matches should be identical. Once we matched regions, we could look up the genes and be back in business."
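The matching step Gilbert describes can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not his actual pipeline: it assumes a bin's sequence occurs verbatim in the annotated assembly (the "same genome" case), uses a plain substring search in place of a BLAST alignment, and treats annotations as simple (name, start, end) intervals. The functions `map_region` and `genes_in_region` and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical convention: an interval is (start, end), 0-based, end-exclusive.
Interval = Tuple[int, int]


def map_region(bin_seq: str, target_genome: str) -> Optional[Interval]:
    """Locate bin_seq in the target assembly by exact sequence match.

    Returns the region in the target's coordinate system, or None if the
    sequence does not occur verbatim.
    """
    start = target_genome.find(bin_seq)
    if start == -1:
        return None
    return start, start + len(bin_seq)


def genes_in_region(region: Interval, genes: List[Tuple[str, int, int]]) -> List[str]:
    """Return the names of annotated genes whose intervals overlap region."""
    r_start, r_end = region
    return [name for name, g_start, g_end in genes
            if g_start < r_end and r_start < g_end]


# Toy data: a short "genome" in the annotated (target) coordinate system,
# plus made-up gene annotations in that same system.
target = "GGGGACGTTAGCCATGACCTTTTTT"
annotations = [("geneA", 0, 6), ("geneB", 10, 18), ("geneC", 20, 25)]

region = map_region("TAGCCATGA", target)
print(region)                                # (8, 17)
print(genes_in_region(region, annotations))  # ['geneB']
```

Because the position comes from the sequence itself, the two databases' coordinate systems never have to agree; the match supplies the translation.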
Slow Going
Gilbert first used NCBI BLAST to compare the 192 bins against the entire human genome, which, if printed, would fill 1,000 one-thousand-page telephone books. It took NCBI BLAST 16 hours to match just one of the bins to the genome. "Actual DNA code," Gilbert explains, "has just four letters: A, G, C and T. When you're doing comparative genomics, you don't compare one genetic letter at a time (a C or an A). You compare words composed of 20 genetic letters or 50 genetic letters (TACCTAGAC and so on), rarely more than 50, because conventional thinking is that you lose sensitivity when you use longer words."
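The "words" Gilbert mentions serve as seeds in BLAST-style search: fixed-length substrings of the query are looked up in an index of the genome, and only exact word matches are examined further. The Python sketch below shows just that seeding step on a toy sequence. It is a simplified illustration, not NCBI BLAST's or A/G BLAST's implementation (real tools go on to extend and score each seed), and the names `build_word_index` and `seed_hits` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_word_index(genome: str, word_size: int) -> Dict[str, List[int]]:
    """Map every word (substring of length word_size) to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(genome) - word_size + 1):
        index[genome[pos:pos + word_size]].append(pos)
    return index


def seed_hits(query: str, index: Dict[str, List[int]], word_size: int) -> List[Tuple[int, int]]:
    """Return (query_pos, genome_pos) pairs where a query word exactly matches a genome word."""
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for g_pos in index.get(query[q_pos:q_pos + word_size], []):
            hits.append((q_pos, g_pos))
    return hits


genome = "TTGACCTACCTAGACGGATCCA"   # toy sequence, not real human DNA
query = "CCTAGACGG"
index = build_word_index(genome, word_size=5)
print(seed_hits(query, index, word_size=5))
# [(0, 8), (1, 9), (2, 10), (3, 11), (4, 12)]
```

All five hits share the same offset (genome position minus query position is 8), which is how a run of word matches points to a single matching region.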
Still, making comparisons 50 genetic letters at a time was slow going. "And that's just doing it once," Gilbert says. "You'd like to do it more than once because you want to tweak things and ask what-if questions. It quickly became apparent that it would take more than a month to complete all our bins with NCBI BLAST."
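A rough back-of-the-envelope check shows why the estimate was "more than a month." Assuming every bin took about as long as the first one and the searches ran one after another (both assumptions, since only the 16-hour figure for a single bin is given), one pass over all the bins works out to:

```latex
192\ \text{bins} \times 16\ \frac{\text{h}}{\text{bin}} = 3{,}072\ \text{h} = \frac{3{,}072}{24}\ \text{days} = 128\ \text{days}
```

That is roughly four months for a single pass, before any of the repeated what-if runs Gilbert wanted.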
Apple/Genentech BLAST
That's when Gilbert glanced at a chart he had taped to his wall. "I looked at the plot; it compared the time it took NCBI BLAST and Apple/Genentech BLAST to execute comparisons. The plot for regular BLAST started out and leveled off; the plot for A/G BLAST was a straight line that went up at a 45-degree angle."
"I got to thinking," Gilbert says, "'I wonder if that linearity continues. What if I cranked this thing up to word sizes of 200? That would certainly save the day.' So I hopped on my Mac, pulled down A/G BLAST and spent about an hour indexing the genome a different way. And I said, 'This is either going to work or not going to work.' I tested a word size of 250 for my first shot. The test was done in two minutes, much to my disbelief. So I said, 'Well, that must not have worked.' Yet when I examined the output, A/G BLAST had indeed found the right hunk of DNA. Things started getting very exciting at that point."
Gilbert then slowly brought the word size down to make sure he wasn't losing any sensitivity. A/G BLAST found the same gene region. "Whether I used a smaller word size or a larger word size, I would find the same piece of DNA. That was very encouraging. And just for giggles, I cranked it way up, too. I think at one point I got A/G BLAST to run a test in 19 seconds, which is just beyond belief."
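Why a larger word size can lose nothing when both sequences come from the same genome can be illustrated with a toy experiment. The Python sketch below is a simplification, not A/G BLAST's algorithm: it indexes a pseudo-random stand-in genome, looks up every query word, and reports the genome offset backed by the most exact word hits (a simple voting scheme in place of BLAST's seed extension). The names `build_word_index` and `best_offset` are hypothetical. When the query is copied exactly from the genome, every word size points to the same place; when the query differs at a couple of positions, long words no longer fit between the differences, which is the sensitivity loss the conventional thinking warns about.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def build_word_index(genome: str, word_size: int) -> Dict[str, List[int]]:
    """Map every word of length word_size to the genome positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(genome) - word_size + 1):
        index[genome[pos:pos + word_size]].append(pos)
    return index


def best_offset(query: str, genome: str, word_size: int) -> Optional[int]:
    """Return the offset (genome_pos - query_pos) backed by the most exact
    word hits, or None if no query word occurs in the genome."""
    index = build_word_index(genome, word_size)
    votes: Counter = Counter()
    for q_pos in range(len(query) - word_size + 1):
        for g_pos in index.get(query[q_pos:q_pos + word_size], []):
            votes[g_pos - q_pos] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy genome: 2,000 pseudo-random bases (a stand-in, not real human DNA).
rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(2000))

# Case 1: the query is copied exactly from the genome, like matching two
# versions of the same genome. Every word size recovers offset 1200.
query = genome[1200:1260]
for w in (8, 12, 20, 40):
    print(w, best_offset(query, genome, w))

# Case 2: the query differs from the genome at positions 15 and 45. No
# 40-letter window avoids both differences, so only the shorter words
# still locate the region.
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
chars = list(query)
for i in (15, 45):
    chars[i] = swap[chars[i]]
diverged = "".join(chars)
print(best_offset(diverged, genome, 12))  # 1200
print(best_offset(diverged, genome, 40))  # None
```

Larger words also produce fewer, more specific hits to examine, which fits the intuition behind the speedups Gilbert saw, though this sketch does not model A/G BLAST's actual timing behavior.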
