MIT Media Laboratory: The Human Speechome Project

Terabytes of Babble Bites

Kitchen Video

A collection of 14 microphones and 11 video cameras, like this one embedded in the ceiling over the Roys’ kitchen, records about 80 percent of their son’s movements and speech.

Two-thirds of the way through the observation phase of the Human Speechome Project, Roy’s son has become a fountain of words, and Roy’s home-turned-language lab is a river of video and audio information. To streamline collecting and processing the incessant flow of data, each of the 1-megapixel video cameras records at only 14 frames per second, about half the frame rate of standard video. Each camera employs an automatic motion sensor to monitor activity in its coverage area. When it doesn’t detect motion, the camera reverts to a standby capture rate of just 1 frame per second. The microphones, however, are on full time, recording better-than-CD-quality, 16-bit, 48 kHz audio to ensure that no words slip by. The boundary-layer mics are acoustically tuned to use the ceilings as sound pickups, preserving every whisper and gurgle with crystal clarity.
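To give a feel for why the audio alone is a heavy stream, here is a rough back-of-envelope sketch of the raw audio data rate implied by the quoted figures. The microphone count, sample rate, and bit depth come from the article; the one-mono-channel-per-microphone layout, uncompressed PCM storage, and a full 24 hours of recording per day are assumptions made for illustration, and the function names are hypothetical.

```python
# Back-of-envelope audio data rate from the figures quoted in the article.
# Assumptions (NOT from the source): one mono channel per microphone,
# uncompressed PCM, and 24 hours of recording per day.

MICROPHONES = 14          # from the article
SAMPLE_RATE_HZ = 48_000   # from the article
BITS_PER_SAMPLE = 16      # from the article
SECONDS_PER_DAY = 24 * 60 * 60


def audio_bytes_per_second(mics: int, sample_rate_hz: int, bits: int) -> int:
    """Raw PCM byte rate for `mics` mono channels."""
    return mics * sample_rate_hz * (bits // 8)


def audio_gb_per_day(mics: int, sample_rate_hz: int, bits: int) -> float:
    """Raw PCM volume per day in decimal gigabytes (1 GB = 1e9 bytes)."""
    per_second = audio_bytes_per_second(mics, sample_rate_hz, bits)
    return per_second * SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    rate = audio_bytes_per_second(MICROPHONES, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
    daily = audio_gb_per_day(MICROPHONES, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
    print(f"{rate / 1e6:.3f} MB/s, {daily:.1f} GB/day of raw audio")
```

Under these assumptions the raw audio comes to roughly 1.3 MB per second, or about 116 GB per day before any compression. The real figure depends on how the audio is actually encoded and stored, which the article does not specify.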


All that data flows downstairs to the basement. Audio samplers in the computer room process the analog sound data, which is relayed along with the video data to a cluster of 10 computers, including the five Xserves. The computers perform real-time video compression and provide time-stamped digital audio and video files, which are then stored temporarily on the 4.4TB Xserve RAID.

Roy’s team has long relied on Macintosh computers. “When we started the project, we had already been using Power Mac G5 workstations for other research projects,” he says. “Our work makes substantial use of both graphics and scientific computation, and we prefer the reliability of UNIX-based systems. Consequently, we planned for Mac OS X to be the primary development platform for the Speechome project's end-user applications.”

Even with the video streamlining and compression, Roy’s house generates about 200GB of data every day, an amount of data that demands powerful technologies just to archive, let alone analyze. At first, even Roy didn’t appreciate the basic problems that would be created just by the sheer volume of data. For instance, how could he safely and efficiently transport 200GB per day from home to MIT? The solution he worked out, which combines high-tech and decidedly low-tech strategies, is what he jokingly calls “sneakernet.” He installed an Overland NEO 2000 tape library in the basement data center that can hold 30 LTO-3 backup tapes. It uses a robotic arm to fetch a tape as needed and put it into the reader. The Xserve RAID is emptied into the tape library weekly, and every 40 days or so a full set of tapes—12TB of data—is filled. “You pop those tapes out, and they comfortably fit into a little case. And I just carry that to work.”
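The “sneakernet” numbers can be sanity-checked with a little arithmetic. The sketch below uses the article's figures for tape count, daily volume, and RAID size; the 400 GB per cartridge is the native (uncompressed) LTO-3 capacity, and the function names are hypothetical. It is a rough estimate, not a description of how the team actually schedules its backups.

```python
# Rough capacity check for the tape-based "sneakernet" described above.
# From the article: 30 tapes, about 200 GB/day, a 4.4 TB Xserve RAID.
# Assumption: each LTO-3 cartridge is filled to its native 400 GB
# capacity, with no hardware compression.

LTO3_NATIVE_GB = 400
TAPES_IN_LIBRARY = 30
DAILY_GB = 200
RAID_TB = 4.4


def library_capacity_tb(tapes: int, gb_per_tape: int) -> float:
    """Total capacity of a full set of tapes, in decimal terabytes."""
    return tapes * gb_per_tape / 1000


def backlog_tb(daily_gb: float, days: int = 7) -> float:
    """Data accumulated on the RAID between tape runs, in terabytes."""
    return daily_gb * days / 1000


def days_to_fill(capacity_tb: float, daily_gb: float) -> float:
    """Days needed to fill `capacity_tb` at a constant daily rate."""
    return capacity_tb * 1000 / daily_gb


if __name__ == "__main__":
    capacity = library_capacity_tb(TAPES_IN_LIBRARY, LTO3_NATIVE_GB)
    weekly = backlog_tb(DAILY_GB)
    print(f"tape set: {capacity:.1f} TB")
    print(f"weekly backlog: {weekly:.1f} TB of {RAID_TB} TB RAID")
    print(f"days to fill at {DAILY_GB} GB/day: {days_to_fill(capacity, DAILY_GB):.0f}")
```

Thirty tapes at 400 GB each do come to the 12TB quoted, and a week of data at 200GB per day (about 1.4TB) fits comfortably on the 4.4TB RAID. At a flat 200GB per day the same arithmetic gives about 60 days to fill the set rather than 40, so the article's figures are best read as rounded estimates.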

At the MIT lab, Roy’s team copies the tape data onto an ever-expanding storage area network, which incorporates seven 4.4TB Xserve RAIDs seamlessly interfaced with hardware from several different manufacturers. Twenty-two months into the project, Roy says the storage network holds approximately 250TB of data, and by the end of the project in another year he expects it to grow to a full capacity of 1.4 petabytes (1.4 million gigabytes). That’s enough room to hold digitized copies of every book in the Library of Congress 10 times over.
