MIT Media Laboratory: The Human Speechome Project

TotalRecall: Making Sense of a Data Deluge


This visualization captures the interaction between caregiver and child.

How do you sift through torrents of raw audio/video recordings to pick out individual words, conversations, and interpersonal interactions? How do you discover patterns buried in words, games, body postures, rituals, and the minutiae of daily life to make sense of how children develop and learn to speak?

Roy and his Speechome team turned to the Mac. They created, and are continuing to develop, a UNIX-based Mac application that they aptly named TotalRecall. The program lets researchers scan and tunnel through what will be the entire 200,000 hours of audio and video data from a pair of computer screens. At first glance, the TotalRecall main window looks like a standard non-linear video editing application. In fact, the user interface borrows many of that genre’s conventions, such as multi-track timelines and the ability to zoom through different time scales. A researcher browsing the data can zoom the timeline out to display all three years of data, or zoom all the way down to show moment-by-moment activity. When an interesting-looking segment is identified, it’s a simple task to watch and listen to the full audio/video tracks.

But how do you find those interesting tracks? A problem that becomes readily apparent to a Speechome researcher is simply finding Roy’s son. He might be anywhere in the house, which means he could show up in any of the video channels. One approach looks for motion. TotalRecall incorporates a clever algorithm that highlights motion so that researchers don’t even have to watch the video. The program stacks video frames on top of each other, setting the opacity of every area of each frame in proportion to its movement within the video sequence. Moving objects (typically people, but sometimes balls or other flying toys) remain visible, while the background fades away. When the stack of video frames is graphically offset, like a deck of playing cards spread out on a blackjack table, the moving objects paint an image of a worm wriggling through the sequence. Roy calls them “space-time worms.” By glancing at the “worm channel” on the TotalRecall timeline, it’s easy to spot video segments with lots of action.
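The frame-stacking idea can be illustrated with a small sketch. This is not TotalRecall's code: the article does not say how movement is measured, so the sketch assumes simple frame differencing (the change in brightness between consecutive frames) as the motion estimate. The function names `motion_alpha`, `space_time_worm`, and `activity` are hypothetical, and frames are assumed to be plain lists of rows of grayscale values from 0 to 255.

```python
def motion_alpha(prev, curr, scale=255.0):
    """Per-pixel opacity in [0, 1], proportional to the change between two frames.

    Assumption: frame differencing stands in for whatever motion measure
    the real system uses.
    """
    return [[min(1.0, abs(c - p) / scale) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def space_time_worm(frames, offset=1, background=0.0):
    """Composite all frames onto one canvas, shifting each frame `offset`
    pixels to the right of the previous one (the spread-out deck of cards)."""
    height, width = len(frames[0]), len(frames[0][0])
    canvas_width = width + offset * (len(frames) - 1)
    canvas = [[background] * canvas_width for _ in range(height)]
    for i in range(1, len(frames)):
        alpha = motion_alpha(frames[i - 1], frames[i])
        shift = offset * i
        for y in range(height):
            for x in range(width):
                a = alpha[y][x]
                # "Over" blend: a static pixel (a == 0) is fully transparent,
                # so whatever is already on the canvas shows through.
                canvas[y][x + shift] = (a * frames[i][y][x]
                                        + (1 - a) * canvas[y][x + shift])
    return canvas


def activity(frames):
    """Mean opacity per frame transition: a 1-D signal that a timeline
    track could plot to flag segments with lots of action."""
    levels = []
    for i in range(1, len(frames)):
        alpha = motion_alpha(frames[i - 1], frames[i])
        total = sum(sum(row) for row in alpha)
        levels.append(total / (len(alpha) * len(alpha[0])))
    return levels


# A bright dot moving one pixel per frame across a dark 1x4 image.
moving = [[[255, 0, 0, 0]], [[0, 255, 0, 0]], [[0, 0, 255, 0]]]
# The same scene with nothing moving.
still = [[[255, 0, 0, 0]]] * 3

worm = space_time_worm(moving, offset=1)
```

In this toy case the moving dot leaves a trail on the canvas, while the static scene leaves the canvas at its background value. The `activity` values are higher for the moving sequence, which is the kind of signal a "worm channel" could summarize at coarse zoom levels.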

Even with all the technological innovations packed into TotalRecall, the program can’t do everything on its own. “If you take today’s state-of-the-art in speech-recognition technology and try to automatically transcribe our speech recording, you get word salad,” Roy says. “It’s not even close.” For that onerous task, Roy enlisted the services of a uniquely qualified transcriber: his son’s daytime caregiver, perhaps the only person other than mom and dad who can decipher her young charge’s early words. But the TotalRecall team is trying to remove much of the pain from manual transcription by testing an innovative algorithm for feeding the transcriber small sound bites, just as fast as she can enter them.

In addition to the written transcripts, Roy plans to annotate the dataset with a rich body of metadata, including people’s positions and locations, activities, and non-verbal sounds. People will enter most of these annotations, Roy explains. “TotalRecall is a human-machine collaborative engine. That’s the whole idea behind it, and the vision is to bring people from fields other than computer science, such as behavioral scientists, developmental psychologists, and linguists, who are interested in language acquisition or just human development in general, and help them do the kind of research they want to do faster. Beyond speed, computational analysis and modeling tools may also let researchers make sense of complex interactions in ways that conventional statistical methods do not support. These new data mining, visualization, and analysis capabilities might enable new forms of inquiry into how children learn and grow. And right now, because it’s a work in progress, by engaging researchers from these fields as we’re building the tools, we’ll have them actually shape the functionality and the design. It really is a living, evolving package.”
