A California startup tries to capture the elusive spoken word — and make it searchable

The Pop Up Archive interface lets users jump around in an audio recording by clicking on the accompanying text.

The few times I've used speech recognition technology, I've just ended up arguing with my smartphone in public.

But at Pop Up Archive, there are greater ambitions for that technology. The website uses speech recognition to automatically transcribe and tag large batches of audio, so that users can search for people and topics within the recordings.

A Pop Up Archive search for the term “boycott,” for instance, brings up a list of radio segments and speeches that spans more than 50 years, from the Montgomery bus boycotts during the Civil Rights Movement, to the more recent boycotts of the private Google buses that shuttle employees to work from San Francisco. Clicking on a line in the transcript allows a user to instantly jump to that point in the audio recording.
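The click-to-jump behavior described above can be pictured as a transcript stored in timed segments. The following is a minimal sketch, not Pop Up Archive's actual code: the `Segment` class, the `search_transcript` function and the sample lines and timestamps are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One line of a transcript and the time at which it starts (hypothetical structure)."""
    start_seconds: float
    text: str


def search_transcript(segments, term):
    """Return (start_seconds, text) for every segment whose text contains the term."""
    needle = term.lower()
    return [(s.start_seconds, s.text) for s in segments if needle in s.text.lower()]


# Made-up sample transcript; the timestamps are illustrative only.
segments = [
    Segment(0.0, "From KALW News in San Francisco"),
    Segment(4.5, "Organizers called for a boycott of the buses"),
    Segment(9.2, "The boycott lasted more than a year"),
]

# Each hit carries the offset an audio player would seek to.
hits = search_transcript(segments, "boycott")
for start, text in hits:
    print(f"{start:>5.1f}s  {text}")
```

Because every matching line keeps its start time, a player could seek straight to that offset when the line is clicked.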

Speech recognition technology provides a bridge between text-based search engines and the mountains of audio and video content people are creating, said Anne Wootton, one of the founders of Pop Up Archive.

“We’ve never been able to search within audio or video before and so it’s not an expectation that users have,” Wootton said. “But the potential is vast and many people recognize that.”

Other startups have caught on and now there are a variety of tools out there that allow users to generate these automatic, interactive transcripts for their media. The clickable text has even started popping up on some YouTube videos.

Pop Up Archive Co-founder Anne Wootton (right) shows off new features of the site to Ashleyanne Krigbaum (center). Credit: Caroline Lewis

But people aren’t racing to use this tool, partly because speech recognition software still isn’t totally accurate. On Pop Up Archive, the results are still hit or miss and that can make searching more difficult. Users can edit their transcripts if there are mistakes, but not everyone wants to bother with that.

That’s why Wootton and her team are determined to make their speech recognition software better. During an interview at her apartment in Oakland, California, Wootton plays a newscast and then tells me the differences between "basic" and "premium" transcripts it generated on Pop Up Archive. The "premium" version had far fewer mistakes, could identify the speakers and used proper grammar.

Wootton did a summer internship with the Kitchen Sisters, a popular public radio duo, and found her inspiration in their chaotic San Francisco studio.

“We were doing oral histories in central coastal California and a lot of those folks have long gone,” said Nikki Silva, one of the Kitchen Sisters. “They were involved in things like ranching and rodeo and farming and fishing and all these things that kind of are no longer how things are done in that area.”

The Kitchen Sisters started out in 1979, recording on cassette tapes. When they started producing their “Lost and Found Sound” series, they graduated to DATs, or digital audio tapes — another technology that soon went extinct. Today, their studio is filled with audio in every format imaginable, from records and reels to CDs and hard drives.

“We have kind of what we call an ‘accidental archive.’ We never started out thinking that we were going to amass the amount of material that we did,” said Silva.

The team at Pop Up Archive set out to turn this “accidental archive” into a searchable, accessible online library that would be open to the public. Soon, they realized the Kitchen Sisters weren’t the only ones trying to solve this problem.

Across the pond, the BBC is grappling with an archive of almost 100 years’ worth of content. They’ve already digitized TV shows, global newscasts and radio dramas — but they’re still not necessarily searchable.

“They still sit on the digital equivalent of the dusty shelf,” said Rob Cooper of the BBC’s Research & Development team. “Although they’re much more easily available, finding the interesting stuff in there — the good stuff — is the really challenging problem.”

In order to solve that problem and make their vast media library easier to navigate, the BBC Research & Development team has been working on their own solution, COMMA, which uses speech-to-text software in much the same way as Pop Up Archive.

In the US, Pop Up Archive has found a partner in the non-profit Public Radio Exchange, or PRX, and that’s attracted the support of foundations. They’ve also raised more than a million dollars through private investors. Many people seem to think this technology is the future.

Now, the only thing left to do is make it work.

“Speech-to-text has been around for a long time, but I think it’s kind of under-delivered,” said Wootton. “It’s not perfect, but there’s a difference between ‘perfect’ and ‘readable.’ And then there’s certainly a difference between ‘perfect’ and ‘searchable.’”

A group of public radio journalists from the San Francisco Bay Area are the site’s beta testers. They’re checking Pop Up Archive’s transcripts of their reports and interviews for accuracy.

Ashleyanne Krigbaum, a reporter with KALW, said the site was still struggling with certain proper names, like the name of her show, “Crosscurrents.” In one spot, the site had interpreted the name as “crossed tourists.”

Wootton explained that the software often has trouble with proper names. “I want to teach it the word ‘Crosscurrents,’” she said.

Looking over the transcript, Krigbaum found another line that looked a bit off. It read, “First, he sealed big news.”

“That’s not correct,” said Krigbaum. She clicked the text and the audio played: “From KALW News in San Francisco…” read the radio host.

Krigbaum said she saw far fewer mistakes than she did using the old software, however. She asked Anne Wootton what had caused the improvement.

Wootton explained that the old software used a very large and general vocabulary.

“Basically, it used the entire Internet as its vocabulary,” she told the reporters. “So that’s why you would see stuff like ‘Rhianna’ or ‘HTC Galaxy.’ That’s what the Internet talks about. This software is more specific and it's particularly tailored to broadcast news.”

Now, it seemed the software was beginning to understand the language these public radio journalists were speaking. Of course, they speak English on public radio. But it’s a different English than, say, the English spoken on the BBC, where a different set of words and people make their way into the news.

As it turns out, software comes to “understand” what they’re talking about on public radio in much the same way that a person does: by listening to a lot of it. The Pop Up Archive team feeds their software tons of content that’s similar to what their users will be uploading. That way, the software can determine the probability that a series of words will appear next to each other in a particular order.
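The idea of learning which words tend to follow each other can be illustrated with a toy "bigram" model that simply counts neighboring word pairs. This is a simplified sketch and an assumption on our part, not Pop Up Archive's software: the function names and the four training sentences are invented, and real speech recognizers pair far larger language models with acoustic models.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) from raw counts; 0.0 if prev was never seen."""
    followers = counts.get(prev.lower())
    if not followers:
        return 0.0
    return followers[nxt.lower()] / sum(followers.values())


# Invented training text standing in for "tons of content" from public radio.
training_text = [
    "from kalw news in san francisco",
    "this is kalw news",
    "kalw news presents crosscurrents",
    "kalw reporters cover the bay area",
]
model = train_bigrams(training_text)

# "news" follows "kalw" in 3 of the 4 places "kalw" appears with a next word.
print(bigram_probability(model, "kalw", "news"))      # 0.75
print(bigram_probability(model, "kalw", "tourists"))  # 0.0
```

A model trained this way would favor a word sequence it has heard often over one it has never encountered, which is why feeding it broadcast news helps with broadcast news.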

Pop Up Archive is working on creating different versions of the software for different topics, like sports and pop culture, and even for different time periods. They’ve created a model for the 1980s, for instance, that’s uniquely tailored to the vocabulary and cultural references of the time. The site still can’t determine automatically that it should use the 1980s model when someone uploads content from that era. Wootton says that capability isn’t a long way off.

It could be a big turning point for the company. Right now, Pop Up Archive is focused on helping organizations with big projects, like the Studs Terkel Archive they’re creating in partnership with the Chicago History Museum. But, in the future, the team wants the site to work well enough on its own to become popular with independent media makers and archivists.

Bits and pieces of the Kitchen Sisters’ “accidental archive” are already uploaded, transcribed and tagged in Pop Up Archive. But much of it is still stowed away in their studio, out of reach. Silva agrees the technology hasn't been perfected yet, but says she’s just glad someone’s working on it.

Patrick Cox adds: Also in this World in Words podcast: A Chinese translator laughs off a dumb insult tweeted by Argentina's president.

The World in Words podcast is on Facebook and iTunes.

Correction: An earlier version of this story had the incorrect public radio affiliate for reporter Ashleyanne Krigbaum.
