This story was originally covered by PRI’s Here and Now.
In 2010, Google announced that it would scan all of the world’s books, which it estimated at about 130 million, by 2020. Scanning these books does not just mean taking a picture of every page; the text is also converted into a form that computers can read. This process, called optical character recognition (OCR), lets a computer search for a word throughout a book. It is no easy task, since many older books have smudged pages or faded ink that can stump software. Much of that text can only be deciphered by a person.
For books written more than 50 years ago, 30 percent of the text is indecipherable by a computer, according to Dr. Luis von Ahn of Carnegie Mellon University. His solution to decoding so many books is reCAPTCHA, a technology he designed. With it, everyday people help decipher 100 million words every day.
Here’s how:
When someone buys tickets to the next live taping of her favorite radio program, she is asked, before paying, to retype a distorted word shown in a box on the screen. This word is a CAPTCHA, short for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It is a security feature that makes sure the ticket buyer is a person rather than an automated program buying up all the tickets for a scalper. Many sites use this tool, but what the buyer doesn’t know is that the word she is decoding may come from a scanned book. The word looks smudged not because the computer generated a smudged word, but because the word was smudged on the original page.
If the CAPTCHA word comes from a book, it is part of von Ahn’s reCAPTCHA program, and everyday people are truly reviving the books of yesterday.