This story was originally covered by PRI’s Here and Now. For more, listen to the audio above.
In 2010, Google announced that it would scan every book in existence, a number it estimates at about 130 million, by 2020. Scanning these books does not just mean taking a picture of every page; the text is also converted to a form that computers can read. “Optical character recognition” software turns the page images into text, which allows computers to search for a word throughout a book. This is no easy task, since many older books have smudged pages or faded ink that can stump computers. Much of that text can only be deciphered by a person.
For books written more than 50 years ago, 30 percent of the text is indecipherable by a computer, according to Dr. Luis von Ahn of Carnegie Mellon University. The solution to decoding so many books is a technology called reCAPTCHA, designed by von Ahn. Using this technology, 100 million words are deciphered every day with the help of everyday people.
Here’s how:
When someone purchases tickets to the next live taping of her favorite radio program, she is asked, before paying, to retype the garbled word she sees in the box above. This word is a CAPTCHA, or a “Completely Automated Public Turing test to tell Computers and Humans Apart.” It’s a security feature to make sure that the ticket buyer is a person instead of a robo-computer buying up all the tickets for a scalper. Many sites use this tool, but what the buyer doesn’t know is that the word she’s decoding may be from a scanned book. The word looks smudged not because the computer generated a smudged word, but because the word was smudged on the original page.
If the CAPTCHA word is from a book, it’s part of von Ahn’s reCAPTCHA program, and everyday people are truly reviving the books of yesterday.
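For the curious, here is a rough sketch of how answers from many people might be combined into one trusted reading of a smudged word. The story does not describe this step, so everything here is an assumption made for illustration: the function name `resolve_word`, the `min_votes` and `min_share` thresholds, and the simple majority-vote rule are hypothetical, not reCAPTCHA’s actual code.

```python
from collections import Counter


def resolve_word(answers, min_votes=3, min_share=0.6):
    """Return the transcription most readers agree on, or None if unsettled.

    Hypothetical rule: wait for at least `min_votes` answers, then accept the
    most common one only if at least `min_share` of readers typed it.
    """
    # Normalize case and stray spaces so "Yesterday " and "yesterday" match.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen this word yet

    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_share:
        return word
    return None  # readers disagree; keep showing the word to more people


# Four ticket buyers type what they see for the same smudged word.
typed = ["yesterday", "Yesterday", "yesterdav", "yesterday "]
print(resolve_word(typed))
```

In this sketch, one person’s typo does not matter as long as most readers agree, and a word that readers cannot agree on simply stays in circulation.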
———————————————————————————
“Here and Now” is an essential midday news magazine for those who want the latest news and expanded conversation on today’s hot-button topics: public affairs, foreign policy, science and technology, the arts and more.