reCAPTCHA is helping digitize books, newspapers and other old material.
How does the reCAPTCHA project work?
The question is how this project helps digitize books and newspapers, and how it knows whether what you entered is correct.
As you know, it uses OCR (Optical Character Recognition). First, the pages of those books are scanned into images, and then OCR converts the images into characters. Text takes much less space than images and is easy to search, so converting images into text is important.
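To make the space and search argument concrete, here is a small back-of-the-envelope calculation. The scan dimensions (1,000 × 40 pixels, uncompressed 8-bit grayscale) are a hypothetical assumption, not figures from any real digitization project; real scans are usually compressed, which narrows but does not close the gap.

```python
# Hypothetical scan of one printed line: 1,000 x 40 pixels,
# uncompressed 8-bit grayscale (1 byte per pixel). Assumed values.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1000, 40, 1
image_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

# The same line after OCR, stored as plain text.
line_text = "It was the best of times, it was the worst of times"
text_bytes = len(line_text.encode("utf-8"))

print(image_bytes)           # 40000
print(text_bytes)            # 51
print("worst" in line_text)  # True: text can be searched directly
```

The image needs tens of kilobytes, while the text needs one byte per character here, and only the text form supports a simple substring search.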
Some words are printed in a way that OCR finds difficult to read. Those words are shown to us as a CAPTCHA. When we read them and enter the text, we use our human mind to recognize the letters. But how does the computer know whether the text we entered is correct?
Each CAPTCHA shows two kinds of words: a known word (one that has already been read) and an unknown word (one that OCR could not read yet). If you type the known word correctly, the computer concludes that you are a human and not an automated system, and validation is complete. Your answer for the unknown word (the hard-to-read one) is simply stored. The same unknown word is then shown to other people as a CAPTCHA, and all their answers are compared with each other. This way, CAPTCHA is helping digitize libraries and old newspapers.
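The scheme described above can be sketched in Python. This is a minimal illustration, not how reCAPTCHA is actually implemented: the class name `CaptchaDigitizer`, the case-insensitive comparison, and the rule that an unknown word is accepted once `votes_needed` people (three here, an arbitrary choice) have typed the same answer are all hypothetical assumptions.

```python
from collections import Counter, defaultdict


class CaptchaDigitizer:
    """Hypothetical sketch: known words validate the user,
    unknown words collect human answers until enough agree."""

    def __init__(self, votes_needed=3):
        self.known = {}                      # image id -> correct text
        self.guesses = defaultdict(Counter)  # image id -> answer counts
        self.solved = {}                     # image id -> accepted text
        self.votes_needed = votes_needed     # assumed agreement threshold

    def add_known(self, image_id, text):
        self.known[image_id] = text.strip().lower()

    def submit(self, known_id, known_answer, unknown_id, unknown_answer):
        """Return True if the user passes the CAPTCHA."""
        # Step 1: check the known word. A wrong answer suggests a bot,
        # so its guess for the unknown word is discarded.
        if known_answer.strip().lower() != self.known.get(known_id):
            return False
        # Step 2: store the answer for the unknown word.
        answer = unknown_answer.strip().lower()
        self.guesses[unknown_id][answer] += 1
        # Step 3: compare with other people's answers; accept the most
        # common one once enough independent humans agree on it.
        text, count = self.guesses[unknown_id].most_common(1)[0]
        if count >= self.votes_needed:
            self.solved[unknown_id] = text
        return True


d = CaptchaDigitizer(votes_needed=3)
d.add_known("img-1", "morning")
print(d.submit("img-1", "evening", "img-2", "xyz"))  # False: bot-like
for guess in ["upon", "upon", "apon", "upon"]:
    d.submit("img-1", "Morning", "img-2", guess)
print(d.solved)  # {'img-2': 'upon'}
```

Requiring agreement among several users lets an occasional typo (like "apon" above) be outvoted instead of being saved as the digitized text.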
Read two interesting articles about CAPTCHA here.