Search Engine for Handwritten Documents - History Meets Search Tech

Thread Title:
Researchers create tool to automatically search handwritten historical documents
Thread Description:

The Center for Intelligent Information Retrieval at the University of Massachusetts Amherst has created a manuscript retrieval system capable of scanning and understanding handwritten documents.

Imagine the potential of that...

On scanning/searching George Washington's personal diaries

The scanned pages of Washington’s papers can be searched by typing in a word such as “Washington” or “Virginia,” and the program produces a list of ranked pages showing where they appear.

“Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed,” Manmatha says. “But there is an enormous amount of handwritten, historical material.”

According to Toni Rath, “The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images. That is, the word images form a ‘visual language’ and the transcriptions are in English.” Once the model is learned, it may be used for searching scanned pages for which no transcriptions are available.
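
For anyone curious how that "translation" idea might look in practice, here is a toy sketch in Python. It is not the researchers' actual system. It assumes each word image has already been reduced to a discrete "visual token" (the made-up IDs v1, v2, v3 below), learns P(word | visual token) from a tiny set of transcribed examples, and then ranks untranscribed pages for a typed query. All names, the smoothing value, and the data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_translation_table(pairs, smoothing=0.01):
    """Estimate P(word | visual_token) from (visual_token, word) training pairs.

    `pairs` stands in for the "parallel body" of scanned word images and
    their transcriptions. Add-`smoothing` keeps unseen pairs above zero.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for token, word in pairs:
        counts[token][word] += 1
        vocab.add(word)

    table = {}
    for token, word_counts in counts.items():
        total = sum(word_counts.values()) + smoothing * len(vocab)
        table[token] = {w: (word_counts[w] + smoothing) / total for w in vocab}
    return table


def score_page(page_tokens, query, table):
    """Score in [0, 1]: chance that at least one token on the page is `query`."""
    miss = 1.0
    for token in page_tokens:
        p = table.get(token, {}).get(query, 0.0)  # unknown token or word -> 0
        miss *= 1.0 - p
    return 1.0 - miss


def rank_pages(pages, query, table):
    """Return page names, best match first (ties broken by name)."""
    scored = [(score_page(tokens, query, table), name)
              for name, tokens in pages.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical transcribed training data: (visual token, English word).
training_pairs = [
    ("v1", "washington"), ("v1", "washington"), ("v1", "virginia"),
    ("v2", "virginia"),
    ("v3", "the"),
]
table = learn_translation_table(training_pairs)

# Hypothetical untranscribed pages, each a list of visual tokens.
pages = {
    "page_a": ["v3", "v1"],
    "page_b": ["v3", "v2"],
    "page_c": ["v3"],
}

print(rank_pages(pages, "washington", table))
print(rank_pages(pages, "virginia", table))
```

With this toy data, a query for "washington" puts page_a first, because its v1 token was mostly transcribed as "washington" in the training pairs. The pages themselves never needed a transcription, which is the point of the approach described above.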

Story via Slashdot