DIAL 2004 explores new technologies that promise to assist the integration of imaged documents within digital libraries (DLs). This workshop describes the state of the art and identifies urgent open problems. Its papers cover general document image analysis (DIA) challenges arising within DLs, DL systems architectures, document-image retrieval, content extraction from document images for DLs, and specialized challenges to DIA methods posed by handwritten and/or historical documents.

In the case of MIT Press, the digital library end-user requirements were straightforward: PDF files with searchable text. ... (3) OCR engine analysis, in which ASCII/Unicode text is formed from the text regions identified in the previous step; (4) clean-up of ... the generation of the output PDF files using custom code based on the PDF specification; and (7) the schema-driven generation of the output XML files.
Publisher: IEEE - 2004-01-01