I have had a busy weekend. Thanks to Richard Packard I now have a nearly complete set of Packard's Progress. This weekend I have added volumes 5-8 to the online collection. The entry point remains:
Some people have suggested that it would be ideal to have searchable copies of the volumes. That is simply not practical for me to do. These PDF files are created from page images (scans). Searchable PDF files are normally created from word processor files, which Adobe Acrobat converts into searchable PDF files. In order to create searchable files from page images, those images would first have to be converted to text using an optical character recognition (OCR) program. The converted text can then be embedded in the PDF file so that it can be searched.
While I have some years of experience in creating searchable PDF files from word processor documents, and about twenty years of experience with optical character recognition, trying to create searchable PDFs of Packard's Progress would involve an immense amount of time, probably increasing the time involved by a factor of at least fifty (and possibly much, much more). That is because the first thirty-one volumes were created by photocopying, cutting, pasting, and again photocopying. The material in those volumes comes from many sources, uses a large variety of fonts and point sizes, and is sometimes of poor quality that can be read by a human mind but is difficult for a computer to work with. In order to be accurate, optical character recognition needs to be trained to recognize each of the fonts and point sizes to be converted. Even then, careful proofreading is required. Because of the variety of source material involved, this would be an immense task, one that I cannot undertake given my work as Plymouth County, MA, Coordinator for the USGenWeb Project, the dozens of genealogical web sites that I maintain, and the dozens of genealogical message boards and mailing lists (including this one) in which I participate.
If someone else has the appropriate software, skills, and (most of all) possibly months of time to OCR the page images, then proofread and correct the result (and, preferably, have at least one other person proofread and correct), I have the capability to incorporate that text into the PDF files to make them basically searchable (a search would probably not point to the exact spot in a page image matching the search criteria, but would point to the correct page, and possibly to the approximate location in the page). The only alternative would be to type all of the text into word processor files, which would still need to be proofread and corrected (preferably by at least two people).
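To illustrate what "basically searchable" would mean, here is a minimal Python sketch of a page-level search. It is only an illustration under stated assumptions, not how Acrobat works internally: it assumes the proofread text has already been gathered into a dictionary keyed by page number, and the names `pages`, `normalize`, and `find_pages`, as well as the sample text, are hypothetical and invented for this example.

```python
# Hypothetical sketch: page-level search over proofread OCR text.
# The page texts below are invented placeholders, not real content
# from any volume.

def normalize(text):
    """Lower-case and collapse whitespace so line breaks do not block a match."""
    return " ".join(text.lower().split())

def find_pages(pages, query):
    """Return the sorted page numbers whose text contains the query."""
    needle = normalize(query)
    if not needle:
        return []
    return sorted(num for num, text in pages.items()
                  if needle in normalize(text))

pages = {
    1: "Example text for page one,\nmentioning the Packard family.",
    2: "Example text for page two, with no matching surname.",
    3: "Another page that mentions\nPACKARD   family records.",
}

print(find_pages(pages, "packard family"))
```

A search of this kind reports only which pages contain the phrase, not where on the page image it appears, which matches the limitation described above.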
Dale H. Cook, Member, NEHGS and MA Society of Mayflower Descendants;
Plymouth Co. MA Coordinator for the USGenWeb Project
Administrator of http://plymouthcolony.net