ldelsarte Posted March 22, 2019
I have a "net etiquette" and technical problem. Quite often, I find fantastic original Atari documents scanned and posted on archive.org. Sadly, some of them are difficult to read because the ink has faded to very pale or the document was xeroxed too many times (not straight, binder holes, lots of black dots everywhere, etc.).
So, patiently, I extract all the pages with Adobe Acrobat (or other online tools). Then I try to "clean up" all the pages, one by one, with the assistance of GIMP and Paint.NET. I give them a new life with a clear white background and new, darker ink. Finally, I recreate a clean PDF.
The trouble is, I don't know how to contact the original publishers on archive.org to offer them my "cleaner" version to publish. I don't want to publish these documents myself: that would be really disrespectful to the original publisher. I'm very grateful for all these documents, and I don't want to offend anyone, but some of these documents are really easier to read when "reworked" a little bit. Any ideas or suggestions? Thank you.
To illustrate my point: Original document "Atari 600XL 1983-07-01 Product Status Meeting Handout" --> https://archive.org/details/AtariA600XLProductStatusMeetingHandout
A great document I really enjoyed reading!
Enclosed: My "reworked" version, as well as other documents that I also "reworked".
Atari-600XL-1983-07-01-Product-Status-Meeting-Handout-(darker, easier to read).pdf
Atari 810 Disk Peripheral Device Description (darker, easier to read).pdf
Atari Disk Data Structures Tutorial (darker, easier to read).pdf
Atari LOGO A proposed plan (Nov 10, 1982) (darker, easier to read).pdf
Atari Disk File Manager Functional Description (darker, easier to read).pdf
Atari Speech Handler External Reference Specification (darker, easier to read).pdf
John Starkweather about PILOT (Date 23 Nov 1981) (darker, easier to read).pdf
Atari Colleen-Candy RAM Memory Map (Date 07-03-1979, Rev. A) (darker, easier to read).pdf
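The "clear white background and darker ink" clean-up described in the post above is, roughly, a levels adjustment on a grayscale page. The sketch below is only an illustration of that idea in plain Python, not the poster's actual GIMP/Paint.NET procedure: the function name `stretch_levels`, the `black_point`/`white_point` parameters, and the tiny sample page are all hypothetical.

```python
def stretch_levels(pixels, black_point, white_point):
    """Remap 8-bit grayscale values so faded ink gets darker and
    grayish paper becomes pure white.

    Values at or below black_point map to 0, values at or above
    white_point map to 255, and everything between is scaled linearly.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    cleaned = []
    for row in pixels:
        new_row = []
        for value in row:
            scaled = (value - black_point) * 255 // span
            # Clamp so out-of-range inputs land on pure black or pure white.
            new_row.append(max(0, min(255, scaled)))
        cleaned.append(new_row)
    return cleaned


# Hypothetical 2x3 page: pale gray paper (about 225-235) with faded ink (140-150).
page = [
    [230, 225, 140],
    [150, 228, 235],
]
cleaned = stretch_levels(page, black_point=120, white_point=220)
print(cleaned)
```

With these made-up thresholds the paper pixels all become 255 and the ink pixels drop well below their original values. Picking the two thresholds per document is still a judgment call, which is presumably why the poster works page by page.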
+Allan Posted March 22, 2019
Kevin Savetz published these. His AtariAge username is 'Savetz'.
Allan
+slx Posted March 22, 2019
I'd assume that everyone contributing to the Internet Archive does it to preserve stuff, so if you made it better and easier to use, I can't imagine they'd be put off. You can still add a note to the metadata saying that you're not the original uploader, to placate your conscience.
Kyle22 Posted March 24, 2019 (edited)
Nevermind.
Savetz Posted March 29, 2019
Hi @ldelsarte
It's fine with me if you clean up and post documents. I'd recommend including links to the original scans, in case someone wants to see something closer to what the original version looks like. Thanks for helping make these old docs more readable.
-Kevin
+Nezgar Posted March 29, 2019
Just a general question to those experienced in scanning documents for preservation...
I'm working on scanning all of my Vantari User Group newsletters, but before I post them publicly I'd like to OCR them as well as possible, ideally with human verification of 'low confidence' words to ensure the best searchability, rather than relying solely on the automatic guesses. The older documents that were printed on dot matrix printers are especially difficult for OCR, and a very high percentage of the words require corrections.
So far I've been using the "Recognize Text" function of Adobe Acrobat, but the interface seems really kludgy. I can't tell it which areas of the page not to recognize, and I can't mark a flagged item as not being text; instead I have to delete the text and press Accept, or switch to 'review recognized text' so that I can click on a different word. It would be nice to have a 'skip word' type of option.
It also seems that even if I go through this effort, when I upload to the Internet Archive, they do their own OCR and throw away the work I've already put into the document.
Are there better OCR workflows?
ivop Posted March 29, 2019
https://github.com/tesseract-ocr
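One way the "human verification of low-confidence words" idea from the question could work with Tesseract is to have it write per-word confidence values and then list only the doubtful words for review. The sketch below assumes Tesseract's TSV output (as produced, to my understanding, by something like `tesseract page.png out tsv`), which has one row per detected element with `conf` and `text` columns. The function name, the threshold of 60, and the sample rows are hypothetical, not real OCR results.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (page, line, word, confidence) for words below the threshold.

    Rows with empty text or a confidence of -1 are layout elements
    (blocks, paragraphs, lines) rather than words, so they are skipped.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if text and 0 <= conf < threshold:
            flagged.append(
                (int(row["page_num"]), int(row["line_num"]), text, conf)
            )
    return flagged


# Made-up sample in the assumed TSV layout (not real Tesseract output).
_rows = [
    ["level", "page_num", "block_num", "par_num", "line_num", "word_num",
     "left", "top", "width", "height", "conf", "text"],
    ["4", "1", "1", "1", "1", "0", "36", "92", "500", "30", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "36", "92", "120", "30", "96.5", "Vantari"],
    ["5", "1", "1", "1", "1", "2", "170", "92", "80", "30", "41.2", "Us3r"],
    ["5", "1", "1", "1", "1", "3", "260", "92", "110", "30", "88.0", "Group"],
]
SAMPLE_TSV = "\n".join("\t".join(cols) for cols in _rows)

for page, line, word, conf in low_confidence_words(SAMPLE_TSV):
    print(f"page {page}, line {line}: {word!r} (confidence {conf})")
```

A list like this could drive a simple review pass in which only the flagged words are checked against the scan. It does not address the separate concern about the Internet Archive re-running its own OCR on upload.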