Cleaning and dusting off some old Atari scanned documents.

ldelsarte · March 22, 2019

I have a "net etiquette" and technical problem. Quite often, I find fantastic original Atari documents scanned and posted on archive.org.

Sadly, some of them are difficult to read because the ink has faded to a very pale shade or the document was xeroxed too many times (not straight, binder holes, lots of black dots everywhere, etc.).
So, patiently, I extract all the pages with Adobe Acrobat (or other online tools). Then I try to "clean up" all the pages, one by one, with the assistance of GIMP and Paint.NET. I give them a new life with clear white background and new darker ink. Finally, I recreate a clean .PDF.
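
For anyone who wants to script the "white background, darker ink" step instead of doing it page by page, here is a minimal sketch of a levels stretch, similar in spirit to the Levels tool in GIMP. It is not the exact procedure described above. It assumes each page is already available as rows of 8-bit grayscale values, and the function name `stretch_levels` and the `black_point` / `white_point` values are hypothetical and would need tuning per document.

```python
def stretch_levels(pixels, black_point=60, white_point=200):
    """Remap 8-bit grayscale values: faded ink gets darker, grey paper turns white.

    pixels is a list of rows, each row a list of ints in 0..255.
    Values at or below black_point become 0 (black), values at or above
    white_point become 255 (white), and everything between is stretched linearly.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    # Build a 256-entry lookup table once, then apply it to every pixel.
    table = []
    for value in range(256):
        if value <= black_point:
            table.append(0)
        elif value >= white_point:
            table.append(255)
        else:
            table.append(round((value - black_point) * 255 / span))
    return [[table[v] for v in row] for row in pixels]


# Made-up sample: greyish paper (around 225-231) with faded ink (105-130).
page = [
    [230, 225, 110, 228],
    [231, 130, 105, 226],
]
cleaned = stretch_levels(page, black_point=100, white_point=220)
print(cleaned)
```

The thresholds do the real work here: a `white_point` just below the paper tone wipes out the grey background, while a `black_point` near the darkest surviving ink pushes pale text toward black. Specks and binder holes would still need manual cleanup.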

The trouble is, I don't know how to contact the original publisher on archive.org to offer my "cleaner" version for him or her to publish. I don't want to publish these documents myself: that would be really disrespectful to the original publisher. I'm very grateful for all these documents, and I don't want to offend anyone, but some of these documents are really easier to read when "reworked" a little bit. Any ideas or suggestions?

Thank you.

To illustrate my point:
Original document "Atari 600XL 1983-07-01 Product Status Meeting Handout" --> https://archive.org/details/AtariA600XLProductStatusMeetingHandout A great document I really enjoyed reading!

Enclosed: my "reworked" version, as well as other documents that I also "reworked".

Atari-600XL-1983-07-01-Product-Status-Meeting-Handout-(darker, easier to read).pdf

Atari 810 Disk Peripheral Device Description (darker, easier to read).pdf

Atari Disk Data Structures Tutorial (darker, easier to read).pdf

Atari LOGO A proposed plan (Nov 10, 1982) (darker, easier to read).pdf

Atari Disk File Manager Functional Description (darker, easier to read).pdf

Atari Speech Handler External Reference Specification (darker, easier to read).pdf

John Starkweather about PILOT (Date 23 Nov 1981) (darker, easier to read).pdf

Atari Colleen-Candy RAM Memory Map (Date 07-03-1979, Rev. A) (darker, easier to read).pdf

+Allan · March 22, 2019

Kevin Savetz published these. His AtariAge username is 'Savetz'.

Allan

+slx · March 22, 2019

I'd assume that everyone contributing to the Internet Archive does it to preserve stuff, so if you made it better and easier to use, I can't imagine they'd be put off. You can still add a note to the metadata that you're not the original uploader to placate your conscience.

Kyle22 · March 24, 2019

Nevermind.

Edited March 24, 2019 by Kyle22

Savetz · March 29, 2019

Hi @ldelsarte

It's fine with me if you clean up and post documents. I'd recommend including links to the original scans, in case someone wants to see something closer to what the original version looks like.

Thanks for helping make these old docs more readable.

-Kevin

+Nezgar · March 29, 2019

Just a general question to those experienced in scanning in documents for preservation...

I'm working on scanning in all of my Vantari User Group newsletters, but before I post them publicly I'd like to OCR them as best as possible, and ideally with human verification of 'low confidence' words to ensure the best searchability, rather than relying solely on the automatic guesses.

The older documents that were printed on dot matrix printers are especially difficult for OCR, and a very high percentage of the words require corrections.

I've so far been using the "Recognize Text" function of Adobe Acrobat, but the interface seems really kludgy. I can't tell it which areas of the page not to recognize, and I can't mark an uncertain item as "not text": instead I have to delete the text and press accept, or switch to 'review recognized text' so that I can click on a different word. It would be nice to have a 'skip word' type option...

It also seems that even if I go through this effort, when uploading to Internet Archive, they do their own OCR and throw away my own efforts already in the document...

Are there better OCR workflows?

ivop · March 29, 2019

https://github.com/tesseract-ocr
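
As a rough illustration of how the 'low confidence' review mentioned above could work with Tesseract: its TSV output carries a `conf` column for every recognized word, so the uncertain ones can be listed for human checking. The sketch below parses that TSV with the Python standard library only. The function name `low_confidence_words`, the threshold of 80, and the sample rows are assumptions made for illustration, not real OCR output from any newsletter.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=80.0):
    """Return (page, line, word, confidence) for words below the threshold.

    tsv_text is the content of a Tesseract TSV file. Rows that describe
    pages, blocks or lines have conf = -1 and no text, so they are skipped.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])  # int in some versions, float in others
        if text and 0 <= conf < threshold:
            flagged.append((int(row["page_num"]), int(row["line_num"]), text, conf))
    return flagged


# Hypothetical sample in Tesseract's TSV column layout.
header = ("level page_num block_num par_num line_num word_num "
          "left top width height conf text").split()
rows = [
    [4, 1, 1, 1, 1, 0, 30, 40, 500, 20, -1, ""],
    [5, 1, 1, 1, 1, 1, 30, 40, 90, 20, 96, "Vantari"],
    [5, 1, 1, 1, 1, 2, 130, 40, 60, 20, 41, "Uscr"],
    [5, 1, 1, 1, 1, 3, 200, 40, 70, 20, 88, "Group"],
]
sample_tsv = "\n".join("\t".join(str(c) for c in r) for r in [header] + rows)

flagged = low_confidence_words(sample_tsv)
print(flagged)
```

With real scans, the TSV would come from running Tesseract with its `tsv` output option on each page image. Dot matrix pages will probably flag a lot of words, so the threshold is worth adjusting to keep the review list manageable.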
