Bee Posted December 24, 2019

The one thing I would like to be able to do is view PDFs on my A8. Is there such a program? When I search for this I get two kinds of results: a million PDFs about A8s, or nothing. Has this been done, or am I looking for something that has never been made? Thank you.
Gunstar Posted December 24, 2019

The A8 isn't capable of displaying everything that goes into most PDFs: most PDFs are larger than even the largest memory upgrades for the A8, to say nothing of the non-text visuals. So no, it doesn't exist. However, I'd like to see an app that strips just the text out for viewing on an A8. That would be cool. But since PDFs aren't saved as any simple form of text, but as graphic representations of text and images, I doubt such an app could be made for the A8 itself. The tool to do this would need to run on a PC and let you save the text as a standard .txt or .doc file. Then you could load the text into a viewer or an Atari word processor (probably something like The Last Word, since it can use up to 320K for text files). I've never tried it, but Adobe's software might have an option to export text only, and the resulting file might then be loaded into The Last Word.
Bee Posted December 24, 2019 (author)

I wonder if I can set this up as a script - https://online.pdfconvertertools.com/wim/static/wi/main.html?tp=wi&v=40.8&gnum=15&cid=8594&kw=pdf to text converter&gclid=CjwKCAiAi4fwBRBxEiwAEO8_HkHJCdQdGULpuMon9j0z_Y8A2YmvImqECtVRnqJLq-EbflYyua5dFhoCA2MQAvD_BwE&clickid=77624039990&cachecode=Q2icH4pzxMbJhYrvnvAQqg==&fcid=8404 - and then dump the result into a shared directory over SIO2PC, with a check for file size so it can be split into parts if it's too big. Thanks for the idea.
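The size-check-and-split step is easy to do on the PC side. Here is a minimal sketch; the 32 KB chunk size and the output file names are assumptions, not requirements of any particular A8 setup, so adjust them to whatever your SIO2PC configuration and viewer can actually handle:

```python
CHUNK_SIZE = 32 * 1024  # bytes per part (an assumed limit, not a hard rule)

def split_text(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Break extracted text into chunks, splitting only at line boundaries.

    A single line longer than chunk_size is kept whole rather than cut
    mid-line, so chunks can slightly exceed the limit in that edge case.
    """
    parts, current = [], b""
    for line in data.splitlines(keepends=True):
        if len(current) + len(line) > chunk_size and current:
            parts.append(current)
            current = b""
        current += line
    if current:
        parts.append(current)
    return parts

# Hypothetical usage: write PART01.TXT, PART02.TXT, ... into a shared directory.
# with open("manual.txt", "rb") as src:
#     for i, part in enumerate(split_text(src.read()), 1):
#         with open(f"shared/PART{i:02d}.TXT", "wb") as f:
#             f.write(part)
```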
tschak909 Posted December 24, 2019

I am currently putting together materials to rasterize a PDF stream for output to cloud printing from an ESP device. (I say the following as someone with experience writing PostScript by hand.) The more I dig into PDF, the more completely and utterly horrified I am that it is pushed as a long-term archival format. It commits so many cardinal sins that long-term archival formats should NEVER commit:

* Mixing of textual and binary data forms
* Direct output of internal object graphs
* No Rosetta stone for decoding the file data from the file itself; you literally have to understand all the implicit internal contexts of PDF parsers
* Mixing of device-independent and device-dependent forms in the same chunks of data

It's a clusterfuck.

-Thom
Fuji-Man Posted December 25, 2019

Push a text file to the TNFS server, then use enscript followed by ps2pdf?
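That's the reverse direction (A8 text out to PDF), and the two tools chain together directly: enscript turns plain text into PostScript, and Ghostscript's ps2pdf converts the result. A small dry-run helper, sketched below, just builds the two commands; the function name and the derived .ps intermediate file name are my own invention, not part of either tool:

```python
def build_pipeline(txt: str, pdf: str) -> list[list[str]]:
    """Return the two commands for text -> PostScript -> PDF.

    The intermediate PostScript file name is derived from the PDF name.
    Assumes enscript and Ghostscript's ps2pdf are installed on the PC side.
    """
    ps = pdf.rsplit(".", 1)[0] + ".ps"
    return [
        ["enscript", "-p", ps, txt],  # text -> PostScript
        ["ps2pdf", ps, pdf],          # PostScript -> PDF
    ]

# To actually run it (requires both tools installed):
# import subprocess
# for cmd in build_pipeline("notes.txt", "notes.pdf"):
#     subprocess.run(cmd, check=True)
```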
ZylonBane Posted December 25, 2019

14 hours ago, Gunstar said: "But since PDF's aren't saved as any form of text..."

Completely wrong. PDFs store textual content as text, displayed using an embedded font. That's why you can copy and paste text out of a PDF, and why you can edit one in Acrobat.
tschak909 Posted December 25, 2019

@ZylonBane seriously dude, you are an arse. He's referring to the fact that PDF files are not in an easily parseable format. This is something I'm knee-deep in at the moment, and I have spent a chunk of my professional career dealing with it (data transformation). PDF files are NOT textual. They are a mash of object graphs, some of them text, some of them binary. They are _VERY_ difficult to parse. I am speaking from actual experience.

-Thom
TGB1718 Posted December 25, 2019

17 hours ago, tschak909 said: "The more I dig into PDF, the more completely and utterly horrified I am that it is pushed as a long term archival format. [...]"

It's always the same when a "standard" gets chosen; it's not always the best that wins. Remember Betamax, VHS, and V2000? Of the three, VHS was the worst and it won; V2000 was the best, but was always in third place.
R0ger Posted December 26, 2019

Well, with PDF it's more of a historical issue. It started simple. It didn't stay simple. Anyway, I can't come up with a format less suitable for the A8 right now.
snicklin Posted December 26, 2019

And I thought the RTF format was bad...
kogden Posted December 26, 2019

PDF is like a mutant version of PostScript. It would be far too slow to build an interpreter that would be usable on a 6502. Add to that the fact that many people scanning books and magazines do it as an image and embed that in a PDF instead of OCRing the text. Try running CPEGview and looking at a JPG on your 8-bit; a PDF interpreter would make that look fast. The best bet is just to strip the text from the PDF, convert it to ATASCII, and send it on to the 8-bit. There are plenty of Linux CLI tools that could help there.
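For the extraction step, pdftotext from poppler-utils (e.g. `pdftotext book.pdf book.txt`) gets you plain text on the PC. The remaining step is the ATASCII conversion, sketched below. The one solid ATASCII fact here is the end-of-line byte (155, 0x9B, in place of ASCII LF); the rest of the mapping is deliberately simplified, since a few printable characters differ between the two sets and a faithful converter would need a full translation table:

```python
ATASCII_EOL = 0x9B  # ATASCII end-of-line byte, replaces ASCII LF (0x0A)

def ascii_to_atascii(text: str) -> bytes:
    """Convert plain ASCII text to ATASCII bytes (simplified).

    ATASCII matches ASCII for most printable characters, so this sketch
    only remaps the newline, passes through a printable range, and
    substitutes a space for anything else (tabs, control codes, etc.).
    """
    out = bytearray()
    for ch in text:
        if ch == "\n":
            out.append(ATASCII_EOL)
        elif 32 <= ord(ch) < 123:
            out.append(ord(ch))
        else:
            out.append(ord(" "))  # placeholder for untranslatable characters
    return bytes(out)
```

Piping pdftotext output through a function like this before dropping the file on a shared SIO2PC directory would cover the "send it on to the 8-bit" part.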
Bee Posted December 26, 2019 (author)

And now we come full circle: I work with people trying to import SVGs into design software. Often it won't work, because the file is not a real SVG but a bitmap in an SVG wrapper. I'm fine with the processing taking place off-CPU, as long as I can get the text content. However, I'm seeing that file size alone might be a limitation. I'm looking at a Pi Zero W as a helper CPU for my A8 anyway. Thanks.
ZylonBane Posted December 28, 2019

On 12/24/2019 at 11:13 PM, tschak909 said: "He's referring to the fact that PDF files are not in an easily parseable format."

Please, this is Gunstar we're talking about here. He meant no such thing. Look at the full sentence: "But since PDF's aren't saved as any form of text, but are graphic visual representations of text and images...". He obviously thinks text in PDFs is converted to bitmaps or vector outlines or something, discarding the original textual content.
_The Doctor__ Posted January 24, 2021

There is nothing portable about it... a complete misnomer...
tschak909 Posted January 24, 2021

I can speak to this, as I'm one of two people who have worked on the #FujiNet printing framework, which generates a PDF document from scratch (the other, more important individual being @jeffpiep).

PDF is an absolute nightmare to implement. It is as if the designers of PDF took all the good elements of PostScript, wiped their collective arses with it, and promptly flushed it down the toilet.

The biggest mark against PDF as an archival-quality format is the sheer amount of system-specific information encoded in the file to make it display and print correctly on different platforms. On average, 35% to 40% of a file encoded by Acrobat contains system-specific information, and much of the font selection and measurement bias data is encoded there, to say nothing of the system-specific entities embedded in what are supposed to be standardized sections.

THE EXTREME USAGE OF SYSTEM SPECIFIC DATA, AND THE MIXING OF SYSTEM SPECIFIC DATA INTO "PORTABLE" SECTIONS OF THE STANDARD, CATEGORICALLY DISQUALIFIES PDF AS A LONG TERM ARCHIVAL STORAGE FORMAT.

(And for those snarky enough to say "PostScript was worse!"... sigh. PostScript never claimed to be an archival document format. It was specifically a printer language built around a FORTH-style interpreter.)

-Thom