matthew180 Posted January 4, 2018
I found some old program printouts, nice 9-pin dot-matrix all-caps 99/4A BASIC-code goodness. I took some nice high-res photos and tried running them through every OCR program (online and offline) I could find, but they all fail miserably. All the software seems to try to determine the language and make sense of the code as English (or whatever language it auto-detects when that fails). Does anyone know of a *dumb* OCR program that will just worry about individual characters, or any OCR designed for technical documentation, code, etc.?
Opry99er Posted January 4, 2018
I used to work for a guy who ran a data archiving warehouse. Companies would send documents in crates, we would scan them in and then run them through an OCR program. IIRC, there was very little error checking on the OCR, as he would send the companies both the scans and the searchable OCR docs. IIRC, it was called Abby or Addy or something of the like. Pretty sure it was commercial though.
Opry99er Posted January 4, 2018
I looked it up: it was ABBYY. This was in about 2007, so it would have been an older piece of software. They have more modern software available, but it's like $150 to purchase. Yikes. You might find one of the older programs in the public domain though. I cannot remember the actual software name, even after looking at the ones from that era.
Opry99er Posted January 4, 2018
Additionally, if your primary interest is program recovery, I would be happy to type the programs up for you. I'll have some time this week, since the family is out of town.
+mizapf Posted January 4, 2018
Did you already try Tesseract?
matthew180 Posted January 4, 2018
@Opry99er, ABBYY is one of the first programs to come up when you search for OCR, and their online service failed miserably, just like Google and everyone else. Right now I only have one program, which I found in one of my old COMPUTE! magazines. Apparently I made some fixes and changes though, since there is some of my chicken-scratch in the magazine. If you really want to type in the program, here is the link to the edition with the code (starts on pg 44): https://archive.org/details/1983-04-compute-magazine
I really love the Internet Archive! (And yes, I donate to it.) I'm actually considering getting rid of my paper magazines because they have archived everything I have in paper form. Anyway, I found the print-out tucked in the magazine and thought "gee, I should be able to OCR this in seconds these days, and I can put it up on AtariAge for people who like the old BASIC games..." Two hours later I was still trying to find *any* software that could OCR the code and just leave it alone.
@mizapf, Tesseract is the only program I have not tried directly yet, mostly because it was not "quick and simple", and a lot of what I read says it is a pain in the ass to use, you have to train it, blah blah blah. Also, apparently, it is the engine used in things like Google's OCR and such, and that failed for me too. However, I looked at the command line parms and you still have to specify a language, so I assume it will fail just as badly as everything else because it will be trying to find English words, which code is not.
This was not supposed to be a huge endeavor, and I have already wasted more time trying to get it to work than it would have taken me to just type it in again. Why is that always the way it goes when you are trying to do something you think should be simple by now?
I have attached the first page if anyone wants to give it a try. I cropped the text and reduced the color to 1-bit black and white; otherwise the image is unmodified. I read that for OCR, characters need to be at least 10 pixels tall, which is easily the case here. After zooming in, I now see that the problem may be too much detail, with visible separation between the dots from the printer. Maybe reducing the image to get the dots to merge would help? I don't know. Again, the goal was quick-and-dirty OCR, which I did not achieve.
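The dot-merging idea can be illustrated without any imaging library. The following is a minimal sketch, not something used in the thread: it assumes the scan is held as a list of rows of 0/1 pixels, and the names `dilate`, `count_blobs`, and `STROKE` are made up for the example. Growing every ink pixel by one pixel (a morphological dilation) turns three separate pin dots into one connected stroke, which is roughly what scaling down or blurring the photo would be expected to do.

```python
def dilate(bitmap, radius=1):
    """Return a copy of a 1-bit bitmap where every ink pixel is grown
    by `radius` pixels in each direction (square neighbourhood)."""
    height, width = len(bitmap), len(bitmap[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def count_blobs(bitmap):
    """Count 4-connected groups of ink pixels with an iterative flood fill."""
    height, width = len(bitmap), len(bitmap[0])
    seen = set()
    blobs = 0
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] and (y, x) not in seen:
                blobs += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and bitmap[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs


# Hypothetical test pattern: a vertical stroke printed as three separate pin dots.
STROKE = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

if __name__ == "__main__":
    print(count_blobs(STROKE), "blobs before,", count_blobs(dilate(STROKE)), "after")
```

On a real photo the radius would have to be tuned so that dots within a character merge while neighbouring characters stay apart.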
Opry99er Posted January 4, 2018
I didn't know ABBYY had an online OCR. That's cool, but if it doesn't work, it's kind of useless to us. Perhaps their commercial software would be more versatile/useful... it is expensive though. At that document warehouse, it was a large printer, scanner, copier deal. I wasn't responsible for the scanning; I just hauled boxes from the 3-level warehouse off a pick list to the operator. It scanned directly to a computer where it was processed, then the documents were stored for 2 months and then we destroyed boxes of documents by the pallet-load.
ti99iuc Posted January 4, 2018
I have used this method a couple of times and still needed to fix all the errors afterwards. I still haven't found a perfect one. Anyway, Acrobat helped me a bit. This is the result of converting the low-res image you attached using the OCR in Acrobat Reader (it is not free, of course).
post-24952-0-74083900-1515094244_thumb.pdf
converted in txt.txt
Edited January 4, 2018 by ti99iuc
+mizapf Posted January 4, 2018
OK, I tried tesseract after applying a blur filter in Gimp to smear the pixels, then saving as grayscale in jpg.
100 DIM BLOCK$(2),PLACE(2),BUILDING(32 2)
110 RANDOMIZE 5
120 REM BOMB CHARACTER
1 30 CALL CHAR ( 1 29 , " 00 1 CBEFF F FEE 1 COO “ )
140 REM CROSSHAIR CHAR
150 CALL CHAR(130,"181818FFFF181818")
160 CALL CLEAR
170 CALL SCREEN(12)
180 FOR J=5 TO 8
190 CALL CDLDRiJ,5,16)
200 NEXT J
210 FOR J=9 TD 12
220 CALL CULUR(J,2,14)
230 NEXT J
240 T=G
250 P=0
250 9:0
270 "=0
280 CALL CLEAR
290 PRINT " AIR DEFENSE"
300 PRINT
310 PRINT
320 PRINT
330 PRINT " do you need instructions ?"
340 PRINT
350 PRINT " type Y or N"
360 FOR 1:1 TD 7
370 PRINT
380 NEXT I
390 CALL KEY(3,Y,STATUS)
400 IF STATUS=0 THEN 390
410 IF Y=ASC("N“)THEN ?50
420 IF Y=ASC£"Y")THEN 520
430 CALL CLEAR
440 PRINT
450 PRINT “ you did not press Y OF N."
450 FOR I=1 TB 13
470 PRINT
480 NEXT I
490 FOR DELAY=1 TO 500
+mizapf Posted January 4, 2018
BTW, I mentioned tesseract because I was highly surprised by its good recognition performance. I OCRed the HFDC manual that you can find on WHTech, and I also did that to the Editor/Assembler manual with very few errors (fewer than 10 on a whole page). Apart from the extra work with Gimp (which could be done automatically by ImageMagick on the command line), I just ran
$ tesseract post.jpg listing
One thing: the output contains lots of empty lines, which are not shown in the listing above.
+mizapf Posted January 4, 2018
OK, another try with ImageMagick. I had to experiment with the blur vector.
$ convert -blur 11x2 post0.png post1.jpg
$ tesseract post1.jpg listing
yields this result:
100 DIM BLOCK$(2),PLACE(2),BUILDING(32 2)
110 RANDOMIZE 5
120 REM BOMB CHARACTER
130 CALL CHAR(129,"OOICBEFFFFBE1C00")
140 REM CROSSHAIR CHAR
150 CALL CHAR(130,"181818FFFF181818")
160 CALL CLEAR
170 CALL SCREEN(12)
180 FOR J=q TO 8
190 CALL COLDR(J,5,16)
200 NEXT J
210 FOR J=9 TO 12
220 CALL COLOR(J,2,14)
230 NEXT J
240 T=O
250 P=O
260 Q=O
270 M=O
280 CALL CLEAR
290 PRINT " AIR DEFENSE“
300 PRINT
310 PRINT
320 PRINT
330 PRINT " do you need instructions ?"
340 PRINT
350 PRINT " type Y or N"
360 FOR I=1 TD 7
370 PRINT
380 NEXT I
190 CALL KEY(3,Y,STATUS)
400 IF STATUS=O THEN 390
410 IF Y=ASC("N")THEN 750
420 IF Y=ASC("Y")THEN 520
430 CALL CLEAR
440 PRINT
450 PRINT " you did not press Y or N."
460 FOR I=1 TD 13
470 PRINT
480 NEXT I
490 FOR DELAY=1 TO 500
Still some trouble with zero versus letter O, and O versus D.
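Several of the remaining confusions (letter O for zero, `TD` for `TO`, `COLDR` for `COLOR`) sit in places where BASIC syntax leaves only one sensible reading, so a small post-processing pass can repair them. The following is a rough sketch under stated assumptions, not part of the workflow described above: `SUBPROGRAMS` is a partial, hand-picked list, `fix_line` and `clean_listing` are hypothetical helper names, and the `=O` rule would misfire on a program that really has a variable named `O`.

```python
import difflib
import re

# Partial list of CALL subprogram names seen in this kind of listing (assumption).
SUBPROGRAMS = ["CHAR", "CLEAR", "COLOR", "HCHAR", "KEY", "SCREEN", "SOUND", "VCHAR"]


def fix_line(line):
    """Apply a few context-based repairs to one OCRed BASIC line."""
    # Normalise typographic quotes that OCR engines tend to emit.
    line = line.replace("\u201c", '"').replace("\u201d", '"')

    # CALL <name>: snap a misread subprogram name to the closest known one.
    def snap(match):
        name = match.group(2)
        close = difflib.get_close_matches(name, SUBPROGRAMS, n=1, cutoff=0.6)
        return match.group(1) + (close[0] if close else name)

    line = re.sub(r"(\bCALL\s+)([A-Z]+)", snap, line)

    # FOR v=a T? b: the two-character word between the bounds must be TO.
    line = re.sub(r"(\bFOR\s+\w+=\S+\s+)T[A-Z0-9](\s+)", r"\1TO\2", line)

    # Hex pattern strings in CALL CHAR cannot contain the letters O or I.
    def fix_hex(match):
        digits = match.group(2).replace("O", "0").replace("I", "1")
        return match.group(1) + digits + match.group(3)

    line = re.sub(r'(\bCALL CHAR\(\d+,")([^"]*)(")', fix_hex, line)

    # A lone letter O right after "=" is read as the digit zero.
    line = re.sub(r"=O\b", "=0", line)
    return line


def clean_listing(text):
    """Drop the empty lines from the OCR output and repair the rest."""
    return [fix_line(raw.strip()) for raw in text.splitlines() if raw.strip()]


if __name__ == "__main__":
    for fixed in clean_listing('190 CALL COLDR(J,5,16)\n\n240 T=O\n360 FOR I=1 TD 7\n'):
        print(fixed)
```

Misread line numbers (such as `190` for `390`) and lower-case intrusions (such as `J=q`) are not handled here and would still need a manual pass against the printout.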
matthew180 Posted January 4, 2018
Nice. That is much better than anything I have tried so far. Thanks for the command line parameters, I might mess around with this a little more tonight.
+mizapf Posted January 4, 2018
The trick is indeed to blur the picture suitably, and maybe to apply some contrast afterwards. As I said, there are still some empty lines, but this is not too difficult to cope with. The language given to tesseract also determines the valid characters. In particular, when I want to make it recognize German, it has to check for umlaut characters (ä, ö, ü), which in English could be read as a, o, u with some speckles. The same goes for French, Spanish, etc. So it is not necessarily a question of vocabulary.
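Since the language mostly controls which characters are considered valid, one option worth trying is to narrow the character set further with Tesseract's `tessedit_char_whitelist` configuration variable. The sketch below only assembles the two command lines from the thread, with the whitelist added; `TI_BASIC_CHARS`, `build_commands`, and `run_ocr` are names invented for this example, the character set is a guess at what a TI BASIC listing needs, and depending on the Tesseract version and engine mode the whitelist may be ignored.

```python
import subprocess

# Assumed character set for a TI BASIC listing; adjust to the printout at hand.
TI_BASIC_CHARS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "$\"(),.:;=<>+-*/&#?!"
)


def build_commands(src, blurred, out_base, blur="11x2", whitelist=TI_BASIC_CHARS):
    """Return the ImageMagick and Tesseract command lines as argument lists."""
    # Same blur step as in the thread: convert -blur 11x2 post0.png post1.jpg
    convert_cmd = ["convert", "-blur", blur, src, blurred]
    # Same OCR step as in the thread, plus an explicit language and whitelist.
    tesseract_cmd = [
        "tesseract", blurred, out_base,
        "-l", "eng",
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]
    return convert_cmd, tesseract_cmd


def run_ocr(src, blurred, out_base):
    """Run both steps; requires ImageMagick and Tesseract to be installed."""
    for cmd in build_commands(src, blurred, out_base):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("post0.png", "post1.jpg", "listing"):
        print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the `"` and `$` characters in the whitelist. The blur value would still need the same experimentation described above.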
+Schmitzi Posted January 4, 2018
PaperPort 14.1 (no special settings, "many" errors)
100 DIM BLOCK$(2),PLACE(2),BUILD1NG(32,2)
110 RANDOMIZE
120 REM BOMB CHARACTER
130 CALL CHAR(129,"001CBEFFFFBEIC00")
140 REM CROSSHAIR CHAR
150 CALL CHAR(130,"181818FFFF181818")
160 CALL CLEAR
170 CALL SCREEN (I2)
180 FOR 3=5 TO 8
190 CALL COLOR(J,506)
200 NEXT J
210 FOR J=9 TO 12
220 CALL COLOR(3,2,14)
230 NEXT
240 T=0
250 P=0
260 Q=0
270 M=0
280 CALL CLEAR
290 PRINT AIR DEFENSE"
300 PRINT
310 PRINT
32 PRINT
3W0 PRINT " do you need instructions ?"
340 PRINT
350 PRINT type or N"
360 FOR :1=1 TO 7
370 PRINT
380 NEXT
390 CALL KEY(3,Y,STATUS)
400 IF STATUS=0 THEN 390
410 IF Y=ASC("N")THEN 750
420 IF Y=ASC("Y")THEN 520
430 CALL CLEAR
440 PRINT
450 PRINT you did not press Y or Nm"
460 FOR 1=1 TO 13
470 PRINT
480 NEXT
490 FOR DELAY=I TO 500
+Schmitzi Posted January 5, 2018
AIRDEFENSE from the big PDF:
So here we go. I had to change the variable 'DIGIT' to 'DIGIT2' in lines 2530 and 2540. Don't ask me why, and don't ask me about further impacts. Also, the space bar does not fire, and my BASIC is very rusty. Maybe somebody has an idea? If a fix is found, I can make a DSK or TIFILES file from it. Let's have fun.
Textfile: AirDefense-TI994A-TI-Basic.txt
Code:
90 REM AIR DEFENSE BY T.L.WAHL
91 TI-BASIC OR EXTENDED BASIC
100 DIM BLOCKS(2),PLACE(2),BUILDING(32,2)
110 RANDOMIZE
120 REM BOMB CHARACTER
130 CALL CHAR(129,"001CBEFFFFBE1C00")
140 REM CROSSHAIR CHARACTER
150 CALL CHAR(130,"181818FFFF181818")
160 CALL CLEAR
170 CALL SCREEN(12)
180 FOR J=5 TO 8
190 CALL COLOR(J,5,16)
200 NEXT J
210 FOR J=9 TO 12
220 CALL COLOR(J,2,14)
230 NEXT J
240 T=0
250 P=0
260 Q=0
270 M=0
280 CALL CLEAR
290 PRINT " AIR DEFENSE"
300 PRINT
310 PRINT " BY T.L. WAHL"
320 PRINT
321 PRINT
322 PRINT
330 PRINT " do you need instructions ?"
340 PRINT
350 PRINT " type Y or N"
360 FOR I=1 TO 7
370 PRINT
380 NEXT I
390 CALL KEY(3,Y,STATUS)
400 IF STATUS=0 THEN 390
410 IF Y=ASC("N") THEN 750
420 IF Y=ASC("Y") THEN 520
430 CALL CLEAR
440 PRINT
450 PRINT " you did not press Y or N."
460 FOR I=1 TO 13
470 PRINT
480 NEXT I
490 FOR DELAY=1 TO 500
500 NEXT DELAY
510 GOTO 280
520 CALL CLEAR
530 PRINT " YOU MUST STOP THE FALLING"
540 PRINT " BOMB BY EXPLODING IT IN"
550 PRINT " MID-AIR"
560 PRINT
570 PRINT " -MOVE THE CROSSHAIR-"
580 PRINT
590 PRINT " left :HOLD THE s KEY"
600 PRINT " right:HOLD THE d KEY"
610 PRINT " up :HOLD THE e KEY"
620 PRINT " down :HOLD THE x KEY"
630 PRINT
640 PRINT "WHEN THE BOMB AND THE"
650 PRINT "CROSSHAIR ARE LINED UP,"
660 PRINT "FIRE BY PRESSING THE SPACE"
670 PRINT "BAR. THE SOONER YOU BET THE"
680 PRINT "BOMB THE HIGHER YOUR SCORE."
690 PRINT
700 PRINT
710 PRINT
720 PRINT " PRESS any key T0 BEGIN"
730 CALL KEY(0,S,STATUS)
740 IF STATUS=0 THEN 730
750 CALL CLEAR
760 CALL COLOR(8,2,1)
770 PRINT " GOOD LUCK!!!"
780 FOR I=1 TO 10
790 PRINT
800 NEXT I
810 IF R=ASC("R") THEN 840
820 GOSUB 2090
830 GOTO 860
840 FOR I=1 TO 250
850 NEXT I
860 CALL CLEAR
870 GOSUB 2300
880 IF T=20 THEN 1860
890 T=T+1
900 CCROSS=16
910 RCROSS=21
920 RBOMB=1
930 CALL SCREEN(6)
940 CBOMB=INT(RND*29)+2
950 H$=STR$(T)
960 ROW=2
970 COL=3
980 GOSUB 2520
990 SCORE=P*Q*10
1000 H$=STR$(SCORE)
1010 ROW=5
1020 GOSUB 2520
1030 FOR I=1 TO 70
1040 NEXT I
1050 FOR I=2 TO 5 STEP 3
1060 CALL HCHAR(I,3,32,6)
1070 NEXT I
1080 OLDRCROSS=RCROSS
1090 OLDCCROSS=CCROSS
1100 CALL KEY(0,A,STATUS)
1110 IF A<>ASC("E") THEN 1130
1120 RCROSS=RCROSS-SGN(RCROSS-1)
1130 IF A<>ASC("X") THEN 1150
1140 RCROSS=RCROSS+SGN(22-RCROSS)
1150 IF A<>ASC("D") THEN 1170
1160 CCROSS=CCROSS+SGN(31-CCROSS)
1170 IF A<>ASC("S") THEN 1190
1180 CCROSS=CCROSS-SGN(CCROSS-2)
1190 IF RBOMB=1 THEN 1210
1200 CALL VCHAR(RBOMB-1,CBOMB,32)
1210 IF (RCROSS=OLDRCROSS)*(CCROSS=OLDCCROSS) THEN 1230
1220 CALL VCHAR(OLDRCROSS,OLDCCROSS,32)
1230 CALL VCHAR(RCROSS,CCROSS,130)
1240 CALL VCHAR(RBOMB,CBOMB,129)
1250 RBOMB=RBOMB+1
1260 IF RBOMB=23 THEN 1540
1270 IF (RCROSS=RBOMB-1)*(CCROSS=CBOMC) THEN 1290
1280 GOTO 1080
1290 CALL KEY(0,B,STATUS)
1300 IF B=32 THEN 1330
1310 GOTO 1080
1320 REM BOMB DESTROYED
1330 RBOMB=RBOMB-1
1340 CALL SCREEN(10)
1350 CALL VCHAR(RBOMB,CBOMB,32)
1360 CNT=0
1370 C1=92
1380 C2=47
1390 FOR I=-1 TO 1 STEP 2
1400 CALL VCHAR(RBOMB+I,CBOMB+I,C1)
1410 CALL VCHAR(RBOMB+I,CBOMB-I,C2)
1420 NEXT I
1430 C1=32
1440 C2=32
1450 IF CNT=1 THEN 1510
1460 CNT=1
1470 FOR VOL=10 TO 30 STEP 5
1480 CALL SOUND(100,-6,VOL)
1490 NEXT VOL
1500 GOTO 1390
1510 P=P+1
1520 Q=Q+(23-RBOMB)
1530 GOTO 880
1540 REM BOMB HITS THE CITY
1550 CALL VCHAR(22,CBOMB,32)
1560 CALL SCREEN(9)
1570 CALL COLOR(12,11,1)
1580 CALL VCHAR(23,CBOMB-1,122)
1590 CALL VCHAR(23,CBOMB,32)
1600 CALL VCHAR(23,CBOMB+1,123)
1610 CALL VCHAR(24,CBOMB-1,124)
1620 CALL VCHAR(24,CBOMB,125)
1630 CALL VCHAR(24,CBOMD+1,126)
1640 FOR I=1 TO 20
1650 NEXT I
1660 CALL COLOR(12,7,1)
1670 CALL SCREEN(12)
1680 FOR I=1 TO 20
1690 NEXT I
1700 CALL SCREEN(7)
1710 FOR VOL=24 TO 1 STEP 4
1720 CALL SOUND(200 -7,VOL)
1730 NEXT VOL
1740 FOR DVOL=1 TO 24 STEP 4
1750 CALL SOUND(200,-7,DVOL)
1760 NEXT DVOL
1770 FOR J=23 TO 24
1780 FOR I=CBOMB-1 TO CBOMB+1
1790 CALL VCHAR(J,I,32)
1800 NEXT I
1810 NEXT J
1820 CALL VCHAR(RCROSS,CCROSS,32)
1830 CALL COLOR(12,2,14)
1840 M=M+1
1850 GOTO 880
1860 CALL CLEAR
1870 CALL SCREEN(4)
1880 CALL COLOR(8,5,16)
1990 PRINT " GAME OVER"
1900 FOR I=1 TO 4
1910 PRINT
1920 NEXT I
1930 PRINT " DESTROYED ";P
1940 PRINT
1950 PRINT " MISSED ";M
1960 PRINT
1970 PRINT " TOTAL POINTS";P*Q*10
1980 FOR I=1 TO 4
1990 PRINT
2000 NEXT I
2010 PRINT " PRESS r TO PLAY AGAIN"
2020 PRINT
2030 PRINT
2040 CALL KEY(0,R,STATUS)
2050 IF STATUS=0 THEN 2040
2060 IF R=ASC("R") THEN 160
2070 END
2080 REM READ CITY DATA
2090 FOR ROW=2 TO 1 STEP -1
2100 FOR COL=1 TO 32
2110 READ BUILDING(COL,ROW)
2120 NEXT COL
2130 NEXT ROW
2140 REM CUSTOM CHAR & COLORS
2150 CALL CHAR(136,"FFABFFABFFABFFFF")
2160 CALL CHAR(128,"003C7EFFFFFF7E42")
2180 CALL CHAR(132,"6060606060606060")
2190 CALL CHAR(133,"607858F8D8F8D8F8")
2200 CALL CHAR(134,"F8A8F8A8F8A8F8F8")
2210 CALL CHAR(135,"C3C3FFABFFABFFFF")
2220 CALL COLOR(14,7,12)
2230 CALL CHAR(122,"8040201008040201")
2240 CALL CHAR(123,"0102040810204080")
2250 CALL CHAR(124,"80E0F8FEFFFFFFFF")
2260 CALL CHAR(125,"814224180081C3E7")
2270 CALL CHAR(126,"01071F7FFFFFFFFF")
2280 RETURN
2290 REM SET UP CITY
2300 FOR ROW=2 TO 1 STEP -1
2310 FOR COL=1 TO 32
2320 BLOCK$(ROW)=BLOCK$(ROW)&CHR$(BUILDING(COL,ROW))
2330 NEXT COL
2340 NEXT ROW
2350 FOR ROW=2 TO 1 STEP -1
2360 FOR COL=1 TO 32
2370 PLACE(ROW)=ASC(SEG$(BLOCK$(ROW),COL,1))
2380 CALL HCHAR(ROW+22,COL,PLACE(ROW))
2390 NEXT COL
2400 NEXT ROW
2410 RETURN
2420 REM CITY DATA
2430 DATA 136,134,131,135,133,136,136,133
2440 DATA 135,136,136,136,133,136,136,135
2450 DATA 135,136,136,134,133,136,136,136
2460 DATA 135,132,136,32,131,135,132,135
2470 DATA 134,133,128,32,132,32,135,32
2480 DATA 32,32,134,132,132,32,133,32
2490 DATA 32,32,128,32,132,32,133,135
2500 DATA 32,132,132,32,128,32,132,32
2510 REM HORIZONTAL # PRINTER
2520 FOR I=1 TO LEN(H$)
2530 DIGIT2=ASC(SEG$(H$,I,1))
2540 CALL HCHAR(ROW,COL+I,DIGIT2)
2550 NEXT I
2560 RETURN
Opry99er Posted January 5, 2018
Sweet!!!
matthew180 Posted January 5, 2018
Nice job! Did you type that in or OCR the PDF?
ralphb Posted January 5, 2018
About ten years ago, I signed up for the OmniScan SDK, which even required signing an NDA. My idea was to create a BASIC "language" that would run after the raw recognition to remove the most obvious errors. For each letter on the page, OmniScan creates a list of potential glyphs, together with their probabilities. In theory, you could select those glyphs with the highest probability that made sense according to a BASIC grammar. To distinguish variables, say A0$ from AO$, you'd need even more heuristics. All in all, this seemed too time-consuming to me, so it never went anywhere.
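The glyph-probability idea can be shown in miniature. This is a toy sketch only: it does not use or imitate the OmniScan SDK, the candidate lists and probabilities are invented, and a set of keywords stands in for a real BASIC grammar. For each character position there is a list of (glyph, probability) pairs, and the function picks the most probable combination that spells a known keyword.

```python
from itertools import product

# Stand-in for a BASIC grammar: a small set of keywords (assumption).
KEYWORDS = {"CALL", "COLOR", "FOR", "NEXT", "PRINT", "THEN", "TO"}


def naive_reading(candidates):
    """Take the single most probable glyph at every position."""
    return "".join(max(options, key=lambda pair: pair[1])[0] for options in candidates)


def best_keyword(candidates, vocabulary=KEYWORDS):
    """Return (word, score) for the most probable reading that is a keyword.

    `candidates` holds one list of (glyph, probability) pairs per position.
    Returns (None, 0.0) if no combination spells a known keyword.
    """
    best_word, best_score = None, 0.0
    for combo in product(*candidates):
        word = "".join(glyph for glyph, _ in combo)
        if word not in vocabulary:
            continue
        score = 1.0
        for _, probability in combo:
            score *= probability
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Invented recognition result for the word printed as COLOR.
CANDIDATES = [
    [("C", 0.90), ("G", 0.10)],
    [("O", 0.60), ("0", 0.40)],
    [("L", 0.95), ("I", 0.05)],
    [("D", 0.55), ("O", 0.45)],
    [("R", 0.90), ("B", 0.10)],
]

if __name__ == "__main__":
    print(naive_reading(CANDIDATES), "->", best_keyword(CANDIDATES)[0])
```

Enumerating every combination is fine for short keywords but grows exponentially with length, and variable names such as A0$ versus AO$ cannot be settled by a vocabulary at all, which matches the point above that more heuristics would be needed.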
+Schmitzi Posted January 5, 2018
I made screenshot snips from the PDF, to get the right sequence. Then I made a new, searchable PDF with OmniPage and copied the text into a TXT file. Then I had to manually check and correct it against the original PDF. That was some work. Please do not forget my comment that fire does not work (or do I not know how to play it?). And the mystery with the DIGIT variable: I would like to know the reason.
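Part of the manual check against the original can be mechanised with simple cross-references. The sketch below is an assumption-laden helper, not a TI BASIC parser: `RESERVED` is a partial keyword list and `check_listing` is a made-up name. It reports line numbers that are out of order, jumps to lines that do not exist, and variable names that occur only once. In the listing posted above, for example, `CBOMC` on line 1270 and `CBOMD` on line 1630 each appear only once while the surrounding lines use `CBOMB`, so they may be worth comparing with the printout.

```python
import re
from collections import Counter

# Partial list of TI BASIC keywords and built-ins (assumption, extend as needed).
RESERVED = {
    "ASC", "CALL", "CHAR", "CHR$", "CLEAR", "COLOR", "DATA", "DIM", "ELSE",
    "END", "FOR", "GOSUB", "GOTO", "HCHAR", "IF", "INT", "KEY", "LEN", "NEXT",
    "PRINT", "RANDOMIZE", "READ", "REM", "RETURN", "RND", "SCREEN", "SEG$",
    "SGN", "SOUND", "STEP", "STR$", "THEN", "TO", "VCHAR",
}


def check_listing(lines):
    """Return a list of human-readable warnings for a BASIC listing."""
    problems = []
    numbers = []
    targets = []
    uses = Counter()
    for raw in lines:
        match = re.match(r"\s*(\d+)\s+(.*)", raw)
        if not match:
            problems.append("unnumbered line: " + raw.strip())
            continue
        number, body = int(match.group(1)), match.group(2)
        if numbers and number <= numbers[-1]:
            problems.append(f"line {number} is out of order after {numbers[-1]}")
        numbers.append(number)
        if body.startswith("REM"):
            continue
        # Ignore text inside string literals when looking for names.
        code = re.sub(r'"[^"]*"', '""', body)
        for target in re.findall(r"\b(?:THEN|ELSE|GOTO|GOSUB)\s+(\d+)", code):
            targets.append((number, int(target)))
        for name in re.findall(r"[A-Z][A-Z0-9]*\$?", code):
            if name not in RESERVED:
                uses[name] += 1
    known = set(numbers)
    for source, target in targets:
        if target not in known:
            problems.append(f"line {source} jumps to missing line {target}")
    for name, count in sorted(uses.items()):
        if count == 1:
            problems.append(f"variable {name} appears only once")
    return problems


# Small made-up excerpt with one of each kind of slip.
SAMPLE = [
    "100 CBOMB=5",
    "110 CCROSS=5",
    "120 IF CCROSS=CBOMC THEN 160",
    "130 CBOMB=CBOMB+1",
    "150 GOTO 120",
    "140 END",
]

if __name__ == "__main__":
    for problem in check_listing(SAMPLE):
        print(problem)
```

A name used only once is not always an error (a loop counter read by `CALL KEY` may legitimately be unused), so the output is a list of places to look, not a list of bugs.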
+OLD CS1 Posted January 6, 2018
What engine does Neat Receipts use? Supposedly that system is very accurate with receipts, which are far worse than magazine listings.