
Test Files for Uncrunching/Parsing Basic Statements

Crunch Uncrunch Parse Basic Interpreter TiFile Decode INTSIM Files Tools

22 replies to this topic

#1 kl99 OFFLINE  

kl99

    Dragonstomper

  • 865 posts
  • Location:Vienna, Austria

Posted Mon Dec 7, 2015 3:03 AM

Here is something I worked on over the weekend.

It is a set of test files for un-crunching BASIC programs.

The txt files are the LIST output of those programs, so they match exactly the way the BASIC interpreter uncrunches each line of the program. (Classic99 with LIST "CLIP" was used to generate the txt files.)
If you compare that to the available tools (TI99DIR, imagetool,...) you will see that all have some issues in recreating that same syntax.
Example:
Output from TI99dir of Line 4 of XBCMD1

4 ACCEPTVALIDATE("YN"):R$

Output from TiImageTool of Line 4 of XBCMD1

4 ACCEPT VALIDATE ("YN"):R$

while if you LIST the program in the TI99 you will see:

4 ACCEPT VALIDATE("YN"):R$

XBCMD1 contains examples for XB commands from A-P
XBCMD2 contains examples for XB commands from P-Z
XBCMD3 contains quote examples
XBCMD4 contains strings covering all character codes from 0 to 255. Here the txt file is wrong for line 100, because some control character gets interpreted instead of being written out.

 

The examples for XB commands are mostly taken from the XB Manual and are only extended by me if insufficient.

I will create some more test files and add them as unit tests to Web99, so if I touch the code I immediately see whether I broke some behavior. Feel free to do the same for your projects, or use the files to manually test your tool during development.
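A unit test of this kind can be as small as a line-by-line comparison between the reference listing and a tool's output. The following Python sketch is only an illustration and is not taken from Web99; the helper names `first_mismatch` and `show` are made up here, and the two sample strings are line 4 of XBCMD1 as quoted above.

```python
def first_mismatch(reference: str, decoded: str):
    """Return (index, reference_line, decoded_line) for the first differing line, or None."""
    ref_lines = reference.splitlines()
    dec_lines = decoded.splitlines()
    for i in range(max(len(ref_lines), len(dec_lines))):
        ref = ref_lines[i] if i < len(ref_lines) else None
        dec = dec_lines[i] if i < len(dec_lines) else None
        if ref != dec:
            return i, ref, dec
    return None


def show(line):
    """Replace spaces with a middle dot so spacing differences are visible."""
    return "<missing>" if line is None else line.replace(" ", "\u00b7")


# Reference: LIST output. Decoded: what one of the tools produced.
reference = '4 ACCEPT VALIDATE("YN"):R$\n'
decoded = '4 ACCEPTVALIDATE("YN"):R$\n'

result = first_mismatch(reference, decoded)
if result is not None:
    index, ref, dec = result
    print(f"line index {index}: expected {show(ref)} but got {show(dec)}")
```

Making the spaces visible matters because most of the differences in this thread are single spaces that are easy to overlook in a plain diff.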

 

Btw: The programs can be loaded, but running them makes no sense. Their purpose is to find out whether your tool decodes a TiFile into BASIC source code the same way the BASIC interpreter does.

 

Please provide feedback if you want to use it, so I can provide you with updates.

 

Attached File  unittest.dsk   180KB   14 downloads

Attached File  XBCMD1.txt   5.25KB   12 downloads

Attached File  XBCMD2.txt   2.21KB   7 downloads

Attached File  XBCMD3.txt   348bytes   7 downloads

Attached File  XBCMD4.txt   425bytes   8 downloads


Edited by kl99, Mon Dec 7, 2015 3:34 AM.


#2 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Mon Dec 7, 2015 3:33 AM

Here are some Examples of wrongly decoded statements:

 

TI99Dir:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

TiImageTool:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

Pc99 Bas2Asc:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

TI-99/4a Real Output:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY ,INPUT ,VARIABLE

-

TI99Dir:

82 FOR C=1 TO-3

TI-99/4a Real Output:

82 FOR C=1 TO -3

-

TI99Dir:

48 DATA ""THIS HAS QUOTES""

Pc99 Bas2Asc:

48 DATA ""THIS HAS QUOTES""

TI-99/4a Real Output:

48 DATA """THIS HAS QUOTES"""

-

TI99Dir:

27 OPEN #23:"DSK.MYDISK.X",RELATIVE100,INTERNAL,UPDATE,FIXED

TI-99/4a Real Output:

27 OPEN #23:"DSK.MYDISK.X",RELATIVE 100,INTERNAL,UPDATE,FIXED

-

TiImageTool:

64 DISPLAY AT(R,C) SIZE(FIELDLEN) BEEP :X$

Pc99 Bas2Asc:

64 DISPLAY AT(R,C) SIZE(FIELDLEN) BEEP:X$

TI-99/4a Real Output:

64 DISPLAY AT(R,C)SIZE(FIELDLEN)BEEP:X$

-

TiImageTool:

74 PRINT 356; TAB (18);"NAME"

TI-99/4a Real Output:

74 PRINT 356;TAB(18);"NAME"

-

TI99Dir:

140 STR$=""""PHISHA""""

TiImageTool:

140 STR$="""""" PHISHA """"""

Pc99 Bas2Asc:

140 STR$="""" PHISHA""""

TI-99/4a Real Output:

140 STR$="""""" PHISHA""""""

-

TI99Dir:

190 STR$="" QHISHA""

Pc99 Bas2Asc:

190 STR$="" QHISHA""

TI-99/4a Real Output:

190 STR$=""" QHISHA"""
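The quote examples above are consistent with one rule: inside a quoted string, every quote character is doubled, and the result is wrapped in one more pair of quotes. The Python sketch below assumes that the tokenized program stores the string content with single quote characters; the thread implies this but does not state it, and the function name `list_quoted` is hypothetical.

```python
def list_quoted(content: str) -> str:
    """Render string content as a listing shows it: inner quotes doubled, then wrapped in quotes."""
    return '"' + content.replace('"', '""') + '"'


# Assumed stored content for lines 48 and 190 of the examples above.
print("48 DATA " + list_quoted('"THIS HAS QUOTES"'))
print("190 STR$=" + list_quoted('" QHISHA"'))
```

A decoder that only wraps the content in quotes without doubling the inner ones produces the shorter runs of quotes shown for TI99Dir and Pc99 Bas2Asc.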

Edited by kl99, Mon Dec 7, 2015 5:38 AM.


#3 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Mon Dec 7, 2015 4:56 AM

xdt99 already has astoundingly good results. The only thing I could detect with this set of test files was that the spacing at the end of the line was not identical for some commands.

xdt99 v1.5.0

"22 RESTORE "

TI-99/4a Real Output:

"22 RESTORE"

The same is true if the statement ends with any of these tokens:

DELETE, END, NEXT, PRINT, STOP, SUBEND, SUBEXIT, TRACE, UNBREAK, UNTRACE.

 

The output for XBCMD4 is even closer to the LIST output on the TI screen than what LIST "CLIP" produces in Classic99 for this special-characters test.

Web99 was failing in some of the same cases as TI99Dir and TiImageTool. I have already updated the code so that no known issue is left.

 

Any more known tools that decode a PROGRAM into the Basic Source Code? I think I have one in the PC99 Emulator, but besides that?



#4 ralphb OFFLINE  

ralphb

    Dragonstomper

  • 623 posts
  • Location:Germany

Posted Mon Dec 7, 2015 12:02 PM

xdt99 already has astoundingly good results. The only thing I could detect with this set of test files was that the spacing at the end of the line was not identical for some commands.

 

Well, whitespace at the end of the line is considered insignificant  ;) , but I guess I could apply a strip() to each line.  Thanks for the hint, I guess I didn't even notice, as my diff tool won't highlight those.
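As a rough illustration of that idea (a sketch, not the actual xdt99 code), trailing blanks can be removed per line. It uses `rstrip` rather than `strip` so that only the end of each line is touched, and it assumes that the real LIST output never ends a line with a significant space; the function name is made up.

```python
def strip_trailing_spaces(listing: str) -> str:
    """Remove trailing spaces from every line of a listing, leaving everything else as is."""
    return "\n".join(line.rstrip(" ") for line in listing.split("\n"))


print(repr(strip_trailing_spaces("22 RESTORE \n23 PRINT ")))
```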

 

Note that replicating the BASIC interpreter output is surprisingly difficult.  The biggest headaches are space/no space around parentheses, "::" vs ": :", DATA, and comments.  Your DATAs are quite benign; try some of these:

.

DATA IF THEN ELSE,USING BEEP REM PRINT
DATA ,A,,B,"",C,
DATA A B C, D E F ,"GHI"," J K "
DATA A, B ,"C", " C " ,,"", , "" ,
DATA 123, 1 2 3 ,1+2,1.1
DATA A:B,C;D,E(F),A"B"C,"A,B"
DATA DATA +~/ :: PRINT A,B
DATA DATA +~/ :: PRINT A,B!C
DATA " DATA +~/ :: PRINT A,B"
DATA A,100,A$,200,1E9,1E9X

.

You can have a look at my test cases; they're used for both encoding and decoding.

 

EDIT: To clarify, those programs together with their TI listing are on this disk image.


Edited by ralphb, Mon Dec 7, 2015 12:04 PM.


#5 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,506 posts
  • Location:Germany

Posted Mon Dec 7, 2015 1:05 PM

TI-99/4a Real Output:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY ,INPUT ,VARIABLE

I'd rather consider that as a glitch in the real iron. :)

Concerning TIImageTool, the only relevant issue in my view is
 

TiImageTool:

140 STR$="""""" PHISHA """"""
TI-99/4a Real Output:
140 STR$="""""" PHISHA""""""

because the string is actually different. I'll have a look.

 

Proper decoding concerning whitespace is really tiresome. You always think you've finally found some general rules that apply to all commands and situations, just to find an example that does not work as expected.



#6 Schmitzi OFFLINE  

Schmitzi

    River Patroller

  • 4,476 posts
  • ToXiC
  • Location:Germany

Posted Mon Dec 7, 2015 1:31 PM

.

Hi,

 

Regarding this issue, I've noticed something too:

Sometimes BASIC programs just break because of a missing space in the BASIC code

…melting2commands together…

I never tracked that down, although I wanted to, so I don't know on which "layer"

this happens in particular - maybe on a DSK's way to a TI or emulator -

But if so, here are some updated lists with file-transfer (XFER) and HDX-related stuff

that might affect (BASIC) files while in transfer to "the other side" (?)

 

Maybe, just as an overview for a better brainstorming :0)

 

 

EMULATORs:

Attached File  TI-99-EMULATORS-v2.04.pdf   38.05KB

 

XFER/HDX (without emulators):

Attached File  TI-99-4A-IO-XFER-HDX-related-NO-EMUs-V1.00a-beta.pdf   52.38KB

 

-

 

XFER/HDX ALL (with emulators):

Attached File  TI-99-4A-IO-XFER-HDX-related-ALL-V1.00a-beta.pdf   55.9KB

 

 

 

xXx schmitzi

 

RMSAAED



#7 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Mon Dec 7, 2015 2:44 PM

Well, you may consider such spacing a small, minor or even wrong thing done by the real interpreter.

The whole purpose of this is to have one type of standard / reference (which can only be the real iron) for decoding a Basic Source Code Listing from a binary.

The hash code to identify unique / duplicates of a Basic Program will be based on the Basic Source Text rather than the binary file.

This hash code will further be used to bind a TiFile to meta data, instructions, other TiFiles, pdfs, pics, comments, or even whole forum threads to it.

To be able to use that same data in/from other tools the hash code for a Program has to be the same across all the tools.

A hash code changes completely if even a single bit or byte differs.

 

Quoting myself from the Web99 Thread on why it would be wrong to generate a hash based on the binary file:

 

1. Basic Programs:
If you want to find out whether two Basic Programs are identical, you shouldn't do a binary comparison of their PROGRAM Files.

We are dealing with memory images saved as PROGRAM files. The binary data depends on what was in VDP RAM and RAM when the SAVE operation was done, and therefore on how the program was written and edited. Basic programs are not stored in logical order. There is a Line Number Table which points to the memory areas containing the content of each line, and those areas can be spread anywhere within a large region of memory. The computer adds the last edited line on top of its VDP RAM, and there is no re-sorting of any data during SAVE. As a result, two people typing in the same program from a magazine (without any errors) will most probably end up with two different binary PROGRAM files representing the same Basic source code.

In other words, two PROGRAM files can differ on the binary level but still contain the same Basic program. They should therefore be identified as duplicates of each other.

Rather, you need to extract the Basic source code from those PROGRAM files and compare that in order to find out whether they are identical or not. Such a source code comparison could even ignore different line numbers (10s instead of 100s,...) and different variable names (N$ instead of NAME$). Further, the embedded assembler code that might be contained within the PROGRAM file should be saved as such so that it is not lost.

When calculating the Hash ID for Basic Files, we make it based on the Basic Source Code.
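A minimal sketch of such a source-based hash, in Python. The thread does not say which hash function or normalization Web99 uses, so SHA-256, the removal of trailing spaces and empty lines, and the name `listing_hash` are all assumptions made for illustration.

```python
import hashlib


def listing_hash(listing: str) -> str:
    """Hash a BASIC listing after a light, assumed normalization of line ends."""
    # Assumption: trailing spaces, empty lines and the line-ending style carry no meaning.
    lines = [line.rstrip(" ") for line in listing.splitlines() if line.strip()]
    canonical = "\n".join(lines) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


same_a = "10 PRINT\n20 END\n"
same_b = "10 PRINT \r\n20 END\r\n"   # same program, different line ends
other = "10 PRINT\n20 STOP\n"

print(listing_hash(same_a) == listing_hash(same_b))
print(listing_hash(same_a) == listing_hash(other))
```

Because the hash is computed from the decoded text, two tools only agree on it if they decode every line identically, which is why the exact spacing discussed in this thread matters.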

 

So consider this a pre-step for features that will be coming.


Edited by kl99, Mon Dec 7, 2015 2:45 PM.


#8 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Mon Dec 7, 2015 3:06 PM

 

You can have a look at my test cases; they're used for both encoding and decoding.

 

EDIT: To clarify, those programs together with their TI listing are on this disk image.

 

Thank you VERY much for these test files. Currently checking Web99 against them :)



#9 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,506 posts
  • Location:Germany

Posted Mon Dec 7, 2015 3:26 PM

Well, you may consider such spacing a small, minor or even wrong thing done by the real interpreter.

The whole purpose of this is to have one type of standard / reference (which can only be the real iron) for decoding a Basic Source Code Listing from a binary.

The hash code to identify unique / duplicates of a Basic Program will be based on the Basic Source Text rather than the binary file.

 

I understand. What about skipping all whitespaces when calculating the hash code? The thing is that whitespace differences are usually syntactically insignificant (unless they occur within strings).
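A small Python sketch of that idea: drop spaces that are outside of quoted strings before hashing. The function name is made up, and the sketch deliberately ignores REM comments and unquoted DATA items, where spaces can be significant, so it is not a complete normalizer.

```python
def drop_spaces_outside_strings(line: str) -> str:
    """Remove spaces that are not inside a double-quoted string."""
    out = []
    in_string = False
    for ch in line:
        if ch == '"':
            # A doubled quote inside a string toggles twice, so the state stays correct.
            in_string = not in_string
        if ch == " " and not in_string:
            continue
        out.append(ch)
    return "".join(out)


print(drop_spaces_outside_strings("82 FOR C=1 TO -3"))
print(drop_spaces_outside_strings('74 PRINT 356; TAB (18);" NAME "'))
```

With this normalization, `82 FOR C=1 TO-3` and `82 FOR C=1 TO -3` from the examples earlier in the thread collapse to the same text.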



#10 ti99sim OFFLINE  

ti99sim

    Space Invader

  • 14 posts

Posted Thu Dec 10, 2015 4:41 PM

Any more known tools that decode a PROGRAM into the Basic Source Code? I think I have one in the PC99 Emulator, but besides that?

 

I haven't tried your test samples yet, but there is a 'list' utility included with TI-99/Sim that decodes BASIC programs.



#11 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Fri Dec 11, 2015 8:50 AM

Hi ti99sim! Yes, I checked that utility out, but I wasn't able to compile it as suggested on your website. I didn't find the sln file recommended for Visual Studio.

I took the old precompiled 0.0.9 version for Windows and saw some statements that differ from the real output. Later I compared the source code of your tool and figured that you have changed a lot in the list utility itself, so posting the results of the old version would be irrelevant.

I didn't have time to contact you about my troubles compiling the latest version of ti99sim. Glad you are here! :)


Edited by kl99, Fri Dec 11, 2015 8:51 AM.


#12 ralphb OFFLINE  

ralphb

    Dragonstomper

  • 623 posts
  • Location:Germany

Posted Fri Dec 11, 2015 10:26 AM

The whole purpose of this is to have one type of standard / reference (which can only be the real iron) for decoding a Basic Source Code Listing from a binary.

The hash code to identify unique / duplicates of a Basic Program will be based on the Basic Source Text rather than the binary file.

This hash code will further be used to bind a TiFile to meta data, instructions, other TiFiles, pdfs, pics, comments, or even whole forum threads to it.

To be able to use that same data in/from other tools the hash code for a Program has to be the same across all the tools.

 

If you really strive for tool-independent, universally unique identifiers for all TI-related files I would probably hash the normalized token stream instead of the source code, and even this might fail for trickery such as embedded assembly code.

 

But to provide a simple example, attached are two plain BASIC files (without embedded assembly or other pointer manipulation!) that list the same, but produce different output:

 

Attached File  duplist.png   74.76KB   1 downloads

 

Yes, it's more of a tease (and a bit of a cheat) than an actual counter argument ...   ;)

 

In any case, I would very much welcome any support to organize and deduplicate my TI software collection, so I'm looking forward to your tool!

Attached Files



#13 kl99 OFFLINE  

kl99

    Dragonstomper

  • Topic Starter
  • 865 posts
  • Location:Vienna, Austria

Posted Sat Dec 12, 2015 7:34 AM

Very nice input.

I thought about using the tokenized format to calculate the hash as well, but unlike the binary file, with the lines sorted first.

There are some programs having multiple lines with the same number and/or multiple lines with 0. But I think I handle all cases of that already.
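A sketch of hashing a sorted token stream could look like the Python below. It assumes the lines have already been extracted from the PROGRAM file as (line number, token bytes) pairs in storage order; the extraction itself, the function name and the byte values in the example are placeholders, not real token codes. Python's sort is stable, so lines that share a line number keep their storage order.

```python
import hashlib


def token_stream_hash(lines):
    """Hash (line_number, token_bytes) pairs after sorting them by line number."""
    ordered = sorted(lines, key=lambda entry: entry[0])  # stable sort
    digest = hashlib.sha256()
    for number, tokens in ordered:
        # Length-prefix each line so that line boundaries are part of the hash.
        digest.update(number.to_bytes(2, "big"))
        digest.update(len(tokens).to_bytes(2, "big"))
        digest.update(tokens)
    return digest.hexdigest()


# Placeholder bytes standing in for crunched line contents.
typed_in_order = [(10, b"\x01\x02"), (20, b"\x03")]
edited_later = [(20, b"\x03"), (10, b"\x01\x02")]

print(token_stream_hash(typed_in_order) == token_stream_hash(edited_later))
```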

Well it might take some iterations for finding the best algo/method but the final goal is to really identify all duplicates.

I still have to check where the embedded assembler is stored in PROGRAM files, and how best to store it next to the Basic code so that the final PROGRAM can be generated again when exporting it to the TI-99 or emulators. Does any tool currently visualize the embedded assembler in any way?



#14 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Thu Jan 7, 2016 7:52 AM

Great input and thank you all for the test cases! I saw that I need some more work for my Basic decoder in my TI-Disk Manager. There are some serious issues to solve...

 

By the way, is there any notation like the Backus–Naur Form of the TI (Extended) Basic syntax? I think it would be a great help for all developers to have an exact description of the syntax!

Unfortunately I don't know whether such a description has ever existed. But perhaps one of us has one, or has in mind to create one? THAT would really be great :-)



#15 ralphb OFFLINE  

ralphb

    Dragonstomper

  • 623 posts
  • Location:Germany

Posted Mon Jan 11, 2016 10:42 AM

By the way, is there any notation like the Backus–Naur Form of the TI (Extended) Basic syntax? I think it would be a great help for all developers to have an exact description of the syntax!

 

Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming.  Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.

 

But I'd guess the more accurate the grammar, the less readable it'll be ...



#16 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Mon Jan 11, 2016 1:04 PM

Hey Ralph, this is great! Thanks for this contribution, I'll have a look at it soon.



#17 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Mon Jan 18, 2016 10:46 AM

Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming.  Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.
 
But I'd guess the more accurate the grammar, the less readable it'll be ...

 
Hi Ralph, I had a first look at your BNF, really nice work. I haven't gone through all the details yet, but one of the first things I looked at more closely was the DISPLAY statement.
I'm not sure, but I think I found an issue.
 
Here is what is currently checked in to your git repository. At line 97 of Xbas99.bnf you'll find:

s_display ::=
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? (OP_COLON a_print)? |
     a_using (OP_COLON a_print)? |
     a_print)?

 This is what should be correct in my opinion:

s_display ::= 
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? OP_COLON |
     a_using OP_COLON)?
    a_print?

 
 I only reviewed the syntax, I didn't check what IntelliJ IDEA makes of it.


Edited by HackMac, Mon Jan 18, 2016 11:31 AM.


#18 ralphb OFFLINE  

ralphb

    Dragonstomper

  • 623 posts
  • Location:Germany

Posted Mon Jan 18, 2016 1:11 PM

Here is what is currently checked in to your git repository. At line 97 of Xbas99.bnf you'll find:

s_display ::=
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? (OP_COLON a_print)? |
     a_using (OP_COLON a_print)? |
     a_print)?

 This is what should be correct in my opinion:

s_display ::= 
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? OP_COLON |
     a_using OP_COLON)?
    a_print?

 

.

Those are very similar, but your version lacks this valid statement:

.

DISPLAY USING "#"

.

Not a very useful statement, I admit, but it's valid Extended BASIC.
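The difference between the two rules can also be checked mechanically. The following Python sketch is an illustration and not part of xdt99: it replaces each sub-rule with a single hypothetical symbol (D for a_display, U for a_using, P for a_print, a colon for OP_COLON) and rewrites both versions of s_display as regular expressions over those symbols. W_DISPLAY is left out because both rules start with it.

```python
import re

# Hypothetical one-letter stand-ins for the grammar symbols:
#   D = a_display, U = a_using, P = a_print, ":" = OP_COLON
RULE_XDT99 = re.compile(r"(D+(:U)?(:P)?|U(:P)?|P)?")
RULE_PROPOSED = re.compile(r"(D+(:U)?:|U:)?P?")


def accepts(rule, symbols):
    """True if the whole symbol string is derivable from the rule."""
    return rule.fullmatch(symbols) is not None


for symbols in ["", "P", "U", "U:P", "D", "D:U", "D:U:P", "D:"]:
    print(f"{symbols!r:8} xdt99={accepts(RULE_XDT99, symbols)} "
          f"proposed={accepts(RULE_PROPOSED, symbols)}")
```

Under these assumptions the shape "U", which corresponds to DISPLAY USING "#", is accepted only by the xdt99 rule, while "D:" (options followed by a bare colon) is accepted only by the proposed rule.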

 

Do you have an example where my grammar rule does not apply?



#19 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Mon Jan 18, 2016 1:40 PM

Do you have an example where my grammar rule does not apply?

 

Yes, I see, my version is not correct. I didn't try it on 'real iron', I only compared it with the Basic documentation. And while thinking about it, "sah ich den Wald vor lauter Bäumen nicht mehr" (I could no longer see the forest for the trees). I think I should think it over again tomorrow, when I'm relaxed. :-)



#20 Ksarul OFFLINE  

Ksarul

    Quadrunner

  • 5,139 posts

Posted Mon Jan 18, 2016 1:59 PM

That one is the same in English too: I couldn't see the forest for the trees (I have also seen "for the trees" as "because of all the trees").



#21 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,506 posts
  • Location:Germany

Posted Mon Jan 18, 2016 2:16 PM

When you learn a really different language like Arabic (which I'm doing a little bit, not too seriously but some basics, just for interest's sake) you immediately realize how closely related German and English actually are.



#22 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Mon Jan 18, 2016 2:50 PM

For your interest, some quotations from Wikipedia's article about the ancient language of the Anglo-Saxons:

 

"Old English or Anglo-Saxon is the earliest historical form of the English language, spoken in England and southern and eastern Scotland in the early Middle Ages. [...] Old English developed from a set of Anglo-Frisian or North Sea Germanic dialects originally spoken by Germanic tribes traditionally known as the Angles, Saxons, and Jutes. [...] Old English is one of the West Germanic languages, and its closest relatives are Old Frisian and Old Saxon. [...] Old English grammar is quite similar to that of modern German: nouns, adjectives, pronouns, and verbs have many inflectional endings and forms, and word order is much freer."



#23 HackMac OFFLINE  

HackMac

    Chopper Commander

  • 164 posts
  • Skywalker
  • Location:Germany

Posted Thu Jan 21, 2016 1:40 PM

Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming.  Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.

 

Okay Ralph, I had another look at your BNF and found something you could improve.

In your definition file you'll find the following definition starting at line 142:

private a_open ::=
    W_RELATIVE | W_SEQUENTIAL |
    W_DISPLAY | W_INTERNAL |
    W_INPUT | W_OUTPUT | W_APPEND | W_UPDATE |
    W_FIXED nexpr? | W_VARIABLE nexpr? |
    W_PERMANENT

You can extend W_RELATIVE and W_SEQUENTIAL with an optional numeric expression, if you like.

 

P.S.

Did you receive my PM?






