
Test Files for Uncrunching/Parsing Basic Statements


kl99


Here is something I worked on over the weekend.

These are some test files for un-crunching BASIC programs.

The txt files are the LIST output of those programs, so they match exactly the way the BASIC interpreter un-crunches each line of the program. (Classic99 was used with LIST "CLIP" to generate those txt files.)
If you compare that to the available tools (TI99Dir, TiImageTool, ...), you will see that all of them have some issues recreating that same syntax.
Example:
Output from TI99dir of Line 4 of XBCMD1

4 ACCEPTVALIDATE("YN"):R$

Output from TiImageTool of Line 4 of XBCMD1

4 ACCEPT VALIDATE ("YN"):R$

whereas if you LIST the program on the TI-99 you will see:

4 ACCEPT VALIDATE("YN"):R$

XBCMD1 contains examples for XB commands from A-P
XBCMD2 contains examples for XB commands from P-Z
XBCMD3 contains quote examples
XBCMD4 contains all characters within strings from 0 to 255. Here the txt file is not reliable for line 100, because some control characters are interpreted rather than listed.

 

The examples for XB commands are mostly taken from the XB manual; I extended them only where they were insufficient.

I will create some more test files and add them as unit tests to Web99, so that if I touch the code, I immediately see whether I broke some behavior. Feel free to do the same for your projects, or use the files to manually test your tool during development.
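As a rough idea of what such a unit test could look like, here is a minimal Python sketch. It is not the Web99 code; the function name `compare_listings` and the two sample listings are made up for illustration, using line 4 of XBCMD1 from above.

```python
def compare_listings(expected, actual):
    """Return (position, expected_line, actual_line) for every differing line.

    position is the 1-based position within the listing, not the BASIC
    line number. Lines are split on "\n" only, so other control characters
    inside strings (as in XBCMD4) do not break a line apart.
    """
    exp_lines = expected.split("\n")
    act_lines = actual.split("\n")
    # Pad the shorter listing so missing lines are reported as well.
    length = max(len(exp_lines), len(act_lines))
    exp_lines += [""] * (length - len(exp_lines))
    act_lines += [""] * (length - len(act_lines))
    return [
        (pos, exp, act)
        for pos, (exp, act) in enumerate(zip(exp_lines, act_lines), start=1)
        if exp != act
    ]


# Reference LIST output versus the output of a (hypothetical) decoder.
reference = '4 ACCEPT VALIDATE("YN"):R$\n82 FOR C=1 TO -3\n'
decoded = '4 ACCEPTVALIDATE("YN"):R$\n82 FOR C=1 TO -3\n'

for pos, want, got in compare_listings(reference, decoded):
    # repr() makes missing or extra spaces visible in the report.
    print(f"listing line {pos}: expected {want!r}, got {got!r}")
```

Printing with `repr()` is deliberate: most of the differences in this thread are single spaces, which are easy to miss otherwise.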

 

Btw: The programs can be loaded, but running them makes no sense. Their purpose is to find out whether your tool decodes a TiFile into BASIC source code the same way the BASIC interpreter does.

 

Please let me know if you want to use them, so I can provide you with updates.

 

unittest.dsk

XBCMD1.txt

XBCMD2.txt

XBCMD3.txt

XBCMD4.txt

Edited by kl99

Here are some Examples of wrongly decoded statements:

 

TI99Dir:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

TiImageTool:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

Pc99 Bas2Asc:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE

TI-99/4a Real Output:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY ,INPUT ,VARIABLE

-

TI99Dir:

82 FOR C=1 TO-3

TI-99/4a Real Output:

82 FOR C=1 TO -3

-

TI99Dir:

48 DATA ""THIS HAS QUOTES""

Pc99 Bas2Asc:

48 DATA ""THIS HAS QUOTES""

TI-99/4a Real Output:

48 DATA """THIS HAS QUOTES"""

-

TI99Dir:

27 OPEN #23:"DSK.MYDISK.X",RELATIVE100,INTERNAL,UPDATE,FIXED

TI-99/4a Real Output:

27 OPEN #23:"DSK.MYDISK.X",RELATIVE 100,INTERNAL,UPDATE,FIXED

-

TiImageTool:

64 DISPLAY AT(R,C) SIZE(FIELDLEN) BEEP $

Pc99 Bas2Asc:

64 DISPLAY AT(R,C) SIZE(FIELDLEN) BEEP:X$

TI-99/4a Real Output:

64 DISPLAY AT(R,C)SIZE(FIELDLEN)BEEP:X$

-

TiImageTool:

74 PRINT 356; TAB (18);"NAME"

TI-99/4a Real Output:

74 PRINT 356;TAB(18);"NAME"

-

TI99Dir:

140 STR$=""""PHISHA""""

TiImageTool:

140 STR$="""""" PHISHA """"""

Pc99 Bas2Asc:

140 STR$="""" PHISHA""""

TI-99/4a Real Output:

140 STR$="""""" PHISHA""""""

-

TI99Dir:

190 STR$="" QHISHA""

Pc99 Bas2Asc:

190 STR$="" QHISHA""

TI-99/4a Real Output:

190 STR$=""" QHISHA"""
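The quote examples for lines 48 and 190 are consistent with a simple rule: when listing a quoted string, every quote character inside the string is doubled before the surrounding quotes are added. The Python sketch below shows that rule. It assumes the decoder already has the raw string content in hand; the function name `list_string_literal` is made up, and the rule itself is inferred from the outputs above rather than taken from any tool's source.

```python
def list_string_literal(content):
    """Render raw string content the way a quoted string appears in a listing.

    Assumption: embedded quote characters are stored once and shown doubled.
    """
    return '"' + content.replace('"', '""') + '"'


# Content with one embedded quote at each end, as in line 48 above.
print("48 DATA " + list_string_literal('"THIS HAS QUOTES"'))
# Content of line 190 above.
print("190 STR$=" + list_string_literal('" QHISHA"'))
```

Wrapping the content in quotes without the `replace` step yields `""THIS HAS QUOTES""`, which matches the two-quote outputs shown above for line 48.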
Edited by kl99

xdt99 already has astoundingly good results. The only thing I could detect with this set of test files was that the spacing at the end of the line was not identical for some commands.

xdt99 v1.5.0

"22 RESTORE "

TI-99/4a Real Output:

"22 RESTORE"

The same is true if the statement ends with any of these tokens:

DELETE, END, NEXT, PRINT, STOP, SUBEND, SUBEXIT, TRACE, UNBREAK, UNTRACE.

 

The output for XBCMD4 is even closer to the LIST output on the TI screen than what LIST "CLIP" produces in Classic99 for this special-characters test.

Web99 was failing in some cases, like TI99Dir and TiImageTool. I have already updated the code so that no known issue is left.

 

Are there any more known tools that decode a PROGRAM into BASIC source code? I think I have one in the PC99 emulator, but besides that?


xdt99 has astounding good results already. The only thing I could detect in this set of test files was that the spacing of the end of the line was not identical for some commands.

 

Well, whitespace at the end of the line is considered insignificant ;) , but I guess I could apply a strip() to each line. Thanks for the hint, I guess I didn't even notice, as my diff tool won't highlight those.
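For illustration, stripping could look like the following Python sketch. This is not the xdt99 code; `strip_trailing_spaces` is a made-up name, and the sketch assumes that no listing line legitimately ends in a space (not verified here for REM or DATA lines).

```python
def strip_trailing_spaces(listing):
    """Remove trailing spaces from every line of a listing.

    Only the space character is stripped, so other trailing characters
    (for example control characters inside test strings) are left alone.
    """
    return "\n".join(line.rstrip(" ") for line in listing.split("\n"))


print(repr(strip_trailing_spaces("22 RESTORE \n23 END ")))
```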

 

Note that replicating the BASIC interpreter output is surprisingly difficult. The biggest headaches are space/no space around parentheses, "::" vs ": :", DATA, and comments. Your DATAs are quite benign; try some of these:


DATA IF THEN ELSE,USING BEEP REM PRINT
DATA ,A,,B,"",C,
DATA A B C, D E F ,"GHI"," J K "
DATA A, B ,"C", " C " ,,"", , "" ,
DATA 123, 1 2 3 ,1+2,1.1
DATA A:B,C;D,E(F),A"B"C,"A,B"
DATA DATA +~/ :: PRINT A,B
DATA DATA +~/ :: PRINT A,B!C
DATA " DATA +~/ :: PRINT A,B"
DATA A,100,A$,200,1E9,1E9X


You can have a look at my test cases; they're used for both encoding and decoding.

 

EDIT: To clarify, those programs together with their TI listing are on this disk image.

Edited by ralphb

TI-99/4a Real Output:

31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY ,INPUT ,VARIABLE

I'd rather consider that as a glitch in the real iron. :)

 

Concerning TIImageTool, the only relevant issue in my view is

 

TiImageTool:

140 STR$="""""" PHISHA """"""
TI-99/4a Real Output:

140 STR$="""""" PHISHA""""""

because the string is actually different. I'll have a look.

 

Proper decoding concerning whitespace is really tiresome. You always think you've finally found some general rules that apply to all commands and situations, just to find an example that does not work as expected.


Hi,

regarding this issue, I have noticed something similar:

Sometimes BASIC programs just break because of a missing space in the BASIC code, which melts two commands together.

I never tracked that down, although I wanted to, so I don't know on which "layer" this happens in particular; maybe on a DSK's way to a TI or an emulator.

But if so, here are some updated lists of file-transfer (XFER) and HDX-related tools that might affect (BASIC) files while they are transferred to "the other side".

Maybe they help as an overview for brainstorming :0)

 

 

EMULATORs:

TI-99-EMULATORS-v2.04.pdf 38.05KB

XFER/HDX (without emulators):

TI-99-4A-IO-XFER-HDX-related-NO-EMUs-V1.00a-beta.pdf 52.38KB

XFER/HDX ALL (with emulators):

TI-99-4A-IO-XFER-HDX-related-ALL-V1.00a-beta.pdf 55.9KB

 

 

 

xXx schmitzi

 

RMSAAED


Well, you may consider such spacing a minor or even wrong thing done by the real interpreter.

The whole purpose of this is to have one standard / reference (which can only be the real iron) for decoding a BASIC source code listing from a binary.

The hash code used to identify unique programs and duplicates will be based on the BASIC source text rather than the binary file.

This hash code will further be used to bind meta data, instructions, other TiFiles, pdfs, pics, comments, or even whole forum threads to a TiFile.

To be able to use that same data in/from other tools, the hash code for a program has to be the same across all the tools.

A hash code is completely different if even a single bit or byte differs.

 

Quoting myself from the Web99 Thread on why it would be wrong to generate a hash based on the binary file:

 

1. Basic Programs:
If you want to find out whether two BASIC programs are identical, you shouldn't do a binary comparison of their PROGRAM files.

We are dealing with memory images saved as PROGRAM files. The binary data depends on what was in VDP RAM and RAM when the SAVE operation was done, and therefore on how the program was written and edited. BASIC programs are not stored in logical order. There is a Line Number Table which points to the memory areas containing the content of each line, and those areas can be spread anywhere within a large region of memory. The computer adds the last edited line on top of its VDP RAM, and there is no re-sorting of any data during SAVE. As a result, two people typing in the same program from a magazine (without any errors) will most probably end up with two different binary PROGRAM files representing the same BASIC source code.

In other words, two PROGRAM files can differ on the binary level but still contain the same BASIC program. They should therefore be identified as duplicates of each other.

Instead, you need to extract the BASIC source code from those PROGRAM files and compare that in order to find out whether they are identical. Such a source code comparison could even ignore different line numbers (10s instead of 100s, ...) and different variable names (N$ instead of NAME$). Furthermore, any embedded assembler code contained within the PROGRAM file should be saved as such so that it is not lost.

When calculating the hash ID for BASIC files, we base it on the BASIC source code.
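A minimal Python sketch of such a source-based hash follows. The choice of SHA-256, the Latin-1 encoding (so that characters 0 to 255 map to single bytes) and the name `source_hash` are assumptions for illustration, not the Web99 implementation.

```python
import hashlib


def source_hash(listing):
    """Hash a decoded BASIC listing instead of the binary PROGRAM file."""
    # Drop empty lines so a missing or extra final newline does not matter.
    lines = [line for line in listing.split("\n") if line]
    canonical = "\n".join(lines) + "\n"
    # Latin-1 keeps every character code 0..255 as exactly one byte.
    return hashlib.sha256(canonical.encode("latin-1")).hexdigest()


same_a = source_hash("10 A=1\n20 END\n")
same_b = source_hash("10 A=1\n\n20 END")
differs = source_hash("10 A=1 \n20 END\n")  # one extra trailing space
print(same_a == same_b, same_a == differs)
```

The third listing differs by a single space and gets a different hash, which is why the decoders have to agree on spacing first.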

 

So consider this a pre-step for features that will be coming.

Edited by kl99

Well, even when you consider such spacings a small, minor or even wrong things done by the real interpreter.

The whole purpose of this is to have one type of standard / reference (which can only be the real iron) for decoding a Basic Source Code Listing from a binary.

The hash code to identify unique / duplicates of a Basic Program will be based on the Basic Source Text rather than the binary file.

 

I understand. What about skipping all whitespace when calculating the hash code? Whitespace differences are usually syntactically insignificant (unless they occur within strings).
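A rough Python sketch of this idea: drop spaces everywhere except between double quotes. The function name is made up, and the sketch deliberately ignores REM comments and unquoted DATA items, where spaces may well be significant.

```python
def drop_spaces_outside_strings(line):
    """Remove spaces that are not inside a double-quoted string.

    A doubled quote ("") inside a string toggles the state twice, so the
    scanner is still inside the string afterwards.
    """
    out = []
    in_string = False
    for ch in line:
        if ch == '"':
            in_string = not in_string
        if ch == " " and not in_string:
            continue
        out.append(ch)
    return "".join(out)


tool = '31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY,INPUT,VARIABLE'
real = '31 OPEN #2:"WD1.TEST",SEQUENTIAL,DISPLAY ,INPUT ,VARIABLE'
print(drop_spaces_outside_strings(tool) == drop_spaces_outside_strings(real))
```

With this normalization the tool output and the real output of line 31 compare equal, while spaces inside string literals still count.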


Hi ti99sim! Yes, I checked that utility out; however, I wasn't able to compile it as suggested on your website. I didn't find the sln file recommended for Visual Studio.

I took the old precompiled 0.0.9 version for Windows and saw some statements that differ from the real output. Later I compared the source code of your tool and figured that you changed a lot in the list utility itself, so posting the results of the old version would be irrelevant.

I didn't have time to contact you about my troubles compiling the latest version of ti99sim. Glad you are here! :)

Edited by kl99

The whole purpose of this is to have one type of standard / reference (which can only be the real iron) for decoding a Basic Source Code Listing from a binary.

The hash code to identify unique / duplicates of a Basic Program will be based on the Basic Source Text rather than the binary file.

This hash code will further be used to bind a TiFile to meta data, instructions, other TiFiles, pdfs, pics, comments, or even whole forum threads to it.

To be able to use that same data in/from other tools the hash code for a Program has to be the same across all the tools.

 

If you really strive for tool-independent, universally unique identifiers for all TI-related files I would probably hash the normalized token stream instead of the source code, and even this might fail for trickery such as embedded assembly code.

 

But to provide a simple example, attached are two plain BASIC files (without embedded assembly or other pointer manipulation!) that list the same, but produce different output:

 

post-35214-0-84283300-1449850895_thumb.png

 

Yes, it's more of a tease (and a bit of a cheat) than an actual counter argument ... ;)

 

In any case, I would very much welcome any support to organize and deduplicate my TI software collection, so I'm looking forward to your tool!

duplist.zip


Very nice input.

I thought about using the tokenized format to calculate the hash as well, but, unlike the binary file, with the lines sorted first.
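Sketched in Python, that could look as follows. The input format (pairs of line number and token bytes), the name `token_hash` and the use of SHA-256 are assumptions, and the token bytes in the example are placeholders rather than real BASIC tokens.

```python
import hashlib


def token_hash(lines):
    """Hash tokenized lines in line-number order, not storage order.

    lines: iterable of (line_number, token_bytes) pairs as read from the
    Line Number Table. sorted() is stable, so lines sharing a number keep
    their storage order relative to each other.
    """
    digest = hashlib.sha256()
    for number, tokens in sorted(lines, key=lambda item: item[0]):
        digest.update(number.to_bytes(2, "big"))
        # Length prefix keeps line boundaries unambiguous.
        digest.update(len(tokens).to_bytes(2, "big"))
        digest.update(tokens)
    return digest.hexdigest()


# The same two lines, stored in different order in memory.
typed_in_order = [(10, b"\x01\x02"), (20, b"\x03")]
edited_later = [(20, b"\x03"), (10, b"\x01\x02")]
print(token_hash(typed_in_order) == token_hash(edited_later))
```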

There are some programs having multiple lines with the same number and/or multiple lines with 0. But I think I handle all cases of that already.

Well, it might take some iterations to find the best algorithm/method, but the final goal is to really identify all duplicates.

I still have to check where the embedded assembler is stored in PROGRAM files, and how best to store it next to the BASIC so that the final PROGRAM can be generated again when exporting it to the TI-99 or emulators. Is any tool currently visualizing the embedded assembler in any way?


  • 4 weeks later...

Great input, and thank you all for the test cases! I saw that the BASIC decoder in my TI-Disk Manager needs some more work. There are some serious issues to solve...

 

By the way, is there any notation like Backus–Naur Form for the TI (Extended) BASIC syntax? I think it would be a great help for all developers to have an exact description of the syntax!

Unfortunately, I don't know whether such a description exists. But maybe one of us has one or plans to create one? THAT would really be great :-)


By the way, is there any notation like the Backus–Naur Form of the TI (Extended) Basic syntax? I think it would be a great help for all developer to have an exact description of the syntax!

 

Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming. Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.

 

But I'd guess the more accurate the grammar, the less readable it'll be ...


Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming. Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.

 

But I'd guess the more accurate the grammar, the less readable it'll be ...

 

Hi Ralph, I had a first look at your BNF, really nice work. I haven't looked at all the details, but one of the first things I looked at more closely was the DISPLAY statement.

I'm not sure, but I think I found an issue.

 

Here is what is currently checked in to your git repository. At line 97 of Xbas99.bnf you'll find:

s_display ::=
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? (OP_COLON a_print)? |
     a_using (OP_COLON a_print)? |
     a_print)?

This is what should be correct in my opinion:

s_display ::= 
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? OP_COLON |
     a_using OP_COLON)?
    a_print?

 

I only reviewed the syntax; I didn't check what IntelliJ IDEA makes of it.

Edited by HackMac

Here is this, what is currently checked in in your git repository. At Line 97 of Xbas99.bnf you'll find:

s_display ::=
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? (OP_COLON a_print)? |
     a_using (OP_COLON a_print)? |
     a_print)?

This is what should be correct in my opinion:

s_display ::= 
    W_DISPLAY
    (a_display+ (OP_COLON a_using)? OP_COLON |
     a_using OP_COLON)?
    a_print?

 

Those are very similar, but your version lacks this valid statement:

DISPLAY USING "#"

Not a very useful statement, I admit, but it's valid Extended BASIC.
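The difference between the two rules can be checked mechanically. In the Python sketch below each grammar symbol is abbreviated to one letter (D = a_display, U = a_using, P = a_print, C = OP_COLON) and both rules are transcribed into regular expressions; W_DISPLAY itself is left out. This is only a model of the two rules as quoted above, not the xdt99 parser.

```python
import re

# Rule as checked in: (a_display+ (: a_using)? (: a_print)? | a_using (: a_print)? | a_print)?
CURRENT_RULE = re.compile(r"(?:D+(?:CU)?(?:CP)?|U(?:CP)?|P)?")
# Proposed rule: (a_display+ (: a_using)? : | a_using :)? a_print?
PROPOSED_RULE = re.compile(r"(?:D+(?:CU)?C|UC)?P?")


def accepts(rule, symbols):
    """True if the whole symbol string is matched by the rule."""
    return rule.fullmatch(symbols) is not None


# "U" stands for DISPLAY USING "#", i.e. a USING clause with nothing after it.
for symbols in ["", "P", "U", "UCP", "DCUCP", "D"]:
    print(symbols or "(empty)",
          accepts(CURRENT_RULE, symbols),
          accepts(PROPOSED_RULE, symbols))
```

In this model the proposed rule rejects "U" because it requires a colon after the USING clause, and for the same reason it also rejects "D" (option clauses with no print list).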

 

Do you have an example where my grammar rule does not apply?


Do you have an example where my grammar rule does not apply?

 

Yes, I see, my version is not correct. I didn't try it on 'real iron', but I compared it with the BASIC documentation. And while thinking about it, I couldn't see the forest for the trees anymore. I should think about it again tomorrow, when I'm relaxed. :-)


For your interests, some quotations from Wikipedia's article about the ancient language of the Anglo-Saxons:

 

"Old English or Anglo-Saxon is the earliest historical form of the English language, spoken in England and southern and eastern Scotland in the early Middle Ages. [...] Old English developed from a set of Anglo-Frisian or North Sea Germanic dialects originally spoken by Germanic tribes traditionally known as the Angles, Saxons, and Jutes. [...] Old English is one of the West Germanic languages, and its closest relatives are Old Frisian and Old Saxon. [...] Old English grammar is quite similar to that of modern German: nouns, adjectives, pronouns, and verbs have many inflectional endings and forms, and word order is much freer."


Well, here's my version for xdt99, which is used for syntax coloring and semantic renaming. Because of this it's written so that correct programs are colored correctly, but the opposite won't always hold.

 

Okay Ralph, I had another look at your BNF and found something you could improve.

In your definition file you'll find the following definition from line 142 on:

private a_open ::=
    W_RELATIVE | W_SEQUENTIAL |
    W_DISPLAY | W_INTERNAL |
    W_INPUT | W_OUTPUT | W_APPEND | W_UPDATE |
    W_FIXED nexpr? | W_VARIABLE nexpr? |
    W_PERMANENT

You can extend W_RELATIVE and W_SEQUENTIAL with an optional numeric expression, if you like.

 

P.S.

Did you receive my PM?
