
Basic Parsing and Transformation Tool - New Version


dmsc

Hi!

 

hello dmsc,

 

Incbin works like a charm now, so I've got another, quite bold request this time. It would be really nice to be able to type:

$incbin sprite$, "char6.fnt", (16+6)*8, 12*8
instead of

$incbin sprite$, "char6.fnt", 176, 96
I reckon this isn't easy, but I see this in README.md:

  poke @PCOLR0+2, $1F   : ' Replaced by: POKE 706,31

so you basically have some arithmetic already.

 

best,

pirx

 

This is not currently possible, as the "$define" and "$incbin" directives are resolved at parsing time and stored as constants. If I parsed the parameter as a BASIC expression, its value would only be calculated *after* constant propagation, but I need the value *before* the optimization. Also, I would need to protect the optimizer against recursive definitions, like:

 $incbin a$, "a.bin", LEN(@b$)
 $incbin b$, "b.bin", LEN(@a$)
Another possibility would be to add evaluation of simple arithmetic expressions to the parser itself, but I think that would add too much complexity.

Hi!,

 

 

I uploaded a new Mac build for version 9.3 at https://github.com/dmsc/tbxl-parser/releases, cross-compiled from Linux. Can you test it?

If it works, I can add the build recipe to my release script and compile it on each release.

 

OK. I have cloned from GitHub now to avoid version issues; however, now I am facing a leg/peg "command not found" error (it doesn't exist in Mac OS X).

I am going to compile a version for it.

Anyway, in the last version I didn't have that problem. Are you using this new tool?


I built a peg/leg tool: https://github.com/sheremetyev/peg.git

and I am getting these errors (using yacc?):

gcc -c -Wall -O2 -g -Wstrict-prototypes -Wmissing-prototypes -Ibuild/src/ -MMD -MP -o build/obj/ataribcd.o src/ataribcd.c
gcc -c -Wall -O2 -g -Wstrict-prototypes -Wmissing-prototypes -Ibuild/src/ -MMD -MP -o build/obj/basexpr.o src/basexpr.c
gcc -c -Wall -O2 -g -Wstrict-prototypes -Wmissing-prototypes -Ibuild/src/ -MMD -MP -o build/obj/basic.o src/basic.c
In file included from src/basic.c:50:
build/src/basic_peg.c:5069:3: warning: unused label 'l318' [-Wunused-label]
  l318:;          yypos= yypos318; yythunkpos= yythunkpos318;
  ^~~~~
build/src/basic_peg.c:6013:3: warning: unused label 'l474' [-Wunused-label]
  l474:;          yypos= yypos0; yythunkpos= yythunkpos0;
  ^~~~~
src/basic.c:53:28: error: unknown type name 'yycontext'
static int matchIgnoreCase(yycontext *yy, int c)
                           ^
src/basic.c:76:5: error: use of undeclared identifier 'yycontext'; did you mean
      'yytext'?
    yycontext *yy = yyctx;
    ^~~~~~~~~
    yytext
build/src/basic_peg.c:60:24: note: 'yytext' declared here
YY_VARIABLE(char *   ) yytext= 0;
                       ^
src/basic.c:76:15: error: invalid operands to binary expression ('char *' and
      'expr *' (aka 'struct expr_struct *'))
    yycontext *yy = yyctx;
    ~~~~~~~~~ ^~~
src/basic.c:76:21: error: use of undeclared identifier 'yyctx'
    yycontext *yy = yyctx;


Hi!,

 

Thanks, but what I tried to say is that I compiled a version for OSX on my Linux box, using the "osxcross" cross-compiler, but I don't know if the generated binary works. The archive at https://github.com/dmsc/tbxl-parser/releases/download/v9.3/basicParser-v9.3-0-gc896685-dirty-maxosx.zip contains my generated "basicParser".

 

 

The version of PEG/LEG you are trying to use is too old and has many extra modifications. You need to use the official version at http://piumarta.com/software/peg/; it should work in OSX with no changes, just type "make".

 

And yes, the code that "leg" generates has a similar interface to what lex/yacc used to generate.

 

Note that for my generated binaries I'm using another modified version of LEG that produces smaller code, but should be fully compatible.


Hi!,

 

 

OK, I ran your version ("dirty"), and it works like a charm.


I tried it with a long file. I used the -s option to get a short listing, with the -n option set to 250. Then I ENTERed the resulting LST file and typed RUN. After a while, the program stopped with ERROR-9 in a very long line.

Trying to identify the statement, I ran the parser again, but with -n set to 16 to get many short lines. This time the program ran OK many times.

I went back to the LST with long lines and it failed again. I compared both LST files using a hex file comparer and didn't find any difference other than line numbers in place of colons.

Could this be a TurboBASIC XL bug dealing with very long lines? Maybe a border condition?

I SAVEd the program and analyzed it... guess what? The offending tokenized line has a length of exactly 255 bytes. Coincidence?

I manually moved the last statement of that line to the beginning of the next one in the LST file, ENTERed it and ran it a couple of times without problems. I analyzed it and there were no lines with a length of 255 bytes. The offending line now has only 251 bytes, and the next one has 4 bytes more than before.

I don't have a development environment on my Windows PC to compile your parser myself and do more tests... Is it possible for you to change the length limit for tokenized lines to 254 instead of 255 bytes and provide the binary? Thanks!


Hi!

 

Yes, it is possible, but I would prefer to fix the bug properly, by detecting where TBXL chokes and producing a "compatible" listing. Can you send me the offending program?

 

Thanks for your tests!

 

Daniel.


Hi!,

 

So, yes, it is a TurboBASIC XL bug in the interpreter. This is the code of an "IF" (at address $F570):

 

X_IF:       jsr EXEXPR
            ldx STKPTR
            lda ARGSTK1,X
            beq IF_FALSE
            ldx STINDEX    ; $A8
            inx
            cpx STMLB      ; $A7
            bcs X_ENDIF
            jmp X_GOTO
IF_FALSE:   ldy STINDEX    ; $A8
            dey
            lda (STMCUR),Y ; $8A
            cmp #TOK_THEN  ; $1B
            beq SKIP_EOL
X_ELSE:     lda #$07
            ldx #$41
            ldy #$40
            jmp SKIP_ELSE
SKIP_EOL:   lda LLNGTH     ; $9F
            sta STMLB      ; $A7
X_ENDIF:    rts
The problem is that, to detect whether the statement is an IF/THEN or a block IF/ELSE/ENDIF, the current index at $A8 is incremented and then compared with the statement end at $A7. But the increment is not needed, as the index already points to the next statement (it was advanced by EXEXPR).

So a simple fix is to patch the code, replacing the "INX" at address $F57C with a "NOP".
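A small C model of the end-of-statement test in X_IF helps to see why only 255-byte lines fail. This is an interpretation, not TurboBASIC code: it assumes that when the IF is the last statement of a 255-byte line, both STINDEX ($A8) and STMLB ($A7) hold 255 after EXEXPR, so the 8-bit INX wraps from $FF to $00, the BCS to X_ENDIF is not taken, and execution falls through to JMP X_GOTO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original sequence: LDX STINDEX / INX / CPX STMLB / BCS X_ENDIF.
 * Returns true when the branch to X_ENDIF is taken (block-style IF). */
bool if_at_end_original(uint8_t stindex, uint8_t stmlb)
{
    uint8_t x = stindex;
    x++;               /* INX on an 8-bit register: $FF + 1 wraps to $00 */
    return x >= stmlb; /* CPX sets carry when X >= operand; BCS branches on carry */
}

/* Patched sequence: the INX at $F57C replaced by a NOP. */
bool if_at_end_patched(uint8_t stindex, uint8_t stmlb)
{
    uint8_t x = stindex;
    return x >= stmlb;
}
```

With STINDEX = STMLB = 40 both versions take the X_ENDIF branch; with 255 and 255 the original does not (it falls into X_GOTO) while the patched one does, which is consistent with the failure appearing only on 255-byte lines.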

 

Attached is an ATR image with a patched TurboBASIC XL, your program works with it ;)

 

Well, this is probably not the only place in the interpreter with a bug like this, so I will modify my tool to write tokenized lines of at most 254 bytes. Expect a new version tomorrow. It's a shame, as the maximum program length will be reduced :(

 

We really need a new TurboBASIC interpreter with all bugs fixed :)

tb15-fixed.atr


Hi!,

 

A suggestion: I would like to be able to specify .BAS vs. .LST, 120-character vs. long lines, and shortened variables vs. my own variables as separate command-line options.

 

Because I'd like to easily make 120-character BAS files with my own variable names.

 

K

It would be possible, but I don't see the point, as the BAS file will not respect the 120-character limit when listed (because the abbreviations are expanded). Why don't you simply use the listing output?

 

Currently, the parser has two output routines, one in "lister.c" and the other in "baswriter.c", both complicated by the need to split statements into lines based on their combined length. Perhaps both could be merged into one routine that outputs both the LST and the BAS, and checks both conditions (short listed line length and tokenized line length).
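As a rough sketch of that single-routine idea in C: statements are added to the current line while both the listed length and the tokenized length stay within their limits, and a new line is started otherwise. The structs, field names and limit values are hypothetical; they are not taken from lister.c or baswriter.c, and the per-statement sizes are assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-statement sizes, as an output routine might compute them. */
typedef struct {
    unsigned listed_len; /* characters when LISTed, including the ':' separator */
    unsigned token_len;  /* bytes in tokenized form, including per-statement overhead */
} stmt_size;

typedef struct {
    unsigned max_listed;  /* e.g. 120 for a short listing */
    unsigned max_token;   /* e.g. 255, the largest tokenized line */
    unsigned line_listed; /* fixed listed overhead per line (line number, space) */
    unsigned line_token;  /* fixed tokenized overhead per line (header bytes) */
} line_limits;

/* Greedy packing that checks both limits at once. Stores the index of the
 * first statement of each output line in line_start (room for n entries)
 * and returns the number of lines. A statement too big for an empty line
 * still gets a line of its own. */
size_t pack_lines(const stmt_size *s, size_t n, const line_limits *lim,
                  size_t *line_start)
{
    assert(lim != NULL);
    size_t lines = 0;
    unsigned listed = 0, token = 0;
    for (size_t i = 0; i < n; i++) {
        bool fits = lines > 0 &&
                    listed + s[i].listed_len <= lim->max_listed &&
                    token + s[i].token_len <= lim->max_token;
        if (!fits) {
            line_start[lines++] = i; /* start a new line with this statement */
            listed = lim->line_listed;
            token = lim->line_token;
        }
        listed += s[i].listed_len;
        token += s[i].token_len;
    }
    return lines;
}
```

Greedy packing is the simplest choice; it does not try to minimize the number of lines, but since both outputs would be produced from the same split, the LST and BAS files would break at the same statements.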


Hi!,

 

Would it be possible to create a machine code binary with some sort of runtime environment, like the one for the Turbo BASIC compiler?

Or, at least, to compile it to the P-Code for the Turbo-Basic runtime?

I'm currently writing a compiler, but at a very slow pace :)

 

The code is on GitHub, in a WIP branch.


Thank you, master!!!

Well, the last statement of the offending line (with a length of 255 bytes) is the IF of an IF/ELSE/ENDIF construct. Without inspecting the source code of TurboBASIC XL, I think that the border condition happens only when the IF is at the end of the line, and the "INX" is needed when there is a THEN or another statement on the same line. I'll try to test this... If this is true, the limit could stay at 255, except for an "IF <condition>" as the last statement of the line, which should be moved to the next line.

A suggestion: I would like to be able to specify .BAS vs. .LST, 120-character vs. long lines, and shortened variables vs. my own variables as separate command-line options.

Because I'd like to easily make 120-character BAS files with my own variable names.


I was not alone! I also wanted to keep my own variable names, which I keep short but easy to remember... And if they are not short, it will be my fault.


I did more tests on this border condition with an "IF <condition>" as the last statement of a line, which has an internal (tokenized) length of 255, and I found something that surprised me:

 

This code has the border condition in line 1:

1 DIM X(10):FOR I=0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0 TO 9:? "Hola.....";:IF I
2 X(I)=I:ENDIF  I:NEXT I

It should write 10 lines with a message and a number, but what I got is:

Hola.....0
Hola.....
ERROR-   9 DIM  AT LINE 1

The error was raised because the DIM statement at the beginning of line 1 was executed twice... It seems that the IF, as the last statement of that "special" line, sent the execution pointer to the beginning of the same line instead of the next one... and it only happened when the condition of the IF was true, because the number after the ENDIF was printed when the index I was zero (false).

To check that this behavior happens only on lines with a length of 255: if one of the dots of the message is removed, the line length becomes 254 and it runs OK.

I also tried this test using your patched TBXL and it ran OK as you promised, but I'm not fully convinced that this patch is OK for every case, I mean, that it does not introduce a bug under some other condition.

Edited by vitoco

Hi!,

 

The error was raised because the DIM statement at the beginning of the 1st line was executed twice...

 

Not really, you get the ERR 9 even without the DIM :)

 

I found the same error in other parts of the TurboBASIC source, for example, try:

1 COLOR 1:? 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1,1+1+1+1+1+1+1+1+1,"123":CIRCLE 5,5,5
It is basically the same bug: the interpreter reads 3 numbers and then checks whether it is at the end before reading the fourth. It is also present when reading the "STEP" in a FOR loop:

1 ? 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1,1+1+1+1+1+1+1+1+1+1,"123":FOR A=0 TO 3

All other statements with optional parts that I tried were not affected: PRINT, LIST, DIM, CLOSE, CLS, SOUND, DSOUND, *F.

 

To check the length of the first line, you can do:

? PEEK(DPEEK($88)+2)
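The same check can be done on a whole program outside the emulator. A minimal C sketch, assuming the standard Atari BASIC tokenized line layout (a 2-byte little-endian line number followed by the 1-byte line length that PEEK(DPEEK($88)+2) reads) and a buffer that already holds the statement table, without the SAVE file header; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans a memory image of tokenized BASIC lines, assuming the layout:
 *   bytes 0-1: line number (little endian)
 *   byte 2:    line length, i.e. the offset to the next line
 * Stores up to max_found line numbers whose length byte equals len and
 * returns how many were stored. Stops at line 32768 (the direct-mode line),
 * at an impossible length, or when a line would run past the buffer. */
size_t lines_with_length(const uint8_t *prog, size_t size, uint8_t len,
                         unsigned *found, size_t max_found)
{
    assert(prog != NULL && (max_found == 0 || found != NULL));
    size_t count = 0, pos = 0;
    while (pos + 3 <= size) {
        unsigned line_no = prog[pos] | (unsigned)prog[pos + 1] << 8;
        uint8_t line_len = prog[pos + 2];
        if (line_no >= 32768 || line_len < 3 || pos + line_len > size)
            break; /* end of program, corrupt data or truncated buffer */
        if (line_len == len && count < max_found)
            found[count++] = line_no;
        pos += line_len; /* jump to the next line */
    }
    return count;
}
```

Called with len = 255 on the bytes of a SAVEd program, it would list the lines worth inspecting for this border condition.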

 

Well, I plan to add a check to my program that reduces the maximum length by one byte if the last statement is an "IF" without "THEN", a "CIRCLE" or a "FOR". Checking whether the CIRCLE has 3 or 4 arguments, or whether the FOR has a STEP, is too difficult as the code stands now.


I think I found a bug in the parser. This line:

10IFPEEK(X)=Y

is tokenized by TurboBASIC as:

10 IF PEEK(X)=Y

but the parser shows this in the short mode:

10 A(A)=B

It believes that "IFPEEK" is an array (and renames it with a short variable name).

 

If I add a space between "IF" and "PEEK", the parser removes the space and leaves the line in the form I initially tried.

 

BTW, this behavior also happens in the tokenized output and with other functions like "INT". Using the -f option on tokenized output, a "DUMP" in TurboBASIC shows variables like "IFPEEK" :-o


Hi!,

 

Yes, this is intentional, see the README at https://github.com/dmsc/tbxl-parser#user-content-limitations-and-incompatibilities

 

Any string is accepted as a variable name, even if it is already a statement, function name or operator.

 

The following code is valid:

  PRINTED = 0     : ' Invalid in Atari Basic, as starts with "PRINT"
  DONE = 3        : ' Invalid in TurboBasic XL, as starts with "DO"
This relaxed handling of variable naming creates an incompatibility, as the first example above is parsed differently than in standard Atari Basic, where it means "PRINT (ED = 0)" instead of "LET PRINTED = 0".

 

Note that currently, even full statements are accepted as variable names, but avoid using them, as they could produce hard-to-understand errors.

Note that the parser tries hard to detect whether you are defining a variable, by checking if the full expression is valid as an assignment before trying to parse it as a statement.

 

The parser behaves like this to stay compatible with Atari Basic programs that use "DONE" as a variable name (there are a few of these in my test suite), which are not compatible with Turbo Basic. It also makes it possible to add new statements to the parser without giving up backward compatibility.

 

The parser has two modes of operation: compatible and extended. In extended mode, even variable names that contain a token name are accepted: "? ERROR1" is parsed as "? ERR OR 1" in the default mode, but not in extended mode.

 

Please, if you are writing a program to be parsed by my tool, it is much better to write it in the "long" form: it is more legible and gives the parser more opportunities for optimization. Remember to enable the optimizations! ;)


Hi!,

 

Well, the last statement of the offending line (with a length of 255 bytes) is an IF of a IF/ELSE/ENDIF construct.

Added workaround in the new version.

 

Because I'd like to easily make 120-character BAS files with my own variable names.

Well, I implemented a solution: with the "-f" option, variables whose names are already short (2 characters or fewer) are kept, and longer ones are renamed. The parser checks against its list of "safe" variable names, so, for example, a variable named "TO" will still be renamed.
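The renaming rule for "-f" can be sketched in C as follows. The set of two-letter names treated as unsafe here (TO, IF, OR, ON, DO) is an illustrative assumption, not the parser's actual list, and the caller is assumed to strip any "$" or array suffix before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative two-letter names that clash with BASIC keywords (assumed list). */
static const char *const unsafe_short[] = { "TO", "IF", "OR", "ON", "DO" };

/* Returns true if the -f short listing should rename this variable. */
bool needs_rename(const char *name)
{
    assert(name != NULL);
    if (strlen(name) > 2)
        return true; /* longer names are always shortened */
    for (size_t i = 0; i < sizeof unsafe_short / sizeof unsafe_short[0]; i++)
        if (strcmp(name, unsafe_short[i]) == 0)
            return true; /* short, but not a safe name */
    return false;
}
```

Checking a small deny list is equivalent to checking against a list of safe names only as long as the deny list covers every two-character keyword, which is why the real parser's own table is the one that matters.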

 

Also, from the release notes:

 

- Workaround a bug in TueboBasix XL interpreter that incorrectly check the tokenized line length while checking for the end of IF, FOR and CIRCLE statements.

- Prints a warning when a variable name is the prefix of a statement.

- Fixes a segfault with more than 1020 variables in the short listing.

- Implements the '-f' option in the short listing, keeping variable names that are at most two characters.

- Adds OSX build to the make-release script.
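Reading the new warning as "a statement keyword is a prefix of the variable name" (as with DONE and IFPEEK earlier in the thread), the check can be sketched in C like this; the keyword table is a small hypothetical subset, not the parser's full list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A few Atari BASIC / TurboBASIC XL statement keywords, for illustration only. */
static const char *const statements[] = {
    "PRINT", "POKE", "IF", "DO", "DIM", "FOR", "CIRCLE", "GOTO",
};

/* Returns the first keyword that is a proper prefix of the variable name,
 * or NULL if there is none. */
const char *statement_prefix(const char *var)
{
    assert(var != NULL);
    size_t len = strlen(var);
    for (size_t i = 0; i < sizeof statements / sizeof statements[0]; i++) {
        size_t k = strlen(statements[i]);
        if (k < len && strncmp(var, statements[i], k) == 0)
            return statements[i];
    }
    return NULL;
}
```

A lister would print a warning whenever this returns non-NULL, for example "DONE" starting with "DO".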

 

As always, download from the github page: https://github.com/dmsc/tbxl-parser/releases/tag/v9.4


Thanks for the update.

 

- Workaround a bug in TueboBasix XL interpreter that incorrectly check the tokenized line length while checking for the end of IF, FOR and CIRCLE statements.

 

Does the same bug exist with the SOUND statement?

... and what was the workaround? Just wrapping those statements when the internal line length is exactly 255?

 

BTW, it's not "TueboBasix"... ;)


Hi!,

 

Thanks for the update.

 

Does the same bug exist with SOUND statement?

No, commands that allow no arguments are not affected.

 

In the interpreter, STINDEX points to the current statement byte at the start of the routines that handle each statement. So, to detect that there are no arguments, the pointer is incremented and compared with the NXTSTD (next-statement) value. I suspect this is where the bug came from: the same logic is used in the IF/CIRCLE/FOR cases, but there it is wrong, as STINDEX already points to the next token.

 

 

... and which was the workaround? Just wrap that statements when the internal line length is exactly 255?

 

BTW, it's not "TueboBasix"... ;)

;) Corrected on the GitHub page...

 

The workaround limits the tokenized line length to 254 bytes if the last statement is a problematic one. The default is a maximum of 255 bytes.
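In code, the workaround amounts to choosing the byte limit per line from its last statement. A minimal C sketch; the enum and function names are hypothetical, and, as described in the thread, a FOR or CIRCLE is treated as problematic whether or not it has a STEP or a fourth argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the last statement on a line. */
typedef enum { ST_OTHER, ST_IF, ST_FOR, ST_CIRCLE } stmt_kind;

/* Maximum tokenized length for a line ending in `last`;
 * has_then only matters for IF. */
unsigned max_tokenized_len(stmt_kind last, bool has_then)
{
    switch (last) {
    case ST_IF:
        return has_then ? 255 : 254; /* a block IF hits the X_IF end check */
    case ST_FOR:    /* the optional STEP check is affected */
    case ST_CIRCLE: /* the optional fourth-argument check is affected */
        return 254;
    default:
        return 255;
    }
}
```

Being conservative for every FOR and CIRCLE costs at most one byte on a few lines, which avoids having to know the argument count at line-splitting time.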

