Jump to content
IGNORED

Basic Parsing and Transformation Tool - New Version


dmsc

Recommended Posts

Hi All!,

 

I updated again my Basic parsing tool, this is a major release, with two big features added:

  • Support for parser directives, adding options, "defines" and binary file inclusion.
  • Added an optimizer, currently performs constant folding and replacing of small values with %*.
A full list of changes from version 5 is:
  • Make parsing error messages more specific, indicating if a numeric or string expression is expected.
  • Adds parsing of "$options" directive.
  • Adds an "extended" mode to the parser that is stricter with the spacing before tokens, this allows using variable names that are problematic in the original TurboBasic XL.
  • Fixes variable names starting with "REM".
  • Adds a tree-based optimizer, that transforms the tokenized program to an expression tree and back again. All optimizations operate in the tree representation.
  • Adds a constant-folding optimization pass.
  • Adds an optimization pass that converts numbers from 0 to 3 to %0 to %3.
  • Adds support for "definitions" to the parser, those are similar to C defines.
  • Adds a directive to include binary files to a string define.
  • Makes default output mode the binary tokenized format.
The new release can be downloaded over github: https://github.com/dmsc/tbxl-parser/releases/tag/v6

 

A sample program using P/M graphics:

  ' Enable all optimizations:
  $options +optimize
  ' Include graphics from a binary file
  $incbin dataPM$, "pm.bin"
  ' Some defines
  $define RAMTOP = $6A
  $define SDMCTL = 559
  $define PCOLR0 = 704
  $define HPOSP0 = $D000
  $define GRACTL = $D01D
  $define PMBASE = $D407

  ' Enable P/M
  MemTop = peek(@RAMTOP) - 4
  poke @RAMTOP, MemTop
  graphics 0
  P0Mem = 256 * MemTop + 512

  ' Clear P0 memory
  poke P0Mem, 0
  move P0Mem, P0Mem+1, 127
  poke @PCOLR0, $1F               : ' Set P0 color
  poke @SDMCTL, peek(@SDMCTL) ! 8 : ' Enable Player DMA
  poke @PMBASE, MemTop            : ' Set PMBASE
  poke @GRACTL, 2                 : ' Turns on Player read from ANTIC

  oldYpos = P0Mem + 1
  Ypos = 50
  Xpos = 100
  dx = 1
  dy = 1

  ' Loop movement of the player
  do
    pause 0
    exec MovePM

    Xpos = Xpos + dx
    Ypos = Ypos + dy

    if Xpos < 48
      dx = 1
    endif
    if Xpos > 200
      dx = -1
    endif
    if Ypos < 16
      dy = 1
    endif
    if Ypos > 100
      dy = -1
    endif

  loop
  end

  ' Move P/M
  proc MovePM
    ' Set X position
    poke @HPOSP0, Xpos
    ' Clear old data
    move oldYpos - 1, oldYpos, len( @dataPM$ ) + 1
    ' Set new data
    move adr( @dataPM$ ), P0Mem + Ypos, len( @dataPM$ )
    oldYpos = P0Mem + Ypos
  endproc
This produces the following basic output (as a tokenized .BAS):

0 A=PEEK(106)-4:POKE 106,A:GRAPHICS %0:B=256*A+512:POKE B,%0:MOVE B,B+%1,127:POKE 704,$1F:POKE 559,PEEK(559)!8:POKE 54279,A:POKE 53277,%2:C=B+%1:D=50:E=100:F=%1:G=%1:DO :PAUSE %0:EXEC A:E=E+F:D=D+G:IF E<48:F=%1:ENDIF 
1 IF E>200:F=-1:ENDIF :IF D<16:G=%1:ENDIF :IF D>100:G=-1:ENDIF :LOOP :END 
2 PROC A:POKE 53248,E:MOVE C-%1,C,9:MOVE ADR("8DTD8"),B+D,8:C=B+D:ENDPROC 
  • Like 7
Link to comment
Share on other sites

Hi All!,

 

There is a new version (7.1) of the parser, I added more optimizations:

- Replace repeated constants with a variable initialized at program start,

- Remove all unused line numbers, making the program shorter.

 

Remember to try the parser with the "-O" (uppercase "o") option to enable the optimizations!!

 

As always, download at:

https://github.com/dmsc/tbxl-parser/releases/tag/v7.1

  • Like 2
Link to comment
Share on other sites

  • 2 weeks later...

Hi Again!

 

A new version is available, ( Version 8 ), some subtle bugs were fixed, now it should be rock-solid

 

https://github.com/dmsc/tbxl-parser/releases/tag/v8

 

Full changelog copied from github:

 

The code was restructured so that old parser to statements and tokens was completely removed, now the source code parses directly to an expression tree. This simplifies the code but does not affect any of the current output.

 

As part of the restructuring many small bugs were fixed:

- Some subtle buffer overflows.

- Crashes in the parser on some statement combinations and errors in definitions.

- Enabling and disabling of individual optimizations.

- Unsupported parsing of some GET/PUT/%GET/%PUT variants.

 

And some minor enhancements:

- Printing numbers in long-output with full precision.

- Output $defines in the long-output as is.

  • Like 4
Link to comment
Share on other sites

Again thanks for the new version.

I hope I do not sound impertinent, but is there the possibility, that you are may interested in porting the compiler?

At least you are already juggling quite successful with the tokens...

If wanted, I could translate the article, describing the inner workings of the compiler, for you. It was published in conjunction with the Turbo Basic release in a German computing magazine.

 

 

  • Like 2
Link to comment
Share on other sites

Hi!

 

Again thanks for the new version.

I hope I do not sound impertinent, but is there the possibility, that you are may interested in porting the compiler?

At least you are already juggling quite successful with the tokens...

Ha, no problem, the last version took away the limitations so that now it will be easier to do a compiler.

 

Currently, my plan is first implement parameters to PROC/EXEC, my idea is to translate this syntax:

 exec test, x+10, 20
 rem ....
 end

 proc test, x, y
   ? "my parameters are "; x; " and "; y
 endproc
To the following turbobasic code:

 _par_test_x=x+10: _par_test_y=20: exec test
 end
 proc test
   ? "my parameters are "; _par_test_x; " and "; _par_test_y
 endproc

If wanted, I could translate the article, describing the inner workings of the compiler, for you. It was published in conjunction with the Turbo Basic release in a German computing magazine.

No need, I already have some ideas about the kind of output to generate. But, I really want to preserve float variables, I think that this will make the compiler more versatile.

 

 

This is exaclty what I was thinking - the language is now just begging for a compiler :)

Possibly, just possibly, it would be even easier to go the modern way with yacc / bison et consortes (?).

Not really, currently the parser uses a "PEG" (Parsing Expression Grammar) syntax, see the file "basic-base.peg" for the definitions. The syntax is similar to yacc/lex, but much more powerful.

 

The reason a parser like PEG is needed is that the original Atari Basic parser does not divide the input into tokens, you can write something like: 1FORI1=3TO6:PRINTI1:NEXTI1

 

A bison parser needs tokens, but it is not possible to tokenize the above string without interpreting the grammar. In the case of PEG, the parser does not need a tokenizing step. Also, a PEG parser backtracks and tries different combinations until one possibility is found, and only then produces output. This allows my parser to parse this FORA=3:?FORA as an assignment, because the missing "TO" to make a "FOR" statement.

 

The original Turbo Basic parser is a recursive descent parser with backtracking, so it can be expressed as a PEG easily.

 

The only problem with the PEG parsers generated with the PEG/LEG tool is that the needed backtracking can cause exponential parsing times, but this does not occurs with "normal" grammars like the one my parser uses now, in fact I think that the generated parser is really fast.

  • Like 4
Link to comment
Share on other sites

Ha, no problem, the last version took away the limitations so that now it will be easier to do a compiler.

 

Cool!

 

proc test, x, y

? "my parameters are "; x; " and "; y

endprocTo the following turbobasic code:

_par_test_x=x+10: _par_test_y=20: exec test

end

proc test

? "my parameters are "; _par_test_x; " and "; _par_test_y

endproc

 

Don't forget the warning for recursive calls... ;)

 

 

No need, I already have some ideas about the kind of output to generate. But, I really want to preserve float variables, I think that this will make the compiler more versatile.

 

That's interesting, because in the documentation is pointed out, that the original compiler optimizes code to integers (if applicable) to save speed.

 

Link to comment
Share on other sites

Hi!,

 

Would it be possible to add a switch for the 3 character extension? I'm using TBXL exclusively, and it'd be nice to use a more appropriate extension. Thanks!

You currently can use the "-o" option to specify the output file, is this not enough?

 

Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot.

Link to comment
Share on other sites

Hi!,

 

You currently can use the "-o" option to specify the output file, is this not enough?

 

Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot.

 

Yes, I don't really want to name the whole file, as I will like to keep the file's base name, but change the extension only.

 

The method you suggest would be perfectly fine if you can do it that way, or something like this: "-o *.bst", if you'd like to follow Atari-style conventions.

Link to comment
Share on other sites

You currently can use the "-o" option to specify the output file, is this not enough?

 

Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot.

 

Actually no problem/rush on that. I just changed the program defaults using a hex editor.

Link to comment
Share on other sites

Hi!,

 

Well, I published the new version, now "Version 9":

 

The mayor change in this version as adding support for PROC with parameters and local variables, converted to standard TurboBasic XL code on output. Note that currently the parser adds too many extra variables, an optimization pass that removes unused variables will be added later.

 

There are other minor new features:

- Check all EXEC targets and print error if not matching to the corresponding PROC.

- Allows the "-o" option to specify an output file name extension (asked by MrFish).

 

And other code changes:

- Cleanup the PEG parser to use standard constructs to parse most tokens.

- Use a common dynamic array implementation for all the arrays in the code.

- Cleanup warnings when compiled with GCC -Wextra.

 

As always, just download from https://github.com/dmsc/tbxl-parser/releases

  • Like 2
Link to comment
Share on other sites

Hi DMSC,

 

This is an interesting project. I've not seen anything like this before. What was your inspiration for it? Was it to help you with a project that you had/have in place?

10 liners contest. I know because I did something similar albeit WAY simpler.

Yes :) , I started the parser as a rewrite of my old perl script, as a simple PEG grammar that understood the statements and replaced them by shorter versions and joining of statements with ':'.

 

It is really fun to write this as I though new ways to enhance the parser and add more stuff.

  • Like 1
Link to comment
Share on other sites

Hi!,

 

Don't forget the warning for recursive calls... ;)

Well, It was not so easy, so currently I don't even try to detect them. The problem is that to detect indirect recursive calls (where PROC A calls PROC B who calls PROC A again), we need to construct some type of call-tree. But, as the BASIC code is unstructured, any GOTO/GOSUB/GO# inside would invalidate all the call tree construction.

 

The program currently stores the full parsed source as an expression tree with all statements pointing to the next one in the left leaf and to the statement tokens in the right leaf. And all the tokens are also represented in the same tree.

 

As an example, the program:

 color 2
 plot x+1, y
 end
is represented by the following tree:

post-18634-0-35752800-1445646529_thumb.png

 

By analyzing the control-flow, I could put each PROC or subroutine as a new leaf in the main tree, this could allow to build a call-tree and also optimize variable usage (like reusing variables from one PROC in another).

 

Well, my current plan is to try to generate code from the parse tree directly, without really understanding the flow, but perhaps in a future a new "restricted" subset of the language could be optimized better.

  • Like 1
Link to comment
Share on other sites

Hi again!,

 

And another minor update, "Version 9.1":

 

- Adds the unused variable removal optimization, this removes all unused variables after all the other optimizations, this makes the PROC/EXEC with parameters use less variables.

- Always runs some optimization passes, this simplifies generation of BAS and short listings.

- Fixes listing programs with too many variables.

- Remove limitations of 256 variables, the parser now handles an unlimited number of variables or procedures.

- Sort variables by usage, reducing binary size in programs with many variables.

 

As usual, download from https://github.com/dmsc/tbxl-parser/releases

 

I will begin to (slowly) write the compiler part now, but I will need more test programs to ensure proper compilation. If anyone has a program that want to compile, you can post here and I will use it for the "test-suite".

 

Perhaps I should post this to the programming forum, what do you think?

  • Like 3
Link to comment
Share on other sites

I will begin to (slowly) write the compiler part now, but I will need more test programs to ensure proper compilation. If anyone has a program that want to compile, you can post here and I will use it for the "test-suite".

 

Perhaps I should post this to the programming forum, what do you think?

 

Hi and thanks again for all your great work here.

 

I can provide some more source code to work with, but it will all be sent in PM as before.

 

Just as a note, the program I sent you before was not working when compiled -- even though no compile errors. This was true before I used it in conjunction with your program and after. I may not be adhering to some known issues for the compiler though; I haven't checked on that. I know there is a thread in the programming forum that talks about it. No matter for me right now, but it will probably be no help in your compiler tests at the moment.

 

I don't think it matters what forum you post about your work. It'll surely get the most attention here, but those who care will find it in either place. Your call.

Link to comment
Share on other sites

Hi!,

 

Have you looked at LLVM?

 

It has 99% of what you need:

Yes, I know LLVM, it is a nice library for code generation. But, it does not have support for 6502 ;)

 

Also, most of the optimizations passes of current state of the art compilers (like LLVM or GCC) are not really suitable for the kind of assembly for a processor like the 6502, or for a language like old BASIC dialects, so I think is better, for simplicity, to have my own simpler code generator.

 

What I'm currently doing is converting the BASIC statements to a lower level pseudo-language, currently this code:

10 GRAPHICS 8
20 SETCOLOR 2,0,4
30 COLOR 1
40 PLOT 10,100
50 DRAWTO 100,10
60 GOTO 60

is converted to this:

	let @ral = 8
	call bas_graphics~

	let @ral = 4
	let 710~ = @ral

	let @ral = 1
	let COLOR~ = @ral

	let @ral = 100
	let ROWCRS~ = @ral
	let @rax = 10
	let COLCRS~ = @rax
	call bas_plot~

	let @ral = 10
	let ROWCRS~ = @ral
	let @rax = 100
	let COLCRS~ = @rax
	call bas_drawto~

# @_lnum_60
	go# @_lnum_60
In the pseudo-code "@ral" is an 8 bit register (will be converted to the 6502 accumulator) and "@rax" is a 16 bit register (will be converted to the A/X register combination in the 6502). The "~" at the end of words represents that the word is an assembly label.

 

After finishing the code generator, all the "bas_*" assembly support routines will need to be written, I'm planning on generating ca65 compatible code, so the ld65 linker can be used to integrate the used routines in the output program.

  • Like 2
Link to comment
Share on other sites

Is any significant portion of the work you are doing flexible enough to be useful later on in generating code for other languages? Specifically, a "converter" for Basic to Action! ?

 

I tried that some years ago, but was in over my head!

 

-Larry

Link to comment
Share on other sites

Hi!,

 

Is any significant portion of the work you are doing flexible enough to be useful later on in generating code for other languages? Specifically, a "converter" for Basic to Action! ?

 

I tried that some years ago, but was in over my head!

Yes, but with some restrictions.

 

To generate any output from the BASIC sources, you simply write code that converts all BASIC statements to your output code. The difficult part is that many BASIC statements will not have any equivalent in other languages, so you will end writing them as subrutine calls. The code will look like:

 

void convert_to_action_ex(expr *ex, output *out)
{
  if( !ex )
    return;

  // Convert LEFT expression
  convert_to_action_ex(ex->lft, out);

  switch( ex->type )
  {
    case et_c_number:
      // Convert constant number
      break;
    case et_var_string:
      // Convert string variable
      break;
    //... and so on ...
  }
  // Convert RIGHT expression
  convert_to_action_ex(ex->rgt, out);
}

void convert_to_action_stmt(expr *ex, output *out)
{
  if( !ex || ex->type != et_stmt )
    return;
  switch( ex->stmt )
  {
    case STMT_END:
      // Convert END statement
      break;
    case STMT_PRINT:
      // Convert PRINT statement, this will be dificult!
      break;
    //... and so on ...
   }
  // Convert next statement
  convert_to_action_stmt(ex->lft, out);
}

If the output language is different enough, perhaps it will be easier to convert the "low-level" code that I'm generating now, but the resulting program will not be pretty :)

 

On the other hand, if you want to generate code from another language, you can remove all the parsing code and replace it by your own parser, reusing the optimization passes and the code generator.

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...