dmsc Posted October 4, 2015 Share Posted October 4, 2015 Hi All!, I updated again my Basic parsing tool, this is a major release, with two big features added: Support for parser directives, adding options, "defines" and binary file inclusion. Added an optimizer, currently performs constant folding and replacing of small values with %*. A full list of changes from version 5 is: Make parsing error messages more specific, indicating if a numeric or string expression is expected. Adds parsing of "$options" directive. Adds an "extended" mode to the parser that is stricter with the spacing before tokens, this allows using variable names that are problematic in the original TurboBasic XL. Fixes variable names starting with "REM". Adds a tree-based optimizer, that transforms the tokenized program to an expression tree and back again. All optimizations operate in the tree representation. Adds a constant-folding optimization pass. Adds an optimization pass that converts numbers from 0 to 3 to %0 to %3. Adds support for "definitions" to the parser, those are similar to C defines. Adds a directive to include binary files to a string define. Makes default output mode the binary tokenized format. The new release can be downloaded over github: https://github.com/dmsc/tbxl-parser/releases/tag/v6 A sample program using P/M graphics: ' Enable all optimizations: $options +optimize ' Include graphics from a binary file $incbin dataPM$, "pm.bin" ' Some defines $define RAMTOP = $6A $define SDMCTL = 559 $define PCOLR0 = 704 $define HPOSP0 = $D000 $define GRACTL = $D01D $define PMBASE = $D407 ' Enable P/M MemTop = peek(@RAMTOP) - 4 poke @RAMTOP, MemTop graphics 0 P0Mem = 256 * MemTop + 512 ' Clear P0 memory poke P0Mem, 0 move P0Mem, P0Mem+1, 127 poke @PCOLR0, $1F : ' Set P0 color poke @SDMCTL, peek(@SDMCTL) ! 8 : ' Enable Player DMA poke @PMBASE, MemTop : ' Set PMBASE poke @GRACTL, 2 : ' Turns on Player read from ANTIC oldYpos = P0Mem + 1 Ypos = 50 Xpos = 100 dx = 1 dy = 1 ' Loop movement of the player do pause 0 exec MovePM Xpos = Xpos + dx Ypos = Ypos + dy if Xpos < 48 dx = 1 endif if Xpos > 200 dx = -1 endif if Ypos < 16 dy = 1 endif if Ypos > 100 dy = -1 endif loop end ' Move P/M proc MovePM ' Set X position poke @HPOSP0, Xpos ' Clear old data move oldYpos - 1, oldYpos, len( @dataPM$ ) + 1 ' Set new data move adr( @dataPM$ ), P0Mem + Ypos, len( @dataPM$ ) oldYpos = P0Mem + Ypos endproc This produces the following basic output (as a tokenized .BAS): 0 A=PEEK(106)-4:POKE 106,A:GRAPHICS %0:B=256*A+512:POKE B,%0:MOVE B,B+%1,127:POKE 704,$1F:POKE 559,PEEK(559)!8:POKE 54279,A:POKE 53277,%2:C=B+%1:D=50:E=100:F=%1:G=%1:DO :PAUSE %0:EXEC A:E=E+F:D=D+G:IF E<48:F=%1:ENDIF 1 IF E>200:F=-1:ENDIF :IF D<16:G=%1:ENDIF :IF D>100:G=-1:ENDIF :LOOP :END 2 PROC A:POKE 53248,E:MOVE C-%1,C,9:MOVE ADR("8DTD8"),B+D,8:C=B+D:ENDPROC 7 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 12, 2015 Author Share Posted October 12, 2015 Hi All!, There is a new version (7.1) of the parser, I added more optimizations: - Replace repeated constants with a variable initialized at program start, - Remove all unused line numbers, making the program shorter. Remember to try the parser with the "-O" (uppercase "o") option to enable the optimizations!! As always, download at: https://github.com/dmsc/tbxl-parser/releases/tag/v7.1 2 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 21, 2015 Author Share Posted October 21, 2015 Hi Again! A new version is available, ( Version 8 ), some subtle bugs were fixed, now it should be rock-solid https://github.com/dmsc/tbxl-parser/releases/tag/v8 Full changelog copied from github: The code was restructured so that old parser to statements and tokens was completely removed, now the source code parses directly to an expression tree. This simplifies the code but does not affect any of the current output. As part of the restructuring many small bugs were fixed: - Some subtle buffer overflows. - Crashes in the parser on some statement combinations and errors in definitions. - Enabling and disabling of individual optimizations. - Unsupported parsing of some GET/PUT/%GET/%PUT variants. And some minor enhancements: - Printing numbers in long-output with full precision. - Output $defines in the long-output as is. 4 Quote Link to comment Share on other sites More sharing options...
Irgendwer Posted October 21, 2015 Share Posted October 21, 2015 Again thanks for the new version. I hope I do not sound impertinent, but is there the possibility, that you are may interested in porting the compiler? At least you are already juggling quite successful with the tokens... If wanted, I could translate the article, describing the inner workings of the compiler, for you. It was published in conjunction with the Turbo Basic release in a German computing magazine. 2 Quote Link to comment Share on other sites More sharing options...
pirx Posted October 21, 2015 Share Posted October 21, 2015 This is exaclty what I was thinking - the language is now just begging for a compiler Possibly, just possibly, it would be even easier to go the modern way with yacc / bison et consortes (?). 1 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 22, 2015 Author Share Posted October 22, 2015 Hi! Again thanks for the new version. I hope I do not sound impertinent, but is there the possibility, that you are may interested in porting the compiler? At least you are already juggling quite successful with the tokens... Ha, no problem, the last version took away the limitations so that now it will be easier to do a compiler. Currently, my plan is first implement parameters to PROC/EXEC, my idea is to translate this syntax: exec test, x+10, 20 rem .... end proc test, x, y ? "my parameters are "; x; " and "; y endprocTo the following turbobasic code: _par_test_x=x+10: _par_test_y=20: exec test end proc test ? "my parameters are "; _par_test_x; " and "; _par_test_y endproc If wanted, I could translate the article, describing the inner workings of the compiler, for you. It was published in conjunction with the Turbo Basic release in a German computing magazine. No need, I already have some ideas about the kind of output to generate. But, I really want to preserve float variables, I think that this will make the compiler more versatile. This is exaclty what I was thinking - the language is now just begging for a compiler Possibly, just possibly, it would be even easier to go the modern way with yacc / bison et consortes (?). Not really, currently the parser uses a "PEG" (Parsing Expression Grammar) syntax, see the file "basic-base.peg" for the definitions. The syntax is similar to yacc/lex, but much more powerful. The reason a parser like PEG is needed is that the original Atari Basic parser does not divide the input into tokens, you can write something like: 1FORI1=3TO6:PRINTI1:NEXTI1 A bison parser needs tokens, but it is not possible to tokenize the above string without interpreting the grammar. In the case of PEG, the parser does not need a tokenizing step. Also, a PEG parser backtracks and tries different combinations until one possibility is found, and only then produces output. This allows my parser to parse this FORA=3:?FORA as an assignment, because the missing "TO" to make a "FOR" statement. The original Turbo Basic parser is a recursive descent parser with backtracking, so it can be expressed as a PEG easily. The only problem with the PEG parsers generated with the PEG/LEG tool is that the needed backtracking can cause exponential parsing times, but this does not occurs with "normal" grammars like the one my parser uses now, in fact I think that the generated parser is really fast. 4 Quote Link to comment Share on other sites More sharing options...
+MrFish Posted October 22, 2015 Share Posted October 22, 2015 No need, I already have some ideas about the kind of output to generate. But, I really want to preserve float variables, I think that this will make the compiler more versatile. I'm curious what you have in mind. Quote Link to comment Share on other sites More sharing options...
Irgendwer Posted October 22, 2015 Share Posted October 22, 2015 Ha, no problem, the last version took away the limitations so that now it will be easier to do a compiler. Cool! proc test, x, y? "my parameters are "; x; " and "; y endprocTo the following turbobasic code: _par_test_x=x+10: _par_test_y=20: exec test end proc test ? "my parameters are "; _par_test_x; " and "; _par_test_y endproc Don't forget the warning for recursive calls... No need, I already have some ideas about the kind of output to generate. But, I really want to preserve float variables, I think that this will make the compiler more versatile. That's interesting, because in the documentation is pointed out, that the original compiler optimizes code to integers (if applicable) to save speed. Quote Link to comment Share on other sites More sharing options...
pirx Posted October 22, 2015 Share Posted October 22, 2015 (edited) Great, forgot about PEG, I am holding my breath then! Please, please include integer arithmetic as this is really sufficient in 99% of cases and "framerate is god" Edited October 22, 2015 by pirx Quote Link to comment Share on other sites More sharing options...
+MrFish Posted October 22, 2015 Share Posted October 22, 2015 Would it be possible to add a switch for the 3 character extension? I'm using TBXL exclusively, and it'd be nice to use a more appropriate extension. Thanks! Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 22, 2015 Author Share Posted October 22, 2015 Hi!, Would it be possible to add a switch for the 3 character extension? I'm using TBXL exclusively, and it'd be nice to use a more appropriate extension. Thanks! You currently can use the "-o" option to specify the output file, is this not enough? Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot. Quote Link to comment Share on other sites More sharing options...
+MrFish Posted October 23, 2015 Share Posted October 23, 2015 Hi!, You currently can use the "-o" option to specify the output file, is this not enough? Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot. Yes, I don't really want to name the whole file, as I will like to keep the file's base name, but change the extension only. The method you suggest would be perfectly fine if you can do it that way, or something like this: "-o *.bst", if you'd like to follow Atari-style conventions. Quote Link to comment Share on other sites More sharing options...
snicklin Posted October 23, 2015 Share Posted October 23, 2015 Hi DMSC, This is an interesting project. I've not seen anything like this before. What was your inspiration for it? Was it to help you with a project that you had/have in place? Quote Link to comment Share on other sites More sharing options...
pirx Posted October 23, 2015 Share Posted October 23, 2015 10 liners contest. I know because I did something similar albeit WAY simpler. Quote Link to comment Share on other sites More sharing options...
+MrFish Posted October 23, 2015 Share Posted October 23, 2015 You currently can use the "-o" option to specify the output file, is this not enough? Perhaps the same "-o" option could specify "extension" if specified as "-o .bst", with only a dot. Actually no problem/rush on that. I just changed the program defaults using a hex editor. Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 24, 2015 Author Share Posted October 24, 2015 Hi!, Well, I published the new version, now "Version 9": The mayor change in this version as adding support for PROC with parameters and local variables, converted to standard TurboBasic XL code on output. Note that currently the parser adds too many extra variables, an optimization pass that removes unused variables will be added later. There are other minor new features: - Check all EXEC targets and print error if not matching to the corresponding PROC. - Allows the "-o" option to specify an output file name extension (asked by MrFish). And other code changes: - Cleanup the PEG parser to use standard constructs to parse most tokens. - Use a common dynamic array implementation for all the arrays in the code. - Cleanup warnings when compiled with GCC -Wextra. As always, just download from https://github.com/dmsc/tbxl-parser/releases 2 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 24, 2015 Author Share Posted October 24, 2015 Hi DMSC, This is an interesting project. I've not seen anything like this before. What was your inspiration for it? Was it to help you with a project that you had/have in place? 10 liners contest. I know because I did something similar albeit WAY simpler. Yes , I started the parser as a rewrite of my old perl script, as a simple PEG grammar that understood the statements and replaced them by shorter versions and joining of statements with ':'. It is really fun to write this as I though new ways to enhance the parser and add more stuff. 1 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 24, 2015 Author Share Posted October 24, 2015 Hi!, Don't forget the warning for recursive calls... Well, It was not so easy, so currently I don't even try to detect them. The problem is that to detect indirect recursive calls (where PROC A calls PROC B who calls PROC A again), we need to construct some type of call-tree. But, as the BASIC code is unstructured, any GOTO/GOSUB/GO# inside would invalidate all the call tree construction. The program currently stores the full parsed source as an expression tree with all statements pointing to the next one in the left leaf and to the statement tokens in the right leaf. And all the tokens are also represented in the same tree. As an example, the program: color 2 plot x+1, y endis represented by the following tree: By analyzing the control-flow, I could put each PROC or subroutine as a new leaf in the main tree, this could allow to build a call-tree and also optimize variable usage (like reusing variables from one PROC in another). Well, my current plan is to try to generate code from the parse tree directly, without really understanding the flow, but perhaps in a future a new "restricted" subset of the language could be optimized better. 1 Quote Link to comment Share on other sites More sharing options...
dmsc Posted October 24, 2015 Author Share Posted October 24, 2015 Hi again!, And another minor update, "Version 9.1": - Adds the unused variable removal optimization, this removes all unused variables after all the other optimizations, this makes the PROC/EXEC with parameters use less variables. - Always runs some optimization passes, this simplifies generation of BAS and short listings. - Fixes listing programs with too many variables. - Remove limitations of 256 variables, the parser now handles an unlimited number of variables or procedures. - Sort variables by usage, reducing binary size in programs with many variables. As usual, download from https://github.com/dmsc/tbxl-parser/releases I will begin to (slowly) write the compiler part now, but I will need more test programs to ensure proper compilation. If anyone has a program that want to compile, you can post here and I will use it for the "test-suite". Perhaps I should post this to the programming forum, what do you think? 3 Quote Link to comment Share on other sites More sharing options...
matosimi Posted October 25, 2015 Share Posted October 25, 2015 dmsc, honestly THIS S#IT IS SICK! example in your original post is mindblowing... ...but I don't code in Basic nor TB so I won't use it. Anyway, respect! Quote Link to comment Share on other sites More sharing options...
+MrFish Posted October 26, 2015 Share Posted October 26, 2015 I will begin to (slowly) write the compiler part now, but I will need more test programs to ensure proper compilation. If anyone has a program that want to compile, you can post here and I will use it for the "test-suite". Perhaps I should post this to the programming forum, what do you think? Hi and thanks again for all your great work here. I can provide some more source code to work with, but it will all be sent in PM as before. Just as a note, the program I sent you before was not working when compiled -- even though no compile errors. This was true before I used it in conjunction with your program and after. I may not be adhering to some known issues for the compiler though; I haven't checked on that. I know there is a thread in the programming forum that talks about it. No matter for me right now, but it will probably be no help in your compiler tests at the moment. I don't think it matters what forum you post about your work. It'll surely get the most attention here, but those who care will find it in either place. Your call. Quote Link to comment Share on other sites More sharing options...
tschak909 Posted October 29, 2015 Share Posted October 29, 2015 Have you looked at LLVM? It has 99% of what you need: http://llvm.org/docs/tutorial/ -Thom Quote Link to comment Share on other sites More sharing options...
dmsc Posted November 2, 2015 Author Share Posted November 2, 2015 Hi!, Have you looked at LLVM? It has 99% of what you need: Yes, I know LLVM, it is a nice library for code generation. But, it does not have support for 6502 Also, most of the optimizations passes of current state of the art compilers (like LLVM or GCC) are not really suitable for the kind of assembly for a processor like the 6502, or for a language like old BASIC dialects, so I think is better, for simplicity, to have my own simpler code generator. What I'm currently doing is converting the BASIC statements to a lower level pseudo-language, currently this code: 10 GRAPHICS 8 20 SETCOLOR 2,0,4 30 COLOR 1 40 PLOT 10,100 50 DRAWTO 100,10 60 GOTO 60 is converted to this: let @ral = 8 call bas_graphics~ let @ral = 4 let 710~ = @ral let @ral = 1 let COLOR~ = @ral let @ral = 100 let ROWCRS~ = @ral let @rax = 10 let COLCRS~ = @rax call bas_plot~ let @ral = 10 let ROWCRS~ = @ral let @rax = 100 let COLCRS~ = @rax call bas_drawto~ # @_lnum_60 go# @_lnum_60 In the pseudo-code "@ral" is an 8 bit register (will be converted to the 6502 accumulator) and "@rax" is a 16 bit register (will be converted to the A/X register combination in the 6502). The "~" at the end of words represents that the word is an assembly label. After finishing the code generator, all the "bas_*" assembly support routines will need to be written, I'm planning on generating ca65 compatible code, so the ld65 linker can be used to integrate the used routines in the output program. 2 Quote Link to comment Share on other sites More sharing options...
+Larry Posted November 2, 2015 Share Posted November 2, 2015 Is any significant portion of the work you are doing flexible enough to be useful later on in generating code for other languages? Specifically, a "converter" for Basic to Action! ? I tried that some years ago, but was in over my head! -Larry Quote Link to comment Share on other sites More sharing options...
dmsc Posted November 2, 2015 Author Share Posted November 2, 2015 Hi!, Is any significant portion of the work you are doing flexible enough to be useful later on in generating code for other languages? Specifically, a "converter" for Basic to Action! ? I tried that some years ago, but was in over my head! Yes, but with some restrictions. To generate any output from the BASIC sources, you simply write code that converts all BASIC statements to your output code. The difficult part is that many BASIC statements will not have any equivalent in other languages, so you will end writing them as subrutine calls. The code will look like: void convert_to_action_ex(expr *ex, output *out) { if( !ex ) return; // Convert LEFT expression convert_to_action_ex(ex->lft, out); switch( ex->type ) { case et_c_number: // Convert constant number break; case et_var_string: // Convert string variable break; //... and so on ... } // Convert RIGHT expression convert_to_action_ex(ex->rgt, out); } void convert_to_action_stmt(expr *ex, output *out) { if( !ex || ex->type != et_stmt ) return; switch( ex->stmt ) { case STMT_END: // Convert END statement break; case STMT_PRINT: // Convert PRINT statement, this will be dificult! break; //... and so on ... } // Convert next statement convert_to_action_stmt(ex->lft, out); } If the output language is different enough, perhaps it will be easier to convert the "low-level" code that I'm generating now, but the resulting program will not be pretty On the other hand, if you want to generate code from another language, you can remove all the parsing code and replace it by your own parser, reusing the optimization passes and the code generator. 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.