Developing a new language - ACUSOL


Pab


Wow! So many replies overnight. Hope I can get to them all.

 

About the compiler: will it output 6502 assembler? And will code be compiled without the use of a runtime like the one Action! has?

 

It will not output assembler; it will output machine code in a DOS-format binary load file (a .COM or .OBJ file, in other words), the same way Action! does.

 

There will need to be a runtime library, BUT the library will be included with the compiler, released into the public domain, and easy to use. Also, since I plan on the compiler being multi-pass, it will be able to determine exactly which procedures it needs from the runtime library and only compile/link those into the finished product.


 

This way you will duplicate some of the slowness of CC65 :-/

In a typical situation, when you need an object, you have an array of objects. Take a look at this example:

http://www.cc65.org/mailarchive/2010-09/8593.html

At the very least, there should be an option for how structures are placed.

 

My other proposals are:

  • Do not use the stack for local variables (therefore no recursion - recursion can easily be programmed with a user-defined stack).
  • Do not use the stack for function parameters - make them global variables.
  • Perform call-tree analysis to find out which local variables can be reused.
  • Allow defining zero-page variables for speed.
  • Allow placing data (& code) at defined addresses. This is important with all the Atari alignment issues (fonts, sprites, LMS, etc.).
  • Avoid pointers in the language if possible. They (and recursion) disallow most optimizations.

Btw, take a look at http://www.dwheeler.com/6502/

 

Regarding objects and CC65, no. You're thinking of dynamic instances and dynamic variables. They can be handy when memory is at a premium at runtime, and they can allow for recursion, but they slow things down enormously with allocation and garbage collection. Objects in Acusol will be static and allocated at compile time, just like Action! record types.
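To make the contrast concrete, here's a minimal sketch (the label names are made up) of what a statically allocated object boils down to - every field sits at an address fixed at compile time, so no allocation code runs at all:

 ; hypothetical object with BYTE fields x and y, placed by the compiler
PLAYERX .BYTE 0   ; player.x lives at a fixed address
PLAYERY .BYTE 0   ; player.y is simply PLAYERX+1

 LDA PLAYERX      ; reading player.x is one absolute load
 STA PLAYERY      ; writing player.y is just as cheap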

 

I considered pre-allocating procedure arguments as you suggested, but that removes backward compatibility with Action! (which I'm trying to preserve as much as possible so people can port and reuse old code) and makes it difficult to pass variables to already-written ML routines or OS calls. Instead of using the 6502 stack for those arguments, I'm going the way Action! did: storing them in zero page from $A0 to $AF and copying the first three into the accumulator, the X register, and the Y register, respectively. That way you'll still be able to do things like:

PROC CIO=$E456

    ...
    CIO(0,dev * $10)

should you want/need to.
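For illustration, here's the kind of code the compiler might emit for that CIO call under the convention above (first three arguments in A, X, and Y); the shift-based multiply is just one plausible expansion of dev * $10, and the $A0-$AF copies are omitted since a bare ML routine only looks at the registers:

 ; sketch of compiled output for CIO(0,dev * $10)
 LDA DEV     ; evaluate the second argument
 ASL A       ; times 16 via four left shifts
 ASL A
 ASL A
 ASL A
 TAX         ; second argument is passed in X
 LDA #0      ; first argument is passed in A
 JSR $E456   ; CIO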

 

Call-tree analysis: good idea. A little outside the scope of the moment, but certainly possible with a multi-pass compiler. Definitely something to consider while working on the native Atari version of the compiler, and it can be added back into the PC version for those who want to cross-develop.

 

Zero-page variables: just as with Action, they can be defined by the programmer when speed is needed. For example, "BYTE counter=$80" or "CARD adr=$B3" or even "FLOAT fr0=$D4" for masochists. In the current version of the code the only zero-page locations the language will need are $A0-$AF (for argument passing) and $CB-$CF for banked RAM code. The $Ax segment is (as I said above) for backwards compatibility and the locations for banked RAM can be moved if we want. Maybe to $B0-$B5 to keep the language's zero page usage contiguous? And, of course, anyone using FLOAT variables will be sacrificing the floating-point zero page locations.

 

As with Action, variables and code can be placed anywhere, even in banked RAM. A new MODULE statement will place code or variables wherever the programmer wants it.

MODULE ORG=$2000  // Place the following on a 4K boundary.

And variables can either be defined in a block (with others as part of a module) or specifically as wanted. "CARD ARRAY datalocations[200]=$5000:03" for example.

 

As for pointers, I'm including them mainly for backwards compatibility. They're less necessary now, but those who want to use them (or have old code to adapt) can do so.

 

Pointers, by the way, will now be three bytes long instead of two, since they will also carry a bank number. We might need to add a compiler directive to switch them back to two bytes for those using old code that expects two-byte pointers.
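As a sketch, a three-byte pointer could simply look like this in memory (the field order is my assumption):

 ; hypothetical layout of a three-byte Acusol pointer
PTR .BYTE $00   ; address low byte
    .BYTE $40   ; address high byte
    .BYTE $02   ; bank number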


IMO, Action! was a terrific starting point. The big weakness in Action! is math. Not only is it weak, it is buggy and often doesn't work the way it is supposed to. Hopefully you can improve on this (even if it is not lightning-quick).

 

-Larry

 

Math is where I'm going to need the most help on this. Pascal/Delphi code to parse mathematical statements and some fast 6502 math routines would be greatly appreciated.


I think ilmenit's and Mathy's suggestions above are very good.

 

About the extended-bank routines: if you are planning to provide access only on a byte-by-byte basis, i.e., BANKPEEK, you will introduce a lot of slowness. I would suggest you make provisions to map a page or a 1K block of extended RAM at a time into a special reserved area in main memory. Most of the time, memory access is sequential; i.e., a program that has just read byte 0002 is very likely to read 0003 next.

 

The BANKPEEK and BANKPOKE routines I included in my micro-runtime are mainly for accessing BYTE and CARD variables that reside in banks other than the one currently in use. I'll probably write a quick BANKFLOATPEEK/POKE to copy 6 bytes at a time using FR0 as a buffer.
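A minimal sketch of what that BANKFLOATPEEK could look like, assuming BANKADDR is a free zero-page pointer and reusing the BANKSELECT/UNBANK helpers from the micro-runtime; FR0 at $D4-$D9 is the OS floating-point register:

 ; copy a six-byte FLOAT from a banked address into FR0
BANKFLOATPEEK
 LDA BANK          ; bank number set up by the caller
 JSR BANKSELECT    ; swap the source bank into $4000-$7FFF
 LDY #5
FCOPY
 LDA (BANKADDR),Y  ; BANKADDR points at the FLOAT
 STA $D4,Y         ; FR0 occupies $D4-$D9
 DEY
 BPL FCOPY
 JMP UNBANK        ; restore the previous bank and return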

 

Of course, if the memory you need is already in the active bank, the compiler isn't going to BANKPEEK it; it will just access it directly. So if you're running a procedure in bank 1, all of its local variables (unless you specified that all variables live somewhere else altogether) will already be in that bank and immediately accessible; globals that you've put in bank 0 will need to be banked in and out.

 

For moving larger blocks, like copying strings or blocks of data with a Move function, is there a good buffer to use? I want to leave page 6 untouched for noob compatibility. My gut reaction would be to use the cassette buffer, so I can swap 128 bytes at a time in and out when needed.

 

Of course, those copy routines would only be needed when the location being copied to or from is in a different bank between $4000 and $7FFF. If you tell the program Move($4000:01,$2000,2048), the compiler would just swap in bank 1, copy the 2K, and then go back to whatever bank you were using before.
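Here's a sketch of how that Move could expand, assuming SRC and DST are free zero-page pointer pairs and with a placeholder for the PORTB bits that select bank 1 (the exact value depends on the machine):

 ; Move($4000:01,$2000,2048): swap in bank 1, copy 2K, swap back
 LDA $D301      ; PORTB controls extended banks on the XE
 PHA            ; remember the caller's bank setting
 LDA #BANK1     ; placeholder PORTB value that maps in bank 1
 STA $D301
 LDA #<$4000
 STA SRC
 LDA #>$4000
 STA SRC+1
 LDA #<$2000
 STA DST
 LDA #>$2000
 STA DST+1
 LDX #8         ; 2048 bytes = 8 full pages
 LDY #0
MCOPY
 LDA (SRC),Y
 STA (DST),Y
 INY
 BNE MCOPY
 INC SRC+1      ; advance both pointers a page
 INC DST+1
 DEX
 BNE MCOPY
 PLA
 STA $D301      ; back to whatever bank was active before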

 

I was also wondering whether there would be any point in letting a programmer define variables in OSRAM. Maybe have the compiler treat references to bank 255 as meaning locations in OSRAM? Or would that be too much work for too little benefit?


The cassette buffer is a good idea, I think, if 128 bytes is enough... I think OSRAM is a valuable resource (10K, IIRC), but there's some overhead involved that isn't present in just banking; dealing with interrupts is one issue, I think.

 

Keep in mind that for banked code, any code running in the bank area is going to be swapped out itself when it requests a byte from another bank, so having a protected location like the cassette buffer, outside the banked space, is handy.


Last night I realized something that can come out of the way I structured my parser: it can recognize STRING ARRAY and STRING POINTER ARRAY as legitimate definitions.

My gut reaction was not to allow STRING ARRAY, since we're talking about huge heaping gobs of memory that can be eaten up, but the ability to have an array of pointers to strings can be useful if you want to create a table of static strings for something like error reporting or exception handling. Perhaps something like:

MODULE VARIABLES=$7000:1

STRING POINTER ARRAY ErrorText[256]

PROC CreateErrorCodes
   ErrorText[128]="BREAK pressed"
   ErrorText[129]="IOCB already open"
   ErrorText[130]="Device does not exist"
   ErrorText[131]="Write-only device"
   // ...and so on
RETURN

PROC Trap(BYTE Err)
   Print("Error ")
   PrintB(Err)
   Put(32)
   PrintE(ErrorText[Err]^)
RETURN

Is this something worth supporting? And how should I treat string arrays? Not allow them? Treat "STRING ARRAY" as "STRING POINTER ARRAY"? Support string arrays? Thoughts?


Regarding objects and CC65, no. You're thinking of dynamic instances and dynamic variables. They can be handy when memory is at a premium at runtime, and they can allow for recursion, but they slow things down enormously with allocation and garbage collection. Objects in Acusol will be static and allocated at compile time, just like Action! record types.

 

Nope, this is not related to dynamic instances, but to 6502 addressing. In the given example there was no dynamic memory allocation.

If you have an array of objects O with properties x, y, z, then instead of placing them in memory as x1,y1,z1, x2,y2,z2, x3,y3,z3, in many cases it's better to place them as x1,x2,x3, y1,y2,y3, z1,z2,z3. Then access to the n-th object's property, e.g. O[1].y, is as fast as accessing array element y[1]. You don't have to calculate the address of O[1]. This is much faster on the 6502. The only problem is with built-in functions like memset, memcpy, and memmove, which can't work linearly on such a layout, but the advantage for the rest of the code is huge.

 

Example:

CLASS position
  BYTE x
  BYTE y
END

position ARRAY p(5);

FOR i=0 TO 4 DO
  p(i).x=value1;
  p(i).y=value2;
OD

 

When stored in this striped layout, writing to object p is simple:

 ldy i
 lda value1
 sta px,y
 lda value2
 sta py,y


The cassette buffer is a good idea, I think, if 128 bytes is enough... I think OSRAM is a valuable resource (10K, IIRC), but there's some overhead involved that isn't present in just banking; dealing with interrupts is one issue, I think.

 

Keep in mind that for banked code, any code running in the bank area is going to be swapped out itself when it requests a byte from another bank, so having a protected location like the cassette buffer, outside the banked space, is handy.

 

Of course, most accesses to banks you aren't currently in are going to be variable accesses, most of which are one to six bytes long and can be handled in the bank-RAM micro-runtime. A procedure in bank 1 accessing a CARD or INT variable in bank 2 would have the compiler generate:

 LDA #<ADDR
 STA BANKADDR
 LDA #>ADDR
 STA BANKADDR+1
 LDA #2
 STA BANK
 JSR BANKCARDPEEK
 STA VAR
 STX VAR+1

An access to a string variable in a different bank would require the buffer:

 LDA #sourcebank
 STA BANK
 JSR BANKSELECT
 LDA var1 ;Length in zeroth byte
 TAX
 PHA
 CPX #128
 BPL TWOPASS ;Not included here, but you get the idea
LOOP LDA var1,X
 STA $400,X
 DEX
 BNE LOOP
 JSR UNBANK
 LDA #destbank
 STA BANK
 JSR BANKSELECT
 PLA
 TAX
LOOP2 LDA $400,X
 STA var2,X
 DEX
 BNE LOOP2

Because of the length of this code (46 bytes), I'll probably put it in the micro-runtime, so each call can get down to something like:

 LDA #<var1
 STA $B0
 LDA #>var1
 STA $B1
 LDA #<var2
 STA $B2
 LDA #>var2
 STA $B3
 LDA #bank1
 STA $B4
 LDA #bank2
 STA $B5
 JSR BANKSTRINGCOPY

27 bytes each time it's called.


Does it support long ints?

Do arrays support 2D? 3D?

 

Can it use nested functions or procedures?

 

I can't quite recall Action!, but I think I had to have procedures/functions in a certain order or it didn't work - or am I mistaken? If Action! does need a certain order, can you lift this limitation?

 

Processor support, please: a flag for 6502 and one to generate 65816 code.


ilmenit's point about table layout is exactly right, if speed is a consideration. I had to learn about this recently while working on the TCP stack for the DragonCart. I originally laid out the table of connection structures in the 'intuitive' way and found myself calculating a lot of index offsets and fooling around with ZP indirect addressing. After reworking it into parallel low-byte/high-byte 'stripes', it's much easier and faster. Not sure how exactly this maps onto what you're talking about; I just wanted to say the point is valid as far as speed goes.
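Roughly like this, with made-up labels (not the actual DragonCart source) and assuming a free zero-page pair ZP:

 ; parallel 'stripes': all low bytes together, all high bytes together
CONNLO .BYTE 0,0,0,0,0,0,0,0   ; low bytes of 8 buffer pointers
CONNHI .BYTE 0,0,0,0,0,0,0,0   ; high bytes of the same pointers

 LDX CONN        ; the connection number is the only index needed
 LDA CONNLO,X
 STA ZP
 LDA CONNHI,X
 STA ZP+1        ; ready for (ZP),Y access, no offset arithmetic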


Yes to inline ML (and probably inline assembly as well - certainly in the PC version, perhaps in the native Atari one).

 

Yes to custom and editable runtime.

 

I don't think I'll have nested functions or procedures. At least not at first.

 

As for having things in a certain order: Action! required it because it was a single-pass compiler, so you had to define a procedure before you called it. If you needed the result from a function called Parse, you needed Parse's code before the procedure calling it. Plenty of languages on modern platforms have the same requirement.

 

I thought this over long and hard. I knew I wanted a multi-pass compiler so we could discard code from the runtime libraries that isn't needed - and probably procedures and functions from the main program that aren't called, either. (Say, just offer a hint on compiling: "Hint: FUNC Unnecessary() is never called.") This means we would know a function exists before it's called, BUT would we know where it was in the code? That's the rub.

 

The only way I could think of to handle this would be to assign the address of every procedure, function, and variable on the first pass instead of the second. Unfortunately, this would lead to some ugly situations, because we would either need to know exactly how many bytes each statement generates in compiled code (which would be a pain in the ass whenever we modified the compiler to make it tighter or faster), or we would have to emit a lot of vectors that JMP to the actual code and then go back and plug the real addresses into those JMP statements. Much more work (especially on the Atari, where compilation will almost certainly go directly to disk with no intermediate stay in memory), and some really bad code would result.
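To illustrate the vector idea (a sketch of the approach, not a settled design): the first pass would emit one JMP trampoline per procedure and compile every call against it, and the second pass would patch the operands once the real addresses are known:

 ; vector table emitted on pass one, operands patched on pass two
V_PARSE JMP $0000   ; placeholder, later patched to PARSE's address
V_MAIN  JMP $0000   ; one vector per procedure/function
 ; every call site targets the vector, never the code itself:
 JSR V_PARSE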

 

On the first pass we might be able to work out the precedence of procedures, so we would know to compile a function before the procedure that calls it. But we'd have to jump around in the source code, almost (but not quite) to the point of rewriting it on the fly. Probably not feasible at this time.


IMHO: since everything is going to be published, and given the way it is going to be implemented, it isn't necessary to support anything other than the 6502/8-bit at this time.

 

For instance, with a separate runtime, anyone who wants to can rewrite it to take advantage of other processors. Even if someone wanted to do it for a 6809 version of the Atari 8-bit, all that would be necessary is a few mods to the calling convention and runtime. With any high-level language there is a degree of abstraction that lets it run on other hardware.

 

For that matter, you could rewrite the Action! runtime to take advantage of the 6502/65C02/65816 and just change which runtime you include at compile time.

 

Just jerking around, I've compiled C code to run on an 8-bit Nintendo and Action! code on a 5200. It really isn't terribly difficult if you know the details of the compiler and the hardware you want it to run on.

 

It's one of those mutually exclusive things for some reason: you need to know some assembly language to accomplish this, but if you know assembly, you turn your nose up at high-level languages. There's too much anti-high-level bias among assembly language programmers. You're more likely to be told "there already is a 5200 development language - it's called assembler" than to be shown how to do it.


Since there's been a lot of questions about the runtime library, its changeability, and the need for one, let me bring up how I picture this being handled.

 

Moving from Action! on the Atari to Turbo Pascal on the PC back in the 1990s, I grew to love the way TP handled what it calls Units, as compared to the INCLUDE system used by Action! and C. I think I want to bring a degree of this to the new language.

 

I picture breaking the RTL up into segments, each handling a different area. We could then have the first statement of a program or segment of code be one telling the compiler which "units" of the RTL are used by the code that will follow.

 

For example, the first statement of an ACUSOL program might be along the lines of

USES DOS,SOUND,PMG,DISPLIST

This would tell the compiler that the program uses routines, variables, or classes defined in files called DOS.ACU, SOUND.ACU, PMG.ACU, and DISPLIST.ACU (for disk routines, sounds, player-missile graphics, and custom display lists).

 

This would also allow us to have "units" of code that require code from other units, and have the compiler know what it needs to compile from each. Let's say you have units called TEXT and STREAMS, both of which require DOS routines. Those two units would have USES statements that reference DOS. Upon loading the first unit, the compiler would know it needs DOS, and that it already has it when it loads the second unit.

 

This way you don't have to remember that certain segments require others, and we won't end up with multiple INCLUDEs by accident.

 

INCLUDE will, of course, still be supported for backwards compatibility.


 

It's one of those mutually exclusive things for some reason: you need to know some assembly language to accomplish this, but if you know assembly, you turn your nose up at high-level languages. There's too much anti-high-level bias among assembly language programmers. You're more likely to be told "there already is a 5200 development language - it's called assembler" than to be shown how to do it.

 

That's true. But assembler is not an easy language to master - especially when you have tasks that sound simple when you hear about them (player-missile graphics) but are a pain in the ass to implement in assembler. Even macro assembly, while easier, is not always easy to master.

 

Higher-level languages make things easier, and more accessible to hobbyists as opposed to diehards. How many people have come up with 2600 homebrew games who would never have bothered if it weren't for things like bAtari BASIC?

 

My thinking with this was to put more of the power previously available only through ML into the hands of the average user. And I thought a structured language like Action! would be best, because it would encourage (if not require) the programmer to create more efficient code.

 

And with an RTL broken up into Units, it would actually be a lot easier to cross-develop with compiler switches. Say we re-appropriate Action!'s DEFINE statement to not only set code macros but also compiler switches, and add IFDEFINE to examine them. Then we could have a program start off with something like:

IFDEFINE target="5200" THEN
   MODULE ORG=$4000 VARIABLES=$0700
   USES CTRL52,SOUND52,PMG52,SCREEN52
END

IFNDEFINE target="5200" THEN
   MODULE ORG=$1F00
   USES CONTRLRS,SOUND,PMG,SCREEN,DOS
END

This way you could completely develop and debug a 5200 game on an 8-bit computer, then compile a 5200 version just by adding DEFINE target="5200" to the top of your code. On compile, the routines in the 5200 units would be used instead of the "standard" units: a Print() command wouldn't call CIO to print to IOCB #0, but would poke the appropriate bytes into screen RAM, and references to POKEY and GTIA registers would go to their correct locations on the game console.


Oh, and as regards generating code for other processors: I've never used anything other than stock Atari processors and don't know the newer capabilities and opcodes. If anyone wants to write code generation for the alternate processors once I have the guts of the compiler finished, it would be greatly appreciated.


I've had my first major change of heart on a subject after reading the comments here and mulling things over.

 

I've decided not to use the area of $A0 through $AF to "push" arguments passed to a procedure. If it's needed for old Action! code being adapted for some reason, a compiler switch will imitate this behavior:

DEFINE parameters="action"

If this switch is on, the arguments will be "pushed" and copied into registers. If not, all parameters will be static local variables, and arguments will be stored there directly.

 

This may cost a few extra cycles when calling the procedure (since it takes slightly longer to write to a non-zero-page location than to zero page), but that will be more than made up for by not having to copy the parameters out again inside the procedure.
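As a sketch of the difference, a call like MyProc(4,200) with static parameters would compile to stores straight into the procedure's own variable space (labels hypothetical):

 ; arguments stored directly into MyProc's static locals
 LDA #4
 STA MYPROC_A1   ; first parameter's fixed address
 LDA #200
 STA MYPROC_A2   ; second parameter's fixed address
 JSR MYPROC      ; nothing to copy out of $A0-$AF on entry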

 

For ML calls that need registers set, I will be creating three pseudovariables - 6502_A, 6502_X, and 6502_Y - to provide direct access to the accumulator, X, and Y as needed right before or after a procedure call. So where Action! had

PROC CIO=$E456

   CIO(0,$10)   ; CIO call with IOCB #1

the equivalent in Acusol (without the Action parameters switch) would be

PROC CIO=$E456

   6502_X = $10
   CIO

A little more ungainly in source code, but faster in the long run, since registers that aren't needed (like the accumulator here) can be skipped, and the pseudovariable assignment will be translated by the compiler into a simple "LDX #$10".
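So the whole fragment above should boil down to:

 LDX #$10    ; 6502_X = $10
 JSR $E456   ; CIO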



It will not output assembler; it will output machine code in a DOS-format binary load file (a .COM or .OBJ file, in other words), the same way Action! does.

 

There will need to be a runtime library, BUT the library will be included with the compiler, released into the public domain, and easy to use. Also, since I plan on the compiler being multi-pass, it will be able to determine exactly which procedures it needs from the runtime library and only compile/link those into the finished product.

 

I have to admit that although I once had an Action! cartridge, I never used it; I sold it a few years ago. So I'm slightly unclear about the format it'll be outputting. Will it produce a compound file? The problem with compound files, if someone wants to develop a piece of cartridge-based software, is that the block headers need to be stripped out, or the cartridge needs a loader that will load each block into its proper memory location.

 

Two things I'd personally like to see would be:

 

- An inline assembler, for those of us who might want to inject a little bit of extra speed into our code. (Just looked back in the discussion and I can see it's planned. Yay!)

- Interrupt functions. Basically, they're functions which, when compiled, end with an RTI instead of an RTS.

 

They're something I used a lot in PL65, a high-level language I worked with extensively back in the day.


 

I have to admit that although I once had an Action! cartridge, I never used it; I sold it a few years ago. So I'm slightly unclear about the format it'll be outputting. Will it produce a compound file? The problem with compound files, if someone wants to develop a piece of cartridge-based software, is that the block headers need to be stripped out, or the cartridge needs a loader that will load each block into its proper memory location.

 

- An inline assembler, for those of us who might want to inject a little bit of extra speed into our code. (Just looked back in the discussion and I can see it's planned. Yay!)

- Interrupt functions. Basically, they're functions which, when compiled, end with an RTI instead of an RTS.

 

They're something I used a lot in PL65, a high-level language I worked with extensively back in the day.

 

The compiler will only create compound files when absolutely necessary. Currently, the only time a compound file is formed is when a MODULE statement changes where the subsequent code will be generated. So if you write all your code in one contiguous block, there will be no need for additional segments - at least not until the end, when a segment is used to tell DOS where to run the program, and that one is easy to strip out.
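For reference, the segment structure being talked about here is the standard DOS binary load format; a finished file with a RUN segment looks roughly like this (the addresses are only examples):

 $FF $FF           ; binary file signature
 $00 $20 $FF $2F   ; segment header: load from $2000 to $2FFF
 ...               ; the $1000 bytes of code/data
 $E0 $02 $E1 $02   ; RUN segment: loads into $02E0-$02E1
 $00 $20           ; run address $2000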

 

The real fun comes when you've got code that gets put into an extended bank. When the compiler hits that module, it closes out the current segment, starts a new segment to store the number of the bank to switch to into the appropriate address, creates a segment to INIT the bank-switching routine, and then opens a new segment for the new code. Then, when the next module loads, it creates a segment to switch back to the main bank before doing anything else.

 

As for interrupts, sure. Maybe a keyword INTERRUPT instead of PROC?
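A minimal sketch of what that could compile to - the prologue/epilogue details are up in the air, but the key difference from a PROC is the final opcode:

 ; hypothetical output for INTERRUPT MyDli (a display list interrupt)
MYDLI
 PHA           ; compiler-generated prologue saves the registers it uses
 LDA #$32
 STA $D40A     ; WSYNC: wait for horizontal blank
 STA $D018     ; COLPF2: change a playfield color mid-screen
 PLA           ; epilogue restores them
 RTI           ; emitted in place of RTS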


Oh, and let me add that cartridge development was one of the reasons I added the VARIABLES keyword, to allocate variables someplace other than within the code the way Action! does. If you're designing a stand-alone cart to run without DOS, you could do "MODULE ORG=$A000 VARIABLES=$0700" to allocate your variables in RAM while your program is stored in ROM. A cart to be run with DOS would just need to put its variables in a safe spot in RAM (or even in a bank).

