
Creating bank switched cartridges with gcc


TheMole


In trying to come up with a strategy to win back memory for Alex Kidd, I was thinking about stuffing some code in a cartridge, so I can win back some of that 32k expansion memory. Given that I'm already at nearly 16k of executable code (including constants), and that I still need to add a good number of features, I need to find a way to create bank-switched software with gcc. What follows is a write-up of my ideas. Not everything has been tested, and I'm looking for a sanity check: will this work, and am I missing something that could simplify things?


1. Multiple pieces of code at the same location


The first thing we need to do when hacking support for banked memory (such as bank switched cartridges) in gcc, is to tell the compiler that specific pieces of code will run from the same physical address space. In the case of a program designed to run from cartridge, this would be 0x6000.


By default, gcc will put all executable code into a section called .text, and you can tell the linker to position this code at any location in memory by using command line options (--section-start .text=0x6000), or by creating a bespoke linker script and adding a properly configured SECTIONS section:




SECTIONS
{
  . = 0x6000;
  .text : { *( .text ) }
  . = 0xa000;
  .data : { *( .data ) }
  .bss : { *( .bss ) }
}


(Note: the above example requires a system with 32k memory expansion installed, since it puts all variables in expanded memory. It also requires a crt0 implementation that copies the initialization values for variables in the .data segment from somewhere in ROM or from disk to 0xa000)
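To make the crt0 requirement in the note above a bit more concrete, here is a minimal sketch of the copy step in plain C. The function name `crt0_init_sections` and its parameters are made up for illustration; a real crt0 would get these addresses and sizes from symbols defined in the linker script rather than from arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the initial values of .data from where they
 * are stored (ROM or a disk buffer) to where the program expects them,
 * and zero-fill .bss. All names here are assumptions for illustration. */
static void crt0_init_sections(const unsigned char *data_load,
                               unsigned char *data_run, size_t data_size,
                               unsigned char *bss_run, size_t bss_size)
{
    assert(data_load != NULL && data_run != NULL && bss_run != NULL);
    memcpy(data_run, data_load, data_size);  /* initialized variables */
    memset(bss_run, 0, bss_size);            /* zero-initialized variables */
}
```

On the real machine `data_run` would be 0xa000 and `data_load` wherever the initial values ended up in the cartridge image.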


Since all code is in the .text segment by default, the linker will just start filling up memory with code from 0x6000 onwards, blasting past 0x7fff if the code segment happens to be larger than 8k and in the process creating a useless image for our purposes. At the very least, we can define our memory layout in the linker script to get a warning when one of our blocks exceeds the maximum size. We can do this by adding a MEMORY section to the linker script (there's no command line equivalent of this), and changing the SECTIONS section accordingly:




MEMORY
{
  cart_rom   (rx) : ORIGIN = 0x6000, LENGTH = 0x2000  /* cartridge ROM, read-only */
  lower_exp  (wx) : ORIGIN = 0x2080, LENGTH = 0x1F80  /* 8k - 128 bytes */
  higher_exp (wx) : ORIGIN = 0xa000, LENGTH = 0x6000
  scratchpad (wx) : ORIGIN = 0x8320, LENGTH = 0x00e0  /* the first 32 bytes are for the workspace */
}


SECTIONS
{
  .text : { *( .text ) } > cart_rom
  .data : { *( .data ) } > higher_exp
  .bss  : { *( .bss ) } > higher_exp
}


Now, whenever the .text section exceeds 8k, the linker will throw an error and abort. At least we'll know our program is too big to fit in the 8k, but it would be even better if we could stuff more code in other parts of memory. Unfortunately, ld will not do this for us, and we'll need to explicitly assign code to different sections in our source files by adding attributes to the function definitions. Supposing we already have filled our 8k of cartridge ROM, we could for instance decide to put additional functions in the lower 8k of the 32k memory expansion. First we add the section attribute to each function we want to put in the lower memory expansion area:




void somefunction(int somearg) __attribute__ ((section (".moretext")));
void somefunction(int somearg)
{
    // some code
}


We now have code that will get put in the .moretext section, so we need to tell the linker where to put this code (assuming the same MEMORY section as in the example above):




SECTIONS
{
  .text     : { *( .text ) } > cart_rom
  .moretext : { *( .moretext ) } > lower_exp
  .data     : { *( .data ) } > higher_exp
  .bss      : { *( .bss ) } > higher_exp
}


(Note: again we need to remember that the cart will need to load the contents of section .moretext from somewhere in ROM or from disk and copy it to the lower memory expansion at 0x2080)


In theory, we could automate the annotation of functions by doing two compilation passes: one with all code in the standard .text segment to discover the size of each compiled symbol, and one that uses that info to assign individual functions to the two available sections. In practice, I imagine this is doable enough by hand for most programs. Also, on our platform gcc doesn't seem to support calculating the size of individual compiled symbols, so by hand it is.
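For what it's worth, the second pass of such a two-pass scheme could be as simple as a first-fit assignment. The sketch below assumes the function sizes have already been obtained somehow (which is exactly the part that is hard on our platform); `assign_sections` and its parameters are hypothetical names, not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical first-fit assignment: sizes[i] is the compiled size of
 * function i, free_bytes[s] is the room left in section s (updated in
 * place). section_of[i] receives the chosen section index, or -1 if the
 * function fits nowhere. Returns the number of functions placed. */
static size_t assign_sections(const unsigned sizes[], size_t nfuncs,
                              unsigned free_bytes[], size_t nsections,
                              int section_of[])
{
    size_t placed = 0;

    assert(sizes != NULL && free_bytes != NULL && section_of != NULL);
    for (size_t i = 0; i < nfuncs; i++) {
        section_of[i] = -1;
        for (size_t s = 0; s < nsections; s++) {
            if (sizes[i] <= free_bytes[s]) {
                free_bytes[s] -= sizes[i];
                section_of[i] = (int)s;
                placed++;
                break;
            }
        }
    }
    return placed;
}
```

With section 0 being .text (0x2000 bytes) and section 1 being .moretext (0x1F80 bytes), the result would then be turned into section attributes on the individual functions.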


So now we are able to put code into two different physical locations in the TI's memory, but that still doesn't allow for bank switching. As we said at the very beginning, for that we need to tell the linker that two or more sections of code need to target the same memory area. Turns out that we can do this with the OVERLAY command:



SECTIONS
{
  OVERLAY 0x6000 : AT (0x0000)
  {
    .text     { *( .text )     . = 0x2000; }  /* pad each bank to a full 8k in the image */
    .moretext { *( .moretext ) . = 0x2000; }
  } > cart_rom
  .data : AT (0x4000) { *( .data ) } > higher_exp
  .bss : { *( .bss ) } > higher_exp
}


Running the linker with a script with the above SECTIONS section will give us a binary that contains three 8k banks: .text, .moretext and .data (we ignore .bss, because those are just zero-initialized variables and are taken care of by our crt0 implementation). The code in the first two banks will expect to run at 0x6000, and expects to find the initialized data from the .data section at 0xa000. Given all this, we should be able to generate binaries in the right format to support bank switching.
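As a sanity check on that layout: if every code bank really occupies a full 8k in the image, finding where a given run address of a given bank lives in the binary is simple arithmetic. A small sketch, with `rom_offset`, `BANK_SIZE` and `CART_BASE` being names I made up for the example:

```c
#include <assert.h>

#define BANK_SIZE 0x2000u   /* one 8k cartridge bank */
#define CART_BASE 0x6000u   /* run address shared by all code banks */

/* Hypothetical helper: offset inside the binary image of a run address
 * that belongs to the given code bank (bank 0 = .text, 1 = .moretext).
 * Assumes the banks are stored back to back, 8k each. */
static unsigned long rom_offset(unsigned bank, unsigned run_addr)
{
    assert(run_addr >= CART_BASE && run_addr < CART_BASE + BANK_SIZE);
    return (unsigned long)bank * BANK_SIZE + (run_addr - CART_BASE);
}
```

So a function linked at 0x6010 in the second bank would be found at offset 0x2010 of the image under these assumptions.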



2. Actually switching banks in code


That was the easy part, after all, it didn't require any coding :). However, the trickiest part to bank switching is to write code that can cope with switching from one bank to another (and have that new code return). There are a couple of ways to do this (some more cumbersome than others), but they will all share a common requirement: you need to keep a "bank switching stack" (for lack of a better term). That is to say, when code in bank 1 calls a function in bank 2, we need to save the return bank "location" (i.e. what enables "bank 1") somewhere. If that function in bank 2 then in turn calls a function in bank 3, we need to do the same thing without overwriting the first return bank location. This is a recursive problem, so we need a stack.


The ideal location for the bank switching stack seems to be in scratchpad, since it will be relatively small and that part of memory is always available. By putting the pointers to this stack in a separate section, we can use the linker script to put it there (or wherever else is convenient). The management of the stack needs to be done right before calling a function in another bank, and right before returning to the calling bank at the end of a function.
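A minimal sketch of such a bank switching stack, written as plain C so it can be tried on a host. The stack depth, the names, and in particular `select_bank()` are assumptions: on real hardware `select_bank()` would perform whatever access the cartridge needs to page a bank in, while here it only records the bank number.

```c
#include <assert.h>

#define BANK_STACK_DEPTH 8                 /* assumed maximum nesting depth */

static unsigned char bank_stack[BANK_STACK_DEPTH];
static unsigned char bank_sp;              /* index of the next free slot */
static unsigned char current_bank;         /* bank currently paged in */

/* Hypothetical stand-in for the hardware-specific bank select. */
static void select_bank(unsigned char bank)
{
    current_bank = bank;
}

/* Remember the caller's bank, then page in the callee's bank. */
static void push_bank(unsigned char new_bank)
{
    assert(bank_sp < BANK_STACK_DEPTH);
    bank_stack[bank_sp++] = current_bank;
    select_bank(new_bank);
}

/* Page the most recently saved bank back in. */
static void pop_bank(void)
{
    assert(bank_sp > 0);
    select_bank(bank_stack[--bank_sp]);
}
```

Both helpers, and the stack itself, would have to live in non-bankable memory, since they run while the banks are being swapped.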


On a select number of platforms, GCC supports so-called 'far' and 'near' pointers and/or function attributes, which could be used to implement two different function prologues/epilogues depending on the type of function call that needs to be done. Unfortunately, the tms9900 platform implementation does not support these attributes.


GCC also has support for instrumenting each function call and return via the -finstrument-functions command line option. You need to implement your prologue and epilogue code in the following two functions somewhere in your code:


void __cyg_profile_func_enter (void *, void *) __attribute__((no_instrument_function));

void __cyg_profile_func_exit (void *, void *) __attribute__((no_instrument_function));


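However, gcc inserts the call to __cyg_profile_func_enter just after entry into the instrumented function, and the call to __cyg_profile_func_exit just before it returns, so both hooks run from inside the callee. By the time the enter hook runs, the callee's bank would already have to be switched in, so it would take some serious wrestling with the C call stack to transparently implement bank switching in these functions.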


Our last option is to instrument individual functions and function calls. This is certainly the most cumbersome implementation of all, but it is the only one which does not need embedded support in the compiler implementation itself. Instrumentation of the function call is relatively easy, keeping in mind that all manipulation of the bank switching stack needs to be done from within the calling bank and the absolute last command needs to be the one that triggers the switch to the next bank. The following process could be a usable implementation:


The caller (code runs in bank 1):

  • Writes the address and bank location of the intended callee in two registers (e.g. r0 and r1)
  • Invokes the trampoline
The trampoline (code runs in scratchpad/expmem):

  • Saves the current bank on the bank switching stack
  • Loads the new bank
  • Makes the call using the info in (e.g.) r0 and r1
The callee (code runs in bank 2):

  • Does stuff
  • Returns to the trampoline
The trampoline (code runs in scratchpad/expmem):

  • Loads the original bank (which is popped from the bank switching stack)
  • Returns to the caller

Or, in other words, every function call should be structured as follows:


caller calls trampoline(), trampoline calls callee, callee returns to trampoline, trampoline returns to caller.


Using this type of construct, the trampoline function needs to transparently pass on all arguments to the callee. The easiest way to accomplish this is to have a bespoke trampoline function for each "far" function we're looking to call (with a "far" function being any function that runs from a bank switchable piece of memory). Something like the following example:




// Bank switching stack helpers, to be implemented in non bankable memory
void push_bank(int bank);
void pop_bank(void);

// Our "far" function, in bank 2
int far_somefunction(int someint) __attribute__ ((section (".bank2")));
int far_somefunction(int someint)
{
    int somevalue = someint; // placeholder for the real work
    // do something
    return somevalue;
}


// Our trampoline function, in non bankable memory (e.g. scratchpad)
int somefunction(int someint) __attribute__ ((section (".nonbankable")));
int somefunction(int someint)
{
    int retval;

    // Set to bank 2, and push caller's bank on the stack
    push_bank(2);

    // Call far function
    retval = far_somefunction(someint);

    // Set caller's bank
    pop_bank();

    return retval;
}


Using this, we can safely call somefunction() (our trampoline function for far_somefunction()) from anywhere in our code, no matter which bank we're currently in and no matter where the calling code resides in memory. Furthermore, we can also still call far_somefunction() directly from within the same bank if we want to avoid the overhead of the bank switching and the trampoline function.


The big downside of course is that we now have one trampoline function for every "far" function we want to call, all with nearly identical function bodies, eating at our available non-bankable memory. Not a big deal if you plan on banking code in big chunks, but problematic if you have lots of little functions that you need to call from everywhere in your program. We could opt to create one generic trampoline function, using variable argument lists and function pointers, if we're really strapped for memory. The downside is that it would create even more overhead for every "far" function call you're looking to make.
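A rough sketch of what one generic trampoline could look like. Note that it swaps the variable argument list for a pointer to an argument struct, since C has no portable way to forward `...` arguments to another function; the names (`far_call`, `far_fn`, `select_bank`, `far_add`) are all hypothetical, and `select_bank()` only simulates the hardware access.

```c
#include <assert.h>

static unsigned char current_bank = 1;     /* bank currently paged in */
static unsigned char bank_seen_by_callee;  /* only here to observe the sketch */

/* Hypothetical stand-in for the hardware-specific bank select. */
static void select_bank(unsigned char bank)
{
    current_bank = bank;
}

/* Every far function takes one pointer to its own argument struct. */
typedef int (*far_fn)(void *args);

/* Generic trampoline: would live in non-bankable memory. */
static int far_call(unsigned char bank, far_fn fn, void *args)
{
    unsigned char caller_bank = current_bank;
    int retval;

    assert(fn != 0);
    select_bank(bank);          /* page in the callee's bank */
    retval = fn(args);
    select_bank(caller_bank);   /* page the caller's bank back in */
    return retval;
}

/* Example far function, pretending to live in bank 2. */
struct add_args { int a, b; };

static int far_add(void *p)
{
    struct add_args *args = p;

    bank_seen_by_callee = current_bank;
    return args->a + args->b;
}
```

A call then looks like `far_call(2, far_add, &args)`. The price is the extra indirection and the argument struct that has to be filled in at every call site, which is the additional overhead mentioned above.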


Even with bespoke trampoline functions for each far function, it's a good idea to limit the number of bank switching calls you need to do, especially if you're writing an action game that needs to retain a high frame rate, given the fairly high overhead the bank switching introduces. If the compiler had support for naked functions (functions without prologue and epilogue), we could probably reduce the overhead to an absolute minimum, similar to what you'd get with pure assembly code, but unfortunately gcc doesn't support that attribute on our target.


I think the above is a sound strategy?


I haven't looked at how GCC for the 9900 handles the stack as far as calls/returns go so I really don't know the answer here.
If you implement your intermediate calls in C, will it generate additional stack handling?
In assembly, the code stubs (you call them trampoline functions) just bank switch and jump to the appropriate routine, and on return they bank switch and return, which eliminates additional stack or register manipulation overhead.
You do have to make the assembler stubs know the pages and addresses of the final routines, and you have to declare the assembler stubs as external in the main code, but it may give lower overhead.


If you implement your intermediate calls in C, will it generate additional stack handling?

 

Yes, it will. Unfortunately, I see no way of getting around this without modifying the compiler itself. We'd need either naked functions (functions without prologue/epilogue, and thus without stack manipulation), or direct support for bank switching in the compiler itself, I think.


On SDCC, I've been doing something very similar with the named sections, but I do all the bank switching manually. It requires a bit more focus, but it works well enough.

 

What I ended up doing instead of a bank stack was having a single variable that tracked the current bank index (which I called 'bank' for lack of creativity). Then any time I switched banks and needed to switch back, I just stored the current value in a local variable. The compiler then could decide whether that local variable was a register, on the stack, or whatever it wanted. Like so:

void wrapstars() {
 unsigned int old = nBank;
 SWITCH_IN_BANK6;
 stars();
 SWITCH_IN_PREV_BANK(old);
}

For code running in non-paged memory (ie: the 32k expansion) which calls paged functions, I don't need to save bank or restore it, just change and call.

 

The macro "SWITCH_IN_BANKx" performs the actual bank switch and updates 'bank' to the new value. 'SWITCH_IN_PREV_BANK' does the same thing, but based on the passed in variable instead of a hard-coded value. Naturally the trampoline functions themselves can't page - they need to either be in RAM or the same in every bank. ;)

 

 

#define SWITCH_IN_PREV_BANK(nOldBank) do { (*(volatile unsigned char*)0)=(*(volatile unsigned char*)(nOldBank)); nBank=(nOldBank); } while (0)
#define SWITCH_IN_BANK1 do { (*(volatile unsigned char*)0)=(*(volatile unsigned char*)0xFFFE); nBank=(unsigned int)0xfffe; } while (0)

 

(note: this is for Coleco - the Coleco carts page on any access to the top of memory, thus the reads.)

 

It's another option if you don't mind taking some of the responsibility.
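To show that the local-variable approach also copes with nested calls across banks, here is a small host-side sketch. `switch_in_bank()` is a hypothetical stand-in for the SWITCH_IN_BANKx macros and only records the bank number; the bank numbers and function names are made up.

```c
#include <assert.h>

static unsigned int nBank = 1;               /* bank currently switched in */
static unsigned int deepest_bank_seen;       /* only here to observe the sketch */
static unsigned int bank_after_nested_call;

/* Hypothetical stand-in for the SWITCH_IN_BANKx macros. */
static void switch_in_bank(unsigned int bank)
{
    nBank = bank;
}

/* Pretend this function lives in bank 3. */
static void inner(void)
{
    assert(nBank == 3);
    deepest_bank_seen = nBank;
}

static void wrap_inner(void)
{
    unsigned int old = nBank;    /* the compiler keeps this wherever it likes */
    switch_in_bank(3);
    inner();
    switch_in_bank(old);
}

/* Pretend this function lives in bank 2 and calls into bank 3. */
static void outer(void)
{
    wrap_inner();
    bank_after_nested_call = nBank;   /* should be back in bank 2 here */
}

static void wrap_outer(void)
{
    unsigned int old = nBank;
    switch_in_bank(2);
    outer();
    switch_in_bank(old);
}
```

Each wrapper's `old` lives in its own stack frame (or register), so the C call stack ends up doing the job of the separate bank stack.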

Edited by Tursi

Actually, it seems to me there is some stack manipulation for the assembly code so the return addresses are right for the stub and original caller.

After some thought, I *think* the assembly stub just switches the page and then jumps to the function.

The return goes directly back to the caller without switching pages back.

So no extra stack handling

Since pages aren't restored that doesn't support calls across pages.

For that you need a stub with additional stack use like you are doing

Edited by JamesD

I assume it's not possible to copy all your code from ROM into the 32K RAM and otherwise use the ROM for fetching static data only?

 

It is, but the static data is only 1.7k in total right now (since the graphics and maps are loaded from disk, currently) so it wouldn't free up all that much space. Having code in cart space in and by itself will be a big help even without banking (8k extra for code and constants), but I also just want to have this figured out for the future.


What I ended up doing instead of a bank stack was having a single variable that tracked the current bank index (which I called 'bank' for lack of creativity). Then any time I switched banks and needed to switch back, I just stored the current value in a local variable. The compiler then could decide whether that local variable was a register, on the stack, or whatever it wanted.

 

Good point, gcc will push every instance of a local variable on its stack already when calling a new function, no need for me to duplicate this behavior and keep my own stack for tracking banks!

 

Also good to know that you've come up with basically the same solution, that probably means I'm not missing anything obvious (or obviously better).


After some thought, I *think* the assembly stub just switches the page and then jumps to the function.

The return goes directly back to the caller without switching pages back.

So no extra stack handling

Since pages aren't restored that doesn't support calls across pages.

For that you need a stub with additional stack use like you are doing

 

Unfortunately, I think there's no way to integrate assembly code in such a way that it can interact with C functions without incurring the C function call overhead. The stack manipulations are done by the callee, not by the caller. So in theory you could do what you're suggesting to bank in pure assembler code (if you're careful not to mess with any of the registers that the compiler is expecting to be in a certain state), but never to bank in real C functions (which is my main goal).


I'm not entirely sure what the suggestion was, but GCC does allow very good control of inline assembly, so you can actually specify what registers you need and what registers you impact, even allow the compiler to select the registers for you if you don't care which ones you use. I used that a bit in libTI99 for things like the joystick read. Really handy :)


I'm not entirely sure what the suggestion was, but GCC does allow very good control of inline assembly, so you can actually specify what registers you need and what registers you impact, even allow the compiler to select the registers for you if you don't care which ones you use. I used that a bit in libTI99 for things like the joystick read. Really handy :)

 

I don't know if you could do it inline, I'd have to look at what GCC can do.

 

 

Unfortunately, I think there's no way to integrate assembly code in such a way that it can interact with C functions without incurring the C function call overhead. The stack manipulations are done by the callee, not by the caller. So in theory you could do what you're suggesting to bank in pure assembler code (if you're careful not to mess with any of the registers that the compiler is expecting to be in a certain state), but never to bank in real C functions (which is my main goal).

The assembly doesn't care about the parameters, it's just redirecting the function call and doesn't need to know parameters or return values.

 

It would look something like this. I don't have experience with 9900 assembly and my 9900 assembly book is toast so the syntax may be very bad, but you should get the idea.

The stack frame and registers should still be intact and high level languages normally don't assume the status bits are correct after a function call.

Or at least I think that's the case.

FunctionName:
   MOV  R0,@TEMP            ; preserve R0.  only needed if you must preserve R0
   LI   R0,RAMPAGE5         ; get the page setting for the BANKREGister
   MOVB R0,@BANKREG         ; set the RAM bank
   MOV  @TEMP,R0            ; restore R0.   only needed if you must preserve R0
   B    @FunctionNameFinal

The function you are calling gets the stack and registers exactly as if it had been called directly and the return goes directly back to the caller.

The compiler doesn't know that the assembly stub knows nothing about the parameters or return values; it just assumes a normal function and generates the proper code.

The linker *should* do the same. All it does is insert the address of the stub routine into the code.

So you'll basically have an assembly file with all the stub routines, and declare them in your C code as external.

If GCC doesn't expect some register to be preserved, you could dump two lines of code by using that register for setting the banking register(s).

 

Your assembly stub routine names will have to match the C naming conventions for the function in order to link to it.

I'd have to look at the code GCC generates to tell you what to use.

It will have info on what type parameters and return code value are.

Other compilers I've used generated something like this:

__void*__function_int_long

 

The easy way to deal with it is to look at what GCC generates.

 

The jump in the assembly code will have to call the function name in the same manner.

 

Linking the assembler stub routines to the final paged code is the sticky part since it's not going to be in one file.

I don't know if the linker has an automated way to deal with paged code.

You may have to define the final address labels by extracting values from the object code for the paged functions.

Some cross compilers have an automated way to do this.

Edited by JamesD

Thanks for the help James, a lot of what you're saying makes sense.

The assembly doesn't care about the parameters, it's just redirecting the function call and doesn't need to know parameters or return values.

The function you are calling gets the stack and registers exactly as if it had been called directly and the return goes directly back to the caller.

 

Actually, I was thinking this because I do want to be able to jump from one page to the other and then return again. That's why we're using a trampoline function to begin with, otherwise you could just do the bank switch right before and right after the function call.

The compiler doesn't know that the assembly stub knows nothing about the parameters or return values; it just assumes a normal function and generates the proper code.
The linker *should* do the same. All it does is insert the address of the stub routine into the code.

 

Yes, you're right. I can create a pure assembly version of the trampoline code by including a pure assembler file and not doing inline assembly. I hadn't thought of that. All I need to take care of is that the assembly code doesn't mess with the stack or registers. However, since I will need the callee to return to the trampoline function (and not the original caller), I will need to cache and override the return address in r11 (this is where gcc stores the return address for BL's). That's not a big deal though.

So you'll basically have an assembly file with all the stub routines, and declare them in your C code as external.

 

Yup, that's the part I hadn't thought of, but you're absolutely right.

Your assembly stub routine names will have to match the C naming conventions for the function in order to link to it.

 

I think that's easy, looking at my symbol table it seems like gcc just uses the actual function name, no decorations or anything.

 

 


There are two ways to cache the return address, a small stack (which requires considerable overhead) or just a temporary variable.
One important thing to note, you don't return to the trampoline function you call, you return to a single function that returns to the original code.
Maybe do something like this:

FunctionName:
   MOV  R11,@RETURNTEMP           ; preserve return address.
   MOV  @BANKREG,R11              ; get the current RAM bank
   MOV  R11,@RAMTEMP              ; save current RAM bank
   LI   R11,RAMPAGE5              ; get the page setting for the BANKREGister
   MOVB R11,@BANKREG              ; set the RAM bank
   LI   R11,TrampolineExit        ; make the callee return to the trampoline exit function
   B    @FunctionNameFinal

TrampolineExit:
   MOVB @RAMTEMP,@BANKREG         ; restore the caller's RAM bank
   MOV  @RETURNTEMP,R11           ; restore the original return address
   B    *R11                      ; return to the original caller
   

If you want to do more than one call across pages you will probably have to keep a stack which means the trampoline function has to save R11 in a temp space, sets R11 to a table index for that function and then calls the common function that saves the return address and memory page to the stack, sets the function page from the table and calls the destination function from the table. Think of the table as a 2D array. That's the smallest way I can think of implementing it.
You'll need a stack pointer in RAM and a small stack space. I'd point the stack pointer to the next available location if you can do something like this:

Function:
       MOV   R11,(STACK)               ;STACK points to next available location on the stack
       LI    R11,FunctionTableIndex    ;load the index to the function table or the address with the call info
       B     @TrampolineMain

TrampolineMain:
       ;decrement stack pointer here (if it builds down)

       ;save R11 in temp location

       ;push current memory page to stack
       ;decrement stack pointer

       ;load and set memory page of function from table based on value saved from R11

       ;set return address to TrampolineExit
 
       ;jump to function via address in table

TrampolineExit:
       ;pop previous memory page from stack
       ;set memory page
       ;pop return address from stack
       ;jump to return address


If the 9900 requires use of a register for the pointer you just have to save and restore that instead of the indirect(?) addressing call.
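The table-driven idea can be sketched in C to check the logic before worrying about 9900 syntax. Everything here is an assumption for illustration: the table contents, the names, and `set_page()`, which only simulates the hardware access. In C the return address is handled by the normal call stack, so only the memory page gets pushed.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*far_fn)(int);

/* One table row per far function: its memory page and its address. */
struct far_entry {
    unsigned char page;
    far_fn fn;
};

#define PAGE_STACK_DEPTH 8                   /* assumed maximum nesting depth */
static unsigned char page_stack[PAGE_STACK_DEPTH];
static size_t page_sp;                       /* next free slot on the stack */
static unsigned char current_page = 1;

/* Hypothetical stand-in for writing the page register. */
static void set_page(unsigned char page)
{
    current_page = page;
}

static int far_double(int x) { return 2 * x; }   /* pretend: page 2 */
static int far_negate(int x) { return -x; }      /* pretend: page 3 */

static const struct far_entry far_table[] = {
    { 2, far_double },
    { 3, far_negate },
};

/* Common trampoline: push the current page, switch, call, switch back. */
static int trampoline_main(size_t index, int arg)
{
    int retval;

    assert(index < sizeof far_table / sizeof far_table[0]);
    assert(page_sp < PAGE_STACK_DEPTH);
    page_stack[page_sp++] = current_page;
    set_page(far_table[index].page);
    retval = far_table[index].fn(arg);
    set_page(page_stack[--page_sp]);
    return retval;
}
```

The per-function stubs then shrink to "load my table index, jump to the common code", which is what keeps the footprint in non-bankable memory small.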

Edited by JamesD

I make no promises on syntax but this is the general idea using a stack to track mem pages and return addresses.... and it's bigger than it probably needs to be.

I'm sure an experienced 9900 programmer can improve on it

;Function stub routines
;FunctionTableAddressN is a word in memory holding the address of function N's table entry (page, address)
Function1:
	MOV		@FunctionTableAddress1,@FunctionTemp
	B		@TrampolineMain
...
Function2:
	MOV		@FunctionTableAddress2,@FunctionTemp
	B		@TrampolineMain
...
;Trampoline function call code (has to live in RAM because of the self modifying code)
TrampolineMain:
	MOV		R11,@R11Temp				;save R11 to temp location
	MOV		@STACK,R11				;STACK points to next available location on the stack
	MOV		@R11Temp,*R11+				;push original R11 value to the stack

	;push current memory page to stack
	MOV		@MEMPAGE,*R11+
	MOV		R11,@STACK				;save the updated stack pointer

	;load and set memory page of function from table based on value saved in FunctionTemp
	MOV		@FunctionTemp,R11			;point R11 to function table info
	MOV		*R11+,@MEMPAGE				;copy the new page for the return
	MOV		@MEMPAGE,@PageRegister			;set the memory page register
	MOV		*R11,@FunctionJump+2			;modify the jump address
	LI		R11,TrampolineExit			;set return address to TrampolineExit
FunctionJump:
	B		@>0000					;jump to function, self modifying code changes it from >0000

;Trampoline return code
TrampolineExit:
	;pop previous memory page from stack
	MOV		@STACK,R11				;get the stack pointer
	DECT		R11					;since stack points to next stack location
	MOV		*R11,@MEMPAGE				;get the previous memory page info

	;set memory page
	MOV		@MEMPAGE,@PageRegister

	;pop return address from stack
	DECT		R11					;the 9900 has no auto-decrement addressing
	MOV		R11,@STACK				;save the updated stack pointer
	MOV		*R11,R11				;R11 now holds the original return address
	B		*R11					;jump to return address

Edited by JamesD

Look up "gcc name mangling". It's actually pretty cryptic from what I just read.

 

That's for C++, not C. I was reading up on this, and in C symbols are apparently always unmangled (hence the need for the 'extern "C"' stuff when mixing the two languages. This tells the compiler that it can't mangle the C symbols).


 

That's for C++, not C. I was reading up on this, and in C symbols are apparently always unmangled (hence the need for the 'extern "C"' stuff when mixing the two languages. This tells the compiler that it can't mangle the C symbols).

Du-Oh! Yeah, I was thinking about C++


 

I make no promises on syntax but this is the general idea using a stack to track mem pages and return addresses.... and it's bigger than it probably needs to be.

I'm sure an experienced 9900 programmer can improve on it

 

Thanks! I'm definitely no 9900 assembler expert either, but that looks ok to me. I'm thinking we can skip the explicit stub routine definition though. I can create a macro in C that writes the address of the function I want to call to a well known location in memory (@FunctionJump+2, in your code) and call TrampolineMain directly. Like so:

extern char FunctionJump[];          /* label in the assembly trampoline */
extern int TrampolineMain(int);

#define somefunction(somearg) (*(volatile unsigned int *)(FunctionJump + 2) = (unsigned int)&_far_somefunction, TrampolineMain(somearg))

The only real benefit is that this consumes memory where the caller resides instead of in non-bankable memory (which is much scarcer). But of course, it will add said amount of code each time the function is called instead of just once. Hmmmm, your version is probably better in most circumstances.

 

*edit* another benefit of the macro is that we won't need an explicit lookup table for the function address. The function's bank can be added in pretty much the same way as well.

Edited by TheMole

 

Thanks! I'm definitely no 9900 assembler expert either, but that looks ok to me. I'm thinking we can skip the explicit stub routine definition though. I can create a macro in C that writes the address of the function I want to call to a well known location in memory (@FunctionJump+2, in your code) and call TrampolineMain directly. Like so:

extern char FunctionJump[];          /* label in the assembly trampoline */
extern int TrampolineMain(int);

#define somefunction(somearg) (*(volatile unsigned int *)(FunctionJump + 2) = (unsigned int)&_far_somefunction, TrampolineMain(somearg))

The only real benefit is that this consumes memory where the caller resides instead of in non-bankable memory (which is much scarcer). But of course, it will add said amount of code each time the function is called instead of just once. Hmmmm, your version is probably better in most circumstances.

 

*edit* another benefit of the macro is that we won't need an explicit lookup table for the function address. The function's bank can be added in pretty much the same way as well.

Ok but where is the bank switch and switch back so you can call between banks?

If you bank switch there, you'll be in a different bank before your call.

If you are calling the trampoline code directly, it has to know what to set and call.

I guess i'm not following you.

