
Creating bank switched cartridges with gcc


TheMole


For my part, I got sidetracked into an ALC exercise that was aided and abetted |:) by comments wondering about parameter passing and, of course, my ignorance of the compiler. I was actually rather hoping for just such a concise exposition as you posted above, which obviates most of the ALC discussion for anything other than a sidebar learning exercise.

 

...lee

Link to comment
Share on other sites

As a test, I wrote a simple trampoline function and verified the assembly -- you'll have a hard time doing much better, I think.

  *targetBank;  // force a memory read to switch, but we don't need the result. The volatile makes it work.
  *old; // force a memory read to switch back
 movb *r2, r2           * read targetBank, which performs the bank switch

A nice thing about this function is it's completely position independent. You could build it without any special consideration, and manually copy it from ROM to RAM or scratchpad (it's only 32 bytes) for actual execution. It'd be hard to do too much better - even without the GCC considerations, as a generic trampoline function I don't know if I'd change anything there.

 

Wait, you read the address corresponding to the desired bank? I thought you were supposed to write to it to trigger a bank switch, how would reading work given that the first bytes of the cart (0x6000 and beyond) are read by the system to populate the cart menu?

Link to comment
Share on other sites

C is a good language but a huge waste of memory to use.

 

Unix, when converted from Assembly to C, took up so much more space that it took a cut-down version called Linux to fit on a standard PC.

 

Later, with the marked increase in memory on desktop PCs, Unix would fit, but with a huge cut-down of libraries.

 

Assembly takes more time and effort to get it right.

 

Modern optimising C compilers produce some amazing assembly code, often better than humans could produce. They're especially effective at re-arranging code to make some code redundant, therefore allowing code to be omitted completely. As humans, we don't tend to think like that. Old fashioned C compilers were quite poor code generators, it's true, but the state of the art has moved on a looong way since then!

 

I have coded in C in the past. I could never get the effing pointer syntax right. When do I use a * and when do I not use a *? It absolutely drove me effing nuts! It's probably because I learned from books, which always tended to tread carefully when discussing the tricky subject of C pointers. If I'd been a trainee software engineer in a C shop I'd have had a mentor to get me on the right path. I don't think it's a fault of the language itself. Just my exposure to it. These days, if I'm doing a C-type language, I'll go for Java with strong types and good run-time error detection, massive class library, and cross-platform at the binary level ;)

 

I find (at least, on the 9900) I'm actually more productive writing in assembler myself than writing C! Assembler on the 9900 is sooooo easy and simple and nice. Of course, being familiar with it over nearly 30 years helps!

 

I feel old today :)

Link to comment
Share on other sites

 

I have coded in C in the past. I could never get the effing pointer syntax right. When do I use a * and when do I not use a *? It absolutely drove me effing nuts!

 

 

Before I started to contribute to MESS I thought I could program C well enough. But still today, when I see lines like this

 

 

typedef mfmhd_image_format_t *(*mfmhd_format_type)();

 

it takes me some time to figure out what exactly I'm reading here.
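For anyone else who trips over that line, here is a rough sketch of how it reads in plain C. The struct below is only a stand-in for the real MESS class, and every name other than the typedef itself is made up for illustration. (MESS is C++, where empty parentheses mean "no arguments"; the sketch spells that as `(void)` for C.)

```c
#include <stddef.h>

/* Stand-in for the real MESS type; hypothetical, for illustration only. */
typedef struct mfmhd_image_format_t {
    int id;
} mfmhd_image_format_t;

/* Read from the inside out: mfmhd_format_type is a pointer
   (*mfmhd_format_type) to a function taking no arguments,
   and that function returns a pointer to mfmhd_image_format_t. */
typedef mfmhd_image_format_t *(*mfmhd_format_type)(void);

/* The same type built in two steps, which many people find easier to read:
   first name the function type, then name a pointer to it. */
typedef mfmhd_image_format_t *format_creator_fn(void);
typedef format_creator_fn *format_creator_ptr;

/* A hypothetical creator function with a matching signature. */
mfmhd_image_format_t *create_generic_format(void) {
    static mfmhd_image_format_t fmt = { 42 };
    return &fmt;
}

/* Calls through the function pointer and returns the id of the result. */
int format_id_via(mfmhd_format_type creator) {
    mfmhd_image_format_t *fmt = creator();
    return fmt != NULL ? fmt->id : -1;
}
```

Both typedef spellings name the same pointer type, so a variable of either kind can hold `create_generic_format` and be passed to `format_id_via`.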

 

Other typical lines from MESS:

 

 

template<class _Object> static devcb_base &set_intrq_wr_callback(device_t &device, _Object object) { return downcast<hdc92x4_device &>(device).m_out_intrq.set_callback(object); }

 

(both from code I recently wrote)

Link to comment
Share on other sites

 

Wait, you read the address corresponding to the desired bank? I thought you were supposed to write to it to trigger a bank switch, how would reading work given that the first bytes of the cart (0x6000 and beyond) are read by the system to populate the cart menu?

 

Whoops, sorry.. too much time on the Coleco corrupting my thought patterns. Just change that to a write, no big deal. :)

 

 

void trampoline(void (*target)(), volatile char *targetBank) {
  volatile char *old = CurrentBank; // save the current bank
  CurrentBank = targetBank;   // update the cache variable
  *targetBank=0;  // force a memory write to switch
  target();  // call the target function
  CurrentBank = old; // update the cache variable
  *old = 0; // force a memory write to switch back
}

 

 

 def trampoline
trampoline
 ai   r10, >FFFC         * update stack pointer
 mov  r11, *r10          * save return address to stack
 mov  r9, @>2(r10)       * save R9 to stack (I called frame pointer, may be a temp)
 mov  @CurrentBank, r9   * save current bank in R9 ('old')
 mov  r2, @CurrentBank   * store R2 ('targetBank') into CurrentBank
 clr  r3                 * we need a zero - this is probably all I'd optimize if I did it by hand
 movb r3, *r2            * write to the cart to change banks ('targetBank' again)
 bl   *r1                * call the function ('target')
 mov  r9, @CurrentBank   * copy R9 ('old') back to CurrentBank
 clr  r1                 * we need a zero again
 movb r1, *r9            * write to the cart to change banks back ('old' again)
 mov  *r10+, r11         * get return address back from stack
 mov  *r10+, r9          * get frame/temp back from stack
 b    *r11               * return to caller

 

The unnecessary CLRs made me wonder if we could reuse an existing variable to save them. But, sadly, because the pointer is volatile, the compiler copies it to another register even if we try to get just the MSB. Still... two instructions over the optimal isn't too bad. :)

 

 

*targetBank = ((int)targetBank>>8);

 

takes the MSB of the pointer, which should be directly accessible, and so gives this assembly:

 

 

 mov  r2, @CurrentBank   * store R2 ('targetBank') into CurrentBank
 mov  r2, r3             * make a copy of R2 for manipulation (so still redundant)
 movb r3, *r2            * but no manipulation needed, just do the write
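
If you want to poke at the trampoline logic on a desktop compiler first, something like the following might do. It is only a simulation: the `bank_latch` array, `selected_bank` and `bank_write` are made-up stand-ins for the cartridge hardware, since a PC cannot react to a write the way the cart does. The trampoline body otherwise follows the C version above, including the nested case where a banked function calls into yet another bank.

```c
#include <stddef.h>

/* Simulated hardware: on a real cart these would be addresses in the
   cartridge space whose write selects a bank. Hypothetical stand-ins. */
volatile char bank_latch[4];
volatile char *selected_bank = &bank_latch[0];  /* bank the "hardware" maps */
volatile char *CurrentBank   = &bank_latch[0];  /* software cache, as in the post */

/* Write to a latch address and record the effect the hardware would have. */
void bank_write(volatile char *addr) {
    *addr = 0;
    selected_bank = addr;
}

void trampoline(void (*target)(void), volatile char *targetBank) {
    volatile char *old = CurrentBank;  /* save the current bank */
    CurrentBank = targetBank;          /* update the cache variable */
    bank_write(targetBank);            /* switch to the target bank */
    target();                          /* call the target function */
    CurrentBank = old;                 /* restore the cache variable */
    bank_write(old);                   /* switch back to the caller's bank */
}

/* Hypothetical banked functions that record which bank was mapped. */
volatile char *seen_by_inner;
volatile char *seen_by_outer_before;
volatile char *seen_by_outer_after;

void inner_in_bank2(void) {
    seen_by_inner = selected_bank;
}

void outer_in_bank1(void) {
    seen_by_outer_before = selected_bank;
    trampoline(inner_in_bank2, &bank_latch[2]);  /* nested cross-bank call */
    seen_by_outer_after = selected_bank;
}
```

Because `old` lives in the trampoline's own frame, each nesting level restores the bank of whoever called it. On the real machine the trampoline itself has to sit somewhere that never gets switched out, such as RAM or scratchpad.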
Edited by Tursi
Link to comment
Share on other sites

 

Modern optimising C compilers produce some amazing assembly code, often better than humans could produce.

 

CAN produce... ;)

 

A lot of modern programmers take the above statement as gospel these days. If you're writing code for a superscalar processor with massive pipelines, prediction and the like, it can be overwhelming to remember all the options and write good assembly by hand. But if you're writing for a simpler processor (even the 386 or 486, which most PC software is still written for!), it's not that hard to beat the compiler.

 

If it /matters/, you should always examine the assembly code that the compiler produced and never just assume. :)

Link to comment
Share on other sites

 

I have coded in C in the past. I could never get the effing pointer syntax right. When do I use a * and when do I not use a *? It absolutely drove me effing nuts!

 

Oh... and this -- exactly the same times you would use a * and not use a * in 9900 assembly. ;)

Link to comment
Share on other sites

Oh... and this -- exactly the same times you would use a * and not use a * in 9900 assembly. ;)

 

 mov  r11, *r10          * save return address to stack
 mov  r9, @>2(r10)       * save R9 to stack (I called frame pointer, may be a temp)

 

I assume the above is how to use indirect addressing. Are there other ways to write the same, also considering any flexibility with different 9900 compilers ?

 

I'm asking since the choice of syntax / formatting is not all that consistent !? Think I remember another processor using parentheses only for indirect addressing ?

 

Just off the top of your head. ;)

Link to comment
Share on other sites

Well... generally assembler syntax is consistent only within a single CPU, and sometimes not even then. The 9900 syntax is consistent except for the GCC assembler, and even there the operand order is the same. :)

 

So yeah, *r10 means to dereference R10 and store at the address pointed to. @>2(R10) means to take the address in R10, add the fixed value 2, and use the result as the address (it's generally used the opposite way, to index arrays, i.e. @ARRAY(R10) ).

 

I guess that's not terribly consistent, * in one case and () in others. Heh. :)
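
To tie that back to the C pointer question, here is a loose mapping of those addressing modes onto C expressions. It is a sketch only: the function names and the `ARRAY` table are invented for the example, and a real compiler is free to pick different instructions.

```c
#include <stdint.h>

/* *R10 : plain indirect, the same idea as *p in C. */
uint16_t load_indirect(const uint16_t *p) {
    return *p;
}

/* @>2(R10) : indexed, the address in the register plus a fixed byte offset.
   With 16-bit words, a byte offset of 2 is element 1. */
uint16_t load_indexed_offset2(const uint16_t *p) {
    return p[1];
}

/* @ARRAY(R10) : the other way round, a fixed base address plus a byte
   offset held in the register. ARRAY is a made-up table. */
static const uint16_t ARRAY[4] = { 10, 20, 30, 40 };

uint16_t load_array_at(uint16_t byte_offset) {
    return ARRAY[byte_offset / 2];
}

/* *R10+ : indirect with post-increment, as used to pop the stack. */
uint16_t pop_word(const uint16_t **sp) {
    return *(*sp)++;
}
```

So the `*` in assembly and the `*` in C both mean "go through this address", and the parentheses form is just the array-subscript case.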

 

But... you know assembly pretty well, I thought, all your demos? Just haven't worked with the indirection much?

Link to comment
Share on other sites

Not to sound petty, but I posted working code. What are you guys trying to build?

So, once someone posts a solution to a problem nobody should ever try to come up with a different one?

 

Yes, you have a good solution and I already said so, but you also said it requires at least 32 bytes for each call.

The OP said their code was already about 16K and the whole reason for doing this was to keep from running out of memory.

If you only have 10 banked calls it doesn't sound like much but think about it, for every 32 calls, that's 1K.

While I find it unlikely most people will need 100+ calls, it's definitely possible and someone will eventually want that back.

I can only imagine if someone gets ambitious and tries to port code someone posts for the Colecovision/Adam or MSX.

The 2nd stack only needs to be large enough to store enough for the greatest call depth across pages.

Just one call for your version takes 32 bytes. That's enough for 8 levels of calls and just two would probably handle anything most people will ever write on the TI.

The other reason for keeping a second stack was that it's transparent to the compiler for every function call no matter how many parameters and I didn't risk screwing up the stack for the compiler before knowing anything about it.

 

I guess I'll let you guys figure it out once you need the memory back, it's not like I'm likely to use it.

 

Link to comment
Share on other sites

So, once someone posts a solution to a problem nobody should ever try to come up with a different one?

 

 

Why don't you try quoting this part, hmm?

 

 

 

I'm happy to go away and let you guys play with it if you're just trying to think it through, I don't mean to belittle that effort. I'd just like to help and I don't understand the end goal.
Link to comment
Share on other sites

If you only have 10 banked calls it doesn't sound like much but think about it, for every 32 calls, that's 1K.

 

 

The solution is per prototype. I doubt you have a project with 32 prototypes - please, show me. Even if you do - you are bank switching. You have gone from 8k to 16k or more. You can afford 1k to gain 7k more, can't you?

 

It's 32 bytes for the function CODE. Storing the call takes either no bytes or four bytes, depending on whether the function you call needs to store the return information to the stack. Setting up a completely alternate stack uses more than that, as does the code to manage it.

 

It will be very difficult to come up with a system that can handle any set of parameters without changing the calling convention -- there's no way to know what to save. I was trying to show you that it's not stack based like many systems - the parameters are usually stored in registers and you don't know how many. You said yourself that you don't understand 9900 or the compiler, I understand both and was trying to help.

Link to comment
Share on other sites

I think I see what you're trying to do... so even though I'm not allowed to help, I'll post this before I leave the thread. ;) The compiler isn't going to set up the call to your assembly stubs correctly unless the prototype matches the called function, which means you need the definitions anyway.

 

But so far as the code you posted, this will make it work with the C stack rather than trying to put another stack off to the side somewhere in RAM. You can still do that if you want, just remember to copy its value into a register before trying to dereference it.

 

 

;Function stub routines
Function1:
 ;push original return address to the C stack (we reserve all 4 bytes now)
 AI    R10,-4
 MOV  R11,@2(R10)
 BL  @TrampolineMain
 DATA Func1MemPage,Function1Address
...
Function2:
 ;push original return address to the C stack
 AI R10,-4
 MOV  R11,@2(R10)
 BL  @TrampolineMain
 DATA Func2MemPage,Function2Address
...
;Trampoline function call code
TrampolineMain:
 ;push current memory page to stack
 MOV  @MEMPAGE,*R10    ;save the last memory page
 ;load and set memory page of function from table based on value in R11 which points to the data
 MOV  *R11+,R0   ; copy the page value into mempage (R0 is free to use)
 MOV   R0,@MEMPAGE   ; save it off
 CLR   *R0           ; do the bankswitch
 MOV   *R11,R11      ; get the function address
 BL  *R11     ; call the function
 
 ;Trampoline return code
 ;pop previous memory page from stack
 MOV *R10+,R0        ; gets the old mempage (and increments the stack pointer)
 MOV R0,@MEMPAGE     ; save it
 CLR *R0             ; trigger the bank switch
 MOV *R10+,R11       ; get the return address and fix the stack pointer
 ;return via address from stack
 B  *R11       ; return to caller

 

Didn't test it, but should be correct syntax. HTH.

 

This code takes 14 bytes per stub, 28 bytes for the common code, and uses 2 bytes of storage (globally) and 4 bytes of stack per call. Using common code will save you 18 bytes per stub, but you'll still need a stub for each prototype because of that.
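
To put rough numbers on that trade-off, here is a small sketch that just does the arithmetic. The byte counts are the ones quoted in this thread (32 for a full trampoline, 14 per stub, 28 for the shared code); they are taken as given rather than re-measured, and the function names are made up.

```c
/* Figures quoted in the thread; not re-measured here. */
enum {
    FULL_TRAMPOLINE_BYTES = 32,  /* one complete trampoline per prototype */
    STUB_BYTES            = 14,  /* per-function stub, as quoted */
    COMMON_BYTES          = 28   /* shared TrampolineMain, paid once */
};

/* Total code size when every prototype gets its own full trampoline. */
unsigned cost_full_trampolines(unsigned n) {
    return n * FULL_TRAMPOLINE_BYTES;
}

/* Total code size when n stubs share one copy of the common code. */
unsigned cost_stubs_with_common(unsigned n) {
    return n == 0 ? 0 : COMMON_BYTES + n * STUB_BYTES;
}
```

With those figures a single banked function is cheaper as a standalone trampoline (32 versus 42 bytes), the shared version wins from two functions on, and at 32 functions it is 476 bytes against 1024.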

 

All yours now. :)

Link to comment
Share on other sites

Well... generally assembler syntax is consistent only within a single CPU, and sometimes not even then. The 9900 syntax is consistent except for the GCC assembler, and even there the operand order is the same. :)

 

So yeah, *r10 means to dereference R10 and store at the address pointed to. @>2(R10) means to take the address in R10, add the fixed value 2, and use the result as the address (it's generally used the opposite way, to index arrays, i.e. @ARRAY(R10) ).

 

I guess that's not terribly consistent, * in one case and () in others. Heh. :)

 

But... you know assembly pretty well, I thought, all your demos? Just haven't worked with the indirection much?

Well, you know, one can obtain nice effects just moving data around, but yeah, I use indirection with assembly and have and will with other languages 'n stuff. ;)

 

It's just that, it struck me as "not consistent", the * and (), and since I'm only an occasional assembly programmer, I thought that one could maybe get away with sticking to only one of the formats. Never got that from the editor/assembler manual. No big issue, but I thought I'd ask. And thanks for the reply.

 

;)

Link to comment
Share on other sites

I think I see what you're trying to do... so even though I'm not allowed to help, I'll post this before I leave the thread. ;) The compiler isn't going to set up the call to your assembly stubs correctly unless the prototype matches the called function, which means you need the definitions anyway.

...

Please, show me where I said that?

 

Link to comment
Share on other sites

Or you could do this.

;Function stub routines
Function1:
 ;save original return address in TEMP (TrampolineMain pushes it to the C stack)
 MOV R11,@TEMP
 BL  @TrampolineMain
 DATA Func1MemPage,Function1Address
...
Function2:
 ;save original return address in TEMP
 MOV R11,@TEMP
 BL  @TrampolineMain
 DATA Func2MemPage,Function2Address
...
;Trampoline function call code
TrampolineMain:
 AI    R10,-4
 MOV  @TEMP,@2(R10)  ;if this is legal

 ;push current memory page to stack
 MOV  @MEMPAGE,*R10    ;save the last memory page
 ;load and set memory page of function from table based on value in R11 which points to the data
 MOV  *R11+,R0   ; copy the page value into mempage (R0 is free to use)
 MOV   R0,@MEMPAGE   ; save it off
 CLR   *R0           ; do the bankswitch
 MOV   *R11,R11      ; get the function address
 BL  *R11     ; call the function
 
 ;Trampoline return code
 ;pop previous memory page from stack
 MOV *R10+,R0        ; gets the old mempage (and increments the stack pointer)
 MOV R0,@MEMPAGE     ; save it
 CLR *R0             ; trigger the bank switch
 MOV *R10+,R11       ; get the return address and fix the stack pointer
 ;return via address from stack
 B  *R11       ; return to caller
Link to comment
Share on other sites

Thanks guys! It's been a learning experience and I might be able to muddle my way through an assembly language program now.
Even if nobody ends up using the assembly version of this.

I'm not sure if that's any smaller than Tursi's version. It's a pretty minor change.
At most it saves 2 bytes. The data is 4, the MOV is probably 4, and the BL is probably 4.
If anyone got to the point where they did have 100 calls it would save a whopping 200 bytes over his code.
Clock cycle wise it's gotta be worse since it's basically adding an additional instruction.
The real savings was sharing the common code and Tursi nailed what I was going for but with the compiler stack.

Using the BL to point to the data was brilliant btw.
I moved the data but it never dawned on me the BL basically loaded a pointer to it.
Sadly, I can't use that on any other CPU I program in assembly because they are all stack oriented.
I guess I could use it to pass a parameter on the stack quickly.

Edited by JamesD
Link to comment
Share on other sites

Huzzah, I finally got my setup where I want it and have my first bank switching cart. (*edit* it doesn't do much, only displays a number of messages on the screen telling you from which banks the code is running. Nothing exciting!).

 

post-33891-0-32704600-1439284054_thumb.png

 

carttest.rpk

 

First, I gotta say, linker script syntax is a pain in the ass... horribly inconsistent within the same section! Anyway, in my first post I mused about what a good linker script might look like and I came up with something that I figured looked fairly elegant. No dice though, that was full of assumptions that don't work. So, for posterity (and those looking to do this themselves), here's what it looks like now:

/* cart.ld : Linker script to create TI99/4A cartridges */

/* Output straight to a flat binary format (i.e. not ELF) */
OUTPUT_FORMAT(binary)
OUTPUT(cartridge.bin)

/* TI memory layout */
MEMORY
{
	cart_rom (rx) : ORIGIN = 0x6000, LENGTH = 0x2000 /* cartridge ROM, read-only */
	lower_exp (wx) : ORIGIN = 0x2080, LENGTH = 0x1f80 /* 8k - 128 bytes       */
	higher_exp (wx) : ORIGIN = 0xa000, LENGTH = 0x6000
	scratchpad (wx) : ORIGIN = 0x8320, LENGTH = 0x00e0 /* 32b is for workspace */
}

/* Where we put sections */
SECTIONS
{
	. = 0x6000;
	.header 	: { bank0/cart_header.o(.text) } >cart_rom											/* Bank 0: Cart ROM header */
	_persistent_src = 0x601a;
	.persistent	: AT ( _persistent_src ) { _persistent = . ; persistent/*.o(.text); _persistent_end = . ;} >lower_exp	/* Bank 0: Code that never can get bankswitched out */
	.bank0 (LOADADDR(.persistent) + SIZEOF( .persistent )) : { _text = . ; bank0/*.o(.text); _text_end = . ;}	/* Bank 0: code */
	.bank1 0x6000 : AT ( 0x8000 ) { bank1/*.o(.text); }														/* Bank 1: code */
	.bank2 0x6000 : AT ( 0xa000 ) { bank2/*.o(.text); }														/* Bank 2: code */
	.data  0xa000 : AT ( 0xc000 ) { _data = . ; persistent/*.o( .data ) bank0/*.o( .data ) bank1/*.o( .data ) bank2/*.o( .data ); _data_end = . ;}							/* Bank 3: data */
	.bss (_data_end) : { _bss = . ; persistent/*.o( .bss ) bank0/*.o( .bss ) bank1/*.o( .bss ) bank2/*.o( .bss ) ; _bss_end = . ;}
	.fill  0xdfff : AT ( 0xdfff) { BYTE(0x00); }
}

/* Ensure banks don't call each other's functions directly */
NOCROSSREFS( .bank0 .bank1 .bank2 .bank3)

The above creates a 32kb cart, organized as such:

  • Bank 0 contains: cart header, startup code (crt0), a copy of all persistent (non banking) code, and finally normal (bankable) code
  • Bank 1 contains: normal banking code
  • Bank 2 contains: normal banking code
  • Bank 3 contains: initialization data (to be moved to higher memory expansion)

The startup code in crt0 copies code flagged as "persistent" in bank 0 to the lower memory expansion area, and copies the data initialization values from bank 3 to higher memory. I'll probably try to clean up the script a bit further, and make it more flexible in terms of number of banks.
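
For those wondering what that startup copying amounts to, here is a host-side sketch of the idea. It is not the actual crt0: the struct, the function names and the `record_bank` logger are all invented, plain arrays stand in for ROM and expansion RAM, and the field comments only point at which linker-script symbols would plausibly supply each value.

```c
#include <stddef.h>

/* Copy len bytes from src to dst one byte at a time. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t len) {
    while (len--) {
        *dst++ = *src++;
    }
}

/* Where things are loaded and where they have to end up (hypothetical). */
typedef struct {
    const unsigned char *persistent_src;  /* _persistent_src: copy kept in bank 0 */
    unsigned char       *persistent_dst;  /* _persistent: run address in low expansion RAM */
    size_t               persistent_len;  /* _persistent_end - _persistent */
    const unsigned char *data_src;        /* start of bank 3 as seen in the cart window */
    unsigned char       *data_dst;        /* _data: run address in high expansion RAM */
    size_t               data_len;        /* _data_end - _data */
} startup_layout;

/* Simulation aid: logs the bank selections instead of touching hardware. */
int    bank_history[8];
size_t bank_history_len;

void record_bank(int bank) {
    if (bank_history_len < 8) {
        bank_history[bank_history_len++] = bank;
    }
}

/* Copy persistent code out of bank 0, then initial data out of bank 3,
   and finish with bank 0 selected again. */
void startup_copy(const startup_layout *l, void (*select_bank)(int)) {
    select_bank(0);
    copy_bytes(l->persistent_dst, l->persistent_src, l->persistent_len);
    select_bank(3);
    copy_bytes(l->data_dst, l->data_src, l->data_len);
    select_bank(0);
}
```

One thing the simulation glosses over: on the real machine the loop that reads bank 3 cannot itself be running from bank 0's cartridge ROM at that moment, so it would presumably have to execute from memory that stays mapped, for example the persistent area that was just copied.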

 

Note that we no longer need the elf2cart utility with this script, which I think is pretty cool!

 

In my first post, I mentioned using attributes to put functions in specific banks. That works, but is very cumbersome because you need to put each individual symbol in a specific bank. So assigning a function to a bank is not enough, you need to do the same for any constants that you use in that function as well. So I changed tracks and opted to create a directory for each bank, and let the linker script use that to decide where to put a piece of code in the binary. That leaves cleaner code (no more attributes), and makes it easier to move code from one bank to another.
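
To illustrate why the attribute route gets cumbersome, here is roughly what it looks like with GCC's `section` attribute. The section names and macros are hypothetical (they would have to match whatever the linker script collects), and the sketch assumes an ELF-style toolchain.

```c
/* Hypothetical section names; a linker script would need matching rules. */
#define IN_BANK1_CODE   __attribute__((section(".bank1.text")))
#define IN_BANK1_RODATA __attribute__((section(".bank1.rodata")))

/* The constant has to be tagged separately from the function that uses it. */
IN_BANK1_RODATA const char bank1_message[] = "Hello from bank 1";

IN_BANK1_CODE const char *bank1_get_message(void) {
    return bank1_message;
}

/* Easy to miss: the function is tagged, but this string literal still goes
   to the default read-only data section, not to bank 1. */
IN_BANK1_CODE const char *bank1_get_untagged(void) {
    return "not in bank 1";
}
```

Every function and every constant needs its own tag, and an untagged literal silently lands somewhere else, which is the bookkeeping the per-directory layout avoids.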

 

I'm thinking of writing a script that generates the trampoline functions automatically, but the parsing looks daunting and doing it manually works well enough for now.

 

This was fun, I've learned a lot about gcc and the role of the crt0 file! Now back to Alex Kidd. Porting the existing code to run from a cart shouldn't be too hard, and I'll be happy to get rid of the disk loading code.

 

Let me know if anyone is interested in the code for the cart above, it's a good starting point for a gcc-based cart development skeleton.

Edited by TheMole
Link to comment
Share on other sites

 

Or you could do this.

 

;Function stub routines
Function1:
 ;save original return address in TEMP (TrampolineMain pushes it to the C stack)
 MOV R11,@TEMP
 BL  @TrampolineMain
 DATA Func1MemPage,Function1Address
...

;Trampoline function call code
TrampolineMain:
 AI    R10,-4
 MOV  @TEMP,@2(R10)  ;if this is legal
...
;load and set memory page of function from table based on value in R11 which points to the data
MOV *R11+,R0 ; copy the page value into mempage (R0 is free to use)
...

I was just reading through the code again and realized that R0 is free to use.

That means the lines with @TEMP can be changed to R0. That might save another 2 bytes in the stub routines but I'm not sure. If so, it's 10 bytes per stub.

I would hope it saves clock cycles as well.

 

Function1:
 ;save original return address in R0 (TrampolineMain pushes it to the C stack)
 MOV R11,R0
 BL  @TrampolineMain
 DATA Func1MemPage,Function1Address
...

;Trampoline function call code
TrampolineMain:
 AI    R10,-4
 MOV  R0,@2(R10)  ;save the return address
...

Edited by JamesD
Link to comment
Share on other sites

  • 1 year later...
  • 2 weeks later...

I must be missing something because I can't seem to get this to compile right. It creates the bin and rpk files, but the header appears to be off somehow. Classic99, Mame, and real hardware won't add the cart to the program list. I've combed over it a couple of times and I can't seem to find the spanner in the works.

Link to comment
Share on other sites

  • 2 weeks later...

<nudge> I installed Linux and set up the GCC development environment from scratch, just to make sure I was as up to date as possible. Still no joy. Out of desperation, I downloaded the above zip again and tried to open the pre-compiled binary and rpk. Mess does not like them - Invalid Image. I compared the headers to other bin files and they don't look quite the same. I'm not sure exactly what the problem is.

Link to comment
Share on other sites
