Posts posted by TheMole
Huzzah, I finally got my setup where I want it and have my first bank switching cart. (*edit* it doesn't do much, only displays a number of messages on the screen telling you from which banks the code is running. Nothing exciting!).
First, I gotta say, linker script syntax is a pain in the ass... horribly inconsistent within the same section! Anyway, in my first post I mused about what a good linker script might look like and I came up with something that I figured looked fairly elegant. No dice though, that was full of assumptions that don't work. So, for posterity (and those looking to do this themselves), here's what it looks like now:
/* cart.ld : Linker script to create TI99/4A cartridges */

/* Output straight to a flat binary format (i.e. not ELF) */
OUTPUT_FORMAT(binary)
OUTPUT(cartridge.bin)

/* TI memory layout */
MEMORY
{
  cart_rom   (rx) : ORIGIN = 0x6000, LENGTH = 0x2000 /* cartridge ROM, read-only */
  lower_exp  (wx) : ORIGIN = 0x2080, LENGTH = 0x1f80 /* 8k - 128 bytes */
  higher_exp (wx) : ORIGIN = 0xa000, LENGTH = 0x6000
  scratchpad (wx) : ORIGIN = 0x8320, LENGTH = 0x00e0 /* 32b is for workspace */
}

/* Where we put sections */
SECTIONS
{
  . = 0x6000;
  .header : { bank0/cart_header.o(.text) } >cart_rom /* Bank 0: Cart ROM header */
  _persistent_src = 0x601a;
  .persistent : AT ( _persistent_src ) { _persistent = . ; persistent/*.o(.text); _persistent_end = . ;} >lower_exp /* Bank 0: Code that never can get bankswitched out */
  .bank0 (LOADADDR(.persistent) + SIZEOF( .persistent )) : { _text = . ; bank0/*.o(.text); _text_end = . ;} /* Bank 0: code */
  .bank1 0x6000 : AT ( 0x8000 ) { bank1/*.o(.text); } /* Bank 1: code */
  .bank2 0x6000 : AT ( 0xa000 ) { bank2/*.o(.text); } /* Bank 2: code */
  .data 0xa000 : AT ( 0xc000 ) { _data = . ; persistent/*.o( .data ) bank0/*.o( .data ) bank1/*.o( .data ) bank2/*.o( .data ); _data_end = . ;} /* Bank 3: data */
  .bss (_data_end) : { _bss = . ; persistent/*.o( .bss ) bank0/*.o( .bss ) bank1/*.o( .bss ) bank2/*.o( .bss ) ; _bss_end = . ;}
  .fill 0xdfff : AT ( 0xdfff) { BYTE(0x00); }
}

/* Ensure banks don't call each other's functions directly */
NOCROSSREFS( .bank0 .bank1 .bank2 .bank3)

The above creates a 32kb cart, organized as such:
- Bank 0 contains: cart header, startup code (crt0), a copy of all persistent (non banking) code, and finally normal (bankable) code
- Bank 1 contains: normal banking code
- Bank 2 contains: normal banking code
- Bank 3 contains: initialization data (to be moved to higher memory expansion)
The startup code in crt0 copies code flagged as "persistent" in bank 0 to the lower memory expansion area, and copies the data initialization values from bank 3 to higher memory. I'll probably try to clean up the script a bit further, and make it more flexible in terms of number of banks.
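For anyone wondering what that copy step boils down to, here is a minimal sketch in C. It is not the actual crt0: copy_block is a hypothetical helper name, and on the real cart the arguments would be the linker symbols from the script above (_persistent_src as the load address, _persistent and _persistent_end as the run-address range, and likewise _data/_data_end for the initialization data). Plain pointers stand in for those here, so the sketch also builds on a host machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a block from its load address (in ROM) to its
 * run address (in RAM). 'start' and 'end' delimit the destination range,
 * 'load' is where the bytes currently live. Returns the byte count copied. */
size_t copy_block(const unsigned char *load,
                  unsigned char *start, unsigned char *end)
{
    size_t count = 0;

    assert(start <= end);       /* an inverted range means a broken script */
    while (start != end) {
        *start++ = *load++;
        count++;
    }
    return count;
}
```

The same helper would be called twice at startup: once for the persistent code (cart ROM to lower memory expansion) and once for the .data initialization values (bank 3 to higher memory expansion).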
Note that we no longer need the elf2cart utility with this script, which I think is pretty cool!
In my first post, I mentioned using attributes to put functions in specific banks. That works, but is very cumbersome because you need to put each individual symbol in a specific bank. So assigning a function to a bank is not enough, you need to do the same for any constants that you use in that function as well. So I changed tracks and opted to create a directory for each bank, and let the linker script use that to decide where to put a piece of code in the binary. That leaves cleaner code (no more attributes), and makes it easier to move code from one bank to another.
I'm thinking of writing a script that generates the trampoline functions automatically, but the parsing looks daunting and doing it manually works well enough for now.
This was fun, I've learned a lot about gcc and the role of the crt0 file! Now back to Alex Kidd. Porting the existing code to run from a cart shouldn't be too hard, and I'll be happy to get rid of the disk loading code.
Let me know if anyone is interested in the code for the cart above, it's a good starting point for a gcc-based cart development skeleton.
-
As a test, I wrote a simple trampoline function and verified the assembly -- you'll have a hard time doing much better, I think.
*targetBank; // force a memory read to switch, but we don't need the result. The volatile makes it work.
*old; // force a memory read to switch back
movb *r2, r2 * read targetBank, which performs the bank switch
A nice thing about this function is it's completely position independent. You could build it without any special consideration, and manually copy it from ROM to RAM or scratchpad (it's only 32 bytes) for actual execution. It'd be hard to do too much better - even without the GCC considerations, as a generic trampoline function I don't know if I'd change anything there.
Wait, you read the address corresponding to the desired bank? I thought you were supposed to write to it to trigger a bank switch, how would reading work given that the first bytes of the cart (0x6000 and beyond) are read by the system to populate the cart menu?
-
As a test, I wrote a simple trampoline function and verified the assembly -- you'll have a hard time doing much better, I think.
void trampoline(void (*target)(), volatile char *targetBank)
{
  volatile char *old = CurrentBank; // save the current bank
  CurrentBank = targetBank;         // update the cache variable
  *targetBank;                      // force a memory read to switch, but we don't need the result. The volatile makes it work.
  target();                         // call the target function
  CurrentBank = old;                // update the cache variable
  *old;                             // force a memory read to switch back
}

This function takes the address of a function to call, and a pointer to the bank switch address (i.e. 0x6000 for bank 0, 0x6002 for 1, etc). It expects that somewhere you have a global to cache the 'current' bank, defined as "volatile char *CurrentBank". Volatile is important to prevent optimizing out or re-ordering accesses to it.
The generated assembly code for this function looks like this (using -O2, no optimizations produced broken code).
  def trampoline
trampoline
  ai r10, >FFFC        * update the stack pointer
  mov r11, *r10        * save return address on stack
  mov r9, @>2(r10)     * save frame pointer on stack
  mov @CurrentBank, r9 * save current value of CurrentBank ('old')
  mov r2, @CurrentBank * save 'targetBank' into CurrentBank
  movb *r2, r2         * read targetBank, which performs the bank switch
  bl *r1               * call target function
  mov r9, @CurrentBank * restore saved value from 'old' (note: saved in register!)
  movb *r9, r1         * perform the memory read, which switches the bank back
  mov *r10+, r11       * restore return address from stack
  mov *r10+, r9        * restore frame pointer from stack
  b *r11               * return to caller

A nice thing about this function is it's completely position independent. You could build it without any special consideration, and manually copy it from ROM to RAM or scratchpad (it's only 32 bytes) for actual execution. It'd be hard to do too much better - even without the GCC considerations, as a generic trampoline function I don't know if I'd change anything there.
That's what I did for my first test, but I don't see an obvious way to make this work with functions that take arguments (except writing a specific trampoline function for each)?
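One way to take some of the pain out of the "one trampoline per function" approach might be to let the preprocessor stamp them out. The following is only a sketch under assumptions: DEFINE_TRAMPOLINE1, select_bank and current_bank are made-up names, it handles exactly one argument, and the bank switch is simulated with a plain variable so it runs on a host machine. On the real cart, select_bank would do the volatile read of the bank address instead.

```c
#include <assert.h>

/* Stand-in for the bank-select hardware: we only track the bank number. */
static int current_bank = 0;

static void select_bank(int bank)
{
    assert(bank >= 0);
    current_bank = bank;
}

/* Hypothetical macro: generates a trampoline 'name' that switches to 'bank',
 * calls 'far_name' with one argument, then restores the caller's bank. */
#define DEFINE_TRAMPOLINE1(ret_type, name, far_name, bank, arg_type) \
    ret_type name(arg_type arg)                                      \
    {                                                                \
        int old = current_bank;                                      \
        ret_type result;                                             \
        select_bank(bank);                                           \
        result = far_name(arg);                                      \
        select_bank(old);                                            \
        return result;                                               \
    }

/* Example far function: returns its argument plus the bank it ran in,
 * which makes it easy to see that the switch actually happened. */
static int far_add_bank(int x)
{
    return x + current_bank;
}

/* Expands to: int add_bank(int arg) { ... } running far_add_bank in bank 2 */
DEFINE_TRAMPOLINE1(int, add_bank, far_add_bank, 2, int)
```

Callers then just use add_bank(40) like any other function. You would still need one macro variant per argument count (and one for void returns), so this only reduces the typing, not the amount of generated code.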
-
Ok but where is the bank switch and switch back so you can call between banks?
If you bank switch there, you'll be in a different bank before your call.
Just to clarify, I was working from your example code. That macro doesn't do the bank switching itself, that happens in the trampoline function. The macro only replaces the stub functions in your example, not the actual trampoline code.
If you are calling the trampoline code directly, it has to know what to set and call.
That's what the "FunctionJump+2 = &_far_somefunction;" does, it overwrites the >0000 part in the JMP instruction in your code around line 26 with the address of _far_somefunction. The page can still be looked up in a lookup table, or you can simply follow the same approach as for the function pointer, with the exception that the page address needs to be hardcoded.
-
Unix when converted from Assembly to C took up so much more space that it took a cut down version call Linux to fit on a standard PC.
Later with the marked increase in memory on Desktop PC Unix would fit, but with a huge cut down of Libraries.
This is not true. AT&T put out UNIX for PCs in 1985, MINIX was released in 1987. Linux was first released on October 5th, 1991 because Linus wanted to have a free Unix-like OS for day-to-day use and MINIX's licensing conditions limited it to educational use. Nothing to do with memory consumption.
-
Attached is my cf7+ image. This comes from a 2GB CF card with 6 images installed (and image 6 mounted as DSK1, most likely images 2 and 3 as DSK2 and DSK3 respectively).
-
I make no promises on syntax but this is the general idea using a stack to track mem pages and return addresses.... and it's bigger than it probably needs to be.
I'm sure an experienced 9900 programmer can improve on it
Thanks! I'm definitely no 9900 assembler expert either, but that looks ok to me. I'm thinking we can skip the explicit stub routine definition though. I can create a macro in C that writes the address of the function I want to call to a well known location in memory (@FunctionJump+2, in your code) and call TrampolineMain directly. Like so:
extern void* FunctionJump;
#define somefunction(somearg) FunctionJump+2 = &_far_somefunction; _retval = TrampolineMain(somearg); _retval;
The only real benefit is that this consumes memory where the caller resides instead of in non-bankable memory (which is much scarcer). But of course, it will add said amount of code each time the function is called instead of just once. Hmmmm, your version is probably better in most circumstances.
*edit* another benefit of the macro is that we won't need an explicit lookup table for the function address. The function's bank can be added in pretty much the same way as well.
-
Look up "gcc name mangling". It's actually pretty cryptic from what I just read.
That's for C++, not C. I was reading up on this, and in C symbols are apparently always unmangled (hence the need for the 'extern "C"' stuff when mixing the two languages: it tells the C++ compiler that it can't mangle those symbols).
-
Thanks for the help James, a lot of what you're saying makes sense.
The assembly doesn't care about the parameters, it's just redirecting the function call and doesn't need to know parameters or return values.
The function you are calling gets the stack and registers exactly as if it had been called directly and the return goes directly back to the caller.
Actually, I was thinking this because I do want to be able to jump from one page to the other and then return again. That's why we're using a trampoline function to begin with, otherwise you could just do the bank switch right before and right after the function call.
The compiler doesn't know the assembly doesn't know anything about the parameters or return values, it just assumes and generates the proper code.
The linker *should* do the same. All it does is insert the address of the stub routine into the code.
Yes, you're right. I can create a pure assembly version of the trampoline code by including a pure assembler file and not doing inline assembly. I hadn't thought of that. All I need to take care of is that the assembly code doesn't mess with the stack or registers. However, since I will need the callee to return to the trampoline function (and not the original caller), I will need to cache and override the return address in r11 (this is where gcc stores the return address for BL's). That's not a big deal though.
So you'll basically have an assembly file with all the stub routines and defined them in your C code as external.
Yup, that's the part I hadn't thought of, but you're absolutely right.
Your assembly stub routine names will have to match the C naming conventions for the function in order to link to it.
I think that's easy, looking at my symbol table it seems like gcc just uses the actual function name, no decorations or anything.
-
After some thought, I *think* the assembly stub just switches the page and then jumps to the function.
The return goes directly back to the caller without switching pages back.
So no extra stack handling
Since pages aren't restored that doesn't support calls across pages.
For that you need a stub with additional stack use like you are doing
Unfortunately, I think there's no way to integrate assembly code in such a way that it can interact with C functions without incurring the C function call overhead. The stack manipulations are done by the callee, not by the caller. So in theory you could do what you're suggesting to bank in pure assembler code (if you're careful not to mess with any of the registers that the compiler is expecting to be in a certain state), but never to bank in real C functions (which is my main goal).
-
What I ended up doing instead of a bank stack was having a single variable that tracked the current bank index (which I called 'bank' for lack of creativity). Then any time I switched banks and needed to switch back, I just stored the current value in a local variable. The compiler then could decide whether that local variable was a register, on the stack, or whatever it wanted.
Good point, gcc will push every instance of a local variable on its stack already when calling a new function, no need for me to duplicate this behavior and keep my own stack for tracking banks!
Also good to know that you've come up with basically the same solution, that probably means I'm not missing anything obvious (or obviously better).
-
I assume it's not possible to copy all your code from ROM into the 32K RAM and otherwise use the ROM for fetching static data only?
It is, but the static data is only 1.7k in total right now (since the graphics and maps are loaded from disk, currently) so it wouldn't free up all that much space. Having code in cart space in and by itself will be a big help even without banking (8k extra for code and constants), but I also just want to have this figured out for the future.
-
If you implement your intermediate calls in C, will it generate additional stack handling?
Yes, it will. Unfortunately, I see no way of getting around this without modifying the compiler itself. We'd need either naked functions (functions without prologue/epilogue, and thus without stack manipulation), or direct support for bank switching in the compiler itself, I think.
-
In trying to come up with a strategy to win back memory for Alex Kidd, I was thinking about stuffing some code in a cartridge, so I can win back some of that 32kb expansion memory. Given that I'm currently already at nearly 16k of executable code (including constants), and that I still need to add a good number of features, I need to find a way to create bank switching software with gcc. What follows is a write-up of my ideas, not everything has been tested, and I'm looking for a sanity check: will this work, am I missing something that could simplify things?
1. Multiple pieces of code at the same location
The first thing we need to do when hacking support for banked memory (such as bank switched cartridges) in gcc, is to tell the compiler that specific pieces of code will run from the same physical address space. In the case of a program designed to run from cartridge, this would be 0x6000.
By default, gcc will put all executable code into a section called .text, and you can tell the linker to position this code at any location in memory by using command line options (--section-start .text=0x6000), or by creating a bespoke linker script and adding a properly configured SECTIONS section:
SECTIONS
{
. = 0x6000;
.text : { *( .text ) }
. = 0xa000;
.data : { *( .data ) }
.bss : { *( .bss ) }
}
(Note: the above example requires a system with 32k memory expansion installed, since it puts all variables in expanded memory. It also requires a crt0 implementation that copies the initialization values for variables in the .data segment from somewhere in ROM or from disk to 0xa000)
Since all code is in the .text segment by default, the linker will just start filling up memory with code from 0x6000 onwards, blasting past 0x7fff if the code segment happens to be larger than 8k and in the process creating a useless image for our purposes. At the very least, we can define our memory layout in the linker script to get a warning when one of our blocks exceeds the maximum size. We can do this by adding a MEMORY section to the linker script (there's no command line equivalent of this), and changing the SECTIONS section accordingly:
MEMORY
{
cart_rom (rx) : ORIGIN = 0x6000, LENGTH = 0x2000 /* cartridge ROM, read-only */
lower_exp (wx) : ORIGIN = 0x2080, LENGTH = 0x1f80 /* 8k - 128 bytes */
higher_exp (wx) : ORIGIN = 0xa000, LENGTH = 0x6000
scratchpad (wx) : ORIGIN = 0x8320, LENGTH = 0x00e0 /* 32b is for workspace */
}
SECTIONS
{
. = >cart_rom;
.text : { *( .text ) }
. = >higher_exp;
.data : { *( .data ) }
.bss : { *( .bss ) }
}
Now, whenever the .text section exceeds 8k, the linker will throw an error and abort. At least we'll know our program is too big to fit in the 8k, but it would be even better if we could stuff more code in other parts of memory. Unfortunately, ld will not do this for us, and we'll need to explicitly assign code to different sections in our source files by adding attributes to the function definitions. Supposing we already have filled our 8k of cartridge ROM, we could for instance decide to put additional functions in the lower 8k of the 32k memory expansion. First we add the section attribute to each function we want to put in the lower memory expansion area:
void somefunction(int somearg) __attribute__ ((section ( ".moretext" )));
void somefunction(int somearg)
{
// some code
}
We now have code that will get put in the .moretext section, so we need to tell the linker where to put this code (assuming the same MEMORY section as in the example above):
SECTIONS
{
. = >cart_rom;
.text : { *( .text ) }
. = >lower_exp;
.moretext : { *( .moretext ) }
. = >higher_exp;
.data : { *( .data ) }
.bss : { *( .bss ) }
}
(Note: again we need to remember that the cart will need to load the contents of section .moretext from somewhere in ROM or from disk and copy it to the lower memory expansion at 0x2080)
In theory, we could automate the annotation of functions by doing two compilation passes: one with all code in the standard .text segment to discover the size of each compiled symbol, and one that uses that info to assign individual functions to the two available sections. In practice, I imagine this is doable enough by hand for most programs. Also, on our platform gcc doesn't seem to support calculating the size of individual compiled symbols, so by hand it is.
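If the per-symbol sizes were available (from a map file, say), the second pass could be as simple as a first-fit assignment. A rough sketch, with the caveat that assign_regions is a hypothetical helper and nothing here reads real compiler output; the caller has to supply the sizes and the free space of each region (e.g. 0x2000 for cart_rom and 0x1f80 for lower_exp):

```c
#include <assert.h>
#include <stddef.h>

/* First-fit: place each symbol in the first region with enough room left.
 * sizes[i] is symbol i's size in bytes; capacity[] holds each region's free
 * space and is updated in place. region_of[i] receives the region index,
 * or -1 if the symbol fits nowhere. Returns the number of symbols placed. */
size_t assign_regions(const unsigned *sizes, size_t nsyms,
                      unsigned *capacity, size_t nregions, int *region_of)
{
    size_t placed = 0;
    size_t i, r;

    assert(sizes && capacity && region_of);
    for (i = 0; i < nsyms; i++) {
        region_of[i] = -1;
        for (r = 0; r < nregions; r++) {
            if (sizes[i] <= capacity[r]) {
                capacity[r] -= sizes[i];
                region_of[i] = (int)r;
                placed++;
                break;
            }
        }
    }
    return placed;
}
```

The resulting region index per function would then decide which section attribute to emit for it. First-fit is not optimal packing, but for a handful of functions it is probably close enough.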
So now we are able to put code into two different physical locations in the TI's memory, but that still doesn't allow for bank switching. As we said at the very beginning, for that we need to tell the linker that two or more sections of code need to target the same memory area. Turns out that we can do this with the OVERLAY command:
SECTIONS
{
OVERLAY >cart_rom : AT 0x0000
{
.text { *( .text ) }
.moretext { *( .moretext ) AT ALIGN(0x2000)}
}
OVERLAY >higher_exp : AT ALIGN(0x2000)
{
.data : { *( .data ) }
}
.bss : { *( .bss ) }
}
Running the linker with a script with the above SECTIONS section will give us a binary that contains three 8k banks: .text, .moretext and .data (we ignore .bss, because those are just zero-initialized variables and are taken care of by our crt0 implementation). The code in the first two banks will expect to run at 0x6000, and expects to find the initialized data from the .data section at 0xa000. Given all this, we should be able to generate binaries in the right format to support bank switching.
2. Actually switching banks in code
That was the easy part, after all, it didn't require any coding. However, the trickiest part to bank switching is to write code that can cope with switching from one bank to another (and have that new code return). There are a couple of ways to do this (some more cumbersome than others), but they will all share a common requirement: you need to keep a "bank switching stack" (for lack of a better term). That is to say, when code in bank 1 calls a function in bank 2, we need to save the return bank "location" (i.e. what enables "bank 1") somewhere. If that function in bank 2 then in turn calls a function in bank 3, we need to do the same thing without overwriting the first return bank location. This is a recursive problem, so we need a stack.
The ideal location for the bank switching stack seems to be in scratchpad, since it will be relatively small and that part of memory is always available. By putting the pointers to this stack in a separate section, we can use the linker script to put it there (or wherever else is convenient). The management of the stack needs to be done right before calling a function in another bank, and right before returning to the calling bank at the end of a function.
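To make the idea concrete, a bank switching stack could look something like the sketch below. This is untested on real hardware and rests on a few assumptions: the depth of 8 is arbitrary, banks are tracked as plain numbers, and active_bank is a stand-in for whatever actually triggers the switch on the cart. On the target, the array would be placed in scratchpad via a section attribute.

```c
#include <assert.h>

#define BANK_STACK_DEPTH 8      /* assumed maximum nesting of far calls */

static int bank_stack[BANK_STACK_DEPTH];
static int bank_sp = 0;         /* number of saved banks on the stack */
static int active_bank = 0;     /* stand-in for the hardware bank latch */

/* Save the caller's bank on the stack, then switch to 'bank'. */
void push_bank(int bank)
{
    assert(bank_sp < BANK_STACK_DEPTH);   /* too many nested far calls */
    bank_stack[bank_sp++] = active_bank;
    active_bank = bank;
}

/* Switch back to the most recently saved bank. */
void pop_bank(void)
{
    assert(bank_sp > 0);                  /* pop without matching push */
    active_bank = bank_stack[--bank_sp];
}
```

Nested calls work because every push_bank is paired with exactly one pop_bank: bank 1 calling into bank 2 calling into bank 3 leaves two entries on the stack, and they come back off in reverse order.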
On a select number of platforms, GCC supports so-called 'far' and 'near' pointers and/or function attributes, which could be used to implement two different function prologues/epilogues depending on the type of function call that needs to be done. Unfortunately, the tms9900 platform implementation does not support these attributes.
GCC also has support for instrumenting each function call and return via the -finstrument-functions command line option. You need to implement your prologue and epilogue code in the following two functions somewhere in your code:
void __cyg_profile_func_enter (void *, void *) __attribute__((no_instrument_function));
void __cyg_profile_func_exit (void *, void *) __attribute__((no_instrument_function));
However, the call to and return from __cyg_profile_func_enter happens /before/ the call to the actual function, so it would take some serious wrestling with the C call stack to transparently implement bank switching in these functions.
Our last option is to instrument individual functions and function calls. This is certainly the most cumbersome implementation of all, but it is the only one which does not need embedded support in the compiler implementation itself. Instrumentation of the function call is relatively easy, keeping in mind that all manipulation of the bank switching stack needs to be done from within the calling bank and the absolute last command needs to be the one that triggers the switch to the next bank. The following process could be a usable implementation:
The caller (code runs in bank 1):
- Writes the address and bank location of the intended callee in two registers (e.g. r0 and r1)
- Invokes the trampoline
The trampoline (code runs in scratchpad/expmem):
- Saves the current bank on the bank switching stack
- Loads the new bank
- Makes the call using the info in (e.g.) r0 and r1
The callee (code runs in bank 2):
- Does stuff
- Returns to the trampoline
The trampoline (code runs in scratchpad/expmem):
- Loads the original bank (which is popped from the bank switching stack)
- Returns to the caller
Or, in other words, every function call should be structured as follows:
caller calls trampoline(), trampoline calls callee, callee returns to trampoline, trampoline returns to caller.
Using this type of construct, the trampoline function needs to transparently pass on all arguments to the callee. The easiest way to accomplish this is to have a bespoke trampoline function for each "far" function we're looking to call (with a "far" function being any function that runs from a bank switchable piece of memory). Something like the following example:
// Our "far" function, in bank 2
int far_somefunction(int someint) __attribute__ ((section ( ".bank2" )));
int far_somefunction(int someint)
{
  // do something
  int somevalue = someint; // placeholder result
  return somevalue;
}
// Our trampoline function, in non bankable memory (e.g. scratchpad)
int somefunction(int someint) __attribute__ ((section ( ".nonbankable" )));
int somefunction(int someint)
{
  int retval;
  // Set to bank 2, and push caller's bank on the stack
  push_bank(2);
  // Call far function
  retval = far_somefunction(someint);
  // Set caller's bank
  pop_bank();
  return retval;
}
Using this, we can safely call somefunction() (our trampoline function for far_somefunction()) from anywhere in our code, no matter which bank we're currently in and no matter where the calling code resides in memory. Furthermore, we can also still call far_somefunction() directly from within the same bank if we want to avoid the overhead of the bank switching and the trampoline function.
The big downside of course is that we now have one trampoline function for every "far" function we want to call, all with nearly identical function bodies, eating at our available non-bankable memory. Not a big deal if you plan on banking code in big chunks, but problematic if you have lots of little functions that you need to call from everywhere in your program. We could opt to create one generic trampoline function, using variable argument lists and function pointers, if we're really strapped for memory. The downside is that it would create even more overhead for every "far" function call you're looking to make.
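For reference, a generic trampoline along those lines might look like the sketch below. The catch in C is that you can't forward a "..." argument list to an ordinary function, so in this version every far function has to accept a va_list and unpack its own arguments. The names far_call, far_fn and cur_bank are all made up for the example, and the bank switch is again just a variable so it runs on a host.

```c
#include <assert.h>
#include <stdarg.h>

static int cur_bank = 0;    /* stand-in for the hardware bank latch */

/* Far functions in this sketch take their arguments as a va_list. */
typedef int (*far_fn)(va_list args);

/* Hypothetical generic trampoline: switch to 'bank', hand the variable
 * arguments to 'fn', then restore the caller's bank. */
int far_call(int bank, far_fn fn, ...)
{
    va_list args;
    int old = cur_bank;     /* caller's bank, kept in a local */
    int result;

    assert(fn != 0);
    cur_bank = bank;
    va_start(args, fn);
    result = fn(args);
    va_end(args);
    cur_bank = old;
    return result;
}

/* Example far function: adds two ints pulled from the argument list. */
static int far_add(va_list args)
{
    int a = va_arg(args, int);
    int b = va_arg(args, int);
    return a + b;
}
```

A call then reads far_call(2, far_add, 40, 2). You lose all compile-time type checking on the arguments and pay for the va_arg unpacking in every callee, which is the extra overhead mentioned above.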
Even with bespoke trampoline functions for each far function, it's a good idea to limit the number of bank switching calls you need to do, especially if you're writing an action game that needs to retain a high frame rate, given the fairly high overhead the bank switching introduces. If the compiler had support for naked functions (functions without prologue and epilogue), we could probably reduce the overhead to an absolute minimum, similar to what you'd get with pure assembly code, but unfortunately gcc doesn't support that attribute on our target.
I think the above is a sound strategy?
-
Excellent work, at this pace you'll have a full game in no time!
-
That is a very good point about the readability on CRT's, had not thought of that.
-
[Edit] - I just opened the parsec.rpk from ftp://ftp.whtech.com/Cartridges/rpk/ that one works (not freezing so far),
but the speech is a bit "scrambled"
Just tested this myself, and I'm getting a similar effect to this with the version of Parsec hosted on the site. Speech is mostly fine, but sometimes it garbles up/glitches. Whenever it happens, you see this in the log window:
Speech: speak external
Begin talking
*** Warning *** Ran out of bits on a parse (4)
Speech: speak external
Begin talking
Never had it hang though.
-
Is it possible to write a URL that starts Js99'er and autoloads a cartridge from another location (URL) ?

I've asked this before, it'd be a killer feature but it would open up js99er.net to cross-site scripting security issues. By default, your browser won't allow JavaScript to load binary data hosted on domains other than the one it's running from (js99er.net in this case). There are ways around that (like uuencoding your binary in text format, or letting the host retrieve and cache the file, then serve it itself), but they all introduce additional (although mostly small-ish) security risks.
-
Looks very slick! Looking forward to playing the finished version of this.
Any chance you could give the different numbers different background colors? It'd make it a bit easier to visually scan the board, and it might add some color to the overall look of the game.
Either way, love it!
-
Any chance we can get an rpk for MESS or js99er.net?
-
Any news on the patch, Insomnia?
-
Some corrections to your list, Globeron:
- js99er.net is actually written in JavaScript, not Java. Even the offline version will need a browser to run. There is v9t9j, which is written in Java though.
- ti99sim works on Linux, Mac and Windows.
- MESS works on Linux, Mac and Windows; (as does QMC2, by the way, but as Michael says we should split those out since QMC2 is only a frontend and a lot of people use MESS without it, myself included).
- Classic99 and v9t9 (no j) do not work on anything besides Windows, so no Linux support. v9t9j, being Java based, does work on Linux and Mac.
-
DIN 5 (midi)   RS232 (TI)
------------   ----------
2              1
4              3
5              7
Pretty simple cable.
I'm not sure if I have the original disks, the version I use has been on my hard drive for like forever.
I can try and make a program disk from it if you guys are really interested.
Gazoo
I'm guessing this is for Midi-out only? Is there a way to do Midi-in (and use the TI's sound chip as a synthesizer)?
-
Might cable length be the issue? My console-to-card cable is about 15 inches in length.
( straw I haven't yet grasped ).
Have you tried looking at the contents of one of those formatted disks that threw error 38 in a working system yet? Perhaps with a sector editor or some such? Maybe you can find something that points you in the right direction. Error 38 would seem to indicate some unknown error while reading, so it might be that the disk actually does get formatted but something goes wrong during the validation step (if that even exists)?

Minesweeper - game released
in TI-99/4A Development
*nevermind; obvious question answered in your animated gif.*