GCC for the TI

TheMole · October 2, 2012

I tried ti99sim for the first time. You must also rename the file to "helloc.bin" and run "convert-ctg hello.bin", without the "c"; it then reports "1 bank of ROM at 6000, 1 bank of ROM at 7000".

If you type "convert-ctg helloc.bin", it says "GROMS: 3" and does not work.

You sir, are my new personal hero, dropping the "c" in the convert-ctg command line did the trick. Thank you very much!

lucien2 · October 5, 2012

GCC is rather large and slow.

...

LLVM is about 3x faster than GCC, and has a better code optimizer.

Sure, it's slow to compile itself (about 10 minutes with my pc).

But it compiles 22k of code (about the maximum that works with the linker as it is, limited by the TI high memory) in 2 seconds on my 2 x 2.26 GHz PC. Only one CPU is used by the compiler with Cygwin.

The compiler uses 32.5 MB on my 100 GB Windows partition. That's 34/100,000 MB, or 0.03%.

It takes about 20 seconds to compile 22k, manually convert the binary to TIFILES with TIDir, and type the name in Classic99 with the E/A cartridge.

About 30 seconds more to transfer it to the hardware with the CF7.

Sure it's a bit hard to debug. You have to know some assembly to step in the classic99 debugger and find that in the generated assembly.

I don't think the LLVM compiler would bring so much improvement. Maybe 0.6 seconds in the compilation time?

I think the generated code is fast enough, compared to the other compilers available for the TI. When you say LLVM has a better code optimizer, is it size or performance?

Edited October 8, 2012 by lucien2

TheMole · October 11, 2012

Does anyone know how to create multiple-ROM images from gcc? I've been using it to develop my master system VDP test app (see this post), but I'm already exceeding the 8k limit (about 6k of pattern data and 1.5k of tile information), which currently creates binaries in which the memory wraps onto itself (so part of the .data section gets overwritten by constants defined later). I suppose I'd need to implement bank-switching code to get this to work properly, but I have no idea where to start looking for info...

+Gemintronic · October 11, 2012

Does anyone know how to create multiple-ROM images from gcc? I've been using it to develop my master system VDP test app (see this post), but I'm already exceeding the 8k limit (about 6k of pattern data and 1.5k of tile information), which currently creates binaries in which the memory wraps onto itself (so part of the .data section gets overwritten by constants defined later). I suppose I'd need to implement bank-switching code to get this to work properly, but I have no idea where to start looking for info...

What about switching to z88dk?

http://www.z88dk.org/forum/

UPDATE: Oops. I thought you were talking about the SMS after the statement in bold.

Edited October 11, 2012 by theloon

TheMole · October 11, 2012

What about switching to z88dk?

http://www.z88dk.org/forum/

Well, the TI-99/4A doesn't use a Z80 or similar CPU but the TMS9900, for which the recently created port of GCC is the only viable C compiler that I know of...

lucien2 · October 11, 2012

Does anyone know how to create multiple-ROM images from gcc?

I don't know how to do it, but you can use the EA5 format for programs bigger than 8k. Here is the "elf2ea5" program, with an example: http://www.atariage....75#entry2343991

I wrote a small utility to split the output file to 8k files and rename them with the last character incremented: ea5split.zip
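The splitting and renaming rule described above can be sketched in C. This is only an illustration written for a host compiler, not the source of ea5split: the function names and the CHUNK_SIZE constant are made up here, and any per-file header handling the real tool performs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8192u  /* 8K per output file, as described in the post */

/* Number of 8K pieces needed for a payload of `total` bytes. */
size_t chunk_count(size_t total)
{
    return (total + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Size of piece `index`; only the last piece may be shorter than 8K. */
size_t chunk_length(size_t total, size_t index)
{
    size_t start = index * CHUNK_SIZE;
    assert(start < total);
    return (total - start < CHUNK_SIZE) ? total - start : CHUNK_SIZE;
}

/* Write the name of piece `index` into `out`: the base name with its last
   character incremented `index` times (HELLO, HELLP, HELLQ, ...). */
void chunk_name(const char *base, size_t index, char *out, size_t out_size)
{
    size_t len = strlen(base);
    assert(len > 0 && len < out_size);
    memcpy(out, base, len + 1);               /* copy including the NUL */
    out[len - 1] = (char)(out[len - 1] + (int)index);
}
```

With these helpers, a 22000-byte file named HELLO would be cut into three pieces named HELLO, HELLP and HELLQ, the last one holding the remaining 5616 bytes.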

You must then convert these files to TIFILES format and put them in a disk image. I use TI99Dir for that. I don't know how to do it under linux.

TheMole · October 12, 2012

I don't know how to do it, but you can use the EA5 format for programs bigger than 8k. Here is the "elf2ea5" program, with an example: http://www.atariage....75#entry2343991

I wrote a small utility to split the output file to 8k files and rename them with the last character incremented: ea5split.zip

You must then convert these files to TIFILES format and put them in a disk image. I use TI99Dir for that. I don't know how to do it under linux.

Well, I can run TI99Dir perfectly fine under wine, but there's no way to automate from the command line and make it part of the Makefile, which is a bit of a shame as it does increase the number of steps before you can test your code. That said, I tried what you proposed, and strangely enough (although I indeed end up with 2 EA5 files after compiling) the problem stays the same. Is there a limit to the amount of static data you can define in a program? Assuming a system with the 32k memory expansion, I should have "plenty" of memory available (ok, that's an overstatement, but still) to load at least a full single level. Where what ends up in memory when using gcc for the TI is still a bit of a mystery to me...

lucien2 · October 12, 2012

Well, I can run TI99Dir perfectly fine under wine, but there's no way to automate from the command line and make it part of the Makefile, which is a bit of a shame as it does increase the number of steps before you can test your code.

Adding the TIFILES/V9T9 conversion to "ea5split" has been on my TODO list for a long time. That would be fine with Classic99, but ti99sim would still need the "disk image" step.

There is a utility called TICOPY that comes with V9T9. I think it can transfer a V9T9 FIAD file into a disk image from the command line.

That said, I tried what you proposed, and strangely enough (although I indeed end up with 2 EA5 files after compiling) the problem stays the same. Is there a limit to the amount of static data you can define in a program? Assuming a system with the 32k memory expansion, I should have "plenty" of memory available (ok, that's an overstatement, but still) to load at least a full single level. Where what ends up in memory when using gcc for the TI is still a bit of a mystery to me...

In EA5 format, the compiled program is loaded in the 24k high memory (>A000..>FFFF). In cartridge format, the address is >6000.

For both formats, the global variables are stored at the beginning of the 8k low memory (>2000..). The stack is at the end of the low memory (..>3FFF).

What's annoying is that initialized data is first loaded in high memory and then copied to low memory. To avoid that, I load the big data tables directly into low memory to have more space in high memory for the code (I did that in my NYOG'SOTHEP game for the 3000 bytes of the map).
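As a quick reference, the address ranges mentioned in this post can be written down as C constants. This is a host-side sketch only; the macro and function names are invented for illustration, and the cartridge end address >7FFF is assumed from the 8K cartridge window discussed elsewhere in the thread.

```c
#include <assert.h>

/* Ranges quoted in the post (32K memory expansion plus cartridge space). */
#define LOW_MEM_START   0x2000u  /* global variables start here          */
#define LOW_MEM_END     0x3FFFu  /* the stack sits at the end            */
#define CART_START      0x6000u  /* cartridge-format programs load here  */
#define CART_END        0x7FFFu  /* assumption: 8K cartridge window      */
#define HIGH_MEM_START  0xA000u  /* EA5-format programs load here        */
#define HIGH_MEM_END    0xFFFFu

enum region { REGION_OTHER, REGION_LOW, REGION_CART, REGION_HIGH };

/* Tell which of the regions above a 16-bit address falls into. */
enum region classify(unsigned int addr)
{
    assert(addr <= 0xFFFFu);
    if (addr >= LOW_MEM_START && addr <= LOW_MEM_END)   return REGION_LOW;
    if (addr >= CART_START && addr <= CART_END)         return REGION_CART;
    if (addr >= HIGH_MEM_START)                         return REGION_HIGH;
    return REGION_OTHER;
}

/* Size in bytes of an inclusive address range. */
unsigned long region_size(unsigned int first, unsigned int last)
{
    assert(first <= last);
    return (unsigned long)last - first + 1;
}
```

region_size(HIGH_MEM_START, HIGH_MEM_END) gives the 24K figure and region_size(LOW_MEM_START, LOW_MEM_END) the 8K figure used above.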

If you post your EA5 files, I could try to understand the problem.

lucien2 · October 13, 2012

If your code (and the stack) is not bigger than 8k, the easiest way is to put data into high memory and code into low memory.

I just tried that, it seems to work. Just change "LDFLAGS_EA5" in the makefile to this: "--section-start .text=2080 --section-start .data=a000 -M"

EDIT: You still have to load the data yourself, since it's now loaded to >2080 and then copied to >A000...

Edited October 13, 2012 by lucien2

lucien2 · October 13, 2012

Solution 1: If your data is declared "const", it's not copied to the data section. Keep your code loaded at >A000 and you have 24K for your code and data.

Solution 2: Load the code to >2000 and load the data file yourself to >A000. Then you have ~8K for your code and 24K for your data.
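To make Solution 1 concrete, here is a small C sketch of the three kinds of static storage involved. The variable and function names are made up for illustration, and where each object finally lands depends on the linker flags in use; the comments only restate what is described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* const data: the compiler can leave it in the read-only part of the
   loaded image, so no startup copy into low memory is needed. */
static const unsigned char tile_pattern[8] = {
    0x3C, 0x42, 0x81, 0x81, 0x81, 0x81, 0x42, 0x3C
};

/* Initialized, writable data: goes to .data, which (per the posts above)
   is loaded with the program and then copied to low memory. */
static unsigned char lives = 3;

/* Uninitialized data: goes to .bss and takes no bytes in the image. */
static unsigned char score_digits[6];

/* Reads only the const table. */
unsigned int pattern_checksum(void)
{
    unsigned int sum = 0;
    size_t i;
    for (i = 0; i < sizeof tile_pattern; i++)
        sum += tile_pattern[i];
    return sum;
}

/* Modifies the .data variable and returns the new value. */
unsigned char lose_life(void)
{
    assert(lives > 0);
    lives--;
    return lives;
}

/* Clears the .bss array. */
void reset_score(void)
{
    size_t i;
    for (i = 0; i < sizeof score_digits; i++)
        score_digits[i] = 0;
}
```

Large tables such as pattern or tile data are the natural candidates for const, since they are never written at run time.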

Edited October 13, 2012 by lucien2

lucien2 · October 13, 2012

Adding the TIFILES/V9T9 conversion to "ea5split" has been on my TODO list for a long time.

Done! ea5split3.zip

There is a utility called TICOPY that comes with V9T9. I think it can transfer a V9T9 FIAD file into a disk image from the command line.

The DISK utility from ti99sim works better, it does not have the 8 characters limitation of the V9T9 utilities.

TheMole · October 13, 2012

Solution 1: If your data is declared "const", it's not copied to the data section. Keep your code loaded at >A000 and you have 24K for your code and data.

Solution 2: Load the code to >2000 and load the data file yourself to >A000. Then you have ~8K for your code and 24K for your data.

And by "load the data file yourself to >A000", you mean from a separate resource, such as a file on disk? Solution 1 is an acceptable work-around for now, but I'm trying to understand memory usage as best as I can because I'm sure to hit the same problem again sooner rather than later. The biggest downside of using the master system VDP is that VRAM is used much more extensively by the VDP itself, so there's not a lot of room to use it as temporary storage. I think 8k of code will probably bring me a long way towards a standard side-scrolling platformer, but a semi-decent level will easily fill up around 16k if you're not using compression and the like.

Thanks for that new version of ea5split! It will come in handy, especially in combination with the disk program.

I love that we have these high level tools that are starting to come available to us as time goes by. This gcc port is incredibly useful! Kudos to insomnia!

JamesD · October 23, 2012

I think the generated code is fast enough, compared to the other compilers available for the TI. When you say LLVM has a better code optimizer, is it size or performance?

Ultimately, that depends on the target CPU's code generator, but I was referring to intermediate code generation and optimization (the CPU-independent optimizations).

The more a CPU supports compiled languages, the more it should benefit from the optimizations.

For example, a 6809 would benefit more than a Z80, which in turn would benefit more than a 6502.

The 9900 is fairly powerful so it should benefit, but any code generation difficulties experienced with GCC would still exist.

JamesD · December 16, 2012

So has anyone started/finished a C project using GCC?

lucien2 · December 16, 2012

Sure, this game, this cellular automaton and this bitmap viewer are finished.

This game still needs a single player mode and a game save.

Here are the latest sources: GCC Projects.zip.

insomnia · January 16, 2013

Hey everybody, I've got a new set of patches for GCC, with changelog and stuff below. But first, since I've been away for a long time, I figured I should answer some of the questions people have asked so far.

Retroclouds wanted to know if http://ultra-embedded.com/?fat_filelib would compile. It does, but I had to use include files from my PC since there's no standard C library yet. The resulting code may also need to be massaged for size or be split up into smaller pieces. It's about 21 KB of code, 3 KB of data.

total:
[Nr] Name   Type      Size(Hex)  Size(Dec)
[ 1] .text  PROGBITS  00005462 = 21602
[ 3] .data  PROGBITS  00000030 = 48
[ 4] .bss   NOBITS    00000CCA = 3274

Of course, this is missing some other stuff:

puts

printf

strncpy

strncmp

memcpy

memset

On the subject of LLVM, I've looked into it and it looks interesting, but I don't plan on using it any time soon. The problem I see with it right now is that it's new and mainly oriented toward Intel architectures. This means that it's likely to have structural changes in the near future, and any TMS9900 port may be short-lived. It's also not clear how word-to-byte conversions would work, which has been by far the most challenging part of the GCC port. As has been said earlier, for the code sizes we're dealing with on the TI, any difference in compile speed would be meaningless.

So, on to the GCC changes:

Libgcc now built for TMS9900

Implemented optimized assembly functions in libgcc for:

count leading zero bits

count trailing zero bits

find index of least significant set bit

count set bits

calculate parity bit

signed and unsigned 32-bit division and modulus

Fixed 32-bit multiplies, was only doing unsigned multiply

Fixed 32-bit negation, was emitting invalid NEG instruction

Removed fake PC register (Yay!)

New build instructions to make libgcc

Fixed function prologue and epilogue, was saving R11 unnecessarily

Optimized function epilogue, saving a few cycles

Enforced correct use of R11 register, was causing randomly broken code

The main two features here are the addition of libgcc and better handling of R11.

Libgcc is needed to complete coverage for 32-bit math. The other functions are frequently used by third party libraries. Later, this will be the proper place to add support for floating-point operations.
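For readers wondering how these libgcc routines get used, here is a minimal C sketch written for a host GCC. The wrapper names are invented for illustration. Each one uses a standard GCC builtin or a plain C operator; on targets without a matching machine instruction, GCC typically turns these into calls to libgcc helpers. Whether the TMS9900 port maps them exactly this way is an assumption.

```c
#include <assert.h>

/* Count leading zero bits (undefined for 0, hence the assert). */
int leading_zeros(unsigned int x)  { assert(x != 0); return __builtin_clz(x); }

/* Count trailing zero bits (undefined for 0, hence the assert). */
int trailing_zeros(unsigned int x) { assert(x != 0); return __builtin_ctz(x); }

/* 1-based index of the least significant set bit, 0 if x is 0. */
int first_set_index(int x)         { return __builtin_ffs(x); }

/* Number of set bits. */
int set_bits(unsigned int x)       { return __builtin_popcount(x); }

/* Parity: 1 if an odd number of bits are set, else 0. */
int parity_bit(unsigned int x)     { return __builtin_parity(x); }

/* Wide division and modulus. On a 16-bit CPU, "long" arithmetic like this
   is what ends up calling the division helpers in libgcc. */
long div32(long a, long b)         { assert(b != 0); return a / b; }
long mod32(long a, long b)         { assert(b != 0); return a % b; }
```

No explicit link option is needed for any of this when the compiler is built with libgcc, which matches the build note later in this post.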

The handling of R11 deserves some discussion here. There were three places where we determined which registers to save on the stack, and they did not all use the same algorithm. This resulted in sometimes saving R11 when it wasn't necessary, or not accounting for that space when values were stored on the stack. The outcome was corrupt registers and likely crashes.

Another problem I found (not my fault this time) was that if a function can take advantage of peephole optimization patterns, any register allocated to fulfill that pattern will not be marked as used in the GCC internal tables. In the case where R11 is selected to be used for data, and this condition hits, my port did not know it needed to save the return pointer. At that point you've got another crash when the function tries to return to the caller using a corrupted R11.

All these problems can pop up based on seemingly insignificant code changes. It all depends on how that code gets implemented, and which registers get allocated along the way. It's also maddening to try to debug. This was fixed by designating R11 for return pointer use only. We lose a general-purpose register, which I'm not crazy about, but at least the generated code will run. If I can figure out what's going on with the peepholes, I should be able to recover R11 for general use again.

Another biggish change is the build instructions. This was needed for libgcc, and apparently I've been doing it wrong all along. The GCC developers use these instructions, so that's what I'll use too.

Unpack and patch as before, and from the top level of the source tree do this:

$ mkdir build

$ cd build

$ ../configure --prefix <path_to_installation_dir> --target=tms9900 --enable-languages=c

$ make all-gcc all-target-libgcc

$ make install

When compiling, GCC will know where to find libgcc and you don't need to do anything to link against it.

If anyone's interested, the details for all this plus the sad, depressing and ultimately futile search for a standard C library are on my blog.

As always, please post any problems or questions you may have. I'll actually be around to answer them this time.

gcc-4.4.0-tms9900-1.7-patch.tar.gz

Tursi · February 3, 2013

Just wanted to note that I downloaded and built with the latest patches here, and I've been working with it most of the night. So far it has worked well with everything I've coded, I haven't done any workarounds or the like. Nothing too extreme, but happily using c99 system (not TI C99, but the c99 standard) with it and manipulating bitmap graphics. So far so good!

Tursi · February 12, 2013

Having got a little more time and a lot more code out, I'm very happy with the stability of GCC and the quality of the code it puts out. Well done! So far the only weird bug I've run into was my own fault in assembly code that I included.

insomnia · February 13, 2013

Glad to hear that. If ever you do find a problem or see an opportunity for improvement, don't hesitate to let me know. I love bug reports!

matthew180 · February 13, 2013

Is there any way to force code generation to specific memory locations? For example, if you were developing a cartridge game you need the code to generate at >6000 to >7FFF and to have "segments" of code that could be paged in and out of that memory space.

On a similar note, how does GCC decide where to start generating code? Does it only generate relocatable code?

insomnia · February 13, 2013

This is actually a bit involved.

Just for some quick background, there are several tools used to compile a binary file: the compiler (GCC), the assembler (GAS), the linker (LD), and a tool to convert the result to a format the TI can use (for now that's just ELF2EA5 or ELF2CART).

The linker is responsible for defining the address for the code and can use a configuration file to do this. This allows you to assign locations for everything in the output file. There are a lot of options available, and GNU has a pretty good manual at http://ftp.gnu.org/o...ode/ld_toc.html

There is a "hello world" example for carts back in post #64 of this thread. It doesn't use a link file and instead uses command-line arguments to define locations in the output file. Here's a link file which is equivalent to that example, let's call it linkfile.ld:

SECTIONS{
. = 0x6000;
.text : {*(.text);}

. = 0x2000;
.data : {*(.data);}
.bss : {*(.bss);}
}

The dot symbol is the current output location. By modifying it, the locations of the sections will be defined. In this example, all .text sections are linked assuming they will be loaded starting at location 0x6000, the .data and .bss sections are loaded starting at location 0x2000. There are a lot of ways to write the link file to accomplish the same result. Check the ld manual for more details here.

This link file would be used by GCC by using the "-T" switch, like this:

tms9900-gcc -T linkfile.ld crt0.o main.o -o hello.elf

Now, the segments you were wondering about are commonly called overlays, and there are a lot of examples which can be found online. None of these will be exactly what you are looking for, but the GNU tools are used for everything, and there are common concepts which would probably be helpful.

Here's an example link file using three overlays:

SECTIONS
{
. = 0x2000;
.data : { *(.data); }
.bss : { *(.bss); }

OVERLAY 0x6000 : AT (0x0000)
{
.otext1 { overlay1.o(.text); }
.otext2 { overlay2.o(.text); }
.otext3 { overlay3.o(.text); }
}
}

This will cause all .text sections of the overlay files to be at address 0x6000, and they will appear sequentially in the output file. The .data and .bss sections will start at address 0x2000. Here are the sections defined in the output file:

eric@compaq:~/dev/tios/src/test_overlay$ tms9900-readelf -S hello.elf
There are 11 section headers, starting at offset 0x650:

Section Headers:
 [Nr] Name          Type      Addr     Off    Size   ES Flg Lk Inf Al
 [ 0]               NULL      00000000 000000 000000 00      0   0  0
 [ 1] .bss          NOBITS    00002000 000800 000006 00  WA  0   0  1
 [ 2] .otext1       PROGBITS  00006000 000200 000006 00  AX  0   0  2
 [ 3] .rela.otext1  RELA      00000000 0008ec 000000 0c      9   2  4
 [ 4] .otext2       PROGBITS  00006000 000400 000006 00  AX  0   0  2
 [ 5] .rela.otext2  RELA      00000000 0008ec 000000 0c      9   4  4
 [ 6] .otext3       PROGBITS  00006000 000600 000006 00  AX  0   0  2
 [ 7] .rela.otext3  RELA      00000000 0008ec 000000 0c      9   6  4
 [ 8] .shstrtab     STRTAB    00000000 000606 000047 00      0   0  1
 [ 9] .symtab       SYMTAB    00000000 000808 0000b0 10     10   5  4
 [10] .strtab       STRTAB    00000000 0008b8 000034 00      0   0  1
Key to Flags:
 W (write), A (alloc), X (execute), M (merge), S (strings)
 I (info), L (link order), G (group), x (unknown)
 O (extra OS processing required) o (OS specific), p (processor specific)

An additional tool will then be required to split up the output file into pieces suitable for Classic99 or MESS or actual hardware or whatever. Unfortunately, no one has written a tool to do that yet.
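As a rough idea of what the core of such a tool might look like, here is a C sketch that copies one overlay section out of the linked image into an 8K bank buffer, using the file offset and size that readelf reports. Everything here (the function name, the BANK_SIZE constant, the choice of padding byte) is hypothetical, and parsing the ELF headers to find the offsets is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BANK_SIZE 8192u  /* one 8K bank, matching the >6000..>7FFF window */

/* Copy `size` bytes starting at file offset `offset` of the linked image
   into `bank`, then fill the rest of the bank with `pad`.
   Returns 0 on success, -1 if the section does not fit in a bank or lies
   outside the image. */
int extract_bank(const unsigned char *image, size_t image_size,
                 size_t offset, size_t size,
                 unsigned char *bank, unsigned char pad)
{
    assert(image != NULL && bank != NULL);
    if (size > BANK_SIZE || offset > image_size || size > image_size - offset)
        return -1;
    memcpy(bank, image + offset, size);
    memset(bank + size, pad, BANK_SIZE - size);
    return 0;
}
```

For the example output above, the three calls would use offsets 0x200, 0x400 and 0x600, each with size 6, producing one bank per overlay.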

Finally, GCC can produce position independent code (by using the "-fpic" switch), but that feature is currently broken for the TMS9900. If it were to work, the code would be compiled so it could be loaded to and run from any address. This is probably not what you want.

matthew180 · February 13, 2013

Thanks for the info! So without specifying specific locations for the sections, where do they default? You mention the -fpic flag for position independent code, but I would think that would be the default, no? I guess I'm thinking in terms of the default for assembly code where the object file is fixed-up by the loader. Now that I look at it, the tool ELF2EA5 pretty much answers that question (the code is linked to run at a specific location by default).

An additional tool will then be required to split up the output file into pieces suitable for Classic99 or MESS or actual hardware or whatever. Unfortunately, no one has written a tool to do that yet.

How about compiling each overlay independently? Would that work?

insomnia · February 13, 2013

Compiling each overlay independently would work, but there is the added work of making sure that any RAM used is located at the same address for each independent overlay section. This is probably not too bad if all overlays use the same memory structures. If all global and static variables are defined in a common header file shared by each of the overlay files, you should be fine. If the overlays are written in a more usual style, with independent needs for global variables, it will be difficult to manually locate each of those variables in RAM.

As for the default load location, I'd have to check later in the day, but I'm pretty sure it's something useless like >0000 or some other invalid location. Since I knew the loading would be different between cart and EA5 images, I wasn't too concerned with specifying a default address, since I knew the linker would have to be supplied with one anyway.

Position independent code is neat, but has its drawbacks. By using -fpic, the compiler must use a Global Offset Table and additional code to calculate the addresses of code and data based on the current load address. This makes for larger and slower code.
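One way to picture the common-header approach is the C sketch below. It is written so it also compiles on a host; the struct, the SHARED_BASE value of 0x2000 (the start of low memory mentioned earlier in the thread) and the function names are assumptions made for illustration, not part of the GCC port.

```c
#include <assert.h>
#include <stddef.h>

/* shared_state.h (hypothetical): included by every overlay so that all of
   them agree on the layout of the variables kept in RAM. */
struct shared_state {
    unsigned int  score;
    unsigned char lives;
    unsigned char level;
    unsigned char map[32];
};

/* On the target, every overlay would reach the same RAM through one fixed
   base address. The macro is not used below so the sketch runs on a host. */
#ifndef SHARED_BASE
#define SHARED_BASE 0x2000u
#endif
#define SHARED ((volatile struct shared_state *)SHARED_BASE)

/* Code that would live in overlay 1. */
void overlay1_add_score(struct shared_state *s, unsigned int points)
{
    assert(s != NULL);
    s->score += points;
}

/* Code that would live in overlay 2; it sees what overlay 1 wrote because
   both were compiled against the same struct definition. */
unsigned int overlay2_read_score(const struct shared_state *s)
{
    assert(s != NULL);
    return s->score;
}
```

Because the layout comes from a single definition, adding a field changes every overlay consistently on the next rebuild, which is the point of sharing the header.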

Here's an example using a very simple function:

void example1()
{
example2();
}

Its assembly would look something like this:

example1
b @example2

With position independent code, we don't know where example2 is anymore and must rely on the size and relative positions of the functions to determine where to jump. We would have to do something like this:

getpc
mov r11, r1 * Return address of calling instruction
b *r11

example1
li r1, {address of getpc}	 * This could also be provided as an argument
bl *r1
ai r1, {offset to this instruction} - {offset to example2}
b *r1

This probably won't work right, but you get the idea. Position independent code is bigger, slower and more awkward to use. Check out http://eli.thegreenp...ared-libraries/ for a discussion of how to use position independent code for shared libraries in x86 systems. It includes a more in-depth discussion of how position independent code can be implemented, but the example above gives some idea of the differences and the extra effort involved.

JamesD · February 18, 2013

Someone else porting GCC. It's for the TI990 but as far as I can tell, the code should be compatible.

http://www.cozx.com/~dpitts/ti990.html

BTW, after looking at some other TI990 info, I have to wonder if some of the decisions in the design of the TI-99/4 were based on experience with the TI990.

TheMole · February 18, 2013

Someone else porting GCC. It's for the TI990 but as far as I can tell, the code should be compatible.

http://www.cozx.com/~dpitts/ti990.html

BTW, after looking at some other TI990 info, I have to wonder if some of the decisions in the design of the TI-99/4 were based on experience with the TI990.

The most interesting part there for me is probably his libc.a implementation, which Insomnia hasn't gotten to yet if I understand it correctly...
