insomnia

GCC for the TI


The most interesting part there for me is probably his libc.a implementation, which Insomnia hasn't gotten to yet if I understand it correctly...

What? No interest in the Focal source code or in hacking COBOL to run on the 99/4A? :)

There's also a BASIC. If you could hack a machine or emulator to allow you to disable ROMs you could potentially run a BASIC that uses native 9900 instructions.

 

Funny how a CPU designed off a mini-computer processor ended up in a microcomputer, and they crippled it so much you couldn't possibly tell its origin.


Someone else porting GCC. It's for the TI990 but as far as I can tell, the code should be compatible.

http://www.cozx.com/~dpitts/ti990.html

 

BTW, after looking at some other TI990 info, I have to wonder if some of the decisions in the design of the TI-99/4 were based on experience with the TI990.

 

Origin of the TMS9900 CPU, apparently :)

 


This is getting way off topic, but if you're interested in the 990 you can find a book titled "990 Computer Family Systems Handbook" on bookfinder.com for about $6.00. It was published in 1976.


Dave Pitts is responsible for the TI990 GCC support. He had functional code working before I ever started my stuff.

 

The compiler output should be OK for TMS9900, but there are differences in approach. The biggest one is probably in the handling of in-register byte operations. I solved that by modifying the mid-level GCC code to propagate type conversions. Dave did it by converting all byte operations to memory operations. The location of the workspace is stored in R10 and offsets to registers are calculated from there. Personally, I feel my approach is better, but there's room for argument here. I've also implemented a lot of optimizations that are lacking in the 990 compiler.
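To picture why in-register byte operations need special care: on the TMS9900, a byte instruction that targets a workspace register acts on the register's most significant byte and leaves the low byte alone. The small C model below is only an illustration of that layout, with hypothetical function names; it is not code from either compiler port.

```c
#include <stdint.h>

/* Model of a byte move into a register: only the high byte changes,
   the low byte keeps whatever it held before. */
uint16_t reg_store_byte(uint16_t reg, uint8_t value)
{
    return (uint16_t)((reg & 0x00FFu) | ((uint16_t)value << 8));
}

/* Model of widening a signed char held in a register to a 16-bit int:
   take the high byte and sign-extend it. */
int16_t reg_byte_to_int(uint16_t reg)
{
    int value = reg >> 8;      /* 0..255 */
    if (value >= 128)
        value -= 256;          /* sign-extend by hand */
    return (int16_t)value;
}
```

Any char-to-int conversion therefore costs a shift, which is why the compiler has to track exactly where conversions happen.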

 

As for libc, I've actually done some work on that. I've got pretty much everything that doesn't involve formatted output or console or file IO working. I could release what I have, but I'm not sure how useful that would be on its own.

 

Unfortunately, there's not a lot I can pull from the 990 code. The math routines for libm could be stolen directly, but according to the comments, those are themselves stolen from Minix. All other libc functions are implemented by performing a syscall to the 990 operating system, which obviously won't work here. Although having said that, people have had good results by bridging into the GPL routines.

 

I'm actually working on IEEE floating point support right now (32-bit only, no doubles). It seemed like it would be fun to do, and there's really not too much to it. My blog hasn't been updated for a while, but I've currently got type conversion working. Addition and subtraction are in progress, but almost done. So far it's about 300 bytes for the float routines; I guess that full support will probably take about 1K when complete.
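As a rough illustration of what a type conversion routine of this kind involves, a 16-bit integer can be turned into an IEEE 754 single-precision bit pattern with nothing more than shifts. This is a minimal sketch with a hypothetical function name, not the code from the actual patch.

```c
#include <stdint.h>

/* Convert a signed 16-bit integer to the bit pattern of an IEEE 754
   single-precision float. Every 16-bit value fits in the 24-bit
   significand, so no rounding is needed. */
uint32_t int16_to_float_bits(int16_t value)
{
    uint32_t sign = 0;
    uint32_t mag;
    int exponent = 127 + 23;   /* bias plus the position of bit 23 */

    if (value == 0)
        return 0;              /* +0.0 */

    if (value < 0) {
        sign = 0x80000000UL;
        mag = (uint32_t)(-(int32_t)value);
    } else {
        mag = (uint32_t)value;
    }

    /* Normalize: shift left until the implicit leading 1 sits in bit 23. */
    while ((mag & 0x00800000UL) == 0) {
        mag <<= 1;
        exponent--;
    }

    /* Drop the implicit 1 and pack sign, exponent and fraction. */
    return sign | ((uint32_t)exponent << 23) | (mag & 0x007FFFFFUL);
}
```

For example, int16_to_float_bits(1) gives 0x3F800000, the encoding of 1.0.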


If you're taking requests, how about support for individual code and data segments, specifically support for -fdata-sections and -ffunction-sections, so that we can take advantage of --gc-sections? :) It makes it easier to maintain a generic library and have all unused code and data automatically removed, which is valuable on a small-memory system like we have.

 

I'm not sure what's involved, I haven't gotten deep enough to dare attempt it myself yet. :)


Always happy to take requests.

 

I didn't plan for the use of multiple sections, so the assembler, compiler and possibly the linker will need to be changed. The ELF conversion tools should be fine as they are. The biggest impact will be in the assembler. It will need to have a new keyword added to specify an arbitrarily named section. None of these should be too bad, though. Maybe a hundred lines or so total.

 

The work required to get these options to work will also make it easier to make programs using overlays. It seems pretty common to use special sections to keep all the pieces straight.

 

For now, you can get a similar result by creating a library with each function in its own file. The linker will then only include the parts that are actually used. I have a library of common functions that is maintained this way and it works pretty well. Most libc implementations do this too.


Found a compilation bug that's slowing me down. I'm writing a test application (and very glad to have C for it), but there's a small issue in one of my functions. Not sure if it's a tail recursion issue or not.

 

The following function is what gave me the problem:

 

void hexprint(unsigned char x) {
   char buf[3];

   buf[0]=(x>>4)+'0';
   if (buf[0]>'9') buf[0]+=7;
   buf[1]=(x&0x0f)+'0';
   if (buf[1]>'9') buf[1]+=7;
   buf[2]='\0';
   puts(buf);
}

 

I verified that this was enough to reproduce the bug - compiled with this command line:

 

tms9900-gcc -c test.c -O2 -std=c99 -s --save-temp -o test.o

 

With this, you can look at test.s and right at the end, you'll see this:

 

L3
sb   @>4(r10), @>4(r10)
mov  r10, r1
inct r1
bl   @puts
ai   r10, >6
b    *r11

 

Note that between the bl @puts and the b *r11, R11 is never reloaded with the correct return address (it IS correctly saved to the stack on function entry). The result is that the code gets stuck in a loop there: after the BL, R11 still holds the address of the instruction following bl @puts, so b *r11 jumps straight back to the ai.

 

As far as I'm aware, I'm on the latest code.

 


My workaround for the above, in the affected function, right at the end, I added:

 

	__asm__("mov *r10,r11");

 

However, I can't predict which functions it will occur in, so it's sort of manual inspection (or maybe turn down the optimization level, haven't tried that yet).

 

Found a very odd one... this string:

 

puts("Map EEPROM to >6000\n");

 

Becomes this in the .s file:

 

LC14
text 'Map EEPROM to >6000'
byte 10
byte 0

 

But in the object file:

 

4D 61 70 20 45 45 50 52 4F 4D 20 74 6F 20 24 36 30 30 30 0A 00 -- Map EEPROM to $6000  

 

How did the '>' become '$'? :)

 


Here's a funny one... I had a function like so:

 

unsigned char GromReadData(unsigned int address, unsigned char port) {
   // we only support 15 ports, this still fits in a char
   port <<= 2;

   // set address
   *((volatile unsigned char*)(0x9c02+port)) = address>>8;
   *((volatile unsigned char*)(0x9c02+port)) = address&0xff;

   // read data
   return *((volatile unsigned char*)(0x9800+port));
}

 

The compiler likes to inline this, which is fine with me, it often optimizes it nicely. However, I had a call like this:

 

out[0]=GromReadData(0x7000, 0);

 

and it compiled to this:

 

   A2EE  0201  li   R1,>7000               (20)
   A2F2  D801  movb R1,@>9c02              (38)
>> A2F6  7820  sb   @>9c02,@>9c02          (58)
   A2FC  DAA0  movb @>9800,@>0006(R10)

 

SB zeroes a byte by subtracting it from itself, which means the destination is read before it is written. That can't be used on hardware addresses: it doesn't work on the real hardware (though it tends to work in emulation, it appears!), because the value read back from a write-only port isn't well defined.

 

Naturally the compiler can't be expected to know it's hardware, but perhaps the SB approach can be suppressed for volatile targets, in favor of a slower register-based approach? (That, or maybe sacrifice a register or even a memory address in scratchpad to contain >0000 guaranteed, so you can move from it?)
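The "guaranteed zero in memory" idea can also be approximated from the C side. The sketch below is an assumption, not the fix that went into the compiler: zero_source and grom_set_address are hypothetical names, and the port is passed in as a pointer so the routine can be exercised off-target. Because zero_source is volatile, the compiler has to load it rather than fold it to a constant, so the stored value always comes from a real source operand; whether that avoids the SB idiom on a given compiler build would still need checking in the generated assembly.

```c
/* Hypothetical always-zero byte. Being volatile, it must be loaded
   from memory each time instead of being treated as a constant 0. */
static volatile unsigned char zero_source = 0;

/* Write a 16-bit address to a write-only address port, high byte first. */
void grom_set_address(volatile unsigned char *addr_port, unsigned int address)
{
    unsigned char high = (unsigned char)(address >> 8);
    unsigned char low  = (unsigned char)(address & 0xff);

    /* OR-ing with the volatile zero leaves the value unchanged but keeps
       the compiler from seeing a compile-time constant zero store. */
    *addr_port = (unsigned char)(high | zero_source);
    *addr_port = (unsigned char)(low | zero_source);
}
```

On the TI this would be called with something like (volatile unsigned char*)(0x9c02+port), matching the function above.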

 

Sorry for all the bug reports! Hope they're helpful.

Edited by Tursi


Tursi, any chance you can provide me a zip of your compiled toolchain? I'm unfortunately stuck on a Windows machine, so wouldn't know how to start compiling gcc on this thing (and it's my work machine, so don't want to spend too much time setting it up here).


Sure, I can zip it up tonight. You will still need Cygwin installed to run it, so get that first. The build includes a hack for the above mentioned "SB" bug that consumes one additional word of scratchpad memory. (I zero the word then use it as the source of all zeros ;) ).


Sorry that I forgot about this.. here are my binaries. They are expected to be located in your Cygwin install in your home folder (so on mine it is /home/tursi -- which in Windows is C:\Cygwin\home\tursi -- your path may vary).

 

The zip is too large to upload here, so grab it from http://harmlesslion.com/temp/compiledGCCWithTursiHackForZero.zip

 

As per the instructions above, you have to change /bin/as.exe to the TMS9900 version -- rename the existing one, and then copy binutils/bin/tms9900-as.exe to /bin/ and rename it to 'as.exe'. Then it should work for you.

 

You should give downloading and building yourself a shot, if you haven't! It will help for future builds and for making your environment more like you want it.


Thanks, much appreciated!

 

You should give downloading and building yourself a shot, if you haven't! It will help for future builds and for making your environment more like you want it.

Well, I did compile the toolchain on my Linux laptop, and will probably do so on my Mac as well. But I'm really not at home on a Windows machine and I don't want to invest in learning the platform too much either, this is just a way to test some stuff in the few minutes per day I can spend on this at work. So again, thanks for this, it's a big help.


Ah, okay. Don't be too worried about learning Windows.. once you install Cygwin (which you'll need to run my compiled versions anyway), and start the BASH shell, you're in BASH, same as Linux, and you needn't think about the Windows side. :)

Edited by Tursi


Yeah - I love Cygwin. I use it a ton and hardly ever use the Windows shell... which is crap.

 



I've gone ahead and published my first stab at creating a small library. To encourage use, it's up on GitHub so people can play with it at will, even fork it and offer code back. It seems to work, although the issue of writing hard-coded zeros to hardware still exists in the compiler.. my version works around that, but I hope we'll see Insomnia again soon. :)

 

A basic start here, anyway: https://github.com/tursilion/libti99

 



Wow, impressive work! I've been working on a poor man's version of this but this is way beyond where I ended up. I can guarantee you I will be using this. One question: I see the library - when compiled - is 26k. When using gcc for the TI I noticed it doesn't strip unused functions and symbols (but maybe I'm using it wrong). My work-around was using defines in the makefile to enable/disable pieces of functionality as needed by the program. Did you run into something similar, if so how did you solve it?

Edited by TheMole


Thanks! These are just the functions I created for my music player demo and my GROM test program, split up and lightly documented.

 

That it's not stripping unused parts for you is odd -- that was the whole reason I made it a library, and in my testing (with 'testlib.c'), it worked correctly, linking in only what was needed automatically. Maybe check my makefile and see if I did something radically different than you did?

 

Alternately, post yours? I can't investigate too far but I can at least see if I reproduce your results.

 


hmmm.. also... note that the 26k is full ELF with debug information and symbols - that information doesn't end up in the final code. That scared me for a moment when I read that. :)


Maybe not a gcc compiler question per se, but I don't know if there's a better place to post this. For the raycaster project I'm trying to implement a fixed point square root function that uses 10.6 binary fixed point numbers, but I keep running up against some limitations. This is what I currently have:

#ifndef _FIXEDPOINT_H_
#define _FIXEDPOINT_H_

typedef int fixedpt;
typedef long fixedptd;
typedef unsigned int fixedptu;
typedef unsigned long fixedptud;

#define FP_BITS 16
#define FP_WBITS 10
#define FP_FBITS (FP_BITS - FP_WBITS)
#define FP_FMASK (((fixedpt)1 << FP_FBITS) - 1)
// conversion to and from fixed point numbers
#define fp_rconst(R) ((fixedpt)((R) * FP_ONE + ((R) >= 0 ? 0.5 : -0.5)))
#define fp_fromint(I) ((fixedptd)(I) << FP_FBITS)
#define fp_toint(F) ((F) >> FP_FBITS)
#define fp_fracpart(A) ((fixedpt)(A) & FP_FMASK)
// Number operations that are not handled by inline functions
#define fp_abs(A) ((A) < 0 ? -(A) : (A))
// Some common constants
#define FP_ONE ((fixedpt)((fixedpt)1 << FP_FBITS))
#define FP_ONE_HALF (FP_ONE >> 1)
#define FP_TWO (FP_ONE + FP_ONE)
#define FP_PI fp_rconst(3.14159265358979323846)
#define FP_HALF_PI fp_rconst(3.14159265358979323846 / 2)

// Multiplies two fixedpt numbers, returns the result.
static inline fixedpt fp_mul(fixedpt A, fixedpt B)
{
 return (((fixedptd)A * (fixedptd)B) >> FP_FBITS);
}

// Divides two fixedpt numbers, returns the result.
static inline fixedpt fp_div(fixedpt A, fixedpt B)
{
 return (((fixedptd)A << FP_FBITS) / (fixedptd)B);
}

// Returns the square root of the given number, or -1 in case of error
static inline fixedpt fp_sqrt(fixedpt A)
{
 int invert = 0;
 int iter = FP_FBITS;
 int l, i;

 if (A < 0)
   return (-1);

 if (A == 0 || A == FP_ONE)
   return (A);

 if (A < FP_ONE && A > 6)
 {
   invert = 1;
   A = fp_div(FP_ONE, A);
 }

 if (A > FP_ONE)
 {
   int s = A;
   iter = 0;
   while (s > 0)
   {
     s >>= 2;
     iter++;
   }
 }

 // Newton's iterations
 l = (A >> 1) + 1;
 for (i = 0; i < iter; i++)
   l = (l + fp_div(A, l)) >> 1;
 if (invert)
   return (fp_div(FP_ONE, l));

 return (l);
}

#endif /* _FIXEDPOINT_H_ */

 

When compiling, this (the sqrt function) seems to require __divsi3 (division for signed 32 bit integers), which is not implemented. Oddly enough, the first instance where the code needs this is in the comparison to zero (if (A == 0 || A == FP_ONE)), which I don't get at all. Does anyone know of a 16 bit fixed point sqrt algorithm that does not require 32 bit numbers during calculation AND has reasonable accuracy?

 

 

hmmm.. also... note that the 26k is full ELF with debug information and symbols - that information doesn't end up in the final code. That scared me for a moment when I read that. :)

 

Yeah, I know, I just quickly looked up the file size just to have an idea. I did not turn my files into a real library, I compile and link against the object files depending on the project, so that might be the difference. I don't have the code here to test against.

Edited by TheMole


My first question is "do you really need square root?" Mathematical accuracy isn't that important with the resolutions we're dealing with, you could probably just scale down the squared values and use them.

 

That said, a quick Google search turned up this algorithm for a 16-bit PIC that avoids division and reportedly works well: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf

 

As for the library - that explains it. I was doing exactly the same as you, just using #ifdefs, but Insomnia in this thread suggested that a library with each function in its own object file would perform the selective linking I desired. That was the motivation behind creating the library -- in my brief testing so far it does appear to work.


So, some comparison operations, shift right, etc... all seem to drag in __divsi3 for some reason so I just gave up and added the following piece of code:

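// Quick stand-in only: the real __divsi3 divides signed 32-bit values,
// so this 16-bit unsigned version does not match its signature.
int __divsi3(unsigned int a, unsigned int b)
{
   return a / b;
}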

Seems to work fine, but haven't tested a wide range of numbers yet...


It might be interesting to look at the output assembly code for that comparison, for instance, and see what it has actually generated?


I took a quick look at the output of the fixed point code. It looks like the problem is actually in fp_div. The left shifting of value A promotes the value to a 32-bit quantity, and the compiler then uses __divsi3 to get the result, which is then demoted back to a 16-bit value.

 

This could be avoided by the use of inline assembly (div uses a 32-bit numerator and 16-bit divisor). Alternately, you could try algebraic manipulation to keep all intermediate values in 16-bit representation. Personally, I'd go for the inline assembly since that would be faster to execute and simpler to understand.

 

That being said, the __divsi3 function is included in libgcc, if you want to use it.
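For the algebraic route, one possible shape is a shift-and-subtract division that produces the integer part first and then the fractional bits one at a time, so nothing wider than 16 bits is ever needed. This is only a sketch under assumptions: fp_div16 is a hypothetical name, FP_FBITS is taken from the header posted above, division by zero simply returns 0, and results that don't fit in 10.6 are not detected.

```c
#include <stdint.h>

#define FP_FBITS 6   /* fractional bits, as in the 10.6 format above */

/* Divide two 10.6 fixed point numbers using only 16-bit arithmetic.
   Equivalent to (a << FP_FBITS) / b, truncated toward zero. */
int16_t fp_div16(int16_t a, int16_t b)
{
    int negative = (a < 0) != (b < 0);
    /* Magnitudes via unsigned negation, which is safe even for -32768. */
    uint16_t ua = (a < 0) ? (uint16_t)(0u - (uint16_t)a) : (uint16_t)a;
    uint16_t ub = (b < 0) ? (uint16_t)(0u - (uint16_t)b) : (uint16_t)b;
    uint16_t q, r;
    int i;

    if (ub == 0)
        return 0;                  /* assumption: caller checks for this */

    q = ua / ub;                   /* integer part */
    r = ua % ub;                   /* remainder, always < ub */

    /* One long-division step per fractional bit. r < ub <= 32768,
       so doubling r never overflows 16 bits. */
    for (i = 0; i < FP_FBITS; i++) {
        r = (uint16_t)(r << 1);
        q = (uint16_t)(q << 1);
        if (r >= ub) {
            r -= ub;
            q |= 1;
        }
    }

    return negative ? (int16_t)(0 - (int16_t)q) : (int16_t)q;
}
```

With FP_ONE equal to 64, fp_div16(64, 128) (1.0 / 2.0) returns 32, which is 0.5. The loop costs six iterations per call, so the inline assembly approach is still likely to be faster.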

 

On to new patches. Since I've been busy lately, this is going to be a pretty sparse update. These are mostly fixes for stuff Tursi has found so far.

 

GCC changes:

Fixed R11 restoration in epilogue being dropped by DCE

Added support for named sections

Removed support for directly zeroing byte memory, which was buggy on some kinds of memory

 

binutils changes:

Added more informative syntax error messages

Fixed values like ">6000" in strings being mangled

Confirmed support for named sections

 

As always, more details are on my blog.

binutils-2.19.1-tms9900-1.5-patch.tar.gz

gcc-4.4.0-tms9900-1.8-patch.tar.gz


I was looking at integer square roots, and found (on Wikipedia no less) this algorithm. It appears to run in constant time and only uses fast operations (shifts, adds and subtracts). This also only uses 16-bit intermediate values, so no libgcc required.

 

I haven't analyzed this for performance or correctness, but it looks interesting.

 

Link:

http://en.wikipedia....em_.28base_2.29

 

 

Code:

short isqrt(short num) {
    short res = 0;
    short bit = 1 << 14; // The second-to-top bit is set: 1L<<30 for long

    // "bit" starts at the highest power of four <= the argument.
    while (bit > num)
        bit >>= 2;

    while (bit != 0) {
        if (num >= res + bit) {
            num -= res + bit;
            res = (res >> 1) + bit;
        }
        else
            res >>= 1;
        bit >>= 2;
    }
    return res;
}
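To use this with the 10.6 format discussed earlier, note that the square root of a fixed point value A (meaning A/64) is sqrt(A*64)/64 in fixed point, and sqrt(A*64) equals sqrt(A)*8. The sketch below applies that identity. fp_sqrt16 is a hypothetical name, isqrt is repeated so the block stands on its own, and the pre-shifting trick is just one way to recover some precision; it has not been checked against the raycaster's accuracy needs.

```c
/* Integer square root, same algorithm as above. */
static short isqrt(short num)
{
    short res = 0;
    short bit = 1 << 14;

    while (bit > num)
        bit >>= 2;
    while (bit != 0) {
        if (num >= res + bit) {
            num -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Square root of a 10.6 fixed point number, 16-bit arithmetic only.
   Returns 0 for zero or negative input. */
short fp_sqrt16(short a)
{
    short shift = 3;   /* sqrt(a * 64) == isqrt(a) * 8 == isqrt(a) << 3 */

    if (a <= 0)
        return 0;

    /* Moving two bits into the argument before the root is worth one bit
       after it, and keeps more precision. Stop before a would overflow. */
    while (shift > 0 && a < (1 << 13)) {
        a <<= 2;
        shift--;
    }
    return (short)(isqrt(a) << shift);
}
```

For instance, fp_sqrt16(128) (the value 2.0) returns 90, which is about 1.406 against a true root of about 1.414. Large inputs get no pre-shift and are only accurate to 8/64.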
