GCC for the TI


511 replies to this topic

#126 JamesD OFFLINE  

JamesD

    Quadrunner

  • 8,435 posts
  • Location:Flyover State

Posted Sun Feb 17, 2013 10:53 PM

The most interesting part there for me is probably his libc.a implementation, which Insomnia hasn't gotten to yet if I understand it correctly...

What? No interest in the Focal source code or in hacking COBOL to run on the 99/4A? :)
There's also a BASIC. If you could hack a machine or emulator to allow you to disable ROMs you could potentially run a BASIC that uses native 9900 instructions.

Funny how a CPU designed off a mini-computer processor ended up in a microcomputer, and they crippled it so much you couldn't possibly tell its origin.

#127 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Feb 18, 2013 2:50 AM

Someone else porting GCC. It's for the TI990 but as far as I can tell, the code should be compatible.
http://www.cozx.com/~dpitts/ti990.html

BTW, after looking at some other TI990 info, I have to wonder if some of the decisions in the design of the TI-99/4 were based on experience with the TI990.


Origin of the TMS9900 CPU, apparently :)


#128 senior_falcon ONLINE  

senior_falcon

    Stargunner

  • 1,404 posts
  • Location:Lansing, NY, USA

Posted Mon Feb 18, 2013 6:14 AM

This is getting way off topic, but if you're interested in the 990 you can find a book titled "990 computer Family Systems Handbook" on bookfinder.com for about $6.00. Published in 1976.

#129 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 91 posts
  • Location:Pittsburgh, PA

Posted Mon Feb 18, 2013 9:55 AM

Dave Pitts is responsible for the TI990 GCC support. He had functional code working before I ever started my stuff.

The compiler output should be OK for the TMS9900, but there are differences in approach. The biggest one is probably in the handling of in-register byte operations. I solved that by modifying the mid-level GCC code to propagate type conversions. Dave did it by converting all byte operations to memory operations: the location of the workspace is stored in R10 and offsets to registers are calculated from there. Personally, I feel my approach is better, but there's room for argument here. I've also implemented a lot of optimizations that are lacking in the 990 compiler.
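For anyone wondering why in-register byte operations need special care at all: TMS9900 byte instructions such as MOVB work on the most significant byte of a 16-bit workspace register, so converting between char and int means moving data between the two halves. Below is a minimal host-side model of that. The function names are invented for illustration, and this is not code from either compiler.

```c
#include <stdint.h>

/* MOVB into a register: replaces the high byte, leaves the low byte alone. */
uint16_t reg_movb(uint16_t reg, uint8_t value)
{
    return (uint16_t)((reg & 0x00FFu) | ((uint16_t)value << 8));
}

/* unsigned char -> int: logical shift right by 8 (what SRL Rx,8 does). */
uint16_t reg_byte_to_uint(uint16_t reg)
{
    return (uint16_t)(reg >> 8);
}

/* signed char -> int: arithmetic shift right by 8 (what SRA Rx,8 does),
   written portably as a sign extension of the high byte. */
int16_t reg_byte_to_int(uint16_t reg)
{
    int v = (int)(reg >> 8);              /* 0..255 */
    return (int16_t)((v ^ 0x80) - 0x80);  /* -128..127 */
}

/* int -> char: bring the low byte up into the high half (SWPB or SLA Rx,8). */
uint16_t reg_int_to_byte(uint16_t value)
{
    return (uint16_t)(value << 8);
}
```

Every char/int conversion the compiler misses or adds unnecessarily shows up as a wrong value or a wasted shift, which is why the two ports handle it so differently.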

As for libc, I've actually done some work on that. I've got pretty much everything that doesn't involve formatted output or console or file IO working. I could release what I have, but I'm not sure how useful that would be on its own.

Unfortunately, there's not a lot I can pull from the 990 code. The math routines for libm could be stolen directly, but according to the comments, those are themselves stolen from Minix. All other libc functions are implemented by performing a syscall to the 990 operating system, which obviously won't work here. Having said that, people have had good results by bridging into the GPL routines.

I'm actually working on IEEE floating point support right now (32-bit only, no doubles). It seemed like it would be fun to do, and there's really not too much to it. My blog hasn't been updated for a while, but I've currently got type conversion working. Addition and subtraction are in progress, but almost done. So far it's about 300 bytes for the float routines; I guess that full support will probably take about 1K when complete.

#130 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Feb 19, 2013 6:49 PM

If you're taking requests, how about support for individual code and data segments, specifically support for -fdata-sections and -ffunction-sections, so that we can take advantage of --gc-sections? :) It makes it easier to maintain a generic library and have all unused code and data automatically removed, which is valuable on a small-memory system like we have.

I'm not sure what's involved, I haven't gotten deep enough to dare attempt it myself yet. :)

#131 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 91 posts
  • Location:Pittsburgh, PA

Posted Tue Feb 19, 2013 7:56 PM

Always happy to take requests.

I didn't plan for the use of multiple sections, so the assembler, compiler and possibly the linker will need to be changed. The ELF conversion tools should be fine as they are. The biggest impact will be in the assembler. It will need to have a new keyword added to specify an arbitrarily named section. None of these should be too bad, though. Maybe a hundred lines or so total.

The work required to get these options to work will also make it easier to make programs using overlays. It seems pretty common to use special sections to keep all the pieces straight.

For now, you can get a similar result by creating a library with each function in its own file. The linker will then only include the parts that are actually used. I have a library of common functions that is maintained this way and it works pretty well. Most libc implementations do this too.
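A rough sketch of that one-function-per-file layout follows. The file and function names (strlen16.c, clamp8.c) are made up for illustration and do not come from any actual library. The "files" are shown in a single listing only so it can be compiled as-is on a host.

```c
/* ---- strlen16.c : would be compiled on its own into strlen16.o ---- */
unsigned int strlen16(const char *s)
{
    unsigned int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* ---- clamp8.c : would be compiled on its own into clamp8.o ---- */
unsigned char clamp8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (unsigned char)v;
}

/* ---- main.c (the program, not part of the library) ----
 * If main() only calls strlen16(), the only undefined symbol the linker
 * has to resolve from the archive is strlen16, so only strlen16.o is
 * pulled in. clamp8.o stays in the archive and costs no memory.
 */
```

The point is that a static library is an archive of separate object files, and the linker takes whole object files from it only when they resolve a symbol that is still undefined. If both functions lived in one source file, using either one would drag in both.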

#132 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Mar 11, 2013 3:41 AM

Found a compilation bug that's slowing me down. I'm writing a test application (and very glad to have C for it), but there's a small issue in one of my functions. Not sure if it's a tail recursion issue or not.

The following function is what gave me the problem:

void hexprint(unsigned char x) {
	char buf[3];

	buf[0]=(x>>4)+'0';
	if (buf[0]>'9') buf[0]+=7;
	buf[1]=(x&0x0f)+'0';
	if (buf[1]>'9') buf[1]+=7;
	buf[2]='\0';
	puts(buf);
}

I verified that this was enough to reproduce the bug - compiled with this command line:

tms9900-gcc -c test.c -O2 -std=c99 -s --save-temp -o test.o

With this, you can look at test.s and right at the end, you'll see this:

L3
	sb   @>4(r10), @>4(r10)
	mov  r10, r1
	inct r1
	bl   @puts
	ai   r10, >6
	b    *r11

Note that between the bl @puts and the b *r11, R11 is never reloaded with the correct return address (it IS correctly saved to the stack on function entry). The result is that the code gets stuck in a loop there, since R11 still holds the return address set by the BL, which points at the instruction right after it.

As far as I'm aware, I'm on the latest code.


#133 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Mar 11, 2013 11:25 PM

My workaround for the above was to add this right at the end of the affected function:

__asm__("mov *r10,r11");

However, I can't predict which functions it will occur in, so it's sort of manual inspection (or maybe turn down the optimization level, haven't tried that yet).

Found a very odd one... this string:

puts("Map EEPROM to >6000\n");

Becomes this in the .S file:

LC14
	text 'Map EEPROM to >6000'
	byte 10
	byte 0

But in the object file:

4D 61 70 20 45 45 50 52 4F 4D 20 74 6F 20 24 36 30 30 30 0A 00 -- Map EEPROM to $6000

How did the '>' become '$'? :)


#134 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Mar 26, 2013 10:48 PM

Here's a funny one... I had a function like so:

unsigned char GromReadData(unsigned int address, unsigned char port) {
    // we only support 15 ports, this still fits in a char
    port <<= 2;

    // set address
    *((volatile unsigned char*)(0x9c02+port)) = address>>8;
    *((volatile unsigned char*)(0x9c02+port)) = address&0xff;

    // read data
    return *((volatile unsigned char*)(0x9800+port));
}

The compiler likes to inline this, which is fine with me, it often optimizes it nicely. However, I had a call like this:

out[0]=GromReadData(0x7000, 0);

and it compiled to this:

   A2EE  0201  li   R1,>7000               (20)
   A2F2  D801  movb R1,@>9c02              (38)
>> A2F6  7820  sb   @>9c02,@>9c02          (58)
   A2FC  DAA0  movb @>9800,@>0006(R10)

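The code that uses SB to zero a byte can't be used on hardware addresses. SB has to read the target before it writes the result, and the value read back from a write-only port isn't well defined, so the byte that gets written isn't guaranteed to be zero. It doesn't work on the real hardware (though it tends to work in emulation, it appears!).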

Naturally the compiler can't be expected to know it's hardware, but perhaps the SB approach can be suppressed for volatile targets, in favor of a slower register-based approach? (That, or maybe sacrifice a register or even a memory address in scratchpad to contain >0000 guaranteed, so you can move from it?)
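To make the failure mode concrete, here is a small host-side model of the two ways of zeroing a byte. Everything in it is an assumption made for illustration: the ByteCell type, the function names, and especially the junk values returned when a write-only port is read (real hardware may return anything).

```c
#include <stdint.h>

/* One byte-wide location: ordinary RAM, or a write-only hardware port. */
typedef struct {
    uint8_t  latch;       /* last value written */
    int      write_only;  /* non-zero: reads do not return the latch */
    unsigned reads;       /* read counter, only used to vary the junk */
} ByteCell;

uint8_t cell_read(ByteCell *c)
{
    if (!c->write_only)
        return c->latch;
    c->reads++;
    return (uint8_t)(0xA5u + 0x3Bu * c->reads);  /* arbitrary junk (assumption) */
}

void cell_write(ByteCell *c, uint8_t v)
{
    c->latch = v;
}

/* SB @x,@x : read source, read destination, write (destination - source). */
void zero_with_sb(ByteCell *c)
{
    uint8_t src = cell_read(c);
    uint8_t dst = cell_read(c);
    cell_write(c, (uint8_t)(dst - src));
}

/* MOVB @zero,@x : the byte written comes from the zero source,
   not from anything read back from the target. */
void zero_with_movb(ByteCell *c, const uint8_t *zero)
{
    cell_write(c, *zero);
}
```

On a RAM cell both routines leave zero behind. On the write-only cell, zero_with_sb writes the difference of two unrelated reads, while zero_with_movb still writes zero, which is the idea behind keeping a known >0000 somewhere and moving from it.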

Sorry for all the bug reports! Hope they're helpful.

Edited by Tursi, Tue Mar 26, 2013 10:51 PM.


#135 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Thu Apr 18, 2013 9:34 AM

Tursi, any chance you can provide me a zip of your compiled toolchain? I'm unfortunately stuck on a windows machine, so wouldn't know how to start compiling gcc on this thing (and it's my work machine, so don't want to spend too much time setting it up here).

#136 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Thu Apr 18, 2013 1:08 PM

Sure, I can zip it up tonight. You will still need Cygwin installed to run it, so get that. It includes a hack for the above-mentioned "SB" bug that consumes one additional word of scratchpad memory. (I zero the word, then use it as the source of all zeros ;) ).

#137 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Apr 21, 2013 4:25 AM

Sorry that I forgot about this.. here are my binaries. They are expected to be located in your Cygwin install in your home folder (so on mine it is /home/tursi -- which in Windows is C:\Cygwin\home\tursi -- your path may vary).

The zip is too large to upload here, so grab it from http://harmlesslion....HackForZero.zip

As per the instructions above, you have to change /bin/as.exe to the TMS9900 version -- rename the existing one, and then copy binutils/bin/tms9900-as.exe to /bin/ and rename it to 'as.exe'. Then it should work for you.

You should give downloading and building yourself a shot, if you haven't! It will help for future builds and for making your environment more like you want it.

#138 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Sun Apr 21, 2013 7:36 AM

Thanks, much appreciated!

Tursi wrote: "You should give downloading and building yourself a shot, if you haven't! It will help for future builds and for making your environment more like you want it."

Well, I did compile the toolchain on my Linux laptop, and will probably do so on my Mac as well. But I'm really not at home on a Windows machine and I don't want to invest in learning the platform too much either, this is just a way to test some stuff in the few minutes per day I can spend on this at work. So again, thanks for this, it's a big help.

#139 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Apr 21, 2013 2:46 PM

Ah, okay. Don't be too worried about learning Windows.. once you install Cygwin (which you'll need to run my compiled versions anyway), and start the BASH shell, you're in BASH, same as Linux, and you needn't think about the Windows side. :)

Edited by Tursi, Sun Apr 21, 2013 3:01 PM.


#140 unhuman OFFLINE  

unhuman

    Stargunner

  • 1,225 posts
  • Location:Vienna, VA

Posted Sun Apr 21, 2013 4:02 PM

Yeah - I love Cygwin. I use it a ton and hardly ever use the windows shell... Which is crap.

Tursi wrote: "Ah, okay. Don't be too worried about learning Windows.. once you install Cygwin (which you'll need to run my compiled versions anyway), and start the BASH shell, you're in BASH, same as Linux, and you needn't think about the Windows side. :)"



#141 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Apr 21, 2013 6:41 PM

I've gone ahead and published my first stab at creating a small library. To encourage use, it's up on GitHub so people can play with it at will, even fork it and offer code back. It seems to work, although the issue of writing hard-coded zeros to hardware still exists in the compiler.. my version works around that, but I hope we'll see Insomnia again soon. :)

A basic start here, anyway: https://github.com/tursilion/libti99


#142 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Mon Apr 22, 2013 2:19 AM

Tursi wrote: "I've gone ahead and published my first stab at creating a small library. To encourage use, it's up on GitHub so people can play with it at will, even fork it and offer code back. It seems to work, although the issue of writing hard-coded zeros to hardware still exists in the compiler.. my version works around that, but I hope we'll see Insomnia again soon. :)

A basic start here, anyway: https://github.com/tursilion/libti99"


Wow, impressive work! I've been working on a poor man's version of this but this is way beyond where I ended up. I can guarantee you I will be using this. One question: I see the library - when compiled - is 26k. When using gcc for the TI I noticed it doesn't strip unused functions and symbols (but maybe I'm using it wrong). My work-around was using defines in the makefile to enable/disable pieces of functionality as needed by the program. Did you run into something similar, if so how did you solve it?

Edited by TheMole, Mon Apr 22, 2013 2:19 AM.


#143 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Apr 22, 2013 2:26 AM

Thanks! These are just the functions I created for my music player demo and my GROM test program, split up and lightly documented.

That it's not stripping unused parts for you is odd -- that was the whole reason I made it a library, and in my testing (with 'testlib.c'), it worked correctly, linking in only what was needed automatically. Maybe check my makefile and see if I did something radically different than you did?

Alternately, post yours? I can't investigate too far but I can at least see if I reproduce your results.


#144 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Apr 22, 2013 2:39 AM

hmmm.. also... note that the 26k is full ELF with debug information and symbols - that information doesn't end up in the final code. That scared me for a moment when I read that. :)

#145 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Mon Apr 22, 2013 7:45 AM

Maybe not a GCC question per se, but I don't know if there's a better place to post this. For the raycaster project I'm trying to implement a fixed point square root function that uses 10.6 binary fixed point numbers, but I keep running up against some limitations. This is what I currently have:
#ifndef _FIXEDPOINT_H_
#define _FIXEDPOINT_H_

typedef int fixedpt;
typedef long fixedptd;
typedef unsigned int fixedptu;
typedef unsigned long fixedptud;

#define FP_BITS 16
#define FP_WBITS 10
#define FP_FBITS (FP_BITS - FP_WBITS)
#define FP_FMASK (((fixedpt)1 << FP_FBITS) - 1)
// conversion to and from fixed point numbers
#define fp_rconst(R) ((fixedpt)((R) * FP_ONE + ((R) >= 0 ? 0.5 : -0.5)))
#define fp_fromint(I) ((fixedptd)(I) << FP_FBITS)
#define fp_toint(F) ((F) >> FP_FBITS)
#define fp_fracpart(A) ((fixedpt)(A) & FP_FMASK)
// Number operations that are not handled by inline functions
#define fp_abs(A) ((A) < 0 ? -(A) : (A))
// Some common constants
#define FP_ONE ((fixedpt)((fixedpt)1 << FP_FBITS))
#define FP_ONE_HALF (FP_ONE >> 1)
#define FP_TWO (FP_ONE + FP_ONE)
#define FP_PI fp_rconst(3.14159265358979323846)
#define FP_HALF_PI fp_rconst(3.14159265358979323846 / 2)

// Multiplies two fixedpt numbers, returns the result.
static inline fixedpt fp_mul(fixedpt A, fixedpt B)
{
  return (((fixedptd)A * (fixedptd)B) >> FP_FBITS);
}

// Divides two fixedpt numbers, returns the result.
static inline fixedpt fp_div(fixedpt A, fixedpt B)
{
  return (((fixedptd)A << FP_FBITS) / (fixedptd)B);
}

// Returns the square root of the given number, or -1 in case of error
static inline fixedpt fp_sqrt(fixedpt A)
{
  int invert = 0;
  int iter = FP_FBITS;
  int l, i;

  if (A < 0)
    return (-1);

  if (A == 0 || A == FP_ONE)
    return (A);

  if (A < FP_ONE && A > 6)
  {
    invert = 1;
    A = fp_div(FP_ONE, A);
  }

  if (A > FP_ONE)
  {
    int s = A;
    iter = 0;
    while (s > 0)
    {
      s >>= 2;
      iter++;
    }
  }

  // Newton's iterations
  l = (A >> 1) + 1;
  for (i = 0; i < iter; i++)
    l = (l + fp_div(A, l)) >> 1;
  if (invert)
    return (fp_div(FP_ONE, l));

  return (l);
}

#endif /* _FIXEDPOINT_H_ */

When compiling, this (the sqrt function) seems to require __divsi3 (division for signed 32-bit integers), which is not implemented. Oddly enough, the first instance where the code needs this is in the comparison to zero (if (A == 0 || A == FP_ONE)), which I don't get at all. Does anyone know of a 16-bit fixed point sqrt algorithm that does not require 32-bit numbers during calculation AND has reasonable accuracy?


Tursi wrote: "hmmm.. also... note that the 26k is full ELF with debug information and symbols - that information doesn't end up in the final code. That scared me for a moment when I read that. :)"


Yeah, I know, I just quickly looked up the file size just to have an idea. I did not turn my files into a real library, I compile and link against the object files depending on the project, so that might be the difference. I don't have the code here to test against.

Edited by TheMole, Mon Apr 22, 2013 7:47 AM.


#146 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Apr 22, 2013 12:47 PM

My first question is "do you really need square root?" Mathematical accuracy isn't that important with the resolutions we're dealing with; you could probably just scale down the squared values and use them.

That said, a quick Google search turned up this algorithm for a 16-bit PIC that avoids division and reportedly works well: http://ww1.microchip.com/downloads/en/AppNotes/91040a.pdf

As for the library - that explains it. I was doing exactly the same as you, just using #ifdefs, but Insomnia in this thread suggested that a library with each function in its own object file would perform the selective linking I desired. That was the motivation behind creating the library -- in my brief testing so far it does appear to work.

#147 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Tue Apr 23, 2013 10:18 AM

So, some comparison operations, shift right, etc... all seem to drag in __divsi3 for some reason so I just gave up and added the following piece of code:
__divsi3(unsigned int a, unsigned int b)
{
    return a / b;
}
Seems to work fine, but haven't tested a wide range of numbers yet...

#148 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,555 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Apr 23, 2013 12:54 PM

It might be interesting to look at the assembly output for one of those cases, that comparison for instance, and see what was actually generated.

#149 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 91 posts
  • Location:Pittsburgh, PA

Posted Thu May 2, 2013 12:51 AM

I took a quick look at the output of the fixed point code. It looks like the problem is actually in fp_div. The left shift of A is done on a value promoted to 32 bits (the fixedptd cast), and the compiler then uses __divsi3 to get the result, which is then demoted back to a 16-bit value.

This could be avoided by the use of inline assembly (DIV uses a 32-bit numerator and a 16-bit divisor). Alternatively, you could try algebraic manipulation to keep all intermediate values in 16-bit representation. Personally, I'd go for the inline assembly since that would be faster to execute and simpler to understand.

That being said, the __divsi3 function is included in libgcc, if you want to use it.
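For the "keep everything in 16 bits" route, here is one possible sketch. It is not the inline-assembly DIV approach and not really algebra either: it swaps in plain shift-and-subtract (restoring) division. The names fp_udiv16 and fp_div16 are made up, the stdint types stand in for the TMS9900's 16-bit int so it can be tried on a PC, the divisor must be non-zero, and quotients that don't fit in 10.6 are simply truncated. I haven't measured how it performs on the real CPU.

```c
#include <stdint.h>

#define FP_FBITS 6

/* Unsigned core: computes (a << FP_FBITS) / b without ever holding more
   than 16 bits. Requires 0 < b <= 0x8000 so the remainder cannot overflow. */
uint16_t fp_udiv16(uint16_t a, uint16_t b)
{
    uint16_t rem = 0;
    uint16_t quot = 0;
    int i;

    /* Feed in the 16 bits of a (MSB first), then FP_FBITS zero bits. */
    for (i = 0; i < 16 + FP_FBITS; i++) {
        uint16_t bit = 0;
        if (i < 16) {
            bit = (uint16_t)((a >> 15) & 1u);
            a = (uint16_t)(a << 1);
        }
        rem = (uint16_t)((rem << 1) | bit);
        quot = (uint16_t)(quot << 1);   /* high bits fall off on overflow */
        if (rem >= b) {
            rem = (uint16_t)(rem - b);
            quot |= 1u;
        }
    }
    return quot;
}

/* Signed wrapper: divide the magnitudes, then restore the sign.
   Truncates toward zero, like C's integer division. */
int16_t fp_div16(int16_t a, int16_t b)
{
    int negative = (a < 0) != (b < 0);
    uint16_t ua = (a < 0) ? (uint16_t)(0u - (uint16_t)a) : (uint16_t)a;
    uint16_t ub = (b < 0) ? (uint16_t)(0u - (uint16_t)b) : (uint16_t)b;
    uint16_t q = fp_udiv16(ua, ub);

    return negative ? (int16_t)(0u - q) : (int16_t)q;
}
```

For example, fp_div16(64, 128) is 1.0 / 2.0 in 10.6 and returns 32, which is 0.5. The loop always runs 22 times, so the inline DIV is still likely to be the faster choice.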

On to new patches. Since I've been busy lately, this is going to be a pretty sparse update. These are mostly fixes for stuff Tursi has found so far.

GCC changes:
Fixed R11 restoration in epilogue being dropped by DCE
Added support for named sections
Removed support for directly zeroing byte memory, which was buggy with some types of memory

binutils changes:
Added more informative syntax error messages
Fixed values like ">6000" in strings being mangled
Confirmed support for named sections

As always, more details are on my blog.

Attached Files



#150 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 91 posts
  • Location:Pittsburgh, PA

Posted Thu May 2, 2013 10:32 AM

I was looking at integer square roots, and found (on Wikipedia, no less) this algorithm. It appears to run in constant time and only uses fast operations (shifts, adds and subtracts). It also only uses 16-bit intermediate values, so no libgcc is required.

I haven't analyzed this for performance or correctness, but it looks interesting.

Link:
http://en.wikipedia....em_.28base_2.29


Code:
short isqrt(short num) {
    short res = 0;
    short bit = 1 << 14; // The second-to-top bit is set: 1L<<30 for long

    // "bit" starts at the highest power of four <= the argument.
    while (bit > num)
        bit >>= 2;

    while (bit != 0) {
        if (num >= res + bit) {
            num -= res + bit;
            res = (res >> 1) + bit;
        }
        else
            res >>= 1;
        bit >>= 2;
    }
    return res;
}
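A quick way to check the routine on a PC, plus one possible way to use it for the 10.6 format, could look like the sketch below. The names isqrt16, isqrt16_check_all and fp_sqrt_coarse are made up here, and int16_t stands in for the TMS9900's 16-bit short. The fixed-point wrapper relies on sqrt(A / 64) * 64 being equal to sqrt(A) * 8, so it is only accurate to steps of 8/64 = 0.125, and it returns 0 for negative input rather than an error code.

```c
#include <stdint.h>

/* Same algorithm as above, with fixed-width types for host testing. */
int16_t isqrt16(int16_t num)
{
    int16_t res = 0;
    int16_t bit = 1 << 14;

    while (bit > num)
        bit >>= 2;

    while (bit != 0) {
        if (num >= res + bit) {
            num = (int16_t)(num - (res + bit));
            res = (int16_t)((res >> 1) + bit);
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Returns 1 if isqrt16(n) equals floor(sqrt(n)) for every n in 0..32767. */
int isqrt16_check_all(void)
{
    int32_t n;

    for (n = 0; n <= 32767; n++) {
        int32_t r = isqrt16((int16_t)n);
        if (r * r > n || (r + 1) * (r + 1) <= n)
            return 0;
    }
    return 1;
}

/* Coarse 10.6 fixed-point square root: isqrt of the raw value, times 8. */
int16_t fp_sqrt_coarse(int16_t a)
{
    return (int16_t)(isqrt16(a) << 3);
}
```

With this, fp_sqrt_coarse(256) (4.0 in 10.6) gives 128 (2.0), while fp_sqrt_coarse(128) (2.0) gives 88 (1.375 rather than 1.414), so whether the coarse steps are good enough depends on what the raycaster needs.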




