Posts posted by insomnia

  1. This is actually a bit involved.

     

    Just for some quick background, there are several tools used to compile a binary file: the compiler (GCC), the assembler (GAS), the linker (LD), and a tool to convert the result to a format the TI can use (for now that's just ELF2EA5 or ELF2CART).

     

    The linker is responsible for defining the address for the code and can use a configuration file to do this. This allows you to assign locations for everything in the output file. There are a lot of options available, and GNU has a pretty good manual at http://ftp.gnu.org/o...ode/ld_toc.html

     

    There is a "hello world" example for carts back in post #64 of this thread. It doesn't use a link file and instead uses command-line arguments to define locations in the output file. Here's a link file which is equivalent to that example, let's call it linkfile.ld:

     

    SECTIONS{
    . = 0x6000;
    .text : {*(.text);}
    
    . = 0x2000;
    .data : {*(.data);}
    .bss : {*(.bss);}
    }
    

     

    The dot symbol is the current output location. By modifying it, the locations of the sections will be defined. In this example, all .text sections are linked assuming they will be loaded starting at location 0x6000, while the .data and .bss sections are loaded starting at location 0x2000. There are a lot of ways to write the link file to accomplish the same result. Check the ld manual for more details.

     

    This link file would be used by GCC by using the "-T" switch, like this:

    tms9900-gcc -T linkfile.ld ctr0.o main.o -o hello.elf
    

     

    Now, the segments you were wondering about are commonly called overlays, and there are a lot of examples which can be found online. None of these will be exactly what you are looking for, but the GNU tools are used for everything, and there are common concepts which would probably be helpful.

     

    Here's an example link file using three overlays:

    SECTIONS
    {
    . = 0x2000;
    .data : { *(.data); }
    .bss : { *(.bss); }
    
    OVERLAY 0x6000 : AT (0x0000)
    {
    .otext1 { overlay1.o(.text); }
    .otext2 { overlay2.o(.text); }
    .otext3 { overlay3.o(.text); }
    }
    }
    

     

    This will cause all .text sections of the overlay files to be at address 0x6000, and they will appear sequentially in the output file. The .data and .bss sections will start at address 0x2000. Here are the sections defined in the output file:

     

    eric@compaq:~/dev/tios/src/test_overlay$ tms9900-readelf -S hello.elf
    There are 11 section headers, starting at offset 0x650:
    
    Section Headers:
     [Nr] Name			  Type		    Addr	 Off    Size   ES Flg Lk Inf Al
     [ 0]				   NULL		    00000000 000000 000000 00	  0   0  0
     [ 1] .bss			  NOBITS		  00002000 000800 000006 00  WA  0   0  1
     [ 2] .otext1		   PROGBITS	    00006000 000200 000006 00  AX  0   0  2
     [ 3] .rela.otext1	  RELA		    00000000 0008ec 000000 0c	  9   2  4
     [ 4] .otext2		   PROGBITS	    00006000 000400 000006 00  AX  0   0  2
     [ 5] .rela.otext2	  RELA		    00000000 0008ec 000000 0c	  9   4  4
     [ 6] .otext3		   PROGBITS	    00006000 000600 000006 00  AX  0   0  2
     [ 7] .rela.otext3	  RELA		    00000000 0008ec 000000 0c	  9   6  4
     [ 8] .shstrtab		 STRTAB		  00000000 000606 000047 00	  0   0  1
     [ 9] .symtab		   SYMTAB		  00000000 000808 0000b0 10	 10   5  4
     [10] .strtab		   STRTAB		  00000000 0008b8 000034 00	  0   0  1
    Key to Flags:
     W (write), A (alloc), X (execute), M (merge), S (strings)
     I (info), L (link order), G (group), x (unknown)
     O (extra OS processing required) o (OS specific), p (processor specific)
    

     

    An additional tool will then be required to split up the output file into pieces suitable for Classic99 or MESS or actual hardware or whatever. Unfortunately, no one has written a tool to do that yet.
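
    Since no such tool exists yet, here is only a rough sketch of the core step it would need: copying one overlay section out of the ELF image using the "Off" and "Size" columns from the readelf listing above. The struct and function names are made up for illustration, and the offsets are assumed to be entered by hand rather than parsed from the ELF headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one overlay section, filled in by hand
   from the "Off" and "Size" columns of the readelf output. */
struct overlay_section {
    size_t file_offset;  /* "Off" column */
    size_t size;         /* "Size" column */
};

/* Copy one section out of an in-memory ELF image into dest.
   Returns the number of bytes copied, or 0 if the section does not
   fit inside the image or inside the destination buffer. */
size_t extract_overlay(const unsigned char *image, size_t image_len,
                       const struct overlay_section *sec,
                       unsigned char *dest, size_t dest_len)
{
    if (sec->file_offset > image_len ||
        sec->size > image_len - sec->file_offset ||
        sec->size > dest_len) {
        return 0;
    }
    memcpy(dest, image + sec->file_offset, sec->size);
    return sec->size;
}
```

    For the listing above, .otext2 would be described as { 0x400, 6 }. A real tool would still have to read the section headers itself and write each piece in whatever format the emulator or hardware expects.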

     

    Finally, GCC can produce position independent code (by using the "-fpic" switch), but that feature is currently broken for the TMS9900. If it were to work, the code would be compiled so it could be loaded to and run from any address. This is probably not what you want.

    • Like 2
  2. Matters Computational: Ideas, Algorithms, Source Code

    http://www.jjj.de/fxt/fxtbook.pdf

     

    I found a link to this file while looking through the Atari 2600 programming section. It's a Creative Commons licensed book written by Jorg Arndt giving a brief overview of a huge swath of numerical techniques and theory.

     

    This starts with a ton of bit manipulation tricks, then moves into sorting, data structures, graph analysis, and permutations. The author wades deep into number theory with plenty of examples. Each section builds on the one before, so readers shouldn't feel overwhelmed. There is corresponding C code for most topics discussed and an extensive set of references.

     

    It's about a thousand pages of math-y goodness, and serves as a great starting point for more in-depth research. There's a section titled "Multiplication of Hypercomplex Numbers", for crying out loud. What's not to love?

     

    Enjoy

    • Like 1
  3. Hey everybody, I've got a new set of patches for GCC, with changelog and stuff below. But first, since I've been away for a long time, I figured I should answer some of the questions people have asked so far.

     

    Retroclouds wanted to know if http://ultra-embedded.com/?fat_filelib would compile. It does, but I had to use include files from my PC since there's no standard C library yet. The resulting code may also need to be massaged for size or be split up into smaller pieces. It's about 21 KB of code, 3 KB of data.

     

    total:
    [Nr] Name			 Type		 Size(Hex) Size(Dec)
    [ 1] .text			 PROGBITS	 00005462 = 21602
    [ 3] .data			 PROGBITS	 00000030 = 48
    [ 4] .bss			 NOBITS		 00000CCA = 3274
    

    Of course, this is missing some other stuff:

    puts

    printf

    strncpy

    strncmp

    memcpy

    memset
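
    Until a real C library shows up, the memory and string routines in that list are small enough to write by hand. This is only a minimal sketch of four of them, with a "tiny_" prefix I made up so they don't clash with a host libc. They are not tuned for the TMS9900 in any way.

```c
#include <stddef.h>

/* Copy n bytes from src to dest; the regions must not overlap. */
void *tiny_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dest;
}

/* Fill n bytes at dest with the low byte of value. */
void *tiny_memset(void *dest, int value, size_t n)
{
    unsigned char *d = dest;
    while (n--) *d++ = (unsigned char)value;
    return dest;
}

/* Copy at most n characters and pad the remainder with zeros.
   Like the standard strncpy, the result is not terminated when
   src holds n or more characters. */
char *tiny_strncpy(char *dest, const char *src, size_t n)
{
    size_t i = 0;
    for (; i < n && src[i] != '\0'; i++) dest[i] = src[i];
    for (; i < n; i++) dest[i] = '\0';
    return dest;
}

/* Compare at most n characters; returns <0, 0 or >0. */
int tiny_strncmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; n--, a++, b++) {
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;
        if (ca != cb) return (ca < cb) ? -1 : 1;
        if (ca == '\0') break;
    }
    return 0;
}
```

    puts and printf are left out here because they depend on how text gets to the screen, which is a separate problem.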

     

    On the subject of LLVM, I've looked into it and it looks interesting, but I don't plan on using it any time soon. The problem I see with it right now is that it's new and mainly oriented toward Intel architectures. This means that it's likely to have structural changes in the near future, and any TMS9900 port may be short-lived. It's also not clear how word-to-byte conversions would work, which has by far been the most challenging part of the GCC port. As has been said earlier, for the code sizes we're dealing with for the TI, any difference in compile speed would be meaningless.

     

    So, on to the GCC changes:

     

    Libgcc now built for TMS9900

    Implemented optimized assembly functions in libgcc for:

    count leading zero bits

    count trailing zero bits

    find index of least significant set bit

    count set bits

    calculate parity bit

    signed and unsigned 32-bit division and modulus

    Fixed 32-bit multiplies, was only doing unsigned multiply

    Fixed 32-bit negation, was emitting invalid NEG instruction

    Removed fake PC register (Yay!)

    New build instructions to make libgcc

    Fixed function prologue and epilogue, was saving R11 unnecessarily

    Optimized function epilogue, saving a few cycles

    Enforced correct use of R11 register, was causing randomly broken code

     

    The main two features here are the addition of libgcc and better handling of R11.

     

    Libgcc is needed to complete coverage for 32-bit math. The other functions are frequently used by third party libraries. Later, this will be the proper place to add support for floating-point operations.
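
    For anyone wondering what those bit helpers compute, here are plain C reference versions for 16-bit values. These are only a sketch: the names are invented, the real libgcc routines are hand-written assembly, and returning 16 for a zero input is my own choice for the sketch (the GCC builtins leave that case undefined).

```c
#include <stdint.h>

/* Count leading zero bits (16 for x == 0 in this sketch). */
int clz16(uint16_t x)
{
    int n = 0;
    if (x == 0) return 16;
    while ((x & 0x8000u) == 0) { x = (uint16_t)(x << 1); n++; }
    return n;
}

/* Count trailing zero bits (16 for x == 0 in this sketch). */
int ctz16(uint16_t x)
{
    int n = 0;
    if (x == 0) return 16;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* One-based index of the least significant set bit, 0 if none. */
int ffs16(uint16_t x)
{
    return x ? ctz16(x) + 1 : 0;
}

/* Count set bits by clearing the lowest set bit each pass. */
int popcount16(uint16_t x)
{
    int n = 0;
    while (x) { x &= (uint16_t)(x - 1); n++; }
    return n;
}

/* Parity bit: 1 if an odd number of bits are set. */
int parity16(uint16_t x)
{
    return popcount16(x) & 1;
}
```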

     

    The handling of R11 deserves some discussion here. There were three places where we determined which registers to save on the stack, and they did not all use the same algorithm. This resulted in sometimes saving R11 when it wasn't necessary, or not accounting for that space when values were stored on the stack. The outcome was corrupted registers and likely crashes.

     

    Another problem I found (not my fault this time) was that if a function can take advantage of peephole optimization patterns, any register allocated to fulfill that pattern will not be marked as used in the GCC internal tables. In the case where R11 is selected to be used for data, and this condition hits, my port did not know it needed to save the return pointer. At that point you've got another crash when the function tries to return to the caller using a corrupted R11.

     

    All these problems can pop up based on seemingly insignificant code changes. It all depends on how that code gets implemented, and which registers get allocated along the way. It's also maddening to try to debug. This was fixed by designating R11 for return pointer use only. We lose a general-purpose register, which I'm not crazy about, but at least the generated code will run. If I can figure out what's going on with the peepholes, I should be able to recover R11 for general use again.

     

    Another biggish change is the build instructions. This was needed for libgcc, and apparently I've been doing it wrong all along. The GCC developers use these instructions, so that's what I'll use too.

     

    Unpack and patch as before, and from the top level of the source tree do this:

     

    $ mkdir build

    $ cd build

    $ ../configure --prefix <path_to_installation_dir> --target=tms9900 --enable-languages=c

    $ make all-gcc all-target-libgcc

    $ make install

     

    When compiling, GCC will know where to find libgcc and you don't need to do anything to link against it.

     

    If anyone's interested, the details for all this plus the sad, depressing and ultimately futile search for a standard C library are on my blog.

     

    As always, please post any problems or questions you may have. I'll actually be around to answer them this time.

    gcc-4.4.0-tms9900-1.7-patch.tar.gz

    • Like 5
  4. Well, it's patch time again.

     

    First off, an apology to Lucien for not responding earlier, but the short answer for "how to use a single quote?" is "you can't". There was a bug in binutils which prevented its use.

     

    I tried to be clever and allow either TI-style or C-style strings in the assembly code, but did a terrible job of it. The parser always treated escaped single quotes as the end of the string, which causes some frustration, to put it mildly.

     

    Fortunately, that's all been fixed in the latest patch.

     

    So here's the official changelog for binutils:

     

    Fixed bug prohibiting the use of single quotes in a string

    Strings may be in either TI-style 'stuff' or C-style "stuff"

    TI-style strings follow E/A text rules

    C-style strings may include standard escape codes "example\n"

     

    And the other things fixed in GCC:

     

    Fixed comparison against +-1 and +-2, they got broken in 1.5

    Prevented incorrect use of fake PC register for real work

    Improved AND operations to use fewer setup instructions

    Fixed incorrect long-to-char conversions

    Fixed post-increment pointers which live on the stack

    Added optimization for setting byte quantities to zero

    Added optimization for (int)X = (unsigned char)((int)X)

    Removed double-counting space for saved registers on the stack

    Reduced overhead needed for multiply instructions

    Fixed bug causing structures to be loaded into registers

    Structures used as function arguments now passed by reference

    Fixed more bugs causing bad int-to-char conversions

     

    Work has kept me pretty busy lately, but progress continues.

     

    As always, if anyone finds problems or has suggestions, please let me know

    binutils-2.19.1-tms9900-1.4-patch.tar.gz

    gcc-4.4.0-tms9900-1.6-patch.tar.gz

    • Like 3
  5. So "charx" must be something between >00 and >FF. Right ?

     

    Looking at the first instruction, being SRA (Shift Right Arithmetic), I assume "charx" is represented in memory as a word (looking at the code it's in R2 at that stage) like something between >00xx and >FFxx ?

     

    SRA fills vacated bit positions with original MSB (Most Significant Bit).

     

    So "SRA R2,8" would turn >00xx into >0000, and >FFxx would be >FFFF ?

     

    But wouldn't that make (int)(charx) wrong ? - Shouldn't >FF00 be turned into >00FF ?

     

    The example was written assuming the values are currently in registers, and doesn't use correct C syntax. The idea was to use shorthand and C-like pseudocode to get the idea across quickly. A real-life example would look something like this:

     

    void do_something()
    {
     char x;
     int y;
     ...
     y=((int)x)<<4;
     ...
    }
    

     

    You're right, X must be a value between >00 and >FF, and if the X value is in memory, it need not occupy a full word. Once copied into a register (using MOVB or something) the value will be stored in the high byte.

     

    What you wrote would be true for unsigned values, but not for signed ones.

     

    >FFxx in a register can be interpreted as either (char)(-1) or (unsigned char)(255)

    >FFFF is (int)(-1)

    >00FF would be (int)(255)

     

    There are optimizations for both of these, but I only used an example for signed values since the timings are the same and only differ by the SRA or SRL instruction.

     

    The initial implementation would produce this code:

    * Assume x has a value of -4 (>FC), and is stored in r2 as >FCxx
    * y = (-4)<<4 = -4 * 16 = -64 = >FFC0
    sra r2, 8   * Convert to signed integer (r2=FFFC)
    sla r2, 4   * Left shift converted value (r2=FFC0)
    

     

    The optimization emits this code:

    * Assume x has a value of -4 (>FC), and is stored in r2 as >FCxx
    * y = (-4)<<4 = -4 * 16 = -64 = >FFC0
    sra r2, 4   * Shift into final position (r2=FFCx)
    andi r2, >FFF0	* Mask unknown bits (r2=FFC0)
    

     

    And finally, for unsigned values:

    * Assume x has a value of 252 (>FC), and is stored in r2 as >FCxx
    * y = 252<<4 = 252 * 16 = 4032 = >0FC0
    srl r2, 4	* Shift into final position (r2=0FCx)
    andi r2, >FFF0	* Mask unknown bits (r2=0FC0)
    

     

    Since fewer bit shifts are required, the optimized code runs faster (I figure about 33% faster on average), but uses one additional code word.
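
    If you want to convince yourself the two sequences really give the same answer, here is a small host-side model of the register operations in C. It is only a sketch: the helper names are mine, the shift is fixed at N=4 as in the examples, and the SRA model fills in the sign bits by hand so it doesn't depend on how the host compiler shifts negative numbers.

```c
#include <stdint.h>

/* Model of SRA: shift right, filling vacated bits with the original MSB.
   count is assumed to be in the range 1..15. */
uint16_t sra16(uint16_t r, unsigned count)
{
    uint16_t sign_fill = (r & 0x8000u) ? (uint16_t)(0xFFFFu << (16 - count)) : 0;
    return (uint16_t)((r >> count) | sign_fill);
}

/* Model of SRL: shift right, filling vacated bits with zero. */
uint16_t srl16(uint16_t r, unsigned count)
{
    return (uint16_t)(r >> count);
}

/* Model of SLA: shift left, dropping bits that leave the register. */
uint16_t sla16(uint16_t r, unsigned count)
{
    return (uint16_t)((unsigned)r << count);
}

/* Initial code: sra r2,8 then sla r2,4 */
uint16_t signed_initial(uint16_t r2)
{
    return sla16(sra16(r2, 8), 4);
}

/* Optimized code: sra r2,4 then andi r2,>FFF0 */
uint16_t signed_optimized(uint16_t r2)
{
    return (uint16_t)(sra16(r2, 4) & 0xFFF0u);
}

/* Unsigned variant: srl r2,4 then andi r2,>FFF0 */
uint16_t unsigned_optimized(uint16_t r2)
{
    return (uint16_t)(srl16(r2, 4) & 0xFFF0u);
}
```

    Feeding in >FC37 (x = >FC with junk in the low byte) gives >FFC0 from both signed versions and >0FC0 from the unsigned one, matching the comments in the assembly above.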

    • Like 1
  6. Now that the compiler has had a lot of fixes applied, I can replace some of the workarounds with the code which was originally intended. This makes for more compact and much easier to understand code.

     

    My intent was just to undo the damage which was forced upon this code by the problems in the compiler. I've also added a rush.dsk disk image (containing an EA5 file named RUSH) to make it easier to try the resulting program.

     

    Hopefully this will make lucien feel better about making the source code available.

    RUSH_HOUR4.zip

  7. Well, after a really long time without any signs of life from this project, it's patch time.

     

    A big "thank you" goes out to Lucien. A lot of the updates here are a direct result of the effort he put into making Rush Hour. He did a great job wading through all the brokenness to make a functional game. Now it's time for everyone to benefit from that work.

     

    New Binutils fixes in this release:

     

    STST was incorrectly looking for two arguments

    SBO, SBZ and TB incorrectly using constants

    EQU'ed symbols sometimes replaced using wrong endianness

     

    GCC fixes:

     

    Fixed several word-to-byte conversion errors

    Fixed "unrecognizable instruction" for zero comparison operations

    Made optimizations for most comparison operations

    Improved correctness of condition flag handling

    Switch statements now work properly

    Fixed division and modulus, operands were used in wrong order

    Fixed subtract, operands were occasionally used in wrong order

    Fixed stack frame corruption when local variables are in use

    Added optimizations for forms like (int Y)=((int)(char X))<<N

     

    The patch and build procedures are the same as always. Development notes are on my blog for those who are interested.

     

    Things are shaping up pretty well so far. (Yes it is taking forever, sorry about that.) I don't see any obvious holes to fill, or optimizations yet to do. At this point, I just need to exercise the compiler with larger programs and increase test coverage. If anyone finds a problem, or sees an area where improvements can be made, please let me know.

     

    I'm continuing to work on related projects (disk management tool, libc library, documentation). There's still lots to do, so these updates will keep coming.

    binutils-2.19.1-tms9900-1.3-patch.tar.gz

    gcc-4.4.0-tms9900-1.5-patch.tar.gz

    • Like 3
  8. I do my development against MESS on a Linux machine, using GAS as my assembler. All code is written in Gedit (which is an equivalent to Notepad).

     

    I started doing development without any idea what tools were out there, so I've written them all myself (assembler, disassembler, disk tools, font editor, etc.). However, I replaced my simple assembler with GAS as soon as I could. I don't use any debugging tools at all, although that would make my life much easier. Since I didn't have anything close to the comprehensive documentation now available here, I've had to reverse engineer some cartridges, disk images and a few of Sometimes's demos to better understand how the TI works.

     

    I have a real machine packed away, but for now I'm content doing everything in MESS.

     

    This is probably the most difficult way possible to develop TI code, but I thought I'd share.

  9. Great job! That's just too darn cool.

     

    Unfortunately, I see you've had to put in a lot of effort to work around problems in the compiler. I'm sorry about that. You've also pointed out how badly a library of basic functions is needed (display stuff, keyboard support, file IO, sound), but it may be a while before I have a chance to get to that.

     

    First things first. Let me see if I can get rid of the problems you've found so far.

  10. Actually, the code that you found was in the disassembler, which isn't much help. What you want is line 722 of binutils-2.19.1/gas/config/tc-tms9900.c

     

    Change that line to

     { "stst", 0x02C0, {ARG_REGISTER,  ARG_NONE}},
    

    and you are back in business.

     

    The error you were seeing was due to the fact that this instruction was falsely insisting upon a second argument for STST. Something like this would have made it happy:

    stst r0, >0000
    

     

    It also seems like there is a problem with the SBO, SBZ and TB instructions. During the assembly process, the bit offsets are being reduced by half. They are currently using constants in the same way as JMP, which is wrong. I need to add a new constant type for these CRU instructions for correct operation. At least LDCR and STCR look right. I haven't had a chance to look at your other issues yet, but I should have some answers for you tomorrow.
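
    To make the halving problem concrete, here is a sketch in C of the two ways the 8-bit field could be filled in. This is not the binutils code. The function names are invented, and the opcode values are my reading of the TMS9900 instruction set, so treat them as assumptions. Jumps store a word count (byte distance divided by two), while the CRU bit instructions should store the bit offset unchanged.

```c
#include <stdint.h>

/* Base opcodes (assumed values, low byte holds the 8-bit field). */
#define OP_JMP 0x1000u
#define OP_SBO 0x1D00u
#define OP_SBZ 0x1E00u
#define OP_TB  0x1F00u

/* Jump-style encoding: the field is a signed word count, so the
   byte distance (measured from the instruction after the jump)
   is divided by two. */
uint16_t encode_jump(uint16_t opcode, int byte_distance)
{
    return (uint16_t)(opcode | ((unsigned)(byte_distance / 2) & 0xFFu));
}

/* CRU-style encoding: the field is the signed bit offset itself. */
uint16_t encode_cru_bit(uint16_t opcode, int bit_offset)
{
    return (uint16_t)(opcode | ((unsigned)bit_offset & 0xFFu));
}
```

    Running "sbo 4" through encode_jump gives >1D02, which is the halved offset described above, while encode_cru_bit gives the expected >1D04.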

  11. The worst was Wizard's Dominion. It was a really super watered down RPG. The mazes were even simpler than in A-Maze-Ing! and you had to keep track of things on paper. Isn't that what we have computers for?

     

    Gah! I loved that game when I was a kid.

     

    Maybe it's just the nostalgia talking, or the fact that it was one of only a handful of TI games I had, or maybe even the fact that I haven't seen it in almost thirty years, but I'm still convinced that Wizard's Dominion is super-cool.

     

    Take your "facts" and "informed opinion" and get outta here. I'll be standing over there with my giant run-on sentences, thinking about how awesome stuff I barely remember is...

    • Haha 1
  12. OK, it's patch time again.

     

    This patch includes the fixes for all bugs mentioned here since the last patch release in addition to a few I found on my own. The same patch and build directions used before are used for this one too.

     

    Here's the changes in this release:

     

    Fixed a bug with byte initializers, it was handling negative values wrongly

    Fixed multiply bug, it was using the wrong registers

    Changed frame pointer from R8 to R9. Frame was being lost

    Byte reads from memory were assumed to be copied into register's LSB.

    Fixed a problem with AND improperly modifying input values.

    Fixed a bug where R11 was not saved if used as a data register.

    Modified output to use hex values for all constants and addresses

     

    I've also packaged up an ELF to EA5 converter and an example program made to run as an EA5 image. The program does the same useless flashing text thing that the cart example did. This was done to make the differences easier to spot. The changes made to the EA5 crt0 are a bit safer than the one used in the cart version (this one better handles zero size sections). I'll probably release a new version of the cart tool and example sometime soon which incorporates these changes.

     

    The next thing on my list is to update all the documentation. Everything I've posted so far is still valid, but there are probably holes where some subjects need more description.

     

    I also need to put together a library for the missing 32-bit functions (multiply, divide, modulus, shift). These functions are already written and tested for the most part, so releasing them should be quick and easy.

     

    Finally, I need to make my V9T9 disk management tool ready for public consumption. It currently works, and the disk images it creates were used to test the EA5 converter, but it's super hacky at the moment. Once I spruce it up a bit and turn it into a useful tool, I can send it out the door.

     

    As always, the gory details are on my blog for those who are interested.

    gcc-4.4.0-tms9900-1.4-patch.tar.gz

    elf2ea5.tar.gz

    hello_ea5.tar.gz

    • Like 1
  13. Well, that's funky.

     

    A few things you should know:

     

    By using inline assembly to call "kscan", you are preventing GCC from knowing that it has to retain the "key" pointer stored in R1 and the return pointer in R11. If "kscan" were to use these registers, the values stored there would be destroyed, and the C code would behave weirdly.

     

    That can be fixed by doing something like this:

    extern void kscan();
    
    int key_scan(char* key) {
           KEY_UNIT=5;
           kscan();
           *key=KEY;
           char c=STATUS&0x20;
           return c;
    }
    

     

    That results in this assembly:

    def	key_scan
    key_scan
    ai   r10, >FFFC
    mov  r11, *r10
    mov  r9, @2(r10)
    mov  r1, r9
    li   r1, >500
    movb r1, @-31884
    bl   @kscan
    movb @-31883, *r9
    movb @-31876, r1
    andi r1, >2000
    sra  r1, 8
    mov  *r10+, r11
    mov  *r10+, r9
    b    *r11
    
    ref	kscan
    

     

    The different results you are seeing in the return values are due to GCC removing a sign extension on the read value of the STATUS macro. During the register allocation step, GCC is trying to be helpful and removing what it thinks is an unnecessary operation, since it believes that byte quantities are already stored in the least significant byte of the register.

     

    I missed this when I was fixing things for the weird byte format. By using the temporary variable, you are forcing the change in data types. I need to go take a hard look at the compiler and find a fix for this. (The decimal addresses are also annoying and unhelpful. That increment in "mov *r10+, r9" should probably be removed too. Looks like I'll be busy for a while.)

     

    By the way, I think TI chose to store bytes this way to allow the comparison logic to work for word and byte values. This would also remove the need for sign extend and zero extend logic. It also opens up the possibility of storing data in the otherwise unused low byte. I don't have anything to back up this reasoning, but it seems to make sense. In any event, no matter why this decision was made, we're stuck with it now.
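
    Here is a small C sketch of the comparison part of that guess. It does not prove anything about TI's actual reasoning. It only shows that a byte sitting in the high half of a 16-bit register (with the low half cleared) compares the same way as the byte itself, with no sign extension step needed. All names are invented for the sketch.

```c
#include <stdint.h>

/* Place a byte in the high half of a 16-bit register, low half zero. */
uint16_t byte_in_register(uint8_t b)
{
    return (uint16_t)((unsigned)b << 8);
}

/* Interpret bit patterns as signed values without relying on
   implementation-defined conversions. */
int32_t as_signed16(uint16_t w)
{
    return (w & 0x8000u) ? (int32_t)w - 65536 : (int32_t)w;
}

int32_t as_signed8(uint8_t b)
{
    return (b & 0x80u) ? (int32_t)b - 256 : (int32_t)b;
}

/* Signed comparisons returning -1, 0 or 1. */
int compare_words_signed(uint16_t a, uint16_t b)
{
    int32_t x = as_signed16(a), y = as_signed16(b);
    return (x > y) - (x < y);
}

int compare_bytes_signed(uint8_t a, uint8_t b)
{
    int32_t x = as_signed8(a), y = as_signed8(b);
    return (x > y) - (x < y);
}
```

    The same holds for unsigned comparisons, since shifting both values left by eight bits preserves their order.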

  14. Ah, man you're gonna make me cry over here. :)

     

    I'd suggest that one could make buggy, hole-ridden code in any language. And in my experience, most crappy code is born of lazy or rushed programmers. It's so much simpler and faster to slap on a band-aid than to comprehensively understand how every line of code works. This is especially true for larger projects (operating systems come to mind). In general, testing effort and risk for bugs rise exponentially with code size. This would be true for any language.

     

    OK, if I'm being fair, C does have a higher bugs-per-KLOC ratio than many other languages, but I think that's a reasonable trade for the flexibility it provides. I wouldn't count on any language to magically make programming simple and foolproof. The design of any language involves compromises.

     

    That being said, Forth is pretty cool, and it's amazing how much functionality you can get out of such a simple idea. I would honestly love to see an operating system written in Forth.

     

    Writing a game of this complexity is pretty darn impressive too. Tons of talent and dedication are required to pull something like that off.

    • Like 3
  15. OK, the multiply bug was more involved than the earlier ones, but here you go:

     

    In gcc-4.4.0/gcc/config/tms9900/tms9900.md, remove lines 1455 through 1484 (the "mulhisi3" and "*multhisi" patterns).

     

    Replace them with this:

     

    (define_insn "mulhisi3"
     [(set (match_operand:SI 0 "register_operand" "=r,r")
    (mult:SI (match_operand:HI 1 "register_operand" "r,r")
    	 (match_operand:HI 2 "general_operand" "rR>,Q")))]
     ""
     {
       /* When both input operands are registers, we may need to swap them. */
       if(REG_P(operands[1]) && REG_P(operands[2]))
       {
         /* Check for forms like: r0 = r1 * r0 */ 
         if(REGNO(operands[0]) == REGNO(operands[2]))
         {
           /* Swap operands, otherwise we will emit code like:
                mov r1, r0
                mpy r0, r0
    
              instead of:
                mpy r1, r0
           */
           rtx temp = operands[1];
           operands[1] = operands[2];
           operands[2] = temp;
         }
       }
    
       if(REGNO(operands[0]) != REGNO(operands[1]))
       {
         output_asm_insn("mov  %1, %0", operands);
       }
       output_asm_insn("mpy  %2, %0", operands);
       return("");
     }
     [(set_attr "length" "2,3")])
    

     

    The original code was an attempt to force the register allocator to use registers which were most convenient for the MPY instruction. Obviously, that didn't work out so well. This new code is less aggressive, accepting any register choice GCC may make. It also works in all optimization levels. An optional MOV instruction is now used to prepare for the multiply if the register allocator is not kind.
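
    The decision logic in that pattern can be tried out on a PC with a standalone model like the one below. This is a sketch, not GCC code: emit_multiply is an invented name, it assumes all three operands are plain registers (the real pattern also accepts memory operands for the second input), and it writes the assembly text into a caller-supplied buffer that is assumed to be large enough.

```c
#include <stddef.h>
#include <stdio.h>

/* Model of "dst = a * b" where dst, a and b are register numbers.
   Writes the emitted assembly into out and returns the number of
   instructions produced. */
int emit_multiply(int dst, int a, int b, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    /* Forms like r0 = r1 * r0: swap the inputs so the value already
       in the destination is not overwritten by the MOV. */
    if (dst == b) {
        int temp = a;
        a = b;
        b = temp;
    }

    /* Only copy the first input if it is not already in place. */
    if (dst != a) {
        int n = snprintf(out, out_len, "mov  r%d, r%d\n", a, dst);
        if (n > 0 && (size_t)n < out_len) used = (size_t)n;
        count++;
    }

    snprintf(out + used, out_len - used, "mpy  r%d, r%d\n", b, dst);
    return count + 1;
}
```

    With dst=0, a=1, b=0 this produces the single "mpy  r1, r0" from the comment in the pattern, and with three different registers it produces a MOV followed by the MPY.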

     

    Moving on to questions...

     

    * What registers are used by the compiler generated program? Are all 16 registers used or are some of them "free" for own use?

     

    GCC will attempt to make maximum use of all 16 registers, so there's no guarantee that there are any lying around unused.

     

    If you have an assembly routine you would like to interface with C code, the information needed for that should be shown in earlier posts. If you would like more detailed information (calling convention, register usage and allocation order, etc) I'd be happy to let you know. I have a document I've been neglecting which should include all this stuff.

     

    * What (scratchpad) memory is used by the compiler generated program?

     

    GCC (or any compiler) is basically an engine to convert source code into assembly. That means you have complete control over what memory is used, and for what purpose. So unlike Basic, Java or Forth (I presume), there is no other code working behind the scenes you need to be aware of. All of the machine's resources are available to you, and any other limitations are of your own making.

     

    In the example code I've posted earlier, the registers are located at >8300 by the ctr0 code. Except for what's used by the registers, all of scratchpad memory is available. If you wanted to, you could put the registers elsewhere in scratchpad or 8-bit memory with no impact on the C code.

     

    You can store data anywhere in the system you like (like Lucien did in his bricks code). Using the linker, you can build your code to run from anywhere in memory. You could even put (small bits of) code into scratchpad and run from there for that extra performance boost (as I believe was done in Parsec).

     

    Remember, C was originally designed to write device drivers and operating systems, so the sky's the limit here.

  16. Thanks for the bug report, and keep 'em coming.

     

    It is embarrassing to have my mistakes up where everyone can see them, but each bug found and fixed makes for a more capable tool for everyone. So again, thanks.

     

    I made an optimization pattern to convert a sequence like:

    li   r2, >AA00
    movb r2, *r1
    li   r2, >BB00
    movb r2, *r1
    

     

    to one like this:

    li   r2, >AABB
    movb r2, *r1
    swpb r2
    movb r2, *r1
    

     

    but I forgot to mask off the sign-extended bits of the constant in the lower byte. As a result, when the two values are ORed together, the byte stored in the upper byte is lost.

    >FFBB | >AA00 = >FFBB
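
    The same arithmetic can be checked on a PC with a few lines of C. This is just a sketch with made-up function names. The constants are passed in as sign-extended byte values, so >AA arrives as -86 and >BB as -69, which is the situation that triggers the problem.

```c
#include <stdint.h>

/* Buggy combination: the sign-extended low constant is ORed in as-is,
   so its upper bits wipe out the high byte. */
uint16_t combine_unmasked(int high, int low)
{
    return (uint16_t)(((unsigned)high << 8) | (unsigned)low);
}

/* Fixed combination: each constant is masked to eight bits first. */
uint16_t combine_masked(int high, int low)
{
    return (uint16_t)((((unsigned)high & 0xFFu) << 8) |
                      ((unsigned)low & 0xFFu));
}
```

    combine_unmasked(-86, -69) gives >FFBB as shown above, while combine_masked(-86, -69) gives the intended >AABB.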
    

     

    He also found another problem using -O0, which I don't usually use. The register I chose for the frame pointer, R8, is volatile. That means that it can be destroyed over a function call. The frame pointer is used as the base to locate local variables which live on the stack. If this is destroyed after a function call, the code following that call can behave unpredictably.

     

    This was just a dumb mistake. In order to preserve the ABI interface, I've moved the frame pointer to R9, which is preserved across function calls. The resulting code looks much safer now.

     

    I'll include these fixes in the next patch, but for the impatient here's how to fix these problems now:

     

    Change gcc-4.4.0/gcc/config/tms9900.md, line 2423 (near "*movhi_combine_consts") to look like this:

     

       operands[1] = GEN_INT(((INTVAL(operands[1]) & 0xFF) << 8) |
                              (INTVAL(operands[3]) & 0xFF));
    

     

    Change gcc-4.4.0/gcc/config/tms9900.h, line 517 to look like this:

     

    #define FRAME_POINTER_REGNUM		HARD_R9_REGNUM
    

     

    And gcc-4.4.0/gcc/config/tms9900.h, line 523 to look like this:

     

    #define STATIC_CHAIN_REGNUM	        HARD_R9_REGNUM
    

     

    By the way, I think there's a problem where the stack frame is improperly sized when using -O0. This seems to be the result of the order of operations in GCC. The size of the frame is not yet known when the code to build the function prologue is called. That's on my todo list.

    • Like 1
  17. lucien,

     

    I use Linux for all my development and testing, and I don't have Cygwin installed on my Windows box, but I'll see what I can do for you right now.

     

    From your logs, it looks like you are building from:

    /gcc/lib/gcc/tms9900/4.4.0/

     

    Seems an odd location..

     

    I also see lines referencing /home/-/gcc-4.4.0/host-i686-pc-cygwin/gcc/xgcc, which seems like you are using user "-", which doesn't seem right either. These things might be messing up the build or install process if you don't have write or execute permissions in these directories for some reason.

     

    But it looks like GCC was built properly, and the GCC docs were built as well. You can confirm that by looking for a file named cc1 at:

     

    /gcc/lib/gcc/tms9900/4.4.0/host-i686-pc-cygwin/gcc/cc1

     

    You can check to see that it works by going to that directory and doing this:

     

    $ echo "void test() {}" > test.c
    $ ./cc1 test.c
    test
    Analyzing compilation unit
    Performing interprocedural optimizations
    <visibility> <early_local_cleanups> <summary generate> <inline>Assembling functions:
    test
    Execution times (seconds)
    parser                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 (100%) wall      34 kB ( 7%) ggc
    TOTAL                 :   0.00             0.00             0.01                498 kB
    $ cat test.s
           pseg
           even    
    
           def     test
    test
           ai   r10, >0
           mov  r10, r8
           b    *r11
    

     

    If cc1 is working, you may be having an installation problem. I'd recommend running "make distclean" and then repeating the build process (configure, make all-gcc, make install).

     

    If it helps, I've included a copy of the output on my machine for these steps. For these logs, I've followed the directions posted above in a new directory. I'm using /home/eric/dev/tios/toolchain/WORKSPACE/temp/gcc-4.4.0 for my build location, and the /home/eric/dev/tios/toolchain/WORKSPACE/temp/bin for my install location.

     

    If you're still having problems, make similar copies of your output for these steps and send them my way.

    gcc_configure.txt

    gcc_build.txt

    gcc_install.txt

  18. Update time!

     

    It's about six months later than promised, but I haven't given up yet.

     

    Most of that time has been putting in a ton of hours at work and beating on the GCC code to get byte operations working properly. What's in this release is the fourth or fifth overhaul of the port. In the end I had to rewrite core bits of how GCC relates byte and word quantities. I've kept those changes to a minimum, so ports to later versions should still work.

     

    Here's what got changed in this patch:

     

     

    Add optimization to remove redundant moves in int-to-char casts

    Remove invalid CB compare immediate mode.

    Add optimizations for byte immediate comparison

    Added optimizations for shift and cast forms like (byte)X=(int)X>>N

    Remove invalid compare immediate with memory

    Improved support for subtract immediate

    Fixed bug causing gibberish in assembly output

    GCC now recognizes that bit shift operations set the comparison flags

    Fixed bug causing bytewise AND to operate on the wrong byte

    Add optimization for loading byte arrays into memory

    Confirmed that variadic functions work properly.

    Fixed the subtract instruction to handle constants

    Fixed the CI instruction, it was allowing memory operands

    Fixed a bug allowing the fake PC register to be used as a real register

    Encourage memory-to-memory copies instead of mem-reg-mem copies

    Added optimization to eliminate INV-INV-SZC sequences

    Modify GCC's register allocation engine to handle TMS9900 byte values

    Remove the 32 fake 8-bit registers. GCC now uses 16 16-bit registers

    Modify memory addressing to handle forms like @LABEL+CONSTANT(Rn)

    Clean up output assembly by vertically aligning operands

    Clean up output by combining constant expressions

    Optimize left shift byte quantities

    Fixed a bug where SZC used the wrong register

    Removed C instruction for "+=4" forms, AI is twice as fast

    Added 32-bit negate

    Fixed 32-bit subtract

    Fixed a bug causing MUL to use the wrong register

    Fixed a bug allowing shifts to use shift counts in the wrong register

    Confirmed that inline assembly works correctly

    Added optimization to convert "ANDI Rn, >00FF" to "SB Rn,Rn"

    Optimize compare-with-zero instructions by using a temp register

    Fixed a bug allowing *Rn and *Rn+ memory modes to be confused

    Removed most warnings from the build process

     

     

    There were also changes made to binutils. I hope this will be the last update for it.

     

     

    More meaningful error messages from the assembler

    DATA and BYTE constructs with no value did not allocate space

    Fix core dump in tms9900-objdump during disassembly

     

     

    The ELF conversion utility was also updated to allow crt0 to properly set memory before the C code executes. If it finds a "_init_data" label in the ELF file, it will fill out a record with all the information crt0 needs to do the initialization.
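
What crt0 does with that record is not shown here, and the record layout is not described in this post. As a rough sketch only, assuming a hypothetical record that holds the source, destination and size of .data plus the start and size of .bss, the initialization would look something like this in C. The real crt0 is assembly, and `init_record`, its fields and `init_memory` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real record written by the ELF converter may differ. */
struct init_record {
    const unsigned char *data_src;  /* initial values stored in the cart image */
    unsigned char *data_dst;        /* where .data lives in RAM */
    size_t data_size;
    unsigned char *bss_start;       /* where .bss lives in RAM */
    size_t bss_size;
};

/* Copy initialized data into RAM and zero the .bss area before main() runs. */
void init_memory(const struct init_record *rec)
{
    assert(rec != NULL);
    memcpy(rec->data_dst, rec->data_src, rec->data_size);
    memset(rec->bss_start, 0, rec->bss_size);
}
```

The point of doing this before any C code runs is that the cart image is read-only, so initialized variables have to be copied into RAM, and C expects uninitialized globals to start out as zero.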

     

    In light of all these changes, I've made a new "hello world" program with lots of comments, a Makefile and all supporting files. I've also included the compiled .o, .elf, and converted cart image. In addition, there's also a hello.s file which is the assembly output from the compiler.

     

    I'm not sure if I mentioned this earlier, but the tms9900-as assembler accepts TI-syntax assembly files, with a number of additions:

     

     

    Added "or", "orb" aliases for "soc" and "socb" (that's been a gotcha for several people here)

    Added "textz" directive - This appends a zero byte to the data.

    "textz '1234'" is equivalent to "byte >31, >32, >33, >34, 0"

    Added "ntext" directive - This prepends the byte count to the data.

    "ntext '1234'" is equivalent to "byte 4, >31, >32, >33, >34"

    Added "string" variants to all "text" directives

    No length limit for label names

    No limitation for constant calculations, all operations are allowed (xor, and, or, shifts, etc.)
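
To make the two new text directives concrete, here is a small C sketch that produces the same byte sequences as the "textz" and "ntext" examples in the list above. The function names are invented for illustration; this is not code from the assembler, and the one-byte count in `emit_ntext` is an assumption based on the "byte 4" in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Emit the bytes "textz" would: the characters followed by a zero byte.
   Returns the number of bytes written to out. */
size_t emit_textz(const char *s, unsigned char *out)
{
    size_t n = strlen(s);
    memcpy(out, s, n);
    out[n] = 0;
    return n + 1;
}

/* Emit the bytes "ntext" would: a one-byte count followed by the characters.
   Assumes the string is short enough for the count to fit in one byte. */
size_t emit_ntext(const char *s, unsigned char *out)
{
    size_t n = strlen(s);
    assert(n <= 255);
    out[0] = (unsigned char)n;
    memcpy(out + 1, s, n);
    return n + 1;
}
```

Calling `emit_textz("1234", buf)` fills `buf` with >31, >32, >33, >34, 0, and `emit_ntext("1234", buf)` fills it with 4, >31, >32, >33, >34, matching the two equivalences quoted above.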

     

     

    I think that's about enough for now.

     

    I believe this is the biggest jump in usefulness yet. I've gone through and tested every instruction, and written several test programs which did semi-interesting things from the compiler's point of view. They were, however, exceptionally dull from a user's point of view. For all the blow-by-blow details, check out my blog.

     

    As a final test of the byte handling code, I built that chess program posted back in December. No problems were seen and no hinky-looking code was generated. In addition, it was about 5% smaller.

     

    The build instructions are listed in post #43, and haven't changed since.

     

    So, let me know what you think,

    gcc-4.4.0-tms9900-1.3-patch.tar.gz

    binutils-2.19.1-tms9900-1.2-patch.tar.gz

    elf2cart.tar.gz

    hello.tar.gz

  19. That chess program was so insane, I needed to see what the compiler would do with it. Aside from a few unexpected fake register accesses (grr! fixed by hand), and failing to handle the giant lookup table (not surprising) it was pretty much drama-free.

     

    The resulting code with -O2 optimizations: 1463 lines of assembly. Are there errors? Could be... But I have no motivation to compare against that C code. At first glance, it looks right.

     

    Here's the readelf dump of the resulting object file:

     

    eric@compaq:~/dev/tios/src/temp$ tms9900-readelf -S 2k_chess.o
    There are 8 section headers, starting at offset 0xefc:
    
    Section Headers:
     [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
     [ 0]                   NULL            00000000 000000 000000 00      0   0  0
     [ 1] .text             PROGBITS        00000000 000034 000e54 00  AX  0   0  2
     [ 2] .rela.text        RELA            00000000 001e70 000da4 0c      6   1  4
     [ 3] .data             PROGBITS        00000000 000e88 000041 00  WA  0   0  2
     [ 4] .bss              NOBITS          00000000 000ec9 000cb2 00  WA  0   0  1
     [ 5] .shstrtab         STRTAB          00000000 000ec9 000031 00      0   0  1
     [ 6] .symtab           SYMTAB          00000000 00103c 000b00 10      7 145  4
     [ 7] .strtab           STRTAB          00000000 001b3c 000333 00      0   0  1
    Key to Flags:
     W (write), A (alloc), X (execute), M (merge), S (strings)
     I (info), L (link order), G (group), x (unknown)
     O (extra OS processing required) o (OS specific), p (processor specific)
    

     

    So that's 3668 bytes of code (.text section), and 3315 bytes of data (.data + .bss sections). Of course that data size is shy about 128 MB or so.
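
Those totals come straight from the Size column of the readelf output, which is in hexadecimal. Here is a quick C check of the arithmetic, with the three sizes copied from the section table above (the function names are just for this example):

```c
#include <assert.h>

/* Sizes copied from the "Size" column of the section table (hexadecimal). */
enum {
    TEXT_SIZE = 0xe54,
    DATA_SIZE = 0x041,
    BSS_SIZE  = 0xcb2
};

/* Bytes of code: just the .text section. */
unsigned code_bytes(void) { return TEXT_SIZE; }

/* Bytes of data: initialized (.data) plus zero-filled (.bss). */
unsigned data_bytes(void) { return DATA_SIZE + BSS_SIZE; }

/* The decimal totals quoted in the post. */
void check_sizes(void)
{
    assert(code_bytes() == 3668);
    assert(data_bytes() == 3315);
}
```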

     

    Still, I'm really impressed with how things are shaping up so far.

  20. I see you've been busy with your own projects, so don't feel bad.

     

    Well, one of the things I've noticed is that unless you compile with the -O2 optimization, the resulting code is terrible and excessively wordy. I think your example code shows that off pretty well.

     

    I've compiled your sample with my in-development compiler, and with the recent changes, it looks a lot better: (comments by me of course)

     

    eric@compaq:~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 tursi2.c
    

     

    pseg
    even	
    
    ****************************
    * void strcpy(char *d, char *s)
    *    R1 = *d
    *    R2 = *s
    ****************************
    def	strcpy
    strcpy
    ai r10, >FFFC     * Allocate four bytes from the stack (s and d)
    mov r10, r8       * Initialize the frame pointer
    mov r1, *r8       * Save d on the stack
    mov r2, @2(r8)    * Save s on the stack
    jmp L2            * Jump to bottom of copy loop
    L3
    mov @2(r8), r1    * Copy source address to R1
    movb *r1, r2      * Copy current character to R2
    mov *r8, r1       * Copy destination address to R1
    movb r2, *r1      * Copy current character to destination
    inc *r8           * Increment destination address
    inc @2(r8)        * Increment source address
    L2
    mov @2(r8), r1    * Copy source address to R1
    movb *r1, r1      * Copy current character to R1
    movb r1, r1       * Compare with zero, is this the terminator?
    jne L3            * If not, jump to top of loop
                             * Else clean up and exit
    c *r10+, *r10+    * Free stack space 
    b *r11            * Return to caller
    
    LC0
    text 'hello world'
    byte 0
    even	
    
    def	main
    main
    ai r10, >FFDE    * Allocate 34 bytes from stack (a and return pointer)
    mov r11, *r10    * Save return pointer
    mov r10, r8      * Initialize frame pointer
    mov r8, r1       * First step for setting destination address for strcpy
    inct r1          * Final step for destination address (d=frame+2)
    li r2, LC0       * Set source address for strcpy
    bl @strcpy       * Call strcpy
    clr r1           * Set return value for main
    mov *r10+, r11   * Restore return pointer
    ai r10, >20      * Free stack space
    b *r11           * Return to caller
    

     

     

    It looks like all your problems have been fixed by the in-development code. Hooray!
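
In the unoptimized listing above, the stack is allocated by adding a negative number written as a 16-bit hex immediate, so `ai r10, >FFDE` subtracts 34 from the stack pointer. The C sketch below shows how to read those immediates as signed values. The helper name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 16-bit immediate as a signed two's complement value. */
int as_signed16(uint16_t imm)
{
    return imm >= 0x8000 ? (int)imm - 0x10000 : (int)imm;
}

/* The immediates used by the AI instructions in the listing. */
void check_stack_adjustments(void)
{
    assert(as_signed16(0xFFFC) == -4);   /* strcpy: four bytes for d and s */
    assert(as_signed16(0xFFDE) == -34);  /* main: a plus the return pointer */
    assert(as_signed16(0x0020) == 32);   /* main epilogue: frees a */
}
```

The remaining two bytes of main's frame are released by `mov *r10+, r11`, which pops the saved return pointer and bumps the stack pointer by two.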

     

    In fact, GCC does assume that there is a crt0 or some other launcher to set the initial stack pointer, initialize memory regions, set workspace location, set interrupts, etc. That code would most likely be written in assembly. I have an implementation I'd share with everyone, but it's not very exciting.

     

    Here's the same code with some optimizations applied. Notice that the result is much better.

     

    eric@compaq:~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 -O2 -Os tursi2.c
    

     

    The strcpy has been inlined in main, but a non-inlined version was left intact:

     

    pseg              * Put code in program segment
    even	          * Start on even address
    
    ****************************
    * void strcpy(char *d, char *s)
    *    R1 = *d
    *    R2 = *s
    ****************************
    def	strcpy
    strcpy
    jmp L2
    L3
    movb r3, *r1+   * Copy current character to destination, increment destination address
    inc r2          * Increment source address
    L2
    movb *r2, r3    * Get next source character, is it the zero terminator?
    jne L3          * If not, go back to top of loop
    b *r11          * Return to caller
    
    
    *****************************
    * Constant string used by main()
    LC0
    text 'hello world'
    byte 0
    even	
    
    *****************************
    * Entry point of program
    
    def	main
    main
    ai r10, >FFE0      * Allocate 32 bytes of stack
    li r1, LC0         * Find source address
    mov r10, r8        * Find destination address (bottom of stack)
    
    ********
    * Inlined strcpy()
    jmp L7             
    L8
    movb r2, *r8+      * Copy current character to destination, increment destination address
    inc r1             * Increment source address
    L7
    movb *r1, r2       * Get next source character, is it the zero terminator?
    jne L8             * If not, go back to top of copy loop
    ********
    
    clr r1             * Set return value
    ai r10, >20        * Free stack space
    b *r11             * Return to caller
    

     

     

    Granted, the strcpy code could be better. More like this:

     

    ****************************
    * void strcpy(char *d, char *s)
    *    R1 = *d
    *    R2 = *s
    ****************************
    def	strcpy
    strcpy
    L1
    movb *r2+, r3   * Get next source character, increment source address
    movb r3, *r1+   * Copy current character to destination, increment destination address
    jne L1          * Was the copied character the zero terminator?
    b *r11          * Return to caller
    

     

    This is four bytes smaller, and I think the smallest implementation possible. But the GCC version isn't too far off.
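
For comparison, the hand-written loop corresponds to the classic C idiom of copying and testing in one expression. This is only a sketch of that idiom, not the source of tursi2.c, and the function is named `copy_string` here to avoid clashing with the C library's strcpy.

```c
#include <assert.h>
#include <string.h>

/* Copy bytes from s to d up to and including the zero terminator. */
void copy_string(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    while ((*d++ = *s++) != '\0') {
        /* the copy and the terminator test happen in the loop condition */
    }
}
```

Each pass through the loop does one load with post-increment, one store with post-increment and one conditional branch, which is the same shape as the three-instruction loop above.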

     

    I think that's probably enough assembly for this post...

     

    A lot of the confusing code you are seeing is a result of the default compile options (which usually result in horribly ugly code). And all your errors have been fixed by some recent changes I've made in the development build.

     

    For everyone who may be wondering what I've been up to, here's a quick update:

     

    I've made some changes to the register allocation to make sure that the byte operations work properly. That seems to work nicely. I'm also trying to redo the division instructions to make it a bit more elegant. The current code requires extra moves in some cases. Ick.

     

    Most notably, I've fixed how memory accesses work. The code above shows the results of that.

     

    I think that after the division stuff is wrapped up, it might be time for another release.

  21. Thanks everyone, I appreciate the support!

     

    I've done assembly optimization before on a lot of machines (x86, ARM, PowerPC), but this is the first time I've done anything with the TMS9900. I got a Mini Memory cart at the tail end of my TI days, but was never able to make much use of it. Actually, I don't even have any of my TI stuff here with me, it's all in a box somewhere in my parents' basement. All the assembly info I have was cobbled together by trawling through forums, Google searches, and the odd technical paper here or there. Of course, after I had everything I needed to start TI work again, somebody posted it all together in a much more concise and convenient form here. (Grr!)

     

    At any rate, even though I've written three (crappy, severely limited) compilers before, and I've used GCC before, this is the first time I've done any actual GCC development. The earlier experience helps some, but mostly, it's been a lot of reading and trial-and-error to get here.

     

    I spent a few months writing a lot of libraries in TMS9900 assembly to get used to the language. And now that I've got a handle on it, I've been reviewing the GCC output and trying to implement any improvements I can find.

     

    Pretty much everything I've done so far has been posted on my blog. The only real stuff missing was the initial data-gathering part, and most of the research, which I can't imagine anyone would want to read anyway.

     

    Short version: the first time I've seen any of this stuff was less than two years ago. (Wow, that seems like a long time now that I've written that down)

     

    So this project is a whole pile of firsts for me. It's taking quite a bit longer than I originally thought, but things seem to be going well.

  22. Well, I haven't been very productive lately, what with Christmas and all, but here's what I've been up to:

     

    I'm still working on libc support, and have a few functions done:

     

    strchr, strpbrk, memchr, strcmp, strrchr

    memcmp, strcpy, strspn, memcpy, strcspn

    strstr, memmove, strlen, strtok, memset

    strncat, strncmp, strcat, strncpy

     

    Here's the GCC features done since the last release:

     

    Add optimisation to remove redundant moves in int-to-char casts

    Remove invalid CB compare immediate instruction

    Add optimisation to restore byte immediate comparison

    Added optimisations for forms like (byte)X=(int)X>>N

    Remove invalid compare immediate with memory

    Improved support for subtract immediate

    Fixed bug causing gibberish in assembly output

    GCC recognises that bit shift operations set the comparison flags

     

    I've also improved the error handling in the GAS assembler, so the messages are more helpful

     

    Stuff I'm working on at the moment:

     

    Make sure that the low bytes of registers are not accessed directly. This happens because I'm lying to GCC. I told it that the TMS9900 actually has 32 8-bit registers, but can only use the even-numbered ones for byte values. Sometimes it gets confused, and tries to use the fake, odd-numbered ones instead. I think I have a fix, but I'm still testing.

     

    I recently tried -O1 optimisation to see what would happen, and the compiler crashed. I'm still trying to figure out why.

     

    I'm planning on doing more libc work, testing, and more optimisations. With any luck, there will be enough changes by January to make another GCC release worthwhile.

  23. I just visited your "Insomnia Labs" blog and was impressed :)

    It makes a very interesting read.

    Thanks, I've always enjoyed reading other people's project blogs, and I was hoping someone would find mine interesting.

     

    For a long time I've made it a point to keep a record of what was done for each day of a project. That makes it easier to review past progress and see if there was some feature or design idea that got overlooked or backed out for some reason. Additionally for this project, there have been periods of a few weeks or so where it had to be put on the back burner. Having a record of what state things were left in is really helpful to get back on track after being away for a while.

     

    Wouldn't it be great to have the vi editor running on the TI-99/4A ?

    Even though I don't see any practical use, that would be very cool :P

     

    hhmm, can hardly think a regexp engine would fit in the TI's memory?

     

    Probably not. It might be fun to try and squeeze one in there though.

     

    By the way, if no one's run across this, check out the Contiki Operating System. This is a multitasking OS with a full IP stack and windowing system, built into 40K of ROM and 2K of RAM.

     

    I'd love to build up to something like that for the TI. But first things first...

  24. Well, it's patch time again.

     

    Here's what made it into this release:

     

    Binutils

    Allow TI-style quotes ('example')

    Allow two-byte character constants for immediate expressions (li r0, 'ab')

    Fix a BFD Makefile bug which prevented clean compilation

     

    GCC

    Fix tms9900_output_ascii, was emitting invalid code when non-text characters were used

    Divide and modulus operations now merged when possible

    Fix data symbol declarations, now TI compliant

    Fix "+=4" form, was missing comma in emitted code

    Fix alignment of code, in some cases it was possible to misalign code by using odd-length string constants

    Fix stack frame load/save differences, was using different locations between function prologue and epilogue in some cases (Thanks Tursi!)

    Save return pointer at bottom of stack. This may help for later stack trace construction

    Add optimizations for compare-and-branch operations with 16-bit values against -2, -1, 0, 1, and 2.

     

    Right now I only have optimizations for equality tests with -2, -1, 1, and 2 done. To get inequality tests, I need to convince GCC to emit tests against the overflow flag. GCC has no concept of this kind of instruction, so I need to play with that a bit more.
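
The post does not say how these comparisons are implemented. One plausible reading, and it is only an assumption, is that the compiler adjusts the value by the constant (for example with DEC or INCT) and then branches on the zero flag. The C sketch below models that in 16-bit arithmetic and shows why the same trick breaks for signed inequality tests unless overflow is taken into account. All function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 16-bit pattern as a signed two's complement value. */
int as_signed16(uint16_t v)
{
    return v >= 0x8000 ? (int)v - 0x10000 : (int)v;
}

/* x - k with 16-bit wraparound, as a register-sized subtract would produce. */
uint16_t sub16(int x, int k)
{
    return (uint16_t)(x - k);
}

/* Equality: x == k exactly when the wrapped difference is zero. */
int equals_const(int x, int k)
{
    return sub16(x, k) == 0;
}

/* Naive "less than": look only at the sign of the wrapped difference.
   This gives the wrong answer when the subtraction overflows. */
int less_than_naive(int x, int k)
{
    return as_signed16(sub16(x, k)) < 0;
}

void check_examples(void)
{
    assert(equals_const(2, 2) && !equals_const(-2, 2));
    assert(less_than_naive(0, 1) == 1);       /* correct: 0 < 1 */
    assert(less_than_naive(-32768, 1) == 0);  /* wrong: the difference wrapped to 32767 */
}
```

Equality survives the wraparound because only one input gives a zero difference. The sign of the difference does not, which is why an inequality test needs the overflow flag as well.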

     

    The other weakness is the divide and modulus instructions. I haven't been able to convince GCC to use convenient registers for the source and destination. This means that in some cases, I need to insert additional MOVs which really shouldn't be necessary. More playing around required here too, I suppose.

     

    I've addressed all the problems Tursi found earlier, plus a few others. Unfortunately, libiberty is not on that list. Since a lot of those routines are OS-specific, and since there is no POSIX-like interface for the TI, these functions are of limited use right now. In the future that might change. (hint, hint)

     

    So here's the build procedure for everything. I've made sure these have been tested several times. There should be no problems following them.

     

     

    Patching the original files:

    $ cd binutils-2.19.1

    $ patch -p1 < binutils-2.19.1-tms9900-1.1.patch

     

    $ cd gcc-4.4.0

    $ patch -p1 < gcc-4.4.0-tms9900-1.2.patch

     

    Building binutils

    $ ./configure --target tms9900 --prefix INSTALLDIR

    $ make all

    $ make install

     

    Building GCC

    $ ./configure --target=tms9900 --prefix=INSTALLDIR --enable-languages=c

    $ make all-gcc

    $ make install

     

    Notice that GCC uses equals after the options, while binutils does not. Kind of annoying and easy to mix up. At this point, you will have all the GNU compilation tools ready to use for TI work. The binary format is ELF, since that stores the extra data needed by the linker and other tools. In earlier posts I've attached code to convert from ELF to TI-cart format. I've also got prototype converters for EA5 and EA3 formats too, but I haven't tested them very much.

     

    When compiling with GCC, I recommend using the -O2 and/or -Os options to reduce the overall code size. Using the default options can result in extra wordy code with unnecessary or duplicate instructions.

     

    There's still quite a bit of work left to do for GCC, so there will be more patches coming. I need to fill out the missing math support for 32-bit values, make sure signed multiply and divide work, and the other stuff mentioned above. I especially want to add more optimizations to the compiled output, but that will come as I get more familiar with what instruction patterns GCC likes to use.

    binutils-2.19.1-tms9900-1.1-patch.tar.gz

    gcc-4.4.0-tms9900-1.2-patch.tar.gz
