Everything posted by insomnia

  1. This is actually a bit involved. Just for some quick background, there are several tools used to compile a binary file: the compiler (GCC), the assembler (GAS), the linker (LD), and a tool to convert the result to a format the TI can use (for now that's just ELF2EA5 or ELF2CART).

     The linker is responsible for defining the address for the code and can use a configuration file to do this. This allows you to assign locations for everything in the output file. There are a lot of options available, and GNU has a pretty good manual at http://ftp.gnu.org/o...ode/ld_toc.html

     There is a "hello world" example for carts back in post #64 of this thread. It doesn't use a link file and instead uses command-line arguments to define locations in the output file. Here's a link file which is equivalent to that example, let's call it linkfile.ld:

         SECTIONS
         {
           . = 0x6000;
           .text : { *(.text); }
           . = 0x2000;
           .data : { *(.data); }
           .bss  : { *(.bss); }
         }

     The dot symbol is the current output location. Assigning to it sets the address of the sections that follow. In this example, all .text sections are linked assuming they will be loaded starting at location 0x6000, and the .data and .bss sections are loaded starting at location 0x2000. There are a lot of ways to write the link file to accomplish the same result. Check the ld manual for more details here.

     This link file would be used by GCC by using the "-T" switch, like this:

         tms9900-gcc -T linkfile.ld ctr0.o main.o -o hello.elf

     Now, the segments you were wondering about are commonly called overlays, and there are a lot of examples which can be found online. None of these will be exactly what you are looking for, but the GNU tools are used for everything, and there are common concepts which would probably be helpful. Here's an example link file using three overlays:

         SECTIONS
         {
           . = 0x2000;
           .data : { *(.data); }
           .bss  : { *(.bss); }
           OVERLAY 0x6000 : AT (0x0000)
           {
             .otext1 { overlay1.o(.text); }
             .otext2 { overlay2.o(.text); }
             .otext3 { overlay3.o(.text); }
           }
         }

     This will cause all .text sections of the overlay files to be at address 0x6000, and they will appear sequentially in the output file. The .data and .bss sections will start at address 0x2000. Here are the sections defined in the output file:

         [email protected]:~/dev/tios/src/test_overlay$ tms9900-readelf -S hello.elf
         There are 11 section headers, starting at offset 0x650:

         Section Headers:
           [Nr] Name          Type      Addr     Off    Size   ES Flg Lk Inf Al
           [ 0]               NULL      00000000 000000 000000 00      0   0  0
           [ 1] .bss          NOBITS    00002000 000800 000006 00  WA  0   0  1
           [ 2] .otext1       PROGBITS  00006000 000200 000006 00  AX  0   0  2
           [ 3] .rela.otext1  RELA      00000000 0008ec 000000 0c      9   2  4
           [ 4] .otext2       PROGBITS  00006000 000400 000006 00  AX  0   0  2
           [ 5] .rela.otext2  RELA      00000000 0008ec 000000 0c      9   4  4
           [ 6] .otext3       PROGBITS  00006000 000600 000006 00  AX  0   0  2
           [ 7] .rela.otext3  RELA      00000000 0008ec 000000 0c      9   6  4
           [ 8] .shstrtab     STRTAB    00000000 000606 000047 00      0   0  1
           [ 9] .symtab       SYMTAB    00000000 000808 0000b0 10     10   5  4
           [10] .strtab       STRTAB    00000000 0008b8 000034 00      0   0  1
         Key to Flags:
           W (write), A (alloc), X (execute), M (merge), S (strings)
           I (info), L (link order), G (group), x (unknown)
           O (extra OS processing required) o (OS specific), p (processor specific)

     An additional tool will then be required to split up the output file into pieces suitable for Classic99 or MESS or actual hardware or whatever. Unfortunately, no one has written a tool to do that yet.

     Finally, GCC can produce position independent code (by using the "-fpic" switch), but that feature is currently broken for the TMS9900. If it were to work, the code would be compiled so it could be loaded to and run from any address. This is probably not what you want.
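To make that last step a little more concrete, here is a rough C sketch of the core of such a splitting tool. It is not an existing utility: the names `overlay_span`, `overlay_spans` and `copy_section` are made up for illustration, and the offsets are copied by hand from the Off and Size columns of the readelf listing above. A real tool would read them from the ELF section headers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File offset and size of one overlay, as shown in readelf's Off/Size columns. */
struct overlay_span {
    size_t offset;
    size_t size;
};

/* Hypothetical table, copied by hand from the listing above. */
static const struct overlay_span overlay_spans[3] = {
    { 0x200, 6 },   /* .otext1 */
    { 0x400, 6 },   /* .otext2 */
    { 0x600, 6 },   /* .otext3 */
};

/*
 * Copy one overlay out of an in-memory copy of the ELF file.
 * Returns 0 on success, -1 if the span does not fit in the image
 * or in the output buffer.
 */
static int copy_section(const uint8_t *image, size_t image_len,
                        struct overlay_span span,
                        uint8_t *out, size_t out_len)
{
    if (span.offset > image_len || span.size > image_len - span.offset)
        return -1;
    if (span.size > out_len)
        return -1;
    memcpy(out, image + span.offset, span.size);
    return 0;
}
```

Each extracted piece would then be written to its own bank or file, since all three overlays are linked to run at 0x6000.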
  2. Glad to hear that. If ever you do find a problem or see an opportunity for improvement, don't hesitate to let me know. I love bug reports!
  3. Matters Computational: Ideas, Algorithms, Source Code http://www.jjj.de/fxt/fxtbook.pdf I found a link to this file while looking through the Atari 2600 programming section. It's a Creative Commons licensed book written by Jorg Arndt giving a brief overview of a huge swath of numerical techniques and theory. This starts with a ton of bit manipulation tricks, then moves into sorting, data structures, graph analysis, and permutations. The author wades deep into number theory with plenty of examples. Each section builds on the one before, so readers shouldn't feel overwhelmed. There is corresponding C code for most topics discussed and an extensive set of references. It's about a thousand pages of math-y goodness, and serves as a great starting point for more in-depth research. There's a section titled "Multiplication of Hypercomplex Numbers", for crying out loud. What's not to love? Enjoy!
  4. Hey everybody, I've got a new set of patches for GCC, with changelog and stuff below. But first, since I've been away for a long time, I figured I should answer some of the questions people have asked so far.

     Retroclouds wanted to know if http://ultra-embedded.com/?fat_filelib would compile. It does, but I had to use include files from my PC since there's no standard C library yet. The resulting code may also need to be massaged for size or be split up into smaller pieces. It's about 21 KB of code and 3 KB of data in total:

         [Nr] Name   Type      Size(Hex)   Size(Dec)
         [ 1] .text  PROGBITS  00005462  = 21602
         [ 3] .data  PROGBITS  00000030  = 48
         [ 4] .bss   NOBITS    00000CCA  = 3274

     Of course, this is missing some other stuff: puts, printf, strncpy, strncmp, memcpy, memset.

     On the subject of LLVM, I've looked into it and it looks interesting, but I don't plan on using it any time soon. The problem I see with it right now is that it's new and mainly oriented toward Intel architectures. This means that it's likely to have structural changes in the near future, and any TMS9900 port may be short-lived. It's also not clear how word-to-byte conversions would work, which has by far been the most challenging part of the GCC port. As has been said earlier, for the code sizes we're dealing with for the TI, any difference in compile speed would be meaningless.

     So, on to the GCC changes:

     • Libgcc now built for TMS9900
     • Implemented optimized assembly functions in libgcc for:
         • count leading zero bits
         • count trailing zero bits
         • find index of least significant set bit
         • count set bits
         • calculate parity bit
         • signed and unsigned 32-bit division and modulus
     • Fixed 32-bit multiplies, was only doing unsigned multiply
     • Fixed 32-bit negation, was emitting invalid NEG instruction
     • Removed fake PC register (Yay!)
     • New build instructions to make libgcc
     • Fixed function prologue and epilogue, was saving R11 unnecessarily
     • Optimized function epilogue, saving a few cycles
     • Enforced correct use of R11 register, was causing randomly broken code

     The main two features here are the addition of libgcc and better handling of R11. Libgcc is needed to complete coverage for 32-bit math. The other functions are frequently used by third party libraries. Later, this will be the proper place to add support for floating-point operations.

     The handling of R11 deserves some discussion here. There were three places where we determined which registers to save on the stack, and they did not all use the same algorithm. This resulted in sometimes saving R11 when it wasn't necessary, or not accounting for that space when values are stored on the stack. This results in corrupt registers and likely crashes.

     Another problem I found (not my fault this time) was that if a function can take advantage of peephole optimization patterns, any register allocated to fulfill that pattern will not be marked as used in the GCC internal tables. In the case where R11 is selected to be used for data, and this condition hits, my port did not know it needed to save the return pointer. At that point you've got another crash when the function tries to return to the caller using a corrupted R11.

     All these problems can pop up based on seemingly insignificant code changes. It all depends on how that code gets implemented, and which registers get allocated along the way. It's also maddening to try to debug. This was fixed by designating R11 for return pointer use only. We lose a general-purpose register, which I'm not crazy about, but at least the generated code will run. If I can figure out what's going on with the peepholes, I should be able to recover R11 for general use again.

     Another biggish change is the build instructions. This was needed for libgcc, and apparently I've been doing it wrong all along. The GCC developers use these instructions, so that's what I'll use too. Unpack and patch as before, and from the top level of the source tree do this:

         $ mkdir build
         $ cd build
         $ ../configure --prefix <path_to_installation_dir> --target=tms9900 --enable-languages=c
         $ make all-gcc all-target-libgcc
         $ make install

     When compiling, GCC will know where to find libgcc and you don't need to do anything to link against it. If anyone's interested, the details for all this plus the sad, depressing and ultimately futile search for a standard C library are on my blog. As always, please post any problems or questions you may have. I'll actually be around to answer them this time.

     gcc-4.4.0-tms9900-1.7-patch.tar.gz
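For readers who have not met the bit-counting helpers listed in the changelog, here is a plain C sketch of what each one computes for a 16-bit value. These are reference versions written for clarity, not the optimized assembly shipped in the patch, and the function names (`clz16`, `ctz16`, `ffs16`, `popcount16`, `parity16`) are invented for this example. Returning 16 for a zero input is a choice made for the sketch.

```c
#include <stdint.h>

/* Count leading zero bits (returns 16 for zero in this sketch). */
static int clz16(uint16_t v)
{
    int n = 0;
    if (v == 0)
        return 16;
    while ((v & 0x8000u) == 0) {
        v = (uint16_t)(v << 1);
        n++;
    }
    return n;
}

/* Count trailing zero bits (returns 16 for zero in this sketch). */
static int ctz16(uint16_t v)
{
    int n = 0;
    if (v == 0)
        return 16;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* One-based index of the least significant set bit, 0 if no bit is set. */
static int ffs16(uint16_t v)
{
    return v ? ctz16(v) + 1 : 0;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static int popcount16(uint16_t v)
{
    int n = 0;
    while (v) {
        v &= (uint16_t)(v - 1);
        n++;
    }
    return n;
}

/* Parity bit: 1 if an odd number of bits are set, otherwise 0. */
static int parity16(uint16_t v)
{
    return popcount16(v) & 1;
}
```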
  5. Well, it's patch time again. First off, an apology to Lucien for not responding earlier, but the short answer for "how to use a single quote?" is "you can't". There was a bug in binutils which prevented its use. I tried to be clever and allow either TI-style or C-style strings in the assembly code, but did a terrible job of it. The parser always treated escaped single quotes as the end of the string, which causes some frustration, to put it mildly. Fortunately, that's all been fixed in the latest patch.

     So here's the official changelog for binutils:

     • Fixed bug prohibiting the use of single quotes in a string
     • Strings may be in either TI-style 'stuff' or C-style "stuff"
     • TI-style strings follow E/A text rules
     • C-style strings may include standard escape codes "example\n"

     And the other things fixed in GCC:

     • Fixed comparison against +-1 and +-2, they got broken in 1.5
     • Prevented incorrect use of fake PC register for real work
     • Improved AND operations to use fewer setup instructions
     • Fixed incorrect long-to-char conversions
     • Fixed post-increment pointers which live on the stack
     • Added optimization for setting byte quantities to zero
     • Added optimization for (int)X = (unsigned char)((int)X)
     • Removed double-counting space for saved registers on the stack
     • Reduced overhead needed for multiply instructions
     • Fixed bug causing structures to be loaded into registers
     • Structures used as function arguments now passed by reference
     • Fixed more bugs causing bad int-to-char conversions

     Work has kept me pretty busy lately, but progress continues. As always, if anyone finds problems or has suggestions, please let me know.

     binutils-2.19.1-tms9900-1.4-patch.tar.gz gcc-4.4.0-tms9900-1.6-patch.tar.gz
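As an illustration of the single-quote problem, here is a small C sketch of decoding a TI-style string. It assumes the E/A convention that two consecutive single quotes inside a string stand for one literal quote. The function name `decode_ti_string` is hypothetical and this is not the code used in binutils; it only shows why a parser must look one character ahead before deciding that a quote ends the string.

```c
#include <stddef.h>

/*
 * Decode a TI-style quoted string such as 'DON''T' into out.
 * Returns the decoded length, or -1 if the string is malformed
 * or does not fit. Anything after the closing quote is ignored.
 */
static int decode_ti_string(const char *src, char *out, size_t out_len)
{
    size_t n = 0;

    if (out_len == 0 || *src++ != '\'')
        return -1;                  /* must open with a single quote */

    for (;;) {
        char c = *src++;
        if (c == '\0')
            return -1;              /* ran off the end: no closing quote */
        if (c == '\'') {
            if (*src != '\'')
                break;              /* a lone quote closes the string */
            src++;                  /* doubled quote: keep just one */
        }
        if (n + 1 >= out_len)
            return -1;              /* leave room for the terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return (int)n;
}
```

A parser that stops at the first quote it sees would return DON for the example in the comment, which matches the symptom described above.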
  6. The example was written assuming the values are currently in registers, and doesn't use correct C syntax. The idea was to use shorthand and C-like pseudocode to get the idea across quickly. A real-life example would look something like this:

         void do_something()
         {
           char x;
           int y;
           ...
           y=((int)x)<<4;
           ...
         }

     You're right, X must be a value between >00 and >FF, and if the X value is in memory, it need not occupy a full word. Once copied into a register (using MOVB or something) the value will be stored in the high byte. What you wrote would be true for unsigned values, but not for signed ones.

         >FFxx in a register can be interpreted as either (char)(-1) or (unsigned char)(255)
         >FFFF is (int)(-1)
         >00FF would be (int)(255)

     There are optimizations for both of these, but I only used an example for signed values since the timings are the same and only differ by the SRA or SRL instruction. The initial implementation would produce this code:

         * Assume x has a value of -4 (>FC), and is stored in r2 as >FCxx
         * y = (-4)<<4 = -4 * 16 = -64 = >FFC0
           sra r2, 8        * Convert to signed integer (r2=FFFC)
           sla r2, 4        * Left shift converted value (r2=FFC0)

     The optimization emits this code:

         * Assume x has a value of -4 (>FC), and is stored in r2 as >FCxx
         * y = (-4)<<4 = -4 * 16 = -64 = >FFC0
           sra r2, 4        * Shift into final position (r2=FFCx)
           andi r2, >FFF0   * Mask unknown bits (r2=FFC0)

     And finally, for unsigned values:

         * Assume x has a value of 252 (>FC), and is stored in r2 as >FCxx
         * y = 252<<4 = 252 * 16 = 4032 = >0FC0
           srl r2, 4        * Shift into final position (r2=0FCx)
           andi r2, >FFF0   * Mask unknown bits (r2=0FC0)

     Since fewer bit shifts are required, the optimized code runs faster (I figure about 33% faster on average), but uses one additional code word.
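The equivalence of the two signed sequences can be checked on a PC with a small C model of the register operations. This is only a sketch: `sra16`, `shift_naive` and `shift_optimized` are names invented here, and the model generalizes the shift count to any n from 1 to 7 as an assumption about how the pattern extends beyond the n = 4 case shown above.

```c
#include <assert.h>
#include <stdint.h>

/* SRA: arithmetic right shift of a 16-bit register by n (1..15). */
static uint16_t sra16(uint16_t v, unsigned n)
{
    uint16_t r = (uint16_t)(v >> n);
    if (v & 0x8000u)
        r |= (uint16_t)(0xFFFFu << (16 - n));   /* replicate the sign bit */
    return r;
}

/* Initial code: "sra r2, 8" followed by "sla r2, n". */
static uint16_t shift_naive(uint16_t reg, unsigned n)
{
    return (uint16_t)(sra16(reg, 8) << n);
}

/* Optimized code: "sra r2, 8-n" followed by "andi" to clear the unknown low bits. */
static uint16_t shift_optimized(uint16_t reg, unsigned n)
{
    assert(n >= 1 && n <= 7);
    return (uint16_t)(sra16(reg, 8 - n) & (0xFFFFu << n));
}
```

The low byte of the register never reaches the result in either version: the first sequence shifts it out entirely, and the second masks off whatever part of it survives the shorter shift.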
  7. Now that the compiler has had a lot of fixes applied, I can replace some of the workarounds with the code which was originally intended. This makes for more compact and much easier to understand code. My intent was just to undo the damage which was forced upon this code by the problems in the compiler. I've also added a rush.dsk disk image (containing an EA5 file named RUSH) to make it easier to try the resulting program. Hopefully this will make Lucien feel better about making the source code available. RUSH_HOUR4.zip
  8. Well, after a really long time without any signs of life from this project, it's patch time. A big "thank you" goes out to Lucien. A lot of the updates here are a direct result of the effort he put into making Rush Hour. He did a great job wading through all the brokenness to make a functional game. Now it's time for everyone to benefit from that work.

     New Binutils fixes in this release:

     • STST was incorrectly looking for two arguments
     • SBO, SBZ and TB incorrectly using constants
     • EQU'ed symbols sometimes replaced using wrong endianness

     GCC fixes:

     • Fixed several word-to-byte conversion errors
     • Fixed "unrecognizable instruction" for zero comparison operations
     • Made optimizations for most comparison operations
     • Improved correctness of condition flag handling
     • Switch statements now work properly
     • Fixed division and modulus, operands were used in wrong order
     • Fixed subtract, operands were occasionally used in wrong order
     • Fixed stack frame corruption when local variables are in use
     • Added optimizations for forms like (int Y)=((int)(char X))<<N

     The patch and build procedures are the same as always. Development notes are on my blog for those who are interested.

     Things are shaping up pretty well so far. (Yes it is taking forever, sorry about that.) I don't see any obvious holes to fill, or optimizations yet to do. At this point, I just need to exercise the compiler with larger programs and increase test coverage. If anyone finds a problem, or sees an area where improvements can be made, please let me know. I'm continuing to work on related projects (disk management tool, libc library, documentation). There's still lots to do, so these updates will keep coming.

     binutils-2.19.1-tms9900-1.3-patch.tar.gz gcc-4.4.0-tms9900-1.5-patch.tar.gz
  9. I do my development against MESS on a Linux machine, using GAS as my assembler. All code is written in Gedit (which is an equivalent to Notepad). I started doing development without any idea what tools were out there, so I've written them all myself (assembler, disassembler, disk tools, font editor, etc.). However, I replaced my simple assembler with GAS as soon as I could. I don't use any debugging tools at all, although that would make my life much easier. Since I didn't have anything close to the comprehensive documentation now available here, I've had to reverse engineer some cartridges, disk images and a few of Sometimes's demos to better understand how the TI works. I have a real machine packed away, but for now I'm content doing everything in MESS. This is probably the most difficult way possible to develop TI code, but I thought I'd share.
  10. Great job! That's just too darn cool. Unfortunately, I see you've had to put in a lot of effort to work around problems in the compiler. I'm sorry about that. You've also pointed out how badly a library of basic functions is needed (display stuff, keyboard support, file IO, sound), but it may be a while before I have a chance to get to that. First things first. Let me see if I can get rid of the problems you've found so far.
  11. Actually, the code that you found was in the disassembler, which isn't much help. What you want is line 722 of binutils-2.19.1/gas/config/tc-tms9900.c Change that line to { "stst", 0x02C0, {ARG_REGISTER, ARG_NONE}}, and you are back in business. The error you were seeing was due to the fact that this instruction was falsely insisting upon a second argument for STST. Something like this would have made it happy: stst r0, >0000 It also seems like there is a problem with the SBO, SBZ and TB instructions. During the assembly process, the bit offsets are being reduced by half. They are currently using constants in the same way as JMP, which is wrong. I need to add a new constant type for these CRU instructions for correct operation. At least LDCR and STCR look right. I haven't had a chance to look at your other issues yet, but I should have some answers for you tomorrow.
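The difference between the two kinds of 8-bit operand field can be sketched in C. This is an illustration only, not the binutils source: the helper names are invented, and the jump field is shown simply as a byte distance halved, leaving out how the distance is measured relative to the program counter. The base opcodes >1D00, >1E00 and >1F00 are the TMS9900 encodings for SBO, SBZ and TB.

```c
#include <stdint.h>

/* Jump-style field: a byte distance stored as a signed count of words. */
static uint8_t jump_field(int byte_offset)
{
    return (uint8_t)((byte_offset / 2) & 0xFF);
}

/* CRU-style field for SBO, SBZ and TB: the signed bit offset, stored unchanged. */
static uint8_t cru_field(int bit_offset)
{
    return (uint8_t)(bit_offset & 0xFF);
}

/* Assemble a single-bit CRU instruction from its base opcode and bit offset. */
static uint16_t encode_cru(uint16_t base_opcode, int bit_offset)
{
    return (uint16_t)(base_opcode | cru_field(bit_offset));
}
```

With these helpers, "sbo 6" should assemble to >1D06. Running the operand through `jump_field` instead gives 3, which is the halving described above.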
  12. Gah! I loved that game when I was a kid. Maybe it's just the nostalgia talking, or the fact that it was one of only a handful of TI games I had, or maybe even the fact that I haven't seen it in almost thirty years, but I'm still convinced that Wizard's Dominion is super-cool. Take your "facts" and "informed opinion" and get outta here. I'll be standing over there with my giant run-on sentences, thinking about how awesome stuff I barely remember is...
  13. OK, it's patch time again. This patch includes the fixes for all bugs mentioned here since the last patch release in addition to a few I found on my own. The same patch and build directions used before are used for this one too.

     Here are the changes in this release:

     • Fixed a bug with byte initializers, it was handling negative values wrongly
     • Fixed multiply bug, it was using the wrong registers
     • Changed frame pointer from R8 to R9. Frame was being lost
     • Byte reads from memory were assumed to be copied into register's LSB.
     • Fixed a problem with AND improperly modifying input values.
     • Fixed a bug where R11 was not saved if used as a data register.
     • Modified output to use hex values for all constants and addresses

     I've also packaged up an ELF to EA5 converter and an example program made to run as an EA5 image. The program does the same useless flashing text thing that the cart example did. This was done to make the differences easier to spot. The crt0 used for the EA5 version is a bit safer than the one used in the cart version (this one better handles zero size sections). I'll probably release a new version of the cart tool and example sometime soon which incorporates these changes.

     The next thing on my list is to update all the documentation. Everything I've posted so far is still valid, but there are probably holes where some subjects need more description. I also need to put together a library for the missing 32-bit functions (multiply, divide, modulus, shift). These functions are already written and tested for the most part, so releasing them should be quick and easy.

     Finally, I need to make my V9T9 disk management tool ready for public consumption. It currently works, and the disk images it creates were used to test the EA5 converter, but it's super hacky at the moment. Once I spruce it up a bit and turn it into a useful tool, I can send it out the door. As always, the gory details are on my blog for those who are interested.

     gcc-4.4.0-tms9900-1.4-patch.tar.gz elf2ea5.tar.gz hello_ea5.tar.gz
  14. Well, that's funky. A few things you should know:

     By using inline assembly to call "kscan", you are preventing GCC from knowing that it has to retain the "key" pointer stored in R1 and the return pointer in R11. If "kscan" were to use these registers, the values stored there would be destroyed, and the C code would behave weirdly. That can be fixed by doing something like this:

         extern void kscan();
         int key_scan(char* key)
         {
           KEY_UNIT=5;
           kscan();
           *key=KEY;
           char c=STATUS&0x20;
           return c;
         }

     That results in this assembly:

                 def key_scan
         key_scan
                 ai r10, >FFFC
                 mov r11, *r10
                 mov r9, @2(r10)
                 mov r1, r9
                 li r1, >500
                 movb r1, @-31884
                 bl @kscan
                 movb @-31883, *r9
                 movb @-31876, r1
                 andi r1, >2000
                 sra r1, 8
                 mov *r10+, r11
                 mov *r10+, r9
                 b *r11
                 ref kscan

     The different results you are seeing in the return values are due to GCC removing a sign extension on the read value of the STATUS macro. During the register allocation step, GCC is trying to be helpful and removing what it thinks is an unnecessary operation, since it believes that byte quantities are already stored in the least significant byte of the register. I missed this when I was fixing things for the weird byte format. By using the temporary variable, you are forcing the change in data types. I need to go take a hard look at the compiler and find a fix for this. (The decimal addresses are also annoying and unhelpful. That increment in "mov *r10+, r9" should probably be removed too. Looks like I'll be busy for a while.)

     By the way, I think TI chose to store bytes this way to allow the comparison logic to work for word and byte values. This would also remove the need for sign extend and zero extend logic. It also opens up the possibility of storing data in the otherwise unused low byte. I don't have anything to back up this reasoning, but it seems to make sense. In any event, no matter why this decision was made, we're stuck with it now.
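The guess about comparison logic can at least be checked arithmetically. The C sketch below models a byte sitting in the high half of a 16-bit register and shows that an ordinary signed word compare then orders values the same way a signed byte compare would. The helper names are made up, and the model clears the low byte, which real code cannot always assume. It demonstrates the property only and says nothing about TI's actual design reasons.

```c
#include <stdint.h>

/* A byte as it sits in a register after MOVB; the low byte is cleared in this model. */
static uint16_t byte_in_reg(uint8_t b)
{
    return (uint16_t)(b << 8);
}

/* Signed 16-bit compare, returning -1, 0 or 1. Flipping the sign bit maps
   signed order onto unsigned order, which keeps the C code portable. */
static int cmp_word_signed(uint16_t a, uint16_t b)
{
    uint16_t ka = (uint16_t)(a ^ 0x8000u);
    uint16_t kb = (uint16_t)(b ^ 0x8000u);
    return (ka > kb) - (ka < kb);
}

/* Signed 8-bit compare, using the same sign-bit trick. */
static int cmp_byte_signed(uint8_t a, uint8_t b)
{
    uint8_t ka = (uint8_t)(a ^ 0x80u);
    uint8_t kb = (uint8_t)(b ^ 0x80u);
    return (ka > kb) - (ka < kb);
}
```

Since a byte held in the high half is just the byte value times 256, ordering and sign are preserved without any extension step.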
  15. Ah, man you're gonna make me cry over here. I'd suggest that one could make buggy, hole-ridden code in any language. And in my experience, most crappy code is born of lazy or rushed programmers. It's so much simpler and faster to slap on a band-aid than to comprehensively understand how every line of code works. This is especially true for larger projects (operating systems come to mind). In general, testing effort and risk for bugs rise exponentially with code size. This would be true for any language. OK, if I'm being fair, C does have a higher bugs-per-KLOC ratio than many other languages, but I think that's a reasonable trade for the flexibility it provides. I wouldn't count on any language to magically make programming simple and foolproof. The design of any language involves compromises. That being said, Forth is pretty cool, and it's amazing how much functionality you can get out of such a simple idea. I would honestly love to see an operating system written in Forth. Writing a game of this complexity is pretty darn impressive too. Tons of talent and dedication are required to pull something like that off.
  16. OK, the multiply bug was more involved than the earlier ones, but here you go:

     In gcc-4.4.0/gcc/config/tms9900/tms9900.md, remove lines 1455 through 1484 (the "mulhisi3" and "*multhisi" patterns). Replace them with this:

         (define_insn "mulhisi3"
           [(set (match_operand:SI 0 "register_operand" "=r,r")
                 (mult:SI (match_operand:HI 1 "register_operand" "r,r")
                          (match_operand:HI 2 "general_operand" "rR>,Q")))]
           ""
           {
             /* When both input operands are registers, we may need to swap them. */
             if(REG_P(operands[1]) && REG_P(operands[2]))
             {
               /* Check for forms like: r0 = r1 * r0 */
               if(REGNO(operands[0]) == REGNO(operands[2]))
               {
                 /* Swap operands, otherwise we will emit code like:
                      mov r1, r0
                      mpy r0, r0
                    instead of:
                      mpy r1, r0 */
                 rtx temp = operands[1];
                 operands[1] = operands[2];
                 operands[2] = temp;
               }
             }
             if(REGNO(operands[0]) != REGNO(operands[1]))
             {
               output_asm_insn("mov %1, %0", operands);
             }
             output_asm_insn("mpy %2, %0", operands);
             return("");
           }
           [(set_attr "length" "2,3")])

     The original code was an attempt to force the register allocator to use registers which were most convenient for the MPY instruction. Obviously, that didn't work out so well. This new code is less aggressive, accepting any register choice GCC may make. It also works in all optimization levels. An optional MOV instruction is now used to prepare for the multiply if the register allocator is not kind.

     Moving on to questions...

     GCC will attempt to make maximum use of all 16 registers, so there's no guarantee that there are any lying around unused. If you have an assembly routine you would like to interface with C code, the information needed for that should be shown in earlier posts. If you would like more detailed information (calling convention, register usage and allocation order, etc.) I'd be happy to let you know. I have a document I've been neglecting which should include all this stuff.

     GCC (or any compiler) is basically an engine to convert source code into assembly. That means you have complete control over what memory is used, and for what purpose. So unlike Basic, Java or Forth (I presume), there is no other code working behind the scenes you need to be aware of. All of the machine's resources are available to you, and any other limitations are of your own making.

     In the example code I've posted earlier, the registers are located at >8300 by the crt0 code. Except for what's used by the registers, all of scratchpad memory is available. If you wanted to, you could put the registers elsewhere in scratchpad or 8-bit memory with no impact on the C code. You can store data anywhere in the system you like (like Lucien did in his bricks code). Using the linker, you can build your code to run from anywhere in memory. You could even put (small bits of) code into scratchpad and run from there for that extra performance boost (as I believe was done in Parsec). Remember, C was originally designed to write device drivers and operating systems, so the sky's the limit here.
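The operand-swapping logic in the multiply pattern can be tried out in isolation with a small host-side C model. This is a simplified sketch, not compiler code: `plan_multiply` is a made-up name, it only handles the case where all three operands are registers, and it writes the instruction text into a buffer rather than calling GCC's output routines.

```c
#include <stdio.h>

/*
 * Write the instructions for "dst = a * b" (register numbers 0..15) into out,
 * one per line, and return how many were emitted, or -1 for a bad register.
 */
static int plan_multiply(int dst, int a, int b, char out[32])
{
    int len = 0;
    int count = 0;

    if (dst < 0 || dst > 15 || a < 0 || a > 15 || b < 0 || b > 15)
        return -1;

    if (dst == b) {             /* forms like r0 = r1 * r0: swap the inputs */
        int temp = a;
        a = b;
        b = temp;
    }
    if (dst != a) {             /* first input is not in the destination yet */
        len = snprintf(out, 32, "mov r%d, r%d\n", a, dst);
        count++;
    }
    snprintf(out + len, (size_t)(32 - len), "mpy r%d, r%d\n", b, dst);
    return count + 1;
}
```

For example, `plan_multiply(0, 1, 0, buf)` yields the single line "mpy r1, r0", while `plan_multiply(0, 1, 2, buf)` needs the extra MOV first.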
  17. Thanks for the bug report, and keep 'em coming. It is embarrassing to have my mistakes up where everyone can see them, but each bug found and fixed makes for a more capable tool for everyone. So again, thanks.

     I made an optimization pattern to convert a sequence like:

         li r2, >AA00
         movb r2, *r1
         li r2, >BB00
         movb r2, *r1

     to one like this:

         li r2, >AABB
         movb r2, *r1
         swpb r2
         movb r2, *r1

     but I forgot to mask off the sign-extended bits of the constant in the lower byte. As a result, when the two values are ORed together, the byte stored in the upper byte is lost.

         >FFBB | >AA00 = >FFBB

     He also found another problem using -O0, which I don't usually use. The register I chose for the frame pointer, R8, is volatile. That means that it can be destroyed over a function call. The frame pointer is used as the base to locate local variables which live on the stack. If this is destroyed after a function call, the code following that call can behave unpredictably. This was just a dumb mistake. In order to preserve the ABI, I've moved the frame pointer to R9, which is preserved across function calls. The resulting code looks much safer now.

     I'll include these fixes in the next patch, but for the impatient here's how to fix these problems now:

     Change gcc-4.4.0/gcc/config/tms9900.md, line 2423 (near "*movhi_combine_consts") to look like this:

         operands[1] = GEN_INT(((INTVAL(operands[1]) & 0xFF) << 8) | (INTVAL(operands[3]) & 0xFF));

     Change gcc-4.4.0/gcc/config/tms9900.h, line 517 to look like this:

         #define FRAME_POINTER_REGNUM HARD_R9_REGNUM

     And gcc-4.4.0/gcc/config/tms9900.h, line 523 to look like this:

         #define STATIC_CHAIN_REGNUM HARD_R9_REGNUM

     By the way, I think there's a problem where the stack frame is improperly sized when using -O0. This seems to be the result of the order of operations in GCC. The size of the frame is not yet known when the code to build the function prologue is called. That's on my todo list.
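The masking bug is easy to reproduce on a PC. In this C sketch the two byte constants arrive as sign-extended integers, which is an assumption about how they looked inside the compiler. The function names are invented, and the unmasked version is written to reproduce the >FFBB result quoted above rather than to mirror the original source line.

```c
#include <stdint.h>

/* Buggy combination: the low constant still carries its sign extension. */
static uint16_t combine_unmasked(int high, int low)
{
    return (uint16_t)(((unsigned)high << 8) | (unsigned)low);
}

/* Fixed combination: both constants are masked to eight bits first. */
static uint16_t combine_masked(int high, int low)
{
    return (uint16_t)((((unsigned)high & 0xFFu) << 8) | ((unsigned)low & 0xFFu));
}
```

With a high byte of >AA and a low byte of >BB held as -69, the unmasked version returns >FFBB and the masked version returns >AABB.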
  18. Lucien, I use Linux for all my development and testing, and I don't have Cygwin installed on my Windows box, but I'll see what I can do for you right now.

     From your logs, it looks like you are building from /gcc/lib/gcc/tms9900/4.4.0/, which seems an odd location. I also see lines referencing /home/-/gcc-4.4.0/host-i686-pc-cygwin/gcc/xgcc, which suggests you are using user "-", and that doesn't seem right either. These things might be messing up the build or install process if you don't have write or execute permissions in these directories for some reason.

     But it looks like GCC was built properly, and the GCC docs were built as well. You can confirm that by looking for a file named cc1 at:

         /gcc/lib/gcc/tms9900/4.4.0/host-i686-pc-cygwin/gcc/cc1

     You can check to see that it works by going to that directory and doing this:

         $ echo "void test() {}" > test.c
         $ ./cc1 test.c
          test
         Analyzing compilation unit
         Performing interprocedural optimizations
          <visibility> <early_local_cleanups> <summary generate> <inline>Assembling functions:
          test
         Execution times (seconds)
          parser : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 (100%) wall 34 kB ( 7%) ggc
          TOTAL  : 0.00           0.00           0.01             498 kB
         $ cat test.s
                 pseg
                 even
                 def test
         test
                 ai r10, >0
                 mov r10, r8
                 b *r11

     If cc1 is working, you may be having an installation problem. I'd recommend running "make distclean" and re-running the build process again (configure, make-all, make install).

     If it helps, I've included a copy of the output on my machine for these steps. For these logs, I've followed the directions posted above in a new directory. I'm using /home/eric/dev/tios/toolchain/WORKSPACE/temp/gcc-4.4.0 for my build location, and /home/eric/dev/tios/toolchain/WORKSPACE/temp/bin for my install location. If you're still having problems, make similar copies of your output for these steps and send it my way.

     gcc_configure.txt gcc_build.txt gcc_install.txt
  19. Update time! It's about six months later than promised, but I haven't given up yet. Most of that time went into putting in a ton of hours at work and beating on the GCC code to get byte operations working properly. What's in this release is the fourth or fifth overhaul of the port. In the end I had to rewrite core bits of how GCC relates byte and word quantities. I've kept those changes to a minimum, so ports to later versions should still work.

      Here's what got changed in this patch:
      • Add optimization to remove redundant moves in int-to-char casts
      • Remove invalid CB compare immediate mode
      • Add optimizations for byte immediate comparison
      • Added optimizations for shift and cast forms like (byte)X=(int)X>>N
      • Remove invalid compare immediate with memory
      • Improved support for subtract immediate
      • Fixed bug causing gibberish in assembly output
      • GCC now recognizes that bit shift operations set the comparison flags
      • Fixed bug causing bytewise AND to operate on the wrong byte
      • Add optimization for loading byte arrays into memory
      • Confirmed that variadic functions work properly
      • Fixed the subtract instruction to handle constants
      • Fixed the CI instruction, it was allowing memory operands
      • Fixed a bug allowing the fake PC register to be used as a real register
      • Encourage memory-to-memory copies instead of mem-reg-mem copies
      • Added optimization to eliminate INV-INV-SZC sequences
      • Modify GCC's register allocation engine to handle TMS9900 byte values
      • Remove the 32 fake 8-bit registers. GCC now uses 16 16-bit registers
      • Modify memory addressing to handle forms like @LABEL+CONSTANT(Rn)
      • Clean up output assembly by vertically aligning operands
      • Clean up output by combining constant expressions
      • Optimize left shift byte quantities
      • Fixed a bug where SZC used the wrong register
      • Removed C instruction for "+=4" forms, AI is twice as fast
      • Added 32-bit negate
      • Fixed 32-bit subtract
      • Fixed a bug causing MUL to use the wrong register
      • Fixed a bug allowing shifts to use shift counts in the wrong register
      • Confirmed that inline assembly works correctly
      • Added optimization to convert "ANDI Rn, >00FF" to "SB Rn,Rn"
      • Optimize compare-with-zero instructions by using a temp register
      • Fixed a bug allowing *Rn and *Rn+ memory modes to be confused
      • Removed most warnings from the build process

      There were also changes made to binutils. I hope this will be the last update for it:
      • More meaningful error messages from the assembler
      • DATA and BYTE constructs with no value did not allocate space
      • Fix core dump in tms9900-objdump during disassembly

      The ELF conversion utility was also updated to allow crt0 to properly set memory before the C code executes. If it finds a "_init_data" label in the ELF file, it will fill out a record with all the information crt0 needs to do the initialization.

      In light of all these changes, I've made a new "hello world" program with lots of comments, a Makefile and all supporting files. I've also included the compiled .o, .elf, and converted cart image. In addition, there's a hello.s file which is the assembly output from the compiler.

      I'm not sure if I mentioned this earlier, but the tms9900-as assembler accepts TI-syntax assembly files, with a number of additions:
      • Added "or", "orb" aliases for "soc" and "socb" (that's been a gotcha for several people here)
      • Added "textz" directive. This appends a zero byte to the data. "textz '1234'" is equivalent to "byte >31, >32, >33, >34, 0"
      • Added "ntext" directive. This prepends the byte count to the data. "ntext '1234'" is equivalent to "byte 4, >31, >32, >33, >34"
      • Added "string" variants to all "text" directives
      • No length limit for label names
      • No limitation for constant calculations, all operations are allowed (xor, and, or, shifts, etc.)

      I think that's about enough for now. I believe this is the biggest jump in usefulness yet. I've gone through and tested every instruction, and written several test programs which did semi-interesting things from the compiler's point of view. They were, however, exceptionally dull from a user's point of view. For all the blow-by-blow details, check out my blog.

      As a final test of the byte handling code, I built that chess program posted back in December. No problems were seen and no hinky-looking code was generated. In addition, it was about 5% smaller.

      The build instructions are listed in post #43, and haven't changed since.

      So, let me know what you think.

      gcc-4.4.0-tms9900-1.3-patch.tar.gz
      binutils-2.19.1-tms9900-1.2-patch.tar.gz
      elf2cart.tar.gz
      hello.tar.gz
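The byte layouts produced by the "textz" and "ntext" directives described above can be mimicked in C. This is only an illustrative sketch: emit_textz and emit_ntext are hypothetical helper names, not part of the assembler or the toolchain, and the one-byte count limit is an assumption based on the "byte 4, ..." example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mimic "textz" by writing the characters of s
   followed by a single zero byte. Returns the number of bytes written. */
size_t emit_textz(const char *s, unsigned char *out)
{
    size_t len = strlen(s);
    memcpy(out, s, len);
    out[len] = 0;
    return len + 1;
}

/* Hypothetical helper: mimic "ntext" by writing a one-byte count
   followed by the characters of s. Returns the number of bytes written. */
size_t emit_ntext(const char *s, unsigned char *out)
{
    size_t len = strlen(s);
    assert(len <= 255);  /* assumption: the count must fit in one byte */
    out[0] = (unsigned char)len;
    memcpy(out + 1, s, len);
    return len + 1;
}
```

Calling either helper with "1234" and a buffer of at least five bytes gives the same byte sequences quoted in the list: the characters >31 to >34 with a trailing zero for textz, or a leading count of 4 for ntext.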
  20. That chess program was so insane, I needed to see what the compiler would do with it. Aside from a few unexpected fake register accesses (grr! fixed by hand), and failing to handle the giant lookup table (not surprising), it was pretty much drama-free.

      The resulting code with -O2 optimizations: 1463 lines of assembly. Are there errors? Could be... But I have no motivation to compare against that C code. At first glance, it looks right.

      Here's the readelf dump of the resulting object file:

      ~/dev/tios/src/temp$ tms9900-readelf -S 2k_chess.o
      There are 8 section headers, starting at offset 0xefc:

      Section Headers:
        [Nr] Name       Type      Addr     Off    Size   ES Flg Lk Inf Al
        [ 0]            NULL      00000000 000000 000000 00      0   0  0
        [ 1] .text      PROGBITS  00000000 000034 000e54 00  AX  0   0  2
        [ 2] .rela.text RELA      00000000 001e70 000da4 0c      6   1  4
        [ 3] .data      PROGBITS  00000000 000e88 000041 00  WA  0   0  2
        [ 4] .bss       NOBITS    00000000 000ec9 000cb2 00  WA  0   0  1
        [ 5] .shstrtab  STRTAB    00000000 000ec9 000031 00      0   0  1
        [ 6] .symtab    SYMTAB    00000000 00103c 000b00 10      7 145  4
        [ 7] .strtab    STRTAB    00000000 001b3c 000333 00      0   0  1
      Key to Flags:
        W (write), A (alloc), X (execute), M (merge), S (strings)
        I (info), L (link order), G (group), x (unknown)
        O (extra OS processing required) o (OS specific), p (processor specific)

      So that's 3668 bytes of code (.text section), and 3315 bytes of data (.data + .bss sections). Of course, that data size is shy by about 128 MB or so.

      Still, I'm really impressed with how things are shaping up so far.
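The byte counts quoted above come straight from the hexadecimal Size column of the readelf output. As a small sketch of that arithmetic, the following converts the column values to decimal; section_size is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert a hexadecimal Size field, as printed by
   readelf -S, into a byte count. */
unsigned long section_size(const char *hex_field)
{
    assert(hex_field != NULL);
    return strtoul(hex_field, NULL, 16);
}

/* Code size is the .text section alone. */
unsigned long chess_code_bytes(void)
{
    return section_size("000e54");
}

/* Data size is .data plus .bss. */
unsigned long chess_data_bytes(void)
{
    return section_size("000041") + section_size("000cb2");
}
```

The first function yields the 3668 bytes of code, and the second yields 65 + 3250 = 3315 bytes of data.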
  21. I see you've been busy with your own projects, so don't feel bad.

      Well, one of the things I've noticed is that unless you compile with the -O2 optimization, the resulting code is terrible and excessively wordy. I think your example code shows that off pretty well. I've compiled your sample with my in-development compiler, and with the recent changes, it looks a lot better: (comments by me of course)

      ~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 tursi2.c

             pseg
             even
      ****************************
      * void strcpy(char *d, char *s)
      * R1 = *d
      * R2 = *s
      ****************************
             def  strcpy
      strcpy
             ai   r10, >FFFC      * Allocate four bytes from the stack (s and d)
             mov  r10, r8         * Initialize the frame pointer
             mov  r1, *r8         * Save d on the stack
             mov  r2, @2(r8)      * Save s on the stack
             jmp  L2              * Jump to bottom of copy loop
      L3     mov  @2(r8), r1      * Copy source address to R1
             movb *r1, r2         * Copy current character to R2
             mov  *r8, r1         * Copy destination address to R1
             movb r2, *r1         * Copy current character to destination
             inc  *r8             * Increment destination address
             inc  @2(r8)          * Increment source address
      L2     mov  @2(r8), r1      * Copy source address to R1
             movb *r1, r1         * Copy current character to R1
             movb r1, r1          * Compare with zero, is this the terminator?
             jne  L3              * If not, jump to top of loop
                                  * Else clean up and exit
             c    *r10+, *r10+    * Free stack space
             b    *r11            * Return to caller

      LC0    text 'hello world'
             byte 0
             even

             def  main
      main
             ai   r10, >FFDE      * Allocate 34 bytes from stack (a and return pointer)
             mov  r11, *r10       * Save return pointer
             mov  r10, r8         * Initialize frame pointer
             mov  r8, r1          * First step for setting destination address for strcpy
             inct r1              * Final step for destination address (d=frame+2)
             li   r2, LC0         * Set source address for strcpy
             bl   @strcpy         * Call strcpy
             clr  r1              * Set return value for main
             mov  *r10+, r11      * Restore return pointer
             ai   r10, >20        * Free stack space
             b    *r11            * Return to caller

      It looks like all your problems have been fixed by the in-development code. Hooray!

      In fact, GCC does assume that there is a crt0 or some other launcher to set the initial stack pointer, initialize memory regions, set workspace location, set interrupts, etc. That code would most likely be written in assembly. I have an implementation I'd share with everyone, but it's not very exciting.

      Here's the same code with some optimizations applied; notice that the result is much better.

      ~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 -O2 -Os tursi2.c

      The strcpy has been inlined in main, but a non-inlined version was left intact:

             pseg                 * Put code in program segment
             even                 * Start on even address
      ****************************
      * void strcpy(char *d, char *s)
      * R1 = *d
      * R2 = *s
      ****************************
             def  strcpy
      strcpy
             jmp  L2
      L3     movb r3, *r1+        * Copy current character to destination, increment destination address
             inc  r2              * Increment source address
      L2     movb *r2, r3         * Get next source character, is it the zero terminator?
             jne  L3              * If not, go back to top of loop
             b    *r11            * Return to caller

      *****************************
      * Constant string used by main()
      LC0    text 'hello world'
             byte 0
             even

      *****************************
      * Entry point of program
             def  main
      main
             ai   r10, >FFE0      * Allocate 32 bytes of stack
             li   r1, LC0         * Find source address
             mov  r10, r8         * Find destination address (bottom of stack)
      ********
      * Inlined strcpy()
             jmp  L7
      L8     movb r2, *r8+        * Copy current character to destination, increment destination address
             inc  r1              * Increment source address
      L7     movb *r1, r2         * Get next source character, is it the zero terminator?
             jne  L8              * If not, go back to top of copy loop
      ********
             clr  r1              * Set return value
             ai   r10, >20        * Free stack space
             b    *r11            * Return to caller

      Granted, the strcpy code could be better. More like this:

      ****************************
      * void strcpy(char *d, char *s)
      * R1 = *d
      * R2 = *s
      ****************************
             def  strcpy
      strcpy
      L1     movb *r2+, r3        * Get next source character, increment source address
             movb r3, *r1+        * Copy current character to destination, increment destination address
             jne  L1              * Was the copied character the zero terminator?
             b    *r11            * Return to caller

      This is four bytes smaller, and I think the smallest implementation possible. But the GCC version isn't too far off.

      I think that's probably enough assembly for this post...

      A lot of the confusing code you are seeing is a result of the default compile options (which usually result in horribly ugly code). And all your errors have been fixed by some recent changes I've made in the development build.

      For everyone who may be wondering what I've been up to, here's a quick update:

      I've made some changes to the register allocation to make sure that the byte operations work properly. That seems to work nicely.

      I'm also trying to redo the division instructions to make them a bit more elegant. The current code requires extra moves in some cases. Ick.

      Most notably, I've fixed how memory accesses work. The code above shows the results of that.

      I think that after the division stuff is wrapped up, it might be time for another release.
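For readers who prefer C to assembly, the two loop shapes in the listings above can be sketched as follows. This is a reconstruction from the assembly only: the actual tursi2.c source is not shown in this post, and the function names copy_test_first and copy_then_test are hypothetical. One difference worth noticing is that the compiler-generated loop tests the source byte before copying it, so the terminator is not written, while the hand-written loop copies first and tests afterwards.

```c
#include <assert.h>

/* Loop shape matching the compiler output: read the source byte, and
   copy it only if it is nonzero. The terminator itself is not copied. */
void copy_test_first(char *d, const char *s)
{
    assert(d && s);
    while (*s != 0) {
        *d++ = *s++;
    }
}

/* Loop shape matching the hand-written version: copy the byte, then
   test the byte that was just copied, so the terminator is copied too. */
void copy_then_test(char *d, const char *s)
{
    char c;
    assert(d && s);
    do {
        c = *s++;
        *d++ = c;
    } while (c != 0);
}
```

With a zero-filled destination buffer both functions give the same visible result. With an uninitialized buffer, only copy_then_test is guaranteed to leave a terminated string.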
  22. Thanks everyone, I appreciate the support!

      I've done assembly optimization before on a lot of machines (x86, ARM, PowerPC), but this is the first time I've done anything with the TMS9900. I got a Mini Memory cart at the tail end of my TI days, but was never able to make much use of it. Actually, I don't even have any of my TI stuff here with me; it's all in a box somewhere in my parents' basement. All the assembly info I have was cobbled together by trawling through forums, Google searches, and the odd technical paper here or there. Of course, after I had everything I needed to start TI work again, somebody posted it all together in a much more concise and convenient form here. (Grr!)

      At any rate, even though I've written three (crappy, severely limited) compilers before, and I've used GCC before, this is the first time I've done any actual GCC development. The earlier experience helps some, but mostly, it's been a lot of reading and trial-and-error to get here. I spent a few months writing a lot of libraries in 9900 assembly to get used to the language. And now that I've got a handle on it, I've been reviewing the GCC output and trying to implement any improvements I can find.

      Pretty much everything I've done so far has been posted on my blog. The only real stuff missing was the initial data-gathering part, and most of the research, which I can't imagine anyone would want to read anyway.

      Short version: the first time I saw any of this stuff was less than two years ago. (Wow, that seems like a long time now that I've written that down.) So this project is a whole pile of firsts for me. It's taking quite a bit longer than I originally thought, but things seem to be going well.
  23. Well, I haven't been very productive lately, what with Christmas and all, but here's what I've been up to:

      I'm still working on libc support, and have a few functions done:
      strchr, strpbrk, memchr, strcmp, strrchr, memcmp, strcpy, strspn, memcpy, strcspn, strstr, memmove, strlen, strtok, memset, strncat, strncmp, strcat, strncpy

      Here are the GCC features done since the last release:
      • Add optimisation to remove redundant moves in int-to-char casts
      • Remove invalid CB compare immediate instruction
      • Add optimisation to restore byte immediate comparison
      • Added optimisations for forms like (byte)X=(int)X>>N
      • Remove invalid compare immediate with memory
      • Improved support for subtract immediate
      • Fixed bug causing gibberish in assembly output
      • GCC recognises that bit shift operations set the comparison flags

      I've also improved the error handling in the GAS assembler, so the messages are more helpful.

      Stuff I'm working on at the moment:

      Make sure that the low byte of a register is not accessed directly. This happens because I'm lying to GCC. I told it that the TMS9900 actually has 32 8-bit registers, but can only use the even-numbered ones for byte values. Sometimes it gets confused, and tries to use the fake, odd-numbered ones instead. I think I have a fix, but I'm still testing.

      I recently tried -O1 optimisation to see what would happen, and the compiler crashed. I'm still trying to figure out why.

      I'm planning on doing more libc work, testing, and more optimisations. With any luck, there will be enough changes by January to make another GCC release worthwhile.
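The "32 fake 8-bit registers" trick described above can be pictured with a small C model. This is a sketch under stated assumptions, not the port's actual implementation: the regfile type, the function names, and the numbering (fake register 2n is the high byte of Rn, fake register 2n+1 is the low byte) are all hypothetical. The one hardware fact relied on is that TMS9900 byte instructions act on the most significant byte of a workspace register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 word registers viewed as 32 fake byte registers. */
typedef struct {
    uint16_t r[16];
} regfile;

/* Only even-numbered fake registers map to the byte that TMS9900 byte
   instructions can reach (the high byte), so only those are usable. */
int fake_reg_is_usable(unsigned fake)
{
    return fake < 32 && (fake % 2) == 0;
}

/* Read the byte a fake register refers to. Odd numbers name the low
   byte, which is the case the compiler must be kept away from. */
uint8_t read_fake(const regfile *rf, unsigned fake)
{
    assert(rf != 0 && fake < 32);
    uint16_t word = rf->r[fake / 2];
    return (fake % 2 == 0) ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFF);
}
```

In this model, a compiler that picks fake register 7 is asking for the low byte of R3, which no byte instruction can address directly; that is the kind of confusion the fix has to rule out.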
  24. Thanks, I've always enjoyed reading other people's project blogs, and I was hoping someone would find mine interesting.

      For a long time I've made it a point to keep a record of what was done for each day of a project. That makes it easier to review past progress and see if there was some feature or design idea that got overlooked or backed out for some reason. Additionally for this project, there have been periods of a few weeks or so where it had to be put on the back burner. Having a record of what state things were left in is really helpful for getting back on track after being away for a while.

      Probably not. It might be fun to try and squeeze one in there though.

      By the way, if no one's run across this, check out the Contiki Operating System. This is a multitasking OS with a full IP stack and windowing system, built into 40K of ROM and 2K of RAM. I'd love to build up to something like that for the TI. But first things first...
  25. Well, it's patch time again.

      Here's what made it into this release:

      Binutils
      • Allow TI-style quotes ('example')
      • Allow two-byte character constants for immediate expressions (li r0, 'ab')
      • Fix a BFD Makefile bug which prevented clean compilation

      GCC
      • Fix tms9900_output_ascii, was emitting invalid code when non-text characters were used
      • Divide and modulus operations now merged when possible
      • Fix data symbol declarations, now TI compliant
      • Fix "+=4" form, was missing comma in emitted code
      • Fix alignment of code, in some cases it was possible to misalign code by using odd-length string constants
      • Fix stack frame load/save differences, was using different locations between function prologue and epilogue in some cases (Thanks Tursi!)
      • Save return pointer at bottom of stack. This may help for later stack trace construction
      • Add optimizations for compare-and-branch operations with 16-bit values against -2, -1, 0, 1, and 2

      Right now I only have optimizations for equality tests with -2, -1, 1, and 2 done. To get inequality tests, I need to convince GCC to emit tests against the overflow flag. GCC has no concept of this kind of instruction, so I need to play with that a bit more.

      The other weakness is the divide and modulus instructions. I haven't been able to convince GCC to use convenient registers for the source and destination. This means that in some cases, I need to insert additional MOVs which really shouldn't be necessary. More playing around required here too, I suppose.

      I've addressed all the problems Tursi found earlier, plus a few others. Unfortunately, libiberty is not on that list. Since a lot of those routines are OS-specific, and since there is no POSIX-like interface for the TI, these functions are of limited use right now. In the future that might change. (hint, hint)

      So here's the build procedure for everything. I've tested these steps several times, so there should be no problems following them.

      Patching the original files:
        $ cd binutils-2.19.1
        $ patch -p1 < binutils-2.19.1-tms9900-1.1.patch
        $ cd gcc-4.4.0
        $ patch -p1 < gcc-4.4.0-tms9900-1.2.patch

      Building binutils:
        $ ./configure --target tms9900 --prefix INSTALLDIR
        $ make all
        $ make install

      Building GCC:
        $ ./configure --target=tms9900 --prefix=INSTALLDIR --enable-languages=c
        $ make all-gcc
        $ make install

      Notice that GCC uses equals signs after the options, while binutils does not. Kind of annoying and easy to mix up.

      At this point, you will have all the GNU compilation tools ready to use for TI work. The binary format is ELF, since that stores the extra data needed by the linker and other tools. In earlier posts I've attached code to convert from ELF to TI-cart format. I've also got prototype converters for EA5 and EA3 formats, but I haven't tested them very much.

      When compiling with GCC, I recommend using the -O2 and/or -Os options to reduce the overall code size. Using the default options can result in extra wordy code with unnecessary or duplicate instructions.

      There's still quite a bit of work left to do for GCC, so there will be more patches coming. I need to fill out the missing math support for 32-bit values, make sure signed multiply and divide work, and handle the other stuff mentioned above. I especially want to add more optimizations to the compiled output, but that will come as I get more familiar with what instruction patterns GCC likes to use.

      binutils-2.19.1-tms9900-1.1-patch.tar.gz
      gcc-4.4.0-tms9900-1.2-patch.tar.gz
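To make the "divide and modulus operations now merged" item concrete, here is the kind of C source it is presumably aimed at. The TMS9900 DIV instruction produces a quotient and a remainder together, so a division and a modulus on the same operands can share one instruction. The type and function names below are hypothetical, and whether a given build actually merges the two operations depends on the compiler and options used.

```c
#include <assert.h>

/* Hypothetical result type holding both outputs of one division. */
typedef struct {
    unsigned quot;
    unsigned rem;
} divmod_result;

/* A division and a modulus on the same operands. On a machine whose
   divide instruction yields both values, this is a candidate for being
   compiled to a single divide. */
divmod_result divmod_unsigned(unsigned a, unsigned b)
{
    divmod_result out;
    assert(b != 0);  /* division by zero is not handled in this sketch */
    out.quot = a / b;
    out.rem = a % b;
    return out;
}
```

Comparing the assembly output for a function like this with and without optimization is one simple way to see whether the merge took effect.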