GCC for the TI


310 replies to this topic

#51 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Jan 18, 2011 8:25 PM

Ran through the install steps today -- the main hitch was replacing "INSTALLDIR" with an actual folder, which I did as so:

export INSTALLDIR=/usr/bin/

then using $INSTALLDIR in the configure commands.

The make install for GCC failed on me, giving this:

make[3]: Entering directory `/cygdrive/c/work/TI/gcc/gcc-4.4.0/host-i686-pc-cygwin/libiberty/testsuite'
make[3]: Nothing to be done for `install'.
make[3]: Leaving directory `/cygdrive/c/work/TI/gcc/gcc-4.4.0/host-i686-pc-cygwin/libiberty/testsuite'
make[2]: Leaving directory `/cygdrive/c/work/TI/gcc/gcc-4.4.0/host-i686-pc-cygwin/libiberty'
/bin/sh: line 3: cd: tms9900/libssp: No such file or directory
make[1]: *** [install-target-libssp] Error 1
make[1]: Leaving directory `/cygdrive/c/work/TI/gcc/gcc-4.4.0'
make: *** [install] Error 2

As you can see, I'm using cygwin. I have to admit I'm not sure where the files were installed. I found some new files under /usr/bin/tms9900/, but I also found more reasonable looking files under /bin/bin with the tms9900 prefix. I ran with those. ;)

I started with a simple test app:

void strcpy(char *d, char *s) {
  while (*s) {
    *(d++) = *(s++);
  }
}

int main() {
  char a[32];
  strcpy(a,"hello world");
  return 0;
}

I compiled first with the -S option so I could watch it -- except for complaining about my strcpy (hehe - normal gcc behaviour), it compiled okay. The resulting assembly looked like this:

pseg
	even	

	def	strcpy
strcpy
	ai r10, >FFFC    make 4 bytes on stack (s and d)
	mov r10, r8      frame pointer again? Not sure.
	mov r1, *r8      store d pointer on stack at first
	mov r2, @2(r8)   store s pointer on stack at second
	jmp L2           jump into loop
L3
	movb @2(r8), r2  get s pointer into R2 <-- ERROR - later code assumes that we got the byte pointed to by s
	mov *r8, r1      get d pointer into R1
	movb r2, *r1     copy high byte of s pointer to address pointed to by d <-- ERROR related to above, OR could use movb *r2,*r1
	inc *r8          increment d pointer
	inc @2(r8)       increment s pointer
L2
	movb @2(r8), r1  get s pointer into r1 <--- ERROR - should get data pointed to by 's'
	movb r1, r1      test byte value for zero
	jne L3           if not zero, then branch
	c *r10+, *r10+   trick - add four bytes to clear the stack
	b *r11           return to caller
LC0
	text 'hello world'
	byte 0
	even	

	def	main
main
	ai r10, >FFDE  make room for 34 bytes on the stack (32 bytes for 'a' plus 2 for return address)
	mov r11, *r10  save return address
	mov r10, r8    save frame pointer?
	mov r8, r1     temporary pointer
	inct r1        skip over return address
	li r2, LC0     address of constant string
	bl @strcpy     branch to subroutine 
	clr r1         zero return value
	mov *r10+, r11 pop return address off stack
	ai r10, >20    clear 'a' from stack
	b *r11         return to caller

I guess there's a startup block that I don't see here (crt?), so I assume R10 is initialized on entry to main. To that end, it's almost there!

There is one issue involved in getting the data referenced by pointers -- and this is actually a problem I remember facing on my port as well: GCC doesn't mesh with the TI addressing modes very well in this particular case. When trying to get the data referenced by a pointer that lives on the stack, you need two operations (load the pointer into a register, then load through it), as the 9900 can't do the double indirection in one instruction.
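To make the two-step access concrete, here is a minimal C sketch. The function name load_through_slot is made up for this example, and the instructions in the comments only show the kind of pair I'd expect, assuming the pointer sits at offset 2 from the frame pointer as in the listing:

```c
/* Hypothetical helper: 'slot' stands in for a stack location that holds a
 * char pointer, like @2(r8) in the listing above. */
char load_through_slot(char *const *slot)
{
    char *p = *slot;   /* step 1: fetch the pointer itself, e.g. mov @2(r8), r1 */
    return *p;         /* step 2: fetch the byte it points to, e.g. movb *r1, r2 */
}
```

Calling it with the address of a char pointer that points at "hello world" returns 'h'; there is no way to fold the two loads into one.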

Sorry I have taken so long to chip in -- hope there's enough detail here to be helpful! I also caught up on your blog and I see you've been tackling some other nasty issues! Sadly the best I can do for now is cheerleading, but you definitely have my encouragement!

#52 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Tue Jan 18, 2011 11:28 PM

I see you've been busy with your own projects, so don't feel bad.

Well, one of the things I've noticed is that unless you compile with the -O2 optimization, the resulting code is terrible and excessively wordy. I think your example code shows that off pretty well.

I've compiled your sample with my in-development compiler, and with the recent changes, it looks a lot better: (comments by me of course)

eric@compaq:~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 tursi2.c

	pseg
	even	

****************************
* void strcpy(char *d, char *s)
*    R1 = *d
*    R2 = *s
****************************
	def	strcpy
strcpy
	ai r10, >FFFC     * Allocate four bytes from the stack (s and d)
	mov r10, r8       * Initialize the frame pointer
	mov r1, *r8       * Save d on the stack
	mov r2, @2(r8)    * Save s on the stack
	jmp L2            * Jump to bottom of copy loop
L3
	mov @2(r8), r1    * Copy source address to R1
	movb *r1, r2      * Copy current character to R2
	mov *r8, r1       * Copy destination address to R1
	movb r2, *r1      * Copy current character to destination
	inc *r8           * Increment destination address
	inc @2(r8)        * Increment source address
L2
	mov @2(r8), r1    * Copy source address to R1
	movb *r1, r1      * Copy current character to R1
	movb r1, r1       * Compare with zero, is this the terminator?
	jne L3            * If not, jump to top of loop
                          * Else clean up and exit
	c *r10+, *r10+    * Free stack space 
	b *r11            * Return to caller

LC0
	text 'hello world'
	byte 0
	even	

	def	main
main
	ai r10, >FFDE    * Allocate 34 bytes from stack (a and return pointer)
	mov r11, *r10    * Save return pointer
	mov r10, r8      * Initialize frame pointer
	mov r8, r1       * First step for setting destination address for strcpy
	inct r1          * Final step for destination address (d=frame+2)
	li r2, LC0       * Set source address for strcpy
	bl @strcpy       * Call strcpy
	clr r1           * Set return value for main
	mov *r10+, r11   * Restore return pointer
	ai r10, >20      * Free stack space
	b *r11           * Return to caller


It looks like all your problems have been fixed by the in-development code. Hooray!

In fact, GCC does assume that there is a crt0 or some other launcher to set the initial stack pointer, initialize memory regions, set workspace location, set interrupts, etc. That code would most likely be written in assembly. I have an implementation I'd share with everyone, but it's not very exciting.

Here's the same code with some optimizations applied; notice that it makes much better code.

eric@compaq:~/dev/tios/src/temp$ /home/eric/dev/tios/toolchain/gcc-4.4.0/host-i686-pc-linux-gnu/gcc/cc1 -O2 -Os tursi2.c

The strcpy has been inlined in main, but a non-inlined version was left intact:

	pseg              * Put code in program segment
	even	          * Start on even address

****************************
* void strcpy(char *d, char *s)
*    R1 = *d
*    R2 = *s
****************************
	def	strcpy
strcpy
	jmp L2
L3
	movb r3, *r1+   * Copy current character to destination, increment destination address
	inc r2          * Increment source address
L2
	movb *r2, r3    * Get next source character, is it the zero terminator?
	jne L3          * If not, go back to top of loop
	b *r11          * Return to caller


*****************************
* Constant string used by main()
LC0
	text 'hello world'
	byte 0
	even	

*****************************
* Entry point of program

	def	main
main
	ai r10, >FFE0      * Allocate 32 bytes of stack
	li r1, LC0         * Find source address
	mov r10, r8        * Find destination address (bottom of stack)

********
* Inlined strcpy()
	jmp L7             
L8
	movb r2, *r8+      * Copy current character to destination, increment destination address
	inc r1             * Increment source address
L7
	movb *r1, r2       * Get next source character, is it the zero terminator?
	jne L8             * If not, go back to top of copy loop
********

	clr r1             * Set return value
	ai r10, >20        * Free stack space
	b *r11             * Return to caller


Granted, the strcpy code could be better. More like this:

****************************
* void strcpy(char *d, char *s)
*    R1 = *d
*    R2 = *s
****************************
	def	strcpy
strcpy
L1
        movb *r2+, r3   * Get next source character, increment source address
	movb r3, *r1+   * Copy current character to destination, increment destination address
        jne L1          * Was the copied character the zero terminator?
	b *r11          * Return to caller

This is four bytes smaller, and I think the smallest implementation possible. But the GCC version isn't too far off.
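One difference worth spelling out: the hand-written loop copies the zero terminator as well, while the test function as posted stops just before it. A rough C sketch of the two behaviours (the function names are invented for this comparison):

```c
/* The test function as posted: stops at the terminator without copying it. */
void copy_until_nul(char *d, const char *s)
{
    while (*s) {
        *d++ = *s++;
    }
}

/* C equivalent of the hand-written loop: copy the byte, then test it,
 * so the terminator is copied as well. */
void copy_including_nul(char *d, const char *s)
{
    char c;
    do {
        c = *s++;   /* movb *r2+, r3 */
        *d++ = c;   /* movb r3, *r1+ */
    } while (c);    /* jne L1 */
}
```

For a real strcpy the second form is what you want anyway, so the shorter loop only changes behaviour for code that relied on the destination not being terminated.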

I think that's probably enough assembly for this post...

A lot of the confusing code you are seeing is a result of the default compile options (which usually result in horribly ugly code). And all your errors have been fixed by some recent changes I've made in the development build.

For everyone who may be wondering what I've been up to, here's a quick update:

I've made some changes to the register allocation to make sure that the byte operations work properly. That seems to work nicely. I'm also trying to redo the division instructions to make them a bit more elegant. The current code requires extra moves in some cases. Ick.

Most notably, I've fixed how memory accesses work. The code above shows the results of that.

I think that after the division stuff is wrapped up, it might be time for another release.

#53 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Wed Jan 19, 2011 4:08 AM

Oh, I'm not confused by any of the code (except that I tried to build mine not to use the frame pointer), I'm pretty familiar with the differences between optimized and non-optimized code. That's why my only real critiques were meant to be focused on incorrect code. It does look like your development code there will produce a running output! An occasional extra move here and there is not so bad, and sometimes may be necessary for generic code. The 9900 doesn't really lend itself all that well to GCC, but your code seems to be doing a wonderful job. For timing critical code, we can always go to asm.

As for the startup code, it doesn't need to be elegant. LI R10,>2000 (or wherever) and B @MAIN looks like it'd be enough. ;)

Looking forward to the next release! I have some serious stuff to try through it if we get the little stuff going.

Edited by Tursi, Wed Jan 19, 2011 4:11 AM.


#54 Opry99er OFFLINE  

Opry99er

    Quadrunner

  • 8,246 posts
  • Location:Cookeville, TN

Posted Wed Jan 19, 2011 10:00 AM

This is great!! =) I don't understand it 100%, but it's awesome to see a working C compiler. =) Too cool, guys...

#55 Astharot OFFLINE  

Astharot

    Star Raider

  • 74 posts
  • Married
  • Location:Rome

Posted Wed Jan 19, 2011 1:36 PM

This is great!! =) I don't understand it 100%, but it's awesome to see a working C compiler. =) Too cool, guys...


If the compiler works, we can play chess!

Look at this:
/***************************************************************************/
/*                               micro-Max,                                */
/* A chess program smaller than 2KB (of non-blank source), by H.G. Muller  */
/***************************************************************************/
/* version 4.7 (1922 characters) features:                                 */
/* - recursive negamax search                                              */
/* - all-capture MVV/LVA quiescence search                                 */
/* - (internal) iterative deepening                                        */
/* - best-move-first 'sorting'                                             */
/* - a hash table storing score and best move                              */
/* - futility pruning                                                      */
/* - king safety through magnetic, frozen king in middle-game              */
/* - R=2 null-move pruning                                                 */
/* - Check-evasion extension in full-width nodes                           */
/* - keep hash and repetition-draw recognition at game level               */
/* - evaluation distinguishing B/N, and more distributed promotion bonus   */
/* - full FIDE rules (expt under-promotion) and move-legality checking     */

#define W while
#define K(A,B) *(int*)(T+A+(B&8)+S*(B&7))
#define J(A) K(y+A,b[y])-K(x+A,u)-K(H+A,t)

#define U (1<<24)
struct _ {int K,V;char X,Y,D;} A[U];           /* hash table, 16M+8 entries*/

int M=136,S=128,I=8e3,Q,O,K,N,R,J,Z,k=8,*p,c[9]; /* M=0x88                 */

char L,
w[]={0,2,2,7,-1,8,12,23},                      /* relative piece values    */
o[]={-16,-15,-17,0,1,16,0,1,16,15,17,0,14,18,31,33,0, /* step-vector lists */
     7,-1,11,6,8,3,6,                          /* 1st dir. in o[] per piece*/
     6,3,5,7,4,5,3,6},                         /* initial piece setup      */
b[129],                                        /* board: half of 16x8+dummy*/
T[1035],                                       /* hash translation table   */

n[]=".?+nkbrq?*?NKBRQ";                        /* piece symbols on printout*/

D(k,q,l,e,E,n)          /* recursive minimax search, k=moving side, n=depth*/
int k,q,l,e,E,n;        /* (q,l)=window, e=current eval. score, E=e.p. sqr.*/
{                       /* e=score, z=prev.dest; J,Z=hashkeys; return score*/
 int j,r,m,v,d,h,i,F,G,V,P,f=J,g=Z;
 char t,p,u,x,y,X,Y,H,B;
 struct _*a=A+(J+k*E&U-1);                     /* lookup pos. in hash table*/

 q--;                                          /* adj. window: delay bonus */
 d=a->D;m=a->V;X=a->X;Y=a->Y;                  /* resume at stored depth   */
 if(a->K-Z|l>I|                                /* miss: other pos. or empty*/
  !(m<=q|X&8&&m>=l|X&S))                       /*   or window incompatible */
  d=Y=0;                                       /* start iter. from scratch */
 X&=~M;                                        /* start at best-move hint  */

 W(d++<n||d<3||                                /* iterative deepening loop */
   l>I&K==I&&(N<1e6&d<98||                    /* root: deepen upto time   */
   (K=X,L=Y&~M,d=3)))                          /* time's up: go do best    */
 {x=B=X;                                       /* start scan at prev. best */
  h=Y&S;                                       /* request try noncastl. 1st*/
  P=d<3?-I:D(24-k,-l,1-l,-e,S,d-3);            /* Search null move         */
  n+=d==3&P==I;                                /* Extend 1 ply if in check */
  m=-P<l|R>35?d-2?-I:e:-P;                     /* Prune or stand-pat       */
  N++;                                         /* node count (for timing)  */
  do{u=b[x];                                   /* scan board looking for   */
   if(u&k)                                     /*  own piece (inefficient!)*/
   {r=p=u&7;                                   /* p = piece type (set r>0) */
    j=o[p+16];                                 /* first step vector f.piece*/
    W(r=p>2&r<0?-r:-o[++j])                    /* loop over directions o[] */
    {A:                                        /* resume normal after best */
     y=x;F=G=S;                                /* (x,y)=move, (F,G)=castl.R*/
     do{                                       /* y traverses ray, or:     */
      H=y=h?Y^h:y+r;                           /* sneak in prev. best move */
      if(y&M)break;                            /* board edge hit           */
      m=E-S&b[E]&&y-E<2&E-y<2?I:m;             /* bad castling             */
      if(p<3&y==E)H^=16;                       /* shift capt.sqr. H if e.p.*/
      t=b[H];if(t&k|p<3&!(y-x&7)-!t)break;     /* capt. own, bad pawn mode */
      i=37*w[t&7]+(t&192);                     /* value of capt. piece t   */
      m=i<0?I:m;                               /* K capture                */
      if(m>=l&d>1)goto C;                      /* abort on fail high       */

      v=d-1?e:i-p;                             /* MVV/LVA scoring          */
      if(d-!t>1)                               /* remaining depth          */
      {v=p<6?b[x+8]-b[y+8]:0;                  /* center positional pts.   */
       b[G]=b[H]=b[x]=0;b[y]=u|32;             /* do move, set non-virgin  */
       if(!(G&M))b[F]=k+6,v+=50;               /* castling: put R & score  */
       v-=p-4|R>29?0:20;                       /* penalize mid-game K move */
       if(p<3)                                 /* pawns:                   */
       {v-=9*((x-2&M||b[x-2]-u)+               /* structure, undefended    */
              (x+2&M||b[x+2]-u)-1              /*        squares plus bias */
             +(b[x^16]==k+36))                 /* kling to non-virgin King */
             -(R>>2);                          /* Pawn-push bonus in ending*/
        V=y+r+1&S?647-p:2*(u&y+16&32);         /* Promotion or 6/7th bonus */
        b[y]+=V;i+=V;}                         /* upgrade, or promote to Q */
       J+=J(0);Z+=J(8)+G-S;                    /* update hash key(s)       */
       v+=e+i;V=m>q?m:q;                       /* new eval and alpha       */
       v=d>3|v>V?-D(24-k,-l,-V,-v,             /* recursive eval. of reply */
                              F,d-1):v;        /* or fail low if futile    */
       if(K-I&&v+I&&l>I&x==K&y==L)             /* move pending. if legal,  */
       {Q=-e-i;O=F;                            /* in root & found, exit D  */
        R+=i>>7;return I;                      /* captured non-P material  */
       }
       J=f;Z=g;                                /* restore hash key(s)      */
       b[G]=k+6;b[F]=b[y]=0;b[x]=u;b[H]=t;     /* undo move,G can be dummy */
      }
      if(v>m)                                  /* new best, update max,best*/
       m=v,X=x,Y=y|S&F;                        /* mark double move with S  */
      if(h){h=0;goto A;}                       /* redo after doing old best*/
      if(x+r-y|u&32|                           /* not 1st step,moved before*/
         p>2&(p-4|j-7||                        /* no P & no lateral K move,*/
         b[G=x+3^r>>1&7]-k-6                   /* no virgin R in corner G, */
         ||b[G^1]|b[G^2])                      /* no 2 empty sq. next to R */
        )t+=p<5;                               /* fake capt. for nonsliding*/
      else F=y;                                /* enable e.p.              */
     }W(!t);                                   /* if not capt. continue ray*/
  }}}W((x=x+9&~M)-B);                          /* next sqr. of board, wrap */
C:if(m>I-M|m<M-I)d=98;                         /* mate holds to any depth  */
  m=m+I|P==I?m:0;                              /* best loses K: (stale)mate*/
  if(a->D<99)                                  /* protect game history     */
   a->K=Z,a->V=m,a->D=d,                       /* always store in hash tab */
   a->X=X|8*(m>q)|S*(m<l),a->Y=Y;              /* move, type (bound/exact),*/
/*if(l>I)printf("%2d ply, %9d searched, score=%6d by %c%c%c%c\n",d-1,N-S,m,
     'a'+(X&7),'8'-(X>>4),'a'+(Y&7),'8'-(Y>>4&7)); /* uncomment for Kibitz */
 }                                             /*    encoded in X S,8 bits */
 return m+=m<e;                                /* delayed-loss bonus       */
}

main()
{
 K=8;W(K--)
 {b[K]=(b[K+112]=o[K+24]+8)+8;b[K+16]=18;b[K+96]=9;  /* initial board setup*/
  L=8;W(L--)b[16*L+K+8]=(K-4)*(K-4)+(L-3.5)*(L-3.5); /* center-pts table   */
 }                                                   /*(in unused half b[])*/
 N=1035;W(N-->M)T[N]=rand()>>9;

 W(1)                                                /* play loop          */
 {N=-1;W(++N<121)
   printf(" %c",N&8&&(N+=7)?10:n[b[N]&15]);          /* print board        */
  p=c;W((*p++=getchar())>10);                        /* read input line    */
  K=I;                                               /* invalid move       */
  if(*c-10)K=*c-16*c[1]+799,L=c[2]-16*c[3]+799;      /* parse entered move */
  k^=D(k,-I,I+1,Q,O,3)-I?0:24;                       /* think or check & do*/
 }
}

Edited by Astharot, Wed Jan 19, 2011 1:39 PM.


#56 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Thu Jan 20, 2011 6:31 AM

If the compiler works, we can play chess!


hehe... not directly with that code! This line:

#define U (1<<24)
struct _ {int K,V;char X,Y,D;} A[U];           /* hash table, 16M+8 entries*/

Needs over 100MB of RAM! :)

Neat though, I've never seen that one...
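For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the compiler pads the entry to 2-byte alignment when int is 16 bits and to 4-byte alignment when it is 32 bits; the struct and function names are made up for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* Hash entry with 'int' modelled as 16 bits, as on the TMS9900 (assumption:
 * the 7 bytes of fields get padded up to 8). */
struct entry16 { int16_t K, V; char X, Y, D; };

/* Same entry with 32-bit ints, as on a typical PC compiler. */
struct entry32 { int32_t K, V; char X, Y, D; };

/* Total bytes for a table of (1 << shift) entries of the given size. */
uint64_t table_bytes(size_t entry_size, unsigned shift)
{
    return (uint64_t)entry_size << shift;
}
```

With those assumptions, table_bytes(sizeof(struct entry16), 24) comes to 128MB and the 32-bit version to 192MB, so "over 100MB" holds either way.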

#57 Astharot OFFLINE  

Astharot

    Star Raider

  • 74 posts
  • Married
  • Location:Rome

Posted Thu Jan 20, 2011 1:55 PM

Yes, but you can reduce the hash table to 16K entries, for example... :)

#58 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Fri Jan 21, 2011 12:21 AM

That chess program was so insane, I needed to see what the compiler would do with it. Aside from a few unexpected fake register accesses (grr! fixed by hand), and failing to handle the giant lookup table (not surprising), it was pretty much drama-free.

The resulting code with -O2 optimizations: 1463 lines of assembly. Are there errors? Could be... But I have no motivation to compare against that C code. At first glance, it looks right.

Here's the readelf dump of the resulting object file:

eric@compaq:~/dev/tios/src/temp$ tms9900-readelf -S 2k_chess.o
There are 8 section headers, starting at offset 0xefc:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000e54 00  AX  0   0  2
  [ 2] .rela.text        RELA            00000000 001e70 000da4 0c      6   1  4
  [ 3] .data             PROGBITS        00000000 000e88 000041 00  WA  0   0  2
  [ 4] .bss              NOBITS          00000000 000ec9 000cb2 00  WA  0   0  1
  [ 5] .shstrtab         STRTAB          00000000 000ec9 000031 00      0   0  1
  [ 6] .symtab           SYMTAB          00000000 00103c 000b00 10      7 145  4
  [ 7] .strtab           STRTAB          00000000 001b3c 000333 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

So that's 3668 bytes of code (.text section), and 3315 bytes of data (.data + .bss sections). Of course that data size is shy about 128 MB or so.
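Those byte counts come straight from the hex values in the Size column. A tiny sketch of the conversion (section_bytes is just a name picked for this example, not part of any tool):

```c
#include <stdlib.h>

/* Convert a hex size field from "readelf -S" output (e.g. "000e54") to bytes. */
unsigned long section_bytes(const char *hex_field)
{
    return strtoul(hex_field, NULL, 16);
}
```

section_bytes("000e54") is the .text size, and section_bytes("000041") + section_bytes("000cb2") is .data plus .bss.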

Still, I'm really impressed with how things are shaping up so far.

#59 Astharot OFFLINE  

Astharot

    Star Raider

  • 74 posts
  • Married
  • Location:Rome

Posted Thu Jan 27, 2011 12:45 PM


If you change #define U (1<<24) to
#define U (1<<14)
can you compile a TI-99 version?
I think this program would beat Video Chess...
And if you design a graphical chess board, you'd have VIDEO CHESS II!!!
:)
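For scale, a rough sketch of what (1<<14) would mean. It assumes 16-bit ints and an entry padded to 8 bytes; U_SMALL, slot_for and small_table_bytes are names invented for this example:

```c
#include <stdint.h>

#define U_SMALL (1u << 14)   /* the reduced table size suggested above */

/* 16-bit-int model of the hash entry (assumed 8 bytes after padding). */
struct entry16 { int16_t K, V; char X, Y, D; };

/* The program picks a slot by masking the key with U-1, which only works
 * as a modulo when U is a power of two. */
uint32_t slot_for(uint32_t key)
{
    return key & (U_SMALL - 1u);
}

/* Bytes needed for the reduced table. */
uint32_t small_table_bytes(void)
{
    return U_SMALL * (uint32_t)sizeof(struct entry16);
}
```

Under those assumptions the table is still 128K, so fitting it in a 32K memory expansion would probably take a smaller power of two again, something like (1<<10) or (1<<11).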

Edited by Astharot, Thu Jan 27, 2011 12:46 PM.


#60 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,726 posts
  • Location:Eagan, MN, USA

Posted Thu Jan 27, 2011 2:16 PM

Oooo! Please do :lust:
Writing a new chess program for the TI is still on my future projects list.

#61 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,726 posts
  • Location:Eagan, MN, USA

Posted Thu Jan 27, 2011 2:20 PM

printf(" %c",N&8&&(N+=7)?10:n[b[N]&15]); /* print board */


Seriously??? No wonder I can't stomach C :twisted: I guess this type of code separates the professionals from the hobbyists ;)

#62 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Fri Jan 28, 2011 10:19 PM

printf(" %c",N&8&&(N+=7)?10:n[b[N]&15]); /* print board */


Seriously??? No wonder I can't stomach C :twisted: I guess this type of code separates the professionals from the hobbyists ;)


No, that code was written to be clever, that's all. Anyone writing professional code like that won't last long in group projects - it's completely illegible. Not only that, but it modifies N (the N+=7) and reads N again in the same statement, so you have to know the evaluation-order rules to be sure what it does. (It happens to be well defined here, because && and ?: both finish evaluating their left-hand operand before moving on, but that's easy to get wrong.) On top of that, the precedence of the operators & and && is often confused by people (bitwise is higher), making this statement very confusing.

It would be more clear (and take the same amount of compiled code -- possibly more if the optimizer makes more sense of it), to break it up as so (if /I/ read it right!!):

register int out = n[b[N]&15];  // could be optimized in a register if available
if (N&8) {
  N+=7;
  if (N) {
    out=10;
  } else {
    out=n[b[N]&15];  // N changed, so do it again
  }
}
printf(" %c", out);

The difference being, more comments are warranted ;)

It seems like all it's doing there is deciding whether to print a board cell, or a line feed (10), so I'd probably just write the loop differently for readability. This one, of course, was an example in small source code size.
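If you'd rather not take the rewrite on faith, here is a small sketch that puts the two forms side by side so they can be compared for every N the print loop can reach. The arrays are stand-ins with the same sizes as in the chess source, and the function names are mine:

```c
/* Stand-ins for the program's globals: a 129-byte board and the
 * piece-symbol string, both sized as in the source quoted earlier. */
static char b[129];
static const char n[] = ".?+nkbrq?*?NKBRQ";

/* The original expression, unchanged. N is passed by pointer because the
 * expression modifies it. */
int cell_terse(int *N)
{
    return *N & 8 && (*N += 7) ? 10 : n[b[*N] & 15];
}

/* The spelled-out version. */
int cell_clear(int *N)
{
    if (*N & 8) {            /* in the unused right half of the 16-wide row */
        *N += 7;             /* skip ahead to the end of the row */
        if (*N) {
            return 10;       /* line feed */
        }
    }
    return n[b[*N] & 15];    /* symbol for the piece on this square */
}
```

Both functions should return the same character and leave N at the same value for any starting N from 0 to 120.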

Edited by Tursi, Fri Jan 28, 2011 10:21 PM.


#63 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,726 posts
  • Location:Eagan, MN, USA

Posted Sat Jan 29, 2011 11:24 AM

Thanks :) I feel a little less dumb now :P

#64 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Sat Jul 23, 2011 3:06 PM

Update time!

It's about six months later than promised, but I haven't given up yet.

Most of that time has gone to putting in a ton of hours at work and beating on the GCC code to get byte operations working properly. What's in this release is the fourth or fifth overhaul of the port. In the end I had to rewrite core bits of how GCC relates byte and word quantities. I've kept those changes to a minimum, so ports to later versions should still work.

Here's what got changed in this patch:


Add optimization to remove redundant moves in int-to-char casts
Remove invalid CB compare immediate mode.
Add optimizations for byte immediate comparison
Added optimizations for shift and cast forms like (byte)X=(int)X>>N
Remove invalid compare immediate with memory
Improved support for subtract immediate
Fixed bug causing gibberish in assembly output
GCC now recognizes that bit shift operations set the comparison flags
Fixed bug causing bytewise AND to operate on the wrong byte
Add optimization for loading byte arrays into memory
Confirmed that variadic functions work properly.
Fixed the subtract instruction to handle constants
Fixed the CI instruction, it was allowing memory operands
Fixed a bug allowing the fake PC register to be used as a real register
Encourage memory-to-memory copies instead of mem-reg-mem copies
Added optimization to eliminate INV-INV-SZC sequences
Modify GCC's register allocation engine to handle TMS9900 byte values
Remove the 32 fake 8-bit registers. GCC now uses 16 16-bit registers
Modify memory addressing to handle forms like @LABEL+CONSTANT(Rn)
Clean up output assembly by vertically aligning operands
Clean up output by combining constant expressions
Optimize left shift byte quantities
Fixed a bug where SZC used the wrong register
Removed C instruction for "+=4" forms, AI is twice as fast
Added 32-bit negate
Fixed 32-bit subtract
Fixed a bug causing MUL to use the wrong register
Fixed a bug allowing shifts to use shift counts in the wrong register
Confirmed that inline assembly works correctly
Added optimization to convert "ANDI Rn, >00FF" to "SB Rn,Rn"
Optimize compare-with-zero instructions by using a temp register
Fixed a bug allowing *Rn and *Rn+ memory modes to be confused
Removed most warnings from the build process
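As an illustration of one item in that list, the ANDI-to-SB conversion works because byte instructions act on the high byte of a register. A small C model of the two instructions (a sketch only: it ignores the status flags, and the function names are made up):

```c
#include <stdint.h>

/* ANDI Rn, >00FF: keep the low byte, clear the high byte. */
uint16_t andi_00ff(uint16_t r)
{
    return r & 0x00FFu;
}

/* SB Rn, Rn: subtract the register's high byte from itself, which always
 * gives zero, and leave the low byte alone. */
uint16_t sb_self(uint16_t r)
{
    uint8_t hi = (uint8_t)(r >> 8);
    uint8_t lo = (uint8_t)(r & 0xFFu);
    hi = (uint8_t)(hi - hi);                 /* always 0 */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

The two give the same register value for every 16-bit input, and SB Rn,Rn is a one-word instruction where ANDI needs a second word for the immediate.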


There were also changes made to binutils; I hope this will be the last update for this.


More meaningful error messages from the assembler
DATA and BYTE constructs with no value did not allocate space
Fix core dump in tms9900-objdump during disassembly


The ELF conversion utility was also updated to allow crt0 to properly set memory before the C code executes. If it finds a "_init_data" label in the ELF file, it will fill out a record with all the information crt0 needs to do the initialization.

In light of all these changes, I've made a new "hello world" program with lots of comments, a Makefile and all supporting files. I've also included the compiled .o, .elf, and converted cart image. In addition, there's also a hello.s file which is the assembly output from the compiler.

I'm not sure if I mentioned this earlier, but the tms9900-as assembler will accept TI-syntax assembly files, but there are a number of additions:


Added "or", "orb" aliases for "soc" and "socb" (that's been a gotcha for several people here)
Added "textz" directive - This appends a zero byte to the data.
"textz '1234'" is equivalent to "byte >31, >32, >33, >34, 0"
Added "ntext" directive - This prepends the byte count to the data.
"ntext '1234'" is equivalent to "byte 4, >31, >32, >33, >34"
Added "string" variants to all "text" directives
No length limit for label names
No limitation for constant calculations, all operations are allowed (xor, and, or, shifts, etc.)
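To show what the "textz" and "ntext" directives lay down in memory, here is a rough C sketch of the byte layouts described above. The function names are invented, and the 255-character limit on the count byte is my assumption, not something taken from the assembler:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of "textz": the characters followed by a zero byte.
 * 'out' must have room for strlen(text) + 1 bytes. Returns bytes written. */
size_t emit_textz(const char *text, unsigned char *out)
{
    size_t len = strlen(text);
    memcpy(out, text, len);
    out[len] = 0;
    return len + 1;
}

/* Sketch of "ntext": a one-byte count followed by the characters.
 * Assumes the count must fit in one byte; returns 0 if it does not. */
size_t emit_ntext(const char *text, unsigned char *out)
{
    size_t len = strlen(text);
    if (len > 255) {
        return 0;
    }
    out[0] = (unsigned char)len;
    memcpy(out + 1, text, len);
    return len + 1;
}
```

For '1234' both produce five bytes: >31, >32, >33, >34, 0 for textz and 4, >31, >32, >33, >34 for ntext.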


I think that's about enough for now.

I believe this is the biggest jump in usefulness yet. I've gone through and tested every instruction, and written several test programs which did semi-interesting things from the compiler's point of view. They were, however, exceptionally dull from a user's point of view. For all the blow-by-blow details, check out my blog.

As a final test of the byte handling code, I built that chess program posted back in December. No problems were seen and no hinky-looking code was generated. In addition, it was about 5% smaller.

The build instructions are listed in post #43, and haven't changed since.

So, let me know what you think.

Attached Files



#65 Ksarul OFFLINE  

Ksarul

    River Patroller

  • 4,126 posts

Posted Sat Jul 23, 2011 4:11 PM

Thanks for continuing to improve this one!

#66 RXB OFFLINE  

RXB

    River Patroller

  • 2,721 posts
  • Location:Vancouver, Washington, USA

Posted Sat Jul 23, 2011 6:34 PM

Very very cool stuff. I remember when the SAMS came out and many were wondering if a 10Meg version of the AMS was possible.

My Intel MacPro has 6Gig of RAM now so Classic99 using 100Meg would not be even a dent in that memory.


Would be so cool to have a 100Meg SAMS so we could finally run a little more current C programs. Then we could at least take a stab at that chess program.

#67 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Jul 24, 2011 12:22 AM

Fantastic! I saw your update on your blog before I got here. I will definitely try to play with this sometime this week. Thanks for your effort!

RXB: I've thought about the expanded AMS concepts (Thierry posted a nice 16MB schematic), but I've decided that Classic99 probably will not support larger memory expansions than 1MB, or other paging systems. At least till there's a notable body of software that uses it. ;)

AMS itself I kind of like, and hope to do something with. But projects are all behind.

#68 retroclouds ONLINE  

retroclouds

    Stargunner

  • 1,530 posts
  • Location:Germany

Posted Sun Jul 24, 2011 1:50 AM

Thanks for your hard work on this. Very cool stuff.
Will also give the assembler a try, I like the 'ntext' directive :thumbsup: :thumbsup: :thumbsup:

#69 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Sun Jul 24, 2011 1:00 PM

So, let me know what you think.


I tried all day to compile this GCC stuff under Windows.

At first, I tried with "mingw". I had to download and compile "gmp" and "mpfr". It stopped with some "gmp.h" error. I ended up here: GCC Specs. Lots of trouble; "patch -p1 < gcc-4.4.0-tms9900-1.3.patch" didn't work very well.

Then, I tried with Cygwin. "patch -p1 < gcc-4.4.0-tms9900-1.3.patch" worked perfectly. "./configure --prefix gcc --target=tms9900 --enable-languages=c" was not so bad, I think. Here's the output of "make all-gcc" or "make install", I don't remember which.

Spoiler


#70 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Sun Jul 24, 2011 6:14 PM

lucien,

I use Linux for all my development and testing, and I don't have Cygwin installed on my Windows box, but I'll see what I can do for you right now.

From your logs, it looks like you are building from:
/gcc/lib/gcc/tms9900/4.4.0/

That seems an odd location.

I also see lines referencing /home/-/gcc-4.4.0/host-i686-pc-cygwin/gcc/xgcc, which seems like you are using user "-", which doesn't seem right either. These things might be messing up the build or install process if you don't have write or execute permissions in these directories for some reason.

But it looks like GCC was built properly, and the GCC docs were built as well. You can confirm that by looking for a file named cc1 at:

/gcc/lib/gcc/tms9900/4.4.0/host-i686-pc-cygwin/gcc/cc1

You can check to see that it works by going to that directory and doing this:

$ echo "void test() {}" > test.c
$ ./cc1 test.c
 test
Analyzing compilation unit
Performing interprocedural optimizations
 <visibility> <early_local_cleanups> <summary generate> <inline>Assembling functions:
 test
Execution times (seconds)
 parser                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 (100%) wall      34 kB ( 7%) ggc
 TOTAL                 :   0.00             0.00             0.01                498 kB
$ cat test.s
        pseg
        even    

        def     test
test
        ai   r10, >0
        mov  r10, r8
        b    *r11

If cc1 is working, you may be having an installation problem. I'd recommend running "make distclean" and re-running the build process (configure, make-all, make install).

If it helps, I've included a copy of the output on my machine for these steps. For these logs, I've followed the directions posted above in a new directory. I'm using /home/eric/dev/tios/toolchain/WORKSPACE/temp/gcc-4.4.0 for my build location, and the /home/eric/dev/tios/toolchain/WORKSPACE/temp/bin for my install location.

If you're still having problems, make similar copies of your output for these steps and send it my way.

Attached Files



#71 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Sun Jul 24, 2011 7:07 PM

Oops, cc1 was generated in /home/-/gcc-4.4.0/host-i686-pc-cygwin/gcc (yes, "-" is my user name :D ). I didn't look there.

I tried "cc1 test.c" and it works.

The output that I posted was for "make all-gcc". I was waiting for something like "compilation OK" or some output in the "/gcc" directory (the install location).

I just tried "make install", and it worked. Now, I have "tms9900-cpp.exe", "tms9900-gcc-4.4.0.exe", "tms9900-gcc.exe" and "tms9900-gcov.exe" in "/gcc/bin".

Looks good enough for today, thanks. (It's 3AM here)

#72 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Mon Jul 25, 2011 3:24 AM

OK, now I have this problem when I compile the "hello world" example:
Attached File  make_hello.txt   3.13KB   24 downloads

It uses "as" and not "tms9900-as" to assemble the temporary "main.s" file. I replaced "as" in "/bin" with "tms9900-as" and it worked!

Here are the log files from building GCC:
Attached File  gcc_configure.txt   4.47KB   14 downloads
Attached File  gcc_build.txt   349.84KB   13 downloads
Attached File  gcc_install.txt   21.42KB   13 downloads

I installed Cygwin with these options:
Devel / gcc-core
Devel / libiconv
Devel / make
Devel / patchutils
Libs / libgmp-devel
Libs / libmpfr-devel
Interpreters / m4

Edited by lucien2, Mon Jul 25, 2011 3:47 AM.


#73 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Mon Jul 25, 2011 5:46 AM

Sorry, but I already found a bug in the code generation.

With this main.c:
#define VDP_READ_DATA_REG	(*(volatile char*)0x8800)
#define VDP_WRITE_DATA_REG	(*(volatile char*)0x8C00)
#define VDP_ADDRESS_REG		(*(volatile char*)0x8C02)
#define VDP_READ_FLAG		0x00
#define VDP_WRITE_FLAG		0x40
#define VDP_REG_FLAG		0x80
#define VDP_SCREEN_ADDRESS	0

void main() {
	VDP_ADDRESS_REG=13;
	VDP_ADDRESS_REG=7|VDP_REG_FLAG;
}

With "cc1 main.c", it sets VDP register 7 to 0x0D
	pseg
	even	

	def	main
main
	ai   r10, >0
	mov  r10, r8
	li   r1, >8C02
	li   r2, >D00
	movb r2, *r1
	li   r1, >8C02
	li   r2, >8700
	movb r2, *r1
	b    *r11

but with "cc1 -O2 main.c", it sets VDP register 7 to 0xFF...
	pseg
	even	

	def	main
main
	li   r1, >8C02
	li   r2, >FF87
	movb r2, *r1
	swpb r2
	movb r2, *r1
	b    *r11
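
To make the difference between the two listings concrete, here is a rough C model of the -O2 sequence. It assumes the usual TMS9900 behaviour that "movb" from a register stores the register's high byte and that "swpb" swaps the two bytes, and it relies on the VDP taking the data byte first and the register-select byte (0x80 | register) second. The port_log type and function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the bytes that reach >8C02, in order. */
typedef struct {
    uint8_t bytes[4];
    int count;
} port_log;

/* "movb r2, *r1": a byte move from a register sends its high byte. */
static void movb_to_port(uint16_t reg, port_log *log)
{
    assert(log->count < 4);
    log->bytes[log->count++] = (uint8_t)(reg >> 8);
}

/* "swpb r2": exchange the high and low bytes of the register. */
static uint16_t swpb(uint16_t reg)
{
    return (uint16_t)((reg << 8) | (reg >> 8));
}

/* Replays: li r2, imm / movb r2, *r1 / swpb r2 / movb r2, *r1 */
static port_log replay_combined(uint16_t imm)
{
    port_log log = { {0}, 0 };
    uint16_t r2 = imm;
    movb_to_port(r2, &log);     /* first byte: register data  */
    r2 = swpb(r2);
    movb_to_port(r2, &log);     /* second byte: 0x80 | regnum */
    return log;
}
```

With the generated constant >FF87 the port sees 0xFF followed by 0x87, so register 7 is loaded with 0xFF. A constant of >0D87 would send 0x0D followed by 0x87, which is what the -O0 listing does with its two separate loads.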


#74 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Mon Jul 25, 2011 9:37 PM

Thanks for the bug report, and keep 'em coming.

It is embarrassing to have my mistakes up where everyone can see them, but each bug found and fixed makes for a more capable tool for everyone. So again, thanks.

I made an optimization pattern to convert a sequence like:
li   r2, >AA00
movb r2, *r1
li   r2, >BB00
movb r2, *r1

to one like this:
li   r2, >AABB
movb r2, *r1
swpb r2
movb r2, *r1

but I forgot to mask off the sign-extended bits of the constant in the lower byte. As a result, when the two values are ORed together, the byte stored in the upper byte is lost.
>FFBB | >AA00 = >FFBB
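
The same arithmetic can be sketched in plain C. This is only a model of the constant folding described above, not the code from the machine description; sign_extend_byte, combine_unmasked and combine_masked are names invented for the illustration.

```c
#include <stdint.h>

/* Byte constants are held sign-extended, so 0xBB is stored as -69,
   which is 0xFFBB when viewed as 16 bits. */
static int sign_extend_byte(uint8_t b)
{
    return b >= 0x80 ? (int)b - 0x100 : (int)b;
}

/* The mistake: OR the two constants together without masking.
   The sign bits of the low constant overwrite the high byte. */
static uint16_t combine_unmasked(int hi, int lo)
{
    return (uint16_t)(((unsigned)hi << 8) | (unsigned)lo);
}

/* The fix: keep only the low 8 bits of each constant first. */
static uint16_t combine_masked(int hi, int lo)
{
    return (uint16_t)((((unsigned)hi & 0xFFu) << 8) |
                      ((unsigned)lo & 0xFFu));
}
```

For the bytes >AA and >BB the unmasked version yields >FFBB and the masked version yields >AABB. For the bytes from the bug report, >0D and >87, the results are >FF87 and >0D87.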

He also found another problem using -O0, which I don't usually use. The register I chose for the frame pointer, R8, is volatile. That means that it can be destroyed over a function call. The frame pointer is used as the base to locate local variables which live on the stack. If this is destroyed after a function call, the code following that call can behave unpredictably.

This was just a dumb mistake. In order to preserve the ABI, I've moved the frame pointer to R9, which is preserved across function calls. The resulting code looks much safer now.

I'll include these fixes in the next patch, but for the impatient here's how to fix these problems now:

Change gcc-4.4.0/gcc/config/tms9900.md, line 2423 (near "*movhi_combine_consts") to look like this:

    operands[1] = GEN_INT(((INTVAL(operands[1]) & 0xFF) << 8) |
                           (INTVAL(operands[3]) & 0xFF));

Change gcc-4.4.0/gcc/config/tms9900.h, line 517 to look like this:

#define FRAME_POINTER_REGNUM		HARD_R9_REGNUM

And gcc-4.4.0/gcc/config/tms9900.h, line 523 to look like this:

#define STATIC_CHAIN_REGNUM	        HARD_R9_REGNUM

By the way, I think there's a problem where the stack frame is improperly sized when using -O0. This seems to be the result of the order of operations in GCC. The size of the frame is not yet known when the code to build the function prologue is called. That's on my todo list.

#75 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Tue Jul 26, 2011 1:10 AM

Thanks for the bug report, and keep 'em coming.


Here's one :)

int i;

void main() {
	i=i*0x1234;
}

O2:
	pseg
	even	

	def	main
main
	mov  @i, r1
	li   r3, >1234
	mpy  r1, r1         // must be mpy r3,r1
	mov  r2, @i
	b    *r11
	cseg
	def i
i
	bss 2
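
A small C model shows what the wrong operand does. It assumes the standard TMS9900 definition of "mpy S, D": an unsigned 16x16 multiply of the source by register D, with the 32-bit product left in D (high word) and D+1 (low word). The function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mpy rS, rD" on a 16-register workspace. */
static void mpy(uint16_t regs[16], int s, int d)
{
    assert(s >= 0 && s < 16 && d >= 0 && d < 15);
    uint32_t product = (uint32_t)regs[s] * (uint32_t)regs[d];
    regs[d]     = (uint16_t)(product >> 16);   /* high word */
    regs[d + 1] = (uint16_t)product;           /* low word  */
}

/* Replays the body of main above with a selectable mpy source
   register and returns the value that would be stored back to i. */
static uint16_t run_main(uint16_t i, int mpy_source)
{
    uint16_t regs[16] = {0};
    regs[1] = i;                /* mov  @i, r1    */
    regs[3] = 0x1234;           /* li   r3, >1234 */
    mpy(regs, mpy_source, 1);   /* mpy  rS, r1    */
    return regs[2];             /* mov  r2, @i    */
}
```

With i = 3, run_main(3, 3) gives >369C (3 * >1234), while run_main(3, 1), which matches the generated "mpy r1, r1", gives 9: the code squares i instead of multiplying it by the constant.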

O0 crash:
$ cc1 -O0 main.c
 main
Analyzing compilation unit
Performing interprocedural optimizations
 <visibility> <early_local_cleanups> <summary generate> <inline>Assembling functions:
 main
main.c: In function 'main':
main.c:5: internal compiler error: in refers_to_regno_for_reload_p, at reload.c:6440
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.




