Jump to content

Photo

GCC for the TI


310 replies to this topic

#301 Tursi ONLINE  

Tursi

    River Patroller

  • 4,752 posts
  • HarmlessLion
  • Location:BUR

Posted Thu May 11, 2017 1:11 AM

Just as a note, I was able to use the install script to build this under the Linux Subsystem for Windows, except for the install step failed to find libc++. I didn't need that right now, so I just changed it not to build the c++ compiler, and it seemed to work. It was /really/ noisy, tons of pedantic warnings (all over the place, not just in the patches), but I didn't have to do anything special. Just wanted to try it and see if it worked better than Cygwin. ;)



#302 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Mon May 29, 2017 7:52 PM

Hey everyone,

 

I've got a new set of patches for the compiler to send out. It's pretty thin, but that's a good thing.

 

Here's what's new:

 

Fixed a multiplication bug reported by Chue, In come cases, the input arguments were clobbered leading to a wrong result.

Fixed incorrect instruction sizes. This will result in better optimized code.

Added .size directive to calculate function sizes. This is helpful during development.

Reduced size of the patch file. This makes it easier to understand which changes were made to the baseline.

 

I've updated the GCC installer to use this latest patch, and added an updated "hello world" program. This program no longer needs the elf2cart tool, makes improvements to the crt0 as well as fixing some bugs in the vdp_copy_from_sys function.

 

Honestly, there's not much else to say here. The compiler is pretty mature and stable at this point, but I'm always interested to hear of any problems or opportunities for improvement.



#303 chue ONLINE  

chue

    Chopper Commander

  • 134 posts

Posted Mon May 29, 2017 8:44 PM

Awesome, looking forward to trying out the new patch!



#304 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Thu Jul 13, 2017 6:54 AM

This is as much for future reference for myself as it is for anyone compiling on a macOS, but I needed to do three things in order to build the latest patch on my machine:

 

  1. Edit the install script to invoke curl correctly by adding double quotes around the WGET variable assigned, and changing the curl option from -C to -O

  2. Install gmp and mpfr libraries via Homebrew (brew install gmp mpfr)

  3. remove the c++ build target. Unfortunately, c++ doesn't build with clang due to duplicate symbols. I don't need c++ for my use, so not a big problem, but something worth noting nonetheless. Perhaps this is related to the problem Tursi was seeing when building with the Linux Subsystem for Windows back in May?



#305 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Tue Aug 15, 2017 3:41 AM

Played with the latest version a little bit, and I've discovered what I think might be a bug. I'm comparing patch 12 vs patch 15 here, and the assembler it outputs for this piece of C code:

 

bankswitch.h

#ifndef BANKSWITCH_H
#define BANKSWITCH_H

#define BANKSIZE	0x2000
#define BANKADDRESS 0x6000

// Variable to store our current bank, used by all trampoline functions
extern volatile unsigned int currentbank;

// Where to write to, to switch banks
// You can switch banks by addressing this as an array, and writing
// any value to it.
// 		e.g.	bankbase[2] = 1;
// This will switch to bank 2
extern volatile unsigned int *bankbase;

#endif

resource_copy.c:

#include "bankswitch.h"

#define _binary_start_bank	5

// Copy from cartridge ROM to a section in RAM
void rom_to_ram(unsigned long start, unsigned long end, unsigned char *dst)
{
	unsigned int srcoffset 	= (unsigned int)start % BANKSIZE;
	unsigned int length		= (unsigned int)(end - start);
	unsigned int srcbank	= _binary_start_bank + ((start - srcoffset) / BANKSIZE);

	// Do we need to read across different banks?
	while ( (srcoffset + length) > BANKSIZE )
	{
		unsigned int templength = BANKSIZE - srcoffset;

		// activate source bank
		bankbase[srcbank] = 1;

		// copy contents until end of bank
		memcpy(dst, (unsigned char*)(BANKADDRESS + srcoffset), templength);

		// update bank, offset and remaining length
		length -= templength;
		srcbank++;				// prime for next bank
		srcoffset = 0;			// start at start of bank
		dst += templength;		// Move destination pointer
	}

	// Copy contents of one (last) bank
	bankbase[srcbank] = 1;
	memcpy(dst, (void*)(BANKADDRESS + srcoffset), length);

	// Restore bank
	bankbase[currentbank] = 1;
}

When compiled with patch 12, the code works perfectly fine (it is, as I'm sure is obvious to most, code to copy binary blobs from paged cartridge ROM to RAM). When compiled with patch 15, it copies garbage (haven't been able to identify where it goes wrong). So as a first step to debug this, I figured I'd look at the assembly code generated by the two compilers.

 

For patch 12:

000000ba <rom_to_ram>:
  ba:	02 2a ff f2 	ai r10, >FFF2
  be:	c0 0a       	mov r10, r0
  c0:	cc 0b       	mov r11, *r0+
  c2:	cc 09       	mov r9, *r0+
  c4:	cc 0d       	mov r13, *r0+
  c6:	cc 0e       	mov r14, *r0+
  c8:	c4 0f       	mov r15, *r0
  ca:	c1 81       	mov r1, r6
  cc:	c1 c2       	mov r2, r7
  ce:	c3 85       	mov r5, r14
  d0:	c2 02       	mov r2, r8
  d2:	02 48 1f ff 	andi r8, >1FFF
  d6:	c3 44       	mov r4, r13
  d8:	63 42       	s r2, r13
  da:	c0 88       	mov r8, r2
  dc:	04 c1       	clr r1
  de:	61 81       	s r1, r6
  e0:	61 c2       	s r2, r7
  e2:	18 00       	joc 0
  e4:	06 06       	dec r6
  e6:	c2 46       	mov r6, r9
  e8:	0a 39       	sla r9, 3
  ea:	c0 47       	mov r7, r1
  ec:	09 d1       	srl r1, 13
  ee:	e2 41       	soc r1, r9
  f0:	02 29 00 05 	ai r9, >0005
  f4:	c3 c9       	mov r9, r15
  f6:	a3 c9       	a r9, r15
  f8:	02 04 00 00 	li r4, >0000
  fc:	10 00       	jmp 0

000000fe <L8>:
  fe:	02 28 e0 00 	ai r8, >E000
 102:	05 08       	neg r8
 104:	c0 60 00 00 	mov @>0000, r1
 108:	a0 4f       	a r15, r1
 10a:	02 03 00 01 	li r3, >0001
 10e:	c4 43       	mov r3, *r1
 110:	c0 4e       	mov r14, r1
 112:	c0 c8       	mov r8, r3
 114:	ca 84 00 0c 	mov r4, @>000C(r10)
 118:	ca 88 00 0a 	mov r8, @>000A(r10)
 11c:	06 94       	bl *r4
 11e:	c2 2a 00 0a 	mov @>000A(r10), r8
 122:	63 48       	s r8, r13
 124:	05 89       	inc r9
 126:	a3 88       	a r8, r14
 128:	05 cf       	inct r15
 12a:	04 c8       	clr r8
 12c:	c1 2a 00 0c 	mov @>000C(r10), r4

00000130 <L7>:
 130:	c0 4d       	mov r13, r1
 132:	a0 48       	a r8, r1
 134:	c0 88       	mov r8, r2
 136:	02 22 60 00 	ai r2, >6000
 13a:	02 81 20 00 	ci r1, >2000
 13e:	1b 00       	jh 0
 140:	a2 49       	a r9, r9
 142:	a2 60 00 00 	a @>0000, r9
 146:	02 01 00 01 	li r1, >0001
 14a:	c6 41       	mov r1, *r9
 14c:	c0 4e       	mov r14, r1
 14e:	c0 cd       	mov r13, r3
 150:	06 a0 00 00 	bl @>0000
 154:	c0 60 00 00 	mov @>0000, r1
 158:	a0 41       	a r1, r1
 15a:	a0 60 00 00 	a @>0000, r1
 15e:	02 02 00 01 	li r2, >0001
 162:	c4 42       	mov r2, *r1
 164:	c2 fa       	mov *r10+, r11
 166:	c2 7a       	mov *r10+, r9
 168:	c3 7a       	mov *r10+, r13
 16a:	c3 ba       	mov *r10+, r14
 16c:	c3 da       	mov *r10, r15
 16e:	02 2a 00 06 	ai r10, >0006
 172:	04 5b       	b *r11

For patch 15:

000000bc <rom_to_ram>:
  bc:	02 2a ff f2 	ai r10, >FFF2
  c0:	c0 0a       	mov r10, r0
  c2:	cc 0b       	mov r11, *r0+
  c4:	cc 09       	mov r9, *r0+
  c6:	cc 0d       	mov r13, *r0+
  c8:	cc 0e       	mov r14, *r0+
  ca:	c4 0f       	mov r15, *r0
  cc:	c1 81       	mov r1, r6
  ce:	c1 c2       	mov r2, r7
  d0:	c3 85       	mov r5, r14
  d2:	c2 02       	mov r2, r8
  d4:	02 48 1f ff 	andi r8, >1FFF
  d8:	c3 44       	mov r4, r13
  da:	63 42       	s r2, r13
  dc:	c0 88       	mov r8, r2
  de:	04 c1       	clr r1
  e0:	61 81       	s r1, r6
  e2:	61 c2       	s r2, r7
  e4:	18 00       	joc 0
  e6:	06 06       	dec r6
  e8:	c0 87       	mov r7, r2
  ea:	08 d6       	sra r6, 13
  ec:	09 d7       	srl r7, 13
  ee:	0a 32       	sla r2, 3
  f0:	e1 c2       	soc r2, r7
  f2:	c2 47       	mov r7, r9
  f4:	02 29 00 05 	ai r9, >0005
  f8:	c3 c9       	mov r9, r15
  fa:	a3 c9       	a r9, r15
  fc:	02 04 00 00 	li r4, >0000
 100:	10 00       	jmp 0

00000102 <L8>:
 102:	02 28 e0 00 	ai r8, >E000
 106:	05 08       	neg r8
 108:	c0 60 00 00 	mov @>0000, r1
 10c:	a0 4f       	a r15, r1
 10e:	02 03 00 01 	li r3, >0001
 112:	c4 43       	mov r3, *r1
 114:	c0 4e       	mov r14, r1
 116:	c0 c8       	mov r8, r3
 118:	ca 84 00 0c 	mov r4, @>000C(r10)
 11c:	ca 88 00 0a 	mov r8, @>000A(r10)
 120:	06 94       	bl *r4
 122:	c2 2a 00 0a 	mov @>000A(r10), r8
 126:	63 48       	s r8, r13
 128:	05 89       	inc r9
 12a:	a3 88       	a r8, r14
 12c:	05 cf       	inct r15
 12e:	04 c8       	clr r8
 130:	c1 2a 00 0c 	mov @>000C(r10), r4

00000134 <L7>:
 134:	c0 4d       	mov r13, r1
 136:	a0 48       	a r8, r1
 138:	c0 88       	mov r8, r2
 13a:	02 22 60 00 	ai r2, >6000
 13e:	02 81 20 00 	ci r1, >2000
 142:	1b 00       	jh 0
 144:	a2 49       	a r9, r9
 146:	a2 60 00 00 	a @>0000, r9
 14a:	02 01 00 01 	li r1, >0001
 14e:	c6 41       	mov r1, *r9
 150:	c0 4e       	mov r14, r1
 152:	c0 cd       	mov r13, r3
 154:	06 a0 00 00 	bl @>0000
 158:	c0 60 00 00 	mov @>0000, r1
 15c:	a0 41       	a r1, r1
 15e:	a0 60 00 00 	a @>0000, r1
 162:	02 02 00 01 	li r2, >0001
 166:	c4 42       	mov r2, *r1
 168:	c2 fa       	mov *r10+, r11
 16a:	c2 7a       	mov *r10+, r9
 16c:	c3 7a       	mov *r10+, r13
 16e:	c3 ba       	mov *r10+, r14
 170:	c3 da       	mov *r10, r15
 172:	02 2a 00 06 	ai r10, >0006
 176:	04 5b       	b *r11

That's as far as I got. You'll notice the code is largely the same, except for right after line 20 (adress 0x0bc and 0x0ba in the respective disassemblies), where there's a small difference in generated code (patch 15 is slightly longer).



#306 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Mon Aug 21, 2017 7:57 PM

OK, I found the problem in the compiler. The issue is that there was a bug with right shifts of 32-bit values.

 

Here's how I found the problem:

 

After doing a diff between the two assembly snippets, there was only one mismatched block:

patch 12  (works)
  e6:	c2 46       	mov r6, r9
  e8:	0a 39       	sla r9, 3
  ea:	c0 47       	mov r7, r1      
  ec:	09 d1       	srl r1, 13
  ee:	e2 41       	soc r1, r9

patch 15 (fails)
  e8:	c0 87       	mov r7, r2     
  ea:	08 d6       	sra r6, 13
  ec:	09 d7       	srl r7, 13
  ee:	0a 32       	sla r2, 3
  f0:	e1 c2       	soc r2, r7

From looking at the source code and surrounding assembly, I was able to determine this block of code was part of the command:

unsigned int srcbank    = _binary_start_bank + ((start - srcoffset) / BANKSIZE);

Specifically, this code implemented ((start - srcoffset) / BANKSIZE. The (start - srcoffset) calculation was done by the instructions between offsets 0xde and 0xe4. This leaves us with (N / BANKSIZE). Since BANKSIZE is a power of two,  the compiler uses a bit shift to do the division, transforming the code into (N >> 13).

 

Knowing all this, lets walk through the code to see what's going on.

1) Assume a bit pattern stored in R6 and R7

   .----r6--------. .----r7--------.
   abcdefghijklmnop qrstuvwxyzABCDEF

2) By manual calculation, we should have this result:

   0000000000000abc defghijklmnopqrs


3) Trace the code

                        Assembly        Pseudo        bitfield value
patch 12  (works)       ----------      ----------    ----------------
  e6:	c2 46       	mov r6, r9
  e8:	0a 39       	sla r9, 3       r9 = r6<<3  = defghijklmnop000
  ea:	c0 47       	mov r7, r1      
  ec:	09 d1       	srl r1, 13      r1 = r7>>13 = 0000000000000qrs
  ee:	e2 41       	soc r1, r9      r9 |= r1    = defghijklmnopqrs  <-- Correct!


                        Assembly        Pseudo        bitfield value
patch 15 (fails)        ----------      ----------    ----------------
  e8:	c0 87       	mov r7, r2      r2 = r7     = qrstuvwxyzABCDEF  <-- Should use r6, not r7
  ea:	08 d6       	sra r6, 13      r6 = r6>>13 = 0000000000000abc
  ec:	09 d7       	srl r7, 13      r7 = r7>>13 = 0000000000000qrs
  ee:	0a 32       	sla r2, 3       r2 = r7<<3  = tuvwxyzABCDEF000
  f0:	e1 c2       	soc r2, r7      
  f2:	c2 47       	mov r7, r9      r9 = r7|r2  = tuvwxyzABCDEFqrs  <-- Wrong!

Ultimately, this was a typo in the format used for this instruction, and has been fixed. I'll look for similar problems in the other shift instructions and get a new patch sent out in a day or two.

 

Thanks for the bug report Mole! If anyone finds anything else, those fixes will get included too.

 

 



#307 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Tue Aug 22, 2017 2:22 PM

Excellent, glad to hear my report was useful!

And thanks for the explanation, I love reading about these kinds of details :).



#308 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Thu Aug 24, 2017 1:24 AM

Well, it's patch time again.

 

The first post of this thread has been updated with the newest patch file and an updated installer script.

 

Here's the changes in this latest set:

  Fixed 32-bit right constant shift, failed with some constants
  Fixed all 32-bit variable shifts, was using r0 as temp register
  Fixed carry bit in 32-bit add
  Fixed invalid instruction in some 32-bit add forms
 

The first one is a result of the bug TheMole reported earlier. Not much more to say about that one.

 

While testing shift instructions, I found that in some cases the compiler decided to use R0 as a temp register before the final calculation was complete. I'm still not sure why that was happening, since R0 was supposed to be the last register selected for temp registers. This was fixed by disallowing input registers as temps.

 

Another unexpected bug was that the carry between the low and high words was inverted, resulting in strange results.

 

There was another edge case where the compiler tried to use the AI instruction with memory addresses. This was fixed by the use of a temp register.

 

It looks like the only bugs left are in unusual edge cases, which is great. I'm still trying to find improvements, but at this point, it's getting hard to find clear improvements. So if anyone has a good idea, I'd be happy to hear it.

 

 



#309 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Thu Aug 24, 2017 1:34 AM

Well, it's patch time again.

 

The first post of this thread has been updated with the newest patch file and an updated installer script.

 

Here's the changes in this latest set:

  Fixed 32-bit right constant shift, failed with some constants
  Fixed all 32-bit variable shifts, was using r0 as temp register
  Fixed carry bit in 32-bit add
  Fixed invalid instruction in some 32-bit add forms
 

The first one is a result of the bug TheMole reported earlier. Not much more to say about that one.

 

While testing shift instructions, I found that in some cases the compiler decided to use R0 as a temp register before the final calculation was complete. I'm still not sure why that was happening, since R0 was supposed to be the last register selected for temp registers. This was fixed by disallowing input registers as temps.

 

Another unexpected bug was that the carry between the low and high words was inverted, resulting in strange results.

 

There was another edge case where the compiler tried to use the AI instruction with memory addresses. This was fixed by the use of a temp register.

 

It looks like the only bugs left are in unusual edge cases, which is great. I'm still trying to find improvements, but at this point, it's getting hard to find clear improvements. So if anyone has a good idea, I'd be happy to hear it.

 

 



#310 insomnia OFFLINE  

insomnia

    Star Raider

  • Topic Starter
  • 71 posts
  • Location:Pittsburgh, PA

Posted Thu Aug 24, 2017 1:36 AM

Well, it's patch time again.

 

The first post of this thread has been updated with the newest patch file and an updated installer script.

 

Here's the changes in this latest set:

  Fixed 32-bit right constant shift, failed with some constants
  Fixed all 32-bit variable shifts, was using r0 as temp register
  Fixed carry bit in 32-bit add
  Fixed invalid instruction in some 32-bit add forms
 

The first one is a result of the bug TheMole reported earlier. Not much more to say about that one.

 

While testing shift instructions, I found that in some cases the compiler decided to use R0 as a temp register before the final calculation was complete. I'm still not sure why that was happening, since R0 was supposed to be the last register selected for temp registers. This was fixed by disallowing input registers as temps.

 

Another unexpected bug was that the carry between the low and high words was inverted, resulting in strange results.

 

There was another edge case where the compiler tried to use the AI instruction with memory addresses. This was fixed by the use of a temp register.

 

It looks like the only bugs left are in unusual edge cases, which is great. I'm still trying to find improvements, but at this point, it's getting hard to find clear improvements. So if anyone has a good idea, I'd be happy to hear it.

 

 



#311 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Sat Sep 2, 2017 7:25 AM

I'm seeing a number of issues with this patch (some of which might have been there in previous versions as well, since I haven't gotten this far into compiling and running the program with previous versions).

 

First off, I get a segfault while compiling certain larger code files. Not sure if it is truly related to the length of the file, or just an unfortunate arrangement of code, but commenting out certain functions (and stripping out their use) will make the segfault go away. I can provide a code file that triggers this if it helps.

 

Secondly, there seems to still be some funky math going on. I need to investigate further, but without changing any code a number of calculations have seemingly different results compared to patch 12, which brakes a number of things in the game. I tried to find an as simple as possible specific piece of code that has a different outcome in both versions, and this is an instance:

int cnt, y;

cnt = 180;
while (cnt--)
{
	y = (cnt % 16) - 8;

	// Bounce arrow
	// maplocation_y is an unsigned char
	if (y >= 0)
		vdpchar(0x1F80, levels[currentlevel -1].maplocation_y + (y / 2));
	else
		vdpchar(0x1F80, levels[currentlevel -1].maplocation_y - (y / 2));
}

This code simply bounces a sprite up and down on the screen, and the bounce is different in both versions. The "range" of calculated y-values is visible wider (it bounces more) and shifted towards the bottom of the screen somewhat.

 

Now, if I change the definition of y from "int" to "char", it actually does the right thing (even if "cnt" stays a 16-bit integer). Alternatively, if I explicitly cast the result of the (y / 2) calculation to char (right before adding it to the maplocation_y struct member in the vdpchar function call), it also works. So I think something is wrong with some of the word-to-byte casts.

 

It's a simple fix in the above case, but I often go from words to bytes in my code (mostly when pushing things to the VDP), and the latest patch breaks this code in numerous places where a fix isn't so simple: collision detection, sprite rendering, ...

 

Anyway, great progress so far, and it's awesome to see that you keep swatting these bugs. I hope my reports aren't too disheartening :).


Edited by TheMole, Sat Sep 2, 2017 7:26 AM.





0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users