
Question about how GPU code is defined/copied in Atari 3D Renderer Sample


Sam Bushman


Hi guys,

I am currently looking through a test program Atari wrote in 1995 for a 3D renderer for the Atari Jaguar. I am trying to learn a little about how one can generate 3D graphics on the Jag.

 

The program is made up of C code that defines structures for storing different renderers and input handling code for switching the active renderer. Each of these structures stores the starting address and length of the stored renderer and a function pointer to the GPU drawing code. Each iteration through the main loop, the C code copies the active renderer program into GPU local memory, updates the 3D scene variables, and calls the renderer's draw function.
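The per-renderer bookkeeping described above can be sketched in C roughly as follows. This is only an illustration: the `Renderer` type, its field names, and the helper functions are hypothetical and are not taken from the demo's source, which is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the demo's real structure and field names are not shown here. */
typedef struct {
    const uint32_t *gpu_code;  /* points at a two-longword header: {start address, length} */
    uint32_t        gpu_entry; /* GPU-side address of the draw routine inside the block */
} Renderer;

/* Start address recorded in the header (first longword). */
static uint32_t renderer_start(const Renderer *r)
{
    return r->gpu_code[0];
}

/* Length in bytes recorded in the header (second longword). */
static uint32_t renderer_length(const Renderer *r)
{
    return r->gpu_code[1];
}

/* Advance to the next renderer in the table, wrapping around at the end. */
static size_t next_renderer(size_t current, size_t count)
{
    assert(count > 0);
    return (current + 1u) % count;
}
```

With six renderers, as in the demo, `next_renderer(5, 6)` wraps back to index 0, which is one way the input handler could cycle the active renderer.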

 

My question deals with how this GPU code is assembled and stored in memory. Each renderer has the following lines in its source file:

_gourcode:
	.dc.l startblit, endblit-startblit
	.gpu
	.include "globlreg.inc"
	.include "polyregs.inc"
	.org G_RAM
startblit:
	...
	...
endblit:

For each renderer structure, the *code label (gourcode in the above example) is referenced for storing the starting address and length of the GPU code for copying purposes. A label between the startblit and endblit labels is referenced for storing the GPU drawing function pointer. My interpretation of this code is that the .gpu directive tells the assembler to treat the following code as GPU assembly (properly handling GPU-specific instructions) and the .org directive causes the address of startblit to be the first address of GPU internal memory (therefore causing the assembled code that follows to be loaded in this memory range).

My question is this: since all 6 renderers in this demo define their GPU programs in the same way (including the use of the .org directive), how does the assembler handle the address each of these programs is loaded at when the final program is loaded into Jaguar memory? It appears to me that each GPU program would overwrite whatever was previously written to G_RAM, and therefore not allow the C code to copy different renderers' GPU code into GPU internal memory.

If more code from this test program is needed to answer my question, please let me know.

Thanks for any help and have a happy new year guys!


The assembler DOESN'T handle it at all. All the GPU code is indeed assembled for the same GPU RAM address, and each block will overwrite whatever was loaded there before it. The main program (on the 68000) handles loading each piece of GPU code as needed. Look at this in tri68k.c:

 

TRIrender(Object *obj, Camera *cam, Lightmodel *lmodel)
{
	Triangle *tri;
	int	tricount;
	Polygon	*pgon, *altpgon;
	unsigned andclips, orclips, curclip;
	int	camx, camy, camz;
	int 	i;
	Texmap *tmap;
	unsigned color, basei;

	/* load up GPU code we will need */
	GPUload(gpufixdivcode);

See how the function loads the GPU code it will use shortly? You'll find other GPUload() calls elsewhere in the code as needed. When it needs to run the code, it uses the GPUexec() function to start the GPU running.
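To make the copy step concrete, here is a host-side sketch of what a loader in the spirit of GPUload() might do with the two-longword header (`.dc.l startblit, endblit-startblit`). It is not the demo's actual GPUload(), whose source is not shown in this thread. The names `sim_gpu_load` and `sim_gpu_ram` are hypothetical, the GPU RAM is simulated with an ordinary array, and the sketch assumes the code words follow the header directly. The base address 0xF03000 and the 4 KB size are my recollection of the Jaguar memory map; the 4 KB figure at least agrees with the "2524 used + 1572 free" numbers in the assembler output later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_G_RAM      0x00F03000u  /* assumed GPU local RAM base address */
#define SIM_G_RAM_SIZE 4096u        /* assumed GPU local RAM size in bytes */

/* Stand-in for GPU local RAM; on real hardware this would be the memory at G_RAM. */
static uint32_t sim_gpu_ram[SIM_G_RAM_SIZE / 4u];

/*
 * blob[0] = destination (run) address, blob[1] = length in bytes,
 * blob[2..] = code, assumed to follow the header directly.
 * Copies whole 32-bit words. Returns 0 on success, -1 if the block
 * is misaligned or does not fit in the simulated GPU RAM.
 */
static int sim_gpu_load(const uint32_t *blob)
{
    assert(blob != NULL);
    uint32_t dest = blob[0];
    uint32_t len  = blob[1];

    if (dest < SIM_G_RAM || (dest & 3u) != 0u)
        return -1;
    uint32_t off = dest - SIM_G_RAM;
    if (off > SIM_G_RAM_SIZE || len > SIM_G_RAM_SIZE - off)
        return -1;

    size_t words = ((size_t)len + 3u) / 4u;  /* round up to whole longwords */
    for (size_t i = 0; i < words; i++)
        sim_gpu_ram[off / 4u + i] = blob[2 + i];
    return 0;
}
```

Loading a second block with the same destination simply overwrites the first, which is why every renderer can be assembled for the same address.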

Thanks Chilly, that makes sense. I still have some confusion about where these programs reside in the executable's address space. I currently have the following assumptions (some of which must be incorrect):
1. Since no OS runs on the Jaguar, the address space of the executable is the same as the address space of the memory mappings for the Jaguar console (there is no indirection between the address the program is poking and the actual hardware memory address).

2. The .org directive tells the assembler that the following code should begin at the expressed address in the executable's address space (G_RAM in the initial example).

3. When the executable is loaded onto the Jaguar, code assembled after the .org directive will be loaded into the system address space at the address equal to the definition of G_RAM (meaning the beginning of the GPU local memory).

4. I would think the various GPU programs would clobber each other when the assembled object files are linked together (as they all use the .org directive with the address G_RAM).

 

Clearly this cannot all be true, as all the GPU programs being used by the various renderers must exist in the executable in order to be copied to GPU local memory when GPUload is called in the first place. What I am not understanding is how the .org directive affects the memory layout of the final executable and the addresses the linker assigns to the various labels of the GPU programs.

 

I'm a bit of an assembly noob, so thank you for your patience and time. Cheers.


The code/data sits in the 68000 code/data right where it appears in the source. All org does is tell the assembler to compute the addresses for that code as if it had already been moved to the right place. So the GPU code may end up at, say, 0x18000 on the 68000 side, but all the labels and branches inside that block point to the GPU local RAM, and once the 68000/blitter moves that chunk of code from 0x18000 to the GPU local RAM, it will work properly. It won't work if you try to run it where it is in system RAM... unless you left off the org command so that it assembles to run right there in system RAM (which then means following the rules for running from system RAM instead of local RAM).
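The split between where a block is stored and where it is assembled to run can be illustrated with a little address arithmetic. This is just a sketch: `label_value` and `in_block` are hypothetical helpers, 0x18000 is the example storage address from the post above, and 0xF03000 with a 4 KB size is assumed for G_RAM.

```c
#include <assert.h>
#include <stdint.h>

#define G_RAM      0x00F03000u  /* assumed run address given to .org */
#define STORE_ADDR 0x00018000u  /* example storage address from the post */
#define BLOCK_SIZE 4096u        /* assumed size of GPU local RAM */

/* Value the assembler records for a label `offset` bytes past an ".org org". */
static uint32_t label_value(uint32_t org, uint32_t offset)
{
    assert(offset <= UINT32_MAX - org);  /* guard against wrap-around */
    return org + offset;
}

/* Nonzero if `addr` lies inside the block [base, base + len). */
static int in_block(uint32_t addr, uint32_t base, uint32_t len)
{
    return addr >= base && addr - base < len;
}
```

A label 0x40 bytes into the block gets the value 0xF03040 regardless of where the bytes sit in the executable. That address lies inside the GPU RAM window and outside the 0x18000 storage area, so jumps through it only land on the right instruction after the block has been copied.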


  • 2 weeks later...

I'm currently trying to compile this demo to run on my Skunkboard. I have Michael Hill's SDK for OS X installed, and I'm working to convert the included makefile over to using VBCC and SMAC instead of GCC and MAC. Currently, when SMAC attempts to assemble the first object file listed in the makefile (wfrend.o) I get the following output (running SMAC with the verbose flag):


smac -fb -v -I./include wfrend.s
[including: wfrend.s]
[including: jaguar.inc]
[Leaving: jaguar.inc]
[including: globlreg.inc]
[Leaving: globlreg.inc]
[including: trapregs.inc]
[Leaving: trapregs.inc]
[including: load.inc]
[including: loadxpt.inc]
[Leaving: loadxpt.inc]
[Leaving: load.inc]
[including: clip.inc]
[Leaving: clip.inc]
[including: wfdraw.inc]
[Leaving: wfdraw.inc]
[including: clipsubs.inc]
[Leaving: clipsubs.inc]
[including: init.inc]
[Leaving: init.inc]
GPU RAM USE: 2524 Bytes used
1572 Bytes free
2032 Bytes available for buffer
[Leaving: wfrend.s]
(*top*)[0]: Internal Error Number 7
make: *** [wfrend.o] Error 1

This error doesn't seem to give me much to work with at first glance. Is there some sort of error output file, documentation, or common knowledge about the assembler's error codes I can work with to determine what is happening here?

 

Thanks guys!


I just built smac 1.0.18 (linked to me here by JagChris: http://3do.cdinteractive.co.uk/viewtopic.php?f=35&t=3591 )

 

Even with the new version and compiling smac with the -m32 flag from source I get the same error on the same file when assembling the 3D example. Is there anything else I can look at?

Edited by Sam Bushman

Looking into the smac source a bit (keep in mind, this is my first time looking at the smac source :P ) I see that the only place where an error 7 is generated is in sect.c, line 390. It looks like this is the function that resolves fixups for code sections. It looks like a "chunk" is attempting to be cached, but is not found. I'll keep looking at it and see if I can provide more info. Thanks.

Edited by Sam Bushman

The current file number and line number give *TOP* and 0, which combined with the weird fixup word leads me to think this is an extraneous fix up that somehow got onto the fixup list.

 

EDIT: Putting a continue in place of the interror call caused a segmentation fault, as the fixup list at that point is wonked. Putting a break at that point let the assembly finish with no apparent problems. Everything, including the symbol table, looks fine.

Edited by Chilly Willy

Spoke too soon about everything looking okay. If you look at the listing, you see

 

  129  00000046  CC03                   	move	PC,return
  130  00000048  0C83                   	addqt	#4,return
  131                                   
  132                                   ;***********************************************************************
  133                                   ; main loop goes here
  134                                   ; it is assumed that "return" points to "triloop"
  135                                   ;***********************************************************************
  136                                   
  137                                   triloop:
  138  0000004A  xxxx                   	addqt	#(skipface-triloop),return
  139  0000004C  9064                   	moveta	return,altskip
And in the .o file you find "CC 03 0C 83 90 64", so the addqt at triloop is being skipped entirely. And now that I think about it, that instruction is a quick add whose immediate (skipface-triloop) needs a sub32 expr to calculate the offset for the add.
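For reference, the three words that do appear in the object file can be reproduced with a small encoder. This is a sketch under assumptions: the 6-bit opcode / 5-bit field / 5-bit register layout and the opcode numbers (ADDQT = 3, MOVETA = 36, MOVE PC = 51) come from my recollection of the Jaguar GPU instruction set, and the register numbers (return = r3, altskip = r4) are inferred from the listing bytes rather than from the include files. The function names are hypothetical, and this is not how smac itself is written.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_ADDQT = 3, OP_MOVETA = 36, OP_MOVE_PC = 51 };  /* assumed opcode numbers */

/* Pack one 16-bit GPU instruction: opcode(6) | field1(5) | reg2(5). */
static uint16_t gpu_encode(unsigned opcode, unsigned field1, unsigned reg2)
{
    assert(opcode < 64u && field1 < 32u && reg2 < 32u);
    return (uint16_t)((opcode << 10) | (field1 << 5) | reg2);
}

/* Quick-add immediates are assumed to cover 1..32, with 32 stored as 0. */
static unsigned quick_imm_field(unsigned n)
{
    assert(n >= 1u && n <= 32u);
    return n & 31u;
}

/* Patch the 5-bit field of an already-emitted word, as a late fixup would. */
static uint16_t gpu_patch_field1(uint16_t word, unsigned value)
{
    assert(value < 32u);
    return (uint16_t)((word & ~(0x1Fu << 5)) | (value << 5));
}
```

`gpu_encode(OP_MOVE_PC, 0, 3)`, `gpu_encode(OP_ADDQT, quick_imm_field(4), 3)` and `gpu_encode(OP_MOVETA, 3, 4)` give 0xCC03, 0x0C83 and 0x9064, matching the listing. The missing word at triloop would be an ADDQT whose 5-bit field can only be filled in once skipface-triloop is known, which is the kind of late patch `gpu_patch_field1` stands in for.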

I did. While smac (32-bit only) runs through the assembly for tube_se just fine, rmac kicks out tons of errors. You might grab the source and figure out what's causing them. As to 64-bit, smac is SOMEWHAT 64-bit clean, but has issues with the tokens (and maybe other stuff, but certainly with the tokens). Looking at rmac, it looks like you cleaned up the issues with the tokens. So I'm willing to admit I was wrong on that... I probably assumed that the errors rmac had with the tube_se assembly were a 64-bit issue like they were with smac.


Okay, this is really weird. I checked that I'm using the latest rmac, and I am. Then I tried using it for tube_se... and now it works fine. Why wasn't it working earlier? It gave a list of errors a page long, so I switched back to smac. Now, not an error. My brain is melting. :)

 

Does rmac support more than a.out format for the output object files?


Okay, this is really weird. I checked that I'm using the latest rmac, and I am. Then I tried using it for tube_se... and now it works fine. Why wasn't it working earlier? It gave a list of errors a page long, so I switched back to smac. Now, not an error. My brain is melting. :)

 

 

You must be editing Blackout by mistake. Rum does that.


AFAIK RMAC outputs BSD style .o objects. But it can probably be easily modified to support others as well.

 

Also found a bug in RMAC when using the -l switch: If you assemble GPU code with that switch it causes RMAC to barf on "OR" opcodes (I suspect it may cause similar/worse effects on other platforms; YMMV). :P

Edited by Shamus

I just built and tried rmac with the 3D example project (thanks for all the work on VJ and this Shamus ;) ). Sadly, it appears that rmac has issues handling a forward reference in drawpoly.inc :(

 

rmac -fb -u -I./include gourrend.s
drawpoly.inc 198: Error: undefined register equate 'endloop'
drawpoly.inc 520: Error: multiply-defined label 'endloop'
GPU RAM USE: 2992 Bytes used
1104 Bytes free
1560 Bytes available for buffer
make: *** [gourrend.o] Error 2

Do I need to adjust the rmac makefile at all for building on 64-bit OS X?

Edited by Sam Bushman
