So, without further ado...
The Jaguar's Object Processor (OP) is a great piece of hardware. It's quite capable and flexible, and the raw power it gives makes one forgive its shortcomings and bugs. Which is of course a shame as most of the surrounding hardware is so bad (well, excluding the 68000, that's great of course).
It's also one of the main things one has to master in order to program the console, independent of the language or library used. As the title says, it's scary. It introduces many concepts that can baffle newcomers and/or inexperienced people in general. So what this post aims to do is demystify and explain the chip as much as possible.
1. The Object Processor
1.a. Why is it called that anyway?
A big part of the OP puzzle is hidden in its name. Object Processor. Something that processes objects. An object is a bit of an abstract term though. What constitutes an object in the console's terms?
Instead of using graphs, bar/pie charts, let's go with an example instead. This is the title screen of Downfall, a game which a few of you might be aware of.
(Attachment: downfall.png, the Downfall title screen)
Click here to see a video in action if you're not aware of the game and come back to read the rest of this. What you see is the game's logo, some info text and some parallax thing in the background. You can see that all those things overlap more or less. In order to achieve this on a typical bit-mapped screen of 80s/early 90s computers you'd have to do something like this:
a) Clear the screen
b) Draw the "farthest" layer of the parallax scroll
c) Draw the 2nd "farthest" layer of the parallax scroll, but since there are overlapping pixels with the layer drawn in (b), clear those first
d) Repeat (c) for layers 3-5
e) Draw the score/hiscore texts, again erasing all overlapping pixels first
f) Draw the logo (erasing again)
g) Draw the text (erasing again)
As you probably realise by now the CPU (or specialised hardware like the blitter) will spend a considerable time drawing and clearing pixels (also called masking) every time the screen is refreshed. Also important is the fact that a lot of specialised code has to be written in order to perform all these steps fast enough.
Could we do better? Enter the OP.
In contrast with bitmapped displays, i.e. you get a block of RAM, you fill it with bits and pixels are rendered on screen, the OP doesn't have a display. Instead you have to instruct it to render rectangles of graphics around the screen. Their widths, heights, colour depths, transparency and many more parameters are highly customisable. To put it in another way: to the OP everything is a sprite - player graphics, bullets, backgrounds, you name it. If it's not an object then it doesn't appear on screen.
So, to get back to the above example, we could define an object as big as the screen we're rendering and do all that busy work there. But (and this is what makes more sense) we can instruct the OP to render each different layer as a separate object and then combine them by itself. It will then be the OP's responsibility to compose the screen out of the parts we tell it to use.
1.b. Object lists
What we learned so far is that the Jaguar can render various boxes of data onto the screen. This of course raises a few questions like "how do we tell it how many boxes to render?" and "how do we describe these boxes so the hardware can understand them?". The answer to both these is Object Lists (OL).
An OL is really a forward linked list (if you already know what a linked list is then skip this paragraph). Very briefly put, imagine having an array where we want to store a few parameters per object - say we want to store 10 sprites' x, y positions, width and height. We could go right ahead and dimension an array in BASIC like
DIM stuff
and then use that array to store everything. But what happens if we don't know beforehand how many sprites we'll use? One solution is to just overdimension and hope that limit never gets crossed, but that's wasting RAM and we don't want to. A more RAM optimal way is to add a 5th field that will tell us where in RAM the next entry actually is. Our modified array will store x,y,w,h,adr_of_next_index. So in order to traverse the array we have to know the address of the first entry, go there, do what we need to do, then fetch adr_of_next_index, jump to that address, and so on until we reach the list's end (let's say that the list ends when we read a 0 in adr_of_next_index).
(Attachments: array.png, linkedlists.png)
(No shame if you didn't digest all that info in the first read - go back and read it again if you're not sure how it works. Or reply to this post with a question. Ultimately it doesn't matter but it helps understanding some of the stuff you'll see later).
So by using this arrangement we can store as many objects as we require with the minimum RAM waste.
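To make the traversal concrete, here is a minimal sketch in Python (not Jaguar code, just a toy model). The dictionary standing in for RAM, the addresses and the name walk are all made up for illustration; the only things taken from the text are the x,y,w,h,adr_of_next_index layout and the "0 means end of list" convention.

```python
# Toy model of RAM: address -> (x, y, w, h, adr_of_next_index).
# The addresses are arbitrary; entries do not need to be contiguous.
ram = {
    0x1000: (10, 20, 16, 16, 0x2040),
    0x2040: (50, 60, 32, 8, 0x1800),
    0x1800: (5, 5, 8, 8, 0),  # 0 in adr_of_next_index marks the end of the list
}


def walk(ram, first_address):
    """Follow adr_of_next_index from entry to entry and collect (x, y, w, h)."""
    entries = []
    address = first_address
    while address != 0:
        x, y, w, h, next_address = ram[address]
        entries.append((x, y, w, h))
        address = next_address
    return entries


sprites = walk(ram, 0x1000)
print(sprites)
```

Adding or removing a sprite only means changing one adr_of_next_index value, which is the whole point of the arrangement.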
1.c. Anatomy of an object
Let's see exactly how flexible the OP is by looking at how an object is defined.
There are five kinds of objects:
- Bitmap object
- Scaled bitmap object
- Graphics Processor object
- Branch object
- Stop object
Each object is at least one phrase long (one phrase is 64 bits, or 8 bytes). So what probably happens is that the OP reads a phrase off the OL for the current object, and if the type demands it then it reads the extra phrase(s). The first 3 bits (bits 0-2) of each object's first phrase contain the TYPE field. That's 0 for bitmapped object, 1 for scaled, 2 for GPU object, 3 for branch object and 4 for stop object.
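As a small illustration of the TYPE field, the sketch below reads it out of a first phrase held as a Python integer. It assumes that "the first 3 bits" means bits 0-2 counted from the least significant end, which is how the manual quote in 1.c.6 numbers them; the names OBJECT_TYPES and object_type are invented for this example.

```python
# Type numbers as listed above; the dictionary and function names are made up.
OBJECT_TYPES = {
    0: "bitmap",
    1: "scaled bitmap",
    2: "GPU",
    3: "branch",
    4: "stop",
}


def object_type(first_phrase):
    """Return the object kind stored in bits 0-2 of a 64-bit first phrase."""
    return OBJECT_TYPES.get(first_phrase & 0b111, "unknown")


print(object_type(4))                # a stop object with everything else zero
print(object_type((100 << 3) | 3))   # a branch object with something in YPOS
```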
1.c.1. Bitmapped object
Let's have a quick look at the fields for a normal bitmapped object. Don't pay too much attention to the descriptions, especially if you don't understand something - we'll explain everything eventually.
TYPE - object type (hardcoded to 0)
YPOS - object y position in screen
HEIGHT - object height
LINK - address of next object in OL
DATA - address of graphics data for the object
XPOS - object x position in screen
DEPTH - object's bit depth
PITCH - how far apart, in phrases, successive phrases of the object's data are in memory (1 when the data is contiguous)
INDEX - For bitmapped objects, choose which palette to use
DWIDTH - how many phrases wide the object is in memory (can be different than PITCH!)
IWIDTH - how many phrases wide the object is on screen (can be different than PITCH and DWIDTH!)
REFLECT - flag that controls if the object will be drawn mirrored horizontally
RMW - Adds the current pixel value to what's already on the line buffer (advanced topic for now!)
TRANS - flag that enables transparency. Colour 0 will not be drawn on screen
RELEASE - flag that tells the OP to yield the bus to other chips
FIRSTPIX - tells the OP which is to be considered the first pixel to be drawn per scanline
Now that's a lot of things, right? If you were to implement those in software you'd consume a lot of bytes. And since CPUs are usually comfortable with 8, 16 or 32 bit values (byte/word/longword) you'd need a word per parameter (even for a flag) and a longword per address to be safe. There are 16 parameters in that list, two of them are addresses. So we're looking at 14*2+2*4=36 bytes=288 bits! How did they squeeze it down to 128? (if you wonder "why 128?" then remember: a bitmapped object is 2 phrases!) Well, for starters some fields are on/off (0/1) so they just need a single bit to represent them. Similarly other fields like YPOS/XPOS don't need the full 16 bits that we allocated because of physical constraints - no reason to have an X value of 65535 for example! So most of the fields are chopped down in a similar fashion. Finally, the address fields are always aligned to 8 bytes due to hardware constraints (burst read access if I'm not mistaken). This means that the address values will be a multiple of 8 - so 8, 16, 24, 32 etc. If you look at these numbers in binary or hex notation you'll notice that the last 3 bits are zero. So no need to store them at all!
All the above tricks do save a lot of RAM and bandwidth for the OP but it comes at a cost. Namely, it's not very easy to construct and update an OL from a CPU standpoint. Inserting and updating bitfields requires a few instructions per operation and since the 68000 is a bit slow when it tries to shift values (an essential part unless you can avoid it), it can become quite demanding to update even basic values like screen x,y coordinates. So when a lot of objects are used it is recommended to use the GPU (Tom) to update the lists.
One final thing worth mentioning here is that during screen composing the OP will trash the OL partially (going by the manual quote in 1.c.6, at least HEIGHT and DATA are written back, plus REMAINDER for scaled objects; perhaps someone can confirm whether anything else is modified?). If the OP is allowed to run on the next frame with the list trashed you will definitely see the screen do weird things, from blanking out completely to displaying garbage and then crashing the machine! So it is necessary to update the OL every vertical blank (VBL). I know of two methods here: one is to update the actual list and the other is to keep a second copy of the OL and copy it to the live list during the VBL. The latter is what raptor uses and it is much less complicated: you can do the processing while the frame is displayed and not have to struggle modifying things at the last moment.
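To show how the squeezing works in practice, here is a rough Python sketch that packs the fields into two 64-bit phrases. The bit positions are the ones given in the manual quote in 1.c.6; everything else (the function names, the example values, treating bit 0 as the least significant bit of the phrase) is an assumption made for illustration, not how any real library does it.

```python
def pack_fields(fields):
    """Pack (value, first_bit, bit_count) triples into one 64-bit phrase."""
    phrase = 0
    for value, first_bit, bit_count in fields:
        mask = (1 << bit_count) - 1          # keep only the bits that fit
        phrase |= (value & mask) << first_bit
    return phrase


def bitmap_object(ypos, height, link, data, xpos, depth, pitch, dwidth, iwidth,
                  index=0, reflect=0, rmw=0, trans=0, release=0, firstpix=0):
    """Return (first_phrase, second_phrase) for an unscaled bitmap object."""
    phrase1 = pack_fields([
        (0, 0, 3),            # TYPE is 0 for a bitmapped object
        (ypos, 3, 11),
        (height, 14, 10),
        (link >> 3, 24, 19),  # addresses are multiples of 8: drop 3 low bits
        (data >> 3, 43, 21),
    ])
    phrase2 = pack_fields([
        (xpos, 0, 12),        # masking also handles negative x (two's complement)
        (depth, 12, 3),
        (pitch, 15, 3),
        (dwidth, 18, 10),
        (iwidth, 28, 10),
        (index, 38, 7),
        (reflect, 45, 1),
        (rmw, 46, 1),
        (trans, 47, 1),
        (release, 48, 1),
        (firstpix, 49, 6),
    ])
    return phrase1, phrase2


# Hypothetical 32x32 sprite in 4bpp (DEPTH=2): 32*4 bits = 2 phrases per line.
p1, p2 = bitmap_object(ypos=64, height=32, link=0x4020, data=0x10000,
                       xpos=100, depth=2, pitch=1, dwidth=2, iwidth=2, trans=1)
print(hex(p1), hex(p2))
```

Every field costs a mask, a shift and an OR, which is exactly the kind of work the 68000 is slow at.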
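The "second copy" method can be sketched like this. It is a toy Python model only: objects are dictionaries, op_draw_line is a made-up stand-in for the write-back the manual quote in 1.c.6 describes (HEIGHT goes down, DATA moves to the next line), and vbl_copy is a hypothetical name for whatever runs during the vertical blank.

```python
def op_draw_line(obj):
    """Stand-in for the OP: after a line is drawn, HEIGHT and DATA are written back."""
    if obj["height"] > 0:
        obj["height"] -= 1
        obj["data"] += obj["dwidth"] * 8   # DWIDTH is in phrases, 8 bytes each


def vbl_copy(live, shadow):
    """Restore every live object from the untouched shadow copy."""
    for live_obj, shadow_obj in zip(live, shadow):
        live_obj.update(shadow_obj)


# The shadow list is what the game logic edits while the frame is displayed.
shadow = [{"height": 3, "data": 0x10000, "dwidth": 2}]
live = [dict(obj) for obj in shadow]

for _ in range(3):                # the OP displays the object's three lines
    op_draw_line(live[0])
trashed = dict(live[0])           # what the live list looks like after the frame

vbl_copy(live, shadow)            # vertical blank: copy the good list back
print(trashed, live[0])
```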
1.c.2. Scaled bitmapped object
The scaled object is exactly the same as the bitmapped one, except that TYPE is 1 here and a third phrase is added that contains some extra fields.
HSCALE - Horizontal scale factor in 3.5 unsigned fixed point format.
VSCALE - Vertical scale factor in 3.5 unsigned fixed point format.
REMAINDER - This seems to be internal state kept by the OP (anyone can help here?)
So what's all this 3.5 fixed point format thing then? First of all, fixed point is a way for a CPU that does not support numbers with fractions to support them. If you consider numbers in the decimal system you have the integer part followed by the fractional. They are separated using a delimiter (comma or dot, depending on your region). If we remove the delimiter from the number then a number like "123.456" becomes "123456". There is no way we can figure out where the fractional part begins unless someone tells us that it's after the 3rd digit. So if we all agree that the integer part is the first 3 digits and the fractional part the last 3 then we've made a fixed point system. We can now encode all numbers from 000.000 to 999.999 using 6 digits. (if you wonder why 000.000 and not -999.999, then notice that the sign is also a digit and we'll have to spend an extra digit to encode it). We call that "3.3 unsigned fixed point format". If we require negative numbers too we have to allocate an extra digit so our numbers can be "-999.999" to "+999.999" - that's a 4.3 signed format.
Now, let's switch to binary representation. It's pretty much the same thing as the above only with two available digits (0 and 1). So, following our example above, a 3.3 fixed point binary format would be able to store numbers from 000.000 to 111.111. And our 3.5 unsigned format would of course hold numbers from 000.00000 to 111.11111.
So much for the definitions. But that doesn't help us much in the way of knowing how much we zoom our object, right? After all, what's 0.11111 binary when converted to decimal system where we're more comfortable? Let's begin with the easy stuff - integers. Since we use 3 bits for integer part we can store decimal numbers 0 to 2^3-1=7.
Moving on to the scary part: fractions! So ask yourself: what is 0.1 in the decimal system? It's 1/10th, right? And 0.01? That's 1/100th. 0.001 is 1/1000th and so on. So what happens is that we divide 1 by the base as many times as we have decimal places. If we formulate this, it's something like "1/(10^fraction_digit)" to represent a fraction digit (10 raised to the power of the number of fraction digits is the same as dividing by 10 as many times as the number of fraction digits). It just so happens though that 10 is the decimal system's base. So we can change the formula to "1/(base^fraction_digit)". Finally, "1" is used because our examples had 1 in them. So the final transformation we do to the formula is "number/(base^fraction_digit)". This lets us represent any digit in the fractional part. I hope you've got it by now, but if not... Let's switch to binary number system. Our base is 2 here and the range for number here is 0 to 1 so we can write our generic formula as "0 (or 1)/(2^fraction_digit)".
Let's write some examples then: %0.00001 (notice that numbers prefixed with % are considered binary by assemblers like rmac) is actually 1/2^5 in decimal, so 1/32 or 0.03125. %0.01 is 1/2^2=1/4=0.25. %0.1 is 1/2=0.5.
So what numbers can we represent on a 3.5 unsigned binary format? I would expect %000.00000 to produce nothing as it's a scale factor of exactly 0 so let's leave that out for now. %000.00001 would be the smallest number and %111.11111 the largest. %000.00001 as we wrote above is 0.03125. So that's actually our increment - all scale factors will be an integer multiple of this. For example, the next number in sequence, %000.00010, is 0.0625 which is, true enough, double of 0.03125. %001.00000 is obviously 1, so that's the number we need to put in order to have no scaling at all. And so on and so forth until we reach %111.11111, which is 7+1/2+1/4+1/8+1/16+1/32=7.96875 - that's the largest scale we can have from the OP.
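If you'd rather let a computer do the arithmetic, the conversion boils down to multiplying or dividing by 32 (2^5, one step per fractional bit). The Python sketch below is only an illustration; the names to_3_5 and from_3_5 are made up.

```python
def to_3_5(scale):
    """Convert a scale factor to the 8-bit 3.5 unsigned fixed point value."""
    raw = round(scale * 32)          # 5 fractional bits: 1.0 becomes 32
    if not 0 <= raw <= 0xFF:
        raise ValueError("scale must be between 0 and 7.96875")
    return raw


def from_3_5(raw):
    """Convert an 8-bit 3.5 unsigned fixed point value back to a number."""
    return (raw & 0xFF) / 32


print(from_3_5(0b00000001))   # smallest step
print(from_3_5(0b00100000))   # %001.00000, no scaling
print(from_3_5(0b11111111))   # largest scale
print(bin(to_3_5(2.5)))       # 2.5x zoom as a bit pattern
```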
1.c.3. Graphics Processor object
Scary stuff - let's leave that out for now!
1.c.4. Branch object
This object enables the OP to skip parts of the OL or even create loops if used carefully.
TYPE - object type (hardcoded to 3)
YPOS - if a comparison is performed, this is the value to compare against.
CC - Condition Code
LINK - If a branch is taken, this is the address to branch to.
First of all, if we simply want to branch to a different point in the OL we can set CC to 0, YPOS to $7ff (YPOS is only 11 bits wide) and fill the LINK field with the address to branch to. This can be used to remove objects that are unused at the time of display (for example, say you have 30 objects that display bullets and you only have 10 active. You could add a branch object before the bullet objects and branch so the OP will skip 20 objects and display the last 10).
The other three cases can branch if the Video Counter (VC) is equal to (CC=0), smaller than (CC=1) or larger than (CC=2) the value YPOS contains (the manual quote in 1.c.6 states these as YPOS == VC, YPOS > VC and YPOS < VC). This can lighten the OP's load greatly. For example, consider the following playfield:
There's no reason for the OP to render the lower parts of the screen while it's rendering the upper half. So we can set YPOS to half the screen height and save tons of bandwidth.
Also, using comparison branches you can effectively create loops (i.e. render the same object for the first 50 scanlines) but I'm not sure if there's any value in doing this - most likely the objects will become trashed!
There are also two other branch types but I'll leave them alone for now as they're more specialised.
One final note (careful readers will probably wonder about this): if the branch is not taken, then the OP expects the next object to be in the phrase right after the branch object! If you violate this, then funky things will happen!
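Here is a rough Python sketch of how a branch object could be packed and how the decision plays out. The bit positions and the meaning of each CC value are taken from the manual quote in 1.c.6; the function names are invented, only CC 0 to 2 are modelled, and the LINK decoding assumes the whole list sits in the low part of memory so the upper address bits can be ignored.

```python
def branch_object(ypos, cc, link):
    """Pack a branch object: TYPE=3, YPOS in bits 3-13, CC in 14-15, LINK in 24-42."""
    return (3
            | ((ypos & 0x7FF) << 3)
            | ((cc & 0x3) << 14)
            | (((link >> 3) & 0x7FFFF) << 24))


def branch_taken(phrase, vc):
    """Decide whether the branch is taken for the given vertical counter value."""
    ypos = (phrase >> 3) & 0x7FF
    cc = (phrase >> 14) & 0x3
    if cc == 0:
        return ypos == vc or ypos == 0x7FF   # $7ff means "always"
    if cc == 1:
        return ypos > vc
    if cc == 2:
        return ypos < vc
    raise NotImplementedError("the specialised branch types are not modelled")


def next_object_address(address, phrase, vc):
    """Where processing continues: LINK if taken, else the very next phrase."""
    if branch_taken(phrase, vc):
        return ((phrase >> 24) & 0x7FFFF) << 3
    return address + 8


always = branch_object(0x7FF, 0, 0x5000)   # unconditional jump
upper = branch_object(240, 1, 0x6000)      # taken while VC is below 240
print(hex(next_object_address(0x4000, always, 123)))
print(hex(next_object_address(0x4000, upper, 100)))
print(hex(next_object_address(0x4000, upper, 300)))
```

The last call shows the rule from the note above: when the branch is not taken, the next object is simply 8 bytes further on.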
1.c.5. Stop object
Pretty straightforward stuff, just stick a 4 in the TYPE field and fill the rest of the phrase with zeros. The OP will stop processing more objects after this. You're done!
1.c.6. Reference: the reference manual on Objects.
Here's a direct quote from the Jaguar reference manual.
Bit Mapped Object

This object displays an unscaled bit mapped object. The object must be on a 16 byte boundary in 64 bit RAM.

First Phrase

Bits    Field     Description
0-2     TYPE      Bit mapped object is type zero
3-13    YPOS      This field gives the value in the vertical counter (in half lines) for the first (top) line of the object. The vertical counter is latched when the Object Processor starts so it has the same value across the whole line. If the display is interlaced the number is even for even lines and odd for odd lines. If the display is non-interlaced the number is always even. The object will be active while the vertical counter >= YPOS and HEIGHT > 0.
14-23   HEIGHT    This field gives the number of data lines in the object. As each line is displayed the height is reduced by one for non-interlaced displays or by two for interlaced displays. (The height becomes zero if this would result in a negative value.) The new value is written back to the object.
24-42   LINK      This defines the address of the next object. These nineteen bits replace bits 3 to 21 in the register OLP. This allows an object to link to another object within the same 4 Mbytes.
43-63   DATA      This defines where the pixel data can be found. Like LINK this is a phrase address. These twenty-one bits define bits 3 to 23 of the data address. This allows object data to be positioned anywhere in memory. After a line is displayed the new data address is written back to the object.

Second Phrase

Bits    Field     Description
0-11    XPOS      This defines the X position of the first pixel to be plotted. This 12 bit field defines start positions in the range -2048 to +2047. Address 0 refers to the left-most pixel in the line buffer.
12-14   DEPTH     This defines the number of bits per pixel as follows:
                  0  1 bit/pixel
                  1  2 bits/pixel
                  2  4 bits/pixel
                  3  8 bits/pixel
                  4  16 bits/pixel
                  5  24 bits/pixel
15-17   PITCH     This value defines how much data, embedded in the image data, must be skipped. For instance two screens and their common Z buffer could be arranged in memory in successive phrases (in order that access to the Z buffer does not cause a page fault). The value 8 * PITCH is added to the data address when a new phrase must be fetched. A pitch value of one is used when the pixel data is contiguous - a value of zero will cause the same phrase to be repeated.
18-27   DWIDTH    This is the data width in phrases. i.e. Data for the next line of pixels can be found at 8 * (DATA + DWIDTH)
28-37   IWIDTH    This is the image width in phrases (must be non zero), and may be used for clipping.
38-44   INDEX     For images with 1 to 4 bits/pixel the top 7 to 4 bits of the index provide the most significant bits of the palette address.
45      REFLECT   Flag to draw object from right to left.
46      RMW       Flag to add object to data in line buffer. The values are then signed offsets for intensity and the two colour vectors.
47      TRANS     Flag to make logical colour zero and reserved physical colours transparent.
48      RELEASE   This bit forces the Object Processor to release the bus between data fetches. This should typically be set for low colour resolution objects because there is time for another bus master to use the bus between data fetches. For high colour resolution objects the bus should be held by the Object Processor because there is very little time between data fetches and other bus masters would probably cause DRAM page faults thereby slowing the system. External bus masters, the refresh mechanism and graphics processor DMA mechanism all have higher bus priorities and are unaffected by this bit.
49-54   FIRSTPIX  This field identifies the first pixel to be displayed. This can be used to clip an image. The significance of the bits depends on the colour resolution of the object and whether the object is scaled. The least significant bit is only significant for scaled objects where the pixels are written into the line buffer one at a time. The remaining bits define the first pair of pixels to be displayed. In 1 bit per pixel mode all five bits are significant, In 2 bits per pixel mode only the top four bits are significant. Writing zeroes to this field displays the whole phrase.
55-63             Unused write zeroes.

Scaled Bit Mapped Object

This object displays a scaled bit mapped object. The object must be on a 32 byte boundary in 64 bit RAM. The first 128 bits are identical to the bit mapped object except that TYPE is one. An extra phrase is appended to the object.

Bits    Field      Description
0-7     HSCALE     This eight bit field contains a three bit integer part and a five bit fractional part. The number determines how many pixels are written into the line buffer for each source pixel.
8-15    VSCALE     This eight bit field contains a three bit integer part and a five bit fractional part. The number determines how many display lines are drawn for each source line. This value equals HSCALE for an object to maintain its aspect ratio.
16-23   REMAINDER  This eight bit field contains a three bit integer part and a five bit fractional part. The number determines how many display lines are left to be drawn from the current source line. After each display line is drawn this value is decremented by one. If it becomes negative then VSCALE is added to the remainder until it becomes positive. HEIGHT is decremented every time VSCALE is added to the remainder. The new REMAINDER is written back to the object.
24-63              Unused write zeroes.

Graphics Processor Object

This object interrupts the graphics processor, which may act on behalf of the Object Processor. The Object Processor resumes when the graphics processor writes to the object flag register.

Bits    Field     Description
0-2     TYPE      GPU object is type two
3-13    YPOS      This object is active when the vertical count matches YPOS unless YPOS = 07FF in which case it is active for all values of vertical count.
14-63   DATA      These bits may be used by the GPU interrupt service routine. They are memory mapped as the object code registers OB0-3, so the GPU can use them as data or as a pointer to additional parameters.

Execution continues with the object in the next phrase. The GPU may set or clear the (memory mapped) Object Processor flag and this can be used to redirect the Object Processor using the following object.

Branch Object

This object directs object processing either to the LINK address or to the object in the following phrase.

Bits    Field     Description
0-2     TYPE      Branch object is type three
3-13    YPOS      This value may be used to determine whether the LINK address is used.
14-15   CC        These bits specify what condition is used to determine whether to branch as follows:
                  0  Branch if YPOS == VC or YPOS == 7FF
                  1  Branch if YPOS > VC
                  2  Branch if YPOS < VC
                  3  Branch if Object Processor flag is set
                  4  Branch if on second half of display line (HC10 = 1)
16-23             unused
24-42   LINK      This defines the address of the next object if the branch is taken. The address is defined as described for the bit mapped object.
43-63             unused

Stop Object

This object stops object processing and interrupts the host.

Bits    Field     Description
0-2     TYPE      Stop object is type four
3-63    DATA      These bits may be used by the CPU interrupt service routine. They are memory mapped so the CPU can use them as data or as a pointer to additional parameters.

1.d. Bit depths - bandwidth
After digesting the basics of objects, one of the most confusing aspects for newcomers (especially rb+ users) is bit depth. "Since I draw some graphic on my desktop/laptop computer, it should just appear on the screen, right?". Well, yes and no.
In an ideal world we'd draw everything in as many colours as we like and give it to the hardware to cope. Unfortunately the OP is quite fast but it cannot cope with this idea. It might appear so at first but as you start piling up objects one on top of the other it simply runs out of juice. When the OP is composing the screen, it more or less does the following for each line:
- Goes through the whole OL until it reaches a stop object (branch objects are of course evaluated)
- For each object the OP has to
  - parse object coordinates
  - parse screen coordinates
  - translate screen coordinates to object coordinates
  - figure out where it should read from the object's graphics address
  - fetch pixels
  - combine pixels with the pixels of the previous objects (transparency, ordering etc) if any
That's why the OP's designers added bit depths in the chip. If you know that your character sprite won't use more than 16 colours on screen (which translates to 4 bits, 2^4=16) then why waste 12 (or 20 in 24bpp mode) more? Multiply that with the number of objects you would like to have on screen (say 50?) and you end up with a lot of saved bandwidth. And that's for 16 colours, if you want to use even less then you can save much more.
Combining is also costly, especially when transparency comes into place. There's potentially a lot of read data thrown away just because a lot of objects are piling up on top of the others. Also, because the whole list is parsed per line, the OP also has to parse objects that might not apply in all scanlines, thus waste even more bandwidth. The use of branch objects can help massively here.
In conclusion, it makes good sense to plan ahead what you want to do and be bandwidth considerate. (After all, drawing the screen is only part of the problem - you also need logic, audio, inputs, and many more things)
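To put some numbers on the savings, here is a back-of-the-envelope Python sketch. It only counts pixel data fetched per scanline, rounded up to whole phrases; it ignores the cost of reading the OL itself, bus timing, page faults and everything else, so treat the results as a rough comparison and not as a measurement. The function name and the 50 objects of 32 pixels are made up for the example.

```python
def bytes_per_scanline(width_pixels, bits_per_pixel):
    """Pixel data fetched for one line of one object, in whole 8-byte phrases."""
    bits = width_pixels * bits_per_pixel
    phrases = -(-bits // 64)         # round up to full phrases
    return phrases * 8


objects = 50                         # hypothetical number of sprites on one line
width = 32                           # hypothetical sprite width in pixels

for bpp in (1, 2, 4, 8, 16):
    total = objects * bytes_per_scanline(width, bpp)
    print(f"{bpp:2d} bpp: {total} bytes per scanline")
```

With these made-up numbers the 4bpp sprites need a quarter of the data that 16bpp ones do, on every single scanline they appear on.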
1.e. Bitmapped objects, palettes and pixel formats
For 16, 24 and CRY modes it is easy to store colour information. Since each pixel is so many bits, we can encode the intensities for Red, Green and Blue directly on the pixel data. Very briefly, for 16bpp the format is:
Bit 0123456789abcdef
    RRRRRBBBBBGGGGGG
So, 5 bits for Red and Blue (0-31), and 6 for Green (0-63).
For 24bpp objects we have 8 bits (0-255) for Red, Blue and Green and 8 unused.
Bit 0123456789abcdef0123456789abcdef
    RRRRRRRRBBBBBBBBGGGGGGGG00000000
(note that the RBG order is intentional, that's how the OP expects values to be written)
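The two layouts can be turned into values with a few shifts. The Python sketch below reads the diagrams with the leftmost position as the most significant bit (so Red ends up in the top bits), which is an assumption about how the diagrams are drawn; the function names are made up.

```python
def rgb16(red, green, blue):
    """Pack a 16bpp pixel as RRRRRBBBBBGGGGGG (red 0-31, blue 0-31, green 0-63)."""
    return ((red & 0x1F) << 11) | ((blue & 0x1F) << 6) | (green & 0x3F)


def rgb24(red, green, blue):
    """Pack a 24bpp pixel as R, B, G bytes followed by one unused zero byte."""
    return ((red & 0xFF) << 24) | ((blue & 0xFF) << 16) | ((green & 0xFF) << 8)


print(hex(rgb16(31, 0, 0)))            # pure red
print(hex(rgb16(0, 63, 0)))            # pure green
print(hex(rgb16(0, 0, 31)))            # pure blue
print(hex(rgb24(0x12, 0x34, 0x56)))    # note blue lands before green
```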
CRY mode - one byte for RGB and one for intensity - let's leave that for later.
Let's go back to <16 bpp modes. Since in these modes we don't have enough bits to store component intensities, the solution is to store the intensities in a designated memory area separately and for the object itself just mark down the index to the intensities table. That memory area is called a Palette or CLUT (Colour Look-Up Table). The OP's CLUT table holds 256 entries and it uses the 16bpp format described above. So for example if we use 4bpp mode and our first 4 pixels are 9,2,5,8, the object's first two bytes will look like this:
Pixel   0    1    2    3
Value   9    2    5    8
Binary  1001 0010 0101 1000
Notice that each pixel has all its bits packed one after the other. This is true for all bit depths and is called chunky format.
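The same packing can be written as a short Python sketch for the palette-based depths (1, 2, 4 and 8 bits per pixel). The function name pack_chunky is made up, and padding a short last byte with zero bits is a choice made for this example only.

```python
def pack_chunky(pixels, bits_per_pixel):
    """Pack palette indices into bytes, first pixel in the highest bits."""
    per_byte = 8 // bits_per_pixel
    mask = (1 << bits_per_pixel) - 1
    out = bytearray()
    for start in range(0, len(pixels), per_byte):
        group = pixels[start:start + per_byte]
        byte = 0
        for pixel in group:
            byte = (byte << bits_per_pixel) | (pixel & mask)
        # If the last group is short, pad the remaining low bits with zeros.
        byte <<= bits_per_pixel * (per_byte - len(group))
        out.append(byte)
    return bytes(out)


packed = pack_chunky([9, 2, 5, 8], 4)
print(packed.hex())    # the two bytes from the table above
```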
1.f. Let's display an object on screen
1.g. Advanced topics (CRY, RMW etc)
2. Raptor lists
Here's an object as defined in a raptor list. Values in red are identical (or almost identical) to OL fields. (well, code snippets can't be coloured it seems, so I'll get back to this)
(REPEAT COUNTER) - Create this many objects of this type (or 1 for a single object)
sprite_active - sprite active flag
sprite_x - 16.16 x value to position at
sprite_y - 16.16 y value to position at
sprite_xadd - 16.16 x addition for sprite movement
sprite_yadd - 16.16 y addition for sprite movement
sprite_width - width of sprite (in pixels)
sprite_height - height of sprite (in pixels)
sprite_flip - flag for mirroring data left<>right
sprite_coffx - x offset from center for collision box center
sprite_coffy - y offset from center for collision box center
sprite_hbox - width of collision box
sprite_vbox - height of collision box
sprite_gfxbase - start of bitmap data
(BIT DEPTH) - bitmap depth (1/2/4/8/16/24)
(CRY/RGB) - bitmap GFX type
(TRANSPARENCY) - bitmap TRANS flag
sprite_framesz - size per frame in bytes of sprite data
sprite_bytewid - width in bytes of one line of sprite data
sprite_animspd - frame delay between animation changes
sprite_maxframe - number of frames in animation chain
sprite_animloop - repeat or play once
sprite_wrap - wrap on screen exit, or remove
sprite_timer - frames sprite is active for (or spr_inf)
sprite_track - use 16.16 xadd/yadd or point to 16.16 x/y table
sprite_tracktop - pointer to loop point in track table (if used)
sprite_scaled - flag for scaleable object
sprite_scale_x - x scale factor (if scaled)
sprite_scale_y - y scale factor (if scaled)
sprite_was_hit - initially flagged as not hit
sprite_CLUT - no_CLUT (8/16/24 bit) or CLUT (1/2/4 bit)
sprite_colchk - if sprite can collide with another
sprite_remhit - flag to remove (or keep) on collision
sprite_bboxlink - single for normal bounding box, else pointer to table
sprite_hitpoint - Hitpoints before death
sprite_damage - Hitpoints deducted from target
sprite_gwidth - GFX width (of data)
So it's evident that raptor lists try to be close to the OP's object definitions while adding extra fields to help the processing of sprites (animation, hitpoints, collision etc.).
3. Wrapping up
Hopefully everyone reading this post got something out of it. It's nothing more than re-stating what the hardware manual says with as many explanations to the newcomer as possible. Also it shows how much stuff rb+ and raptor do behind your back (constructing OLs, calculating object parameters, aligning graphics data so it will be processed ok and so much more).
Thanks for your patience while reading this! Let me know if there's something not clear, if I omitted something or if there's an error somewhere.
Edited by ggn, Thu Mar 16, 2017 2:47 AM.