F18A programming, info, and resources

RXB · October 11, 2013

Does anyone realize there is existing software written for the SAMS, the 9938, and the 9958, which I have already posted about?

The 9938 and 9958 had about 10 different applications written for them, and those would work on all of these devices. Of course, most were text editors or text related.

SAMS has many programs including a C99 Compiler for the SAMS Written by RA Green.

RAGLIBARIAN, RAGLINKER, RAGMACROASM, RAGMULTIPLAN, RAGWRITTER all had patches that modified them to use the SAMS.

Just because people have never seen them or used them does not mean they do not exist.

Besides the SAMS is not that hard to build and a few put them into the Console.

matthew180 · October 11, 2013

This is not a SAMS advocacy thread. Please try to stay on topic. Also, if the SAMS is not so hard to build, then start making and selling them so everyone can have one.

TheMole · October 12, 2013

It is hard to justify the time though, since the demand is very low and the effort required very high.

You've mentioned this a couple of times and I think you sold over a hundred of them, so may I ask what number you would have considered reasonable? Looking at the number of active posters on this board, I think your product has sold exceptionally well and you should be very proud of how popular your device is!

Besides that, demand would probably grow significantly if more people could see it in action, even if with an emulator... Maybe we can convince Tursi or Michael (or even better, both...) to at least include support for the features that are already used in existing software (like ECM mode for sprites in TI-Scramble and palette and scanline interrupt support in Tursi's image viewer program)?

This gradual approach might get us to a fully supported F18A at some point.

TheMole · October 12, 2013

The biggest problem with the v9938 and the v9958, as far as features for games go, is the lack of horizontal scrolling support. That was inexcusable by the time these chips came out, as the NES and SMS VDPs (also spiritual successors to the TMS9918A) both came before and support it. Hell, the SMS VDP is even software compatible with the 9918, and to a much bigger degree hardware compatible, as it doesn't add new address line requirements for the video features (it includes an embedded TMS9919 sound chip, which has its own address lines of course). I think that's why only the Geneve and the MSX-2 ended up using the Yamaha chips, as any real game system could not have done without that feature. I don't understand this oversight.

matthew180 · October 12, 2013

You've mentioned this a couple of times and I think you sold over a hundred of them, so may I ask what number you would have considered reasonable?

I don't know the exact numbers, I didn't track it after the initial pre-order. When I talk about "hard to justify the time based on demand", I'm talking more in terms of the requests I get for information on programming the features. It might be a chicken-and-egg scenario though. I'm not getting a lot of demand from programmers for info, and for those who do ask I write detailed information and the questions seem to stop. I'd love to write the documentation, but my son would love to play catch too. The time investment is hard.

I also feel that the register-use document provides an assembly programmer, familiar with the 9918A, with a lot of information they would need to use almost all of the F18A enhancements. However I might be naive about that. It could just be very obvious to me because I made it.

Besides that, demand would probably grow significantly if more people could see it in action, even if with an emulator...

I absolutely agree. I would also love to write an emulator with F18A support, write cool demos, make a few games, etc. I also want to write a new assembler that lets you write the GPU code right in the same source files (among other enhancements), and make a new interpreted language that supports the F18A so non-assembly programmers can take advantage of it as well. Dead horse beating commence: time my friend, time.

Maybe we can convince Tursi or Michael (or even better, both...) to at least include support for the features that are already used in existing software (like ECM mode for sprites in TI-Scramble and palette and scanline interrupt support in Tursi's image viewer program)?

Tursi has enough F18A support in Classic99 to run the programs he was working on, so the GPU is there, horizontal interrupt support, and a few other pieces I think. Not comprehensive, but at least it is a start. I have offered to help the emulator developers, but it is a big task and they are just as busy with life as the rest of us. Tursi has said he plans to add F18A support (I think), but I do not know if any other emulator developers are planning to add it.

The biggest problem with the v9938 and the v9958, as far as features for games go, is the lack of horizontal scrolling support.

The 9958 is just a 9938 with a handful of enhancements, horizontal scroll support being one of them. The 9958 documentation is actually written as just the differences over the 9938. Still, the 9938/58 scrolling is only one screen, i.e. there is no concept of "pages" like the NES VDP has, and subsequently the F18A has because I designed the scrolling based on the NES design. I also followed the NES VDP in using "bit planes" for adding more tile and sprite color, vs packing the bits next to each other in bytes (which would have been impractical for the 3-bit color anyway). I thought the similarities would help existing developers feel more familiar with the F18A, but I don't suppose NES programmers are going to use or program a computer that had a 9918A in it.

Edited October 12, 2013 by matthew180

+Ksarul · October 12, 2013

Don't forget that Guillaume's MLC (My Little Compiler) has support for your GPU already Matthew--so there is some response to your published information, just not a lot yet. He tends to add features as people ask for them or as his own needs take him, so as demand increases, his compiler gains more functionality.

Edited October 12, 2013 by Ksarul

Omega-TI · October 12, 2013

I absolutely agree. I would also love to write an emulator with F18A support, write cool demos, make a few games, etc. I also want to write a new assembler that lets you write the GPU code right in the same source files (among other enhancements), and make a new interpreted language that supports the F18A so non-assembly programmers can take advantage of it as well. Dead horse beating commence: time my friend, time.

The one overriding thing for the F18A that I personally would like to see developed, is a very small Extended BASIC package of "CALL LINKS" that would give the XB programmer functional 80 columns in BASIC. (I don't know if it's even possible).

The 80 column equivalents of:

ACCEPT AT

DISPLAY AT

While retaining

CALL CLEAR

CALL SCREEN functionality

In January I'll have some extra time on my hands, and I plan to get down to business and write a program on the 'ol TI. To do things how I REALLY want to do them requires 80 columns.

While an add-on set of links like this would reside on disk, or the CF card in my case, I wonder if in the future it would be possible to make routines such as this part of the F18A itself.

matthew180 · October 12, 2013

Don't forget that Guillaume's MLC (My Little Compiler) has support for your GPU already Matthew--so there is some response to your published information, just not a lot yet. He tends to add features as people ask for them or as his own needs take him, so as demand increases, his compiler gains more functionality.

Absolutely! MLC was the first "public" (non beta) software to use F18A features. And I'm surprised more people don't use MLC... Or maybe they do and I'm simply unaware (no surprise there). I was very happy when Guillaume started asking questions.

matthew180 · October 12, 2013

The one overriding thing for the F18A that I personally would like to see developed, is a very small Extended BASIC package of "CALL LINKS" that would give the XB programmer functional 80 columns in BASIC. (I don't know if it's even possible).

The biggest problem with BASIC and XB is the heavy use of VRAM for language support. On other systems like the MSX it is much easier to add support for the F18A from BASIC because VRAM is left alone to be "VRAM". Some features would be possible, but adding support for the extra colors and such would probably not work very well.

In January I'll have some extra time on my hands, and I plan to get down to business and write a program on the 'ol TI. To do things how I REALLY want to do them requires 80 columns.

Just 80-columns? There is way more to the enhancements than just plain old 80-columns. Also note that the "text" modes reduce the tile patterns to 6x8 pixels.

While an add-on set of links like this would reside on disk, or the CF card in my case, I wonder if in the future it would be possible to make routines such as this part of the F18A itself.

Nope. The "library" could be added to the flash ROM and a special load routine could copy the code down to the low 8K for use in XB, but other than that the F18A is stuck on an 8-bit data bus and has no interface to the address bus. Like the 9918A, the F18A can only respond when the host system accesses it.

+OLD CS1 · October 12, 2013

DE-15

Thank you for getting this right.

Asmusr · October 13, 2013

I also feel that the register-use document provides an assembly programmer, familiar with the 9918A, with a lot of information they would need to use almost all of the F18A enhancements. However I might be naive about that. It could just be very obvious to me because I made it.

The area I haven't seen documented much is the tiles. I can figure out how the ECM works, but I'm not sure about pages, banners, scrolling and fixed/non scrolling tiles. I'm not sure how the scanline interrupt works either. Where do you put your interrupt routine? It would be great if you would write something about that.

matthew180 · October 13, 2013

The area I haven't seen documented much is the tiles. I can figure out how the ECM works, but I'm not sure about pages, banners, scrolling and fixed/non scrolling tiles. I'm not sure how the scanline interrupt works either. Where do you put your interrupt routine? It would be great if you would write something about that.

I'll start with the horizontal (scan line) interrupt (HINT) since that is the easiest to explain. I need an illustration or two for the other stuff (at least I'd *like* to make an illustration or two), so I'll add those later today (hopefully).

Since the 9918A only has one interrupt line, when you use the HINT it triggers the same interrupt line. Therefore on the host side you have to check for both VINT and HINT when you are using both. This means you probably can't use the 99/4A's console ISR with the HINT.

The HINT works very much like the 9938, and I did try to do some enhancements like the 9938 where possible.

To set up a HINT you set the scan line you want the interrupt to trigger on in VR19. A value of zero (0) will disable the interrupt. You also have to enable the HINT, just like the VINT, in VR0:

       MSB                       LSB
       0   1   2   3   4   5   6   7
       -----------------------------
9918A  0   0   0   0   0   0   M3 EXTVID
9938   0  DG  IE2 IE1  M5  M4  M3  0
F18A   0   0   0  IE1  0   M4  M3  0

IE = interrupt enable.  IE0 is in VR1 and is the VINT enable, as per the 9918A.

The F18A does not care if the zero bits are zero (0) or one (1), they are ignored. So VR0:3 (IE1) has to be set to one (1) *and* VR19 has to have a value other than zero (0) to enable the interrupt. This way, even if poorly written legacy software sets VR0:3 to one (1), the F18A will still not generate the HINT because VR19 defaults to zero (and the only way to update VR19 is to unlock the F18A.)
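
As a rough illustration of the enable rule above, here is a small Python model of the bit logic only (it is not F18A or 9900 code). The mask value follows from the MSB-first bit numbering in the VR0 table; the name hint_enabled is made up for this sketch.

```python
# Model of the HINT enable condition described above.
# Assumption: bit 3 in TI numbering (bit 0 = MSB) of an 8-bit register
# corresponds to the mask 2**(7-3) = 0x10.
IE1_MASK = 0x10


def hint_enabled(vr0: int, vr19: int) -> bool:
    """True only when VR0:3 (IE1) is set AND VR19 is non-zero."""
    return (vr0 & IE1_MASK) != 0 and (vr19 & 0xFF) != 0


if __name__ == "__main__":
    # Legacy software setting IE1 by accident does nothing while VR19 is 0.
    print(hint_enabled(0x10, 0))    # False
    print(hint_enabled(0x10, 96))   # True
```

The second argument being zero is what protects legacy software: VR19 cannot be changed until the F18A is unlocked.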

The HINT is reported just like the 9938 via status register SR1:

       MSB                       LSB
       0   1   2   3   4   5   6   7
       ------------------------------
9938   LPF LPS ID0 ID1 ID2 ID3 ID4 HF
F18A   ID0 ID1 ID2  X   X   X  BLK HF

LPF = light pen flag
LPS = light pen switch
BLK = horizontal or vertical blanking is active
HF = horizontal interrupt flag

The HF works the same way as the VINT flag of the 9918A VDP. If VR0:3 (IE1) is set to one (1) then when the scan line reaches the value in VR19 (and VR19 is not zero (0)), the VDP interrupt output is triggered (set low), and SR1:7 (HF) is set to one (1) and stays set until you read SR1.
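
For anyone who wants to see the F18A row of the SR1 table as masks, here is a hedged Python sketch. It only decodes a byte value that has already been read; how the host selects and reads SR1 is not covered here. Treating ID0 as the most significant bit of a 3-bit ID number is an assumption of this sketch, and decode_sr1 is a made-up name.

```python
def decode_sr1(sr1: int) -> dict:
    """Split an F18A SR1 byte into the fields from the table above.

    Bit numbers are TI style (bit 0 = MSB), so:
      ID0..ID2 -> bits 0-2 -> top three bits of the byte
      BLK      -> bit 6    -> mask 0x02
      HF       -> bit 7    -> mask 0x01
    """
    return {
        "id": (sr1 >> 5) & 0x07,   # assumption: ID0 is the high bit of the ID
        "blk": bool(sr1 & 0x02),   # horizontal or vertical blanking active
        "hf": bool(sr1 & 0x01),    # horizontal interrupt flag
    }


if __name__ == "__main__":
    print(decode_sr1(0x01))  # only HF set
```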

The F18A GPU also has direct access to the current scan line counter as well as all the VDP registers. The GPU is also very fast.

matthew180 · October 15, 2013

The F18A provides support for multiple "pages" when scrolling, and a "page" is simply additional "name tables" (NT) in the traditional sense of the 9918A VDP. There can be up to four NTs which are always consecutive in VRAM starting with NT1 which begins based on the NT-base in VR2.

When either the horizontal scroll register (HSR) or vertical scroll register (VSR) are incremented, there needs to be something to display at the edges where visual data is coming into view. Since scrolling can take place in two directions at once, there are four options for where the new data can come from:

1. One page, both horizontal and vertical directions wrap.

2. Two horizontal pages, vertical direction wraps.

3. Two vertical pages, horizontal direction wraps.

4. Four pages, no wrapping.

When using scrolling, the Name Table Base Address (NTBA) in VR2 is limited to 2-bits instead of the normal 4-bits. Thus the NT start address can only be located on 4K boundaries instead of 1K boundaries when using scrolling:

VR2:
MSB                LSB
0  1  2  3  4  5  6   7
X  X  X  X A0 A1  A2  A3 - Normal
X  X  X  X A0 A1 VPS HPS - Vertical / Horizontal Page Start from VR29

When scrolling, the HPS and VPS bits come from VR29:6 and VR29:7 and change from 0 -> 1 or 1 -> 0 depending on the horizontal / vertical page size selections in VR30. Allowing these bits to toggle, but having VR30:1 / VR30:2 set to one (1), is what causes a new name table to be selected when scrolling in either direction. If VR30:1 / VR30:2 are zero (0) for a given direction, then wrapping occurs in that direction instead of using a new name table.

VR29:
MSB                LSB
0  1  2  3  4  5   6   7
X  X  X  X  X  X  HPS VPS

VR30:
   MSB                      LSB
   0      1      2    3 4 5 6 7
HBSIZE HPSIZE VPSIZE  SPRITEMAX

HBSIZE = horizontal banner size
HPSIZE = horizontal page size
VPSIZE = vertical page size
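
The four page options listed earlier map onto the two page-size bits of VR30. A minimal Python sketch of that mapping follows; the masks are derived from the MSB-first bit numbering in the VR30 layout above, and the function names are invented for illustration.

```python
# VR30 field masks, derived from the layout above (bit 0 = MSB).
HBSIZE_MASK = 0x80     # bit 0
HPSIZE_MASK = 0x40     # bit 1
VPSIZE_MASK = 0x20     # bit 2
SPRITEMAX_MASK = 0x1F  # bits 3-7


def page_layout(vr30: int) -> str:
    """Describe which of the four scrolling page options VR30 selects."""
    two_h = bool(vr30 & HPSIZE_MASK)
    two_v = bool(vr30 & VPSIZE_MASK)
    if two_h and two_v:
        return "four pages, no wrapping"
    if two_h:
        return "two horizontal pages, vertical wraps"
    if two_v:
        return "two vertical pages, horizontal wraps"
    return "one page, both directions wrap"


def sprite_max(vr30: int) -> int:
    """Return the 5-bit SPRITEMAX field."""
    return vr30 & SPRITEMAX_MASK
```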

Because the VPS has a higher bit-value (bit-2) than the HPS bit (bit-3) in the VRAM address, the horizontal name table will always come in memory before the vertical name table. This also means that two-page scrolling in the vertical direction will always use name tables 1 and 3, and skip name table 2. Two-page scrolling in the horizontal direction will always use name tables 1 and 2.

VRAM Address, 14-bits:

 14   13   12   11   10  9   8   7  6  5 4 3 2 1 - number of bits
8192 4096 2048 1024 512 256 128 64 32 16 8 4 2 1 - bit place value

 MSB                              LSB
 0 1 2 3 | 4 5 6 7 8 9 10 11 | 12 13 14 - bit number (reverse of "industry standard")
---------------------------------------
VR2:4..7 | y raster counter  | x MOD 8  - normal

  MSB                                             LSB
  0   1  |  2  |  3  | 4 5 6 7 8 9 10 11 |   12 13 14
VR2:4..5 | VPS | HPS |    y modified     | x modified - scrolling

It can be seen that the VPS will select between a 0 (0K) or 2048 (2K) offset, and the HPS will select a 0 (0K) or 1024 (1K) offset. VR2 is reduced to 2-bits and can locate the four name tables at 0K, 4K, 8K, or 12K when scrolling is being used.

The X pixel counter and Y raster counter are modified by the horizontal and vertical scroll registers, and when the counters reach their limits they reset and cause the VPS or HPS to toggle if the page size is one (1) in VR30, thus causing a new name table to be used, otherwise wrapping occurs in that direction.
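
To check the arithmetic, the name table start address when scrolling can be sketched in Python as below. This is only a model of the address bits described above, with made-up function names. It reproduces the observation that the vertical page lands at +2K (name table 3) and the horizontal page at +1K (name table 2).

```python
def name_table_address(vr2: int, vps: int, hps: int) -> int:
    """Start address of the name table selected while scrolling.

    VR2:4..5 (masks 0x08 and 0x04) supply the 8192 and 4096 address bits,
    VPS supplies the 2048 bit and HPS supplies the 1024 bit.
    """
    base = (vr2 & 0x0C) << 10
    return base + (vps & 1) * 2048 + (hps & 1) * 1024


def name_table_address_normal(vr2: int) -> int:
    """Normal 9918A behaviour: four bits of VR2, 1K boundaries."""
    return (vr2 & 0x0F) << 10


if __name__ == "__main__":
    for vps in (0, 1):
        for hps in (0, 1):
            addr = name_table_address(0x00, vps, hps)
            print(f"VPS={vps} HPS={hps} -> >{addr:04X}")
```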

Edited October 15, 2013 by matthew180

+OLD CS1 · October 16, 2013

In light of some of the recent advances and the demos which will be running at the Faire, I think it would be nice to have a couple of clinics detailing the technical aspects of the F18A, how to exploit them and especially how they have been exploited. Like "The Machine Hidden Inside the F18A" to go over the GPU and related techniques.

I would love to see a "Titanium" clinic on how to implement scrolling, a "Scramble" clinic on the process of converting a game from another platform starting at scratch, and a "Mr. Chin" clinic on programming for multiple platforms. Maybe a demonstration of MLC with accent on its F18A capabilities.

matthew180 · October 16, 2013

Scroll Limit Registers

======================

The F18A has four registers: VR50, VR51, VR52, and VR53 that can be used to limit the area in which the scroll registers will affect the tiles. The registers are a full byte each, thus they can define a Top, Bottom, Left, and Right boundary at the pixel level. The top (VR50) must be less than the bottom (VR51), and the left (VR52) must be less than the right (VR53). A value of zero disables that particular boundary.

For example, if VR50 (top boundary) has a value of 29, then all lines from 0 to 29 will not be affected by scrolling, but lines 30 to 191/239 (for row30) will be affected by scrolling.

Used together, the four registers can be used to define a "window" in which scrolling takes place, but outside the window the display is fixed.

Fixed Map

=========

The "Fixed Map" is used to allow any tile to be "fixed" in place and not affected by scrolling. The Fixed Map is a bit-map, so there is 1-bit per tile to control if the tile is fixed or affected by scrolling. VR10 specifies the Fixed Map Base Address and is 7-bits, so the table can be located on 128-byte boundaries. VR49 contains a "fixed enable" bit which must be one (1) to enable the fixed map.
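
A quick sketch of the Fixed Map numbers, in Python. It assumes the 7-bit base occupies the low seven bits of VR10 (the post only says the field is 7-bits wide), and it says nothing about bit order inside each byte. The function names are hypothetical.

```python
def fixed_map_base(vr10: int) -> int:
    """VRAM address of the Fixed Map: 7-bit value times 128 bytes.

    Assumption: the 7-bit field sits in the low seven bits of VR10.
    """
    return (vr10 & 0x7F) * 128


def fixed_map_size(rows: int = 24, cols: int = 32) -> int:
    """Bytes needed at one bit per tile, rounded up to a whole byte."""
    return (rows * cols + 7) // 8


if __name__ == "__main__":
    print(fixed_map_size())        # 32x24 tiles
    print(fixed_map_size(rows=30)) # 30-row mode
```

For a standard 32x24 screen that works out to 96 bytes, which fits inside a single 128-byte slot.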

The scroll limit registers and fixed map can be used at the same time. When a tile, pixel row, or pixel column is not affected by scrolling, it will always display tile information from the first name table, i.e. "Page 1".

Asmusr · October 16, 2013

Thank you Matthew. I will have to read it 2-3 times more, and then I will ask my questions.

Asmusr · October 19, 2013

How do you vsync from the GPU side? Post #5 mentions blanking at >7001, can you just read that address? Are the status registers memory mapped at the GPU side?

matthew180 · October 19, 2013

The GPU is a modified 9900 so it can access VRAM, the VDP Registers, etc. as 16-bit or 8-bit values. Yes, the VDP Registers are memory mapped into the GPU's address space (they are also readable by the host system), as well as the current scan line, blanking, etc:

-- Address building
-- VRAM 14-bit, 16K @ >0000 to >3FFF (0011 1111 1111 1111)
-- GRAM 11-bit, 2K  @ >4000 to >47FF (0100 x111 1111 1111)
-- PRAM  7-bit, 128 @ >5000 to >5x7F (0101 xxxx x111 1111)
-- VREG  6-bit, 64  @ >6000 to >6x3F (0110 xxxx xx11 1111)
-- current scanline @ >7000 to >7xx0 (0111 xxxx xxxx xxx0)
-- blanking         @ >7001 to >7xx1 (0111 xxxx xxxx xxx1)
-- 32-bit counter   @ >8000 to >8xx6 (1000 xxxx xxxx x110)
-- 32-bit rng       @ >9000 to >9xx6 (1001 xxxx xxxx x110)
-- F18A version     @ >A000 to >Axxx (1010 xxxx xxxx xxxx)
-- GPU status data  @ >B000 to >Bxxx (1011 xxxx xxxx xxxx)

VRAM = VDP RAM

GRAM = GPU RAM (only accessible by the GPU)

PRAM = Palette RAM, 16-bit access *ONLY*, i.e. you can not use MOVB to PRAM.

VREG = VDP Registers
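
The map above decodes on the top four address bits, so a small lookup is enough to model it. This Python sketch is only an illustration; the function names are made up, and the palette helper assumes 64 palette registers of 16 bits each (128 bytes of PRAM), which matches PR1 at >5002 and PR8 at >5010 in the examples in this post.

```python
# Region names keyed by the top nibble of a GPU address, per the map above.
GPU_REGIONS = {
    0x0: "VRAM", 0x1: "VRAM", 0x2: "VRAM", 0x3: "VRAM",
    0x4: "GRAM",
    0x5: "PRAM",
    0x6: "VREG",
    0x7: "scanline/blanking",
    0x8: "32-bit counter",
    0x9: "32-bit rng",
    0xA: "F18A version",
    0xB: "GPU status data",
}


def gpu_region(addr: int) -> str:
    """Name the region a 16-bit GPU address falls into."""
    return GPU_REGIONS.get((addr >> 12) & 0xF, "not listed in the map")


def vreg_address(n: int) -> int:
    """GPU address of VDP register n (6-bit register number)."""
    return 0x6000 + (n & 0x3F)


def palette_address(n: int) -> int:
    """GPU address of palette register n (16-bit entries, word access only)."""
    return 0x5000 + 2 * (n & 0x3F)
```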

To wait for VSYNC, you could do something like this:

NUM192 BYTE 192
.
.
.
WAIT_B CB   @>7000,@NUM192     * Wait for the blanking period
       JL   WAIT_B

.

If you want to check the current scan line, then you would just do something like:

* Wait for scan line 128
LIN128 BYTE 128
.
.
.
WAIT   CB   @>7000,@LIN128
       JH   WAIT

- or -
       MOVB @LIN128,R1
WAIT   CB   @>7000,R1
       JH   WAIT

- or -
       LI   R1,>8000
WAIT   CB   @>7000,R1
       JH   WAIT

.

Here is the GPU part of the Palette Register test I wrote. It increments a set of palette registers, which will cycle through every color over time:

* Palette Register Update Test
       DEF MAIN
       AORG >3F10
MAIN   IDLE
       LI   R0,>5010          * Update the palette registers 8 to 15
       LI   R2,8
UPDPAL INC  *R0+              * Update the PR, inc R0 by 2 (INC is a word op)
       DEC  R2
       JNE  UPDPAL
       JMP  MAIN
       END

.

This code was used in my scroll demo, based on Titanium, to block copy the name table up or down by one tile row after scrolling up or down by seven pixels. It demonstrates the GPU's ability to modify VRAM in 16-bit chunks:

**
* Use the GPU to copy the screen up or down one line.
* Assumes the name table is at >0000
*
       DEF  MAIN
       AORG >3F00
MAIN
       IDLE                   * >3F00

*      Vector table
       B    @UP               * >3F02 for scroll up
       B    @DN               * >3F06 for scroll dn
       B    @MAIN

*      Scroll up one line
UP
       LI   R0,32             * SRC
       CLR  R1                * DST
       LI   R2,368            * (768-32)/2
UPLOOP MOV  *R0+,*R1+         * 16-bit copy
       DEC  R2
       JNE  UPLOOP
       B    @MAIN

*      Scroll down one line
DN
       LI   R0,734            * SRC 768-32-2
       LI   R1,766            * DST 768-2
       LI   R2,368            * (768-32)/2
DNLOOP MOV  *R0,*R1           * 16-bit copy
       DECT R0                * Back up two bytes
       DECT R1                * Back up two bytes
       DEC  R2
       JNE  DNLOOP
       B    @MAIN

       END
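
As a sanity check on the loop counts and start addresses above, here is a Python model of the two copy loops acting on a 768-byte name table. It is a simulation for checking the arithmetic, not GPU code, and the function names are invented. Note how the down copy starts at the end of the table so the overlapping regions do not clobber each other.

```python
def scroll_up(vram: bytearray) -> None:
    """Mirror of UPLOOP: copy 368 words from >0020 onward down to >0000."""
    src, dst = 32, 0
    for _ in range(368):                  # (768-32)/2 words
        vram[dst:dst + 2] = vram[src:src + 2]
        src += 2
        dst += 2


def scroll_down(vram: bytearray) -> None:
    """Mirror of DNLOOP: copy 368 words backwards, last word first."""
    src, dst = 734, 766                   # 768-32-2 and 768-2
    for _ in range(368):
        vram[dst:dst + 2] = vram[src:src + 2]
        src -= 2
        dst -= 2


if __name__ == "__main__":
    # Fill each tile row with its own row number so the shift is visible.
    table = bytearray(i // 32 for i in range(768))
    scroll_up(table)
    print(table[0], table[22 * 32], table[23 * 32])  # rows now hold 1, 23, 23
```

In both directions one tile row is left unchanged (the bottom row after scrolling up, the top row after scrolling down), so the host still has to fill in the newly exposed row.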

.

This GPU code was used to change the color of certain scan lines by updating the palette register based on the scan line value:

* Scan line test.  All pattern data points to PR1 (>5002)
       DEF MAIN
WHITE  EQU  >0FFF
RED    EQU  >0F00
GREEN  EQU  >00F0
BLUE   EQU  >000F

       AORG >3F10
MAIN   IDLE
WAIT0  CB   @>7000,@BYT191    * Loop while between 0 and 191
       JH   WAIT0
WAIT_N CB   @>7000,@LAST      * Wait for a change in the scan line
       JEQ  WAIT_N
       MOVB @>7000,@LAST
       LI   R1,WHITE          * Lines 0, 96, and 191 will be white
       CLR  R0                * Check for certain scan lines and set specific colors
       MOVB @LAST,R0          * Using R0 so CI can be used.
       SWPB R0
       MOV  R0,R0             * Compare to 0
       JNE  TST0
       JMP  WAIT_B            * Color already white
TST0   CI   R0,96             * White at line 96
       JNE  TST1
       JMP  WAIT_B
TST1   CI   R0,191            * White at line 191
       JNE  TST2
       JMP  WAIT_B
TST2   ANDI R0,>0007          * Every 8 scan lines are red
       JNE  TST3
       LI   R1,RED
       JMP  WAIT_B
TST3   LI   R1,BLUE
       ANDI R0,>0003          * Every 4 scan lines are blue
       JEQ  WAIT_B
       LI   R1,GREEN          * All other lines are green
WAIT_B MOVB @>7001,@>7001     * Wait for the blanking period
       JEQ  WAIT_B
       MOV  R1,@>5002         * Update the palette
       B    @WAIT0
BYT191 BYTE 191
LAST   BYTE 0
       END
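
The color selection in the listing above is easier to follow as a plain function. This Python sketch mirrors the order of the tests in the GPU code (white first, then red, then blue, then green); it only models which color is chosen for a given scan line value, not the timing of the palette write. The name line_color is made up.

```python
WHITE, RED, GREEN, BLUE = 0x0FFF, 0x0F00, 0x00F0, 0x000F


def line_color(line: int) -> int:
    """Palette value the scan line test picks for a scan line number."""
    if line in (0, 96, 191):      # the three special lines
        return WHITE
    if (line & 0x07) == 0:        # every 8th line
        return RED
    if (line & 0x03) == 0:        # remaining multiples of 4
        return BLUE
    return GREEN                  # everything else


if __name__ == "__main__":
    for n in (0, 4, 5, 8, 96, 191):
        print(n, f">{line_color(n):04X}")
```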

Edited October 20, 2013 by matthew180

Asmusr · October 19, 2013

Thank you, so the VDP registers are mapped from >6000, but where is the ordinary 9918A status register?

While we're talking GPU, it can hardly be a coincidence that the op-codes for PUSH, POP, CALL and RET are the same as for the "New Stack Instructions" that are implemented in Asm994A: PHWS, PPWS, BSTK and RSTK (described in Enhancements.pdf):

PUSH = PHWS = >0D00 (actually >2C40 for PHWS R0,R0 ???)

POP = PPWS = >0F00 (actually >2C80 for PPWS R0,R0 ???)

CALL = BSTK = >0C80 (>0CA0 symbolic)

RET = RSTK = >0C00

Unfortunately PHWS and PPWS don't seem to assemble as described. They require two operands or you get an error, which is very strange, and the op-codes are different. Any comments?

matthew180 · October 20, 2013

SR0 (the original status register) is not exposed to the GPU. Oversight I guess. Anything you can get from SR0 you can also determine via the GPU though. It would not take the GPU very long to read through the SAT to determine sprite collision status, and the scan line data plus blanking status provide the other SR0 information. Still, I suppose it would have been nice to provide the SR0 input to the GPU. Sorry. Maybe in a future firmware.

It is no coincidence about the PUSH, POP, CALL, and RET opcodes, I used the Asm994a opcodes where it made sense. However Asm994a goes *WAY* overboard with all the PUSH and POP variations, IMO. Note that my instructions only use the same "opcode" (the part of the instruction that identifies the operation to perform), and not necessarily the entire instruction *format*. My stack instructions *always* use R15 as the stack pointer and take one operand that supports all addressing modes. The stack operations are always 16-bit.

--          0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
--         ---------------------------------------------------------------+
-- 1 arith  1 |opcode | B |  Td   |       D       |  Ts   |       S       |
-- 2 arith  0   1 |opc| B |  Td   |       D       |  Ts   |       S       |
-- 3 math   0   0   1 | --opcode- |     D or C    |  Ts   |       S       |
-- 4 jump   0   0   0   1 | ----opcode--- |     signed displacement       |
-- 5 shift  0   0   0   0   1 | --opcode- |       C       |       W       |
-- 5 stack* 0   0   0   0   1 | 1 ------opcode--- | Ts/Td |      S/D      |
-- 6 pgm    0   0   0   0   0   1 | ----opcode--- |  Ts   |       S       |
-- 7 ctrl   0   0   0   0   0   0   1 | ----opcode--- |     not used      |
-- 7 ctrl   0   0   0   0   0   0   1 | opcode & immd | X |       W       |
--
-- The stack format is new for added opcodes.  The original four shift
-- opcodes have a '0' in bit-5, but have 3-bits for the instruction
-- selection.  So, using bit-5 as a '1' allows detection of the new
-- instructions and modifies the remaining bits to specify the src or
-- dst of the operation, since the stack always works with R15.

.

I still want to write an assembler and I will add support for the new instructions, however until then you can use the new opcodes by putting them *inline* with DATA statements. I know this is a pain in the ass since you have to precode the operand, but if you use symbolic addressing then the opcode will always be the same. Here are some examples that will compile just fine with Asm994a (I used Asm994a exclusively while writing my F18A tests):

* CALL 0C80 0000 1100 10Ts SSSS
* RET  0C00 0000 1100 0000 0000
* PUSH 0D00 0000 1101 00Ts SSSS
* POP  0F00 0000 1111 00Td DDDD
* SLC  0E00 0000 1110 00Ts SSSS

CALSYM EQU  >0CA0             * CALL Symbolic 0000 1100 1010 0000
RET    EQU  >0C00             * RET  0000 1100 0000 0000
PUSH0  EQU  >0D00             * PUSH R0  0000 1101 0000 0000
POP0   EQU  >0F00             * POP  R0  0000 1111 0000 0000
PUSH11 EQU  >0D0B             * PUSH R11 0000 1101 0000 1011
POP11  EQU  >0F0B             * POP  R11 0000 1111 0000 1011

.
.
.
       DATA PUSH11            * PUSH R11, save address before BL
       BL   @GETIDX           * Typical 9900 subroutine call
       DATA POP11             * POP R11
       B    *R11              * Typical 9900 return
.
.
.
       DATA CALSYM            * CALL with symbolic addressing
       DATA IDXCPY            * Address of subroutine to call
.
.
.
IDXCPY
. . . Do something . . .
      DATA RET                * Return

.

By using the DATA statement the opcodes are simply placed in the execution path. Using CALL with symbolic addressing allows the same CALL opcode to be used for any subroutine, and the assembler can still be used to resolve the subroutine address. Also, unless you are pushing and popping all manner of registers, setting up the few PUSH and POP instructions is pretty quick.
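
If you need more PUSH/POP variants than the handful above, the words can be generated rather than worked out by hand. The Python sketch below builds them from the bit layouts in the comment block (base opcode, 2-bit T field, 4-bit register field). It assumes the T field uses the usual 9900 encoding (0 = register, 1 = indirect, 2 = symbolic/indexed, 3 = indirect auto-increment); the helper names are made up.

```python
# Base opcodes from the table above.
CALL, RET, PUSH, POP, SLC = 0x0C80, 0x0C00, 0x0D00, 0x0F00, 0x0E00


def encode(opcode: int, t: int = 0, reg: int = 0) -> int:
    """Combine a base opcode with the T (addressing mode) and register fields."""
    return opcode | ((t & 0x3) << 4) | (reg & 0xF)


def data_line(word: int, comment: str = "") -> str:
    """Format a word as an assembler DATA statement."""
    text = f"       DATA >{word:04X}"
    return f"{text}            * {comment}" if comment else text


if __name__ == "__main__":
    print(data_line(encode(PUSH, 0, 11), "PUSH R11"))
    print(data_line(encode(POP, 0, 11), "POP R11"))
    print(data_line(encode(CALL, 2, 0), "CALL symbolic, address follows"))
```

A symbolic CALL still needs the target address in the following DATA word, exactly as in the example above.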

Edited October 20, 2013 by matthew180

Asmusr · October 20, 2013

I wanted to see what the GPU was capable of, so I converted my first TI demo to run on the GPU. The original demo can be found here:

http://atariage.com/forums/topic/162941-assembly-on-the-994a/page-9?do=findComment&comment=2730495

Basically it's just drawing lines in bitmap mode. It's not using the PIX instruction or the bitmap layer. The original demo was running at 2-3 frames per second. On the GPU it's more like 200 FPS!

If you look at the code, everything from AORG GPUPRG is running on the GPU. I try to wait for vsync/blanking (source code lines 278-282), but I don't think it's working. I'm even waiting twice: for blanking, for non-blanking, and for blanking again. Probably some simple mistake.

Note that this doesn't run on Classic99. Even if you disable the vsync loop, which I wouldn't expect to work, there are still some problems with the GPU emulation. (I noticed that the instruction MOV R11,*R10+ doesn't seem to increment R10, but I'm not sure.)

While the GPU is drawing the lines, the good old TMS9900 is also pumping data to the VDP for the scrolling background. I think it's really cool how this can go on in parallel.

A final thing to notice is that I'm using the new CALL and RET instructions of the GPU, which can be entered as BSTK and RSTK in Asm994a.

Sorry for the video quality. YouTube has almost destroyed it.

gpulines.zip

matthew180 · October 20, 2013

Very Nice!

I like the way you got the GPU code to compile in with the non-GPU code. I thought I tried to do the same thing early on with Asm994a and it would not compile. That will make things a lot easier!

Nope it is not waiting for VSYNC because I gave you a bad example. Sorry. I fixed the previous example and this should work:

NUM192 BYTE 192
.
.
.
WAIT_B CB   @>7000,@NUM192     * Wait for the blanking period
       JL   WAIT_B

.

The reason using just >7001 did not work is because >7001 reports Horizontal AND Vertical blanking. Basically any time the raster is outside the visible area, >7001 will be one, otherwise it will be zero. This was done to support raster effects, and to detect VSYNC you need to check the scan line value.

The GPU in Classic99 was just a copy of the 9900 CPU code, so it does not support all of the customizations I made in the F18A. It does have a few of the changes I think, but I'm pretty sure it does not have the stack instructions yet. That is probably why it does not run on Classic99.

I'm glad BSTK and RSTK work, but I really don't like the mnemonic names... Not as simple and clear as CALL and RET.

When you get it working how you like, keep in mind that the PIX instruction will make it faster. :-) It can do the GM2 calculation as a single instruction, or using the BML it can read, compare, and optionally plot a pixel in a single instruction.

Edited October 20, 2013 by matthew180

Asmusr · October 20, 2013

With a proper wait for vsync the demo becomes a lot smoother. Now the GPU is busy waiting much of the time.

By drawing the objects from top to bottom I have enough time to blank the bitmap screen (pattern tables) each frame and draw it again before the 'beam' catches up, but to implement a 3D engine, for instance, I think you would have to use double buffering and the F18A bitmap layer, and this again means that you would only have enough VDP RAM to draw on half of the screen.

gpulines.zip

Tursi · October 20, 2013

Beautiful! Regarding the GPU emulation in Classic99, it runs the exact same code for the basic functionality, so, for instance, MOV R11,*R10+ should work fine, it's used in many places.

However, although I've implemented all of the modified GPU instructions, /none/ of them have been tested. I have the information I need to make a proper spec for each of them, but no time. I'd have expected the stack instructions to work, but I took a guess at how they worked when I did the implementation and did not yet go back to verify them, so... if they are broken I'm not too surprised. F18A parts in Classic99 are all still in the "hack" phase, unfortunately. Even the register unlock is not correct; it's just there as enough to be able to single-step through GPU code.

matthew180 · October 21, 2013

Oh cool, I didn't know you incorporated all the modifications.

CALL, RET, PUSH, and POP work pretty much like most stack operations on most other CPUs. They all work with R15 as the stack pointer and do a post-decrement or pre-increment accordingly:

CALL - copy PC to the stack, decrement R15, copy operand value to PC (branch).

PUSH - copy operand value to the stack, decrement R15.

RET - increment R15, copy stack value to PC.

POP - increment R15, copy stack value to operand.

In register transfer notation:

-- CALL <gas> = (R15) <= PC , R15 <= R15 - 2 , PC <= gas
-- PUSH <gas> = (R15) <= (gas) , R15 <= R15 - 2
-- POP  <gad> = R15 <= R15 + 2 , (gad) <= (R15)
-- RET        = R15 <= R15 + 2 , PC <= (R15)
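
For emulator authors, the register transfer notation above can be turned into a tiny reference model. The Python sketch below follows the notation literally: store at (R15) then subtract 2 on CALL/PUSH, add 2 then load from (R15) on POP/RET. It takes whatever PC value it is given; whether the saved PC is the address of the next instruction is not stated above, so that part is left to the caller. The class name and the starting stack pointer in the example are made up.

```python
class GpuStackModel:
    """Word-addressed model of the F18A GPU stack operations."""

    def __init__(self, stack_pointer: int) -> None:
        self.mem = {}              # address -> 16-bit word
        self.r15 = stack_pointer   # R15 is always the stack pointer
        self.pc = 0

    def push(self, value: int) -> None:
        self.mem[self.r15] = value & 0xFFFF     # (R15) <= value
        self.r15 = (self.r15 - 2) & 0xFFFF      # R15 <= R15 - 2

    def pop(self) -> int:
        self.r15 = (self.r15 + 2) & 0xFFFF      # R15 <= R15 + 2
        return self.mem[self.r15]               # value <= (R15)

    def call(self, target: int) -> None:
        self.push(self.pc)                      # (R15) <= PC
        self.pc = target & 0xFFFF               # PC <= operand

    def ret(self) -> None:
        self.pc = self.pop()                    # PC <= (R15)


if __name__ == "__main__":
    gpu = GpuStackModel(0x47FE)   # hypothetical stack top, chosen for the demo
    gpu.pc = 0x4010
    gpu.call(0x4100)
    gpu.ret()
    print(f">{gpu.pc:04X} >{gpu.r15:04X}")
```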
