# Bitmap mode.


In TI BASIC, are array elements fetched by indexed look-up or by sequential walk up the array's contents? IOW, how much of a penalty is using an array?

Beats me, but it is more than compensated for by eliminating the CALL LINK("LINE") in 240. Looks like your demo is halfway through, so it's looking like about 6 hours. In my tests I used the 16-bit data bus, which really peps up TML, hence my 4 hour estimate.

##### Share on other sites

Stop watching it or it will never boil!

HA!

I've been home sick as a dog today, so I've been spending a lot of time in the den, messing with a bunch of different projects. This is the longest the TI has been on in a single session in months.

** EDIT ** 5 Hours 25 Minutes on the real thing! It sure gave the TI a real workout.

Edited by --- Ω ---

##### Share on other sites

How would we do this in fixed point? How many "virtual" decimal places would we need?

Just try it!!

##### Share on other sites

It takes around 4 hours to run as you have shown it. One easy way to get a slight speedup is to select the 2 color mode when TML starts up. Hold down a key when it starts to get the menu and select the appropriate options.

220 X1=(XI+ZI+160)*.75::Y1=90-YY+ZI (This way you only multiply once. This gives a full-height hat; you can multiply Y1 by .75 if you want to keep the correct proportions.) I don't think you need line 230.

220 X1=XI+ZI+160 :: Y1=90-YY+ZI

230 ON ERROR 270

A big improvement can be obtained if you work from the front to the back. Make line 160:

160 FOR ZI=64 TO -64 STEP -1

Make an array to hold the maximum heights of the pixels in each column and initialize it so all the elements are 193. Then, before printing the pixel in line 235, check the row (Y1) of the pixel you are about to print against the array entry for its column (X1). If the row is less (i.e. higher on the screen) than the highest row recorded for that column, print the pixel and store that Y1 as the new high in the array. If it is greater (i.e. lower on the screen), skip to line 250 without printing the pixel. This way you don't need the PD in line 235 and can eliminate line 240. Drawing that line takes a lot of time. With this change it runs in about 2:20, which edges out the Atari BASIC. It would be interesting to see how the Atari fares with this change.
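The per-column height check described above can be sketched in Python. This is only an illustration of the idea, not the BASIC listing from the thread: `plot_front_to_back` and its `points` argument are hypothetical names, and the sketch assumes a 256-column screen with row 0 at the top.

```python
def plot_front_to_back(points, width=256, sentinel=193):
    """Return the pixels that survive a per-column height check.

    points: iterable of (x, y) pixel coordinates, ordered front to back.
    Row numbers grow downward, so a smaller y is higher on the screen.
    """
    # One entry per column, initialised below the last visible row.
    highest = [sentinel] * width
    visible = []
    for x, y in points:
        if not 0 <= x < width:
            continue              # off-screen column, nothing to record
        if y < highest[x]:        # higher than anything drawn in this column
            highest[x] = y        # remember the new high-water mark
            visible.append((x, y))
        # otherwise the pixel is hidden behind nearer geometry: skip it
    return visible
```

Because nearer points are visited first, anything at or below the recorded row in the same column is already covered, which is why no erasing line is needed.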

There is no big mystery in getting TML. Go to page 1 of Development resources on this site.

Pretty awesome analysis and optimization pass! Way to go!

I applied them to my BASIC version and updated my post back on page 3. At System Maximum, Classic99 drew the entire hat at scale 0.4 in only 12 minutes and 37 seconds. The complete elimination of overdraw made a huge difference.

Edited by Tursi

##### Share on other sites

Here's a 512K cart animation of the hat for playback in Classic99.

I certify that everything was done on a TI-99/4A. The picture shows the rendering in action...

hat8.bin

##### Share on other sites

Just try it!!

Sadly, I'm not sure I know how to. The math in this one seems a little too complex for me. Is it simply a matter of scaling *everything* up by the same powers of ten? So, if I want two decimal places, multiply everything by 100, then divide it down just before I do the plot?

So:

`150 XP=144:XR=4.71238905:XF=XR/XP`

becomes

`150 XP=14400:XR=471:XF=XR/XP`

I would imagine using fixed point could yield orders of magnitude performance improvements. I'd be very interested in a Forth or assembly solution: one using normal FP and trig functions, and one using fixed point and lookup tables of pre-scaled values for the trig. If it could be written as a paper/essay that would be great. Fixed point is something I'd like to spend more time looking at. It's something I used years ago (in the 90s) but I didn't know it as fixed point. A client wanted voltages and current reported to 1 decimal place, but our PLC didn't do floating point. I just multiplied the readings by 10 and sent them on, and the client end divided them by 10. Simples.
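To make the "multiply everything by 100" idea concrete, here is a minimal Python sketch of decimal fixed point. The helper names (`to_fixed`, `fixed_mul`, `fixed_div`) are made up for illustration and the sketch assumes non-negative values; it is not code from the thread.

```python
SCALE = 100  # two "virtual" decimal places

def to_fixed(value):
    """Convert a real number to a scaled integer."""
    return round(value * SCALE)

def fixed_mul(a, b):
    """Multiply two scaled integers; the raw product carries SCALE twice."""
    return (a * b) // SCALE

def fixed_div(a, b):
    """Divide two scaled integers; pre-scale the dividend to keep SCALE once."""
    return (a * SCALE) // b

def to_float(value):
    """Convert a scaled integer back to a real number for display."""
    return value / SCALE
```

Addition and subtraction work on the scaled integers directly. Note that with only two decimal places the ratio XR/XP from line 150 comes out as `fixed_div(to_fixed(4.71), to_fixed(144))`, which is 3 (i.e. 0.03), so a constant that small would need more digits of scale.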

##### Share on other sites

When you go to fixed point in assembly, unless there's a good reason to keep it, the first thing to throw out is the concept of decimal. You're working in bits now, so powers of two are more efficient. The first thing you need to determine is more or less what you were trying to work out: how many bits of fraction do you need?

For something like this... we're plotting full screen, so we know the integer range is 0-255, which is 8 bits. With our 16-bit CPU, that makes an 8.8 fixed point seem reasonable right off the bat. 8 bits of fraction means our smallest step is 1/256 (exactly 0.00390625), which as a gut feel sounds like it will probably do.

For basic four-function math, fixed point is really simple. Usually, as in cases like this, you want to use the integer result: just shift away the fraction and you have an integer. Need to convert an integer to fixed point? Just shift the other way. Want to add two? As long as they have the same number of bits of fraction (shift as needed), then just use the add instruction. (Same with subtract.) Multiply and divide are a little harder because you need to remember that the number of bits of fraction changes (8.8 * 8.8 = 16.16, and 16.16 / 8.8 = 8.8; a multiply adds the bits of fraction together, a divide subtracts them). The add and subtract "just working" is really nice when you want to count by a non-integer value, because you just increment/add normally, and then just work with the integer portion.
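The rules above can be tried out in a few lines of Python. This is a sketch of 8.8 arithmetic using Python's unbounded integers, so it does not model 16-bit overflow; the function names are illustrative and not from the thread.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS          # 1.0 in 8.8 is 256

def to_fixed(value):
    """Convert a real number to 8.8 fixed point."""
    return int(round(value * ONE))

def to_int(fx):
    """Shift away the fraction to get the integer part (rounds toward -inf)."""
    return fx >> FRAC_BITS

def fx_mul(a, b):
    """8.8 * 8.8 gives 16 fraction bits, so shift 8 of them back out."""
    return (a * b) >> FRAC_BITS

def fx_div(a, b):
    """Pre-shift the dividend so the quotient still has 8 fraction bits."""
    return (a << FRAC_BITS) // b
```

Counting by a non-integer step is then just repeated addition: adding `to_fixed(0.75)` four times gives 768, and `to_int(768)` is 3.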

For this code, we have adding, subtracting, multiplying, sine and square root. There are two tricky bits.

At first glance, some of the math won't fit inside an 8.8 (20736-ZS, for instance), and it's signed, costing us a bit. Sometimes99er did a nice table for us, so we can see what the limits are... from that it looks like 6 of 10 variables fit into a signed 8.8 value. For 3 of those other four, though, they have a maximum range of 144, which is not much greater than 128. And one (ZS) is a positive value from 0-20736.

ZS needs 15 bits to represent it, but if we look at the others, we can get by with 8 bits of integer. So if we have 1 bit of sign, and 8 bits of integer, that leaves 7 bits of fraction. (1/128 = 0.0078125). Sometimes99er's math ALSO suggests that is probably accurate enough! (This is a fantastic table, saves us a lot of work.)

So, we can try to build with a signed 9.7 fixed point and still work with TI-sized words -- the nice thing about fixed point is that it doesn't care where the decimal point actually is. You can even mix precisions as long as you remember to shift up the decimal point when the numbers have to interact with each other.
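As a quick check of what 9.7 constants look like, the Python sketch below converts a few of the values from the BASIC program and shows the shift needed when two values carry different numbers of fraction bits. `to_q97` and `align` are hypothetical helper names, and truncation (not rounding) is assumed for constants.

```python
FRAC_BITS = 7                 # 1 sign bit, 8 integer bits, 7 fraction bits

def to_q97(value):
    """Truncate a non-negative real constant to 9.7 fixed point."""
    return int(value * (1 << FRAC_BITS))

def align(value, from_frac, to_frac):
    """Shift a fixed-point value from one fraction width to another."""
    if to_frac >= from_frac:
        return value << (to_frac - from_frac)
    return value >> (from_frac - to_frac)

# An integer has 0 fraction bits, so integer * 9.7 is already 9.7:
zt = 64 * to_q97(2.25)        # 64 * 2.25 = 144.0 in 9.7
```

Here `to_q97(2.25)` is `0x120`, `to_q97(0.4)` is `0x33`, and the tiny ratio 4.71238905/144 collapses to 4 (about 0.03125), which shows how coarse 7 fraction bits are for small constants.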

ZS will require special treatment. But it's only used in two places, both in square root calculations. And that brings us to the second trouble - square root. Seems like the range of inputs is roughly 0-41472 - which is too big for a lookup table (which is the quickest way to handle sine). Best way to handle this is probably to just take in an integer, and we can find a simple algorithm and adapt that. (Wikipedia has a binary algorithm, it looks like.)
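A sine lookup table of the kind mentioned above is cheap to generate offline. The Python sketch below assumes 256 entries covering one full circle, values scaled by 128 (7 fraction bits) and clamped so that they fit a signed byte range; `build_sine_table` and `sine_lookup` are hypothetical names, not routines from the thread.

```python
import math

def build_sine_table(entries=256, frac_bits=7):
    """Return sin() sampled around a full circle as scaled integers."""
    scale = 1 << frac_bits
    table = []
    for i in range(entries):
        value = round(math.sin(2 * math.pi * i / entries) * scale)
        # Clamp so +1.0 (which would be 128) still fits in -128..127.
        table.append(max(-scale, min(scale - 1, value)))
    return table

def sine_lookup(table, angle_units):
    """Look up an angle given in table units; wraps for power-of-two sizes."""
    return table[angle_units & (len(table) - 1)]
```

With a power-of-two table size, wrapping the angle is a single AND, which is the main attraction of measuring angles in 1/256ths of a circle instead of radians.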

I'll see if I can build something...

##### Share on other sites

Square root calculation:

The square root of x is the biggest number r we can represent where r*r <= x.

If x is a 16-bit number, r must be < 256, i.e. there's no need to consider bit values > 128 in the result.

So we can start with the bit value 128 (the middle bit value) and check if the square is <= x. If it is we add it to r.

We then continue with the next bit with value 64, add it to r in a temp variable and check if the square of temp is <= x. If it is we add 64 to r.

And so on down to the least significant bit. It also works with FP numbers.
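The bit-by-bit procedure just described translates almost directly into Python. This is a sketch for unsigned 16-bit inputs; `isqrt16` is an illustrative name.

```python
def isqrt16(x):
    """Integer square root of a 16-bit value, one result bit at a time."""
    r = 0
    bit = 128                 # highest bit the 8-bit result can have
    while bit:
        temp = r + bit        # tentatively set this bit
        if temp * temp <= x:  # still not too big? keep it
            r = temp
        bit >>= 1             # move on to the next lower bit
    return r
```

The loop always runs exactly eight times, so the cost is eight multiplies and compares regardless of the input.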

##### Share on other sites

ZS will require special treatment. But it's only used in two places, both in square root calculations. And that brings us to the second trouble - square root. Seems like the range of inputs is roughly 0-41472 - which is too big for a lookup table (which is the quickest way to handle sine). Best way to handle this is probably to just take in an integer, and we can find a simple algorithm and adapt that. (Wikipedia has a binary algorithm, it looks like.)

If we forced integer before square root then there's a smaller difference in the resulting bitmaps.

`//XL=Math.sqrt(20736-ZS)+0.5;`

`XL=Math.sqrt(int(20736-ZS))+0.5;`

`//XT=Math.sqrt(XI*XI+ZS)*XF;`

`XT=Math.sqrt(int(XI*XI+ZS))*XF;`

At this point XL is integer but XT is floating point.

If we force the result of square root to be integer before multiplying then it has an interesting effect.

`// XT=Math.sqrt(XI*XI+ZS)*XF;`

`// XT=Math.sqrt(int(XI*XI+ZS))*XF;`

`XT=int(Math.sqrt(int(XI*XI+ZS)))*XF;`

##### Share on other sites

Interesting, thanks Sometimes99er. And thanks Rasmus, but I've already got my assembly square root coded and tested. It takes in an integer and returns a fixed point value as described above (took a few tries to work out how the heck to do that and stay in 16 bits, but it turned out to be rather simple). Hopefully it will look like Sometimes' first chart there.

It's getting close... it draws a shape resembling the curve of the brim, in the right order, but something in the inner loop is still messing up. It's hard to keep track of signed versus unsigned and keeping the shifts right, so likely I just missed something and need to step through the inner loop (I've been through the outer loop in detail and tested the square root, sine, and plot functions to my satisfaction). The big problem (and likely place I missed something) is that multiply doesn't deal with signed values on the 9900, and you have to be careful when you deal with those. Most of the multiplies are either positive or can be made positive (thanks to the squaring), but there are a few that I need to be careful with.
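The sign handling described above (remember the sign, multiply the magnitudes, fix the sign afterwards) can be modelled in Python. This sketch simulates an unsigned 16x16 multiply returning two 16-bit words and the 7-bit shift that turns an 18.14 product back into 9.7; all function names are hypothetical and it is a model of the approach, not a transcription of the assembly.

```python
MASK16 = 0xFFFF

def to_word(value):
    """Wrap a Python integer into a 16-bit two's complement word."""
    return value & MASK16

def to_signed(word):
    """Interpret a 16-bit word as a signed value."""
    return word - 0x10000 if word & 0x8000 else word

def mpy_unsigned(a, b):
    """Unsigned 16x16 multiply: returns (high word, low word)."""
    product = (a & MASK16) * (b & MASK16)
    return product >> 16, product & MASK16

def q97_mul_signed(a, b):
    """Multiply two signed 9.7 words using only an unsigned multiply."""
    sa, sb = to_signed(a), to_signed(b)
    negative = (sa < 0) != (sb < 0)       # result sign, remembered up front
    hi, lo = mpy_unsigned(abs(sa), abs(sb))
    # 18.14 -> 9.7: drop 7 fraction bits from the low word and
    # bring the high word's bits down next to them.
    merged = ((hi << 9) | (lo >> 7)) & MASK16
    return to_word(-merged) if negative else merged
```

For example, -1.5 times 2.0 is `q97_mul_signed(to_word(-192), 256)`, which reads back through `to_signed` as -384, i.e. -3.0 in 9.7.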

I'm way late for bed tonight, though, so I'll dive in tomorrow. If anyone wants to take a stab at it, I'll go ahead and post it, but no worries if not, I think it's pretty close. (And though the time is meaningless since it doesn't work, it only takes a minute or so at normal speed to finish.)

```
 DEF START,TEST
* THIS VERSION USES SENIORFALCON'S OPTIMIZATIONS
* FOR EVEN MORE SPEED. WHEE!
* I'VE REFLECTED THOSE CHANGES IN THE BASIC LISTING
* THANKS TO SOMETIMES99ER FOR WORKING OUT THE DATA!
* I'VE OPTED TO USE THE FIXED POINT FOR ALL VARIABLES
* EXCEPT ZS, FOR CONSISTENCY
* LABELS FOR SAVE UTILITY
SFIRST
B @START

* array for highest pixel
ROWS
BSS 256

* bits for pixel
BITS
DATA >8040,>2010,>0804,>0201

* SINE TABLE - 9.7 fixed point entries, 256 total
SINTAB
DATA 0,3,6,9,13,16,19,22
DATA 25,28,31,34,37,40,43,46
DATA 49,52,55,58,60,63,66,68
DATA 71,74,76,79,81,84,86,88
DATA 91,93,95,97,99,101,103,105
DATA 106,108,110,111,113,114,116,117
DATA 118,119,121,122,122,123,124,125
DATA 126,126,127,127,127,127,127,127
DATA 127,127,127,127,127,127,127,126
DATA 126,125,124,123,122,122,121,119
DATA 118,117,116,114,113,111,110,108
DATA 106,105,103,101,99,97,95,93
DATA 91,88,86,84,81,79,76,74
DATA 71,68,66,63,60,58,55,52
DATA 49,46,43,40,37,34,31,28
DATA 25,22,19,16,13,9,6,3
DATA 0,-3,-6,-9,-13,-16,-19,-22
DATA -25,-28,-31,-34,-37,-40,-43,-46
DATA -49,-52,-55,-58,-60,-63,-66,-68
DATA -71,-74,-76,-79,-81,-84,-86,-88
DATA -91,-93,-95,-97,-99,-101,-103,-105
DATA -106,-108,-110,-111,-113,-114,-116,-117
DATA -118,-119,-121,-122,-122,-123,-124,-125
DATA -126,-126,-127,-127,-127,-128,-128,-128
DATA -128,-128,-128,-128,-127,-127,-127,-126
DATA -126,-125,-124,-123,-122,-122,-121,-119
DATA -118,-117,-116,-114,-113,-111,-110,-108
DATA -106,-105,-103,-101,-99,-97,-95,-93
DATA -91,-88,-86,-84,-81,-79,-76,-74
DATA -71,-68,-66,-63,-60,-58,-55,-52
DATA -49,-46,-43,-40,-37,-34,-31,-28
DATA -25,-22,-19,-16,-13,-9,-6,-3

* 9.7 signed fixed point variables in registers
* note: NOT in memory, so don't use @XF
XF EQU 15
ZI EQU 14
ZT EQU 13
XL EQU 12
* RET EQU 11 - for BL
XI EQU 10
XT EQU 9
YY EQU 8
ONE EQU 7
* ZS IS AN INTEGER VALUE
ZS EQU 6
* 32-bit temp, uses 4 and 5
T32B EQU 5
T32 EQU 4
* Temp vars
T1 EQU 3
T2 EQU 2
* PIXEL VARIABLES
X1 EQU 1
Y1 EQU 0
* return save
SAVE
BSS 2

* registers for bitmap (and >5B00 is the write address of the sprite table at >1B00)
* background is transparent (the only color never redefined)
* PDT - >0000
* SIT - >1800
* SDT - >1800
* CT  - >2000
* SAL - >1B00
BMREGS DATA >81E0,>8002,>8206,>83ff,>8403,>8536,>8603,>8700,>5B00,>0000

START
LWPI >8300
* FILL THE 256-ENTRY ROWS ARRAY WITH THE VALUE 193
LI R0,ROWS
LI R1,193*256
LI R2,256
LP1
MOVB R1,*R0+
DEC R2
JNE LP1
* 140 GRAPHICS 8+16:SETCOLOR 2,0,0
BL @BITMAP
* erase the pattern table
CLR R0
CLR R1
LI R2,>1800
BL @VDPFILL
* set the color table to white on black
LI R0,>2000
LI R1,>F100
LI R2,>1800
BL @VDPFILL

* LOAD ONE WITH THE FIXED POINT VALUE OF 1 (>0080)
LI ONE,>0080

* 150 XP=144:XR=4.71238905:XF=XR/XP
* This ratio is precalculated to be 0.032725
* to get 9.7 fixed point, we can just multiply
* by 1.0 in fixed point, which is 000000001 0000000,
* or >0080, which is 128. The result in decimal
* is 4.18879 (but we can't keep the fraction, so
* we get >0004. To test we got it right, we can
* convert back, 4/128 = 0.03125, which is as close as
* we can get.) When fractions are involved, the multiply
* method for constants may be easier than shifting
LI XF,>0004
* 160 FOR ZI=64 TO -64 STEP -1
* 64 = >40, SHIFT BY 7 = >2000
* THIS *COULD* BE AN INTEGER, BUT JUST KEEPING IT CONSISTENT
LI ZI,>2000
L160
* >E000 IS -64
CI ZI,>E000
JLT L260

* 170 ZT=ZI*2.25:ZS=ZT*ZT
* We have to do two multiplies here, so we're going
* to end up in a 32-bit value temporarily anyway. That
* actually makes life a little easier.
* 2.25 * 128 = 288, WHICH IS >120
LI T32,>0120
MOV ZI,T1
ABS T1   * this is okay, because we are going to square it anyway
MPY T1,T32

* now T32 is 32-bits wide, and contains an 18.14 bit number.
* but we want to multiply again (to square it), so we need
* two 9.7 numbers. We need to shift by 7 bits (if it was 8,
* we could just move bytes around. Oh well  ).
* so this is a little ugly, but we want to keep 9 bits from
* T32B, and move in 7 bits from T32
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
* now just put them into place, and multiply again
* we know from analysis that the 'sign bit' shouldn't be set IN THIS CODE
MOV T32B,T32
MOV T32B,T1
MPY T1,T32

* So, T32 now contains a 32-bit 18.14 number, again, but for simplicity we
* are going to move that down into ZS as a 16-bit unsigned integer
* so we just need to extract 16 bits of integer, as we don't expect overflow
* and don't want fraction. Of course, those 16 bits are split across the
* two words...
MOV T32B,ZS  * least significant - we want two bits from this
SRL ZS,14  * toss the rest
SLA T32,2  * prepare the most significant
SOC T32,ZS  * and merge it in

* 180 XL=INT(SQR(20736-ZS)+0.5)
* ZS is a normal int, so this shouldn't be too bad to start
* the result is also an int, and the +0.5 is just for rounding
* our sqrt will return one of our fractional values, as noted,
* to be consistent.
LI T1,20736
S ZS,T1
BL @SQRT  * T1 IN AS positive INT, T1 OUT AS 9.7
AI T1,>40  * 0.5 is 1/2 of 128, so this adds .5 - when we truncate, we get rounding.
ANDI T1,>FF80 * zero out any fraction
MOV T1,XL  * and store it
* 190 FOR XI=0-XL TO XL
CLR XI
S XL,XI
L190
C XI,XL
JGT L255
* 200 XT=SQR(XI*XI+ZS)*XF
* pretty similar to above, again we are squaring to get positive
* so that makes the unsigned MPY easier to deal with
MOV XI,T1
ABS T1   * make sure it's positive
MOV T1,T32
MPY T1,T32  * XI*XI - get it down to integer then add ZS
MOV T32B,T1  * least significant - we want two bits from this
SRL T1,14  * toss the rest
SLA T32,2  * prepare the most significant
SOC T32,T1  * and merge it in
A ZS,T1   * add ZS (we're an integer so can just add)
BL @SQRT  * T1 in as positive int, T1 OUT as 9.7
MOV XF,T32  * prepare to mult - we know these values are positive
MPY T1,T32  * do it
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,XT
* 210 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
MOV XT,T1  * prepare for first sine
BL @SINE  * convert T1 to SIN(T1) (both In and Out are 9.7)
MOV T1,T32  * save scratch - we know mults are coming
MOV XT,T1  * prepare for second
A XT,T1   * simpler than MPY by 3, no need to shift result
A XT,T1
BL @SINE  * as above
CLR Y1   * temp flag for negative
A T1,T32  * add together - now this MAY be negative!
JGT NOTNEG1
JEQ NOTNEG1
SETO Y1   * it is negative, so remember and make positive
ABS T32
NOTNEG1
LI T1,>0033  * roughly 0.4 (actually 0.398)
MPY T1,T32  * does the multiply - you know the drill, fix up number
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,T1  * from analysis, we know this value will be less than 1

LI T32,>1C00 * 56 x less than 1 will be less than 56, so it fits
MPY T1,T32

SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,YY

MOV Y1,Y1  * check if it should be negative
JEQ NOTNEG2
NEG YY   * yes, it should
NOTNEG2
* 220 X1=XI*0.75+ZI+128:Y1=90-YY+ZI
CLR Y1   * temp negative flag
MOV XI,X1
JGT NOTNEG3
JEQ NOTNEG3
SETO Y1   * it is negative, remember
ABS X1
NOTNEG3
LI T32,>0060 * 0.75
MPY X1,T32
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,X1
MOV Y1,Y1  * check for negative
JEQ NOTNEG4
NEG X1   * it is negative, fix it
NOTNEG4
A ZI,X1
SRA X1,7  * make integer for the plot function (sign extend!)

MOV YY,Y1
A ZI,Y1
SRA Y1,7  * make integer for the plot function
AI Y1,-90
* 225 IF ROWS(X1)<=Y1 THEN 250
CB @ROWS(X1),Y1
JLE L250

* 230 ROWS(X1)=Y1
MOVB Y1,@ROWS(X1)
* 240 TRAP 250:COLOR 1:PLOT X1,Y1
* NOTE: PLOT EXPECTS THE PIXEL IN REGISTERS X1,Y1
BL @PLOT
* 250 NEXT XI
L250
A ONE,XI  * THIS IS WHY I STORED 'ONE' IN A REGISTER
JMP L190
* 255 NEXT ZI
L255
S ONE,ZI  * WORKS HERE TOO!
JMP L160

* 260 GOTO 260
L260
LIMI 2
LIMI 0
JMP L260

* VDP access
* Write single byte to R0 from MSB R1
* Destroys R0 (actually just oRs it)
VSBW
ORI R0,>4000
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
MOVB R1,@>8C00
B *R11
* Write R2 bytes from R1 to VDP R0
* Destroys R0,R1,R2
VDPFILL
ORI R0,>4000
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
VMBWLP
MOVB R1,@>8C00
DEC R2
JNE VMBWLP
B *R11

VDPWA
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
B *R11

* Read single byte at R0 into MSB R1
VSBR
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
MOVB @>8800,R1
B *R11
* load regs list to VDP address, end on >0000 and write >D0 (for sprites)
* address of table in R1 (destroyed)
MOV *R1+,R0
JEQ LDRDN
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
LDRDN
LI R1,>D000
MOVB R1,@>8C00
B *R11
* Setup for normal bitmap mode
* returns with video off - set VDP R1 to E2 to enable (>81E2)
BITMAP
MOV R11,@SAVE
* set display and disable sprites
LI R1,BMREGS

* set up SIT - We load the standard 0-255, 3 times
LI R0,>5800
BL @VDPWA
CLR R2
NQ#
CLR R1
LP#
MOVB R1,@>8C00
AI R1,>0100
CI R1,>0000
JNE LP#
INC R2
CI R2,3
JNE NQ#

MOV @SAVE,R11
B *R11
* IN AND OUT IN T1
* T1 in = integer
* T1 out = 9.7 signed fixed point
* Uses T2,X1,Y1,T32
* http://samples.sainsburysebooks.co.uk/9781483296692_sample_809121.pdf
* modified a bit - we pretend the input is a 16.8 value (the
* entire fractional part will be 0), that let's us get out a
* 8.8 value, because the algorithm needs an even number of fractional
* bits. Then we just shift once to get .7
SQRT
CLR X1    root
CLR T2    remHi (t1 is remLo)
LI Y1,16   count = ((WORD/2-1)+(FRACBITS>>1)) -> 11+4, +1 for loop
SQRT1
SLA T2,2   remHi = (remHi << 2) | (remLo >> 14);
MOV T1,T32
SRL T32,14
SOC T32,T2
SLA T1,2   remLo <<= 2;
SLA X1,1   root <<= 1;
MOV X1,T32   testDiv = (root << 1) + 1;
SLA T32,1
INC T32
C T2,T32   if (remHi >= testDiv) {
JL SQRT2
S T32,T2   remHi -= testDiv;
INC X1    root += 1;
SQRT2
DEC Y1    while (--count != 0);
JNE SQRT1

MOV X1,T1   return( root);
SRL T1,1   Get it down to x.7 fixed point
B *R11
* IN AND OUT IN T1, uses T32
* BOTH DIRECTIONS are signed 9.7 fixed point
* T1 in is in RADIANS.
* we have 256 entries representing a circle,
* so we need to convert radians to units.
* the /rough/ conversion ratio is to multiply
* by 40.5845. (2PI radians = 255 units).
* our table starts at zero, so we normalize
* that too. The nice thing about a 256 entry
* table is we just need to extract a byte for
* the index, it'll loop nicely.
SINE
MOV T1,T1   a loop is probably not most efficient way to do this..
JGT SINE1   branch if no longer negative
JEQ SINE1   zero is okay too
SINELP
AI T1,>0324   two * PI is roughly this (6.28125)
JLT SINELP   not there yet, hopefully not too far!
SINE1
LI T32,>144A  roughly 40.5845 (40.578125)
MPY T1,T32   convert to units - we want an int, but to round we first want 15.1
SRL T32B,13   make room, throwing away 13 fractional bits
SLA T32,3   get the more significant bits into the right place
SOC T32,T32B  merge the two 16-bit words
INC T32B   add 0.5 to the 15.1 number for rounding's sake
ANDI T32B,>01FE  SHOULD be okay, but doesn't cost much to be safe (make word too)
MOV @SINTAB(T32B),T1 get result from table
B *R11
* INPUT X1,Y1 - kills T1,T2 as well
PLOT
* use the E/A routine for address
MOV  Y1,T1        R1 is the Y value.
SLA  T1,5
SOC  Y1,T1
ANDI T1,>FF07
MOV  X1,T2        R0 is the X value.
ANDI T2,7
A    X1,T1        T1 is the byte offset.
S    T2,T1        T2 is the bit offset.

* now to the VDP functions, which will screw up X1,Y1
MOV R11,@SAVE
MOV T1,R0
BL @VSBR
SOCB @BITS(T2),R1
BL @VSBW
MOV @SAVE,R11
B *R11

TEST
* TEST THE SINE AND PLOT FUNCTIONS
LWPI >8300
* 140 GRAPHICS 8+16:SETCOLOR 2,0,0
BL @BITMAP
* erase the pattern table
CLR R0
CLR R1
LI R2,>1800
BL @VDPFILL
* set the color table to white on black
LI R0,>2000
LI R1,>F100
LI R2,>1800
BL @VDPFILL

* LOAD ONE WITH THE FIXED POINT VALUE OF 1 (>0080)
CLR XT   * X coordinate
LPLP
MOV XT,T1
SLA T1,1
MOV XT,T32
LI T1,3   * 9.7 fraction
MPY T1,T32  * 25.7 here because of the integer x
MOV T32B,T1  * so this is 9.7
BL @SINE
MOV T1,T32
LI T1,96
MPY T1,T32  * 25.7 format now! get 9.7 is all we care about and works if that's all we use
SRA T32B,7  * deletes ALL the fraction now - don't merge anything
MOV T32B,Y1
AI Y1,96
MOV XT,X1
BL @PLOT
INC XT
CI XT,256
JNE LPLP
b @L260

SLAST
END
```

##### Share on other sites

I got 98 errors after I tried to compile it. Were there any options I needed to set? Anyway I'll have to play later, I have to get ready and leave.

##### Share on other sites

Here's a 512K cart animation for playback in Classic99.

I certify that everything was done on a TI-99/4A. The picture shows the rendering in action...

This is truly amazing and shows just what the humble TI is capable of. It is the most convincing 3D image I have ever seen on an early micro; you would almost swear that the rendered image was sitting on top of some bits and pieces that had been acting as a support for a real hat!

##### Share on other sites

Have I already mentioned that I love my Geneve - more than ever before after this challenge?

Edited by mizapf

##### Share on other sites

Ohm, I don't know why it didn't build for you... built with Asm994A but there shouldn't be any weird assumptions. It would help to know what and which line some of the errors were on.

BUT, anyway, here is the working version. Had some silly math errors in my code, but happily the sine and square root functions work well. (People are free to borrow the square root if they like; the sine table is probably not so helpful to other tasks.) Runtime with NO overdrive is roughly 64s.

Video, too:

http://youtu.be/SQS8NWYXLfo

You can certainly see the rounding errors, but it draws very quickly (no overdrive, see video). Again, Senior Falcon's optimizations make a big difference, and I'm sure they'd help the Atari a lot too. (Actually, I have an A8 emulator, so I tried it. It was still pretty slow, but running in high speed gave a possible time of a bit over 2 hours. Line draws are probably not too slow on the A800!)

It's a bit rough in places, you can see the rounding errors if you look (and there's an odd notch on the top of the hat), but it did the right thing. Not bad for 16-bit numbers!

```
 DEF START,TEST
* THIS VERSION USES SENIORFALCON'S OPTIMIZATIONS
* FOR EVEN MORE SPEED. WHEE!
* I'VE REFLECTED THOSE CHANGES IN THE BASIC LISTING
* THANKS TO SOMETIMES99ER FOR WORKING OUT THE DATA!
* I'VE OPTED TO USE THE FIXED POINT FOR ALL VARIABLES
* EXCEPT ZS, FOR CONSISTENCY
* LABELS FOR SAVE UTILITY
SFIRST
B @START

* array for highest pixel
ROWS
BSS 256

* bits for pixel
BITS
DATA >8040,>2010,>0804,>0201

* SINE TABLE - 9.7 fixed point entries, 256 total
SINTAB
DATA 0,3,6,9,13,16,19,22
DATA 25,28,31,34,37,40,43,46
DATA 49,52,55,58,60,63,66,68
DATA 71,74,76,79,81,84,86,88
DATA 91,93,95,97,99,101,103,105
DATA 106,108,110,111,113,114,116,117
DATA 118,119,121,122,122,123,124,125
DATA 126,126,127,127,127,127,127,127
DATA 127,127,127,127,127,127,127,126
DATA 126,125,124,123,122,122,121,119
DATA 118,117,116,114,113,111,110,108
DATA 106,105,103,101,99,97,95,93
DATA 91,88,86,84,81,79,76,74
DATA 71,68,66,63,60,58,55,52
DATA 49,46,43,40,37,34,31,28
DATA 25,22,19,16,13,9,6,3
DATA 0,-3,-6,-9,-13,-16,-19,-22
DATA -25,-28,-31,-34,-37,-40,-43,-46
DATA -49,-52,-55,-58,-60,-63,-66,-68
DATA -71,-74,-76,-79,-81,-84,-86,-88
DATA -91,-93,-95,-97,-99,-101,-103,-105
DATA -106,-108,-110,-111,-113,-114,-116,-117
DATA -118,-119,-121,-122,-122,-123,-124,-125
DATA -126,-126,-127,-127,-127,-128,-128,-128
DATA -128,-128,-128,-128,-127,-127,-127,-126
DATA -126,-125,-124,-123,-122,-122,-121,-119
DATA -118,-117,-116,-114,-113,-111,-110,-108
DATA -106,-105,-103,-101,-99,-97,-95,-93
DATA -91,-88,-86,-84,-81,-79,-76,-74
DATA -71,-68,-66,-63,-60,-58,-55,-52
DATA -49,-46,-43,-40,-37,-34,-31,-28
DATA -25,-22,-19,-16,-13,-9,-6,-3

* note: NOT in memory, so don't use @XF
* 9.7 signed fixed point variables in registers
XF EQU 15
XT EQU 14
YY EQU 13
* INTEGER VALUES
ZS EQU 12
* RET EQU 11 - for BL
ZI EQU 10
XL EQU 9
XI EQU 8
* 32-bit temp, uses 6 and 7
T32B EQU 7
T32 EQU 6
T16 EQU 5
* Temp vars
T1 EQU 4
T2 EQU 3
NEGFL EQU 2
* PIXEL VARIABLES
X1 EQU 1
Y1 EQU 0
* return save
SAVE
BSS 2

* registers for bitmap (and >5B00 is the write address of the sprite table at >1B00)
* background is transparent (the only color never redefined)
* PDT - >0000
* SIT - >1800
* SDT - >1800
* CT  - >2000
* SAL - >1B00
BMREGS DATA >81E0,>8002,>8206,>83ff,>8403,>8536,>8603,>8700,>5B00,>0000

START
LWPI >8300
* FILL THE 256-ENTRY ROWS ARRAY WITH THE VALUE 193
LI R0,ROWS
LI R1,193*256
LI R2,256
LP1
MOVB R1,*R0+
DEC R2
JNE LP1
* 140 GRAPHICS 8+16:SETCOLOR 2,0,0
BL @BITMAP
* erase the pattern table
CLR R0
CLR R1
LI R2,>1800
BL @VDPFILL
* set the color table to white on black
LI R0,>2000
LI R1,>F100
LI R2,>1800
BL @VDPFILL

* 150 XP=144:XR=4.71238905:XF=XR/XP
* This ratio is precalculated to be 0.032725
* to get 9.7 fixed point, we can just multiply
* by 1.0 in fixed point, which is 000000001 0000000,
* or >0080, which is 128. The result in decimal
* is 4.18879 (but we can't keep the fraction, so
* we get >0004. To test we got it right, we can
* convert back, 4/128 = 0.03125, which is as close as
* we can get.) When fractions are involved, the multiply
* method for constants may be easier than shifting
LI XF,>0004
* 160 FOR ZI=64 TO -64 STEP -1
* Making this an integer! (it was 9.7 fixed point in the earlier version)
LI ZI,64
L160
CI ZI,-64
JLT L260

* 170 ZT=ZI*2.25:ZS=ZT*ZT
* We have to do two multiplies here, so we're going
* to end up in a 32-bit value temporarily anyway. That
* actually makes life a little easier.
* 2.25 * 128 = 288, WHICH IS >120
* note: ZT not used
LI T32,>0120
MOV ZI,T1
ABS T1   * this is okay, because we are going to square it anyway
MPY T1,T32

* now T32 is 32-bits wide, and contains a 25.7 number.
* ZI(16.0) times T32 (9.7) yields 25.7 bits.
* So since we want a 9.7, we just have to take the least
* significant word, no shifting needed! Of course we ignore
* the possibility of overflow, but the largest value should
* be 64*2.25 = 144, which fits in 9 bits.
* now just put them into place, and multiply again
* we know from analysis that the 'sign bit' shouldn't be set here
MOV T32B,T32
MOV T32B,T1
MPY T1,T32

* So, T32 now contains a 32-bit 18.14 number, but for simplicity we
* are going to move that down into ZS as a 16-bit unsigned integer
* so we just need to extract 16 bits of integer, as we don't expect overflow
* and don't want fraction. Of course, those 16 bits are split across the
* two words...
MOV T32B,ZS  * least significant - we want two bits from this
SRL ZS,14  * toss the rest
SLA T32,2  * prepare the most significant
SOC T32,ZS  * and merge it in

* 180 XL=INT(SQR(20736-ZS)+0.5)
* ZS is a normal int, so this shouldn't be too bad to start
* the result is also an int, and the +0.5 is just for rounding
* our sqrt will return one of our fractional values, as noted,
* to be consistent.
LI T1,20736
S ZS,T1
BL @SQRT  * T1 IN AS positive INT, T1 OUT AS 9.7
AI T1,>40  * 0.5 is 1/2 of 128, so this adds .5 - when we truncate, we get rounding.
SRL T1,7  * make an integer for counting
MOV T1,XL  * and store it
* 190 FOR XI=0-XL TO XL
CLR XI
S XL,XI
L190
C XI,XL
JGT L255
* 200 XT=SQR(XI*XI+ZS)*XF
* pretty similar to above, again we are squaring to get positive
* so that makes the unsigned MPY easier to deal with
MOV XI,T1  * Integer
ABS T1   * make sure it's positive
MOV T1,T32
MPY T1,T32  * XI*XI - 16.0 * 16.0 = 32.0, so just take the LSW
MOV T32B,T1  * least significant - still 16.0
A ZS,T1   * add ZS (we're an integer so can just add - max is 41472, so unsigned!)
BL @SQRT  * T1 in as positive int, T1 OUT as 9.7
MOV XF,T32  * prepare to mult - we know these values are positive
MPY T1,T32  * do it - 9.7*9.7 = 18.14
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,XT
* 210 YY=(SIN(XT)+SIN(XT*3)*0.4)*55 -- was 56, needed to adjust for rounding errors
* order of op, we do SIN(XT*3)*0.4 first...
MOV XT,T1  * prepare for second sine
A XT,T1   * simpler than MPY by 3, no need to shift result
A XT,T1
BL @SINE  * as above (sin uses T32 & T32B!)
CLR NEGFL  * temp flag for negative
MOV T1,T32  * prepare for mult and test
JGT NOTNEG1
JEQ NOTNEG1
SETO NEGFL  * it is negative, so remember and make positive
ABS T32
NOTNEG1
LI T1,>0033  * roughly 0.4 (actually 0.398)
MPY T1,T32  * does the multiply - you know the drill, fix up number
SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,T16 * from analysis, we know this value will be less than 1, save it
MOV NEGFL,NEGFL * check if it should be negative
JEQ NOTNEG2
NEG T16   * yes, it should
NOTNEG2

MOV XT,T1  * prepare for first sine
BL @SINE  * convert T1 to SIN(T1) (both In and Out are 9.7)
CLR NEGFL  * temp flag for negative
A T1,T16  * add it in to the stored result
JGT NOTNEG3
JEQ NOTNEG3
SETO NEGFL  * it is negative, so remember
ABS T16
NOTNEG3

LI T32,>1B80 * 55 x less than 1 will be less than 55, so it fits
MPY T16,T32

SRL T32B,7  * make room, throwing away 7 fractional bits
SLA T32,9  * get the more significant bits into the right place
SOC T32,T32B * merge the two 16-bit words
MOV T32B,YY

MOV NEGFL,NEGFL * check if it should be negative
JEQ NOTNEG4
NEG YY   * yes, it should
NOTNEG4
* 220 X1=XI*0.75+ZI+128:Y1=90-YY+ZI
CLR NEGFL  * temp negative flag
MOV XI,X1  * integer
JGT NOTNEG5
JEQ NOTNEG5
SETO NEGFL  * it is negative, remember
ABS X1
NOTNEG5
LI T32,>0060 * 0.75
MPY X1,T32  * now 25.7, so just take the LSW
MOV T32B,X1  * now 9.7
MOV NEGFL,NEGFL * check for negative
JEQ NOTNEG6
NEG X1   * it is negative, fix it
NOTNEG6
SRA X1,7  * make integer for the plot function (sign extend!)
A ZI,X1   * add (integer) ZI
LI Y1,>2D00  * 90 in 9.7
S YY,Y1
SRA Y1,7  * make integer for the plot function
A ZI,Y1   * add (integer) ZI
* 225 IF ROWS(X1)<=Y1 THEN 250
SWPB Y1   * stupid Big Endian....
CB @ROWS(X1),Y1
JLE L250

* 230 ROWS(X1)=Y1
MOVB Y1,@ROWS(X1)
SWPB Y1
* 240 TRAP 250:COLOR 1:PLOT X1,Y1
* NOTE: PLOT EXPECTS THE PIXEL IN REGISTERS X1,Y1
BL @PLOT
* 250 NEXT XI
L250
INC XI
JMP L190
* 255 NEXT ZI
L255
DEC ZI
JMP L160

* 260 GOTO 260
L260
LIMI 2
LIMI 0
JMP L260

* VDP access
* Write single byte to R0 from MSB R1
* Destroys R0 (actually just oRs it)
VSBW
ORI R0,>4000
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
MOVB R1,@>8C00
B *R11
* Write R2 bytes from R1 to VDP R0
* Destroys R0,R1,R2
VDPFILL
ORI R0,>4000
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
VMBWLP
MOVB R1,@>8C00
DEC R2
JNE VMBWLP
B *R11

VDPWA
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
B *R11

* Read single byte at R0 into MSB R1
VSBR
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
MOVB @>8800,R1
B *R11
* load regs list to VDP address, end on >0000 and write >D0 (for sprites)
* address of table in R1 (destroyed)
MOV *R1+,R0
JEQ LDRDN
SWPB R0
MOVB R0,@>8C02
SWPB R0
MOVB R0,@>8C02
LDRDN
LI R1,>D000
MOVB R1,@>8C00
B *R11
* Setup for normal bitmap mode
BITMAP
MOV R11,@SAVE
* set display and disable sprites
LI R1,BMREGS

* set up SIT - We load the standard 0-255, 3 times
LI R0,>5800
BL @VDPWA
CLR R2
NQ#
CLR R1
LP#
MOVB R1,@>8C00
AI R1,>0100
CI R1,>0000
JNE LP#
INC R2
CI R2,3
JNE NQ#

MOV @SAVE,R11
B *R11
* IN AND OUT IN T1
* T1 in = integer
* T1 out = 9.7 signed fixed point
* Uses T2,X1,Y1,T32
* http://samples.sainsburysebooks.co.uk/9781483296692_sample_809121.pdf
* modified a bit - we pretend the input is a 16.8 value (the
* entire fractional part will be 0), that lets us get out an
* 8.8 value, because the algorithm needs an even number of fractional
* bits. Then we just shift once to get .7
SQRT
CLR X1    root
CLR T2    remHi (t1 is remLo)
LI Y1,16   count = ((WORD/2-1)+(FRACBITS>>1)) -> 11+4, +1 for loop
SQRT1
SLA T2,2   remHi = (remHi << 2) | (remLo >> 14);
MOV T1,T32
SRL T32,14
SOC T32,T2
SLA T1,2   remLo <<= 2;
SLA X1,1   root <<= 1;
MOV X1,T32   testDiv = (root << 1) + 1;
SLA T32,1
INC T32
C T2,T32   if (remHi >= testDiv) {
JL SQRT2
S T32,T2   remHi -= testDiv;
INC X1    root += 1;
SQRT2
DEC Y1    while (--count != 0);
JNE SQRT1

MOV X1,T1   return( root);
SRL T1,1   Get it down to x.7 fixed point
B *R11
* IN AND OUT IN T1, uses T32
* BOTH DIRECTIONS are signed 9.7 fixed point
* T1 in is in RADIANS.
* we have 256 entries representing a circle,
* so we need to convert radians to units.
* the /rough/ conversion ratio is to multiply
* by 40.5845. (2PI radians = 255 units).
* our table starts at zero, so we normalize
* that too. The nice thing about a 256 entry
* table is we just need to extract a byte for
* the index, it'll loop nicely.
SINE
MOV T1,T1   a loop is probably not most efficient way to do this..
JGT SINE1   branch if no longer negative
JEQ SINE1   zero is okay too
SINELP
AI T1,>0324   two * PI is roughly this (6.28125)
JLT SINELP   not there yet, hopefully not too far!
SINE1
LI T32,>144A  roughly 40.5845 (40.578125)
MPY T1,T32   convert to units - we want an int, but to round we first want 15.1
SRL T32B,13   make room, throwing away 13 fractional bits
SLA T32,3   get the more significant bits into the right place
SOC T32,T32B  merge the two 16-bit words
INC T32B   add 0.5 to the 15.1 number for rounding's sake
ANDI T32B,>01FE  SHOULD be okay, but doesn't cost much to be safe (make word too)
MOV @SINTAB(T32B),T1 get result from table
B *R11
* INPUT X1,Y1 - kills T1,T2 as well
PLOT
* use the E/A routine for address
MOV  Y1,T1        R1 is the Y value.
SLA  T1,5
SOC  Y1,T1
ANDI T1,>FF07
MOV  X1,T2        R0 is the X value.
ANDI T2,7
A    X1,T1        T1 is the byte offset.
S    T2,T1        T2 is the bit offset.

* now to the VDP functions, which will screw up X1,Y1
MOV R11,@SAVE
MOV T1,R0
BL @VSBR
SOCB @BITS(T2),R1
BL @VSBW
MOV @SAVE,R11
B *R11

TEST
* TEST THE SINE AND PLOT FUNCTIONS
LWPI >8300
* 140 GRAPHICS 8+16:SETCOLOR 2,0,0
BL @BITMAP
* erase the pattern table
CLR R0
CLR R1
LI R2,>1800
BL @VDPFILL
* set the color table to white on black
LI R0,>2000
LI R1,>F100
LI R2,>1800
BL @VDPFILL
* draw a sine wave
CLR XT   * X coordinate
LPLP
MOV XT,T1
SLA T1,1
MOV XT,T32
LI T1,3   * 9.7 fraction (pixels to radians)
MPY T1,T32  * 25.7 here because of the integer x
MOV T32B,T1  * so this is 9.7
BL @SINE
MOV T1,T32
LI T1,96
MPY T1,T32  * 25.7 format now! get 9.7 is all we care about and works if that's all we use
SRA T32B,7  * deletes ALL the fraction now - don't merge anything
MOV T32B,Y1
AI Y1,96
MOV XT,X1
BL @PLOT
INC XT
CI XT,256
JNE LPLP
B @L260

SLAST
END
```

I took this one on because I thought it would make a good example of fixed point, but it doesn't. I had to switch between integers and fixed point to keep parts of the math simple, and dealing with negative numbers and MPY results larger than 16 bits made those parts of the code somewhat complex. Sorry about that.
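For anyone following along, the multiply-and-fix-up pattern that repeats through the listing (MPY, then SRL 7 / SLA 9 / SOC) can be sketched in Python roughly like this. It is only an illustration, not a translation of the program: `FRAC_BITS`, `to_fx` and `fx_mul` are names made up here, and the sign handling copies the NEGFL idea of multiplying magnitudes and negating afterwards.

```python
FRAC_BITS = 7            # 9.7 format: 7 fractional bits
ONE = 1 << FRAC_BITS     # 128 represents 1.0


def to_fx(x: float) -> int:
    """Convert a float to 9.7 fixed point, rounding to nearest."""
    return int(round(x * ONE))


def fx_mul(a: int, b: int) -> int:
    """Multiply two signed 9.7 values and return a 9.7 result.

    MPY on the TMS9900 is unsigned, so this works on magnitudes
    and restores the sign afterwards, as the listing does with NEGFL.
    """
    negative = (a < 0) != (b < 0)
    product = abs(a) * abs(b)                  # 18.14 intermediate
    # SRL low word by 7, SLA high word by 9, SOC to merge:
    # together that keeps 16 bits of (product >> 7).
    result = (product >> FRAC_BITS) & 0xFFFF
    return -result if negative else result


print(fx_mul(to_fx(55), to_fx(0.5)) / ONE)     # 27.5
```

The mask to 16 bits is there to show what the merge keeps. Anything that overflows it is silently lost, which is why the listing's comments keep arguing that the values stay small enough.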

##### Share on other sites

Very nice. I might borrow your SQRT routine for another project.
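For reference, the shift-and-subtract loop in that SQRT routine can be sketched in Python as below. This is a reading of the listing's comments, not a tested port: `sqrt_fx` is a name invented here, and Python's unbounded integers mean the sketch does not model any overflow of the 16-bit registers on real hardware.

```python
def sqrt_fx(n: int) -> int:
    """Square root of a 16-bit unsigned integer, returned as x.7 fixed point."""
    root = 0
    rem_hi = 0
    rem_lo = n & 0xFFFF
    for _ in range(16):                        # count = 16 in the listing
        # remHi = (remHi << 2) | (remLo >> 14)
        rem_hi = (rem_hi << 2) | (rem_lo >> 14)
        rem_lo = (rem_lo << 2) & 0xFFFF        # remLo <<= 2
        root <<= 1
        test_div = (root << 1) + 1
        if rem_hi >= test_div:
            rem_hi -= test_div
            root += 1
    return root >> 1                           # x.8 down to x.7


print(sqrt_fx(20736) / 128)                    # 144.0
```

Each pass consumes two bits of the input and produces one bit of the root, so 16 passes over a 16-bit integer leave 8 fractional bits before the final shift.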

Any point in trying to convert this to run on the F18A GPU? I guess it would finish in 2-3 secs.

##### Share on other sites

(Actually, I have an A8 emulator, so I tried it. It was still pretty slow, but running in high speed gave a possible time of a bit over 2 hours - line draws are probably not too slow on the A800!)

I don't know if you have the latest beta version of the Altirra emulator for the Atari, but it has an optimized version of Basic available and the option to use improved floating point routines. I'd be interested to see what the Atari could do with your optimized code with those 2 options. Could you post the updated code or try those emulation options yourself?

Thanks,

Bob

##### Share on other sites

I won't be pulling down any new Atari emulators here... but this should be it for the A8. I spaced out the lines with changes so you can see how little difference there is.

```
10 DIM RR[320]
20 FOR I=0 TO 320:RR[I]=193:NEXT I

140 GRAPHICS 8+16:SETCOLOR 2,0,0
150 XP=144:XR=4.71238905:XF=XR/XP

160 FOR ZI=64 TO -64 STEP -1

170 ZT=ZI*2.25:ZS=ZT*ZT
180 XL=INT(SQR(20736-ZS)+0.5)
190 FOR XI=0-XL TO XL
200 XT=SQR(XI*XI+ZS)*XF
210 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
220 X1=XI+ZI+160:Y1=90-YY+ZI

223 IF RR(X1)<=Y1 THEN 250
226 RR(X1)=Y1

230 TRAP 250:COLOR 1:PLOT X1,Y1

240 REM COLOR 0:PLOT X1,Y1+1:DRAWTO X1,191

250 NEXT XI:NEXT ZI
260 GOTO 260
```
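The changed lines (10, 20, 223 and 226) amount to a per-column horizon test: drawing front to back, a pixel is plotted only if it sits higher on the screen than anything already drawn in that column. A rough Python sketch of just that test follows. `plot_front_to_back` and its arguments are hypothetical, the points are assumed to be integer screen coordinates already, and the range check merely stands in for the TRAP.

```python
def plot_front_to_back(points, width=320, floor=193):
    """Return the (x, y) points that survive the per-column horizon test.

    points must be supplied front to back; smaller y is higher on screen.
    """
    horizon = [floor] * (width + 1)    # like RR(0..320) initialised to 193
    plotted = []
    for x, y in points:
        if not 0 <= x <= width:
            continue                   # off-screen, skip (stand-in for TRAP 250)
        if horizon[x] <= y:
            continue                   # line 223: at or below the horizon, hidden
        horizon[x] = y                 # line 226: new highest pixel in this column
        plotted.append((x, y))
    return plotted


print(plot_front_to_back([(10, 100), (10, 120), (10, 90), (11, 150)]))
# [(10, 100), (10, 90), (11, 150)]
```

Because hidden pixels are never drawn, nothing has to be erased afterwards, which is why the DRAWTO in line 240 can be turned into a REM.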

Edited by Tursi

##### Share on other sites

Very nice. I might borrow your SQRT routine for another project.

Any point in trying to convert this to run on the F18A GPU? I guess it would finish in 2-3 secs.

Seems like it would, but without an F18A here I don't intend to do it. The total program is just slightly over 1k, so it would fit in its entirety.

##### Share on other sites

I actually used this 'graph' as a benchmark for various configurations of TMS99xx processors running a port of the Powertran Cortex BASIC. Quickest was 1 minute 17 seconds. [OK, OK, I suspect yours is a little more detailed. And it's not symmetrical so you have to calculate both the left and right sides.]

Details at [http://www.avjd51.dsl.pipex.com/tms99110_breadboard/tms99110_breadboard.htm] and scroll right to the bottom of the page.

A copy of the program for the BBC Micro here: [http://41j.com/blog/wp-content/uploads/2012/03/beebug_3dobj.jpg]. Slightly different parameters for the graph again.

I would love to see your Powertran Cortex Basic for the TI-99/4A in cartridge format. The link for the Cortex BASIC user guide seems broken !?

Your "hat" inspired me to do it in Flash. Wonder how the depth is managed within the code ...

http://sometimes.planet-99.net/pic/hat3.swf

##### Share on other sites

I just tried the optimized Atari code using the faster Basic and floating point available in Altirra and it clocked in at an impressive 6.5 minutes. How come I don't see the defects in the Atari output like in the TI example? FYI, lines 10 and 20 of your example have the wrong type of brackets, but it was a quick fix: "[ ]" should be "( )".

Below is the new output, I didn't do any real analysis, but it looks pretty close to the same to me.

Bob

Edited by bfollett

##### Share on other sites

I would love to see your Powertran Cortex Basic for the TI-99/4A in cartridge format. The link for the Cortex BASIC user guide seems broken !?

Have repaired the link to the Cortex user guide. Thanks for pointing it out.

Stuart.

##### Share on other sites

Cortex Basic is awesome. I just added it to the Software/Apps menu in http://js99er.net/

##### Share on other sites

How come I don't see the defects in the Atari output like in the TI example?

I'm guessing that is because the Atari program is written for 320 columns of pixels and steps one pixel column at a time. The TI programs (mine included) keep that loop and instead multiply the column by .75, which means the program tries to display four pixels in every three screen columns, with the collisions determined by how the column number is rounded off. A proper fix would modify the program earlier and eliminate the need to multiply by .75.
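To make the four-into-three effect concrete, here is a small Python sketch. The function names are made up for illustration, it only looks at non-negative columns, and it assumes the scaled column is truncated to an integer, which may not match exactly how each TI version rounds.

```python
from collections import Counter


def scaled_columns(source_columns, scale=0.75):
    """Map each source column to a screen column by scaling and truncating."""
    return [int(x * scale) for x in source_columns]


def shared_columns(source_columns, scale=0.75):
    """Return the screen columns that receive more than one source column."""
    counts = Counter(scaled_columns(source_columns, scale))
    return sorted(col for col, n in counts.items() if n > 1)


print(scaled_columns(range(8)))    # [0, 0, 1, 2, 3, 3, 4, 5]
print(shared_columns(range(8)))    # [0, 3]
```

Under that assumption every third screen column gets two different source columns, each with its own height, so those columns can pick up pixels the 320-wide Atari version spreads over separate columns.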

##### Share on other sites

The version of the Powertran Cortex BASIC manual that I reconstructed a few years ago is also available on the Powertran Cortex website from Dave Hunter. It is a really great Cortex resource site. http://www.powertrancortex.com/documentation.html
