Bitmap mode.

bfollett · February 28, 2015

I'm guessing that is because the Atari program is written for 320 columns of pixels and steps one pixel column at a time. The TI programs (mine included) don't change the program but instead multiply the column by .75 which means the program will try to display 4 pixels every three columns, determined by how the column number is rounded off. A proper fix would modify the program earlier and eliminate the need to multiply by .75.

That could be it. If that change is made the program would probably run a bit faster not having to do an extra multiplication on each pixel plot.
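To make the "4 pixels every three columns" point concrete, here is a minimal Python sketch. It assumes the scaled column is truncated to an integer; the thread does not say how the TI programs actually round, and `scaled_columns` is a name made up for this illustration.

```python
from collections import Counter

def scaled_columns(width=320, factor=0.75):
    """Map each source column to a destination column (assumed truncation)."""
    return [int(x * factor) for x in range(width)]

cols = scaled_columns()
hits = Counter(cols)

distinct = len(hits)                                  # destination columns used
doubled = sum(1 for n in hits.values() if n == 2)     # columns hit by two sources

print(distinct, doubled)   # 240 80
print(cols[:8])            # [0, 0, 1, 2, 3, 3, 4, 5]
```

Under that assumption every group of four source columns lands on three destination columns, so one column in each group of three is plotted twice.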

Bob

senior_falcon · February 28, 2015

Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!
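A rough way to check the "3/4 the number of pixels" figure is to count inner-loop passes. The Python sketch below assumes the un-optimised loop shape (ZI from -64 to 64, XI from -XL to XL) and assumes that only the x range is pre-scaled by 0.75; `inner_iterations` is a hypothetical helper, not code from any of the ports.

```python
import math

def inner_iterations(radius, z_scale):
    """Count inner-loop passes for ZI = -64..64 and XI = -XL..XL."""
    total = 0
    for zi in range(-64, 65):
        zt = zi * z_scale
        zs = zt * zt
        xl = int(math.sqrt(radius * radius - zs) + 0.5)
        total += 2 * xl + 1
    return total

full = inner_iterations(144, 2.25)            # original 320-wide constants
narrow = inner_iterations(108, 2.25 * 0.75)   # x range pre-scaled by 0.75
ratio = narrow / full
print(round(ratio, 2))
```

Each pass costs a square root and two sines, so the saving applies to the expensive part of the loop.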

Tursi · February 28, 2015

I just tried the optimized Atari code using the faster Basic and floating point available in Altirra, and it clocked in at an impressive 6.5 minutes. How come I don't see the defects in the Atari output like in the TI example? FYI, lines 10 and 20 of your example have the wrong type of brackets, but it was a quick fix: "[ ]" becomes "( )".

Below is the new output, I didn't do any real analysis, but it looks pretty close to the same to me.

Bob

Because you're using floating point and I'm not. The defects are not an artifact of the optimizations - those are spot on accurate (I also did a lot of testing on the Apple2 emulator, to try ideas out, and it's quite fast there unthrottled). Basically my TI assembly artifacts are because of compounded rounding errors -- even some of my steps are actually off, but just by a bit, so it's close.
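As a generic illustration of how a small rounding error compounds (this is not the assembly program, which is not shown in the thread), the Python sketch below stores the BASIC step XF with an assumed 8 fractional bits and adds it up 144 times. `FRAC_BITS` and `fixed_accumulate` are invented for the example.

```python
STEP = 4.71238905 / 144      # XF from the BASIC listings
FRAC_BITS = 8                # assumed precision, for illustration only

# The step stored as a whole number of 1/256ths.
step_fixed = round(STEP * (1 << FRAC_BITS))

def fixed_accumulate(n):
    """Add the quantised step n times and convert back to a real number."""
    return (step_fixed * n) / (1 << FRAC_BITS)

err_one = abs(fixed_accumulate(1) - STEP)          # error after one step
err_144 = abs(fixed_accumulate(144) - STEP * 144)  # error after 144 steps
print(step_fixed, err_one, err_144)
```

With these assumed numbers a single step is off by about 0.0015, but after 144 steps the total is off by about 0.21 radians, which is enough to move a plotted point visibly.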

Tursi · February 28, 2015

Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!

Yes... I tried a couple of optimizations after my post - I adjusted the scale (XF) to completely eliminate the need for the SINE function (was able to do a direct table lookup inline), and tweaked a couple of other constants to remove one of the signed multiplications completely. The runtime went from 64 seconds to 57 seconds -- which is a valid optimization, but since the joy of this project is watching it draw, and you don't really see that small a difference, I didn't bother with pushing any further. Dropping it into scratchpad would certainly help, but you'd have to be selective about what goes there since the program is a bit too big (main loop might fit). I think the next cool version is the F18A version, which hopefully someone will do for us and video. It's pretty much a straight port - load the code, change the VDP accesses to direct memory accesses, and it should just run. (That was all I did for the Mandelbrot one back in the day).
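The table-lookup idea can be sketched in Python as follows. The table length, the integer amplitude, and the names `SINE_TABLE` and `sin_lookup` are all assumptions for illustration; the actual table layout in the assembly version is not given in the thread.

```python
import math

TABLE_SIZE = 256    # assumed: one full period split into 256 steps
SCALE = 127         # assumed: integer amplitude of the stored sine

# Precompute the sine once, as scaled integers.
SINE_TABLE = [round(math.sin(2 * math.pi * i / TABLE_SIZE) * SCALE)
              for i in range(TABLE_SIZE)]

def sin_lookup(index):
    """Return the scaled sine for an angle given in table steps (wraps around)."""
    return SINE_TABLE[index % TABLE_SIZE]

print(sin_lookup(0), sin_lookup(64), sin_lookup(192))   # 0 127 -127
```

If the angle scale is chosen so that the loop already produces the angle in table steps, the lookup replaces the SIN call with a single indexed read.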

Asmusr · February 28, 2015

Well, alright then.

It actually looks very much the same in Classic99. ;-)

HAT.dsk

hat - source.txt

Omega-TI · February 28, 2015

Wow, just wow! I tried that on my R.I. box... I was a little suspicious of that, but sure enough.... CONFIRMED!

+mizapf · February 28, 2015

I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf.

+InsaneMultitasker · February 28, 2015

I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf.

The floats I have used in TIC were written as a "wrapper" around the Geneve OS math XOPs. I do not know if Clint Pulley, Mike Maksimik, or another individual wrote the functions. I can look for the file(s) if you are interested.

+mizapf · March 1, 2015

Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ... I already did the ABASIC version.

Tursi · March 1, 2015

Well, alright then.

It actually looks very much the same in Classic99.

Niiiiiice! Thanks, Rasmus!

I guess I didn't think Classic99 would do it right, but even if it did, I wouldn't have believed the results.

which will screw up X1,Y1 - not!

Hah! Made me laugh!

Edited March 1, 2015 by Tursi

+InsaneMultitasker · March 1, 2015

Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ... I already did the ABASIC version.

No thanks, my in-process projects have long since exceeded a reasonable queue depth. I need to pop a few things off the stack and then invest in a new internal stack handler.

bfollett · March 1, 2015

They've been working on optimizing this further in the Atari 8-bit forum. Someone could tweak this a little again for the 256-pixel TI display. The original 3-hour runtime is now down to around 20 minutes on the Atari using a faster Basic (Turbo Basic).

100 DIM RR(320)
110 FOR I=0 TO 320:RR(I)=193:NEXT I
130 XP=144:XR=4.71238905:XF=XR/XP
140 FOR ZI=64 TO -64 STEP -1
150 ZT=ZI*2.25:ZS=ZT*ZT
160 XL=INT(SQR(20736-ZS)+0.5)
170 ZX=ZI+160:ZY=90+ZI
180 FOR XI=0 TO XL
190 XT=SQR(XI*XI+ZS)*XF
200 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
210 X1=XI+ZX:Y1=ZY-YY
220 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
230 X1=ZX-XI
240 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
250 NEXT XI:NEXT ZI
260 GOTO 260
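For anyone who wants to follow the logic without an Atari, the listing above translates fairly directly to Python. This is only a sketch: `render_hat` and `horizon` are names made up here, PLOT is replaced by collecting coordinates in a list, and y values are left unrounded.

```python
import math

def render_hat(width=320, start=193):
    """Return the (x, y) points the BASIC listing would PLOT, in order."""
    horizon = [start] * (width + 1)     # RR(): highest point drawn per column
    pixels = []
    xf = 4.71238905 / 144               # XF = XR / XP
    for zi in range(64, -65, -1):       # front row first
        zt = zi * 2.25
        zs = zt * zt
        xl = int(math.sqrt(20736 - zs) + 0.5)
        zx = zi + 160
        zy = 90 + zi
        for xi in range(0, xl + 1):
            xt = math.sqrt(xi * xi + zs) * xf
            yy = (math.sin(xt) + math.sin(xt * 3) * 0.4) * 56
            y1 = zy - yy
            # The surface is symmetric in x, so one evaluation serves both sides.
            for x1 in (xi + zx, zx - xi):
                if horizon[x1] > y1:    # only draw above what is already there
                    horizon[x1] = y1
                    pixels.append((x1, y1))
    return pixels

points = render_hat()
print(len(points))
```

Two things account for most of the speed-up over the original: each sine pair is computed once and used for both the left and right half, and the RR array rejects hidden points without any erasing.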

Edited March 1, 2015 by bfollett

Stuart · March 1, 2015

Converted the first (un-optimised) listing to Cortex BASIC and ran it on a couple of systems. Results below.

-- TI-99/4A with a Cortex BASIC cartridge (running under Classic99, normal speed): 1 hour 10 minutes

-- TM990 system (TMS9900, 3 MHz clock): 48 minutes

-- Powertran Cortex (TMS9995, 3 MHz clock): 35 minutes

-- TMS99110 breadboard system (4 MHz clock): 12 minutes

Listing:

140 COLOUR 1,7: GRAPH
150 XP=144: XR=4.71238905: XF=XR/XP
160 FOR ZI=-64 TO 64
170 ZT=ZI*2.25: ZS=ZT*ZT
180 XL=INT[SQR[20736-ZS]+0.5]
190 FOR XI=0-XL TO XL
200 XT=SQR[XI*XI+ZS]*XF
210 YY=(SIN[XT]+SIN[XT*3]*0.4)*56
220 X1=XI+ZI+160: Y1=90-YY+ZI
235 PLOT X1*0.75,Y1*0.75
240 UNPLOT X1*0.75,Y1*0.75+1 TO X1*0.75,144
250 NEXT XI: NEXT ZI
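The listing above hides lines by drawing back to front and erasing everything below each new point, while the optimised Atari listing earlier in the thread draws front to back and keeps a per-column horizon in RR(). A small Python sketch on made-up integer points suggests the two approaches light the same pixels; `painter`, `horizon`, and the random test data are all invented for this check.

```python
import random

def painter(points_back_to_front, bottom=191):
    """Plot each point, then erase everything below it in the same column."""
    lit = set()
    for x, y in points_back_to_front:
        lit.add((x, y))
        lit -= {(x, yy) for yy in range(y + 1, bottom + 1)}
    return lit

def horizon(points_front_to_back, bottom=191):
    """Plot a point only if it is above everything drawn so far in its column."""
    top = {}
    lit = set()
    for x, y in points_front_to_back:
        if y < top.get(x, bottom + 1):
            top[x] = y
            lit.add((x, y))
    return lit

random.seed(1)
pts = [(random.randrange(16), random.randrange(192)) for _ in range(500)]
same = painter(pts) == horizon(list(reversed(pts)))
print(same)   # True
```

The horizon version never has to erase anything, which is part of why it is cheaper on machines where line drawing is slow.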

Stuart.

whicker · March 2, 2015

Whoa. Something called Cortex BASIC for the TI-99/4A? Something I've not yet come across.

So much new stuff to be found.

+Ksarul · March 2, 2015

It is a descendant of the BASIC TI used on their TI 990 computers. It was first ported over to the Powertran Cortex in the early Eighties and Stuart then ported the Cortex version over to the TI-99 a year or two ago.

+Lee Stewart · March 2, 2015

Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds:

BASE->R                         ( save current radix)
DECIMAL                         ( set current radix to decimal)
>F 4.71238905 >F 108 F/ FCONSTANT XF    ( create floating point [FP] constant XF)
0 CONSTANT ZS                           ( create integer constant ZS)
( define IDLE to loop until <break> key tapped)
: IDLE BEGIN ?TERMINAL UNTIL ;
( Archimedes Spiral Hat Plot)
: DO_HAT  
  GRAPHICS2                     ( set up bitmap mode)
  [ HEX ] 1 83D6 ! [ DECIMAL ]  ( set screen-blank counter to never blank)
  65 -64 DO                     ( z loop)
    I I * S->F                  ( square z index and make FP)
    >F 2.84765625 F*            ( FP multiply by factor to scale square of index to 108^2)
    F->S ' ZS !                 ( convert to integer and store new zs)
    11664 ZS - S->F SQR         ( 108^2 - zs; convert to FP; take FP square root)
    F->S DUP 1+ SWAP MINUS DO   ( convert to integer and set up x loop with [xl+1 -xl DO])
      I I * ZS + S->F SQR XF F* ( xt = [xi^2 + zs]^0.5 * xf)
      FDUP SIN                  (  dup xt and get sine; {stack: xt sin[xt]})
      FSWAP                     (  swap positions; {stack: sin[xt] xt})
      >F 3 F* SIN               (  sin[xt*3])
      >F 0.4 F*                 (  0.4 * sin[xt*3])
      F+ >F 56 F*               ( yy = [sin[xt] + 0.4 * sin[xt*3]] * 56)
      F->S                      ( convert yy to integer {stack: yy})
      90 SWAP - J +             ( y1 = 90 - yy + zi)
      I J + 128 + SWAP          ( x1 = xi + zi + 128; swap positions {stack: x1 y1})
      OVER OVER                 ( copy both {stack: x1 y1 x1 y1})
      DRAW DOT                  ( plot dot {stack: x1 y1})
      1+ OVER 191               ( prepare to erase all dots below {stack: x1 y1+1 x1 191})
      UNDRAW LINE               ( erase all dots below last dot plotted)
    LOOP                    ( x loop again)
  LOOP                  ( z loop again)
  IDLE ;                ( idle until <break> tapped)
R->BASE                 ( restore radix)
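The constants 108, 2.84765625 and 11664 in the listing above appear to come from applying the 0.75 factor to the x radius of the original BASIC program before the loops start. A quick Python check of that arithmetic (the variable names are just for this check):

```python
SCALE = 0.75                       # factor applied to the x radius

radius = 144 * SCALE               # XP in the BASIC listings
z_factor = (2.25 * SCALE) ** 2     # ZT=ZI*2.25, scaled and then squared
radius_sq = radius * radius        # replaces 20736 in the BASIC listings

print(radius, z_factor, radius_sq)   # 108.0 2.84765625 11664.0
```

Folding the factor into the constants is what lets the inner loop plot x1 directly instead of multiplying by 0.75 at every pixel.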

...lee

JamesD · March 2, 2015

Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds:

...

...lee

I thought Forth would be faster than that.

JamesD · March 2, 2015

Well, alright then.

It actually looks very much the same in Classic99.

The hat has a dent.

Willsy · March 2, 2015

I thought Forth would be faster than that.

The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast).

Edited March 2, 2015 by Willsy

+Lee Stewart · March 2, 2015

I thought Forth would be faster than that.

The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast).

Floating point is, indeed, the overhead. Though fbForth 2.0 does not use the GPLLNK routines (which TI Basic and, I believe, TI Extended Basic do) for SQR and SIN, they are pretty much the same routines done completely in ALC, which I converted from code used in the Geneve and stashed in cartridge space. The console GPLLNK routines are slower because of the use of the GPL interpreter (twice, in the case of Basic). The sections of those routines that use console XML routines are actually faster than 100% GPL routines because the console ROMs are on a 16-bit bus, whereas the same fbForth 2.0 routines are all on the multiplexed, 8-bit bus.

In the case of the XML FP routines (add, subtract, multiply, divide, etc.), it may be a wash—I haven't tested their timing between Basic and fbForth 2.0.

I will try to do an all-integer version after I first try the reverse-indexed-loops version (described somewhere above by @senior_falcon and @Tursi). I wanted to first get a baseline. It is gratifying that fbForth 2.0 is faster than TI Basic. Assuming that Classic99 and the real iron run at the same speed, TI Basic took 2.5X longer to run the program than fbForth 2.0—this per Kevan's run reported in post #127.

...lee

bfollett · March 2, 2015

Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came in post 42, with a follow-up at 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI Extended BASIC with The Missing Link.

http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139

Bob

JamesD · March 2, 2015

Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came in post 42, with a follow-up at 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI Extended BASIC with The Missing Link.

http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139

Bob

FWIW, I did get that code running on a CoCo 1/2 in emulation using the suggested changes in 256x192 mode.

Edited March 2, 2015 by JamesD

JamesD · March 2, 2015

ECB + The Missing Link adaptation of the last Atari version

I have no idea if this works; I just based it on a combination of moocowmoo's version and the latest Atari version and added line 101 to adjust for the 256-pixel-wide screen.
I don't have ECB or The Missing Link and I don't even know TI BASIC so this could destroy your computer for all I know.
I *think* it should at least be close.

Oh, and as I said in the Atari thread, I was being lazy with line 101. Substitute the constants generated into line 100.
FWIW, you probably wouldn't be able to measure the difference so I'm not too worried about it.

100 SX=144::SY=56::SZ=64::CX=320::CY=192
101 SX=.8*SX::SY=.8*SY::SZ=.8*SZ
110 DIM RR(CX)
120 FOR I=0 TO CX::RR(I)=CY::NEXT I
130 CALL LINK("CLEAR")
140 CX=CX*0.5::CY=CY*0.46875::FX=SX/64::FZ=SZ/64
150 XF=4.71238905/SX
160 FOR ZI=64 TO -64 STEP -1
170   ZT=ZI*FX::ZS=ZT*ZT
180   XL=INT(SQR(SX*SX-ZS)+0.5)
190   ZX=ZI*FZ+CX::ZY=CY+ZI*FZ
200   FOR XI=0 TO XL
210     XT=SQR(XI*XI+ZS)*XF
220     YY=(SIN(XT)+SIN(XT*3)*0.4)*SY
230     X1=XI+ZX::Y1=ZY-YY
240     IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
250     X1=ZX-XI
260     IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
270   NEXT XI::NEXT ZI
280 GOTO 280

Edited March 2, 2015 by JamesD

bfollett · March 2, 2015


Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256.
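To see what the CX value changes, the Python sketch below replays the setup lines of the listing (100, 101, 140, and 170 to 190) and reports the smallest and largest X1 the loops would produce. `x_extent` is a hypothetical helper; it ignores how Extended BASIC or The Missing Link would round or reject coordinates.

```python
import math

def x_extent(cx_start):
    """Return (min X1, max X1) for a given starting CX from line 100."""
    sx = 0.8 * 144                 # lines 100-101
    sz = 0.8 * 64
    cx = cx_start * 0.5            # line 140: horizontal centre
    fx = sx / 64
    fz = sz / 64
    lo, hi = float("inf"), float("-inf")
    for zi in range(64, -65, -1):
        zt = zi * fx
        zs = zt * zt
        xl = int(math.sqrt(sx * sx - zs) + 0.5)
        zx = zi * fz + cx
        lo = min(lo, zx - xl)      # left end of this row (line 250)
        hi = max(hi, zx + xl)      # right end of this row (line 230)
    return lo, hi

print(x_extent(320))
print(x_extent(256))
```

With CX=320 the hat is centred on column 160 and its right edge runs past column 255; with CX=256 it is centred on 128 and the whole x range stays between 0 and 255.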

Bob

JamesD · March 2, 2015

Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256.

Bob

Du-OH!

*edit*

I did the same on the changes to run it on a CoCo over in the Atari thread.

Edited March 2, 2015 by JamesD
