bfollett

Bitmap mode.


I'm guessing that is because the Atari program is written for 320 columns of pixels and steps one pixel column at a time. The TI programs (mine included) don't change the program but instead multiply the column by .75 which means the program will try to display 4 pixels every three columns, determined by how the column number is rounded off. A proper fix would modify the program earlier and eliminate the need to multiply by .75.

That could be it. If that change is made the program would probably run a bit faster not having to do an extra multiplication on each pixel plot.

 

Bob
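To make the rounding effect concrete, here is a minimal Python sketch. It is not taken from any program in this thread, and truncation is an assumed rounding mode; the real plot routines may round differently. It maps each of the 320 Atari columns to a TI column by multiplying by .75 and counts how often each TI column is hit.

```python
from collections import Counter

def column_hits(src_width=320, scale=0.75):
    """Count how many source columns land on each destination column."""
    # int() truncates; this is an assumption about how the plot routine rounds.
    return Counter(int(x * scale) for x in range(src_width))

hits = column_hits()
doubled = sorted(col for col, n in hits.items() if n == 2)
print(len(hits), "destination columns,", len(doubled), "of them plotted twice")
print("first few doubled columns:", doubled[:5])
```

Under that assumption every third TI column receives two Atari columns, which is the "4 pixels every three columns" pattern described above.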


Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!
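A rough way to estimate that saving is to count inner-loop iterations for the full-size ellipse and for one shrunk to 3/4 width. The Python sketch below assumes the loop structure of the BASIC listings in this thread (XI running from -XL to XL for each ZI) and assumes that only the X radius is scaled, so the number of rows stays at 129.

```python
import math

def inner_loop_iterations(radius, depth=64):
    """Count (ZI, XI) pairs visited when XI runs from -XL to XL."""
    total = 0
    for zi in range(-depth, depth + 1):
        zt = zi * radius / depth
        xl = int(math.sqrt(max(0.0, radius * radius - zt * zt)) + 0.5)
        total += 2 * xl + 1
    return total

full = inner_loop_iterations(144)    # Atari-sized ellipse
scaled = inner_loop_iterations(108)  # 144 * 0.75, assumed TI-sized ellipse
print(full, scaled, round(scaled / full, 3))
```

Since each iteration carries a square root and two sines, the ratio of the two counts is a reasonable first guess at the ratio of run times.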


I just tried the optimized Atari code using the faster Basic and floating point available in Altirra and it clocked in at an impressive 6.5 minutes. How come I don't see the defects in the Atari output like in the TI example? FYI, lines 10 and 20 of your example have the wrong type of brackets, but it was a quick fix: "[ ]" became "( )".

 

Below is the new output, I didn't do any real analysis, but it looks pretty close to the same to me.

 

Bob

 

Because you're using floating point and I'm not. The defects are not an artifact of the optimizations - those are spot on accurate (I also did a lot of testing on the Apple2 emulator, to try ideas out, and it's quite fast there unthrottled). Basically my TI assembly artifacts are because of compounded rounding errors -- even some of my steps are actually off, but just by a bit, so it's close.
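As an illustration of how a slightly-off step compounds, here is a small Python sketch. The 8.8 fixed-point format is hypothetical and chosen only for the example; it is not the format used in the TI assembly version. It takes the XF step from the BASIC listings in this thread, truncates it to fixed point, and compares 144 accumulated steps against the exact value.

```python
FRAC_BITS = 8  # hypothetical 8.8 fixed-point format, chosen only for illustration

def to_fixed(value):
    return int(value * (1 << FRAC_BITS))  # truncates toward zero

def from_fixed(raw):
    return raw / (1 << FRAC_BITS)

step = 4.71238905 / 144          # XF from the BASIC listing
fixed_step = to_fixed(step)      # the step as integer code would see it

exact = step * 144
accumulated = from_fixed(fixed_step * 144)
print(exact, accumulated, exact - accumulated)
```

With these assumed 8 fractional bits the step truncates to 8/256, so the far edge of the hat ends up at 4.5 radians instead of roughly 4.712. More fractional bits shrink the error but do not remove it.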


Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!

 

Yes... I tried a couple of optimizations after my post - I adjusted the scale (XF) to completely eliminate the need for the SINE function (was able to do a direct table lookup inline), and tweaked a couple of other constants to remove one of the signed multiplications completely. The runtime went from 64 seconds to 57 seconds -- which is a valid optimization, but since the joy of this project is watching it draw, and you don't really see that small a difference, I didn't bother with pushing any further. Dropping it into scratchpad would certainly help, but you'd have to be selective about what goes there since the program is a bit too big (main loop might fit). I think the next cool version is the F18A version, which hopefully someone will do for us and video. ;) It's pretty much a straight port - load the code, change the VDP accesses to direct memory accesses, and it should just run. (That was all I did for the Mandelbrot one back in the day).
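The table-lookup idea can be sketched in Python as follows. This is not the assembly implementation described above; the table layout (one entry per whole-number distance from the centre) and the names `HEIGHT_TABLE` and `height` are assumptions made for illustration.

```python
import math

RADIUS = 144                 # XP in the BASIC listing
XF = 4.71238905 / RADIUS     # angle per unit of distance
AMPLITUDE = 56

# One entry per whole-number distance from the centre (an assumed table layout).
HEIGHT_TABLE = [
    (math.sin(d * XF) + 0.4 * math.sin(3 * d * XF)) * AMPLITUDE
    for d in range(RADIUS + 1)
]

def height(xi, zs):
    """Look up YY for column offset xi and squared depth zs, without calling sin()."""
    d = min(RADIUS, int(math.sqrt(xi * xi + zs) + 0.5))
    return HEIGHT_TABLE[d]
```

The square root is still computed per pixel in this sketch; only the two sine calls are replaced by one indexed read.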


I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf.


I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf.

The floats I have used in TIC were written as a "wrapper" around the Geneve OS math XOPs. I do not know if Clint Pulley, Mike Maksimik, or another individual wrote the functions. I can look for the file(s) if you are interested.


Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ;) ... I already did the ABASIC version.


Well, alright then.

 

It actually looks very much the same in Classic99. ;-)

 

 

Niiiiiice! Thanks, Rasmus!

 

I guess I didn't think Classic99 would do it right, but even if it did, I wouldn't have believed the results. ;)

 

 

 

which will screw up X1,Y1 - not!

 

Hah! Made me laugh! :)

Edited by Tursi


Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ;) ... I already did the ABASIC version.

No thanks, my in-process projects have long since exceeded a reasonable queue depth. I need to pop a few things off the stack and then invest in a new internal stack handler. ;)


They've been working on optimizing this further in Atari 8 bit forum. Someone could tweak this a little again for the 256 pixel TI display. The original 3 hour runtime is now down to around 20 minutes on the Atari using a faster Basic (Turbo Basic).



100 DIM RR(320)
110 FOR I=0 TO 320:RR(I)=193:NEXT I
130 XP=144:XR=4.71238905:XF=XR/XP
140 FOR ZI=64 TO -64 STEP -1
150 ZT=ZI*2.25:ZS=ZT*ZT
160 XL=INT(SQR(20736-ZS)+0.5)
170 ZX=ZI+160:ZY=90+ZI
180 FOR XI=0 TO XL
190 XT=SQR(XI*XI+ZS)*XF
200 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
210 X1=XI+ZX:Y1=ZY-YY
220 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
230 X1=ZX-XI
240 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
250 NEXT XI:NEXT ZI
260 GOTO 260
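For readers who want to follow the hidden-line logic without an Atari at hand, here is a Python transliteration of the listing. It is a sketch only: the function name `hat_points` and its parameters are made up for illustration, and it collects the points in a list instead of plotting them. The `horizon` list plays the role of RR(): because ZI runs from front to back, a point is kept only if it is higher on screen (smaller Y) than anything already drawn in that column.

```python
import math

def hat_points(width=320, radius=144, height=56, depth=64):
    """Return the (x, y) points the BASIC listing would plot, in plot order."""
    horizon = [193.0] * (width + 1)   # RR(): smallest Y plotted so far per column
    xf = 4.71238905 / radius
    points = []
    for zi in range(depth, -depth - 1, -1):       # front row first
        zt = zi * radius / depth                  # 2.25 * ZI for 144/64
        zs = zt * zt
        xl = int(math.sqrt(max(0.0, radius * radius - zs)) + 0.5)
        zx = zi + width // 2
        zy = 90 + zi
        for xi in range(xl + 1):
            xt = math.sqrt(xi * xi + zs) * xf
            yy = (math.sin(xt) + math.sin(xt * 3) * 0.4) * height
            y1 = zy - yy
            for x1 in (zx + xi, zx - xi):         # right half, then mirrored left half
                if horizon[x1] > y1:
                    horizon[x1] = y1
                    points.append((x1, y1))
    return points

pts = hat_points()
print(len(pts), "points plotted")
```

Computing each height once and mirroring it about the centre column is what roughly halves the number of SQR and SIN calls compared with looping XI from -XL to XL.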


Edited by bfollett


Converted the first (un-optimised) listing to Cortex BASIC and ran it on a couple of systems. Results below.

 

-- TI-99/4A with a Cortex BASIC cartridge (running under Classic99, normal speed): 1 hour 10 minutes

-- TM990 system (TMS9900, 3 MHz clock): 48 minutes

-- Powertran Cortex (TMS9995, 3 MHz clock): 35 minutes

-- TMS99110 breadboard system (4 MHz clock): 12 minutes

 

Listing:

 

140 COLOUR 1,7: GRAPH
150 XP=144: XR=4.71238905: XF=XR/XP
160 FOR ZI=-64 TO 64
170 ZT=ZI*2.25: ZS=ZT*ZT
180 XL=INT[SQR[20736-ZS]+0.5]
190 FOR XI=0-XL TO XL
200 XT=SQR[XI*XI+ZS]*XF
210 YY=(SIN[XT]+SIN[XT*3]*0.4)*56
220 X1=XI+ZI+160: Y1=90-YY+ZI
235 PLOT X1*0.75,Y1*0.75
240 UNPLOT X1*0.75,Y1*0.75+1 TO X1*0.75,144
250 NEXT XI: NEXT ZI
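This listing hides back surfaces differently from the array-based version: it draws from back to front (ZI=-64 TO 64) and, after each PLOT, UNPLOTs everything below the new point in that column. A tiny Python model of one screen column shows the effect; the helper name `plot_and_clear_below` and the sample heights are invented for illustration.

```python
def plot_and_clear_below(column, y, bottom=144):
    """Set pixel y in a column, then erase every pixel from y+1 down to bottom."""
    column.add(y)
    column.difference_update(range(y + 1, bottom + 1))

column = set()          # lit pixels in one screen column
for y in (50, 40, 60):  # surface heights arriving back to front
    plot_and_clear_below(column, y)
print(sorted(column))   # pixels still lit in this column
```

The point at 50 is wiped when the nearer surface arrives at 40, while 40 survives the later plot at 60 because it lies above it on screen. No per-column array is needed, at the cost of an extra line erase for every pixel.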

 

Stuart.


Whoa. Something called Cortex BASIC for the TI-99/4A? Something I've not yet come across.

 

So much new stuff to be found.


It is a descendant of the BASIC TI used on their TI 990 computers. It was first ported over to the Powertran Cortex in the early Eighties and Stuart then ported the Cortex version over to the TI-99 a year or two ago.


Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds:

 

 

BASE->R                         ( save current radix)
DECIMAL                         ( set current radix to decimal)
>F 4.71238905 >F 108 F/ FCONSTANT XF    ( create floating point [FP] constant XF)
0 CONSTANT ZS                           ( create integer constant ZS)
( define IDLE to loop until <break> key tapped)
: IDLE BEGIN ?TERMINAL UNTIL ;
( Archimedes Spiral Hat Plot)
: DO_HAT  
  GRAPHICS2                     ( set up bitmap mode)
  [ HEX ] 1 83D6 ! [ DECIMAL ]  ( set screen-blank counter to never blank)
  65 -64 DO                     ( z loop)
    I I * S->F                  ( square z index and make FP)
    >F 2.84765625 F*            ( FP multiply by factor to scale square of index to 108^2)
    F->S ' ZS !                 ( convert to integer and store new zs)
    11664 ZS - S->F SQR         ( 108^2 - zs; convert to FP; take FP square root)
    F->S DUP 1+ SWAP MINUS DO   ( convert to integer and set up x loop with [xl+1 -xl DO])
      I I * ZS + S->F SQR XF F* ( xt = [xi^2 + zs]^0.5 * xf)
      FDUP SIN                  (  dup xt and get sine; {stack: xt sin[xt]})
      FSWAP                     (  swap positions; {stack: sin[xt] xt})
      >F 3 F* SIN               (  sin[xt*3])
      >F 0.4 F*                 (  0.4 * sin[xt*3])
      F+ >F 56 F*               ( yy = [sin[xt] + 0.4 * sin[xt*3]] * 56)
      F->S                      ( convert yy to integer {stack: yy})
      90 SWAP - J +             ( y1 = 90 - yy + zi)
      I J + 128 + SWAP          ( x1 = xi + zi + 128; swap positions {stack: x1 y1})
      OVER OVER                 ( copy both {stack: x1 y1 x1 y1})
      DRAW DOT                  ( plot dot {stack: x1 y1})
      1+ OVER 191               ( prepare to erase all dots below {stack: x1 y1+1 x1 191})
      UNDRAW LINE               ( erase all dots below last dot plotted)
    LOOP                    ( x loop again)
  LOOP                  ( z loop again)
  IDLE ;                ( idle until <break> tapped)
R->BASE                 ( restore radix) 
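The constants in the Forth listing appear to be the BASIC constants with the .75 screen scale folded in ahead of time. That reading is an inference from the numbers, not something stated in the post; the short Python check below shows the arithmetic.

```python
SCALE = 0.75                            # the .75 factor used by the TI BASIC ports

radius = 144 * SCALE                    # divisor used when defining XF
z_factor_squared = (2.25 * SCALE) ** 2  # factor applied to the squared z index

print(radius, radius ** 2, z_factor_squared)
```

This gives 108, 11664 and 2.84765625, matching the values in the listing. Squaring the integer index first and multiplying by the pre-squared factor saves one floating-point multiply per row.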

 

 

 

post-29677-0-21215900-1425270929_thumb.jpg

 

...lee


Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds:

...

...lee

I thought Forth would be faster than that.


Well, alright then.

 

 

It actually looks very much the same in Classic99. ;-)

The hat has a dent.


I thought Forth would be faster than that.

The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast).

Edited by Willsy


I thought Forth would be faster than that.

 

 

The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast).

 

Floating point is, indeed, the overhead. Though fbForth 2.0 does not use the GPLLNK routines (which TI Basic and, I believe, TI Extended Basic do) for SQR and SIN, they are pretty much the same routines done completely in ALC, which I converted from code used in the Geneve and stashed in cartridge space. The console GPLLNK routines are slower because of the use of the GPL interpreter (twice, in the case of Basic). The sections of those routines that use console XML routines are actually faster than 100 % GPL routines because the console ROMs are on a 16-bit bus, whereas the same fbForth 2.0 routines are all on the multiplexed, 8-bit bus.

 

In the case of the XML FP routines (add, subtract, multiply, divide, etc.), it may be a wash—I haven't tested their timing between Basic and fbForth 2.0.

 

I will try to do an all-integer version after I first try the reverse-indexed-loops version (described somewhere above by @senior_falcon and @Tursi). I wanted to first get a baseline. It is gratifying that fbForth 2.0 is faster than TI Basic. Assuming that Classic99 and the real iron run at the same speed, TI Basic took 2.5X longer to run the program than fbForth 2.0; this is per Kevan's run reported in post #127.

 

...lee


Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came in post 42, with a follow-up in post 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI Extended BASIC with The Missing Link.

 

http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139

 

Bob


Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came in post 42, with a follow-up in post 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI Extended BASIC with The Missing Link.

 

http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139

 

Bob

FWIW, I did get that code running on a CoCo 1/2 in emulation using the suggested changes in 256x192 mode.

Edited by JamesD


ECB + The Missing Link adaptation of the last Atari version

I have no idea if this works, I just based it on a combination of moocowmoo's version and the latest Atari version and added line 101 to adjust for 256 pixel wide screen.
I don't have ECB or The Missing Link and I don't even know TI BASIC so this could destroy your computer for all I know.
I *think* it should at least be close.

Oh, and as I said in the Atari thread, I was being lazy with line 101. Substitute the constants generated into line 100.
FWIW, you probably wouldn't be able to measure the difference so I'm not too worried about it.

100 SX=144::SY=56::SZ=64::CX=320::CY=192
101 SX=.8*SX::SY=.8*SY::SZ=.8*SZ
110 DIM RR(CX)
120 FOR I=0 TO CX::RR(I)=CY::NEXT I
130 CALL LINK("CLEAR")
140 CX=CX*0.5::CY=CY*0.46875::FX=SX/64::FZ=SZ/64
150 XF=4.71238905/SX
160 FOR ZI=64 TO -64 STEP -1
170   ZT=ZI*FX::ZS=ZT*ZT
180   XL=INT(SQR(SX*SX-ZS)+0.5)
190   ZX=ZI*FZ+CX::ZY=CY+ZI*FZ
200   FOR XI=0 TO XL
210     XT=SQR(XI*XI+ZS)*XF
220     YY=(SIN(XT)+SIN(XT*3)*0.4)*SY
230     X1=XI+ZX::Y1=ZY-YY
240     IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
250     X1=ZX-XI
260     IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
270   NEXT XI::NEXT ZI
280 GOTO 280 
Edited by JamesD


 


Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256.

 

Bob
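The effect of that one constant can be checked with a short Python sketch that walks the same loops as the ECB listing and records the smallest and largest X1 it would try to plot. The function name `x_range` is made up for illustration, and the sketch only mirrors the arithmetic of lines 100 to 260; it does not model The Missing Link itself.

```python
import math

def x_range(screen_width, scale=0.8):
    """Return (min, max) of X1 for the scaled listing with CX set to screen_width."""
    sx, sz = 144 * scale, 64 * scale     # line 101
    fx, fz = sx / 64, sz / 64            # line 140
    cx = screen_width * 0.5              # line 140: CX becomes the centre column
    lo, hi = math.inf, -math.inf
    for zi in range(64, -65, -1):
        zs = (zi * fx) ** 2
        xl = int(math.sqrt(max(0.0, sx * sx - zs)) + 0.5)
        zx = zi * fz + cx
        lo, hi = min(lo, zx - xl), max(hi, zx + xl)
    return lo, hi

print("CX=320:", x_range(320))
print("CX=256:", x_range(256))
```

With CX left at 320 the hat is centred on column 160 and its right edge runs past column 255; with CX=256 the whole range fits inside the 256-pixel-wide screen.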


Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256.

 

Bob

Du-OH!

*edit*

I did the same on the changes to run it on a CoCo over in the Atari thread.

Edited by JamesD

