bfollett (Author) Posted February 28, 2015

I'm guessing that is because the Atari program is written for 320 columns of pixels and steps one pixel column at a time. The TI programs (mine included) don't change the program but instead multiply the column by .75, which means the program will try to display 4 pixels every three columns, determined by how the column number is rounded off. A proper fix would modify the program earlier and eliminate the need to multiply by .75.

That could be it. If that change is made, the program would probably run a bit faster, not having to do an extra multiplication on each pixel plot.

Bob
senior_falcon Posted February 28, 2015

Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!
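To make the "4 pixels every three columns" point concrete, here is a minimal Python sketch (Python is used only for illustration; none of the posted programs are written in it). It counts how many of the 320 source columns land on each TI column after multiplying by .75. Truncating with `int()` is an assumption; a BASIC that rounds instead would shift which columns collide.

```python
from collections import Counter

SRC_WIDTH = 320   # Atari-style column count from the thread
SCALE = 0.75      # the per-pixel multiplier discussed above

def column_hits(src_width=SRC_WIDTH, scale=SCALE):
    """Count how many source columns land on each destination column.

    Truncation via int() is an assumption about how the column is converted.
    """
    return Counter(int(x * scale) for x in range(src_width))

hits = column_hits()
distinct_columns = len(hits)                               # columns actually drawn
doubled_columns = sum(1 for n in hits.values() if n == 2)  # columns computed twice
```

Under these assumptions a quarter of the 320 evaluations fall on a column that was already computed, which is the work senior_falcon suggests skipping.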
Tursi Posted February 28, 2015

Quote (bfollett): "I just tried the optimized Atari code using the faster Basic and floating point available in Altirra and it clocked in at an impressive 6.5 minutes. How come I don't see the defects in the Atari output like in the TI example? FYI, lines 10 and 20 of your example have the wrong type of brackets, but it was a quick fix: '[ ] = ( )'. Below is the new output. I didn't do any real analysis, but it looks pretty close to the same to me. Bob"

Because you're using floating point and I'm not. The defects are not an artifact of the optimizations; those are spot-on accurate (I also did a lot of testing on the Apple2 emulator, to try ideas out, and it's quite fast there unthrottled). Basically my TI assembly artifacts are because of compounded rounding errors. Even some of my steps are actually off, but just by a bit, so it's close.
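The effect of compounded rounding can be sketched in a few lines of Python. This is not Tursi's assembly code: the 8.8 fixed-point format (`FRACTION_BITS`) and the helper names are assumptions chosen only to show how a step value that is slightly off accumulates into a visible error.

```python
FRACTION_BITS = 8                 # assumed 8.8 fixed-point format
ONE = 1 << FRACTION_BITS

def to_fixed(value):
    """Round a real value to the nearest representable fixed-point integer."""
    return round(value * ONE)

def to_float(fixed):
    """Convert a fixed-point integer back to a real value."""
    return fixed / ONE

XF = 4.71238905 / 144             # per-step angle from the BASIC listings
STEPS = 144

fixed_step = to_fixed(XF)         # the step is rounded once, up front
fixed_total = fixed_step * STEPS  # same result as adding the step 144 times
float_total = XF * STEPS

drift = float_total - to_float(fixed_total)
```

With these assumed 8 fractional bits the rounded step is a little too small, and after 144 steps the accumulated angle falls short of 4.71238905 by roughly 0.2 radians. More fractional bits shrink the drift but do not remove it.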
Tursi Posted February 28, 2015

Quote (senior_falcon): "Eliminating the extra multiplication would lead to a trivial increase in speed. What would speed the program up a lot is to only do the computation for 3/4 the number of pixels!"

Yes... I tried a couple of optimizations after my post. I adjusted the scale (XF) to completely eliminate the need for the SINE function (I was able to do a direct table lookup inline), and tweaked a couple of other constants to remove one of the signed multiplications completely. The runtime went from 64 seconds to 57 seconds, which is a valid optimization, but since the joy of this project is watching it draw, and you don't really see that small a difference, I didn't bother pushing any further. Dropping it into scratchpad would certainly help, but you'd have to be selective about what goes there since the program is a bit too big (the main loop might fit).

I think the next cool version is the F18A version, which hopefully someone will do for us and video. It's pretty much a straight port: load the code, change the VDP accesses to direct memory accesses, and it should just run. (That was all I did for the Mandelbrot one back in the day.)
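One way a direct table lookup can replace the SINE call is sketched below in Python. This is an illustration of the general idea, not the assembly version described above: the table size `MAX_RADIUS`, the use of an integer square root, and the names `height` and `height_lookup` are all assumptions.

```python
import math

XF = 4.71238905 / 144   # radians per unit of radius, from the BASIC listings
MAX_RADIUS = 145        # hypothetical table bound; the hat's rim sits at 144

def height(t):
    """The hat profile used throughout the thread, scaled by 56."""
    return (math.sin(t) + 0.4 * math.sin(3 * t)) * 56

# One entry per integer radius, so the radius itself is the table index.
HEIGHT_TABLE = [round(height(r * XF)) for r in range(MAX_RADIUS + 1)]

def height_lookup(xi, zi):
    """Integer-only stand-in for the SQR/SIN pair (zi is assumed pre-scaled)."""
    radius = math.isqrt(xi * xi + zi * zi)
    return HEIGHT_TABLE[radius]
```

Because the radius is truncated to an integer before the lookup, the result can differ from the floating-point value by a few pixels where the profile is steep. That trade-off is in line with the small artifacts mentioned earlier in the thread.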
Asmusr Posted February 28, 2015

Well, alright then. It actually looks very much the same in Classic99.

Attachments: HAT.dsk, hat - source.txt
Omega-TI Posted February 28, 2015

Wow, just wow! I tried that on my R.I. box... I was a little suspicious of that, but sure enough.... CONFIRMED!
+mizapf Posted February 28, 2015

I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf.
+InsaneMultitasker Posted February 28, 2015

Quote (mizapf): "I just wanted to port that program to TIC for the Geneve ... until I noticed that we don't have float types. We really don't have floats in TIC?? Grmpf."

The floats I have used in TIC were written as a "wrapper" around the Geneve OS math XOPs. I do not know if Clint Pulley, Mike Maksimik, or another individual wrote the functions. I can look for the file(s) if you are interested.
+mizapf Posted March 1, 2015

Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ... I already did the ABASIC version.
Tursi Posted March 1, 2015 (edited March 1, 2015)

Quote (Asmusr): "Well, alright then. It actually looks very much the same in Classic99."

Niiiiiice! Thanks, Rasmus! I guess I didn't think Classic99 would do it right, but even if it did, I wouldn't have believed the results.

Quote: "which will screw up X1,Y1 - not!"

Hah! Made me laugh!
+InsaneMultitasker Posted March 1, 2015

Quote (mizapf): "Not so important - I just wanted to see how much faster we could get with TIC. Maybe you want to try? ... I already did the ABASIC version."

No thanks, my in-process projects have long since exceeded a reasonable queue depth. I need to pop a few things off the stack and then invest in a new internal stack handler.
bfollett (Author) Posted March 1, 2015 (edited March 1, 2015)

They've been working on optimizing this further in the Atari 8-bit forum. Someone could tweak this a little again for the 256 pixel TI display. The original 3 hour runtime is now down to around 20 minutes on the Atari using a faster Basic (Turbo Basic).

100 DIM RR(320)
110 FOR I=0 TO 320:RR(I)=193:NEXT I
130 XP=144:XR=4.71238905:XF=XR/XP
140 FOR ZI=64 TO -64 STEP -1
150 ZT=ZI*2.25:ZS=ZT*ZT
160 XL=INT(SQR(20736-ZS)+0.5)
170 ZX=ZI+160:ZY=90+ZI
180 FOR XI=0 TO XL
190 XT=SQR(XI*XI+ZS)*XF
200 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
210 X1=XI+ZX:Y1=ZY-YY
220 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
230 X1=ZX-XI
240 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
250 NEXT XI:NEXT ZI
260 GOTO 260
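For readers who want to experiment with the listing outside an emulator, here is a rough Python transcription of it. It is a sketch rather than a verified port: it collects the points that would be plotted instead of drawing them, and the function name `fedora_points` and its parameters are made up for the example. The `rr` list is the listing's RR array, which tracks the highest point drawn so far in each column, so a point is plotted only when it rises above everything in front of it. The loop over the two `x1` values covers the mirrored left and right halves that the listing handles on lines 210 to 240.

```python
import math

def fedora_points(width=320, horizon=193):
    """Collect the (x, y) pairs the optimized listing would PLOT, in order."""
    rr = [horizon] * (width + 1)          # RR(): lowest y seen so far per column
    xf = 4.71238905 / 144                 # XF = XR / XP
    points = []
    for zi in range(64, -65, -1):         # front row to back row
        zt = zi * 2.25
        zs = zt * zt
        xl = int(math.sqrt(20736 - zs) + 0.5)
        zx, zy = zi + 160, 90 + zi
        for xi in range(xl + 1):
            xt = math.sqrt(xi * xi + zs) * xf
            y1 = zy - (math.sin(xt) + math.sin(xt * 3) * 0.4) * 56
            for x1 in (zx + xi, zx - xi):  # right half, then mirrored left half
                if rr[x1] > y1:            # only plot above the current horizon
                    rr[x1] = y1
                    points.append((x1, y1))
    return points

points = fedora_points()
```

Each sine evaluation serves two pixels because of the symmetry, and the initial value of 193 also clips anything that would fall below the bottom of a 192-row screen.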
Stuart Posted March 1, 2015

Converted the first (un-optimised) listing to Cortex BASIC and ran it on a couple of systems. Results below.

-- TI-99/4A with a Cortex BASIC cartridge (running under Classic99, normal speed): 1 hour 10 minutes
-- TM990 system (TMS9900, 3 MHz clock): 48 minutes
-- Powertran Cortex (TMS9995, 3 MHz clock): 35 minutes
-- TMS99110 breadboard system (4 MHz clock): 12 minutes

Listing:

140 COLOUR 1,7: GRAPH
150 XP=144: XR=4.71238905: XF=XR/XP
160 FOR ZI=-64 TO 64
170 ZT=ZI*2.25: ZS=ZT*ZT
180 XL=INT[SQR[20736-ZS]+0.5]
190 FOR XI=0-XL TO XL
200 XT=SQR[XI*XI+ZS]*XF
210 YY=(SIN[XT]+SIN[XT*3]*0.4)*56
220 X1=XI+ZI+160: Y1=90-YY+ZI
235 PLOT X1*0.75,Y1*0.75
240 UNPLOT X1*0.75,Y1*0.75+1 TO X1*0.75,144
250 NEXT XI: NEXT ZI

Stuart.
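The un-optimised listing hides surfaces differently from the RR-array version: it draws from the back row to the front and, after each PLOT, erases everything below the new dot in that column. A minimal Python sketch of that idea follows. The dictionary of sets standing in for the bitmap and the name `plot_and_erase_below` are assumptions made for the illustration.

```python
SCREEN_BOTTOM = 144   # last row cleared by the UNPLOT in the Cortex listing

def plot_and_erase_below(columns, x, y):
    """Set pixel (x, y), then clear every pixel beneath it in that column."""
    lit = columns.setdefault(x, set())
    lit.add(y)
    lit.difference_update(range(y + 1, SCREEN_BOTTOM + 1))

columns = {}
plot_and_erase_below(columns, 5, 20)   # far point
plot_and_erase_below(columns, 5, 10)   # nearer point drawn higher: 20 is hidden
plot_and_erase_below(columns, 5, 15)   # nearer still but lower: 10 stays visible
```

This costs one line erase per pixel, which is part of why the RR-array approach in the optimized Atari listing is attractive: it never draws a hidden point in the first place.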
whicker Posted March 2, 2015

Whoa. Something called Cortex BASIC for the TI-99/4A? Something I've not yet come across. So much new stuff to be found.
+Ksarul Posted March 2, 2015

It is a descendant of the BASIC TI used on their TI 990 computers. It was first ported over to the Powertran Cortex in the early Eighties, and Stuart then ported the Cortex version over to the TI-99 a year or two ago.
+Lee Stewart Posted March 2, 2015

Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds:

BASE->R     ( save current radix)
DECIMAL     ( set current radix to decimal)
>F 4.71238905 >F 108 F/ FCONSTANT XF     ( create floating point [FP] constant XF)
0 CONSTANT ZS     ( create integer constant ZS)

( define IDLE to loop until <break> key tapped)
: IDLE   BEGIN ?TERMINAL UNTIL ;

( Archimedes Spiral Hat Plot)
: DO_HAT
   GRAPHICS2     ( set up bitmap mode)
   [ HEX ] 1 83D6 ! [ DECIMAL ]     ( set screen-blank counter to never blank)
   65 -64 DO     ( z loop)
      I I * S->F     ( square z index and make FP)
      >F 2.84765625 F*     ( FP multiply by factor to scale square of index to 108^2)
      F->S ' ZS !     ( convert to integer and store new zs)
      11664 ZS - S->F SQR     ( 108^2 - zs; convert to FP; take FP square root)
      F->S DUP 1+ SWAP MINUS DO     ( convert to integer and set up x loop with [xl+1 -xl DO])
         I I * ZS + S->F SQR XF F*     ( xt = [xi^2 + zs]^0.5 * xf)
         FDUP SIN     ( dup xt and get sine; {stack: xt sin[xt]})
         FSWAP     ( swap positions; {stack: sin[xt] xt})
         >F 3 F* SIN     ( sin[xt*3])
         >F 0.4 F*     ( 0.4 * sin[xt*3])
         F+ >F 56 F*     ( yy = [sin[xt] + 0.4 * sin[xt*3]] * 56)
         F->S     ( convert yy to integer {stack: yy})
         90 SWAP - J +     ( y1 = 90 - yy + zi)
         I J + 128 + SWAP     ( x1 = xi + zi + 128; swap positions {stack: x1 y1})
         OVER OVER     ( copy both {stack: x1 y1 x1 y1})
         DRAW DOT     ( plot dot {stack: x1 y1})
         1+ OVER 191     ( prepare to erase all dots below {stack: x1 y1+1 x1 191})
         UNDRAW LINE     ( erase all dots below last dot plotted)
      LOOP     ( x loop again)
   LOOP     ( z loop again)
   IDLE ;     ( idle until <break> tapped)
R->BASE     ( restore radix)

...lee
JamesD Posted March 2, 2015

Quote (Lee Stewart): "Here's the original ported to fbForth 2.0 and run on Classic99. It took 2 hours, 7 minutes, 25 seconds: ... ...lee"

I thought Forth would be faster than that.
JamesD Posted March 2, 2015

Quote (Asmusr): "Well, alright then. It actually looks very much the same in Classic99."

The hat has a dent.
Willsy Posted March 2, 2015 (edited March 2, 2015)

Quote (JamesD): "I thought Forth would be faster than that."

The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast).
+Lee Stewart Posted March 2, 2015

Quote (JamesD): "I thought Forth would be faster than that."

Quote (Willsy): "The floating point is the overhead. Not the language, IMO. I reckon a fixed-point Forth version should come in somewhere around twice the fixed-point assembly version (i.e. approximately half as fast)."

Floating point is, indeed, the overhead. Though fbForth 2.0 does not use the GPLLNK routines (which TI Basic and, I believe, TI Extended Basic do) for SQR and SIN, they are pretty much the same routines done completely in ALC, which I converted from code used in the Geneve and stashed in cartridge space. The console GPLLNK routines are slower because of the use of the GPL interpreter (twice, in the case of Basic). The sections of those routines that use console XML routines are actually faster than 100% GPL routines because the console ROMs are on a 16-bit bus, whereas the same fbForth 2.0 routines are all on the multiplexed, 8-bit bus. In the case of the XML FP routines (add, subtract, multiply, divide, etc.), it may be a wash. I haven't tested their timing between Basic and fbForth 2.0.

I will try to do an all-integer version after I first try the reverse-indexed-loops version (described somewhere above by @senior_falcon and @Tursi). I wanted to first get a baseline. It is gratifying that fbForth 2.0 is faster than TI Basic. Assuming that Classic99 and the real iron run at the same speed, TI Basic took 2.5X longer to run the program than fbForth 2.0, per Kevan's run reported in post #127.

...lee
bfollett (Author) Posted March 2, 2015

Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came in post 42, with a follow-up at 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI Extended Basic with The Missing Link.

http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139

Bob
JamesD Posted March 2, 2015 (edited March 2, 2015)

Quote (bfollett): "Take a peek at the following posts in the Atari forum. I posed the question as to which variables would need to be tweaked so the TI program didn't need to multiply every X and Y point by .75. The answer came is in post 42 and a follow up at 44. I'd like to see someone with better knowledge of the TI emulators try the optimized code in post 42. Maybe TI extended Basic with missing link. http://atariage.com/forums/topic/218503-graphics-8-fedora-hat/?p=3189139 Bob"

FWIW, I did get that code running on a CoCo 1/2 in emulation using the suggested changes in 256x192 mode.
JamesD Posted March 2, 2015 (edited March 2, 2015)

ECB + The Missing Link adaptation of the last Atari version

I have no idea if this works. I just based it on a combination of moocowmoo's version and the latest Atari version and added line 101 to adjust for the 256 pixel wide screen.

I don't have ECB or The Missing Link and I don't even know TI BASIC, so this could destroy your computer for all I know.

I *think* it should at least be close.

Oh, and as I said in the Atari thread, I was being lazy with line 101. Substitute the constants generated into line 100.

FWIW, you probably wouldn't be able to measure the difference, so I'm not too worried about it.

100 SX=144::SY=56::SZ=64::CX=320::CY=192
101 SX=.8*SX::SY=.8*SY::SZ=.8*SZ
110 DIM RR(CX)
120 FOR I=0 TO CX::RR(I)=CY::NEXT I
130 CALL LINK("CLEAR")
140 CX=CX*0.5::CY=CY*0.46875::FX=SX/64::FZ=SZ/64
150 XF=4.71238905/SX
160 FOR ZI=64 TO -64 STEP -1
170 ZT=ZI*FX::ZS=ZT*ZT
180 XL=INT(SQR(SX*SX-ZS)+0.5)
190 ZX=ZI*FZ+CX::ZY=CY+ZI*FZ
200 FOR XI=0 TO XL
210 XT=SQR(XI*XI+ZS)*XF
220 YY=(SIN(XT)+SIN(XT*3)*0.4)*SY
230 X1=XI+ZX::Y1=ZY-YY
240 IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
250 X1=ZX-XI
260 IF RR(X1)>Y1 THEN RR(X1)=Y1::CALL LINK("PD") :: CALL LINK("PIXEL",Y1,X1)
270 NEXT XI::NEXT ZI
280 GOTO 280
bfollett (Author) Posted March 2, 2015

Quote (JamesD): "ECB + The Missing Link adaptation of the last Atari version [...] 100 SX=144::SY=56::SZ=64::CX=320::CY=192 [...]"

Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256.

Bob
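The effect of the CX value can be checked without running the program. The Python sketch below follows the arithmetic of lines 100 to 190 of the adaptation and reports the smallest and largest X1 the loops can produce. The function name `x_extent` and its parameters are made up for the illustration, and the `max(..., 0.0)` guard is an addition to protect against binary floating-point round-off at the rim, which is not in the listing.

```python
import math

def x_extent(cx_pixels, shrink=0.8):
    """Return (min, max) of X1 over every candidate point in the listing.

    cx_pixels plays the role of CX on line 100; shrink is the .8 from line 101.
    """
    sx, sz = 144 * shrink, 64 * shrink
    fx, fz = sx / 64, sz / 64
    centre = cx_pixels * 0.5                      # line 140: CX=CX*0.5
    lo = hi = centre
    for zi in range(64, -65, -1):
        zs = (zi * fx) ** 2
        # Guard against a tiny negative value from float round-off at zi = +/-64.
        xl = int(math.sqrt(max(sx * sx - zs, 0.0)) + 0.5)
        zx = zi * fz + centre                     # line 190
        lo = min(lo, zx - xl)
        hi = max(hi, zx + xl)
    return lo, hi

extent_320 = x_extent(320)   # CX as posted
extent_256 = x_extent(256)   # CX as suggested above
```

With CX left at 320 the hat is centred on column 160 and its right edge runs past column 255, while with CX set to 256 the whole range stays inside a 256-pixel-wide screen.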
JamesD Posted March 2, 2015 (edited March 2, 2015)

Quote (bfollett): "Quick observation. You left the variable CX on line 100 as 320 instead of changing it to 256. Bob"

Du-OH!

*edit* I did the same on the changes to run it on a CoCo over in the Atari thread.