JamesD Posted March 15, 2015

"The multiply by .8 is an extra step that is required in the TI code that is not required in the Atari code. In addition the TI had the additional overhead of writing its video via a port, which is on the upper 8 bits of the data bus only. Additional overhead that the Atari version doesn't have."

Again with the .8 extra step. If the TI assembly uses an extra step to multiply by .8, it's his choice. This is NOT required with the proper algorithm. Look at the BASIC code in post #186. Not one multiply by .8. The extra multiply does not account for the difference in the image anyway.
+OLD CS1 Posted March 15, 2015

WTF just happened?
JamesD Posted March 15, 2015

"WTF just happened?"

The usual. Computer wars and bragging rights.
Tursi Posted March 16, 2015

I suspect it's throwing crumbs over the bridge, but here's a 320x192 version. Because the display does not have 320 horizontal pixels, the hardware wraps larger coordinates around. (The code is making no effort to adjust for that. Please read the code before you critique.)

As you can see, the scaling /does/ account for the differences in the image. Unless you can't, in which case I can't help you anyway.

* runs for 320x192, ignoring the physical size of the screen.
* because if we make it fit, we are accused of being unfair
* to the Atari 8-bit. And this is a big deal to some people.
* Sure a good thing we all do this for FUN.
*
* this version copies only sqrt and plot into scratchpad
*
* Fixed point of 3.13 in most cases
* Port from DMSC's Atari800XL code
* Port to TI-99/4A 9900 and additional opts by Tursi
* Assemble with something that likes labels > 6 chars

        DEF START

SQRT    EQU >8330
PLOT    EQU >8368

* Constants (INTEGER) - TI SX = 320, SY=192
cx      equ 160         sx / 2
cy      equ 90          sy * 15 / 32
fy      equ 56          sy * 7 / 24

* because we aren't allowed to draw to less than 320 X pixels,
* we don't need to scale anything, we can use inc/dec on X and Y both

* these constants replace the squaring with simple counting loops
initZt2 equ >2000       TOFP(1.0)
deltaZt equ >0002       TOFP(1.0/(64.0*64.0))
initAZt equ >ff02       deltaZt - TOFP(2.0 / 64.0)
stepZt2 equ >0004       2 * deltaZt

* The values of delta² are too small for 3.13 bits of fixed-point, so we
* use 8 more bits, for an 3.21 bits of precision. This is not slow because
* we only use additions.
deltaXf equ >0065       TOFP( 20.0*20.0 / (9.0*9.0*sx*sx) * 256 ) - the *256 adds 8 bits of zeros
initAXf equ >0065       deltaXf
stepXf2 equ >00CA       2 * deltaXf

* variables in registers
vdpadr  equ 15
zt2     EQU 14
azt     EQU 13
xs      EQU 12
RET     EQU 11          * for BL
ys      EQU 10
zi      EQU 9
x1      equ 8
x2      equ 7
axf     equ 6
xf2adr  equ 5
tmp     equ 4
z1      equ 3
y1      equ 2
r1      equ 1
r0      equ 0

* for plot
tmp1    equ 4
tmp2    equ 3

* this one is easier to store in memory
* but we'll stick in scratchpad after the regs
* note we reserve 4 to preserve alignment, but
* it is big endian, and left aligned (4th byte is unused)
* accessed as *xf2adr for first word, and @xf2+2 for second
xf2     EQU >8320       * 3 bytes long

START   LWPI >8300

* frequently used VDP address
        li vdpadr,>8c02

* backup scratchpad
        li r0,>8320
        li r1,scratch
        li r2,112
scrblp  mov *r0+,*r1+
        dec r2
        jne scrblp

* copy square root to scratchpad
        li r0,SQRTX
        li r1,SQRT
sqcplp  MOV *R0+,*R1+
        CI R0,ENDX
        JNE sqcplp

* set up graphics and sine table
        BL @BITMAP
        BL @initsine

* clear out the oldY table (entries of 192)
        li r0,oldY
        li r1,>C0C0
        li r2,160
rlp     mov r1,*r0+
        dec r2
        jne rlp

* erase the pattern table
        LI R0,>4000     write address >0000
        CLR R1
        LI R2,>1800
        BL @VDPFILL

* set the color table to white on black
        LI R0,>6000     write address >2000
        LI R1,>F100
        LI R2,>1800
        BL @VDPFILL

* 24-bit variable - accessed by address
        li xf2adr,xf2

* init defaults
        li zt2,initZt2
        li azt,initAZt
        li xs,>003f     start centered
        li ys,>003f
        li zi,128

* ; Outer for loop:
loopZi
* x1 = xs + cx; // SIGNED
        mov xs,x1
        ai x1,cx
* x2 = x1;
        mov x1,x2
* ** only xf2 needs 24 bits **
* xf2 = 0;
        clr *xf2adr
        clr @xf2+2      * sorry for this awful syntax. No, not really.
* axf = initAXf;
        li axf,initAXf

* inner for loop
loopXi
* di = sqrti( (xf2>>8) + zt2 );
        mov *xf2adr,r0  * two MS bytes here, LSB in next byte, so no shift needed
        a zt2,r0

* if( di >= 0x2000 ) break; (inner loop - moved from the end, no visible difference but frees di)
* sqrt returns >2000 if the input is >= 0x2000, so we can just test here and save a step
        ci r0,>2000
        jhe lXiEnd

        bl @sqrt        * return in r0

* we need higher resolution numbers to combine the two multiplies - ratio is *768.1216,
* but the whole fraction is important, rounding is VERY visible.
* tmp = (0x3244 * di) >> 13; // * (PI/2) - 3.13 * 3.13 = 6.26, shift and truncate
        li tmp,>3244
        mpy tmp,r0      * output in r0,r1
        srl r1,13
        sla r0,3
        soc r1,r0       * merge the two words back to 3.13 (not saved to tmp yet)

* tmp = (tmp*489)>>13; * multiplier to go from fixed point max range (1.57) to 3/4 circle (768)
        li tmp,489
        mpy tmp,r0      * output in r0,r1
        srl r1,12       * not shifting all the way to get a multiply by 2 for the index
        sla r0,4
        soc r0,r1       * merge the two words back to 3.13 (not saved to tmp yet)

* z1 = sinetab[tmp];
        mov @sinetab(r1),z1

* tmp = tmp + tmp + tmp;
        mov r1,tmp
        a r1,tmp
        a r1,tmp

* z1 += sinefour[tmp&0x3ff];
        andi tmp,>07fe
*       a @sinefour(tmp),z1     * delayed till below

* tmp = ((z1 * fy) >> 13); // SIGNED
        li r0,fy
        a @sinefour(tmp),z1     * moved from above so we can add and test in one step
        jlt iisneg

        mpy z1,r0
        srl r1,13
        sla r0,3
        soc r0,r1       * merge the two words back to 3.13 (not saved to tmp yet)
        jmp idone

iisneg  neg z1
        mpy z1,r0       * 3.13 x 16.0 = 19.13, LSW is already correct
        srl r1,13
        sla r0,3
        soc r0,r1       * merge the two words back to 3.13 (not saved to tmp yet)
        neg r1

idone
* y1 = ys + cy - tmp;
        mov ys,y1
        ai y1,cy
        s r1,y1

* if( oldY[x1] > y1 )
        c @oldY(x1),y1
        jl noplot1
* oldY[x1] = y1;
        mov y1,@oldY(x1)
* dc->SetPixel(x1, y1, RGB(0,0,0));
        mov x1,r1
        mov y1,r0
        bl @plot

noplot1
* if( oldY[x2] > y1 )
        c @oldY(x2),y1
        jl noplot2
* oldY[x2] = y1;
        mov y1,@oldY(x2)
* dc->SetPixel(x2, y1, RGB(0,0,0));
        mov x2,r1
        mov y1,r0
        bl @plot
noplot2
* end of inner loop processing (normally after di, but it doesn't change)
* x1++, x2--, xf2 += axf, axf += stepXf2
        inc x1
        dec x2
* xf2 needs to be done bytewise, cause we have to split axf for it
        mov axf,r0
        swpb r0         * LSB in MSB position for LSB of xf2 (ab, no need to mask)
        ab r0,@xf2+2    * add the LSB
        jnc nocarry
        inc *xf2adr     * add in the carry to the MSW
nocarry andi r0,>00FF   * MSB in LSB position for MSW of xf2
        a r0,*xf2adr    * and add in the MSB
        ai axf,stepXf2
        jmp loopXi

lXiEnd
* outer loop end
* zt2 += azt, azt += stepZt2, --xs, --ys, zi-- (condition)
        a azt,zt2
        ai azt,stepZt2
        dec xs
        dec ys
        dec zi
        jne loopZi

* ; End of program
end
* restore scratchpad
        li r0,scratch
        li r1,>8320
        li r2,112
scrrlp  mov *r0+,*r1+
        dec r2
        jne scrrlp

waitlp  LWPI >83E0      * GPLWS
        BL @>000E       * SCAN (so you can cancel screen blank)
        LIMI 2
        LIMI 0
        JMP waitlp

*************************************************************************************
* utility code
*************************************************************************************

* VDP access
* Write R2 bytes from R1 to VDP R0
* Destroys R0,R1,R2
VDPFILL SWPB R0
        MOVB R0,*vdpadr
        SWPB R0
        MOVB R0,*vdpadr
VMBWLP  MOVB R1,@>8C00
        DEC R2
        JNE VMBWLP
        B *R11

* load regs list to VDP address, end on >0000 and write >D0 (for sprites)
* address of table in R1 (destroyed)
LOADRG
LOADLP  MOV *R1+,R0
        JEQ LDRDN
        SWPB R0
        MOVB R0,*vdpadr
        SWPB R0
        MOVB R0,*vdpadr
        JMP LOADLP
LDRDN   LI R1,>D000
        MOVB R1,@>8C00
        B *R11

* Setup for normal bitmap mode
BITMAP  MOV R11,@SAVE
* set display and disable sprites
        LI R1,BMREGS
        BL @LOADRG
* set up SIT - We load the standard 0-255, 3 times
        LI R0,>5800
        SWPB R0
        MOVB R0,*vdpadr
        SWPB R0
        MOVB R0,*vdpadr
        LI R2,3
        CLR R1
LP#     MOVB R1,@>8C00
        AI R1,>0100
        JNE LP#
        DEC R2
        JNE LP#
        MOV @SAVE,R11
        B *R11

* IN AND OUT IN R0
* fractions only > 0.999999 undefined
* adapted from dmsc's code
* R0 in = 3.13 signed fixed point
* Uses separate workspace - looks similar to following
* http://samples.sainsburysebooks.co.uk/9781483296692_sample_809121.pdf
* uses regs r0-r5 in new workspace
SQWP    EQU >8324       we need some workspace, this preserves calling regs
SQRTX   MOV R0,@SQWP
        LWPI SQWP       still have r0! (x)
        CLR r1          root (r)
        CLR r2          remHi (h) (r0 is remLo)
*       clr r4          (q) (doesn't need init, this line just for reference)
        SLA R0,3        lose the integer part
        LI r3,13        count = (7+FPSCALE/2) -> 7+6
SQRT0   sla r1,1        r = r<<1;
        mov r1,r4       q = h + (0xFFFF ^ r);
        inv r4
        a r2,r4
        jlt sqrt2       if( q >= 0 ) { r += 2; h = q; }
        inct r1
        mov r4,r2
sqrt2   sla r2,2        h = (h << 2) | (x>>14);
        mov r0,r5
        srl r5,14
        soc r5,r2
        sla r0,2        x <<= 2;
        DEC r3          while (--count != 0);
        JNE SQRT0
        MOV r1,@>8300   return r;
        LWPI >8300
        B *R11

* INPUT R1,R0 - kills TMP1,TMP2 as well
PLOTX
* use the E/A routine for address
        MOV R0,tmp1     R0 is the Y value.
        SLA tmp1,5
        SOC R0,tmp1
        ANDI tmp1,>FF07
        MOV R1,tmp2     R1 is the X value.
        ANDI tmp2,7
        A R1,tmp1       tmp1 is the byte offset.
        S tmp2,tmp1     tmp2 is the bit offset.
* inline VDP!
        SWPB tmp1       set up read address
        MOVB tmp1,*vdpadr
        SWPB tmp1
        MOVB tmp1,*vdpadr
        ORI tmp1,>4000  we need this later, and provides a VDP delay
        MOVB @>8800,R1  read the byte from VDP
        SWPB tmp1       set up write address
        MOVB tmp1,*vdpadr
        SWPB tmp1
        MOVB tmp1,*vdpadr
        SOCB @BITS(tmp2),R1     or the bit and provide VDP delay
        MOVB R1,@>8C00  write the byte back
        B *R11
ENDX

* init the sine tables
* r1 - temp for reflected offset (0-510)
* r2 - add value
* r3 - current output value
* r4 - current change table entry
* r5 - table output offset (0-510)
* r6 - temp for negative output value
* r7,r8 - temp for x0.4 output
* r9 - loop counter
initSine
        mov r11,@SAVE   * need this to get home!
        li r2,54        * starting value
        clr r9
        clr r3
nextbyte
        clr r4
        movb @genTable(r9),r4
* we don't have a stack, easier to do it inline
        bl @genOne
        bl @genOne
        bl @genOne
        bl @genOne
        bl @genOne
        bl @genOne
        bl @genOne
        bl @genOne
* what we DO have is lots of registers
        inc r9
        ci r9,32
        jne nextbyte
        li r3,>2000
        bl @genone
        mov @SAVE,r11
        B *R11

* set all four points on the curve, and load both tables
genOne  li r1,512
        s r5,r1         * reflection offset
        mov r3,r6
        neg r6          * negative version
        mov r3,@sinetab(r5)
        mov r3,@sinetab+512(r1)
        mov r6,@sinetab+1024(r5)
        mov r6,@sinetab+1536(r1)
* make the *0.4 version (r3 is always positive here)
        li r6,>0ccd     * 0.4 in 3.13
        mov r3,r7
        mpy r6,r7
        srl r8,13       * shift fraction
        sla r7,3        * shift int
        soc r7,r8       * make 3.13
gdone   mov r8,r6
        neg r6
        mov r8,@sinefour(r5)
        mov r8,@sinefour+512(r1)
        mov r6,@sinefour+1024(r5)
        mov r6,@sinefour+1536(r1)
        inct r5
        a r2,r3
* ; Read bit, test if sum must be decreased
        sla r4,1
        jnc nodec
        dec r2
nodec   B *R11

* bits for pixel
BITS    DATA >8040,>2010,>0804,>0201

* registers for bitmap (and 5A00 is the address of the sprite table)
* background is transparent (the only color never redefined)
* PDT - >0000
* SIT - >1800
* SDT - >1800
* CT - >2000
* SAL - >1B00
BMREGS  DATA >81E0,>8002,>8206,>83ff,>8403,>8536,>8603,>8700,>5B00,>0000

* data for sine generation
genTable
        data >f000,>0200,>0100,>1004,>0404,>0820,>8210,>4221
        data >0888,>4444,>4488,>8912,>2448,>9224,>8922,>4912

* BSS section
* spot to save return addresses
SAVE    bss 2
* spot to store the sine table (full 1024 entries)
sinetab bss 2048
* sine divided by 4, to remove a multiply inline
sinefour bss 2048
* row table for hidden surface (one word per column)
oldY    bss 640
scratch bss 256-32
        END

Only changes to the base code were the x-size defines. This pasted code uses a slightly different scratchpad helper (only plot and sqrt), but only sqrt made much difference in scratchpad. Also optimized the end of the loop some, again, noted no visible difference in execution time.
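A side note for anyone checking the constants in the listing: "3.13" means 3 integer bits and 13 fractional bits, so a value x is stored as roughly x * 8192 in a 16-bit word. The Python sketch below is only an illustration under that reading. The helper names `to_fp` and `fp_mul` are made up here, and round-to-nearest for TOFP is an assumption (the listing does not show how TOFP is defined). Under those assumptions it reproduces several of the equates and literals above.

```python
import math

FRAC_BITS = 13            # "3.13": 3 integer bits, 13 fractional bits
ONE = 1 << FRAC_BITS      # 8192, written >2000 in the listing

def to_fp(x):
    """Float to 3.13 fixed point, wrapped to a 16-bit word (rounding is assumed)."""
    return round(x * ONE) & 0xFFFF

def fp_mul(a, b):
    """Multiply two non-negative 3.13 values, truncating the low 13 bits.

    This mirrors the srl/sla/soc merge that follows each MPY in the listing.
    """
    return ((a * b) >> FRAC_BITS) & 0xFFFF

print(hex(to_fp(1.0)))                                   # initZt2  -> 0x2000
print(hex(to_fp(1.0 / (64.0 * 64.0))))                   # deltaZt  -> 0x2
print(hex((to_fp(1.0 / 4096) - to_fp(2.0 / 64.0)) & 0xFFFF))  # initAZt -> 0xff02
print(hex(to_fp(math.pi / 2)))                           # PI/2     -> 0x3244
print(hex(to_fp(0.4)))                                   # 0.4      -> 0xccd
# deltaXf carries 8 extra fractional bits, hence the * 256 (sx = 320)
print(hex(to_fp(20.0 * 20.0 / (9.0 * 9.0 * 320 * 320) * 256)))  # -> 0x65
```

The same `fp_mul` shape is what the two MPY sequences in the inner loop compute: a 32-bit product whose bits 13 to 28 are kept as the new 3.13 value.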
Runtimes:
Full 8-bit RAM - 24 s
Scratchpad assist - 20 s
16-bit RAM - 17 s (yeah, I still had to get one in there)
All versions - offline buffer saves 1 second, just like before.

So it appears that the extra pixels do make a difference, largely because I had discounted deltaXf, which is based on the screen width and is related to the inner loop limit.

So now that all is right with the world and the 8-bitter is faster, everyone is happy again, and we can go back to playing, right?
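For readers wondering how "these constants replace the squaring with simple counting loops" works: the outer loop never multiplies to get zt squared. It keeps a running square and a running difference and only adds. The Python sketch below is a hedged illustration of that idea using the initZt2, initAZt and stepZt2 values from the listing. The function name `squares_by_addition` is made up, and the reading that zt runs from -1.0 upward in steps of 1/64 is inferred from the TOFP comments, not stated in the post.

```python
def squares_by_addition(steps=128):
    """Yield zt^2 in 3.13 fixed point for zt = -1 + k/64, using additions only."""
    zt2 = 0x2000            # initZt2: TOFP(1.0), i.e. (-1.0)^2
    azt = 0x0002 - 0x0100   # initAZt: deltaZt - TOFP(2/64) = -254 (>FF02 as a 16-bit word)
    step = 0x0004           # stepZt2: 2 * deltaZt
    values = []
    for _ in range(steps):
        values.append(zt2)  # value used by the inner loop on this pass
        zt2 += azt          # a  azt,zt2
        azt += step         # ai azt,stepZt2
    return values

# Direct computation for comparison: ((k - 64) / 64)^2 * 8192 == 2 * (k - 64)^2
direct = [2 * (k - 64) ** 2 for k in range(128)]
print(squares_by_addition() == direct)   # True
```

Because 1/4096 and 2/64 are both exact in 3.13, the additions track the true squares with no drift here. The xf2 accumulator uses the same trick, but its deltas are too small for 3.13, which is why the listing gives it 8 extra fractional bits.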
JamesD Posted March 17, 2015 (edited)

"As you can see, the scaling /does/ account for the differences in the image. Unless you can't, in which case I can't help you anyway."

"Only changes to the base code were the x-size defines. This pasted code uses a slightly different scratchpad helper (only plot and sqrt), but only sqrt made much difference in scratchpad. Also optimized the end of the loop some, again, noted no visible difference in execution time. Runtimes: Full 8-bit RAM - 24 s Scratchpad assist - 20 s 16-bit RAM - 17 s (yeah, I still had to get one in there) All versions - offline buffer saves 1 second, just like before. So it appears that the extra pixels do make a difference, largely because I had discounted deltaXf, which is based on the screen width and is related to the inner loop limit. So now that all is right with the world and the 8-bitter is faster, everyone is happy again, and we can go back to playing, right?"

There are certainly 16 bit upgrades for the TI so I see no problem with the 16 bit version. The curves on the front of the image still don't match those of other versions. If you can't see that you are in denial. Is there an error in a table or just one additional optimization?

http://atariage.com/forums/topic/215138-bitmap-mode/?p=3183327

Edited March 17, 2015 by JamesD
Tursi Posted March 17, 2015

I'm not in denial, I just don't understand why you're so intent on turning something fun into a battle. Certainly the last one I will participate in.
+OLD CS1 Posted March 17, 2015

"I'm not in denial, I just don't understand why you're so intent on turning something fun into a battle. Certainly the last one I will participate in."

Others appreciate your efforts. Grain of salt, and all that. I carry a small amount of guilt on the matter: I did mention giving the TI a competitive advantage over the Atari on the topic. As far as this one goes, I have read similar threads with back-and-forths about fairness of comparing platforms in demo compos, so I would expect it to go with the territory any time multiple platforms are involved.
JamesD Posted March 17, 2015

"I'm not in denial, I just don't understand why you're so intent on turning something fun into a battle. Certainly the last one I will participate in."

I'm pointing out a fact and you are calling it a battle? It's not a battle. I'm not saying your work sucks. In fact, I think you did an awesome job. I'm just saying it doesn't generate the same image so you can't compare it to another program that does generate the same image for bragging rights on speed. At least not without acknowledging the difference in the results.

If you compare two versions with floating point that generate the same image, I'm fine with that. If you compare two versions with table lookups that generate the same image, I'm fine with that. If you compare a version with lookup tables versus a floating point version that generate the same image, I'm fine with that as well. But if you want to compare speed between the two groups when the results are clearly different, that's where I have a problem. Even on the same platform. Hence, my statement you are comparing Apples and Oranges.

FWIW, someone over in the thread in the Atari area pointed out the assembly Atari version also generates a different image. I'm guessing the Atari image someone posted that looked the same was probably from running an old version by accident. Check out the comparison here: http://manillismo.blogspot.com/2015/03/fedora-hat-diferencias.html

A pixel here and there is one thing but that's pretty significant and you'll see close to the same difference on the TI between float and non float versions. When it appeared as though the Atari generated the same image as the floating point version... I thought the difference on the TI version should be pointed out. Since the Atari doesn't generate the same image as the original either I'd say your assembly version is a fair comparison.
The TI and Atari assembly versions are close to the same size and speed even though the TI requires some overhead for the VDP.
Retrospect Posted March 18, 2015

The hat thing... there's a program for the Powertran Cortex computer with the hat rendering. I don't know how long that takes though, as I've not tried it (don't have a Cortex and won't use the emulator). The Powertran Cortex was a British kit computer; the idea was brought to us by three engineers at Texas Instruments, but they couldn't market it as a full computer. Point is, it uses a 12 MHz 9900-family CPU, but as far as I know the VDP is somewhat cut down? Be interesting to see how long it takes to render compared to the 99 and Atari.
Stuart Posted March 18, 2015

"The hat thing... there's a program for the Powertran Cortex computer with the hat rendering. I don't know how long that takes though, as I've not tried it (don't have a Cortex and won't use the emulator). The Powertran Cortex was a British kit computer; the idea was brought to us by three engineers at Texas Instruments, but they couldn't market it as a full computer. Point is, it uses a 12 MHz 9900-family CPU, but as far as I know the VDP is somewhat cut down? Be interesting to see how long it takes to render compared to the 99 and Atari."

Take a look at post #163.
Stuart Posted March 18, 2015

"The Powertran Cortex was a British kit computer; the idea was brought to us by three engineers at Texas Instruments, but they couldn't market it as a full computer."

When you say "The idea was brought to us by ..." does that imply you worked for Powertran at the time?
Retrospect Posted March 18, 2015 (edited)

"When you say "The idea was brought to us by ..." does that imply you worked for Powertran at the time?"

No, I meant *us* as in the public, the consumer... I have never worked for Powertran, unfortunately. It was advertised, and maybe even sold in kit form, in an electronics magazine in 1982.

Edited March 18, 2015 by Retrospect
JamesD Posted March 18, 2015 (edited)

"The hat thing... there's a program for the Powertran Cortex computer with the hat rendering. I don't know how long that takes though, as I've not tried it (don't have a Cortex and won't use the emulator). The Powertran Cortex was a British kit computer; the idea was brought to us by three engineers at Texas Instruments, but they couldn't market it as a full computer. Point is, it uses a 12 MHz 9900-family CPU, but as far as I know the VDP is somewhat cut down? Be interesting to see how long it takes to render compared to the 99 and Atari."

"Take a look at post #163."

Based on that comparison, the Powertran is about twice as fast as the TI and the breadboard system is over five times as fast as the TI. If my math is correct (New TI time / x = old TI-99 time / other machine time), using the optimization I added after that test run was made, the Powertran should be able to run it in about 0:27:30. The breadboard machine should be able to do it in around 0:09:25.

Those are just estimates, but they should be close unless you have to insert delays for the VDP.

*edit* The assembly version is tough to estimate, but I would expect in the neighborhood of 4 seconds for the breadboard, +- a second.

Edited March 18, 2015 by JamesD
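The proportion in that estimate (New TI time / x = old TI-99 time / other machine time) can be solved for x directly. A minimal Python sketch follows. The function name `scaled_estimate` and the sample timings are placeholders invented for illustration; they are not the figures from post #163, and the sketch assumes, as the post does, that an optimization speeds up every machine by the same ratio.

```python
def scaled_estimate(new_ti, old_ti, other_old):
    """Solve new_ti / x = old_ti / other_old for x.

    All three inputs must be in the same unit (seconds here).
    """
    return new_ti * other_old / old_ti

# Placeholder timings in seconds, NOT measurements from the thread:
old_ti = 120.0      # TI-99 before the optimization
other_old = 60.0    # a machine measured as twice as fast on the old code
new_ti = 50.0       # TI-99 after the optimization

print(scaled_estimate(new_ti, old_ti, other_old))   # 25.0
```

As the post itself notes, this kind of scaling breaks down if the faster machine has to insert VDP delays, since that overhead does not shrink with CPU speed.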
+Vorticon Posted March 20, 2015

King of the hill. Enough said...

http://www.ebay.com/itm/1981-Commodore-CBM-8032-computer-photo-vintage-print-ad-/361246688768?&_trksid=p2056016.l4276
+OLD CS1 Posted March 20, 2015

I want to take that program and run it on a 128 in 80 column mode. Means I have to hook it up this weekend, I suppose.
Retrospect Posted March 20, 2015

"King of the hill. Enough said... http://www.ebay.com/itm/1981-Commodore-CBM-8032-computer-photo-vintage-print-ad-/361246688768?&_trksid=p2056016.l4276"

Mmmmm... I see a hat on a PET... that was false advertising, surely? I had no idea a PET could render anything other than a screen of text.
+Vorticon Posted March 20, 2015

I think this was an advertisement for a graphics board for the PET. If you look at the code listing, there are graphic commands not native to the PET. So technically it's cheating.
sometimes99er Posted March 20, 2015

http://oldcomputers.net/pet4032.html

Commodore also released the CBM 8032 at about the same time as the PET 4032. It is similar, but displays 80 characters per line of text.