Graphics 8 Fedora Hat

JamesD · March 4, 2015

I made the following changes to get it to run on an Apple IIe emulator.

The scale probably needs further adjustment to be exact, but it runs.
It takes over 32 minutes to complete the image at standard speed.
The Coleco ADAM would require similar changes but with different CX and SX values.

100 SX=120:SY=56:SZ=64:CX=280:CY=192

130 HGR2:COLOR 3

240     IF RR(X1)>Y1 THEN RR(X1)=Y1:HPLOT X1,Y1

260     IF RR(X1)>Y1 THEN RR(X1)=Y1:HPLOT X1,Y1

(Screenshot taken with TV emulation enabled on the emulator.)

JamesD · March 5, 2015

MSX changes.
The value of SX may need to be reduced.
I tried testing it on an emulator, but the keyboard emulation kept screwing up my typing, so I said the heck with it.

100 SX=115:SY=56:SZ=64:CX=256:CY=192

130 SCREEN 2

240     IF RR(X1)>Y1 THEN RR(X1)=Y1:PSET(X1,Y1)

260     IF RR(X1)>Y1 THEN RR(X1)=Y1:PSET(X1,Y1)

Britishcar · March 5, 2015

These mods are fun. It's interesting to see the increased speed via optimization, etc.

JamesD, I think there's a typo in your A2E code:

Line:

130 HGR2:COLOR 3

Should be:

130 HGR2:HCOLOR=3

Does anyone have a TI-99/4a Extended BASIC version worked out?

JamesD · March 5, 2015


You are correct. I couldn't cut and paste from the emulator, so I had to retype the lines and messed up.

TI-99/4a BASIC doesn't have support for bitmap-oriented commands, and neither does TI Extended BASIC.

It would take a lengthy subroutine to implement SET(X1,Y1).

It would have to decide which screen character needs to be modified, then read that character, modify the proper byte and write it back.

Expect run times to be many hours *if* you can even modify a character like that.

You end up dividing X1 by the character width and Y1 by the character height, getting the character, figuring out which row of the character the bit is on, and then ORing the specific bit into that row of the character.

Keep in mind that the bit number produced by this math is the opposite of the bit order in the character, so you need an IF THEN or ON GOTO setup with 8 possibilities.

Now write the character back.
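The steps above can be sketched in Python. This is only a model, and it rests on several assumptions. The screen is 32x24 cells of 8x8 characters and starts out filled with character 32. Codes 33 to 159 are free for redefinition, which is the range Tursi's TI BASIC listing further down the thread uses. CharScreen and its names are hypothetical. Each touched cell gets a character of its own, so drawing into one blank cell doesn't change every other blank cell. The 0x80 >> bit mask replaces the 8-way IF THEN / ON GOTO that handles the reversed bit order.

```python
BLANK = 32        # assumed: code of the character that fills an empty screen
FIRST_FREE = 33   # first redefinable character code (as in Tursi's listing)
LAST_FREE = 159   # last redefinable character code (as in Tursi's listing)


class CharScreen:
    """Hypothetical model of a 32x24 text screen made of 8x8 characters."""

    def __init__(self, rows=24, cols=32):
        self.screen = [[BLANK] * cols for _ in range(rows)]  # character codes
        self.patterns = {BLANK: [0] * 8}                     # 8 bytes per character
        self.next_free = FIRST_FREE

    def set_pixel(self, x1, y1):
        """Turn on pixel (x1, y1); return False when out of characters."""
        row, col = y1 // 8, x1 // 8        # DIV by character height and width
        ch = self.screen[row][col]         # read the character at that cell
        if ch == BLANK:
            if self.next_free > LAST_FREE:
                return False               # no characters left to redefine
            ch = self.next_free            # give the cell a character of its own
            self.next_free += 1
            self.patterns[ch] = [0] * 8
            self.screen[row][col] = ch
        line = y1 % 8                      # which byte (row) of the character
        bit = x1 % 8                       # 0 = leftmost pixel
        self.patterns[ch][line] |= 0x80 >> bit  # leftmost pixel is the MSB
        return True
```

For example, after set_pixel(9, 3) and set_pixel(10, 3) the second cell on the top row holds character 33, and row 3 of that character is 0x60.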

I don't even know if modifying the character that way is possible.

The code isn't so much complicated as it is lengthy. SET(X1,Y1) is probably as long or longer than the code that would call it.

That's why people used The Missing Link.

I found the manual here:

ftp://ftp.whtech.com/programming/The%20Missing%20Link%20software%20manual.pdf

Super Extended BASIC appears to have commands that would do the trick as well.

I'm not sure the code I posted in the TI area needs the call to "PD" before "PIXEL".

Edited March 5, 2015 by JamesD

JamesD · March 5, 2015

You end up DIV x1 by character width and y1 by character height, get the character, figure out which row of the character the bit is on and then or the specific bit with that row of the character.

Actually, for the required graphics mode, rather than getting the character number, you calculate the address of that character, add the offset to the byte, and then PEEK it from video RAM (VPEEK?). Then you can OR the byte with the bit and write it back with a POKE. Whatever BASIC you use will need direct support for the graphics mode or the ability to send commands to the VDP manually.

Given decent documentation I could figure it out but I'm guessing someone has previously done it.
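For the bitmap mode, the address math can be sketched in Python. The sketch assumes the TMS9918A bitmap (Graphics II) layout, with the name table loaded with 0-255 three times and the pattern table at >0000. That matches how Tursi's assembly listing later in the thread sets things up. A bytearray stands in for video RAM, and the function names are hypothetical. On the real machine, the read and the write go through the VDP ports.

```python
PATTERN_TABLE = 0x0000  # assumed pattern table address (as in Tursi's BMREGS)
BIT_MASKS = (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01)  # bit 0 = leftmost


def pixel_address(x1, y1):
    """VRAM byte holding pixel (x1, y1) in a 256x192 bitmap of 8x8 cells."""
    cell_row = y1 // 8   # 32 cells * 8 bytes = 256 bytes per character row
    cell_col = x1 // 8   # 8 bytes per cell
    return PATTERN_TABLE + cell_row * 256 + cell_col * 8 + (y1 % 8)


def plot(vram, x1, y1):
    """Read-modify-write one pixel: the VPEEK / OR / POKE sequence."""
    addr = pixel_address(x1, y1)
    vram[addr] |= BIT_MASKS[x1 % 8]
```

With vram = bytearray(0x1800), plot(vram, 9, 3) sets byte 11 to 0x40, and the bottom-right pixel (255, 191) lives at >17FF, the last byte of the 6 KB pattern table.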

JamesD · March 6, 2015

This is the TI-99/4a BASIC version.
It requires Extended BASIC and a third party BASIC extension called The Missing Link that adds bitmapped graphics support.
I didn't bother completely figuring out the proper scale. The reduced screen width (240) is needed because of how the TI VDP RAM is used.
DIM does not appear to support a variable as a parameter.
The CALL LINK commands are making calls to The Missing Link.

100 SX=100::SY=56::SZ=64::CX=240::CY=192
110 DIM RR(256)
120 FOR I=0 TO CX::RR(I)=CY::NEXT I
130 CALL LINK("CLEAR")
140 CX=CX*0.5::CY=CY*0.46875::FX=SX/64::FZ=SZ/64
150 XF=4.71238905/SX
160 FOR ZI=64 TO -64 STEP -1
170   ZT=ZI*FX::ZS=ZT*ZT
180   XL=INT(SQR(SX*SX-ZS)+0.5)
190   ZX=ZI*FZ+CX::ZY=CY+ZI*FZ
200   FOR XI=0 TO XL
210     XT=SQR(XI*XI+ZS)*XF
220     YY=(SIN(XT)+SIN(XT*3)*0.4)*SY
230     X1=XI+ZX::Y1=ZY-YY
240     IF RR(X1)>Y1 THEN RR(X1)=Y1:: CALL LINK("PIXEL",Y1,X1)
250     X1=ZX-XI
260     IF RR(X1)>Y1 THEN RR(X1)=Y1:: CALL LINK("PIXEL",Y1,X1)
270   NEXT XI::NEXT ZI
280 GOTO 280
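Here is a rough Python port of the scalable hidden-line version above, for anyone following along without an emulator. It is the SX/SY/SZ/CX/CY variant, but it uses the Atari defaults from dmsc's listing later in the thread rather than the TI values. It only collects the pixels that would be plotted, and the function name is made up. RR() acts as a "floating horizon": slices are drawn from the front (ZI=64) to the back, and a later point is plotted only if it rises above everything already drawn in its column. The column is taken with int(), which is exact here only because X1 is a whole number for these defaults.

```python
import math


def fedora_points(sx=144, sy=56, sz=64, cx=320, cy=192):
    """Pixels the hidden-line version would PLOT, in drawing order (a sketch)."""
    rr = [cy] * (cx + 1)             # RR(): highest Y drawn so far in each column
    mid_x, mid_y = cx * 0.5, cy * 0.46875
    fx, fz = sx / 64, sz / 64
    xf = 4.71238905 / sx
    points = []
    for zi in range(64, -65, -1):    # FOR ZI=64 TO -64 STEP -1 (front slice first)
        zt = zi * fx
        zs = zt * zt
        xl = int(math.sqrt(sx * sx - zs) + 0.5)
        zx = zi * fz + mid_x
        zy = mid_y + zi * fz
        for xi in range(xl + 1):
            xt = math.sqrt(xi * xi + zs) * xf
            yy = (math.sin(xt) + math.sin(xt * 3) * 0.4) * sy
            y1 = zy - yy
            for x1 in (xi + zx, zx - xi):  # lines 240 and 260: both halves
                col = int(x1)              # whole number with these defaults
                if rr[col] > y1:           # only plot if above everything so far
                    rr[col] = y1
                    points.append((col, y1))
    return points
```

In each column, the Y values returned always decrease, which is exactly the hidden-line rule.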

JamesD · March 6, 2015

Commodore Plus/4 changes

130 COLOR 0,1:COLOR 1,2:GRAPHIC 1,1
240     IF RR(X1)>Y1 THEN RR(X1)=Y1:DRAW 1,X1,Y1
260     IF RR(X1)>Y1 THEN RR(X1)=Y1:DRAW 1,X1,Y1

Edited March 6, 2015 by JamesD

JamesD · March 6, 2015

Keep in mind that I've been doing just enough to get things running; I'm not worried about the code being perfect.

devwebcl · March 6, 2015

These mods are fun. It's interesting to see the increased speed via optimization, etc.

I agree 100%

JamesD · March 7, 2015

The Coleco ADAM requires changes similar to the Apple II's, since its BASIC is based on Applesoft II, but its screen size is the same as the CoCo 1/2's.
Output looks almost identical to the CoCo 1/2 with this color choice, but the 9918 has more colors to choose from.
Any difference with this color choice would be due to differences in the floating point library.

101 SX=.8*SX:SY=.8*SY:SZ=.8*SZ
 
130 HGR2:COLOR=3

240 IF RR(X1)>Y1 THEN RR(X1)=Y1:HPLOT X1,Y1

260 IF RR(X1)>Y1 THEN RR(X1)=Y1:HPLOT X1,Y1

*edit*
It took a little over 13 minutes to render in MESS, but I don't know how accurate the ADAM MESS driver is.
Other benchmarks have shown ADAM BASIC to be fast, so the result might be accurate, but this machine driver supposedly has problems, so it might be a little off.

Edited March 7, 2015 by JamesD

Tursi · March 8, 2015

Sorry for peeking in, I was curious how fast you guys had the Atari 8-bit version running.

Hope you don't mind if I cross-post too, then.

I think I have the record in the TI forum for both the slowest and the fastest port. I have a version that runs in pure TI BASIC (redefining characters much as JamesD describes above, though the routine isn't all that long). I didn't let it run to completion, and it had to be scaled down to draw most of the image, as only 127 characters are available to redefine, but based on how far it got, my rough estimate was around 45 hours to complete. A later optimized version, which has the drawing enhancements and also lets you scale down both the size of the output and the step of the loops, was able to draw the whole hat at 0.4 scale, but that's not a fair test since it also ran only 40% of the loops.

9 DIM COLS(255)
10 DIM CC$(126),HPAT$(3),UPAT$(3)
11 FOR A=0 TO 126
12 CC$(A)="0000000000000000"
13 NEXT A
15 OLDMR=-1
17 OLDMC=-1
19 OLDCH=33
20 CURCHAR=33
25 STARTCHAR=33
30 HEX$="0123456789ABCDEF"
32 HPAT$(0)="89ABCDEF89ABCDEF"
34 HPAT$(1)="45674567CDEFCDEF"
36 HPAT$(2)="23236767ABABEFEF"
38 HPAT$(3)="1133557799BBDDFF"
50 GOTO 5000
99 REM     PLOT A DOT AT DOTCOL,DOTROW (0-BASED)  
100 MR=INT(DOTROW/8)+1
130 MC=INT(DOTCOL/8)+1
135 IF (MR=OLDMR)*(MC=OLDMC)THEN 150
140 CALL GCHAR(MR,MC,CH)
145 OLDMR=MR
146 OLDMC=MC
150 IF CH>=STARTCHAR THEN 160
152 IF CURCHAR>159 THEN 310
155 CH=CURCHAR
156 CALL HCHAR(MR,MC,CH)
157 CURCHAR=CURCHAR+1
160 AROF=CH-STARTCHAR
170 TC$=CC$(AROF)
180 XC=DOTCOL-(MC-1)*8
190 P=(DOTROW-(MR-1)*8)*2+1
210 IF XC<4 THEN 220
212 P=P+1
214 XC=XC-4
220 X$=SEG$(TC$,P,1)
260 TT$=SEG$(HPAT$(XC),POS(HEX$,X$,1),1)
265 IF TT$=X$ THEN 300
270 TC$=SEG$(TC$,1,P-1)&TT$&SEG$(TC$,P+1,16-P)
280 IF CH=OLDCH THEN 290
285 CALL CHAR(OLDCH,CC$(OLDCH-STARTCHAR))
287 OLDCH=CH
290 CC$(AROF)=TC$
300 RETURN
305 REM    OUT OF INK!  
310 CALL SCREEN(7)
320 RETURN
4998 REM    MAIN CODE START - INIT THE DISPLAY  
5000 FOR A=1 TO 16
5010 CALL COLOR(A,16,2)
5020 NEXT A
5022 FOR A=0 TO 255
5024 COLS(A)=193
5026 NEXT A
5030 CALL CLEAR
5040 CALL SCREEN(16)
5050 INPUT "SCALE? (0.5 RECOMMENDED):":SCALE
5053 PRINT "STEP? (";INT((1/SCALE)*10)/10;"RECOMMENDED):";
5055 INPUT ST
5060 CALL CLEAR
5069 REM TIMING - REQUIRES A CLOCK DEVICE
5070 REM OPEN #1:"CLOCK"
5075 REM INPUT #1:A$,B$,SS$
5089 REM    DRAW A HAT! 
5090 XP=144
5091 XR=4.71238905
5092 XF=XR/XP
5100 FOR ZI=64 TO -64 STEP -ST
5110 ZT=ZI*2.25
5111 ZS=ZT*ZT
5120 XL=INT(SQR(20736-ZS)+0.5)
5130 FOR XI=0-XL TO XL STEP ST
5140 XT=SQR(XI*XI+ZS)*XF
5150 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
5160 DOTCOL=INT((XI+ZI)*SCALE+128+.5)
5161 IF (DOTCOL>255)+(DOTCOL<0)THEN 5190
5163 DOTROW=INT((96-YY+ZI)*SCALE+.5)
5164 IF (DOTROW>191)+(DOTROW<0)THEN 5190
5165 IF COLS(DOTCOL)<=DOTROW THEN 5190
5169 COLS(DOTCOL)=DOTROW
5170 GOSUB 100
5190 NEXT XI
5192 NEXT ZI
5199 REM    FINISHED - SIT FOREVER   
5200 REM INPUT #1:A$,B$,SE$
5201 CALL SCREEN(2)
5205 CALL CHAR(OLDCH,CC$(OLDCH-STARTCHAR))
5210 CALL KEY(0,K,S)
5220 IF S=0 THEN 5210
5230 PRINT SS$,SE$
5240 END

The assembly version completes the image in 26 seconds (emulated, should be close to right). It uses a 256-entry lookup table for sine and 9.7 fixed point numbers for most of the values. You can see that the limited accuracy affects the image in places, but it's pretty close. A 24-bit number with more fractional bits would probably be enough, but 16-bit is the native word size on the TI.
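The fixed-point and table-lookup ideas can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Tursi's code, and the helper names are made up. The table formula is an assumption: 128 times the sine of 256 equal steps, clamped to -128..127. It agrees with the DATA statements spot-checked in the listing below. fixed_mul multiplies magnitudes and negates afterwards, so it truncates toward zero the way the listing's SMULT routine does. fixed_sin rounds to the nearest entry, as the SRL/INC/ANDI sequence does.

```python
import math

FRAC_BITS = 7          # 9.7 fixed point: 7 fractional bits, so 1.0 == 128
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to 9.7 fixed point (rounded)."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """9.7 * 9.7 is an 18.14 product; drop 7 bits to get back to 9.7."""
    product = abs(a * b) >> FRAC_BITS          # truncate the magnitude
    return -product if (a < 0) != (b < 0) else product


# 256 steps per full circle; clamped to the range seen in the listing's table.
SINTAB = [max(-128, min(127, round(math.sin(2 * math.pi * i / 256) * ONE)))
          for i in range(256)]


def fixed_sin(angle):
    """Sine of a non-negative 9.7 angle measured in table steps."""
    index = ((angle + (ONE >> 1)) >> FRAC_BITS) & 0xFF   # round, then wrap
    return SINTAB[index]
```

For example, fixed_mul(to_fixed(1.5), to_fixed(2.0)) gives 384, which is 3.0 in 9.7 format.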

 DEF START
* THIS VERSION USES ALL THE OPTIMIZATIONS TO DATE.
 * PLUS SCRATCHPAD UTILITIES AND INLINE SINE LOOKUP
 * THANKS TO SOMETIMES99ER FOR WORKING OUT THE DATA!
* relocated to scratchpad - addresses worked
 * out by hand! Use caution when modifying them!
 SQRT EQU >8324
 PLOT EQU >8350
 SMULT EQU >838E
 DRAWPX EQU >83A8
 *FREE EQU >83F8 - only 8 bytes of scratchpad free!
* LABELS FOR SAVE UTILITY
 SLOAD
 SFIRST
  B @START
 
 * array for highest pixel
 ROWS
  BSS 256
 
 * backup for scratchpad, we're going to just
 * blindly decimate it. So we need to restore
 * it before we let the console interrupt run
 * at the end of execution. I could be picky,
 * selective, or careful, but this works too. 
 SCRATCH
  BSS 224
 
 * bits for pixel
 BITS
  DATA >8040,>2010,>0804,>0201
 
 * SINE TABLE - 9.7 fixed point entries, 256 total
 SINTAB
  DATA 0,3,6,9,13,16,19,22
  DATA 25,28,31,34,37,40,43,46
  DATA 49,52,55,58,60,63,66,68
  DATA 71,74,76,79,81,84,86,88
  DATA 91,93,95,97,99,101,103,105
  DATA 106,108,110,111,113,114,116,117
  DATA 118,119,121,122,122,123,124,125
  DATA 126,126,127,127,127,127,127,127
  DATA 127,127,127,127,127,127,127,126
  DATA 126,125,124,123,122,122,121,119
  DATA 118,117,116,114,113,111,110,108
  DATA 106,105,103,101,99,97,95,93
  DATA 91,88,86,84,81,79,76,74
  DATA 71,68,66,63,60,58,55,52
  DATA 49,46,43,40,37,34,31,28
  DATA 25,22,19,16,13,9,6,3
  DATA 0,-3,-6,-9,-13,-16,-19,-22
  DATA -25,-28,-31,-34,-37,-40,-43,-46
  DATA -49,-52,-55,-58,-60,-63,-66,-68
  DATA -71,-74,-76,-79,-81,-84,-86,-88
  DATA -91,-93,-95,-97,-99,-101,-103,-105
  DATA -106,-108,-110,-111,-113,-114,-116,-117
  DATA -118,-119,-121,-122,-122,-123,-124,-125
  DATA -126,-126,-127,-127,-127,-128,-128,-128
  DATA -128,-128,-128,-128,-127,-127,-127,-126
  DATA -126,-125,-124,-123,-122,-122,-121,-119
  DATA -118,-117,-116,-114,-113,-111,-110,-108
  DATA -106,-105,-103,-101,-99,-97,-95,-93
  DATA -91,-88,-86,-84,-81,-79,-76,-74
  DATA -71,-68,-66,-63,-60,-58,-55,-52
  DATA -49,-46,-43,-40,-37,-34,-31,-28
  DATA -25,-22,-19,-16,-13,-9,-6,-3
 
 * note: NOT in memory, so don't use @XF
 * 9.7 signed fixed point variables in registers
 XF EQU 15
 XT EQU 14
 YY EQU 13
* INTEGER VALUES
 ZS EQU 12
 * RET EQU 11 - for BL
 ZI EQU 10
 XL EQU 9
 XI EQU 8
* 32-bit temp, uses 6 and 7
 T32B EQU 7
 T32 EQU 6
* Temp vars
 T16 EQU 5
 T1 EQU 4
 T2 EQU 3
 NEGFL EQU 2
* PIXEL VARIABLES
 X1 EQU 1
 Y1 EQU 0
* out of registers, use RAM (these ARE @ZY)
 ZX EQU >8320
 ZY EQU >8322
* return save
 SAVE
  BSS 2
 
 * registers for bitmap (and 5B00 is the address of the sprite table)
 * background is transparent (the only color never redefined)
 * PDT - >0000
 * SIT - >1800
 * SDT - >1800
 * CT  - >2000
 * SAL - >1B00
 BMREGS DATA >81E0,>8002,>8206,>83ff,>8403,>8536,>8603,>8700,>5B00,>0000
 
 START
  LWPI >8300
* LOAD THE ROWS ARRAY WITH 192 ENTRIES
  LI R0,ROWS
  LI R1,192*256
  LI R2,256
 LP1
  MOVB R1,*R0+
  DEC R2
  JNE LP1
 
 * backup scratchpad
  LI R0,>8320   * skip our WP
  LI R1,SCRATCH
  LI R2,56   * 4 bytes at a time
 LS1
  MOV *R0+,*R1+
  MOV *R0+,*R1+
  DEC R2
  JNE LS1
 
 * now copy utilities in
  LI R0,SQRTX   * first function
  LI R1,>8324   * first free word
 LC1
  MOV *R0+,*R1+  * copy one word
  CI R0,SLAST   * check for done (thus no unroll)
  JL LC1
* 140 GRAPHICS 8+16:SETCOLOR 2,0,0
  BL @BITMAP
 * erase the pattern table
  CLR R0
  CLR R1
  LI R2,>1800
  BL @VDPFILL
 * set the color table to white on black
  LI R0,>2000
  LI R1,>F100
  LI R2,>1800
  BL @VDPFILL
 
 * 130 XP=144:XR=4.71238905:XF=XR/XP
 * I'm not sure why they spelled it this way...
 * goal of the above math is to convert the Y axis
 * of 192 pixels into one circle in Radians (2PI).
 * It would have been more clear if XP was 192
 * and XR was 6.2831854, these values seem
 * obfuscated. Anyway, that's what it is.
 * To avoid conversion to radians then back to
 * my sine table units, we can just adjust the
 * scale factor. For me, 192 needs to equal
 * 256, so my ratio is 256/192=1.333333
 * which is >00A9 in fixed point (169, losing the .3333)
 * As an added bonus, we can clip to the right
 * range by simply masking now.
  LI XF,>00A9
* 140 FOR ZI=64 TO -64 STEP -1
 * Making this an integer!
  LI ZI,64
 L160
 
 * 150 ZT=ZI*2.25:ZS=ZT*ZT
 * We have to do two multiplies here, so we're going
 * to end up in a 32-bit value temporarily anyway. That
 * actually makes life a little easier.
 * 2.25 * 128 = 288, WHICH IS >120
 * note: ZT not used 
  LI T32,>0120
  MOV ZI,T1
  ABS T1   * this is okay, because we are going to square it anyway
  MPY T1,T32
 
 * now T32 is 32-bits wide, and contains a 25.7 bit number.
 * ZI(16.0) times T32 (9.7) yields 25.7 bits.
 * So since we want a 9.7, we just have to take the least
 * significant word, no shifting needed! Of course we ignore
 * the possibility of overflow, but the largest value should
 * be 64*2.25 = 144, which fits in 9 bits.
 * now just put them into place, and multiply again
 * we know from analysis that the 'sign bit' shouldn't be set here
  MOV T32B,T32
  MOV T32B,T1
  MPY T1,T32
 
 * So, T32 now contains a 32-bit 18.14 number, but for simplicity we
 * are going to move that down into ZS as a 16-bit unsigned integer
 * so we just need to extract 16 bits of integer, as we don't expect overflow
 * and don't want fraction. Of course, those 16 bits are split across the
 * two words...
  MOV T32B,ZS  * least significant - we want two bits from this
  SRL ZS,14  * toss the rest
  SLA T32,2  * prepare the most significant
  SOC T32,ZS  * and merge it in
 
 * 160 XL=INT(SQR(20736-ZS)+0.5)
 * ZS is a normal int, so this shouldn't be too bad to start
 * the result is also an int, and the +0.5 is just for rounding
 * our sqrt will return one of our fractional values, as noted,
 * to be consistent.
  LI T1,20736
  S ZS,T1
  BL @SQRT  * T1 IN AS positive INT, T1 OUT AS 9.7
  SRL T1,7  * make an integer for counting
  MOV T1,XL  * and store it
* 170 ZX=ZI+160:ZY=90+ZI
  MOV ZI,T1
  AI T1,127  * smaller screen
  MOV T1,@ZX
  MOV ZI,T1
  AI T1,90
  MOV T1,@ZY
* 180 FOR XI=0 TO XL
 * even this loop always executes once (0 to 0), so
 * I can put the condition at the bottom.
  CLR XI
 L190
* 190 XT=SQR(XI*XI+ZS)*XF
 * pretty similar to above, again we are squaring to get positive
 * so that makes the unsigned MPY easier to deal with
 * XT needs to be integer now, not 9.7
  MOV XI,T32  * Integer (always positive now)
  MPY XI,T32  * XI*XI - 16.0 * 16.0 = 32.0, so just take the LSW
 MOV T32B,T1  * least significant - still 16.0
  A ZS,T1   * add ZS (we're an integer so can just add - max is 41472, so unsigned!)
  BL @SQRT  * T1 in as positive int, T1 OUT as 9.7
  MOV XF,T32  * prepare to mult - we know these values are positive
  MPY T1,T32  * do it - 9.7*9.7 = 18.14
* it matters to keep the fraction for the XT*3 below, so, keep it
  SRL T32B,7  * make room, throwing away 7 fractional bits
  SLA T32,9  * get the more significant bits into the right place
  SOC T32,T32B * merge the two 16-bit words
 MOV T32B,XT
* 200 YY=(SIN(XT)+SIN(XT*3)*0.4)*55 -- was 56, needed to adjust for rounding errors
 * order of op, we do SIN(XT*3)*0.4 first...
  MOV XT,T1  * prepare for second sine
  A XT,T1   * simpler than MPY by 3, no need to shift result
  A XT,T1
  SRL T1,6  * shift out fraction, but multiply by 2 (we'll trim the extra bit below)
  INC T1   * rounding
  ANDI T1,>01FE * mask for lookup
  MOV @SINTAB(T1),T2
  LI T1,>0033  * roughly 0.4 (actually 0.398)
  BL @SMULT  * Signed multiply, result in T32B
  MOV T32B,T16
 
  SRL XT,6  * shift out fraction, but multiply by 2 (we'll trim the extra bit below)
  INC XT   * rounding
  ANDI XT,>01FE * mask for lookup (We don't use XT again)
  MOV @SINTAB(XT),T2
  A T16,T2
  LI T1,>1B80  * 55 x less than 1 will be less than 55, so it fits
  BL @SMULT  * Signed multiply, result in T32B 
* We can just make YY an integer right here
  SRA T32B,7  * discard fraction (sign extend!)
  MOV T32B,YY
 
 * now go plot the two pixels
  BL @DRAWPX
* 250 NEXT XI
  INC XI
  C XI,XL   * I know it's always positive now,
  JLE L190  * so I can use an unsigned test
* 255 NEXT ZI
 L255
  DEC ZI
  CI ZI,-65
  JGT L160
 
 * 260 GOTO 260
 * restore scratchpad before enabling interrupts
  LI R0,SCRATCH
  LI R1,>8320   * skip our WP
  LI R2,56   * 4 bytes at a time
 LS2
  MOV *R0+,*R1+
  MOV *R0+,*R1+
  DEC R2
  JNE LS2
WAIT
  LIMI 2
  LIMI 0
  JMP WAIT
 
 * VDP access
* Write single byte to R0 from MSB R1
 * Destroys R0 (actually just ORs it)
 VSBW
  ORI R0,>4000
  SWPB R0
  MOVB R0,@>8C02
  SWPB R0
  MOVB R0,@>8C02
  MOVB R1,@>8C00
  B *R11
* Write R2 bytes from R1 to VDP R0
 * Destroys R0,R1,R2
 VDPFILL
  ORI R0,>4000
  SWPB R0
  MOVB R0,@>8C02
  SWPB R0
  MOVB R0,@>8C02
 VMBWLP
  MOVB R1,@>8C00
  DEC R2
  JNE VMBWLP
  B *R11
 
 * Write address or register
 VDPWA
  SWPB R0
  MOVB R0,@>8C02
  SWPB R0
  MOVB R0,@>8C02
  B *R11
 
 * load regs list to VDP address, end on >0000 and write >D0 (for sprites)
 * address of table in R1 (destroyed)
 LOADRG
 LOADLP
  MOV *R1+,R0
  JEQ LDRDN
  SWPB R0
  MOVB R0,@>8C02
  SWPB R0
  MOVB R0,@>8C02
  JMP LOADLP
 LDRDN
  LI R1,>D000
  MOVB R1,@>8C00
  B *R11
* Setup for normal bitmap mode
 BITMAP
  MOV R11,@SAVE
* set display and disable sprites
  LI R1,BMREGS
  BL @LOADRG
 
 * set up SIT - We load the standard 0-255, 3 times
  LI R0,>5800
  BL @VDPWA
  CLR R2
 NQ#
  CLR R1
 LP#
  MOVB R1,@>8C00
  AI R1,>0100
  CI R1,>0000
  JNE LP#
  INC R2
  CI R2,3
  JNE NQ#
 
  MOV @SAVE,R11
  B *R11
* use this and a listing to get scratchpad addresses for the fctns
 * AORG >8324
* IN AND OUT IN T1
 * T1 in = integer
 * T1 out = 9.7 signed fixed point
 * Uses T2,X1,Y1,T32
 * http://samples.sains...mple_809121.pdf
 * modified a bit - we pretend the input is a 16.8 value (the
 * entire fractional part will be 0), that lets us get out an
 * 8.8 value, because the algorithm needs an even number of fractional
 * bits. Then we just shift once to get .7
 SQRTX
  CLR X1    root
  CLR T2    remHi (t1 is remLo)
  LI Y1,16   count = ((WORD/2-1)+(FRACBITS>>1)) -> 11+4, +1 for loop
SQRT1
  SLA T2,2   remHi = (remHi << 2) | (remLo >> 14);
  MOV T1,T32
  SRL T32,14
  SOC T32,T2
  SLA T1,2   remLo <<= 2;
  SLA X1,1   root <<= 1;
  MOV X1,T32   testDiv = (root << 1) + 1;
  SLA T32,1
  INC T32
  C T2,T32   if (remHi >= testDiv) {
  JL SQRT2
  S T32,T2   remHi -= testDiv;
  INC X1    root += 1;
 SQRT2
  DEC Y1    while (--count != 0);
  JNE SQRT1
 
  MOV X1,T1   return( root);
  SRL T1,1   Get it down to x.7 fixed point
  B *R11
* INPUT X1,Y1 - kills T1,T2 as well
 PLOTX
 * use the E/A routine for address
  MOV  Y1,T1        R1 is the Y value.
  SLA  T1,5
  SOC  Y1,T1
  ANDI T1,>FF07
  MOV  X1,T2        R0 is the X value.
  ANDI T2,7
  A    X1,T1        T1 is the byte offset.
  S    T2,T1        T2 is the bit offset.
 
 * inline VDP!
  SWPB T1    set up read address
  MOVB T1,@>8C02
  SWPB T1
  MOVB T1,@>8C02
  ORI T1,>4000  we need this later, and provides a VDP delay
  MOVB @>8800,R1  read the byte from VDP
  SWPB T1    set up write address
  MOVB T1,@>8C02
  SWPB T1
  MOVB T1,@>8C02
  SOCB @BITS(T2),R1 or the bit and provide VDP delay
  MOVB R1,@>8C00  write the byte back
 B *R11
 
 * signed fixed point multiply - T1 * T2 = T32B
 * ONLY T2 is allowed to be negative!! Result
 * will be negative if T2 was.
 * Uses T1,T2,NEGFL,T32,T32B
 SMULTX
  CLR NEGFL  * temp flag for negative
  MOV T2,T32  * prepare for mult and test
  JGT NOTNEG1
  SETO NEGFL  * it is negative, so remember and make positive
  ABS T32
 NOTNEG1
  MPY T1,T32  * does the multiply - you know the drill, fix up number
 SRL T32B,7  * make room, throwing away 7 fractional bits
  SLA T32,9  * get the more significant bits into the right place
  SOC T32,T32B * merge the two 16-bit words
 MOV NEGFL,NEGFL * check if it should be negative
  JEQ NOTNEG2
  NEG T32B  * yes, it should
 NOTNEG2
  B *R11
 
 DRAWXX
  MOV R11,@SAVE * need this to get back!
 
 * 210 X1=XI*0.75+ZX:Y1=ZY-YY
 * XI can never be negative now, so we can remove all that code
  MOV XI,X1  * integer
  LI T32,>0060 * 0.75
  MPY X1,T32  * now 25.7, so just take the LSW (unsigned mult!)
  AI T32B,>40  * 0.5 in x.7, for rounding
  SRA T32B,7  * make integer for the plot function (sign extend!)
  MOV T32B,X1  * get the integer
  A @ZX,X1  * add (integer) ZX
 MOV @ZY,Y1  * get ZY (integer)
  S YY,Y1   * subtract YY (integer)
* 220 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
  SWPB Y1   * stupid Big Endian....
  MOV Y1,T16  * plot kills X1,Y1, and we need Y1 again
  CB @ROWS(X1),Y1
  JLE L230
  MOVB Y1,@ROWS(X1)
  SWPB Y1
 * NOTE: PLOT EXPECTS THE PIXEL IN REGISTERS X1,Y1
  BL @PLOT
 
 * 230 X1=ZX-XI*0.75
 L230
  MOV @ZX,X1
  S T32B,X1  * use the scaled X1 on both sides of the origin
* 240 IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
  MOV T16,Y1  * get it back, still swapped
  CB @ROWS(X1),Y1
  JLE L250
  MOVB Y1,@ROWS(X1)
  SWPB Y1
 * NOTE: PLOT EXPECTS THE PIXEL IN REGISTERS X1,Y1
  BL @PLOT
 
 * Return to caller
 L250
  MOV @SAVE,R11
  B *R11
 
 SLAST
  END

devwebcl · March 8, 2015

A faster version, but only for the complete wireframe:

It took 22 minutes 17 seconds in Altirra w/ TBXL (at full speed):

http://manillismo.blogspot.com/2015/03/fedora-fast-wireframe.html

140 GRAPHICS 8+16:SETCOLOR 2,0,0
150 XP=144:XR=4.71238905:XF=XR/XP
151 REM FAST COMPLETE VERSION (IN LINE 240)
155 COLOR 1
159 REM HALF?  -64
160 FOR ZI=0 TO 64
170   ZT=ZI*2.25:ZS=ZT*ZT
180   XL=INT(SQR(20736-ZS)+0.5)
181   ZX=ZI+160:ZY=90+ZI
182   ZX2=-ZI+160:ZY2=90-ZI
185   REM HALF?  0-XL
190   FOR XI=0 TO XL
200     XT=SQR(XI*XI+ZS)*XF
210     YY=(SIN(XT)+SIN(XT*3)*0.4)*56
219     REM TODO: TAKE OUT 90-
220     X1=XI+ZX:Y1=ZY-YY
221     X12=XI+ZX2:Y12=ZY2-YY
222     X13=-XI+ZX:Y13=ZY-YY
223     X14=-XI+ZX2:Y14=ZY2-YY
230     TRAP 250:PLOT X1,Y1:PLOT X12,Y12:PLOT X13,Y13:PLOT X14,Y14
250   NEXT XI:NEXT ZI
260 GOTO 260

dmsc · March 8, 2015

Hi!


You can do better (and still with hidden lines removed) by using a little trigonometry; the runtime in TBXL (PAL) is now 16min 16sec:

100 SX=144:SY=56:SZ=64:CX=320:CY=192
110 C1=2.2*SY:C2=1.6*SY
120 DIM RR(CX)
130 FOR I=0 TO CX:RR(I)=CY:NEXT I
140 GRAPHICS 8+16:SETCOLOR 2,0,0:COLOR 1
150 CX=CX*0.5:CY=CY*0.46875:FX=SX/64:FZ=SZ/64
160 XF=4.71238905/SX
170 FOR ZI=64 TO -64 STEP -1
180   ZT=ZI*FX:ZS=ZT*ZT
190   XL=INT(SQR(SX*SX-ZS)+0.5)
200   ZX=ZI*FZ+CX:ZY=CY+ZI*FZ
210   FOR XI=0 TO XL
220     A=SIN(SQR(XI*XI+ZS)*XF)
230     Y1=ZY-A*(C1-C2*A*A)
240     X1=XI+ZX
250     IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
260     X1=ZX-XI
270     IF RR(X1)>Y1 THEN RR(X1)=Y1:PLOT X1,Y1
280   NEXT XI
290 NEXT ZI

In NTSC, the runtime is slower, 17min 20sec. Note that if you turn off DMA during the drawing, the runtime is only 12min 15sec.
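The saving comes from the triple-angle identity sin(3x) = 3*sin(x) - 4*sin(x)^3. With it, sin(x) + 0.4*sin(3x) becomes 2.2*sin(x) - 1.6*sin(x)^3, which is where C1=2.2*SY and C2=1.6*SY come from, and one SIN call does the work of two. The quick Python check below confirms that both forms agree; the function names are made up.

```python
import math


def height_original(x, sy=56):
    """YY as in the original listing: two SIN calls."""
    return (math.sin(x) + math.sin(x * 3) * 0.4) * sy


def height_one_sine(x, sy=56):
    """dmsc's form: one SIN call, A*(C1-C2*A*A)."""
    c1, c2 = 2.2 * sy, 1.6 * sy
    a = math.sin(x)
    return a * (c1 - c2 * a * a)
```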

devwebcl · March 8, 2015


But try showing the complete image, without hiding anything at all.

dmsc · March 8, 2015

Hi!,


Well, you can replace lines 250 and 270 with a simple PLOT X1,Y1 and remove lines 120 and 130; this will draw the complete wireframe.

I can't run it now, but I suspect it will be about the same speed.

Edited March 8, 2015 by dmsc

devwebcl · March 8, 2015


I doubt it will run at the same speed.

I am saving several iterations at:

160 FOR ZI=0 TO 64

That's why I am using four PLOTs.

JamesD · March 8, 2015

Stupid comment deleted

Edited March 8, 2015 by JamesD

JamesD · March 8, 2015

Microsoft BASIC isn't known for blinding speed, but it's still competitive.

The last change cut the CoCo 3 NTSC time to 19:15.
Combining lines resulted in 19:14. Over a lot more lines it would add up but not here because there are so few lines.
Renumbering only helps with GOTOs and GOSUBs so no change there.
Defining the most used variables first will speed up Microsoft BASIC a bit and the first test of this resulted in 18:44.
If I put a little more effort into it I could probably cut that a little more but finding what is optimal would require many trials.
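Here is why defining the busiest variables first helps. Microsoft BASIC interpreters generally keep simple variables in a table in order of creation and search it from the start on every reference. The function below is a deliberately simplified model of that search. It ignores arrays, name packing, and the cost of each comparison, and the access mix is made up, so it only shows the direction of the effect, not real timings.

```python
def lookup_cost(definition_order, accesses):
    """Total name comparisons if every access scans the table from the start."""
    position = {name: i + 1 for i, name in enumerate(definition_order)}
    return sum(position[name] for name in accesses)


# Roughly the order the hat program first assigns its variables.
program_order = ["SX", "SY", "SZ", "CX", "CY", "I", "FX", "FZ", "XF", "ZI",
                 "ZT", "ZS", "XL", "ZX", "ZY", "XI", "XT", "YY", "X1", "Y1"]
HOT = ("XI", "XT", "YY", "X1", "Y1")      # inner-loop variables
inner_loop = list(HOT) * 10               # made-up access mix
hot_first = list(HOT) + [n for n in program_order if n not in HOT]

print(lookup_cost(program_order, inner_loop), lookup_cost(hot_first, inner_loop))
```

With this toy access mix, the inner-loop lookups drop from 900 comparisons to 150 once the hot variables are created first.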

Enabling the 6309 native mode on a CoCo 3 equipped with one results in about a 21% speed increase on the same code.

That should let this complete in 14:48.
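Here is a quick check of that estimate. Reading "21%" as a 21% cut in run time turns 18:44 into 14:48. If it instead means 21% more throughput, the time divides by 1.21 and comes out nearer 15:29. Which reading applies is an assumption; the arithmetic for both is below.

```python
def mmss(seconds):
    """Format a number of seconds as M:SS."""
    seconds = round(seconds)
    return f"{seconds // 60}:{seconds % 60:02d}"


baseline = 18 * 60 + 44              # 18:44 from the variable-ordering test
time_cut = baseline * (1 - 0.21)     # 21% less time
speed_up = baseline / 1.21           # 21% more speed
print(mmss(time_cut), mmss(speed_up))
```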

BASIC09 on the CoCo 3 should complete this in under 5 minutes, possibly in under 2 if the ratio on Ahl's benchmark holds, but it's tough to tell without testing.

The BBC, Apple IIgs and IIc+ should all do very well running this due to their higher clock speed.

Assembly language vs BASIC is obviously not a fair comparison.
For that matter, neither is comparing a machine rendering 256x192 vs other machines rendering 320x192.

It's also not fair comparing a machine with a 2 color screen vs 16 color.
If this were a benchmark you would want everyone rendering the same size image and same number of colors if possible.
Also remember that a small code sample may run faster on the TI because you can stick code in scratchpad RAM but you won't always get the same speedup with a larger program.

dmsc · March 8, 2015

Hi!,


Yes, I know. But your timings don't look right. If I remove the IFs and the array initialization from my program, I get 15min 23sec in PAL, 16min 25sec in NTSC. This is on an 800XL with TBXL 1.5 (emulated).

If I combine my program with yours, I get a runtime of 8min 47sec on PAL, or 9min 21sec on NTSC:

100 SX=144:SY=56:SZ=64:CX=320:CY=192
110 C1=2.2*SY:C2=1.6*SY
120 GRAPHICS 8+16:SETCOLOR 2,0,0:COLOR 1
130 CX=CX*0.5:CY=CY*0.46875:FX=SX/64:FZ=SZ/64
140 XF=4.71238905/SX
150 FOR ZI=0 TO 64
160   ZT=ZI*FX:ZS=ZT*ZT:ZT=ZI*FZ
170   XL=INT(SQR(SX*SX-ZS)+0.5)
180   ZX1=CX+ZT:ZY1=CY+ZT
190   ZX2=CX-ZT:ZY2=CY-ZT
200   FOR XI=0 TO XL
210     A=SIN(SQR(XI*XI+ZS)*XF)
220     A=A*(C1-C2*A*A)
230     Y1=ZY1-A:Y2=ZY2-A
240     X1=ZX1+XI:X3=ZX2+XI
250     X2=ZX1-XI:X4=ZX2-XI
260     IF Y1<191.5 THEN PLOT X1,Y1:PLOT X2,Y1
270     PLOT X3,Y2:PLOT X4,Y2
280   NEXT XI
290 NEXT ZI
300 GOTO 300

Note that your program had a bug: you were TRAP-ing on points below the screen and thus missing the corresponding points above. This is why I put the "IF Y1<191.5" over the first two points only.
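Here is a toy Python model of that bug; the function names and coordinates are made up. A TRAP'd error jumps straight to line 250 and abandons the rest of line 230. So one off-screen PLOT at the start of the line also skips the three PLOTs after it, while an explicit test drops only the point that is actually off-screen. The 191.5 limit assumes PLOT rounds Y to the nearest row, as dmsc's test implies.

```python
BOTTOM = 191.5   # from here down, Y would round to row 192 or beyond


def plot_with_trap(points):
    """Like 'TRAP 250:PLOT ...:PLOT ...' - the first off-screen PLOT ends the line."""
    drawn = []
    for x, y in points:
        if y >= BOTTOM:        # PLOT errors, TRAP jumps to line 250
            break
        drawn.append((x, y))
    return drawn


def plot_with_guard(points):
    """Like 'IF Y1<191.5 THEN PLOT ...' - skip only the off-screen points."""
    return [(x, y) for x, y in points if y < BOTTOM]


# One pass of devwebcl's line 230 (front, back, front, back), made-up values.
line_230 = [(225, 195.2), (105, 75.2), (215, 195.2), (95, 75.2)]
print(plot_with_trap(line_230))   # []
print(plot_with_guard(line_230))  # [(105, 75.2), (95, 75.2)]
```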

JamesD · March 8, 2015

VZ200 completed it in 15:27 but it only supports 128x64 from BASIC.

100 SX=50:SY=18:SZ=18:CX=128:CY=64
140 MODE(1):COLOR 4,0
250 IF RR(X1)>Y1 THEN RR(X1)=Y1:SET(X1,Y1)
270 IF RR(X1)>Y1 THEN RR(X1)=Y1:SET(X1,Y1)

JamesD · March 8, 2015


That only drew the back half of the image for me. Changing line 150 to start at -64 caused it to draw the entire image.

devwebcl · March 9, 2015


Actually, that's the optimization... I only draw half of the image along the Y axis, and the other half is calculated in the same iteration at runtime; that's why it should be faster (I am only considering math optimization, not VBLANK, NTSC/PAL or other hacks).
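Here is a small Python check of why the half loop is enough for the wireframe. ZS depends on ZI only through its square, so the slice at -ZI has exactly the same heights as the slice at +ZI, just shifted down and left by 2*ZT. The names are made up and hidden-line removal is left out. That removal relies on front-to-back order, which this reordering breaks, so this is a wireframe-only trick, as devwebcl said.

```python
import math

SX, SY, SZ, CX, CY = 144, 56, 64, 320, 192
C1, C2 = 2.2 * SY, 1.6 * SY
FX, FZ = SX / 64, SZ / 64
XF = 4.71238905 / SX
MID_X, MID_Y = CX * 0.5, CY * 0.46875


def full_loop():
    """Wireframe points with ZI running over the whole range -64..64."""
    pts = set()
    for zi in range(-64, 65):
        zt = zi * FX
        zs = zt * zt
        xl = int(math.sqrt(SX * SX - zs) + 0.5)
        zx, zy = MID_X + zi * FZ, MID_Y + zi * FZ
        for xi in range(xl + 1):
            a = math.sin(math.sqrt(xi * xi + zs) * XF)
            y = zy - a * (C1 - C2 * a * a)
            pts |= {(zx + xi, y), (zx - xi, y)}
    return pts


def half_loop():
    """Same points with ZI over 0..64 only: each height serves +ZI and -ZI."""
    pts = set()
    for zi in range(65):
        zt = zi * FX
        zs = zt * zt                       # identical for -ZI, since it is squared
        xl = int(math.sqrt(SX * SX - zs) + 0.5)
        dz = zi * FZ
        zx1, zy1 = MID_X + dz, MID_Y + dz  # slice at +ZI
        zx2, zy2 = MID_X - dz, MID_Y - dz  # mirrored slice at -ZI
        for xi in range(xl + 1):
            a = math.sin(math.sqrt(xi * xi + zs) * XF)
            h = a * (C1 - C2 * a * a)      # computed once, used four times
            pts |= {(zx1 + xi, zy1 - h), (zx1 - xi, zy1 - h),
                    (zx2 + xi, zy2 - h), (zx2 - xi, zy2 - h)}
    return pts
```

half_loop() == full_loop() holds here, while half_loop calls SIN and SQR only about half as often.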

fujidude · March 9, 2015

I translated the original program as found in Analog magazine into a Python program. I chose Tkinter as the GUI library, since it is part of the standard Python package. This is my 1st GUI program in Python, thus it is pretty simplistic as far as GUI elements are concerned.

This program took approximately a second to run on my Core i7, and most of that was just the GUI app setting up to open, not the actual calculations. I know that because, while I was writing and testing the program, a program with just one instruction to draw a single line took almost as long to run.

Anyway, I hope no one gets upset that I introduced something modern here that isn't retro. I would like to explain myself in advance in that regard: I no longer have any retro hardware. I use emulation on modern machines for my retro fixes. But the look and feel of the software and programming environments of the retro equipment isn't the only aspect of retro I enjoy. I really loved the exploration and learning back in those early days, especially when it came to writing programs. And that same magic is captured for me again on modern systems, with the Python programming language. It's pretty close to a universal language these days, kind of like BASIC was in the past. It is also interpreted, so it is quick to develop with; again, just like BASIC. It comes preinstalled on Linux and Mac OS X, and it is freely available for Windows too.

So without further delay, for those who are interested, here is the Python version listing:

#-------------------------------------------------------------------------------
# Name:        archimedes.py
# Purpose:     Implement the Archimedes' Spiral program in Python 2.x
#
# Author:      fujidude for just the Python version, original code
#              Charles Bachand, pub. Antic magazine, issue 7, pp.60-61.
#
# Created:     08-03-2015
#-------------------------------------------------------------------------------


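# import necessary modules (the Tkinter module was renamed tkinter in Python 3)
try:
    from tkinter import *   # Python 3
except ImportError:
    from Tkinter import *   # Python 2.x, as originally written
import math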


def end():
    rootwin.destroy()


rootwin=Tk() # main (root) application window based on Tkinter
rootwin.wm_title("Archimedes' Spiral")

quitBtn=Button(rootwin,text="Exit",command=end)
quitBtn.pack(side="bottom")

graphzone=Canvas(rootwin, width = 320, height = 192, bg = "black")
graphzone.pack()

XP = 144
XR = 4.71238905
XF = XR / XP

for ZI in range(-64, 65):
    ZT = ZI * 2.25
    ZS = ZT * ZT
    XL = int(math.sqrt(20736 - ZS) + 0.5)
    for XI in range(0 - XL, XL+1):
        XT = math.sqrt(XI * XI + ZS) * XF
        YY = (math.sin(XT) + math.sin(XT * 3) * 0.4) * 56
        X1 = XI + ZI + 160
        Y1 = 90 - YY + ZI
        graphzone.create_oval(X1, Y1, X1, Y1, fill = "white")
        graphzone.create_line(X1, Y1+1, X1, 191, fill="black") # remove this line for transparent version

rootwin.mainloop()

Again, that is more or less a direct translation of the original Analog listing. I might try to make some of the optimization changes suggested here in another version, depending on whether there is even a shred of interest.

JamesD · March 9, 2015


I see what you are doing and after some tracing the subtractions aren't working in several places. I'm guessing an emulation issue or a bad ROM image for the emulator.

*edit*

Or variable names are limited to 2 characters. Du-Oh!
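That "Du-Oh!" is easy to reproduce. Many Microsoft-derived BASICs look at only the first two characters of a variable name. So in dmsc's combined listing, ZX1/ZX2 and ZY1/ZY2 each collapse into one variable, and the -ZI values assigned second overwrite the +ZI ones, which fits the "back half only" result. Below is a hypothetical Python helper that flags such collisions, assuming a two-character limit. The variable list is transcribed by hand from that listing.

```python
def collisions(names, length=2):
    """Group names the interpreter would treat as one variable."""
    groups = {}
    for name in names:
        groups.setdefault(name[:length], []).append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


DMSC_VARS = ["SX", "SY", "SZ", "CX", "CY", "C1", "C2", "FX", "FZ", "XF", "ZI",
             "ZT", "ZS", "XL", "ZX1", "ZY1", "ZX2", "ZY2", "XI", "A", "Y1",
             "Y2", "X1", "X2", "X3", "X4"]
print(collisions(DMSC_VARS))  # {'ZX': ['ZX1', 'ZX2'], 'ZY': ['ZY1', 'ZY2']}
```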

Edited March 9, 2015 by JamesD

dmsc · March 9, 2015

Hi!,


Yes, it is strange.

Attached is a disk image with both programs (the wireframe-only one and the visible-faces one). It is a bootable image that autoloads a menu to select which program to run.

Also, I included the timing calculations for both PAL and NTSC, to verify runtimes.

fedora.atr
