
Graphics 8 Fedora Hat


Wally1


What "fair", isn't this just "let's see it go?" :)

 

Besides, I wanted to challenge the Atari guys... I'm not challenging your porting prowess, JamesD :)

 

(And the TI code doesn't fit in scratchpad either, but the Atari doesn't have 4 wait states per memory access, so it'd still be fair even if it did ;) ).


I doubt an optimized Asm version would be more than 3 times faster than the best optimized Basic version.

 

The way I see it, the best optimization would be to work out what granularity and range is needed for each, then produce square root and trig lookup tables instead of using the functions every iteration.
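A minimal sketch of the lookup-table idea, in Python rather than BASIC so it can be run anywhere. The table size (256 steps per full circle) and the names `SINE_TABLE` and `table_sin` are assumptions made for illustration; they do not come from any version of the program posted in this thread.

```python
import math

TABLE_SIZE = 256  # assumed granularity: 256 steps per full circle

# Built once up front; the inner loop then indexes instead of calling SIN.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def table_sin(angle):
    """Approximate sin(angle), angle in radians, using the nearest table entry."""
    index = round(angle * TABLE_SIZE / (2 * math.pi)) % TABLE_SIZE
    return SINE_TABLE[index]

# Worst-case error over a sample of angles from 0 to about 20 radians.
worst = max(abs(table_sin(a / 100) - math.sin(a / 100)) for a in range(2000))
print(f"worst error: {worst:.4f}")
```

With nearest-entry lookup the error is bounded by half a table step, so a larger table buys accuracy at the cost of memory.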

Porting prowess? It's BASIC. All I have to do is change 3 or 4 lines... except on the TI. What idiot decided this :: is better than this : ?

Oh right... ANSI. Language by committee.

 

If more emulators allowed me to paste code you'd see several more versions. All 8 bit BASICs' built in editors suck.

There's nothing quite like accidentally hitting ESC on an SDL based emulator right when you get the code typed in and working to piss a person off.

 

BTW, the Laser 2001's BASIC interpreter appears to be based largely on Applesoft BASIC.

I now think the Laser 500's BASIC is based on MSX BASIC or Spectravideo BASIC which is roughly the same thing.

 

I did some quick and dirty profiling. For some reason the SQR function is painfully slow. I knocked out something in Action and it was ~60 times as fast.

 

Since every screen coordinate is either a BYTE or an INTEGER (the screen is 320x192), proper scaling and avoiding floating point would help a lot. Of course, if you just call the built-in FP routines it won't make a difference, since you would just be quickly calling slow routines. :)
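The Action routine mentioned above was not posted, so the following is only a sketch of one well-known way to take a square root without floating point: the bit-by-bit integer method, which needs nothing but shifts, additions and comparisons. It is written in Python for convenience; the name `isqrt` and the 16-bit input limit are assumptions for the example.

```python
def isqrt(n):
    """Floor of the square root of a 16-bit unsigned integer, bit by bit."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("n must fit in 16 bits")
    result = 0
    bit = 1 << 14              # highest power of four that fits in 16 bits
    while bit > n:
        bit >>= 2
    while bit:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result

print(isqrt(20736))  # 144
```

Every value involved stays within 16 bits, which is the point of scaling everything to integers first.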

 

I haven't really looked hard enough at the program to tell what is going on. The hints given here certainly help. If everything was scaled beforehand to nothing bigger than an INT, it would help. Even if I had a complete understanding of the program with explanations of what is going on, it would still be hard to figure out the why and what for.

 

Good example

180 XL=INT(SQR(20736-ZS)+0.5)

 

Where did the 20736 come from?


I think that crept into it with the TI translation.

 

I suspect the whole thing originated elsewhere and was adapted to the Atari with scaling added, so it suffers a speed penalty from the start.

What would be good is to get the algorithm/formula in its purest form, so that system adaptations could be done without unnecessary calculations thrown in just to scale the graphics.


Besides, I wanted to challenge the Atari guys... I'm not challenging your porting prowess, JamesD :)

I accept the challenge :-)

 

Attached is a version that is not very optimized. It is stand-alone except for the call to CIOV to set the graphics mode. The specifics are:

- Arithmetic using 3.13 bits signed fixed-point,

- Sine function with a resolution of 10 bits on the angle units,

- Square root with 2.12 bits of accuracy,

- Don't plot the hidden lines,

- Only 899 bytes.
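For readers unfamiliar with fixed-point notation, here is a rough Python sketch of what arithmetic in a 3.13 format can look like. It assumes "3.13" means 3 integer bits (sign included) and 13 fraction bits in a 16-bit word, which is an interpretation rather than something confirmed by the attached program; the helper names are made up for the example.

```python
FRAC_BITS = 13            # assumed: 3 integer bits (with sign) + 13 fraction bits
ONE = 1 << FRAC_BITS      # 8192 represents 1.0

def to_fixed(x):
    """Convert a float in [-4, 4) to a 16-bit signed fixed-point integer."""
    value = round(x * ONE)
    if not -(1 << 15) <= value < (1 << 15):
        raise OverflowError("value does not fit in 3.13 format")
    return value

def to_float(f):
    """Convert a fixed-point integer back to a float."""
    return f / ONE

def fixed_mul(a, b):
    """Multiply two 3.13 numbers: full product, then drop 13 fraction bits."""
    return (a * b) >> FRAC_BITS   # Python's >> floors, like an arithmetic shift

a = to_fixed(0.4)
b = to_fixed(-0.75)
print(to_float(fixed_mul(a, b)))  # close to -0.3
```

On a 6502 the product of two 16-bit values needs a 32-bit intermediate, which is why multiplication is the expensive operation in this kind of code.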

 

The runtimes are 43 sec on PAL, 46 sec on NTSC.

 

 

I doubt an optimized Asm version would be more than 3 times faster than the best optimized Basic version.

Well, it is already 20 times faster, and this is without using tables for the multiplication and square root.

 

I will try cleaning the source code to post it later.

fedora.xex

Edited by dmsc

Very strange, went back to 2.40 and it does indeed work, so I cleared my settings on 2.60 beta 41 and the best I could get was a half picture of the hat with garbage in the middle of the screen.

 

Odd..

 

Aha... Found out why: it's the power-up RAM settings. Set to DMA 3 it works perfectly, but I've found stuff that won't work with DMA 3.

Edited by Mclaneinc

Hi!

 

I found the bug!!

 

It was missing a "#" in a "lda #0", so the code loaded the contents of memory location 0 instead of the constant 0, and it only worked if location 0 had a 0 inside :-)
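To illustrate why this kind of bug depends on the power-up RAM contents: on the 6502, "lda #0" (immediate addressing, opcode $A9) loads the constant 0, while "lda 0" (zero-page addressing, opcode $A5) loads whatever byte is stored at address 0. The Python model below is a toy, not an emulator, and the all-$FF pattern is just a hypothetical stand-in for a different power-up setting.

```python
def lda_immediate(operand):
    """lda #operand: the accumulator receives the operand itself."""
    return operand & 0xFF

def lda_zero_page(memory, address):
    """lda address: the accumulator receives the byte stored at that address."""
    return memory[address & 0xFF]

clean_ram = bytearray(256)            # location 0 happens to hold 0
dirty_ram = bytearray([0xFF] * 256)   # hypothetical different power-up pattern

print(lda_immediate(0))               # intended behaviour: always 0
print(lda_zero_page(clean_ram, 0))    # 0, so the bug is invisible
print(lda_zero_page(dirty_ram, 0))    # 255, so the bug shows up
```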

 

Attached is a new version, also 3 bytes shorter.

fedora.xex

Hello. Could you post the source code please? Your program is fast as hell, but the real enjoyment here is the code. :)


Hi!,

 


OK, I simplified the code somewhat and removed an extra multiplication. The code is now 829 bytes, and the runtimes are 30 sec on PAL and 38 sec on NTSC.

 

I know how to make the code faster (and perhaps smaller), but it will need rewriting of the inner loop.

 

Attached is the source, compile with CA65:

 

- fedora.s : The main loop.

- math.s : The math functions, sine, sqrt, square (x^2), imul (signed fixed-point multiply)

- plot.s : The plotting routines, gr8, plot.

- macros.inc : Some useful macros, to simplify the code.

- atari-header.s : The atari XEX header.

- atari-asm.cfg : The linker configuration file.

- Makefile : Makefile to compile all the above.

- genTable.awk : An AWK program that searches for best sine approximations.

fedora-asm.zip

fedora.xex

Edited by dmsc

Good example

180 XL=INT(SQR(20736-ZS)+0.5)

 

Where did the 20736 come from?

 

20736 is 144^2 (i.e. the maximum value of ZS). That's in the original code from the Analog article, which is what we TI'ers got to start with. ;)

 

Going from memory, some of the parts I worked out:

100 REM ARCHIMEDES SPIRAL
110 REM
120 REM ANALOG MAGAZINE
130 REM
140 GRAPHICS 8+16:SETCOLOR 2,0,0
150 XP=144:XR=4.71238905:XF=XR/XP
160 FOR ZI=-64 TO 64
170 ZT=ZI*2.25:ZS=ZT*ZT
180 XL=INT(SQR(20736-ZS)+0.5)
190 FOR XI=0-XL TO XL
200 XT=SQR(XI*XI+ZS)*XF
210 YY=(SIN(XT)+SIN(XT*3)*0.4)*56
220 X1=XI+ZI+160:Y1=90-YY+ZI
230 TRAP 250:COLOR 1:PLOT X1,Y1
240 COLOR 0:PLOT X1,Y1+1:DRAWTO X1,191
250 NEXT XI:NEXT ZI
260 GOTO 260

150 - calculates XF (X-Factor?) - this is just the ratio XR/XP, where XR=4.71238905 is about 3*PI/2 radians and XP=144 is the largest scaled distance from the centre (64*2.25). Multiplying a distance by XF therefore converts it to radians for the SIN function. I don't know why it's calculated that way. XP and XR are never used again.

 

160 - presumably a Z coordinate loop - didn't look too deep

 

170 - scales ZI, then gets ZS (which is Z-squared, obviously)

 

180 - X Limit is calculated from ZS, which is subtracted from 20736, its maximum value: (64*2.25)^2 = 144^2 = 20736. Taking the square root gives the half-width of a circle of radius 144 at that Z position. The +0.5 makes INT round to the nearest integer.

 

190 - X loop, pretty well understood already

 

200 - XT is a temp value converted down to radians by the XF. I don't know the original function so I didn't dig deep into what part of it this is.

 

210 - this calculates the actual height of the curve for the current position. More scaling with the *56.

 

220 - fake projection (adding of ZI), origin centering (the 160 and 90), and Y-axis inversion to get pixel addresses

 

230 - plots the pixel

 

240 - erases from one below the pixel to the bottom of the screen. This provides the hidden surface removal fakery, but was one of the first things we removed in optimizations (by reversing the loop), since it's slow on the TI to draw. ;)

 

250 - end the loops

 

260 - sit forever.
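The walkthrough above can be followed in runnable form with a rough Python transcription of lines 150-250 that draws into an in-memory grid instead of the screen. Two things here are assumptions rather than part of the listing: that PLOT rounds fractional coordinates to the nearest integer, and that the range check is an adequate stand-in for TRAP 250.

```python
import math

WIDTH, HEIGHT = 320, 192  # GRAPHICS 8 resolution

def render_hat():
    """Follow lines 150-250 and return the screen as rows of 0/1 pixels."""
    screen = [[0] * WIDTH for _ in range(HEIGHT)]
    xf = 4.71238905 / 144                                      # 150
    for zi in range(-64, 65):                                  # 160
        zs = (zi * 2.25) ** 2                                  # 170
        xl = int(math.sqrt(20736 - zs) + 0.5)                  # 180
        for xi in range(-xl, xl + 1):                          # 190
            xt = math.sqrt(xi * xi + zs) * xf                  # 200
            yy = (math.sin(xt) + math.sin(xt * 3) * 0.4) * 56  # 210
            x1 = xi + zi + 160                                 # 220
            y1 = math.floor(90 - yy + zi + 0.5)  # assumed rounding in PLOT
            if not (0 <= x1 < WIDTH and 0 <= y1 < HEIGHT):
                continue                         # stands in for TRAP 250
            screen[y1][x1] = 1                                 # 230
            for y in range(y1 + 1, HEIGHT):                    # 240
                screen[y][x1] = 0
    return screen

screen = render_hat()
print(sum(map(sum, screen)), "pixels lit")
```

The erase loop for line 240 is the part that fakes hidden-surface removal: each new point wipes everything below it in the same column.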

 

Also, if you check the TI thread, Sometimes99er worked out the ranges of all the variables, and also did some comparison animations showing the effects of integer versus floating point on some of the variables for output. We start about halfway down this page:

 

http://atariage.com/forums/topic/215138-bitmap-mode/page-4

 

 

Edited by Tursi
Dude! You absolutely kicked my butt! Fantastic! :)

 

I'll have to go through this later and see if there's anything I can steal. The only optimization I had left in my pocket was to delay the screen draw (every pixel takes 7 memory accesses on the TI - two to set VDP address, one to read the byte, one to change the pixel, two to set the address AGAIN, and one more to write it back). If I render to CPU memory (as was suggested), I can skip all the extra VDP access during the runtime at the expense of not getting to watch it draw. ;)
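A rough sketch of the "render to CPU memory" idea in Python: pixels are set in an ordinary byte buffer with one read-modify-write each, and the finished buffer would be copied to video memory in a single pass at the end. The linear one-bit-per-pixel layout used here is assumed for clarity and is not the TI's actual bitmap layout; the function names are invented for the example.

```python
WIDTH, HEIGHT = 256, 192          # TI-99/4A bitmap mode resolution
BYTES_PER_ROW = WIDTH // 8        # assumed linear layout, 1 bit per pixel

def make_buffer():
    """Allocate an off-screen bitmap in ordinary memory."""
    return bytearray(BYTES_PER_ROW * HEIGHT)

def set_pixel(buffer, x, y):
    """Set one pixel with a single read-modify-write on ordinary memory."""
    buffer[y * BYTES_PER_ROW + x // 8] |= 0x80 >> (x % 8)

def get_pixel(buffer, x, y):
    """Return 1 if the pixel is set, else 0."""
    return (buffer[y * BYTES_PER_ROW + x // 8] >> (7 - x % 8)) & 1

buffer = make_buffer()
set_pixel(buffer, 10, 3)
print(get_pixel(buffer, 10, 3), get_pixel(buffer, 11, 3))  # 1 0
```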


Hi!,


Thanks!! :)

 

Well, I redid the loops to avoid squaring the variables by using a recurrence, and replaced the multiplication by 0.4 with a multiplication by a better factor (0.40625), so I don't need the multiplication routines anymore.
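Both tricks can be sketched in a few lines of Python. This is one plausible reading, not the attached assembly: consecutive squares differ by 2x+1, so a running sum replaces the squaring, and 0.40625 is exactly 13/32 = 1/4 + 1/8 + 1/32, so that multiplication can be done with shifts and adds. The function names are invented for the example.

```python
def squares_by_recurrence(start, count):
    """Yield start^2, (start+1)^2, ... using only additions inside the loop."""
    square = start * start      # one multiplication before the loop
    step = 2 * start + 1        # (x+1)^2 - x^2
    for _ in range(count):
        yield square
        square += step
        step += 2

def times_0_40625(value):
    """Multiply an integer by 13/32 = 1/4 + 1/8 + 1/32 with shifts and adds."""
    return (value >> 2) + (value >> 3) + (value >> 5)

print(list(squares_by_recurrence(-3, 7)))   # [9, 4, 1, 0, 1, 4, 9]
print(times_0_40625(8192) / 8192)           # 0.40625
```

Each shift truncates separately, so for values that are not multiples of 32 the result can be slightly below the exact product.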

 

The result is a reduction to 787 bytes, and runtimes of 21.2 seconds on PAL, 22.6 seconds on NTSC.

 

Attached is the new XEX and code. The profiler now shows that the square root routine consumes most of the CPU time.

 

By the way, on the Atari you can turn the screen DMA off and turn it back on only at the end, which reduces the overhead; the runtime would then be only 16 seconds.

fedora-asm.zip

fedora.xex

Thanks for posting the source. I don't have/use CA65 (I'm assuming (C)ross (A)ssembler 6500 series), but that's okay as I really just wanted to peek at the code a bit.

Edited by fujidude

The Apple II version runs in about 32 minutes once compiled with Einstein.
That's really no improvement and I think I used the right options for the best speed.
Clearly, most of the time is spent in the floating point library but this should have eliminated the constant parsing that goes on at runtime.
I'll double check the compiler options and try again if I messed up. I may try The Beagle Compiler to see if it's any better.


Hi!

 


It is the assembler from the CC65 suite, http://cc65.github.io/cc65/ IMHO the best assembler for the 6502 :) :)

 

Seriously, the thing that makes ca65 stand apart from the rest is its support for object files and linking, which makes it possible to structure big programs as multiple independent files.

Edited by dmsc