Tursi

Stock Mandelbrot


I wanted to do something that would test the performance of the F18A GPU, and I figured, everyone used to love fractals! So I learned how and wrote my first Mandelbrot set generator.

 

I wrote two versions: the first runs on the 9900 CPU for comparison's sake, and the second loads almost the same code to the F18A GPU and runs it there. (Almost, because the F18A GPU has direct access to video memory, so there are no functions for loading addresses.)

 

Both programs run the same code, and generate the same set. I used regular bitmap mode since a non-F18A test wouldn't have any other choice really, but this does mean there is some substantial color clash that hurts the aesthetics a bit. I did write some code to at least sort it when there are only two colors per 8-pixel block, but when there are more, the last color wins.

 

The set itself, which is drawn in black, ALWAYS wins, so that shape at least should be correct. Happily, even with all the extra color sorting code, this version runs about the same speed as a dumber version I wrote first that always writes the current color, due to needing to set the VDP address only twice per 8 pixels, instead of three times for every pixel. ;)
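For readers who haven't fought with bitmap mode before: each 8-pixel row is stored as one pattern byte (one bit per pixel) plus one color byte (foreground in the high nibble, background in the low nibble), which is why only two VDP writes per block are needed. The sketch below is just one possible reading of the rules described above, not the code from the zip. The function name `encode_block` and the exact tie-breaking are assumptions made for illustration.

```python
BLACK = 1  # TMS9918A palette index for black


def encode_block(colors):
    """Pack 8 pixel colors into (pattern_byte, color_byte).

    Assumed rules (an interpretation, not the original code):
    - black, if present, always takes the foreground slot;
    - otherwise the last color in the block takes the foreground slot;
    - the background is the last color that differs from the foreground,
      so with three or more colors the later one wins.
    """
    if len(colors) != 8:
        raise ValueError("a block is exactly 8 pixels wide")

    fg = BLACK if BLACK in colors else colors[-1]

    bg = fg  # a single-color block uses the same color for both slots
    for c in colors:
        if c != fg:
            bg = c  # later colors overwrite earlier ones

    pattern = 0
    for c in colors:
        # leftmost pixel ends up in the most significant bit
        pattern = (pattern << 1) | (1 if c == fg else 0)

    return pattern, (fg << 4) | bg


print(encode_block([4, 4, 4, 4, 7, 7, 7, 7]))
print(encode_block([4, 1, 1, 7, 7, 1, 9, 9]))
```

With exactly two colors the block comes out exact; with more, every pixel that is not the foreground color shows up in whatever background color was seen last.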

 

The technical side of it: the full 256x192 resolution is scanned, and the math is using a 16-bit fixed point format, with 1 bit for sign, 2 bits for integer, and 13 bits for fraction (because the set mostly exists in coordinates under 2.0). 16-bit fixed point allowed use of the native MPY (and DIV for scaling) without needing to get too complicated. It scans from -2.0,1.25 for a range of 2.5 units on each axis. The maximum search depth is set to 15 (for 15 colors).
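To make the number format concrete, here is a rough Python sketch of the same idea: 13 fraction bits, a 256x192 scan starting at (-2.0, 1.25) and covering 2.5 units per axis, and a depth limit of 15. It is not a translation of the 9900 source. The helper names, the "squared magnitude greater than 4.0" escape test, and the way pixel coordinates are scaled are all assumptions. Note also that 4.0 does not fit in this 16-bit format (the range is -4.0 to just under 4.0), so the sketch leans on Python's unbounded integers and does not model whatever overflow handling the original uses.

```python
FRAC_BITS = 13
ONE = 1 << FRAC_BITS        # 1.0 in fixed point (8192)
MAX_DEPTH = 15              # one color per iteration count
WIDTH, HEIGHT = 256, 192


def to_fixed(value):
    """Convert a float to the 13-fraction-bit fixed-point representation."""
    return int(round(value * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale the product.

    Python's >> rounds toward negative infinity, which may differ by one
    unit from a hardware multiply followed by a truncating shift.
    """
    return (a * b) >> FRAC_BITS


def pixel_to_c(px, py):
    """Map a pixel to its fixed-point (real, imaginary) coordinate."""
    span = to_fixed(2.5)
    cr = to_fixed(-2.0) + (px * span) // WIDTH
    ci = to_fixed(1.25) - (py * span) // HEIGHT
    return cr, ci


def depth(cr, ci):
    """Return how many iterations ran before escape, capped at MAX_DEPTH."""
    zr = zi = 0
    for i in range(MAX_DEPTH):
        zr2 = fixed_mul(zr, zr)
        zi2 = fixed_mul(zi, zi)
        if zr2 + zi2 > 4 * ONE:          # assumed escape test: |z|^2 > 4.0
            return i
        zi = 2 * fixed_mul(zr, zi) + ci  # uses the old zr
        zr = zr2 - zi2 + cr
    return MAX_DEPTH                     # never escaped: treat as in the set


print(pixel_to_c(0, 0))          # top-left corner of the scan
print(depth(*pixel_to_c(0, 0)))  # escapes almost immediately
print(depth(0, 0))               # the origin never escapes
```

A point that reaches `MAX_DEPTH` would be drawn black, and the smaller return values would map onto the other colors.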

 

I didn't make any special effort to optimize - in particular the code is very small. The search function would certainly fit entirely in scratchpad, which would probably speed things up substantially.

 

Nevertheless, the 9900-based version takes approximately 168 seconds to complete, which I thought was not bad! I expected much worse. However, the GPU version draws the entire thing in just 3 seconds!!
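For a sense of scale, the quoted timings can be turned into a speedup and a per-pixel cost. Both timings are approximate, so treat the results as ballpark figures only.

```python
WIDTH, HEIGHT = 256, 192
CPU_SECONDS = 168.0   # approximate 9900 run time quoted above
GPU_SECONDS = 3.0     # approximate F18A GPU run time quoted above

pixels = WIDTH * HEIGHT
speedup = CPU_SECONDS / GPU_SECONDS
cpu_ms_per_pixel = CPU_SECONDS / pixels * 1000.0
gpu_us_per_pixel = GPU_SECONDS / pixels * 1_000_000.0

print(f"{pixels} pixels, roughly {speedup:.0f}x faster on the GPU")
print(f"9900: about {cpu_ms_per_pixel:.1f} ms per pixel")
print(f"GPU:  about {gpu_us_per_pixel:.0f} us per pixel")
```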

 

The next trick would be updating the F18A version to use a better color mode. ;)

 

Source included.

 

post-12959-0-59944600-1385549675_thumb.jpg

 

TIMandlebrot.zip


and a YouTube video to show it off here:

 

Edited by Tursi

One more point on Mandelbrot set generation: if some of you are using a recent KDE desktop, have a look at the Background settings. There is a Mandelbrot set generator which actually lets you zoom into the graphics with the mouse wheel. Unless you are doing it with a maximum quality setting, the regeneration is tremendously fast.

 

Well, it makes use of all 8 of my cores, so there should indeed be a good speedup. But even with 1920x1200? Wow.

 

Adding a picture here. (No, this is not from my FRACTALS program :-) ) This is the seahorse valley (where the big black, apple-shaped part on the right and the part on the left approach each other).

post-35000-0-24417100-1385555378_thumb.png



:) NICE! I cannot wait to get my new system up and running to play with this. It'll probably be another two weeks though. :(

 

I do have a couple of questions:

 

1) Does the speed difference have anything to do with the F18A running at 100 MHz (I'm assuming all calculations and processing are being done in the F18A) as opposed to the TI's much, much slower speed?

 

2) In the video you said it was the same set, and it looked the same. Any future plans to include a random generator and a looping feature so we can just leave our TIs up making pretty pictures all the time?

 

:thumbsup: :thumbsup: :thumbsup: :thumbsup: :thumbsup: Now THIS was a demo! :thumbsup: :thumbsup: :thumbsup: :thumbsup: :thumbsup:

THANKS!!


The set itself, which is drawn in black, ALWAYS wins, so that shape at least should be correct.

To be precise, if you stop early (after 15 iterations) you will falsely paint a location black that is not a member of the Mandelbrot set, so your black shape is an upper approximation of the real shape. It actually looks similar, but if we had a much higher resolution and iteration depth we would see that the big right part and the first one on the left are actually connected by a single dot, just one point on the real axis.
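The effect of the iteration limit is easy to demonstrate with a few lines of floating-point Python (a sketch for illustration, unrelated to the TI code). The sample point -0.75 + 0.1i is my own choice: it sits just above the spot on the real axis (c = -0.75) where the two big parts touch, and it is outside the set, but it needs more than 15 iterations to escape.

```python
def escapes(c, max_iter):
    """Return True if the orbit of 0 under z -> z*z + c leaves |z| <= 2
    within max_iter iterations."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return True
    return False


c = complex(-0.75, 0.1)   # hypothetical sample point near the "neck"

print(escapes(c, 15))     # not yet escaped, so it would be painted black
print(escapes(c, 1000))   # escapes with a deeper search, so not in the set
```

So at a depth of 15 this location would be drawn as part of the black shape even though a deeper search shows it does not belong to the set.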

 

BTW, the whole Mandelbrot set is simply connected (which is hard to believe when we see the complicated patterns). That means every closed path in the set surrounds only set members.


The next trick would be updating the F18A version to use a better color mode. ;)

 

Very impressive, and interesting finally to see the GPU pushed to its limits. With 'a better color mode' I assume you mean the bitmap layer? I was thinking about doing an image rotation demo, but I'm not sure 4 colors would be enough to create a good effect.

 

Edit: I noticed the GPU demo doesn't work in Classic99, any particular reason?


An easy optimization on the F18A GPU side would be to use the PIX instruction. It can be used to calculate the GM2 byte address of a pixel given an X,Y location, or if using the bitmap layer it can plot the pixel as well.


1) Does the speed difference have anything to do with the F18A running at 100 MHz (I'm assuming all calculations and processing are being done in the F18A) as opposed to the TI's much, much slower speed?

Yes... also, the F18A GPU gets much more work done per clock than the real 9900 does. The point of this was to try something where we could give it a push and see what the real-world difference looks like.

 

In the GPU version, the entire program, including setting up bitmap mode, happens in the GPU; all the 9900 does is load the program to VDP RAM and start it.

 

2) From the video you said it was the same set, and it looked the same, any future plans to include a random generator and a looping feature so we can just leave our TI's up making pretty pictures all the time?

I don't actually have any future plans, although I suppose a random function is easy enough. My expectation is that fully random, even in this space, would more often than not give you a solid colored screen, though...?

 

To be precise, if you stop early (after 15 iterations) you will falsely paint a location black that is not a member of the Mandelbrot set

At 256x192 resolution, it's ALL approximation. :) But yes, the set is amazingly intricate, I learned a lot in my research.

 

Very impressive, and interesting finally to see the GPU pushed to its limits. With 'a better color mode' I assume you mean the bitmap layer?

I don't know yet. The bitmap layer would have no clash, of course!

 

I was thinking about doing an image rotation demo, but I'm not sure 4 colors would be enough to create a good effect.

For a rotation demo? Sure it would!

 

Edit: I noticed the GPU demo doesn't work in Classic99, any particular reason?

Yes, the GPU initializes the VDP registers itself, and that code is not in Classic99 yet (well, not in the released version ;) ). If you breakpoint at the start and manually set the VDP registers as per the "BMREGS" label in the source (using the debugger "VRx=y"), then the rest will work (that was how I first tested it). It won't be as fast as the real thing, though: the GPU in Classic99 runs at about the speed of overdrive mode, which is slower than the real GPU.

 

An easy optimization on the F18A GPU side would be to use the PIX instruction. It can be used to calculate the GM2 byte address of a pixel given an X,Y location, or if using the bitmap layer it can plot the pixel as well.

I looked at it, but the number of instructions I needed to get my coordinates packed into a single word for PIX was approaching the number of instructions needed to do the calculation manually. After poking at it for a bit I shrugged it off and left it alone. Rewriting the pixel loop counters to use bytes in the same word would allow use of PIX for that easy performance boost, though I didn't feel that a rewrite of that size was a fair comparison. :)
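For anyone wondering what "the calculation" is: in the standard Graphics II (GM2) bitmap layout, the byte offset for a pixel and the bit within that byte can be worked out as in the sketch below. This is the usual TMS9918A layout arithmetic written in Python for readability, not the code from the source, and it says nothing about how PIX expects its operands. The function names are made up for the example.

```python
def gm2_offset(x, y):
    """Byte offset of pixel (x, y) within the pattern (or color) table.

    Each character cell is 8 bytes, one per pixel row, and there are
    32 cells (256 bytes) per character row.
    """
    if not (0 <= x < 256 and 0 <= y < 192):
        raise ValueError("pixel outside the 256x192 screen")
    return ((y >> 3) << 8) | (x & 0xF8) | (y & 7)


def gm2_bitmask(x):
    """Bit within the pattern byte; the leftmost pixel is the top bit."""
    return 0x80 >> (x & 7)


print(hex(gm2_offset(255, 191)), hex(gm2_bitmask(255)))
```

The same offset applies to both the pattern table and the color table, each added to its own base address.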

 

The code's not optimized at all, this is my first pass at it. There are a few places that could be rebuilt for faster performance, lots of loading immediates that don't need to be and reloading and retesting data that has already been tested. But... the goal was the comparison and to do something I hadn't done before but always liked to see, so good enough. :)


This demo shows what new chips can do in an old machine. Very very fast!

 

I do wish we had a 9938/9958 version as that would be 512x384 pixels and 256 colors. It would also look more like a modern PC output.

 

Very impressive work and demo Tursi.


Thanks. As for your wish, why not just use the one mizapf posted in the 9938 thread?


Very nice work Tursi, very impressive! :thumbsup: :thumbsup:

Pretty amazing how fast the GPU runs the program - seems like making use of the F18A's processing power could come in useful to some creative coder.

Edited by RobertLM78


I agree, Tursi's a TI-God! He never ceases to amaze...

 

... but when you talk about some creative coder other than Tursi, Rasmus popped into my mind. With the wizardry Rasmus pulled off in his last two games, I'm wondering what he's up to and what's going to be next? :)


Indeed - I've been playing with Magellan on MESS the past week or so - although I'd like to try it out on the actual hardware, I'm just not sure what files to omit to get the game onto a SSSD disk, if that's at all possible...


Assuming you mean TI Scramble, you don't need the TISC file.


I did mean Magellan - I got confused about who had authored it :dunce: . But I'll remember that - looking at the Scramble thread, it looks like something worth trying out too :).

 

Magellan has a few music files, it appears, so I was thinking of cutting some of those out to see if I can get it on one SSSD disk.


 

I'm afraid that won't work. Magellan is for the PC. The samples are graphics/map files. :)


Robert, concerning your profile icon, you're not seriously showing us you losing a base in round 1 of TI Invaders? My goodness. First base to lose is at about 4000 points on a bad day, really. :grin:

 

TI Invaders was my very first game cartridge, which I got for Christmas 1982 at the age of 13. I'm not quite sure how far I got over time; at least until the magenta, pulsating, button-like enemies; I think there were some blue blinkers even later (first blinkers are green). And I got to the xxx in the mothership round (hitting the 500 points).

 

Maybe we should start a competition in old TI games? There was a suggestion from Wolfgang Bertsch at the last meeting in Eindhoven, using the Errorfree website: http://www.errorfree.de/Menu16.html


:rolling: LOL - I was hoping that no one would notice that - the picture is more of a showcase for MESS than a showcase of my abilities on TI Invaders... that said, I'm only okay at the game, only making it to the first blinkers, and typically, I don't make it past the "worms".


 

I'm afraid that won't work. Magellan is for the PC. The samples are graphics/map files. :)

By PC only, do you mean emulation only? Because I have run Magellan in MESS :).

 

Never mind - I have run Titanium in MESS - indeed, everything I've mentioned is about Titanium, not Magellan.... whoops! :dunce: :dunce:

 

I'm afraid I got myself rather confused... anyway, Titanium I'd like to try out on the real hardware, but not sure what, if anything, can be omitted... sorry about the confusion there Rasmus :).

Edited by RobertLM78


 

Thanks. As for your wish, why not just use the one mizapf posted in the 9938 thread?

 

 

If you look at that thread you will see the issue: in MESS the EVPC is a hindrance, as it did not duplicate the color palette like the TIM card or Geneve, so the colors are way off.

 

If you watched my video I consistently point this out.


By PC only, do you mean emulation only? Because I have run Magellan in MESS :).

 

Never mind - I have run Titanium in MESS - indeed, everything I've mentioned is about Titanium, not Magellan.... whoops! :dunce: :dunce:

 

I'm afraid I got myself rather confused... anyway, Titanium I'd like to try out on the real hardware, but not sure what, if anything, can be omitted... sorry about the confusion there Rasmus :).

 

You don't need the object files TIC and TIO.


Cool 8) - thanks Rasmus! I'll be doing some file transferring to the TI today - so I can play some Titanium on the real hardware ;).

 

Sorry about the derailment Tursi - back to the topic, I was amazed by the speed of the MandleGPU, just trying it on the hardware last night (especially considering how the 9900 handles it :D). I bet making use of that speed could come in real handy :).
