
fast memory copy



Memcopy using indirect addressing for speed? What about using self-modifying code instead? ;)

 

Sometimes you can't use self-modifying code, for instance when running code from a cart.

 

I think we sometimes obsess about speeding up specific pieces of code, when really they are "good enough" and you'd be better off working on gameplay or other mechanics. :) Not you specifically, of course... just us 8-bit programmers in general.


Sometimes you can't use self-modifying code, for instance when running code from a cart.

Running from a cart should automatically imply using unrolled code and lookup tables.

Using a cart also means that more of the real machine's RAM is available, so the self-modifying code can be put there.

 

I think we sometimes obsess about speeding up specific pieces of code, when really they are "good enough" and you'd be better off working on gameplay or other mechanics. :) Not you specifically, of course... just us 8-bit programmers in general.

Good enough?

Every single frame counts, so why not push the speed to the limit by using the fastest routines?


@Franco

 

I just stumbled across this in your source ;)

 

'; Generic memcpy optimized for speed. No overlapping'

 

That was my intention, and I was wondering why you would then use ZP. But as usual... we are not talking about the "fastest" generic memcopy routine here, so no offence.


Running from a cart should automatically imply using unrolled code and lookup tables.

Using a cart also means that more of the real machine's RAM is available, so the self-modifying code can be put there.

 

 

Good enough?

Every single frame counts, so why not push the speed to the limit by using the fastest routines?

 

 

Because it uses more memory? Because in most cases it just doesn't matter?

We are in the Programming section of the forums here. So, for you, in German:

 

 

Wenn der Kuchen spricht, haben die Krümel Pause (literally: "When the cake speaks, the crumbs take a break")

 

:)

 

 

To clarify for the non-German speakers:

 

http://www.phrasen.com/uebersetze,Wenn-der-Kuchen-spricht-haben-die-Kruemel-Pause,76808,d.html

 

Although I am not sure if the English version really means the same as the German. But it is close enough. After all, we are all thinking the same thing here :P

Edited by Creature XL

@Franco

 

I just stumbled across this in your source ;)

 

'; Generic memcpy optimized for speed. No overlapping'

 

That was my intention, and I was wondering why you would then use ZP. But as usual... we are not talking about the "fastest" generic memcopy routine here, so no offence.

 

I have no problem with that!

 

The setup code will be similar for the fastest version (direct addressing) without loop unrolling. If you use loop unrolling, the setup code will be ugly. Of course you can use macros, but I hate macros. (It's not something rational, it's just a feeling.)

 

Rest assured that I would go for the uglier version if I really needed to.
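
For a rough sense of what is at stake between a zero-page pointer loop and the direct-addressing versions, here is a back-of-the-envelope cycle count in Python. The opcode timings are the standard documented 6502 ones. The loop shapes are assumptions for illustration only (one index increment and one taken branch per iteration, no page-crossing penalties, setup code not counted) and are not taken from the actual routine discussed here.

```python
# Standard documented 6502 cycle counts (no page-crossing penalty).
CYCLES = {
    "lda (zp),y": 5,
    "sta (zp),y": 6,
    "lda abs,x": 4,
    "sta abs,x": 5,
    "inx/iny": 2,
    "bne taken": 3,
}

def cycles_per_byte(load, store, unroll=1):
    """Average cycles per copied byte for a hypothetical loop that does
    `unroll` load/store pairs, then one index increment and one branch."""
    body = unroll * (CYCLES[load] + CYCLES[store])
    overhead = CYCLES["inx/iny"] + CYCLES["bne taken"]
    return (body + overhead) / unroll

# Zero-page pointer loop: lda (src),y / sta (dst),y / iny / bne
indirect = cycles_per_byte("lda (zp),y", "sta (zp),y")
# Direct addressing, not unrolled: lda src,x / sta dst,x / inx / bne
direct = cycles_per_byte("lda abs,x", "sta abs,x")
# Direct addressing with four load/store pairs per iteration
direct_x4 = cycles_per_byte("lda abs,x", "sta abs,x", unroll=4)
```

Under these assumptions the zero-page loop costs 16 cycles per byte, the plain direct-addressing loop 14, and the four-way unrolled one about 10, at the price of more code and the uglier setup mentioned above.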


Although I am not sure if the English version really means the same as the German. But it is close enough. After all, we are all thinking the same thing here :P

I know one like that too:

 

ngan rub maitschop , ngan lai maiköy ... guess the origin ;)

 

Or, in other words: a PICTURE that isn't worth drawing doesn't need a single line to be drawn...

 

 

This sentence fits most of the "new A8 projects" fantastically well... you know, exactly those "unfinished", or rather unfinishable, projects.

 

 

NOT using the optimized routines for memory copying means having speed issues in simple games, just like Prince of Persia... but that's another story... or not?

 

 

You could put it another way: if you can picture the completed project as a whole, every part of it is worth creating.

Edited by emkay

NOT using the optimized routines for memory copying means having speed issues in simple games, just like Prince of Persia... but that's another story... or not?

 

 

You could put it another way: if you can picture the completed project as a whole, every part of it is worth creating.

 

 

Well, now that you bring it on topic... My current version of Prince of Persia is "fast enough" to render each screen, at least faster than the original version, and it's using my "slow" memcpy:

 

 

BTW: The original Prince of Persia uses self-modifying code, but the screen handling on the Apple can get very complex.


Well, now that you bring it on topic... My current version of Prince of Persia is "fast enough" to render each screen, at least faster than the original version, and it's using my "slow" memcpy:

 

 

BTW: The original Prince of Persia uses self-modifying code, but the screen handling on the Apple can get very complex.

Wrong video?


Look at "Karateka", side by side.

 

 

Even with all the "adapted" optimizations, the A8 version doesn't run faster. There are also nifty glitches in the masking on the Atari, due to the 8-bit instead of 7-bit pixels... and people want 4-colour objects there. How will you solve that without every possible speed optimization?

Edited by emkay

Look at "Karateka", side by side.

 

Even with all the "adapted" optimizations, the A8 version doesn't run faster. There are also nifty glitches in the masking on the Atari, due to the 8-bit instead of 7-bit pixels... and people want 4-colour objects there. How will you solve that without every possible speed optimization?

 

Karateka needs to move the whole screen and do the masking with the mountain and the big portals. In that video mode, just a complete "sta" pass over the screen takes how long? 3 frames? It doesn't surprise me that Karateka has slow scrolling.

 

Prince of Persia, on the other hand, doesn't need scrolling. OK, the rendering IS complex (isometric tiles with masks), but there is no need to move the entire screen for each movement of the player, so I can turn off DMA and draw selectively. As shown in the video, it takes "only" 8-9 frames to redraw the entire screen.
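
To put frame counts like these in perspective, here is a quick estimate in Python. All figures are assumptions for illustration, not measurements of either game: a 7680-byte screen (40 bytes × 192 lines), 114 CPU cycles per scanline with 262 (NTSC) or 312 (PAL) scanlines per frame, and no cycles lost to screen DMA or memory refresh.

```python
CYCLES_PER_SCANLINE = 114
SCANLINES = {"NTSC": 262, "PAL": 312}

def frames_needed(num_bytes, cycles_per_byte, system="NTSC"):
    """Frames of raw CPU time needed to process num_bytes,
    ignoring cycles stolen by screen DMA and memory refresh."""
    cycles_per_frame = CYCLES_PER_SCANLINE * SCANLINES[system]
    return num_bytes * cycles_per_byte / cycles_per_frame

SCREEN_BYTES = 40 * 192  # assumed bitmap size, not taken from the games

# A bare "sta abs,x" is 5 cycles per byte, with no loop overhead counted.
store_only = frames_needed(SCREEN_BYTES, 5)
# A simple lda abs,x / sta abs,x / inx / bne loop is 4 + 5 + 2 + 3 = 14.
plain_copy = frames_needed(SCREEN_BYTES, 14)
```

With these assumptions, bare stores over a full screen already take more than one NTSC frame and a simple copy loop about three and a half, so multi-frame redraws are not surprising. Real figures would be higher with DMA enabled.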

Edited by Franco Catrin

Karateka needs to move the whole screen and do the masking with the mountain and the big portals. In that video mode, just a complete "sta" pass over the screen takes how long? 3 frames? It doesn't surprise me that Karateka has slow scrolling.

Not sure if that "some scanline" overlay can be called scrolling.

PoP uses 3 parts of the screen for movement, and the "back" and "front" masking sometimes happens at the same time in different positions.

What the Atari can do and what people want are often NOT the same, so they don't mention all the problems when they want a game to be "C64"-like...

 

 

And with the following picture, I'll end the PoP discussion in this thread.

 

post-2756-0-48065100-1430245377_thumb.jpg

 

Let's wait and see the final stuff ...
