42bs Posted July 7, 2019
I made a small test that draws the Phobyx logo wobbling along a sine wave (like demo0006), once as a chain of 102 SCBs (without size and palette reloading) and once as 102 separate sprites (flipped, but that only affects the drawing order). (See https://github.com/42Bastian/lynx_hacking/tree/master/chained_scbs). Chained is always quicker, and on a Lynx II it can be drawn in a single frame at 60Hz. But not on Handy. Just wanted to share this.
+bhall408 Posted July 7, 2019
2 hours ago, 42bs said: Chained is always quicker and on a Lynx II it can be drawn in a single frame at 60Hz. But not on Handy. Just wanted to share this.
Is this issue exposed in any released games? Is it worth trying to fix/address?
42bs Posted July 7, 2019 Author
1 hour ago, bhall408 said: Is this issue exposed in any released games? Is it worth trying to fix/address?
You mean fix Handy? I think it is important to know that Handy is slower with respect to sprite painting. But it mostly matters if you count cycles (or are just at the edge between running at 30fps and only 20fps).
+bhall408 Posted July 7, 2019
2 hours ago, 42bs said: You mean fix Handy? I think it is important to know, that Handy is slower w/ respect to sprite painting. But mostly if you count cycles (or just at the edge of running 30fps or only 20fps).
Yes, I mean fix Handy...
drludos Posted July 7, 2019
2 hours ago, 42bs said: You mean fix Handy? I think it is important to know, that Handy is slower w/ respect to sprite painting. But mostly if you count cycles (or just at the edge of running 30fps or only 20fps).
As someone who is trying to make a 60fps game on Handy, I'm glad to read that! I'm quite impressed that the Lynx can draw a 102-sprite chain at 60fps; maybe someone will make a manic shooter game on it someday! Did you run your test on Mednafen too, as it's apparently *faster* than real hardware?
42bs Posted July 8, 2019 Author
Mednafen does not work on my PC. But anyway, AFAIK all emulators use the same code base: Keith Wilkins' Handy. Since it makes no difference what kind of video output or scaling I use, I guess it is a "limitation" in the Suzy emulation.
42bs Posted July 9, 2019 Author
Added a 2nd version, now 100 tiles each 16x10 pixels. Takes slightly more time to draw. https://github.com/42Bastian/lynx_hacking/tree/master/chained_scbs2
Edited July 9, 2019 by 42bs
42bs Posted July 9, 2019 Author
On 7/7/2019 at 11:40 PM, bhall408 said: Yes, I mean fix Handy...
It seems the cycle calculation is either wrong or too rough.
VladR Posted July 9, 2019
On 7/7/2019 at 3:56 PM, drludos said: As someone who is trying to make a 60fps game on Handy, I'm glad to read that! I'm quite impressed that the Lynx can draw a 102 sprite chain at 60fps, someone will maybe make a manic shooter game on it someday! Did you make your test with Mednafen too, as it's apparently *faster* than real hardware?
Yeah, just not with 102 sprites and at 60 fps. You are, quite literally, multiplying the processing cycles per sprite by 100 here, so it's going to add up real fast. Now, with ~56,467 cycles of CPU time available (after Mikey reads the frame buffer), you have about ~564 cycles per sprite to handle its behavior and stay within the frame time. That should be doable, and you could still have 30 fps (plenty for such a small screen anyway). Now, if you had, say, 15 enemies, each with ~7 bullets, that's around 105 sprites on screen. If the bullets were just horizontal or vertical, they wouldn't have to be transparent, so that should help. Still, it would be interesting to see how transparency directly affects the performance.
13 hours ago, 42bs said: Added a 2nd version, now 100 tiles each 16x10 pixels. Takes slightly more time to draw. https://github.com/42Bastian/lynx_hacking/tree/master/chained_scbs2
What's the exact device timing here? I presume those tiles are not transparent, correct? What were the sprite dimensions / total pixels drawn in the first benchmark? This interests me a great deal because, obviously, this was the first thing that popped into my mind when reading the docs: when doing flat shading, does it make sense to burn cycles on creating a list of chained scanlines? The problem is that the list is different each and every frame, so it's questionable whether it can even be faster, given that so many scanlines are so short (less than 10 px); for those, you don't want to lose any more time than you already did.
I'm not doing the benchmark just yet; I must resist and focus on the working game first. Clearly, a 16 MHz blitter [paired with an 8-bit CPU] is really, really nice.
drludos Posted July 10, 2019
Oh, by the way, I have another "performance question" related to Suzy: is it faster to render a small sprite stretched to a larger size by Suzy using the SCB, or to render a big sprite directly? For example, if I want to draw a blue fullscreen background, would it be faster for Suzy to draw a 1px*1px blue sprite stretched to 160*102, or to draw a 160*102 blue rectangle sprite directly?
42bs Posted July 10, 2019 Author
4 hours ago, drludos said: Oh by-the-way, I have another "performance question" related to Suzy: Is it faster to render a small sprite stretched to a larger size by Suzy using the SCB or to render a big sprite directly? For example, if I want to draw a blue fullscreen background, would it be faster for Suzy to draw a 1px*1px blue sprite stretched to 160*102 or to draw directly a 160*102 blue rectangle sprite?
Not tested yet, but I assume, reading 8160+ bytes and then writing 8160 bytes should take longer than just reading 5 and writing 8160.
42bs Posted July 10, 2019 Author
VladR, the first version writes the whole screen, the second "only" 160x100 (hence each tile is 16x10).
42bs Posted July 10, 2019 Author
11 minutes ago, 42bs said: Not tested yet, but I assume, reading 8160+ bytes and then writing 8160 bytes should take longer than just reading 5 and writing 8160.
Proof: no difference. I tried:
Packed 2 color sprite (10x1) sized up to 160x102 => 3ms
Literal 16 color sprite (1x1) sized up to 160x102 => 3ms
Literal 16 color sprite (160x102) => 3ms
Suzy, I stand corrected.
42bs Posted July 10, 2019 Author
8 hours ago, VladR said: What's the exact device timing here ? I presume those tiles are not transparent, correct ?
No difference whether I draw normal (with transparency) or background sprites. Always 12ms for cls+100 tiles.
VladR Posted July 10, 2019
Man, I would kill to see the internal HW implementation. This is the same bulls*it as with the Jaguar's Blitter, where it doesn't matter whether the two 768x240 bitmaps I draw are nontransparent or transparent. I strongly encourage everybody to try to implement it in SW and compare the results. Which means there's only one (albeit bloody insane) explanation: the HW always treats the payload the same in the inner loop and performs the RenderTarget read + per-pixel condition, even for non-transparent sprites, where it's not needed. Even if they had separate silicon (with parallel execution) for this particular purpose (which I doubt), it should still be faster to just dump the payload than to apply per-pixel conditioning.
VladR Posted July 10, 2019
There's one more scenario related to scanline scaling (which I'm currently working with). You have a scanline that, depending on distance, ranges in width from 4 pixels to 160 pixels. Now, on the Jaguar, I already did the benchmarking, and it doesn't matter whether you render 2x2 = 4 pixels or 128x128 = 16,384 pixels. It still takes the exact same amount of time, meaning the HW is literally going through each and every pixel by brute force. I suspect the same will be true for Suzy, based on your sizing example?
Cyprian Posted July 10, 2019
There is another explanation: 'dma' channels can have different memory access slots, e.g. the source uses 'even' and the destination 'odd' slots, as is done in the Amiga blitter. Therefore, when you use only the destination channel, the 'even' cycles are free, and there is no time difference between 'copy' and 'clear'.
VladR Posted July 10, 2019
Thanks. It would be very interesting for me to read more on the HW implementation of Blitters. But not the diagrams, those pictures are useless. What I want is a detailed description of the processing of the inner and outer loops during blitting, describing all the stages of the HW pipeline, including timing. I'm just annoyed there's no performance advantage to doing the least amount of processing. I wouldn't expect anything significantly parallelized in 1989 for 8-bit HW.
Fadest Posted July 10, 2019
13 minutes ago, Cyprian_K said: There is another explanation: 'dma' channels can have different memory slots access, e.g source uses 'even' and destination ' odd' - as it is done in amiga blitter.
As the Atari Lynx was created by the same guys as the original Amiga, they probably reused some of their best ideas.
Cyprian Posted July 10, 2019
3 hours ago, Fadest said: As the Atari Lynx has been created by the same guys than the original Amiga, they probably reused some of their best ideas.
Yep, but I would not use the word 'best' in that case.
42bs Posted July 10, 2019 Author
1 minute ago, Cyprian_K said: yep, but I would not use a word 'best' in that case
Don't judge a 30 year old machine by today's possibilities.
+karri Posted July 10, 2019
Guys. What a pity you tell me this now. I could have written On Duty with linked sprites, but now I am too far into the project to change the design. I already used the RAM for other things and there is no way to change this anymore.
drludos Posted July 10, 2019
11 hours ago, 42bs said: Prove: No difference. I tried: Packed 2 color sprite (10x1) sized up 160x102 => 3ms Literal 16 color sprite (1x1) sized up 160x102 => 3ms Literal 16 color sprite (160x102) => 3ms Suzy, I stand corrected
Wow, thanks a lot for the numbers and your test, it's very good to know!
+bhall408 Posted July 10, 2019
Is there a way to check whether an existing title is making use of chained SCBs? I had been wondering why I saw the performance of Ms. Pac-Man increase (in emulation) as you eat the dots. It just didn't make sense to me... Until... If the dots were part of a linked list of sprites, then that would make total sense: eating the dots would remove them from the chain, making it shorter, and thus less work to do. Any way to confirm this? And if that *is* the reason, then all the more excuse to spend some time improving how the Handy core handles large lists of sprites.
42bs Posted July 10, 2019 Author
30 minutes ago, bhall408 said: Is there a way to check if an existing title is making used of chained SCBs? I had been wondering why I saw the performance of Ms. Pac-Man increase (in emulation) as you eat the dots. It just didn't make sense to me... Until... If the dots were part of a linked list of sprites, then that would make total sense -- eating the dots would remove them from the chain, making it shorter, and thus less work to do. Any way to confirm this? And if that *is* the reason, then all the more excuse to spend some time improving how Handy core handles large lists of sprites.
Take the source and add some debug output in susie.cpp