JetSetIlly

Gopher2600 - Performance and CRT improvements


1 hour ago, Thomas Jentzsch said:

I think the same can be achieved without it, too (Phosphor blend = 80% in Stella).

[Image: Stella screenshot with Phosphor blend = 80%]

 

Or am I missing something?

 

At the end of the day I think it's just a different way of achieving the same thing. There might be a subtle difference in the methods though. I'm not sure.

 

1 hour ago, Thomas Jentzsch said:

 

BTW: What are you using to merge the frames? Is that only done for screenshots?

 

Screenshots are still a work in progress, but at the moment they use the same pipeline as the display, with a concession for the previous frame, which is blurred slightly and blended with the result of the display (we can think of this as a special-case phosphor). This works well for some kernels but poorly for others. I have an idea how to fix that, but I'll have to see how it turns out. I want to avoid having different screenshot methods for different types of kernels. And I definitely want to avoid having a set of preferences just for screenshots.
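The screenshot step described above (blur the previous frame slightly, then blend it with the current display output) might look roughly like the following Go sketch. It is not Gopher2600's code: the greyscale `Frame` type, the 3-tap horizontal box blur, the `prevWeight` parameter and the choice of a per-pixel maximum as the blend are all assumptions made for illustration.

```go
package main

import "fmt"

// Frame is a hypothetical greyscale frame: one 0-255 intensity per pixel,
// stored row by row. Real frames would be RGB; greyscale keeps this short.
type Frame struct {
	W, H int
	Pix  []uint8
}

// boxBlurH applies a 3-tap horizontal box blur, ignoring pixels past the
// left and right edges of each row.
func boxBlurH(f Frame) Frame {
	out := Frame{W: f.W, H: f.H, Pix: make([]uint8, len(f.Pix))}
	for y := 0; y < f.H; y++ {
		for x := 0; x < f.W; x++ {
			sum, n := 0, 0
			for dx := -1; dx <= 1; dx++ {
				xx := x + dx
				if xx < 0 || xx >= f.W {
					continue
				}
				sum += int(f.Pix[y*f.W+xx])
				n++
			}
			out.Pix[y*f.W+x] = uint8(sum / n)
		}
	}
	return out
}

// screenshot blurs the previous frame, scales it by prevWeight (0..1) and
// keeps whichever is brighter at each pixel: the current frame or the faded,
// blurred previous frame (an assumed, phosphor-like blend).
func screenshot(cur, prev Frame, prevWeight float64) Frame {
	blurred := boxBlurH(prev)
	out := Frame{W: cur.W, H: cur.H, Pix: make([]uint8, len(cur.Pix))}
	for i := range cur.Pix {
		v := float64(cur.Pix[i])
		if faded := float64(blurred.Pix[i]) * prevWeight; faded > v {
			v = faded
		}
		out.Pix[i] = uint8(v + 0.5) // round to nearest
	}
	return out
}

func main() {
	prev := Frame{W: 3, H: 1, Pix: []uint8{0, 255, 0}}
	cur := Frame{W: 3, H: 1, Pix: []uint8{0, 0, 200}}
	fmt.Println(screenshot(cur, prev, 0.5).Pix) // [64 43 200]
}
```

A lower `prevWeight` makes the previous frame fainter; the blur softens it so it reads as an afterglow rather than a second sharp image.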

45 minutes ago, JetSetIlly said:

I want to avoid having different screenshot methods for different types of kernels. And I definitely want to avoid having a set of preferences just for screenshots.

I completely understand. We emulate a non-static picture on a CRT, which is complicated enough. Static screenshots by emulating the human eye and brain add another, hefty level of complexity.

10 minutes ago, Thomas Jentzsch said:

I completely understand. We emulate a non-static picture on a CRT, which is complicated enough. Static screenshots by emulating the human eye and brain add another, hefty level of complexity.

Can we please add an option for male/female viewer?  My wife still swears our curtains are green, when they're obviously blue. ;)

 

25 minutes ago, Andrew Davie said:

My wife still swears our curtains are green, when they're obviously blue. ;)

Pictures? :) 

27 minutes ago, Thomas Jentzsch said:

Pictures? :) 

From a previous house long ago, alas. But at the time I did take a picture and resampled down to 1 pixel and then looked at the RGB values.

Blue was significantly higher in value than green. And besides, all of the rest of the house was decorated in blue/grey. I rest my case.

 

20 minutes ago, Andrew Davie said:

From a previous house long ago, alas. But at the time I did take a picture and resampled down to 1 pixel and then looked at the RGB values.

Blue was significantly higher in value than green. And besides, all of the rest of the house was decorated in blue/grey. I rest my case.

I hope you agreed on Cyan. :) 

On 5/6/2021 at 1:24 PM, Thomas Jentzsch said:

I completely understand. We emulate a non-static picture on a CRT, which is complicated enough. Static screenshots by emulating the human eye and brain add another, hefty level of complexity.

 

I've been playing with this for the last couple of days and concluded that the best option in the short term is to save five candidate files for each snapshot. Each file being a combination of three consecutive frames, with the idea that at least one of the combinations will be an acceptable representation of the moving image.

 

In these examples, the first frame in the sequence is frame A, the second is frame B and the third is frame C. I don't believe there's any value in saving the single frames B and C, so the combinations we save are the following (in each case using a phosphor appropriate for the last frame in the combination):

 

A

AB

ABC

AC

BC

 

Saving frame A captures your regular, single-frame kernels; AB captures double-frame kernels and ABC the rarer triple-frame kernels.
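As a rough illustration of how the five candidates could be built from frames A, B and C, here is a Go sketch. The greyscale `Frame` type and the `composite` and `candidates` helpers are hypothetical, and merging by per-pixel maximum is a simple stand-in for the phosphor treatment described above, not the method Gopher2600 uses.

```go
package main

import "fmt"

// Frame is a hypothetical greyscale frame (one 0-255 intensity per pixel).
type Frame []uint8

// composite merges frames by keeping the brightest value at each pixel, so
// an object drawn in any of the frames survives into the still image.
func composite(frames ...Frame) Frame {
	out := make(Frame, len(frames[0]))
	for _, f := range frames {
		for i, v := range f {
			if v > out[i] {
				out[i] = v
			}
		}
	}
	return out
}

// candidates builds the five snapshot combinations from three consecutive
// frames a, b and c.
func candidates(a, b, c Frame) map[string]Frame {
	return map[string]Frame{
		"A":   composite(a),
		"AB":  composite(a, b),
		"ABC": composite(a, b, c),
		"AC":  composite(a, c),
		"BC":  composite(b, c),
	}
}

func main() {
	// A two-frame flicker kernel: alternate frames draw different pixels.
	a, b, c := Frame{255, 0, 0}, Frame{0, 255, 0}, Frame{255, 0, 0}
	got := candidates(a, b, c)
	for _, name := range []string{"A", "AB", "ABC", "AC", "BC"} {
		fmt.Println(name, got[name])
	}
}
```

For this two-frame example only the combinations that include both A and B show both pixels, which is why AB is the candidate to pick for double-frame kernels.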

 

Using @Andrew Davie's CDFJChess to illustrate the first three:

 

Frame A

[Screenshot: CDFJChess, frame A]

 

Frame AB

[Screenshot: CDFJChess, frame AB]

 

Frame ABC

[Screenshot: CDFJChess, frame ABC]

 

 

The other combinations, AC and BC, try to capture frames from kernels where objects move and flicker less predictably. They can't capture all instances accurately, but they provide some options at least. I've found it particularly challenging to get good images of the third level of Zookeeper.

 

For example, in this sequence, Frame AB is okay but isn't great because we're relying on the phosphor fade to see some elements:

 

[Screenshot: Zookeeper demo, frame AB]

 

Frame ABC for this particular sequence is much better.

 

[Screenshot: Zookeeper demo, frame ABC]

 

 

However, in a snapshot from a moment later, ABC is unsatisfactory - the player character just so happens to move at this very moment, creating a double image:

 

[Screenshot: Zookeeper demo a moment later, frame ABC, showing a double image of the player]

 

AB in this instance is the best we can do. We lose a little in the scene at the top of the screen but we would have to live with it in this instance.

 

[Screenshot: Zookeeper demo a moment later, frame AB]

 

Anything else would involve either too many combinations (in fact five is perhaps too many already) or some way of interactively making screenshots, where you can pause and rewind until you find the best image. I might add such a feature but it's not a priority. Maybe version 2.0 🙂

 

Any thoughts? Are there any obvious combinations that I've missed?


Nice!  

 

Between the 5 images it should be possible to selectively copy/paste pieces to create a full image that best represents the game.  I was able to quickly make this:

 

[Image: Zookeeper composite assembled by hand from the ABC and AB snapshots]

 

Using the last two images:

 

27 minutes ago, JetSetIlly said:

However, in a snapshot from a moment later, ABC is unsatisfactory - the player character just so happens to move at this very moment, creating a double image:

 

[Screenshot: Zookeeper demo a moment later, frame ABC, showing a double image of the player]

 

AB in this instance is the best we can do. We lose a little in the scene at the top of the screen but we would have to live with it in this instance.

 

[Screenshot: Zookeeper demo a moment later, frame AB]

 

 

 

Just now, SpiceWare said:

Nice!  

 

Between the 5 images it should be possible to selectively copy/paste pieces to create a full image that best represents the game.  I was able to quickly make this:

 

[Image: Zookeeper composite assembled by hand from the ABC and AB snapshots]

 

Yes. Thanks for that. Having the five separate files isn't ideal but this proves it's useful in other ways.

 

On 5/9/2021 at 4:39 AM, SpiceWare said:

Between the 5 images it should be possible to selectively copy/paste pieces to create a full image that best represents the game.

It's nice that it is possible to get an image but requiring the user to slice and dice it by hand kind of goes against the whole idea of having computers in the first place.
Whenever a human is doing work for the computer, then something is wrong.

 

A key point that I mentioned above is for the tool to watch for recurring frames.
If it sees frames like ABABABABABA (ie a 2 frame kernel) then it grabs A, grabs B, grabs another A, notices that the 3rd frame is identical to the first and then blends the first 2 frames.

 

But this doesn't capture flickering stuff like ABACABACABAC and could miss the C frame.
So we look for 2 consecutive frames.
It grabs A, grabs B, grabs A (no AB match yet), grabs C (no AB match yet), grabs A (no AB match yet), grabs B, notices that last 2 frames match the first 2 frames and then blends the 4 frames (ie every frame minus the last 2 matching).

 

If the player is moving around, then it may take a few cycles before it finds a set that doesn't have movement.
We are hoping that the game is not having things move at 50 or 60 times a second (or at least that it does so cyclically) otherwise no player could keep up with a screen that screams by in a flash.
The fallback is that after, say, 10 frames it just blends all 10 and accepts some possible motion blur.

 

The above could be coded with a single parameter n.
n=1 means look for a single frame identical to a previous frame. Handles AAAA, ABABABA, ABCABC, ABCDABCD type cases.
n=2 means look for 2 identical frames in sequence. Handles ABACABACABAC type cases.
n=3 means look for 3 identical frames in sequence. Handles more complex cases, but I can't think of examples. Might have trouble with motion blur.


Thank you for your post; you make some good points.

 

2 hours ago, stepho said:

 

A key point that I mentioned above is for the tool to watch for recurring frames.
If it sees frames like ABABABABABA (ie a 2 frame kernel) then it grabs A, grabs B, grabs another A, notices that the 3rd frame is identical to the first and then blends the first 2 frames.

 

But this doesn't capture flickering stuff like ABACABACABAC and could miss the C frame.
So we look for 2 consecutive frames.
It grabs A, grabs B, grabs A (no AB match yet), grabs C (no AB match yet), grabs A (no AB match yet), grabs B, notices that last 2 frames match the first 2 frames and then blends the 4 frames (ie every frame minus the last 2 matching).

 

If the player is moving around, then it may take a few cycles before it finds a set that doesn't have movement.
We are hoping that the game is not having things move at 50 or 60 times a second (or at least that it does so cyclically) otherwise no player could keep up with a screen that screams by in a flash.
The fallback is that after, say, 10 frames it just blends all 10 and accepts some possible motion blur.

 

This only really works if frames change uniformly. But as in the Zookeeper example, this isn't necessarily the case. In the specific instance of when the player character is falling on level 3 all the elements are moving and flickering independently. There are no duplicate frames.

 

You need to be able to compare parts of the image and create a composite image, maybe in the way @SpiceWare tried but done automatically, or in some other way. Recognising different parts of the screen and deciding how to treat each part isn't a trivial problem, I believe, but I'll continue to think about it.

 

If by motion blur you mean allowing the phosphor trail to be seen, this again requires knowledge of how the picture is divided: some areas must blend together (compose is perhaps a better word) while other areas must allow the phosphor to be seen (blending for the still image and the phosphor are not the same thing).

 

Referring to the zookeeper image I posted, you need to compose the area in Yellow but allow the phosphor to show in the Red area (and other areas as well but for simplicity we'll concentrate on these areas).

 

[Image: Zookeeper screenshot with the area to compose outlined in yellow and the phosphor area outlined in red]

 

At its core, it seems that any solution beyond giving users some options to work with, or a tool to help them make decisions, would require information that the computer just doesn't have. Machine learning is a possibility, but I'm not going there just yet.
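If the user were to supply the division of the screen (the yellow and red areas above), the per-region treatment could be sketched in Go as below. The per-pixel mask, the `Compose`/`Phosphor` labels and the `decay` factor are assumptions for illustration; producing the mask automatically is exactly the hard part, and this sketch does not attempt it.

```go
package main

import "fmt"

// Frame is a hypothetical greyscale frame (one 0-255 intensity per pixel).
type Frame []uint8

// Region labels for a hypothetical, user-drawn mask (one label per pixel).
const (
	Compose  = iota // e.g. the yellow area: merge all frames at full brightness
	Phosphor        // e.g. the red area: newest frame plus fading older frames
)

// maskedSnapshot builds a still image from frames ordered oldest to newest.
// Compose pixels keep the brightest value from any frame. Phosphor pixels keep
// the brightest value after scaling each frame by decay^age, so older frames
// show up only as fainter trails.
func maskedSnapshot(frames []Frame, mask []int, decay float64) Frame {
	out := make(Frame, len(mask))
	newest := len(frames) - 1
	for p, region := range mask {
		best := 0.0
		weight := 1.0
		for age := 0; age <= newest; age++ {
			v := float64(frames[newest-age][p])
			if region == Phosphor {
				v *= weight
				weight *= decay
			}
			if v > best {
				best = v
			}
		}
		out[p] = uint8(best + 0.5) // round to nearest
	}
	return out
}

func main() {
	frames := []Frame{{200, 200}, {0, 0}, {100, 0}}
	fmt.Println(maskedSnapshot(frames, []int{Compose, Phosphor}, 0.5)) // [200 50]
}
```

The same oldest frame contributes full brightness in the composed pixel but only a faded trail in the phosphor pixel, which is the distinction between "compose" and "phosphor" drawn above.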

 


Sorry, by motion blur I meant the fallback position (i.e. no repeated frame was found) of just grabbing the last X frames and blending/averaging them: anything that moved would show up in both its old and new positions.
Like the red box in your last image.

 

The system would do its best to look for repeated frame(s).
If it can't find repetitions then it just takes the last 10 frames (or whatever number you deem best).

 

Manual splicing is always an option - we just try to let the computer do as much of the work as possible.
My day job is designing automated equipment that can go for months in the field without human intervention.
So I naturally try to make it as self-sufficient as possible.

 

I hadn't thought of showing phosphor fading in the final static image.
If you took multiple snapshots, there is no guarantee of the same first frame in each snapshot, which means the fading could be different for each snapshot.
But it might be a nice option in the GIF.
Each final frame in the GIF would contain a weighted average from the other frames - with special care taken so the first few frames contain faded info from the last few frames.
I have a few thoughts on how to do that but I'll wait a bit in case I've gone an option too far.
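The wrap-around weighting suggested for the GIF could be sketched in Go like this. Each output frame is a weighted average of the current frame and a few earlier ones, and the indices wrap so that the first frames of the loop receive faded information from the last frames. The `trail` and `decay` parameters and the greyscale `Frame` type are hypothetical.

```go
package main

import "fmt"

// Frame is a hypothetical greyscale frame (one 0-255 intensity per pixel).
type Frame []uint8

// gifFrames returns one output frame per input frame. Each output pixel is a
// weighted average of the current frame and up to trail earlier frames, with
// weight decay^k for the frame k steps back. Indices wrap around, so the first
// frames of the loop pick up faded information from the last frames.
func gifFrames(frames []Frame, trail int, decay float64) []Frame {
	n := len(frames)
	out := make([]Frame, n)
	for i := range frames {
		sums := make([]float64, len(frames[i]))
		total, w := 0.0, 1.0
		for k := 0; k <= trail; k++ {
			src := frames[((i-k)%n+n)%n] // wrap negative indices to the end
			for p, v := range src {
				sums[p] += w * float64(v)
			}
			total += w
			w *= decay
		}
		f := make(Frame, len(sums))
		for p, s := range sums {
			f[p] = uint8(s/total + 0.5) // normalise and round
		}
		out[i] = f
	}
	return out
}

func main() {
	frames := []Frame{{0}, {0}, {90}}
	for i, f := range gifFrames(frames, 1, 0.5) {
		fmt.Println(i, f)
	}
}
```

Because it is a weighted average rather than a sum, a static pixel that is lit in every frame keeps its full value, while a pixel lit only in the last frame leaves a faint trace at the start of the loop.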


Cartridge hot-loading. A quick demo showing hot-loading of cartridges. I'm using @SpiceWare's Collect3 example project for the demonstration.

 

In this example, I want to change the appearance of the cross image in the menu. Switching to the asm file, I search for the correct code, edit the binary data and remake the binary.

 

Flipping back to the debugger, the ROM is still running and I enter the "CARTRIDGE HOTLOAD" command. The emulation immediately reflects the changed data and most importantly, continues running without having to restart. No need to start the ROM from the beginning.

 

This is potentially very useful when trying to debug something that is deep into a game. Should work with ARM code too. Any thoughts or ideas?

 

 


I've pushed what I have so far to the GitHub (for those in a position to compile it).

 

I've checked and it does work with ARM code: which is good because that's why I'm adding this feature 🙂 With the 6507 the emulation would go crazy if the binary changes underneath the program counter. But with the ARM we don't have that problem - it's far easier to schedule the hot load to take place in between ARM program execution. But for graphics editing and timing tweaks I think it might be useful even when changing 6507 code. Work in progress.
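The scheduling point made above can be illustrated with a small Go sketch: a hot load is queued and only applied between runs of the Thumb program, never in the middle of one. `Cartridge`, `HotLoad`, `RunARM` and `applyPending` are hypothetical names, not Gopher2600's API, and the harder 6507 case (where the program counter can sit inside the changed data) is not handled.

```go
package main

import "fmt"

// Cartridge is a hypothetical stand-in for an emulated cartridge with an ARM
// coprocessor. None of these names come from Gopher2600.
type Cartridge struct {
	rom     []byte
	pending []byte // new ROM image waiting for a safe point
}

// HotLoad queues a copy of the new ROM image; it never swaps data immediately.
func (c *Cartridge) HotLoad(newROM []byte) {
	c.pending = append([]byte(nil), newROM...)
}

// applyPending swaps in a queued ROM image, if there is one.
func (c *Cartridge) applyPending() {
	if c.pending != nil {
		c.rom, c.pending = c.pending, nil
	}
}

// RunARM simulates one call into the Thumb program. The swap only happens at
// the boundaries, so the program always sees one consistent ROM image.
func (c *Cartridge) RunARM(program func(rom []byte)) {
	c.applyPending() // safe point: the Thumb program is not running
	program(c.rom)
	c.applyPending() // safe point again, once the program has returned
}

func main() {
	cart := &Cartridge{rom: []byte{0x01}}
	cart.RunARM(func(rom []byte) {
		cart.HotLoad([]byte{0x02}) // an edit arrives mid-execution
		fmt.Println("during run:", rom[0])
	})
	fmt.Println("after run:", cart.rom[0])
}
```

An edit that arrives while the program is running is held back until it returns, so execution resumes with the new data without restarting the ROM.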


I've been reworking the ARM cycle counting to try to better account for all the variant hardware and expectations people have. In particular, the new Turbo Arcade demo had issues unless the emulator was in "immediate ARM execution" mode.

 

I've gone through all the instructions and differentiated when N and S cycles are addressing "PC" addresses and "data" addresses. I'd already entered the cycle profile for each instruction group so this wasn't as much hard work as I first expected.

 

For simplicity, I've assumed that all "data" reads/writes are done in SRAM. This is not as big an assumption as it first appears, because only PUSH, POP, LDMIA and STMIA ever use cycles in this way, and I'm fairly confident those instructions rarely (if ever) read/write to flash.

 

The other assumption is that MAM caching is essentially perfect and that, when it is enabled, Flash memory is never touched.

 

N and S cycles are stretched according to the speed of the memory being addressed. In previous versions I used a flat value of 2 (which was a reasonable first estimation) for N cycles only and in all instances. This led to a reasonable average result but the new version should be more accurate in more situations.

 

I've also added some more ARM options to the preferences window. This can be summoned in playmode as well as in the debugger. The full list of ARM options is now:

 

* Immediate ARM execution - thumb program returns immediately and consumes no 6507 time

* Default MAM Enable for Thumb Programs - assume the Harmony driver is enabling the MAM

* Allow MAM Enable from Thumb - allow the enabling of MAM from within the thumb program. From what I understand, some editions of the Harmony do not allow this. I've added this option in case there are versions or variants which do allow it.

 

The Timings sliders:

* ARM Clock - the basic speed of the ARM

* Flash Access Time and SRAM Access Time - speed in nanoseconds. The slower the memory the more stretching for N and S cycles.
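To show how the three ARM options listed above might interact, here is a Go sketch. The `ARMPrefs` field names are hypothetical, as is the conversion of ARM cycles into 6507 cycles (scaling by the ratio of the two clock speeds, with the NTSC 6507 running at roughly 1.19 MHz); in immediate mode the ARM consumes no 6507 time, as the option describes.

```go
package main

import (
	"fmt"
	"math"
)

// ARMPrefs mirrors the options listed above; the names are hypothetical.
type ARMPrefs struct {
	Immediate         bool // thumb program returns immediately, no 6507 time
	DefaultMAMEnable  bool // assume the Harmony driver enables the MAM
	AllowMAMFromThumb bool // honour MAM enable requests from the thumb program
}

// mamEnabled decides whether the MAM is on for a thumb program run.
// thumbRequest is nil when the program never touches the MAM setting.
func mamEnabled(p ARMPrefs, thumbRequest *bool) bool {
	if thumbRequest != nil && p.AllowMAMFromThumb {
		return *thumbRequest
	}
	return p.DefaultMAMEnable
}

// stallCycles6507 converts ARM cycles into 6507 cycles by the ratio of the
// clock speeds, rounding up. In immediate mode the ARM costs nothing.
func stallCycles6507(p ARMPrefs, armCycles int, armMHz, cpuMHz float64) int {
	if p.Immediate {
		return 0
	}
	return int(math.Ceil(float64(armCycles) * cpuMHz / armMHz))
}

func main() {
	p := ARMPrefs{DefaultMAMEnable: true}
	off := false
	fmt.Println(mamEnabled(p, nil), mamEnabled(p, &off)) // true true
	p.AllowMAMFromThumb = true
	fmt.Println(mamEnabled(p, &off)) // false
	fmt.Println(stallCycles6507(p, 700, 70, 1.19))
}
```

With "Allow MAM Enable from Thumb" off, a thumb program's request is simply ignored and the default applies, which models Harmony editions that do not permit it.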

 

I'm not sure if my default speed values are correct. But these are the values that seem to hit the sweet spot for the collection of ARM ROMs I have available.

 

I plan to do some more work on this this week, checking for accuracy and adding some instrumentation to the debugger.

 

Here's a short video showing the effect of changing memory speed on the Gorf Arcade title screen. Apologies for my poor screen-roll emulation - that's next on the TODO list.

 

Source on Github.

 


I've packaged up recent changes as v0.11. https://github.com/JetSetIlly/Gopher2600/releases/tag/v0.11

 

Main features in this release are better CRT shaders and the improved ARM timings. The Turbo Arcade demo will also work with this version now that I've added support for CDFJ+.

 

I was hoping the new Go compiler would be ready to use, but it's not due for a few more weeks yet. When compiled with the development version of the compiler, however, there is an approximately 8% performance increase in Gopher2600. Not massive but still significant. This version has been compiled with Go 1.16.4.

 


Wow, that "bended scanlines" TV mode is really nice.

I tried the latest release with a CDFJ rom, but noticed that the performance is not on par with playing on a real '2600. Could the new Go compiler help with this?

4 minutes ago, Dionoid said:

Wow, that "bended scanlines" TV mode is really nice.

Cheers 🙂

4 minutes ago, Dionoid said:

I tried the latest release with a CDFJ rom, but noticed that the performance is not on par with playing on a real '2600. Could the new Go compiler help with this?

 

We'll have to see when 1.17 is released but from what I've seen there will be a difference.

 

But speed generally is a problem for this emulator when compared to Stella. It's partly down to the differences between C++ and Go, but a lot of it is down to my emulation method, which is probably more fussy than it needs to be.

 

These are the top performance hogs in the emulator, running the Gorf Arcade demo. As you can see the ARM emulation is a very small percentage of overall cost. It's the way I'm doing the TIA emulation which is causing the most expense.

 

[Image: profiler output showing the top performance hogs while running the Gorf Arcade demo]

 

I can get a more-or-less solid 60fps on my development machine (a 2012 i3) which was my goal when starting this.

 

You can check for performance with the following:

 

gopher2600 performance -display -fpscap=false romfile.bin

 

and

 

gopher2600 performance -fpscap=false romfile.bin

 

Any difference between the -display and non-display versions tells us the basic overhead of the screen rendering, which is cut out entirely unless the -display flag is used. By my measurements there's quite a lot to gain but I'm no expert on graphics programming so I can't see how to improve it at the moment.

 

If you're getting around 60fps normally, using the -fpscap=false option can give you a better idea of performance. Limiting the frame rate to the TV specification introduces its own set of problems so removing it from the measurement can be good.

 

I think a good next step for me would be to run and profile the program on a different machine (with a different OS). I've only ever seen it run on this machine and I think a different rig might highlight differences I've not considered.


Here are my figures for the display/non-display example above...

38.80 fps (194 frames in 5.00 seconds) 64.7%
46.40 fps (232 frames in 5.00 seconds) 77.3%

I'm running on an early 2013 vintage MacBook Pro.
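The figures above can be checked with a little arithmetic, sketched in Go below. Assuming the first line is the -display run and the second the non-display run, the percentages match a 60 fps target (38.80 / 60 ≈ 64.7%), and converting both rates into milliseconds per frame gives the rendering overhead. The helper names are hypothetical.

```go
package main

import "fmt"

// fps turns a frame count and elapsed time into frames per second.
func fps(frames int, seconds float64) float64 {
	return float64(frames) / seconds
}

// percentOf expresses a frame rate as a percentage of the target rate.
func percentOf(rate, target float64) float64 {
	return rate / target * 100
}

// overheadMs is the extra time per frame spent when the display is enabled,
// derived from the two frame rates.
func overheadMs(withDisplay, withoutDisplay float64) float64 {
	return 1000/withDisplay - 1000/withoutDisplay
}

func main() {
	display := fps(194, 5.00)
	headless := fps(232, 5.00)
	fmt.Printf("%.2f fps %.1f%%\n", display, percentOf(display, 60))   // 38.80 fps 64.7%
	fmt.Printf("%.2f fps %.1f%%\n", headless, percentOf(headless, 60)) // 46.40 fps 77.3%
	fmt.Printf("display overhead: %.2f ms per frame\n", overheadMs(display, headless))
}
```

With these numbers the display costs roughly 4.2 ms per frame, about a quarter of the 16.7 ms available per frame at 60 fps.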

 

As to the new display rendering/mode it doesn't play well at all with my particular game, alas.

I get distracting banding down the screen.

 

4 minutes ago, Andrew Davie said:

Here are my figures for the display/non-display example above...

38.80 fps (194 frames in 5.00 seconds) 64.7%
46.40 fps (232 frames in 5.00 seconds) 77.3%

I'm running on an early 2013 vintage MacBook Pro.

 

Ouch. That is slow. I'll have another look at performance for next version.

28 minutes ago, Andrew Davie said:

 

As to the new display rendering/mode it doesn't play well at all with my particular game, alas.

I get distracting banding down the screen.

 

 

I've messaged you, but in case you're talking about the scanline effect being too heavy, you can turn it down or off through the CRT Preferences window, which you can open with F10. Uncheck the effect or alter its strength to your taste. Pixel Perfect renders with no effects at all.

 

[Image: CRT Preferences window]

 

 

On 6/5/2021 at 7:22 PM, JetSetIlly said:

If you're getting around 60fps normally, using the -fpscap=false option can give you a better idea of performance. Limiting the frame rate to the TV specification introduces its own set of problems so removing it from the measurement can be good.

The performance option shows me 60fps, but my game is still having hiccups and feels like it's running at 70% of the normal speed.

Still, I think this is a very promising '2600 emulator and debugger!

On 6/5/2021 at 10:36 AM, Andrew Davie said:

I'm running on an early 2013 vintage MacBook Pro.

 

How about we all chip in and buy Andrew a new computer? :)

13 minutes ago, Dionoid said:

The performance option shows me 60fps, but my game is still having hiccups and feels like it's running at 70% of the normal speed.

Still, I think this is a very promising '2600 emulator and debugger!

 

F7 while you're playing will show a live fps counter.

18 minutes ago, CPUWIZ said:

 

How about we all chip in and buy Andrew a new computer? :)

I love my MBP! Nothing wrong with a vintage computer.
