
Obvious differences between emulators and real hardware?


obschan


Hi !

 

I just got my flashcard today and gave it a try. I am really happy with it; it works great.

 

Sadly, it gave me more things to worry about: I cannot get my homebrew to run properly (for more than 2 seconds ...) on real hardware.

I successfully tested my .lnx file on Handy and Mednafen.

 

Are there any obvious, known differences between Handy/Mednafen and real hardware that I have to be careful about?

 

Thanks in advance.

Edited by obschan

  • 2 weeks later...

From your mail I guess you are really talking about _your_ homebrew creations?

Then could you please give more details about which development environment you use?

  1. plain lyxass
  2. newcc65 (BS/MD/...)
  3. cc65.org ("karri")

They all use different startup code.

Edited by sage

There is one big difference. All old bootloaders cleared the complete memory before loading in the code. The mini-bootloader in cc65.org does not do it.

 

So if you are using variables from the stack without initializing them, you get different behaviour on the real hardware.

 

--

Karri
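To make the point about uncleared memory concrete, here is a minimal sketch in plain C. It does not touch real Lynx memory: the `Machine` struct, the function names and the 0xA5 garbage value are all hypothetical stand-ins chosen for illustration. It only shows why code that silently relies on a zeroed variable works after a clearing boot and misbehaves otherwise.

```c
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 16

/* Hypothetical model of a small RAM region holding one program variable. */
typedef struct {
    uint8_t ram[RAM_SIZE];
} Machine;

/* Emulator-like start, or an old bootloader: all memory is cleared. */
static void boot_cleared(Machine *m) {
    memset(m->ram, 0x00, RAM_SIZE);
}

/* Start without clearing: memory holds whatever was there before.
   0xA5 is an arbitrary stand-in for garbage, not a measured Lynx value. */
static void boot_uncleared(Machine *m) {
    memset(m->ram, 0xA5, RAM_SIZE);
}

/* Buggy: assumes ram[0] already holds 0 and only increments it. */
static uint8_t count_frames_buggy(Machine *m, int frames) {
    for (int i = 0; i < frames; i++) {
        m->ram[0]++;
    }
    return m->ram[0];
}

/* Fixed: initializes the variable before its first use. */
static uint8_t count_frames_fixed(Machine *m, int frames) {
    m->ram[0] = 0;
    for (int i = 0; i < frames; i++) {
        m->ram[0]++;
    }
    return m->ram[0];
}
```

With a cleared boot both versions return the frame count; with the uncleared boot only the fixed version does, which is the kind of emulator-versus-hardware difference described above.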


Hoo Karri, thanks for that !

That could be the explanation: emulators boot up with blank memory, while I do not know what the real hardware does.

I am going to check that; I may have forgotten to initialize one or two variables.

 

I do not know yet whether that is the reason, but it is funny: my two Lynx I units behave differently from each other, yet each one always shows the same wrong behavior.

Can the memory state at boot really be the same every time on the same hardware?!

 

For my own information, does anybody know what the memory state is after power-on?

Edited by obschan
Link to comment
Share on other sites

I noticed an interesting thing that I never noticed when I first had a Lynx in 1991. I have been playing Xenophobe for a few hours today (on a real Lynx) and noticed that it strains the system at times. As a programmer I can now tell that the game's main loop slows down in really busy parts, where there are dozens of aliens on the level. Emulators don't suffer from this, as they are not constrained in the same way. I am not sure whether the bottleneck is Suzy, RAM or just the CPU, but it is interesting, since strictly speaking an accurate emulator should behave the same way.

Edited by GadgetUK

I noticed an interesting thing that I never noticed when I first had a Lynx in 1991. I have been playing Xenophobe for a few hours today (on a real Lynx) and noticed that it strains the system at times. As a programmer I can now tell that the game's main loop slows down in really busy parts, where there are dozens of aliens on the level.

Yes, that's really a surprise. But what did you expect? :-)

 

Emulators don't suffer from this, as they are not constrained in the same way. I am not sure whether the bottleneck is Suzy, RAM or just the CPU, but it is interesting, since strictly speaking an accurate emulator should behave the same way.

 

Sure, because the timing for sprite rendering is not (well) documented.

Anyway, an emulated delay for math and sprites is really the last feature I would request.

It's sooo useless unless you use it to detect which hardware you are running on ;-)


An update on my side: together with obschan I am looking into the math delays.

The first step is to find out how long a delay actually takes:

From the Epyx documentation: "Divides take 176 + 14*N ticks where N is the number of most significant zeros in the divisor."
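The quoted formula can be sketched as a small C helper. This assumes that "most significant zeros" means the leading zero bits of a 16-bit divisor (the divisor is written with a 16-bit POKEW in the test program); the function names are made up for illustration and are not part of any Lynx library.

```c
#include <stdint.h>

/* Number of leading zero bits in a 16-bit divisor (assumed meaning of
   "most significant zeros"). Returns 16 for a divisor of 0. */
static int leading_zeros16(uint16_t divisor) {
    int zeros = 0;
    for (uint16_t mask = 0x8000; mask != 0 && (divisor & mask) == 0; mask >>= 1) {
        zeros++;
    }
    return zeros;
}

/* Documented divide duration in ticks: 176 + 14*N. */
static int divide_ticks(uint16_t divisor) {
    return 176 + 14 * leading_zeros16(divisor);
}
```

For example, a divisor of 0x8000 has no leading zeros and gives 176 ticks, while a divisor of 1 has 15 leading zeros and gives 386 ticks.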

 

I have written a little program based on code from obschan that will do a math divide with different divisors.

 

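 POKEW(MATHNP, div);	/* divisor (N, P) */
 POKEW(MATHGH, b2);	/* low word of the dividend (G, H) */
 POKEW(MATHEF, b1);	/* high word of the dividend (E, F) */
 WAITSUZY;		/* count loop iterations until the math is done */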

 

The heart of the program is a piece of assembly that will loop until the math is done.

 

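;
; WAITSUZY: count polling iterations until the math operation finishes
;
 lda	 #$0		; A = iteration counter
notready:
 inc	 a		; one more pass through the loop
 bit	 $fc92		; N flag = bit 7 of $FC92 (math in progress)
 bmi	 notready	; bit 7 still set: keep polling
 sta	 $77		; store the iteration count in zero page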

 

The loop waits for bit 7 (MATHINPROGRESS) of $FC92 to become zero again, incrementing A on every iteration. Once the math is done, it stores the number of iterations in $77 (a zero-page address picked arbitrarily to hold A).

The rest of the code will read the contents of $77.

 

The results are as follows

Significant zeros   Ticks   Iterations
---------------------------------------
 0                  176     4
 1                  190     5
 2                  204     5
 3                  218     5
 4                  232     6
 5                  246     6
 6                  260     6
 7                  274     7
 9                  302     7
10                  316     8
12                  344     8
14                  372     9
15                  386     9

 

The columns show the number of significant zeros, the calculated number of ticks according to the formula and the number of iterations that occurred.

 

Now, I was amazed at the low number of iterations through the loop. I was expecting something higher, because a single pass through the loop takes about 9 cycles. Even for 0 significant zeros a division would take 176 ticks, but there are just 4 iterations. So, what gives: either there is a mistake on my part, or ticks and cycles are not the same. I have also looked up the timings of memory access:

 

"Cycle                             Min   Max
---------------------------------------------------
Page Mode RAM(read)               4     4
Normal RAM(r/w)                   5     5
Page Mode ROM                     4     4
Normal ROM                        5     5
Available Hardware(r/w)           5     5
Mikey audio DPRAM(r/w)            5     20
Mikey color palette DPRAM(r/w)    5     5
Suzy Hardware(write)              5     5
Suzy Hardware(read)               9     15"

 

With these values I broke the loop down into its individual cycles, looking at the type of access each cycle performs and the number of ticks it requires. That came to about 40-45 ticks per iteration of this loop. For 4 iterations that gives 4*40 to 4*45 = 160-180 ticks (176 if you use 44). For 15 significant zeros the formula gives 176 + 14*15 = 386, while 9 iterations times 44 gives 396. All pretty close if you ask me.
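The same comparison can be run over every measured row. The sketch below is only a model, under two assumptions that are not confirmed by any documentation: one loop iteration costs a fixed 44 ticks, and the loop notices completion at the next poll, so the iteration count is the tick count divided by 44 and rounded up. The struct and function names are invented for this example; the data is the measurement table above.

```c
#include <stddef.h>

typedef struct {
    int zeros;       /* significant zeros in the divisor */
    int iterations;  /* loop iterations measured on hardware */
} Measurement;

/* Rows copied from the results table in the post. */
static const Measurement measured[] = {
    {0, 4}, {1, 5}, {2, 5}, {3, 5}, {4, 6}, {5, 6}, {6, 6},
    {7, 7}, {9, 7}, {10, 8}, {12, 8}, {14, 9}, {15, 9}
};
static const size_t measured_count = sizeof measured / sizeof measured[0];

/* Documented divide duration: 176 + 14*N ticks. */
static int formula_ticks(int zeros) {
    return 176 + 14 * zeros;
}

/* Assumed model: iterations = ticks / ticks_per_iteration, rounded up. */
static int predicted_iterations(int ticks, int ticks_per_iteration) {
    return (ticks + ticks_per_iteration - 1) / ticks_per_iteration;
}

/* How many measured rows the model reproduces exactly. */
static size_t count_matches(int ticks_per_iteration) {
    size_t matches = 0;
    for (size_t i = 0; i < measured_count; i++) {
        int ticks = formula_ticks(measured[i].zeros);
        if (predicted_iterations(ticks, ticks_per_iteration) == measured[i].iterations) {
            matches++;
        }
    }
    return matches;
}
```

With 44 ticks per iteration this model reproduces all 13 measured rows, which supports the estimate without proving that 44 is the exact figure.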

 

 

Instruction     Cycles   Ticks per cycle
------------------------------------------------------------------------
LDA #$00        2        5 Fetch opcode
                         5 Fetch value
notready:
INC A           2        5 Fetch opcode
                         1? Write to A
BIT $FC92       4        5 Fetch opcode
                         5 Fetch low order effective address byte
                         5 Fetch high order effective address byte
                         9-15 Fetch data (Suzy hardware)
BMI notready    3        5 Fetch opcode
                         5 Fetch branch offset
                         1? Offset added to PC
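Adding up the breakdown above for one pass through the loop (INC A, BIT $FC92, BMI; the LDA runs only once) can be sketched as follows. The "1?" entries are treated as exactly 1 tick, which is a guess carried over from the table, and the type and function names are invented for this example.

```c
typedef struct {
    int min_ticks;
    int max_ticks;
} TickRange;

static TickRange add_range(TickRange a, TickRange b) {
    TickRange sum = { a.min_ticks + b.min_ticks, a.max_ticks + b.max_ticks };
    return sum;
}

/* Ticks for one iteration of the polling loop, summed from the table. */
static TickRange loop_iteration_ticks(void) {
    const TickRange ram_fetch = {5, 5};   /* normal RAM access */
    const TickRange internal  = {1, 1};   /* the "1?" entries: a guess */
    const TickRange suzy_read = {9, 15};  /* Suzy hardware read */
    TickRange total = {0, 0};

    /* INC A: opcode fetch + internal write to A */
    total = add_range(total, ram_fetch);
    total = add_range(total, internal);

    /* BIT $FC92: opcode + two address bytes + Suzy data read */
    total = add_range(total, ram_fetch);
    total = add_range(total, ram_fetch);
    total = add_range(total, ram_fetch);
    total = add_range(total, suzy_read);

    /* BMI notready: opcode + branch offset + internal add to PC */
    total = add_range(total, ram_fetch);
    total = add_range(total, ram_fetch);
    total = add_range(total, internal);

    return total;
}
```

Under these assumptions the sum comes out at 41 to 47 ticks per iteration, in the same range as the roughly 40-45 ticks estimated in the post.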

 

Questions for all:

  • Does anyone know the exact difference between a tick and a cycle?
  • Any comments on this math of mine?
  • Source code anyone to further analyse?

Thanks in advance


Does anyone know the exact difference between a tick and a cycle?

 

I may be saying something stupid, but aren't ticks and cycles used to express the difference between Mikey's and Suzy's clock frequencies?

 

The smallest system tick is 62.5 ns -> 16 MHz, which is Suzy's frequency.

Mikey (on which your code is running) runs at 4 MHz max, no? That gives a maximum ratio of 1 Mikey cycle to 4 Suzy cycles.
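The relationship suggested here can be written down as a few conversions. This is only a sketch of the arithmetic, taking the 16 MHz tick and the 4 MHz maximum CPU rate from the post as given; the constant and function names are made up.

```c
#include <stdint.h>

#define TICK_HZ    16000000u  /* 62.5 ns system tick, as stated in the post */
#define CPU_MAX_HZ  4000000u  /* assumed maximum CPU rate, as stated in the post */

/* Ticks per CPU cycle when the CPU runs at its assumed maximum rate. */
static uint32_t ticks_per_cpu_cycle(void) {
    return TICK_HZ / CPU_MAX_HZ;
}

/* Duration of a tick count in nanoseconds (62.5 ns each, rounded down). */
static uint32_t ticks_to_ns(uint32_t ticks) {
    return ticks * 125u / 2u;
}

/* Whole CPU cycles that fit into a tick count at the maximum rate. */
static uint32_t ticks_to_cpu_cycles(uint32_t ticks) {
    return ticks / ticks_per_cpu_cycle();
}
```

On these numbers the shortest divide, 176 ticks, lasts 11000 ns, or at most 44 CPU cycles at the full 4 MHz rate.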


Emulators don't suffer from this, as they are not constrained in the same way. I am not sure whether the bottleneck is Suzy, RAM or just the CPU, but it is interesting, since strictly speaking an accurate emulator should behave the same way.

 

Sure, because the timing for sprite rendering is not (well) documented.

Anyway, an emulated delay for math and sprites is really the last feature I would request.

It's sooo useless unless you use it to detect which hardware you are running on ;-)

 

I don't think accurate emulation is useless, but that's hopefully not what you meant.



I would love an accurate emulator too.

When you use the math and sprite hardware extensively, you can quickly get a non-negligible speed difference ...



Just a short update on the math emulation. I have made progress there.

 

The delay in the math calculation is now reasonably accurate. For a better implementation I will have to redo the entire timing, because all of a sudden the Handy timings do not make sense to me anymore.

Also, I figured out the startup values of the math registers too. In Handy these were set to 0xFFFFFFFF to fix an initialization "bug" in STUN Runner. The values I discovered also fix the bug, but are a more accurate representation of the real hardware.

 

Hopefully I will get around to doing a full update and a release of Handy later today or this week.

 

@obschan: you mentioned a broken math delay macro. I am now getting approximately the same behavior as on the real hardware. Could you post an update of the cart.lnx that has a correct implementation of the WAITSUZY?

Edited by LX.NET

  • 5 months later...


 

Hi LX.NET,

 

I am coming back to this subject.

Have you had the opportunity to commit your changes somewhere?

If not, could you post a patch that corrects those timing issues?

That would be very helpful.

Thank you !


Hi obschan,

 

Great to hear from you again.

I did spend time working on the changes, but I am sorry to say they are still sitting on my computer, waiting for a few extra hours. Your request prompts me to spend that additional time and get the changes into the Handy source, as well as into my own public code repository.

Just to set expectations: the current implementation does not compute exact intermediate results. If you look at a sample program that reads the result too soon, you can see that each read gives a more precise number. This is probably due to the algorithm used, which converges towards a precise enough number at the end. My implementation simply returns wrong numbers (all 0xFF I believe, but I would have to check) and sets the real result at the end of the required time.
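The behavior described here (a placeholder while the operation is busy, the real value once the delay has passed) could look roughly like the sketch below. This is not the Handy code: the `MathUnit` struct, the function names and the tick-based clock are all hypothetical, the 0xFFFFFFFF placeholder follows the "all 0xFF I believe" remark, and returning the placeholder for a zero divisor is an arbitrary choice made only to avoid dividing by zero in C.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLACEHOLDER_RESULT 0xFFFFFFFFu  /* assumed value seen while busy */

typedef struct {
    uint32_t result;      /* final quotient, computed up front */
    uint64_t ready_tick;  /* tick at which the result becomes visible */
} MathUnit;

/* Start a divide at tick `now` that takes `delay_ticks` to complete. */
static void start_divide(MathUnit *mu, uint32_t dividend, uint16_t divisor,
                         uint64_t now, uint32_t delay_ticks) {
    /* Zero divisor: arbitrary choice for this sketch, not hardware behavior. */
    mu->result = (divisor != 0) ? dividend / divisor : PLACEHOLDER_RESULT;
    mu->ready_tick = now + delay_ticks;
}

/* True while the emulated operation is still in progress. */
static bool math_busy(const MathUnit *mu, uint64_t now) {
    return now < mu->ready_tick;
}

/* Reads before completion return the placeholder, later reads the result. */
static uint32_t read_result(const MathUnit *mu, uint64_t now) {
    return math_busy(mu, now) ? PLACEHOLDER_RESULT : mu->result;
}
```

A constant, obviously wrong value during the busy period makes a too-early read easy to spot, at the cost of not reproducing the converging intermediate values seen on hardware.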

 

Two improvements for later are the more accurate timing of the 65sc02 and the use of intermediate results. The latter will be possible if someone knows what the approximate algorithm is. I will try to find the time sometime after next week. I have a busy schedule till Wednesday evening and hardly any time before that. Keep sending me PMs to encourage/harass/annoy me. That works best to get the result.

 

Question to readers: who knows what algorithms are used for the math engine of Suzy?


Great, thank you. I know that Lynx development is only a hobby for everybody, so we all do it when we feel like it.

This patch will be great though, my 2 Lynx I are showing signs of death with all this testing :'(

 

If you look at a sample program that reads the result too soon, you can see that each read gives a more precise number. This is probably due to the algorithm used, which converges towards a precise enough number at the end. My implementation simply returns wrong numbers (all 0xFF I believe, but I would have to check) and sets the real result at the end of the required time.

Obviously wrong (FF or 0) and constant numbers are good enough for me; they help to spot an issue. It gets tricky when the values look "random" ...

 

If you're still into it, your emulator with a simple integrated debugger would be a terrific improvement for the dev community! ;)

