
Handy emulator - the latest version and source code


Cyprian


1 hour ago, 42bs said:

 


  cmp #10
  beq else
  sta $10
  TRIPLE_NOP ; = dc.b $5c
else:
  stz $10

TRIPLE_NOP will skip over the "stz $10", otherwise one would write:

 

Why should it skip it?

 

3 cycles only means that this is the time the CPU takes to handle that special NOP; it will not affect the PC differently than a standard NOP.

These special opcodes should be useful to better synchronize the timing of the code with something very fast happening at specific time periods, without the need to use timers and interrupts (you can set a timer only down to 1us, and if you count the timer activation and the time to enter the interrupt, it's much more time than 3 cycles).

For a three-cycle delay I'm using a BIT M instruction at the moment, but there is no 1-cycle opcode other than some of these special NOPs.


5 hours ago, Nop90 said:

Why should it skip it?

 

3 cycles only means that this is the time the CPU takes to handle that special NOP; it will not affect the PC differently than a standard NOP.

These special opcodes should be useful to better synchronize the timing of the code with something very fast happening at specific time periods, without the need to use timers and interrupts (you can set a timer only down to 1us, and if you count the timer activation and the time to enter the interrupt, it's much more time than 3 cycles).

For a three-cycle delay I'm using a BIT M instruction at the moment, but there is no 1-cycle opcode other than some of these special NOPs.

These NOPs take as many cycles as bytes. So there is no benefit in using them. (At least, that's what my measurements show.)

So why use an illegal opcode to get 2 NOPs instead of just writing 2 NOPs?


@42bs you are right about your code example for the IF THEN ELSE. I read the specs again and the triple NOP takes 3 bytes. I missed this part the first time I read the 65C02 specs.

 

This is really interesting to know.

 

But the same specs also report the behaviour of all the other unused opcodes:

 

On the 65C02, all unused opcodes are guaranteed to have no operation, and are documented as such. They differ from the standard NOP (opcode $EA) only in size (i.e. the number of bytes) and cycle count. (On the 65816, only opcode $42 is unused. It is documented as having no operation, but is reserved for future instruction set expansion.) The following table summarizes the unused opcodes of the 65C02. The first number is the size in bytes, and the second number is the number of cycles taken. After the second number, a lower case letter may be present; when it is present it indicates a footnote.

    02     03     04     07     0B     0C     0F
    -----  -----  -----  -----  -----  -----  -----
00  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
10  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
20  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
30  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
40  2 2    1 1    2 3    1 1 a  1 1    . .    1 1 b
50  . .    1 1    2 4    1 1 a  1 1    3 8    1 1 b
60  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
70  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
80  2 2    1 1    . .    1 1 c  1 1    . .    1 1 d
90  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
A0  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
B0  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
C0  2 2    1 1    . .    1 1 c  1 1 e  . .    1 1 d
D0  . .    1 1    2 4    1 1 c  1 1 f  3 4    1 1 d
E0  2 2    1 1    . .    1 1 c  1 1    . .    1 1 d
F0  . .    1 1    2 4    1 1 c  1 1    3 4    1 1 d

a) RMB instruction on Rockwell 65C02 and WDC 65C02
b) BBR instruction on Rockwell 65C02 and WDC 65C02
c) SMB instruction on Rockwell 65C02 and WDC 65C02
d) BBS instruction on Rockwell 65C02 and WDC 65C02
e) WAI instruction on WDC 65C02
f) STP instruction on WDC 65C02

These unused opcodes may prove useful in some situations. Note, however, that any code that makes use of them is limited to the 65C02.

First, many opcodes behave as one byte, one cycle NOPs. This can be more useful than the standard one byte, two cycle NOP (opcode $EA).

If I can find the time I'll write some code to run on real HW to test which opcodes work, their instruction lengths and maybe the cycles taken too.
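
For anyone wanting to use the quoted table in an emulator or test tool, here is a minimal C++ sketch that encodes it as a lookup. It only mirrors the numbers in the table above, assuming a plain 65C02 without the Rockwell/WDC extensions (so the footnoted x7/xF/$CB/$DB opcodes count as NOPs). The names `NopInfo` and `unusedOpcodeInfo` are made up for this example and are not from Handy.

```cpp
#include <cassert>
#include <cstdint>

struct NopInfo {
    int bytes;   // instruction length including the opcode byte (0 = not in table)
    int cycles;  // cycle count as listed in the quoted table
};

// Returns size/cycles for opcodes the quoted table lists as unused,
// or {0, 0} for every other opcode.
// Assumption: no Rockwell/WDC extensions are present.
NopInfo unusedOpcodeInfo(std::uint8_t opcode) {
    const int lo = opcode & 0x0F;
    const int hi = opcode >> 4;
    switch (lo) {
    case 0x3: case 0x7: case 0xB: case 0xF:
        return NopInfo{1, 1};  // whole column is 1 byte, 1 cycle
    case 0x2:
        // $02,$22,$42,$62,$82,$C2,$E2 are 2 bytes, 2 cycles
        if (hi == 0x0 || hi == 0x2 || hi == 0x4 || hi == 0x6 ||
            hi == 0x8 || hi == 0xC || hi == 0xE)
            return NopInfo{2, 2};
        return NopInfo{0, 0};
    case 0x4:
        if (opcode == 0x44) return NopInfo{2, 3};
        if (opcode == 0x54 || opcode == 0xD4 || opcode == 0xF4) return NopInfo{2, 4};
        return NopInfo{0, 0};
    case 0xC:
        if (opcode == 0x5C) return NopInfo{3, 8};
        if (opcode == 0xDC || opcode == 0xFC) return NopInfo{3, 4};
        return NopInfo{0, 0};
    default:
        return NopInfo{0, 0};
    }
}

// Usage: the "triple NOP" $5C is three bytes long, the standard NOP $EA is not listed.
void exampleUsage() {
    assert(unusedOpcodeInfo(0x5C).bytes == 3);
    assert(unusedOpcodeInfo(0xEA).bytes == 0);
}
```

These are the documented values only; what a real Lynx does still has to be measured on hardware.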

 

 

 

 


Nop90, I have to revise my first finding. Did some quick test this morning:

 

    ; iter  size opcode     total    per opcode  cycles
    ; 16384   1   0b..fb    5,0ms => 0,3us  => 1cycle
    ; 16384   1   03..f3    5,0ms => 0,3us  => 1cycle
    ; 16384   1   NOP       9,5ms => 0,58us => 2cycles
    ; 16384   2   $02       9,5ms => 0,6us  => 2cycles
    ; 8192    3   dc,fc    11,0ms => 1,2us  => 4cycles
    ; 8192    2   f4       11,0ms => 1,3us  => 4cycles
    ; 8192    2   44        9,5ms => 1,2us  => 4cycles

 

So my findings are more or less as in your list. So with $02 one can make a conditional inc/dec:

 

	MACRO SKIP1
	dc.b $02
	ENDM

	cmp	#10
	beq	.e
	inx
	SKIP1
.e
	dex

SKIP1 takes only 2 cycles instead of 3 using BRA
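
To make the byte-swallowing trick easier to follow, here is a small C++ sketch that steps through the assembled bytes of the snippet above. It is only an illustration: it handles just the five opcodes involved (`cmp #`, `beq`, `inx`, `dex` with their standard 6502 encodings, plus `$02` treated as a 2-byte NOP as measured in this thread), and the names `Cpu`, `skipDemoProgram` and `run` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Cpu {
    std::uint8_t a = 0;
    std::uint8_t x = 0;
    bool z = false;
};

// cmp #10 / beq .e / inx / SKIP1 / .e: dex
std::vector<std::uint8_t> skipDemoProgram() {
    return {0xC9, 0x0A,   // cmp #10
            0xF0, 0x02,   // beq +2 (to the dex)
            0xE8,         // inx
            0x02,         // SKIP1: 2-byte NOP, its operand byte is the dex below
            0xCA};        // dex
}

// Runs until the PC leaves the program and returns the final register state.
Cpu run(const std::vector<std::uint8_t>& mem, std::uint8_t a, std::uint8_t x) {
    Cpu c;
    c.a = a;
    c.x = x;
    std::size_t pc = 0;
    while (pc < mem.size()) {
        const std::uint8_t op = mem[pc++];
        switch (op) {
        case 0xC9: c.z = (c.a == mem[pc++]); break;          // cmp #imm (Z flag only)
        case 0xF0: {                                         // beq rel
            const std::int8_t rel = static_cast<std::int8_t>(mem[pc++]);
            if (c.z) pc += rel;
            break;
        }
        case 0xE8: ++c.x; break;                             // inx
        case 0xCA: --c.x; break;                             // dex
        case 0x02: ++pc; break;                              // swallow one operand byte
        default: return c;                                   // anything else: stop
        }
    }
    return c;
}

void exampleUsage() {
    assert(run(skipDemoProgram(), 10, 5).x == 4);  // A == 10: only dex runs
    assert(run(skipDemoProgram(), 3, 5).x == 6);   // A != 10: inx runs, dex is skipped
}
```

The point is that `dex` is never decoded as an instruction on the fall-through path, because `$02` has already consumed it as its operand.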

Edited by 42bs

No, I'm not. It's a serious question. Earlier in the thread it was mentioned that "all unused opcodes are guaranteed to have no operation on 6502C".

 

I interpret this as all those nice and fancy illegal 6502 opcodes (like LAX,SAX, etc.) not executing on 6502C. Which would be very sad.

 

If that's the case, how on earth did the C version get designed like this ?

 

 

 

Now, I haven't resorted to illegal opcodes on 6502 yet, but it's pretty high on my list. I did, till today, assume that 6502C should handle those codes just fine (other than, perhaps, those that would have the same hex opcode as the additional 6502C instructions).

 

I'm sure it would be obvious if we had a comparison table of both 6502 and 6502C opcodes (including illegal ones) handy. When I was 12-13, I eventually coded straight in hex on Atari (to save RAM, as I only had tape and ran out of RAM typing actual ASM mnemonics) and knew all opcodes off the top of my head. Haven't used it in 3 decades, so it's been cached out...


The "Apple IIc Reference Manual Vol 2" has a section on the differences between a 6502 and a 65C02 (see "Appendix A"), that I came across today.

 

A few instructions take 1 less cycle, and one instruction (JMP (abs), the indirect jump) takes 1 more cycle.

The BIT instruction affects status register bits differently, and JMP indirect can differ.

 

 

 

Edited by jum

Hi, didn't know that bspruck (sage) is maintaining the libretro and standalone forks. It'd be easier if everything was kept in the main fork; I made some additions to the libretro side before finding out.


So I've ported libretro to the upstream line here:
https://github.com/bspruck/handy-fork/pull/10


Changes:
- libretro added (use makefile)
- savestates can be kept in memory only and avoid hdd (libretro runahead which does it every frame)
- eeprom is cleared on startup only, to avoid (libretro) reset from zero'ing out the data
- cart.cpp memcpy needs size checks to avoid random crashes when it allocs memory in low private mapped areas
- eeprom mAUDIN_ext is reset initialized also
- added eeprom to savestates
- delete[] arrays

Thanks for improving this great Lynx emulator. Never knew anything about this handheld until this month really.

Edited by snes2600

Warbirds has a problem with superclip.


Disable these two lines to test:
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L676
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L677


I have no clue how superclip affects other games (or how to fix it) and won't submit a PR with this change.

Edited by snes2600

European Soccer Challenge is running illegal opcodes

9651 01 1b ora ($1b,x)
9653 17    rmb1

 

which can trigger gError->Warning crash (nullptr).
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L1744



Safety check and it goes in-game
https://github.com/bspruck/handy-fork/pull/12
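
A null check of this kind could look roughly like the C++ sketch below. This is not the code from the PR: `ErrorSink` and `warnIllegalOpcode` are hypothetical stand-ins, and only the `gError->Warning` call shape is taken from the post above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the emulator's error-reporting object.
struct ErrorSink {
    int warnings = 0;
    void Warning(const std::string& /*message*/) { ++warnings; }
};

ErrorSink* gError = nullptr;  // may be unset when no handler was installed

// Reports an illegal opcode if a sink is installed; never dereferences null.
// Returns true when the warning was actually delivered.
bool warnIllegalOpcode(unsigned opcode, unsigned pc) {
    if (gError == nullptr) return false;  // the safety check
    gError->Warning("Illegal opcode " + std::to_string(opcode) +
                    " at PC " + std::to_string(pc));
    return true;
}

void exampleUsage() {
    // With no sink installed the call is a harmless no-op.
    assert(!warnIllegalOpcode(0x17, 0x9653));
}
```

With a guard like this the opcode is still executed however the core decides to treat it; only the crash in the reporting path goes away.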

Edited by snes2600

Roadblasters is strange I guess? It relies on lots of WAI (Mikie CPU Sleep). So it goes to sleep until an IRQ comes in, since there's no NMI.

When coming out of RTI, Handy does this sometimes:
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L635

I'm not understanding why the 65c02 would go back to sleep if the IRQ just kicked WAI out of hibernation? Even if we're doing nested IRQs (which I haven't checked yet), still makes no sense to me.


Removing that line does make the game behave better but not perfectly. But I figure there's a reason it's there, just like the Warbirds superclip issue?

 

 

edit:

Didn't realize that Mikie has a sleep feature. And WAI is not being called directly by CPU for this game.

 

0.73 changelog

  * Added code within RTI to compensate for lost sleep cycles when the CPU
    is woken to service an interrupt.

Mednafen seems to be on older codebase before the lost sleep stuff was added. So it doesn't flicker mad crazy but still has a visual glitch.

Edited by snes2600
correction: Mikie sleep

Maybe I found an idea for Ms. Pac-Man. Handy updates every 3 frames when screen is full of sprites. 2 frames when it gets lower. This slowdown is noticeable; game feels sluggish at the maze start.

 

Reads are 1 word penalty.

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1211

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1250

 

Writes are 2 words penalty.

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1194

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1233

 

 

But I've been thinking about this further.

 

            if(!mSPRCOLL_Collide && !mSPRSYS_NoCollide && pixel!=0x0e) {
               int collision=ReadCollision(hoff);
               if(collision>mCollision) {
                  mCollision=collision;
               }
// 01/05/00 V0.7	if(mSPRCOLL_Number>collision)
               {
                  WriteCollision(hoff,mSPRCOLL_Number);
               }

Hardware:

1. Read pixel from mLineCollisionAddress + hoff/2. 1 word ram penalty.

2. Write pixel back to mLineCollisionAddress + hoff/2. 1 word ram penalty.

Total = 2 ram i/o cycles.

 

Handy:

1. Read pixel from mLineCollisionAddress + hoff/2. 1 word ram penalty.

2. Write pixel back to mLineCollisionAddress + hoff/2. 2 word ram penalty.

Total = 3 ram i/o cycles.

 

By changing those writes to a 1 word penalty, Ms. Pac-Man now updates roughly every 2.5 frames. This is fast enough to be a lot less noticeable.

 

I'd PR this change but want another opinion.


Possibly found the problem with Warbirds and superclip math.

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L654

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L655

 

Quad math compares sprite (h,v) with screen_start (h,v). So range should be 0-255 ideally to fit within screen.

 

The world mid-point however is always 32768 + 128'ish. And we're not doing 3D projection math anywhere. This number is never going to happen and is 1-way doomed.

 

So convert world center to screen_start (h,v) + 1/2 (width, height).
 

         int screen_h_start=(SWORD)mHOFF.Word;
         int screen_h_end=(SWORD)mHOFF.Word+SCREEN_WIDTH;
         int screen_v_start=(SWORD)mVOFF.Word;
         int screen_v_end=(SWORD)mVOFF.Word+SCREEN_HEIGHT;

         int world_h_mid=screen_h_start+(SCREEN_WIDTH/2);
         int world_v_mid=screen_v_start+(SCREEN_HEIGHT/2);

 

Which is basically (start+end)/2.
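
As a quick sanity check of the "(start+end)/2" claim, here is a self-contained C++ sketch of the same midpoint calculation. It assumes the Lynx screen size of 160x102 for `SCREEN_WIDTH`/`SCREEN_HEIGHT`; `Point` and `screenMidpoint` are names invented for the example, with the raw 16-bit `HOFF`/`VOFF` register values passed in as parameters.

```cpp
#include <cassert>
#include <cstdint>

constexpr int SCREEN_WIDTH = 160;   // Lynx display width in pixels
constexpr int SCREEN_HEIGHT = 102;  // Lynx display height in pixels

struct Point {
    int h;
    int v;
};

// Centre of the visible window, given the raw 16-bit HOFF/VOFF values.
// The offsets are reinterpreted as signed, like the (SWORD) casts above.
Point screenMidpoint(std::uint16_t hoff, std::uint16_t voff) {
    const int hStart = static_cast<std::int16_t>(hoff);
    const int vStart = static_cast<std::int16_t>(voff);
    return Point{hStart + SCREEN_WIDTH / 2, vStart + SCREEN_HEIGHT / 2};
}

void exampleUsage() {
    const Point mid = screenMidpoint(0, 0);
    assert(mid.h == 80 && mid.v == 51);
    // Same value as (start + end) / 2, since both dimensions are even.
    assert(mid.h == (0 + SCREEN_WIDTH) / 2);
}
```

Because both screen dimensions are even, start + size/2 and (start+end)/2 agree exactly, including for negative offsets.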

 

 

Feel good enough about this to PR.

https://github.com/bspruck/handy-fork/pull/13

 

Note that Mednafen does not use superclip math.


4 hours ago, snes2600 said:

Roadblasters is strange I guess? It relies on lots of WAI (Mikie CPU Sleep). So it goes to sleep until an IRQ comes in, since there's no NMI.

When coming out of RTI, Handy does this sometimes:
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L635

I'm not understanding why the 65c02 would go back to sleep if the IRQ just kicked WAI out of hibernation? Even if we're doing nested IRQs (which I haven't checked yet), still makes no sense to me.


Removing that line does make the game behave better but not perfectly. But I figure there's a reason it's there, just like the Warbirds superclip issue?

 

 

edit:

Didn't realize that Mikie has a sleep feature. And WAI is not being called directly by CPU for this game.

 

0.73 changelog


  * Added code within RTI to compensate for lost sleep cycles when the CPU
    is woken to service an interrupt.

Mednafen seems to be on older codebase before the lost sleep stuff was added. So it doesn't flicker mad crazy but still has a visual glitch.

There is no WAI/STP in Mikey. Instead the "WAI" opcode (all xb opcodes) is a single cycle NOP.

 


Thanks for pointing that out. I saw this in the Handy code for WAI:

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L1381

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c6502mak.h#L833

 

and thought hmm. But then realized oops, it's never called. And found out more opcodes aren't used. Then lots of headscratching about this game. :)

 

 

edit: So much to learn about this system. Feels different than others I've worked with emulator code-wise.

Edited by snes2600

I have to admit, I do not yet fully understand how the cycles are checked/calculated in Handy.

But it is not accurate, and if a game relies on it, it will fail.

For example: 16K NOPs on a real Lynx need 152*64µs (@75Hz). On Handy 115*64µs.

This means the 2 cycles on the Lynx take 0.594µs and on Handy 0.449µs, hence Handy is about 25% too quick.
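
The arithmetic behind those numbers, written out as a small C++ sketch. The tick counts (152 and 115) and the 16K iteration count come from the post above; the function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Time per opcode in microseconds, given how many 64 us ticks elapsed
// while `iterations` copies of the opcode ran back to back.
double microsecondsPerOpcode(int ticks64us, int iterations) {
    return ticks64us * 64.0 / iterations;
}

// Fraction of the hardware time that the emulator saves (0.25 = 25% too quick).
double fractionTooFast(int hardwareTicks, int emulatorTicks) {
    return 1.0 - static_cast<double>(emulatorTicks) / hardwareTicks;
}

void exampleUsage() {
    const int iterations = 16 * 1024;  // "16K NOPs"
    // Real Lynx: 152 * 64 / 16384 = 0.594 us per NOP
    assert(std::fabs(microsecondsPerOpcode(152, iterations) - 0.594) < 0.001);
    // Handy: 115 * 64 / 16384 = 0.449 us per NOP
    assert(std::fabs(microsecondsPerOpcode(115, iterations) - 0.449) < 0.001);
}
```

`fractionTooFast(152, 115)` comes out at roughly 0.24, which is where the "about 25%" figure comes from.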

 

Edited by 42bs
Typo

For info (posted already in the coding club):
 

;                    75Hz          |        60Hz           |        50Hz
; iter opcode  count   us   cycles | count    us   cycles  | count    us   cycles |
;              of 64us  per opcode | of 64us  per opcode   | of 64us  per opcode  |
; ---------------------------------------------------------------------------------
; 32K  xb,x3    152   0.297   1    |  148    0.289    1    |  145    0.283    1
; 16K  NOP      152   0.594   2    |  147    0.578    2    |  144    0.563    2
; 16K  x2       152   0.594   2    |                       |
; 16K  adc imm  152   0.594   2    |                       |
;  8K  adc zp   130   1.02    3.4  |                       |
;  8K  adc abs  169   1.32    4.4  |  163    1.27     4.4  |
;  8K  jmp      122   0.953   3.2  |                       |
;  8K  bra      122   0.953   3.2  |                       |
;  8K  bCC n/t   76   0.598   2.1  |                       |
;  8K  bCC  /t  122   0.953   3.2  |                       |
;  8K  $dc,$fc  169   1.32    4.4  |                       |
;  4K  $5c      177   2.77    2.6  |                       |
;  4K  inc abs  130   2.03    6.8  |                       |
;  4K  inc zp   111   1.73    5.8  |                       |

; n/t not taken
;  /t taken

Handy: "adc abs" takes 152*64µs, versus 169*64µs on a real Lynx @75Hz.


Wow. That's some missing cycles to account for. ^^


I've thought about the Warbirds superclip some more. Based on my PR:
1. We should now include the center point since it's a valid h,v coordinate. Added commit.

2. Thinking over that 0x8000 some more. It's a fudge factor to account for way-out-there sprite coords. Which only works? for 2 quads and must be sign flipped for the other 2 quads. But I'm not so sure now which is better: screen center or this extreme edge value. So edited PR with (DRAFT) title and do not merge yet.


As for Ms. Pac-Man, a new thought. A pixel is a 4-bpp nybble. Can't the hardware do this then?
1. Read 2 pixels at a time. 1 i/o.
2. Internal picture math on 2 pixels.
3. Write 2 pixels back. 1 i/o.

So even if no changes are done, it still writes back the nybble pair anyway. 2 i/o total. That would be a speedup.
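
Here is a rough C++ sketch of that read-pair / modify / write-pair idea, counting RAM accesses along the way. It is a toy model, not Handy's code: `CollisionLine`, `getNibble`, `setNibble` and `updateCollision` are invented names, and the nibble order (even `hoff` in the high nibble) is an assumption about the buffer layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One line of a 4-bpp collision buffer: two pixels per byte.
struct CollisionLine {
    std::vector<std::uint8_t> bytes;
    int ramAccesses = 0;  // counts simulated RAM i/o operations

    explicit CollisionLine(std::size_t pixels) : bytes((pixels + 1) / 2, 0) {}

    std::uint8_t readPair(std::size_t hoff) {
        ++ramAccesses;
        return bytes[hoff / 2];
    }
    void writePair(std::size_t hoff, std::uint8_t pair) {
        ++ramAccesses;
        bytes[hoff / 2] = pair;
    }
};

// Assumed layout: even hoff lives in the high nibble, odd hoff in the low nibble.
std::uint8_t getNibble(std::uint8_t pair, std::size_t hoff) {
    return static_cast<std::uint8_t>((hoff & 1) ? (pair & 0x0F) : (pair >> 4));
}

std::uint8_t setNibble(std::uint8_t pair, std::size_t hoff, std::uint8_t value) {
    value &= 0x0F;
    return static_cast<std::uint8_t>((hoff & 1) ? ((pair & 0xF0) | value)
                                                : ((pair & 0x0F) | (value << 4)));
}

// One collision update: 1 read, internal nibble math, 1 write back.
// Returns the collision number that was stored before the update.
std::uint8_t updateCollision(CollisionLine& line, std::size_t hoff,
                             std::uint8_t spriteNumber) {
    const std::uint8_t pair = line.readPair(hoff);
    const std::uint8_t previous = getNibble(pair, hoff);
    line.writePair(hoff, setNibble(pair, hoff, spriteNumber));
    return previous;
}

void exampleUsage() {
    CollisionLine line(160);
    updateCollision(line, 4, 7);
    assert(line.ramAccesses == 2);  // 2 i/o per pixel in this model
}
```

Whether the real hardware batches both pixels of a pair into a single read and write is exactly the open question here; the sketch only shows that the per-pixel cost in this model is 2 accesses rather than 3.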
