Nop90 Posted July 17, 2019

1 hour ago, 42bs said:

        cmp #10
        beq else
        sta $10
        TRIPLE_NOP      ; = dc.b $5c
    else:
        stz $10

    TRIPLE_NOP will skip over the "stz $10", otherwise one would write:

Why should it skip it? Three cycles is only the time the CPU takes to handle that special NOP; it will not affect the PC differently than a standard NOP.

These special opcodes should be useful for synchronizing the timing of the code with something very fast that happens at specific moments, without the need for timers and interrupts (you can set a timer only down to 1 us, and once you count the timer activation and the time to enter the interrupt, that is much more than 3 cycles).

For a three-cycle delay I'm using a BIT M instruction at the moment, but there is no 1-cycle opcode other than some of these special NOPs.

sage Posted July 18, 2019

If you want to add this to the Handy core, you had better add an option to switch between L1 and L2, but then the same should be done for other hardware.

42bs Posted July 18, 2019

5 hours ago, Nop90 said:

    Why should it skip it? Three cycles is only the time the CPU takes to handle that special NOP; it will not affect the PC differently than a standard NOP.

    These special opcodes should be useful for synchronizing the timing of the code with something very fast that happens at specific moments, without the need for timers and interrupts (you can set a timer only down to 1 us, and once you count the timer activation and the time to enter the interrupt, that is much more than 3 cycles).

    For a three-cycle delay I'm using a BIT M instruction at the moment, but there is no 1-cycle opcode other than some of these special NOPs.

These NOPs take as many cycles as bytes, so there is no benefit in using them. (At least, that's what my measurements show.) So why use an illegal opcode to get 2 NOPs instead of just writing 2 NOPs?

Nop90 Posted July 18, 2019

@42bs you are right about your example of code for the IF THEN ELSE. I read the specs again and the triple NOP takes 3 bytes. I missed this part the first time I read the 65C02 specs. This is really interesting to know.

But the same specs also report the behaviour of all the other unused opcodes:

    On the 65C02, all unused opcodes are guaranteed to have no operation, and are documented as such. They differ from the standard NOP (opcode $EA) only in size (i.e. the number of bytes) and cycle count. (On the 65816, only opcode $42 is unused. It is documented as having no operation, but is reserved for future instruction set expansion.)

    The following table summarizes the unused opcodes of the 65C02. The first number is the size in bytes, and the second number is the number of cycles taken. After the second number, a lower case letter may be present; when it is present it indicates a footnote.

          02     03     04     07     0B     0C     0F
         -----  -----  -----  -----  -----  -----  -----
     00  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
     10  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
     20  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
     30  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
     40  2 2    1 1    2 3    1 1 a  1 1    . .    1 1 b
     50  . .    1 1    2 4    1 1 a  1 1    3 8    1 1 b
     60  2 2    1 1    . .    1 1 a  1 1    . .    1 1 b
     70  . .    1 1    . .    1 1 a  1 1    . .    1 1 b
     80  2 2    1 1    . .    1 1 c  1 1    . .    1 1 d
     90  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
     A0  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
     B0  . .    1 1    . .    1 1 c  1 1    . .    1 1 d
     C0  2 2    1 1    . .    1 1 c  1 1 e  . .    1 1 d
     D0  . .    1 1    2 4    1 1 c  1 1 f  3 4    1 1 d
     E0  2 2    1 1    . .    1 1 c  1 1    . .    1 1 d
     F0  . .    1 1    2 4    1 1 c  1 1    3 4    1 1 d

     a) RMB instruction on Rockwell 65C02 and WDC 65C02
     b) BBR instruction on Rockwell 65C02 and WDC 65C02
     c) SMB instruction on Rockwell 65C02 and WDC 65C02
     d) BBS instruction on Rockwell 65C02 and WDC 65C02
     e) WAI instruction on WDC 65C02
     f) STP instruction on WDC 65C02

    These unused opcodes may prove useful in some situations. Note, however, that any code that makes use of them is limited to the 65C02.

    First, many opcodes behave as one byte, one cycle NOPs. This can be more useful than the standard one byte, two cycle NOP (opcode $EA).

If I can find the time I'll write some code to run on real HW to test which opcodes work, the instruction lengths and maybe the cycles taken too.

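As a rough illustration of how the quoted table could be used, for example by an emulator that has to decide how far to advance the PC, here is a minimal Python sketch. The values are transcribed from the table as quoted; the names `unused_nop`, `_SPECIAL` and `_TWO_BYTE` are made up for this example, and the sketch ignores footnotes a-f (on Rockwell/WDC parts those opcodes are real instructions).

```python
# Size and cycle count of the 65C02 opcodes that the quoted table lists as
# unused. All names here are hypothetical; only the numbers come from the table.

# Exceptions in the 04 and 0C columns: opcode -> (bytes, cycles)
_SPECIAL = {
    0x44: (2, 3), 0x54: (2, 4), 0xD4: (2, 4), 0xF4: (2, 4),
    0x5C: (3, 8), 0xDC: (3, 4), 0xFC: (3, 4),
}

# 02 column: two-byte, two-cycle NOPs
_TWO_BYTE = {0x02, 0x22, 0x42, 0x62, 0x82, 0xC2, 0xE2}


def unused_nop(opcode):
    """Return (bytes, cycles) if the table lists the opcode, else None."""
    if opcode in _SPECIAL:
        return _SPECIAL[opcode]
    if opcode in _TWO_BYTE:
        return (2, 2)
    if (opcode & 0x0F) in (0x03, 0x07, 0x0B, 0x0F):
        # Columns 03, 07, 0B and 0F: one byte, one cycle.
        return (1, 1)
    return None


if __name__ == "__main__":
    for op in (0x02, 0x5C, 0xCB, 0xEA):
        print(f"${op:02X}: {unused_nop(op)}")
```

Whether the Lynx CPU follows this table exactly is what the measurements in this thread are trying to establish, so treat the numbers as the documented 65C02 behaviour, not as confirmed Lynx timing.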
42bs Posted July 18, 2019

Nop90, I have to revise my first finding. I did a quick test this morning:

    ; iter   size  opcode
    ; 16384  1     0b..fb   5,0ms => 0,3us  => 1 cycle
    ; 16384  1     03..f3   5,0ms => 0,3us  => 1 cycle
    ; 16384  1     NOP      9,5ms => 0,58us => 2 cycles
    ; 16384  2     $02      9,5ms => 0,6us  => 2 cycles
    ; 8192   3     dc,fc   11,0ms => 1,2us  => 4 cycles
    ; 8192   2     f4      11,0ms => 1,3us  => 4 cycles
    ; 8192   2     44       9,5ms => 1,2us  => 4 cycles

So my findings are more or less as in your list.

So with $02 one can make a conditional inc/dec:

        MACRO SKIP1
        dc.b $02
        ENDM

        cmp #10
        beq .e
        inx
        SKIP1
    .e  dex

SKIP1 takes only 2 cycles instead of the 3 needed when using BRA.

Edited July 18, 2019 by 42bs

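To make the SKIP1 trick easier to follow, here is a small Python sketch that steps through the same byte sequence. It is only a toy model: it handles the five opcodes used in the snippet, tracks only the Z flag, and the function name `run` is invented for the example. The opcode values (`$C9` CMP #imm, `$F0` BEQ, `$E8` INX, `$CA` DEX) are the standard 6502 encodings; `$02` is treated as a two-byte NOP, as measured in the post.

```python
def run(a, x=0):
    """Run the conditional inc/dec snippet and return the final X register."""
    program = bytes([
        0xC9, 0x0A,  # cmp #10
        0xF0, 0x02,  # beq .e   (branch over inx and the SKIP1 byte)
        0xE8,        # inx
        0x02,        # SKIP1: two-byte NOP, its "operand" is the dex below
        0xCA,        # .e: dex
    ])
    pc, zero = 0, False
    while pc < len(program):
        op = program[pc]
        if op == 0xC9:            # cmp #imm: only the Z flag is modelled
            zero = (a == program[pc + 1])
            pc += 2
        elif op == 0xF0:          # beq rel (forward offsets only in this toy)
            pc += 2
            if zero:
                pc += program[pc - 1]
        elif op == 0xE8:          # inx
            x = (x + 1) & 0xFF
            pc += 1
        elif op == 0xCA:          # dex
            x = (x - 1) & 0xFF
            pc += 1
        elif op == 0x02:          # two-byte NOP: the next byte is swallowed
            pc += 2
        else:
            raise ValueError(f"opcode ${op:02X} not modelled")
    return x


if __name__ == "__main__":
    print(run(10, x=5))  # A == 10: branch taken, only dex runs
    print(run(3, x=5))   # A != 10: inx runs, SKIP1 swallows the dex
```

With A equal to 10 the branch lands directly on `dex`; otherwise `inx` executes and the `$02` byte consumes `dex` as its operand, so exactly one of the two instructions runs on each path.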
VladR Posted July 19, 2019

Wait, so the Lynx's 6502C can't handle the standard illegal 6502 opcodes (e.g. LAX/SAX, etc.), unlike the regular 6502?

42bs Posted July 19, 2019

4 hours ago, VladR said:

    Wait, so the Lynx's 6502C can't handle the standard illegal 6502 opcodes (e.g. LAX/SAX, etc.), unlike the regular 6502?

You are kidding, aren't you?

VladR Posted July 19, 2019

No, I'm not. It's a serious question.

Earlier in the thread it was mentioned that "all unused opcodes are guaranteed to have no operation on 6502C". I interpret this as all those nice and fancy illegal 6502 opcodes (like LAX, SAX, etc.) not executing on the 6502C. Which would be very sad. If that's the case, how on earth did the C version get designed like this?

Now, I haven't resorted to illegal opcodes on the 6502 yet, but it's pretty high on my list. Until today I assumed that the 6502C would handle those opcodes just fine (except, perhaps, those that share a hex opcode with the additional 6502C instructions).

I'm sure it would be obvious if we had a comparison table of both 6502 and 6502C opcodes (including illegal ones) handy.

When I was 12-13, I eventually coded straight in hex on the Atari (to save RAM, as I only had tape and ran out of RAM typing actual ASM mnemonics) and knew all the opcodes off the top of my head. I haven't used it in 3 decades, so it's been cached out...

42bs Posted July 19, 2019

All documents say it is a 65C02; my assumption was that it is a 65SC02 (which might be true for the Lynx I).

See https://github.com/42Bastian/new_bll/blob/master/doc/65sc02.txt

Or: https://github.com/42Bastian/6502/blob/master/doc/6502refcard.pdf

Cyprian (Author) Posted July 19, 2019

Great documentation, @42bs.

jum Posted July 23, 2019

The "Apple IIc Reference Manual Vol 2" has a section on the differences between a 6502 and a 65C02 (see "Appendix A"), which I came across today.

A few instructions take 1 less cycle, and one instruction (JMP abs) takes 1 more cycle. The BIT instruction affects status register bits differently, and JMP indirect can differ.

Edited July 23, 2019 by jum

Cyprian (Author) Posted July 23, 2019

jum, thanks for pointing that out. Regarding JMP, it was buggy in the 6502A/B/C and it was corrected in the 65C02.

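For readers who have not run into it, the JMP bug referred to here is the well-known indirect-jump page wrap: on the NMOS 6502, `JMP ($xxFF)` fetches the high byte of the target from `$xx00` instead of from the next page, and the 65C02 fixes this. A minimal Python sketch of the difference follows; the function name `jmp_indirect_target` and the `cmos` flag are invented for the example.

```python
def jmp_indirect_target(memory, pointer, cmos):
    """Return the address JMP (pointer) jumps to.

    memory  -- 64 KiB bytearray
    pointer -- 16-bit address of the vector
    cmos    -- True for 65C02 behaviour, False for NMOS 6502 behaviour
    """
    lo = memory[pointer]
    if cmos:
        # 65C02: the pointer is incremented as a full 16-bit value.
        hi_addr = (pointer + 1) & 0xFFFF
    else:
        # NMOS 6502: only the low byte is incremented, so the fetch wraps
        # around inside the same page.
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    return (memory[hi_addr] << 8) | lo


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x02FF] = 0x34   # low byte of the vector
    mem[0x0300] = 0x12   # high byte where the programmer expects it
    mem[0x0200] = 0x56   # high byte the NMOS part actually reads
    print(hex(jmp_indirect_target(mem, 0x02FF, cmos=False)))
    print(hex(jmp_indirect_target(mem, 0x02FF, cmos=True)))
```

For any vector that does not sit on the last byte of a page, both branches compute the same address, which is why the bug only shows up occasionally.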
snes2600 Posted July 28, 2019

Hi, I didn't know that bspruck (sage) is maintaining the libretro and standalone forks. It'd be easier if everything was kept in the main fork; I made some additions to the libretro side before finding out.

So I've ported libretro to the upstream line here:
https://github.com/bspruck/handy-fork/pull/10

Changes:
- libretro added (use makefile)
- savestates can be kept in memory only and avoid the hdd (libretro runahead does this every frame)
- eeprom is cleared on startup only, to avoid a (libretro) reset zeroing out the data
- cart.cpp memcpy needs size checks to avoid random crashes when it allocs memory in low private mapped areas
- eeprom mAUDIN_ext is also initialized on reset
- added eeprom to savestates
- delete[] arrays

Thanks for improving this great Lynx emulator. I never really knew anything about this handheld until this month.

Edited July 29, 2019 by snes2600

snes2600 Posted July 28, 2019

Gates of Zendocon laughing spider fix (separate PR):
https://github.com/bspruck/handy-fork/pull/11

The game writes 0 values to all 4 mAUDIO_x_BKUP registers, which I guess is a valid playback value.

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/mikie.cpp#L3463
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/mikie.cpp#L3544
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/mikie.cpp#L3625
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/mikie.cpp#L3706

Edited July 29, 2019 by snes2600

snes2600 Posted July 28, 2019

Warbirds has a problem with superclip. Disable these two lines to test:

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L676
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L677

I have no clue how superclip affects other games (or how to fix it) and won't submit a PR with this change.

Edited July 29, 2019 by snes2600

snes2600 Posted July 28, 2019

European Soccer Challenge is running illegal opcodes:

    9651 01 1b   ora ($1b,x)
    9653 17      rmb1

which can trigger a gError->Warning crash (nullptr).
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L1744

With a safety check it goes in-game:
https://github.com/bspruck/handy-fork/pull/12

Edited July 29, 2019 by snes2600

42bs Posted July 28, 2019

RMBx actually works on the Lynx II, so likely also on the Lynx I. Handy needs to be fixed here, and also for the other "illegal" opcodes, which are actually NOPs with different sizes and cycle counts.

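For reference, this is roughly what an emulator would have to do for the RMBx/SMBx family. On Rockwell and WDC 65C02 parts the opcodes `$07`..`$77` are RMB0..RMB7 (clear bit n of a zero-page byte) and `$87`..`$F7` are SMB0..SMB7 (set bit n); the bit number sits in bits 4-6 of the opcode. The Python sketch below only models that decoding. The function name `exec_bit_op` is made up, and it is not how Handy structures its opcode handlers.

```python
def exec_bit_op(opcode, zp_addr, zero_page):
    """Apply RMBn/SMBn to zero_page in place.

    Returns True if the opcode was an RMB/SMB instruction, False otherwise.
    zero_page is a 256-byte bytearray, zp_addr the operand byte.
    """
    if (opcode & 0x0F) != 0x07:
        return False
    bit = (opcode >> 4) & 0x07          # n in RMBn / SMBn
    if opcode & 0x80:
        zero_page[zp_addr] |= 1 << bit              # SMBn: set the bit
    else:
        zero_page[zp_addr] &= ~(1 << bit) & 0xFF    # RMBn: clear the bit
    return True


if __name__ == "__main__":
    zp = bytearray(256)
    zp[0x20] = 0xFF
    exec_bit_op(0x17, 0x20, zp)   # rmb1 $20
    print(f"${zp[0x20]:02X}")
    exec_bit_op(0x97, 0x20, zp)   # smb1 $20
    print(f"${zp[0x20]:02X}")
```

The `$17` byte in the European Soccer Challenge disassembly is RMB1 under this encoding, which matches the mnemonic shown there.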
snes2600 Posted July 28, 2019

Roadblasters is strange, I guess? It relies on lots of WAI (Mikie CPU Sleep). So it goes to sleep until an IRQ comes in, since there's no NMI.

When coming out of RTI, Handy sometimes does this:
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L635

I don't understand why the 65C02 would go back to sleep if the IRQ just kicked WAI out of hibernation. Even if we're doing nested IRQs (which I haven't checked yet), it still makes no sense to me.

Removing that line does make the game behave better, but not perfectly. But I figure there's a reason it's there, just like the Warbirds superclip issue?

edit: Didn't realize that Mikie has a sleep feature, and WAI is not being called directly by the CPU for this game.

0.73 changelog
* Added code within RTI to compensate for lost sleep cycles when the CPU is woken to service an interrupt.

Mednafen seems to be on an older codebase from before the lost sleep stuff was added. So it doesn't flicker like crazy but still has a visual glitch.

Edited July 29, 2019 by snes2600 (correction: Mikie sleep)

snes2600 Posted July 29, 2019

Maybe I found an idea for Ms. Pac-Man. Handy updates every 3 frames when the screen is full of sprites, and every 2 frames when the count gets lower. This slowdown is noticeable; the game feels sluggish at the maze start.

Reads are a 1-word penalty.
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1211
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1250

Writes are a 2-word penalty.
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1194
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L1233

But I've been thinking about this further.

    if(!mSPRCOLL_Collide && !mSPRSYS_NoCollide && pixel!=0x0e)
    {
        int collision=ReadCollision(hoff);
        if(collision>mCollision)
        {
            mCollision=collision;
        }
        // 01/05/00 V0.7
        if(mSPRCOLL_Number>collision)
        {
            WriteCollision(hoff,mSPRCOLL_Number);
        }
    }

Hardware:
1. Read pixel from mLineCollisionAddress + hoff/2. 1 word ram penalty.
2. Write pixel back to mLineCollisionAddress + hoff/2. 1 word ram penalty.
Total = 2 ram i/o cycles.

Handy:
1. Read pixel from mLineCollisionAddress + hoff/2. 1 word ram penalty.
2. Write pixel back to mLineCollisionAddress + hoff/2. 2 word ram penalty.
Total = 3 ram i/o cycles.

By changing those writes to a 1-word penalty, Ms. Pac-Man now updates at roughly ~2.5 frames. This is fast enough to be a lot less noticeable.

I'd PR this change but want another opinion.

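To put numbers on the difference between the two accounting models, here is a small Python sketch that mirrors the collision update quoted in the post and counts the RAM words charged. It is only a model of the bookkeeping, not Handy code: the function name `update_collision` and its parameters are invented, and the "hardware" figure of one word per write is the poster's assumption, not a documented value.

```python
def update_collision(buffer, sprite_number, write_penalty, read_penalty=1):
    """Walk one line of collision values and return (highest_seen, words).

    buffer        -- list of existing collision numbers, one per pixel
    sprite_number -- collision number of the sprite being drawn
    write_penalty -- RAM words charged per WriteCollision
    read_penalty  -- RAM words charged per ReadCollision
    """
    highest = 0
    words = 0
    for hoff, existing in enumerate(buffer):
        words += read_penalty               # ReadCollision(hoff)
        if existing > highest:
            highest = existing
        if sprite_number > existing:
            buffer[hoff] = sprite_number    # WriteCollision(hoff, number)
            words += write_penalty
    return highest, words


if __name__ == "__main__":
    line = [0, 0, 5, 9]
    print(update_collision(list(line), 7, write_penalty=2))  # Handy's model
    print(update_collision(list(line), 7, write_penalty=1))  # proposed model
```

In the worst case, where every pixel is both read and written, the proposed model charges two thirds of what Handy currently charges for collision traffic, which is in the same direction as the observed change from 3 frames to roughly 2.5.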
snes2600 Posted July 29, 2019

Possibly found the problem with Warbirds and the superclip math.

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L654
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/susie.cpp#L655

The quad math compares sprite (h,v) with screen_start (h,v), so the range should ideally be 0-255 to fit within the screen.

The world mid-point, however, is always 32768 + 128'ish. And we're not doing 3D projection math anywhere. This number is never going to happen and is 1-way doomed.

So convert the world center to screen_start (h,v) + 1/2 (width, height).

    int screen_h_start=(SWORD)mHOFF.Word;
    int screen_h_end=(SWORD)mHOFF.Word+SCREEN_WIDTH;
    int screen_v_start=(SWORD)mVOFF.Word;
    int screen_v_end=(SWORD)mVOFF.Word+SCREEN_HEIGHT;

    int world_h_mid=screen_h_start+(SCREEN_WIDTH/2);
    int world_v_mid=screen_v_start+(SCREEN_HEIGHT/2);

Which is basically (start+end)/2.

I feel good enough about this to PR.
https://github.com/bspruck/handy-fork/pull/13

Note that Mednafen does not use superclip math.

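The midpoint calculation from the post can be checked quickly in Python. This sketch assumes the Lynx screen size of 160x102 for `SCREEN_WIDTH` and `SCREEN_HEIGHT`, and models the `(SWORD)` cast with a helper named `sword`; both function names are invented for the example.

```python
SCREEN_WIDTH = 160    # Lynx display width in pixels (assumed constant value)
SCREEN_HEIGHT = 102   # Lynx display height in pixels (assumed constant value)


def sword(value):
    """Interpret a 16-bit register value as signed, like the (SWORD) cast."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def world_mid(hoff, voff):
    """Return (world_h_mid, world_v_mid) for the given HOFF/VOFF registers."""
    h_start = sword(hoff)
    v_start = sword(voff)
    return (h_start + SCREEN_WIDTH // 2, v_start + SCREEN_HEIGHT // 2)


if __name__ == "__main__":
    print(world_mid(0x0000, 0x0000))
    print(world_mid(0xFFF0, 0x0010))   # HOFF = -16, VOFF = 16
```

Because both screen dimensions are even, `start + size/2` and `(start + end)/2` give the same integer, which is the "basically (start+end)/2" remark in the post.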
42bs Posted July 29, 2019

4 hours ago, snes2600 said:

    Roadblasters is strange, I guess? It relies on lots of WAI (Mikie CPU Sleep). So it goes to sleep until an IRQ comes in, since there's no NMI.

    When coming out of RTI, Handy sometimes does this:
    https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L635

    I don't understand why the 65C02 would go back to sleep if the IRQ just kicked WAI out of hibernation. Even if we're doing nested IRQs (which I haven't checked yet), it still makes no sense to me.

    Removing that line does make the game behave better, but not perfectly. But I figure there's a reason it's there, just like the Warbirds superclip issue?

    edit: Didn't realize that Mikie has a sleep feature, and WAI is not being called directly by the CPU for this game.

    0.73 changelog
    * Added code within RTI to compensate for lost sleep cycles when the CPU is woken to service an interrupt.

    Mednafen seems to be on an older codebase from before the lost sleep stuff was added. So it doesn't flicker like crazy but still has a visual glitch.

There is no WAI/STP in Mikey. Instead, the "WAI" opcode (like all xb opcodes) is a single-cycle NOP.

snes2600 Posted July 29, 2019

Thanks for pointing that out. I saw this in the Handy code for WAI:

https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c65c02.h#L1381
https://github.com/bspruck/handy-fork/blob/master/handy-win32src-0.95-patched/core/c6502mak.h#L833

and thought hmm. But then realized oops, it's never called. And found out more opcodes aren't used. Then lots of headscratching about this game.

edit: So much to learn about this system. It feels different, emulator code-wise, from others I've worked with.

Edited July 29, 2019 by snes2600

42bs Posted July 29, 2019

I have to admit, I do not yet fully understand how the cycles are checked/calculated in Handy. But it is not accurate, and if a game relies on it, it will fail.

For example: 16K NOPs on a real Lynx need 152*64µs (@75Hz). On Handy they need 115*64µs.

That means the 2 cycles take 0.594µs on the Lynx and 0.449µs on Handy, hence Handy is about 25% too quick.

Edited July 29, 2019 by 42bs (Typo)

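The arithmetic behind those figures, spelled out as a short Python sketch. It assumes that "16K" means 16384 iterations and that each counted unit is 64 µs, as stated in the post; the helper name `us_per_opcode` is invented.

```python
UNIT_US = 64.0        # one counted unit is 64 microseconds
ITERATIONS = 16 * 1024


def us_per_opcode(count, iterations=ITERATIONS):
    """Convert a count of 64 us units into microseconds per opcode."""
    return count * UNIT_US / iterations


lynx_nop = us_per_opcode(152)    # real Lynx at 75 Hz
handy_nop = us_per_opcode(115)   # Handy

# Fraction of the real execution time that Handy does not account for.
shortfall = 1.0 - handy_nop / lynx_nop

if __name__ == "__main__":
    print(f"Lynx : {lynx_nop:.3f} us per NOP")
    print(f"Handy: {handy_nop:.3f} us per NOP")
    print(f"Handy is short by {shortfall:.1%}")
```

The shortfall comes out at a little over 24%, which is the "about 25% too quick" in the post.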
42bs Posted July 29, 2019

For info (posted already in the coding club):

    ;                 |        75Hz          |        60Hz          |        50Hz
    ; iter  opcode    | count  us     cycles | count  us     cycles | count  us     cycles
    ;                 | of     per           | of     per           | of     per
    ;                 | 64us   opcode        | 64us   opcode        | 64us   opcode
    ; ----------------+----------------------+----------------------+---------------------
    ; 32K   xb,x3     | 152    0.297  1      | 148    0.289  1      | 145    0.283  1
    ; 16K   NOP       | 152    0.594  2      | 147    0.578  2      | 144    0.563  2
    ; 16K   x2        | 152    0.594  2      |                      |
    ; 16K   adc imm   | 152    0.594  2      |                      |
    ; 8K    adc zp    | 130    1.02   3.4    |                      |
    ; 8K    adc abs   | 169    1.32   4.4    | 163    1.27   4.4    |
    ; 8K    jmp       | 122    0.953  3.2    |                      |
    ; 8K    bra       | 122    0.953  3.2    |                      |
    ; 8K    bCC n/t   | 76     0.598  2.1    |                      |
    ; 8K    bCC /t    | 122    0.953  3.2    |                      |
    ; 8K    $dc,$fc   | 169    1.32   4.4    |                      |
    ; 4K    $5c       | 177    2.77   2.6    |                      |
    ; 4K    inc abs   | 130    2.03   6.8    |                      |
    ; 4K    inc zp    | 111    1.73   5.8    |                      |
    ; n/t not taken
    ; /t  taken

Handy: "adc abs" => 152*64µs <=> 169*64µs @75Hz

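The "cycles" column can be re-derived from the counts by dividing each per-opcode time by the time of the 1-cycle xb/x3 opcodes. The Python sketch below does that for a few of the 75 Hz rows; it assumes "K" means 1024 iterations, and the names `ROWS_75HZ` and `cycles_for` are invented. For most rows the result agrees with the table to one decimal. For `$5c` the same arithmetic gives about 9.3, where the table lists 2.6, so that entry may be a typo; the sketch only reports the recomputed value and does not settle which is right.

```python
UNIT_US = 64.0   # one counted unit is 64 microseconds

# (opcode, iterations, count of 64 us units at 75 Hz), copied from the table
ROWS_75HZ = [
    ("xb,x3",   32 * 1024, 152),
    ("NOP",     16 * 1024, 152),
    ("adc zp",   8 * 1024, 130),
    ("adc abs",  8 * 1024, 169),
    ("jmp",      8 * 1024, 122),
    ("$5c",      4 * 1024, 177),
    ("inc abs",  4 * 1024, 130),
    ("inc zp",   4 * 1024, 111),
]

# Time of one cycle, taken from the 1-cycle xb/x3 measurement.
CYCLE_US = 152 * UNIT_US / (32 * 1024)


def cycles_for(iterations, count):
    """Return the (fractional) cycle count implied by a measurement."""
    return (count * UNIT_US / iterations) / CYCLE_US


if __name__ == "__main__":
    for name, iterations, count in ROWS_75HZ:
        us = count * UNIT_US / iterations
        print(f"{name:8s} {us:6.3f} us  {cycles_for(iterations, count):4.1f} cycles")
```

The fractional results (3.4, 4.4, 6.8 and so on) are averages over the whole loop, so they presumably include effects such as memory refresh or page-mode access rather than meaning that an instruction takes a fraction of a cycle.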
snes2600 Posted July 29, 2019

Wow. That's a lot of missing cycles to account for. ^^

I've thought about the Warbirds superclip some more. Based on my PR:

1. We should now include the center point since it's a valid h,v coordinate. Added commit.
2. Thinking over that 0x8000 some more: it's a fudge factor to account for way-out-there sprite coords, which only works(?) for 2 quads and must be sign-flipped for the other 2 quads. But I'm not so sure now which is better: the screen center or this extreme edge value. So I edited the PR with a (DRAFT) title; do not merge yet.

As for Ms. Pac-Man, a new thought. A pixel is a 4-bpp nybble. Can't the hardware do this then?

1. Read 2 pixels at a time. 1 i/o.
2. Internal picture math on 2 pixels.
3. Write 2 pixels back. 1 i/o.

So even if no changes are made, it still writes the nybble pair back anyway. 2 i/o total. That would be a speedup.

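The nybble-pair idea can be sketched in a few lines of Python. This is a model of the proposal only, not of what Suzy is known to do: it assumes the left pixel lives in the high nybble, that the sprite covers both pixels of the pair, and that collision numbers fit in 4 bits. The function name `update_pair` is invented.

```python
def update_pair(buffer, byte_index, sprite_number):
    """Read one byte (two 4-bit pixels), update both, write it back.

    Returns the highest collision number found in the pair before the
    update. Costs one read and one write regardless of what changed.
    """
    packed = buffer[byte_index]                  # 1 i/o: read both pixels
    left, right = packed >> 4, packed & 0x0F
    highest = max(left, right)
    # Same rule as the per-pixel code: only overwrite lower numbers.
    left = max(left, sprite_number)
    right = max(right, sprite_number)
    buffer[byte_index] = (left << 4) | right     # 1 i/o: write both back
    return highest


if __name__ == "__main__":
    line = bytearray([0x39])
    print(update_pair(line, 0, 7), f"${line[0]:02X}")
```

Per pair of pixels this costs 2 i/o operations, compared with up to 4 (two reads, two writes) when each pixel is handled on its own under the 1-word-per-access model.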