cd-w Posted May 4, 2011

> I read somewhere around here that while ARM code is being run, it sends NOPs to the 6507. But this would still finish in a reasonable number of cycles, correct? Seeing as the ARM runs at around 70 MHz.

The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the IO is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.

Chris
+SpiceWare Posted May 4, 2011

Check out my blog, where I recently posted the beginnings of my next project, a version of Berzerk with 64K room layouts like the arcade version. I'll be using the ARM to run the game logic. The initial build updates the maze walls, which are stored in bank 6 (though I always refer to it as Display Data since you don't get to "bank it in" like the other 4K banks).

While 6507 code is running, the ARM is extremely busy feeding it data, namely the ROM that's visible in the 4K window. There's barely enough time for the DPC+ support routines to run (especially the 3-voice music generation).

When custom ARM code is running you have a choice of feeding the 6507 NOPs, or NOPs and LDA AMPLITUDE/STA AUDV0. This is controlled by storing $FF (NOP only) or $FE into CALLFUNCTION. The $FE function lets your game play 3-voice music while running ARM code (3-voice music must update AUDV0 on every single scanline), but there's a performance penalty for the ARM code when using that mode (I don't recall how much, maybe 10%; batari could most likely tell us).
+batari Posted May 5, 2011

> When custom ARM code is running you have a choice of feeding the 6507 NOPs, or NOPs and LDA AMPLITUDE/STA AUDV0. This is controlled by storing $FF (NOP only) or $FE into CALLFUNCTION. The $FE function lets your game play 3-voice music while running ARM code (3-voice music must update AUDV0 on every single scanline), but there's a performance penalty for the ARM code when using that mode (I don't recall how much, maybe 10%; batari could most likely tell us).

I don't know exactly, but 10%-15% sounds reasonable. When you store $FE, the ARM sets up an interrupt service routine to interrupt your C code about every 76 6507 cycles. The interrupt service routine must wait up to two 6507 cycles for the 6507's PC to stabilize, then it must deliver 5 full cycles of code (LDA #xx/STA AUDV0), then wait for the PC to stabilize again and throw another NOP on the bus. Best case is probably a hair under 10%, but worst case could be approaching 15%.
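The best and worst case figures above can be sanity-checked with a little arithmetic. The sketch below is only a rough model: it assumes the cost of $FE mode is the number of 6507 cycles the interrupt service routine occupies (the stabilization waits, the 5 cycles of LDA #xx/STA AUDV0, and the 2-cycle NOP that follows) divided by the 76-cycle period. It ignores any ARM-side interrupt entry and exit cost, and the names are made up for illustration.

```python
# Rough model of the $FE (music) mode overhead described in the post above.
# Assumption: overhead = 6507 cycles spent in the interrupt service routine
# divided by the ~76-cycle interrupt period (one scanline).

PERIOD = 76      # 6507 cycles between interrupts
AUDIO_CODE = 5   # LDA #xx (2 cycles) + STA AUDV0 (3 cycles)
NOP = 2          # the NOP thrown back on the bus afterwards


def overhead(wait_before, wait_after):
    """Fraction of each period spent in the ISR for the given PC-stabilization waits."""
    return (wait_before + AUDIO_CODE + wait_after + NOP) / PERIOD


best = overhead(0, 0)    # the PC is already stable both times
worst = overhead(2, 2)   # the full two-cycle wait both times

print(f"best case:  {best:.1%}")
print(f"worst case: {worst:.1%}")
```

Under these assumptions the model gives 7/76 and 11/76, which is consistent with "a hair under 10%" and "approaching 15%".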
ScumSoft Posted May 13, 2011

Would it be possible to make the DataFetchers access more than just Bank 6?
+batari Posted May 14, 2011

> Would it be possible to make the DataFetchers access more than just Bank 6?

Probably not. Bank 6 is stored in internal SRAM for speed and the rest is in flash, which is too slow. There are copy and fill functions to copy data from the binary to the display bank if you don't have quite enough space in bank 6.
ScumSoft Posted May 14, 2011

I see, alright then. About the copy and fill functions, I didn't see those defined in the header.
+batari Posted May 15, 2011

Guess they didn't get added to the header.

Copy data to fetcher requires that four bytes be written to PARAMETER: the low byte of the data's address, the high byte of the data's address, the fetcher number to write the data to, and the number of bytes to copy, in that order. Then, store 1 to CALLFUNCTION to perform the copy.

The address of the data is relative to the beginning of the .bin file, not including DPC+.arm. What I mean by this is that bank 0 starts at $0000, bank 1 starts at $1000, and so on, including the display data bank (bank 6), which maintains a ROM mirror of the original values at $6000-$6FFF if you ever need it.

Fill data in fetcher requires that three values be written to PARAMETER, but the first is written twice. Store the value to fill (twice), the fetcher number to write the data to, and the number of bytes to fill, in that order. Then, store 2 to CALLFUNCTION to perform the fill.

The fetchers need to be set up prior to the copy/fill. The fetchers are not incremented during the copy/fill, and will still point to the beginning of their data when the function is complete.

If you ever need to reset PARAMETER for whatever reason, storing 0 in CALLFUNCTION will do that. This shouldn't normally be necessary.

If you need code examples, let me know.
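To make the byte order concrete, here is a toy Python model of the PARAMETER/CALLFUNCTION sequence as described above. It is a sketch, not the real DPC+ driver: the class name, the eight-entry fetcher list, the flat `display` array, and the idea that PARAMETER is cleared after every function call are all assumptions made for illustration. Only the function numbers (0, 1, 2) and the byte order come from the post.

```python
class DpcPlusModel:
    """Hypothetical model of the copy/fill protocol (not the actual driver)."""

    def __init__(self, rom_image, display_size=4096):
        self.rom = rom_image                    # .bin contents, without DPC+.arm
        self.display = bytearray(display_size)  # stand-in for the display data bank
        self.fetcher = [0] * 8                  # fetcher pointers into the display bank
        self.params = []                        # bytes stored to PARAMETER so far

    def store_parameter(self, value):
        self.params.append(value & 0xFF)        # PARAMETER advances on every store

    def store_callfunction(self, function):
        if function == 1:                       # copy: addr low, addr high, fetcher, count
            lo, hi, fetcher, count = self.params[:4]
            src = lo | (hi << 8)                # offset from the start of the .bin
            dst = self.fetcher[fetcher]
            self.display[dst:dst + count] = self.rom[src:src + count]
        elif function == 2:                     # fill: value, value, fetcher, count
            value, _, fetcher, count = self.params[:4]
            dst = self.fetcher[fetcher]
            self.display[dst:dst + count] = bytes([value]) * count
        self.params.clear()                     # assumed; function 0 does only this


rom = bytes(range(256)) * 128                   # fake 32K binary for the demo
dpc = DpcPlusModel(rom)
dpc.fetcher[0] = 0x100                          # as if DF0LOW/DF0HI were already set

for byte in (0x23, 0x01, 0, 4):                 # source offset $0123, fetcher 0, 4 bytes
    dpc.store_parameter(byte)
dpc.store_callfunction(1)
copied = bytes(dpc.display[0x100:0x104])

for byte in (0xAA, 0xAA, 0, 3):                 # fill value twice, fetcher 0, 3 bytes
    dpc.store_parameter(byte)
dpc.store_callfunction(2)
filled = bytes(dpc.display[0x100:0x104])
```

Note that the fetcher pointer is left untouched by both calls, matching the statement that the fetchers still point to the beginning of their data afterwards. How the real hardware treats a count of 0 or a range that runs past the end of the bank is not covered by the post, so the model does not try to represent it.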
ScumSoft Posted May 15, 2011

Got it, I think, but I assume the maximum that can be copied at a time is 256 bytes? And that writing to PARAMETER increments a pointer for a DWORD load?

Example:

    lda #<TargetBuffer ;Bank 6
    sta DF0LOW
    lda #>TargetBuffer ;Bank 6
    sta DF0HI
    lda #<Source2Copy
    sta PARAMETER ;byte pointer increments on write?
    lda #>Source2Copy
    sta PARAMETER
    lda #0 ;DataFetcher #0
    sta PARAMETER
    lda #$FF ;256 bytes to copy max?
    sta PARAMETER
    lda #1
    sta CALLFUNCTION
ScumSoft Posted May 25, 2011

No reply on this for a while, just curious if what I posted above is how it's supposed to work, because it doesn't seem to.
+batari Posted May 27, 2011 (edited May 27, 2011)

> No reply on this for a while, just curious if what I posted above is how it's supposed to work, because it doesn't seem to.

Yes, PARAMETER increments internally with every store, and your example above seems correct, but with a possible snag.

    lda #<Source2Copy
    sta PARAMETER ;byte pointer increments on write?
    lda #>Source2Copy
    sta PARAMETER

The address from which to copy is not the literal assembled address of 2600 code but the offset into the binary file itself (less the 3K DPC+.arm). Therefore, it's unlikely that your addressing will work for both 2600 code and the DPC+ copy function.

A workaround is to purposely set RORGs for the 6 code banks as $3000, $5000, $7000, $9000, $B000, and $D000, then use some creative manipulation of the upper address byte:

    lda #<Source2Copy
    sta PARAMETER ;byte pointer increments on write?
    lda #((>Source2Copy) & $0f) | (((>(Source2Copy - $2000)) / 2) & $70)
    sta PARAMETER

In case anyone was wondering why I didn't start at $1000, that's because this can lead to a higher incidence of hazards when assembling fastfetch kernels there. For example, a JMP $xxA9 will fail under fastfetch when assembled at $1000-$1FFF but not at $3000-$FFFF.
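The upper-byte expression is easier to follow when evaluated for a few addresses. The Python sketch below mirrors the DASM expression from the post, assuming the six code banks are assembled with RORG $3000, $5000, $7000, $9000, $B000, and $D000 as suggested; the function names are hypothetical.

```python
def copy_source_bytes(rorg_address):
    """Return the (low, high) bytes to store to PARAMETER for a label assembled
    under the RORG scheme above ($3000, $5000, ... $D000 for banks 0-5)."""
    hi = (rorg_address >> 8) & 0xFF                      # >Source2Copy
    hi_shifted = ((rorg_address - 0x2000) >> 8) & 0xFF   # >(Source2Copy - $2000)
    high = (hi & 0x0F) | ((hi_shifted // 2) & 0x70)      # offset within bank | bank number
    return rorg_address & 0xFF, high


def file_offset(rorg_address):
    """Offset into the .bin (excluding DPC+.arm) that the two bytes encode."""
    low, high = copy_source_bytes(rorg_address)
    return (high << 8) | low


# Each bank's RORG base should map to bank_number * $1000 in the file.
for bank, rorg in enumerate(range(0x3000, 0xE000, 0x2000)):
    print(f"bank {bank}: ${rorg:04X} -> ${file_offset(rorg):04X}")
```

The low nibble of the high byte keeps the position inside the 4K bank, while subtracting $2000, halving, and masking with $70 turns the RORG base into the bank number. For example, a label at $5123 in bank 1 becomes file offset $1123.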
ScumSoft Posted May 28, 2011

Thanks for the info, I'll update my other project with the changes.
+stephena Posted May 29, 2011

Stella 3.4 has been released, which includes support for running ARM code using this bankswitching scheme. Most of the people here were probably already running a pre-release that had that ability, but there are a few differences.

First, some segfaults were fixed, so Stella doesn't suddenly crash anymore for no reason.

Second, fatal errors in the ARM code are dealt with more cleanly. They're caught by the Stella debugger, with the error and current registers shown in a window, and you can choose to continue emulating the ROM or exit it. This is also much better than simply having Stella crash on a fatal error.

There's a thread created for Stella 3.4 in the Emulation forums. Please direct all bug reports there.
ScumSoft Posted July 13, 2011

I'm back to working on my DPC+ project, and I'm trying to put together a macro with this info. However, it tends to copy the address of the source data rather than the source data itself. At least it is copying something to the correct destination address, so that part works.

    MAC COPY2RAM
.source SET {1}
.dest SET {2}
.amount SET {3}
.mode SET {4}
    LDA #<.dest
    STA DF0LOW
    LDA #>.dest
    STA DF0HI
    lda #<.source
    sta PARAMETER ;byte pointer increments on write
    lda #((>.source) & $0f) | (((>(.source - $2000)) / 2) & $70)
    sta PARAMETER
    lda #0 ;Using DataFetcher #0
    sta PARAMETER
    lda #.amount ;256 bytes copy max
    sta PARAMETER
    lda .mode
    sta CALLFUNCTION
    ENDM

I think I'm missing something obvious in the way I'm passing the address values.
ScumSoft Posted July 17, 2011

Ok, I seem to have figured it out.

    COPY2RAM #<source,destination,amount,mode ;Have to pass low pointer of source data

    MAC COPY2RAM
.source SET {1}
.destination SET {2}
.amount SET {3}
.mode SET {4}
    LDA #<.destination
    STA DF0LOW
    LDA #(>.destination) & $0F
    STA DF0HI
    lda #<.source
    sta PARAMETER ;byte pointer increments on write
    lda #((>.source) & $0f) | (((>(.source)) / 2) & $70)
    sta PARAMETER
    lda #0 ;Using DataFetcher #0
    sta PARAMETER
    lda #.amount ;256 bytes copy max
    sta PARAMETER
    lda .mode
    sta CALLFUNCTION
    ENDM
mos6507 Posted March 10, 2013

> The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the IO is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.
>
> Chris

One of the big things that the Chimera Cartridge was going to have, which I don't think Harmony has, is a separate custom MMU of sorts that would separate the ARM from the VCS. A big part of the constant tweaking on Delicon's part was getting that separation to work right. There was fast RAM, which represented the ARM's internal RAM, and then there was external static RAM that was shared between the ARM and the VCS. This was going to allow the ARM to operate in true parallel fashion rather than having to simulate the real-time memory accesses of the 2600.

The Harmony approach reflects Delicon's original idea of having the ARM watch the bus all the time, and it's great that it's possible, as it keeps things simple and cheap, but it incurs penalties.