SpiceWare

Harmony DPC+ programming


I read somewhere around here that while ARM code is being run, it sends NOPs to the 6507. But this would still finish in a reasonable number of cycles, correct, seeing as the ARM runs at around 70 MHz?

 

The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the I/O is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.

 

Chris


Check out my blog, where I recently posted the beginnings of my next project, a version of Berzerk with 64K room layouts like at the Arcade. I'll be using the ARM to run the game logic. The initial build updates the maze walls, which are stored in bank 6 (though I always refer to it as Display Data since you don't get to "bank it in" like the other 4K banks).

 

While 6507 code is running, the ARM is extremely busy feeding it data, namely the ROM that's visible in the 4K window. There's barely enough time for the DPC+ support routines to run (especially the 3-voice music generation).

 

When custom ARM code is running you have a choice of feeding the 6507 NOPs, or NOPs and LDA AMPLITUDE/STA AUDV0. This is controlled by storing $FF (NOP only) or $FE into CALLFUNCTION. The $FE function lets your game play 3-voice music while running ARM code (3-voice music must update AUDV0 on every single scanline), but there's a performance penalty for the ARM code when using that mode (I don't recall how much, maybe 10%; batari could most likely tell us).
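
A minimal sketch of the two choices described above, using the CALLFUNCTION register name from this thread. It assumes CALLFUNCTION is defined by the DPC+ header and that custom ARM code is already part of the binary. The two stores are alternatives, not a sequence.

```asm
        ; Option 1: run custom ARM code while the 6507 is fed NOPs only
        lda #$FF
        sta CALLFUNCTION

        ; Option 2 (use instead of option 1): run custom ARM code while the
        ; 6507 is also fed LDA #amplitude / STA AUDV0, so 3-voice music
        ; keeps playing, at some cost in ARM speed
        lda #$FE
        sta CALLFUNCTION
```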


I don't know exactly, but 10%-15% sounds reasonable. When you store $FE, the ARM sets up an interrupt service routine to interrupt your C code about every 76 6507 cycles. The interrupt service routine must wait up to two 6507 cycles for the 6507's PC to stabilize, then it must deliver 5 full cycles of code (LDA #xx/STA AUDV0) then wait for the PC to stabilize again and throw another NOP on the bus. Best case is probably a hair under 10% but worst case could be approaching 15%.
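
The 10%-15% range can be sanity-checked from the cycle counts. The sketch below shows the instructions the 6507 is fed around each interrupt, followed by one reading of the arithmetic. It assumes the trailing NOP counts toward the overhead and that each wait for the PC to stabilize costs between 0 and 2 cycles; the operand value is a placeholder.

```asm
AUDV0   = $19           ; TIA volume register for audio channel 0

        ; Fed to the 6507 roughly once per scanline in $FE mode
        lda #$00        ; 2 cycles (placeholder operand; the ARM supplies the amplitude)
        sta AUDV0       ; 3 cycles
        nop             ; 2 cycles

        ; Rough overhead out of every 76 cycles, under the assumptions above:
        ;   best case : 5 + 2         =  7 cycles ->  7/76 is about  9.2%
        ;   worst case: 2 + 5 + 2 + 2 = 11 cycles -> 11/76 is about 14.5%
```

Under that reading, the numbers line up with "a hair under 10%" and "approaching 15%".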


Would it be possible to make the DataFetchers access more than just Bank 6?

Probably not. Bank 6 is stored in internal SRAM for speed and the rest is in flash which is too slow. There are copy and fill functions to copy data from the binary to the display bank if you don't have quite enough space in bank 6.


Guess they didn't get added to the header.

 

Copying data to a fetcher requires that four bytes be written to PARAMETER: the low byte of the data's address, the high byte of the data's address, the fetcher number to write the data to, and the number of bytes to copy, in that order. Then store 1 to CALLFUNCTION to perform the copy.

 

The address of the data is relative to the beginning of the .bin file, not including DPC+.arm. What I mean by this is that bank 0 starts at $0000, bank 1 starts at $1000, and so on, including the display data bank (bank 6), which maintains a ROM mirror of the original values at $6000-$6FFF if you ever need it.

 

Filling data in a fetcher requires three values, but the first is written twice, so four bytes go to PARAMETER in total. Store the value to fill (twice), the fetcher number to write the data to, and the number of bytes to fill, in that order. Then store 2 to CALLFUNCTION to perform the fill.

 

The fetchers need to be set up prior to the copy/fill. The fetchers are not incremented during the copy/fill, and will still point to the beginning of their data when the function is complete.

 

If you ever need to reset PARAMETER for whatever reason, storing 0 in CALLFUNCTION will do that. This shouldn't normally be necessary.

 

If you need code examples, let me know.
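
A minimal sketch of the fill call, following the byte order described above. DF0LOW, DF0HI, PARAMETER and CALLFUNCTION are the register names used in this thread; FillTarget, FILL_VALUE and FILL_COUNT are hypothetical names. Masking the high byte with $0F mirrors the working macro posted later in the thread and is an assumption, not something confirmed here.

```asm
FILL_VALUE = $00        ; hypothetical: byte to write
FILL_COUNT = 32         ; hypothetical: number of bytes to fill

        ; Point data fetcher 0 at the destination in Display Data.
        ; FillTarget is a hypothetical label inside bank 6.
        lda #<FillTarget
        sta DF0LOW
        lda #(>FillTarget) & $0F
        sta DF0HI

        lda #FILL_VALUE
        sta PARAMETER           ; fill value
        sta PARAMETER           ; fill value again (it is written twice)
        lda #0
        sta PARAMETER           ; fetcher number
        lda #FILL_COUNT
        sta PARAMETER           ; number of bytes to fill
        lda #2
        sta CALLFUNCTION        ; 2 = fill
```

Per the note above, the fetcher still points at the start of FillTarget afterwards, so it can be read back without being set up again.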


Got it I think, but I assume the maximum that can be copied at a time is 256 bytes? and that writing to PARAMETER increments a pointer for a DWORD load?

 

Example

lda #<TargetBuffer ;Bank 6
sta DF0LOW
lda #>TargetBuffer ;Bank 6
sta DF0HI

lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #>Source2Copy
sta PARAMETER 
lda #0 ;DataFetcher #0
sta PARAMETER 
lda #$FF ;256 bytes to copy max?
sta PARAMETER 
lda #1
sta CALLFUNCTION


No reply on this for a while, just curious if what I posted above is how it's supposed to work because it doesn't seem to.


Yes, PARAMETER increments internally with every store, and your example above seems correct, but with a possible snag.

lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #>Source2Copy
sta PARAMETER

The address from which to copy is not the literal assembled address of 2600 code but the offset into the binary file itself (less the 3k DPC+.arm). Therefore, it's unlikely that your addressing will work for both 2600 code and the DPC+ copy function.

 

A workaround is to purposely set RORGs for the 6 code banks as $3000, $5000, $7000, $9000, $B000, and $D000 then use some creative manipulation of the upper address byte:

lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #((>Source2Copy) & $0f) | (((>(Source2Copy - $2000)) / 2) & $70)
sta PARAMETER

In case anyone was wondering why I didn't start at $1000, that's because this can lead to a higher incidence of hazards when assembling fastfetch kernels here - for example, a JMP $xxA9 will fail under fastfetch when assembled at $1000-$1FFF but not at $3000-$FFFF.

Edited by batari
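
To see what the upper-byte expression above does, here it is evaluated by hand for three hypothetical labels, each $123 bytes into its bank. The label names and addresses are made up for illustration, and the comments assume DASM's `>` yields the high byte and `/` is integer division.

```asm
Data0   = $3123         ; hypothetical label in bank 0 (RORG $3000)
Data1   = $5123         ; hypothetical label in bank 1 (RORG $5000)
Data5   = $D123         ; hypothetical label in bank 5 (RORG $D000)

        ; $01 | $00 = $01, so the copy address is file offset $0123
        lda #((>Data0) & $0f) | (((>(Data0 - $2000)) / 2) & $70)

        ; $01 | $10 = $11, so the copy address is file offset $1123
        lda #((>Data1) & $0f) | (((>(Data1 - $2000)) / 2) & $70)

        ; $01 | $50 = $51, so the copy address is file offset $5123
        lda #((>Data5) & $0f) | (((>(Data5 - $2000)) / 2) & $70)
```

The low nibble keeps the offset within the 4K bank, and the shifted high nibble turns the RORG ($3000, $5000, ...) into the bank number (0, 1, ...), which matches the "bank n starts at n * $1000" layout of the binary described earlier.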


Stella 3.4 has been released, which includes support for running ARM code using this bankswitching scheme. Most of the people here were probably already running a pre-release that had that ability, but there are a few differences. First, some segfaults were fixed, so Stella doesn't suddenly crash anymore for no reason. Second, fatal errors in the ARM code are dealt with more cleanly. They're caught by the Stella debugger, with the error and current registers shown in a window. And you can choose to continue emulating the ROM or exit it. This is also much better than simply having Stella crash on a fatal error.

 

There's a thread created for Stella 3.4 in the Emulation forums. Please direct all bug reports there.


I'm back to working on my DPC+ project, and I'm trying to put together a macro with this info, however it tends to copy the address of the source data rather than the source data itself. But it at least is copying something to the correct destination address, so that works.

 

           MAC COPY2RAM
.source      SET {1}
.dest        SET {2}
.amount      SET {3}
.mode        SET {4}

           LDA #<.dest
           STA DF0LOW
           LDA #>.dest
           STA DF0HI
           
           lda #<.source
           sta PARAMETER ;byte pointer increments on write
           lda #((>.source) & $0f) | (((>(.source - $2000)) / 2) & $70)
           sta PARAMETER
           lda #0        ;Using DataFetcher #0
           sta PARAMETER
           lda #.amount  ;256 bytes copy max
           sta PARAMETER
           lda .mode
           sta CALLFUNCTION
           ENDM

 

I think I'm missing something obvious in the way I'm passing the address values.


Ok, I seem to have figured it out.

 

COPY2RAM #<source,destination,amount,mode ;Have to pass low pointer of source data

 

           MAC COPY2RAM
.source      SET {1}
.destination SET {2}
.amount      SET {3}
.mode        SET {4}

           LDA #<.destination
           STA DF0LOW
           LDA #(>.destination) & $0F
           STA DF0HI

           lda #<.source
           sta PARAMETER ;byte pointer increments on write
           lda #((>.source) & $0f) | (((>(.source)) / 2) & $70)
           sta PARAMETER
           lda #0        ;Using DataFetcher #0
           sta PARAMETER
           lda #.amount  ;256 bytes copy max
           sta PARAMETER
           lda .mode
           sta CALLFUNCTION
           ENDM


The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the I/O is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.

 

Chris

 

One of the big things that the Chimera Cartridge was going to have, which I don't think Harmony has, is a separate custom MMU of sorts that would separate the ARM from the VCS. A big part of the constant tweaking on Delicon's part was getting that separation to work right. There was fast RAM, which represented the ARM's internal RAM, and then there was external static RAM that was shared between the ARM and the VCS. This was going to allow the ARM to operate in true parallel fashion rather than having to simulate the real-time memory accesses of the 2600. The Harmony approach reflects Delicon's original idea of having the ARM watch the bus all the time, and it's great that it's possible, as it keeps things simple and cheap, but it incurs penalties.

