Harmony DPC+ programming


89 replies to this topic

#76 cd-w

Posted Wed May 4, 2011 2:11 AM

I read somewhere around here that while ARM code is being run, it sends NOPs to the 6507. But this would still finish in a reasonable number of cycles, correct, seeing as the ARM runs at around 70 MHz?


The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the I/O is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.
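
For a rough sense of scale, here is a back-of-the-envelope sketch of the per-cycle budget described above. The 70 MHz and 10 MHz figures come from the post (the second was given as approximate); the NTSC 6507 clock of about 1.19 MHz is an assumption added here and is not stated in the thread.

```python
# Rough per-cycle budget for the ARM while it services the 6507.
ARM_HZ = 70_000_000       # from the post above
IO_HZ = 10_000_000        # "around 10 MHz, I think" in the post above
CPU_6507_HZ = 1_193_182   # assumption: NTSC 2600 clock (3.579545 MHz / 3)

arm_cycles_per_6507_cycle = ARM_HZ / CPU_6507_HZ
io_accesses_per_6507_cycle = IO_HZ / CPU_6507_HZ

print(f"{arm_cycles_per_6507_cycle:.1f} ARM cycles per 6507 cycle")
print(f"{io_accesses_per_6507_cycle:.1f} I/O accesses per 6507 cycle")
```

Under these assumptions the ARM has fewer than 60 of its own cycles, and only a handful of I/O accesses, in which to decode each address and answer it.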

Chris

#77 SpiceWare (Topic Starter)

Posted Wed May 4, 2011 9:24 AM

Check out my blog, where I recently posted the beginnings of my next project: a version of Berzerk with 64K room layouts, like the arcade original. I'll be using the ARM to run the game logic. The initial build updates the maze walls, which are stored in bank 6 (though I always refer to it as Display Data, since you don't get to "bank it in" like the other 4K banks).

While 6507 code is running, the ARM is extremely busy feeding it data, namely the ROM that's visible in the 4K window. There's barely enough time for the DPC+ support routines to run (especially the 3-voice music generation).

When custom ARM code is running, you have a choice of feeding the 6507 NOPs, or NOPs and LDA AMPLITUDE/STA AUDV0. This is controlled by storing $FF (NOP only) or $FE into CALLFUNCTION. The $FE function lets your game play 3-voice music while running ARM code (3-voice music must update AUDV0 on every single scanline), but the ARM code pays a performance penalty in that mode (I don't recall how much, maybe 10%; batari could most likely tell us).

#78 batari

Posted Wed May 4, 2011 11:32 PM

When custom ARM code is running, you have a choice of feeding the 6507 NOPs, or NOPs and LDA AMPLITUDE/STA AUDV0. This is controlled by storing $FF (NOP only) or $FE into CALLFUNCTION. The $FE function lets your game play 3-voice music while running ARM code (3-voice music must update AUDV0 on every single scanline), but the ARM code pays a performance penalty in that mode (I don't recall how much, maybe 10%; batari could most likely tell us).

I don't know exactly, but 10%-15% sounds reasonable. When you store $FE, the ARM sets up an interrupt service routine to interrupt your C code about every 76 6507 cycles. The interrupt service routine must wait up to two 6507 cycles for the 6507's PC to stabilize, then it must deliver 5 full cycles of code (LDA #xx/STA AUDV0) then wait for the PC to stabilize again and throw another NOP on the bus. Best case is probably a hair under 10% but worst case could be approaching 15%.
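
The estimate above can be checked with a little arithmetic. This is only a sketch: it counts 6507 cycles during which the ARM is tied up by the interrupt, and it assumes the trailing NOP occupies a full two-cycle NOP and that the best case has no stabilization wait at all. It ignores interrupt entry and exit time on the ARM side, which the post does not quantify.

```python
CYCLES_PER_INTERRUPT = 76   # from the post: about one interrupt per 76 6507 cycles
AUDIO_UPDATE_CYCLES = 5     # LDA #xx (2 cycles) + STA AUDV0 (3 cycles)
NOP_CYCLES = 2              # assumption: the trailing NOP costs a full 2 cycles
MAX_STABILIZE_WAIT = 2      # "up to two 6507 cycles" per wait, from the post

def overhead(wait_before, wait_after):
    """Fraction of 6507 cycles the interrupt routine keeps the ARM busy."""
    busy = wait_before + AUDIO_UPDATE_CYCLES + wait_after + NOP_CYCLES
    return busy / CYCLES_PER_INTERRUPT

best = overhead(0, 0)
worst = overhead(MAX_STABILIZE_WAIT, MAX_STABILIZE_WAIT)
print(f"best {best:.1%}, worst {worst:.1%}")   # best 9.2%, worst 14.5%
```

Both figures line up with "a hair under 10%" and "approaching 15%".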

#79 ScumSoft

Posted Fri May 13, 2011 5:39 PM

Would it be possible to make the DataFetchers access more than just Bank 6?

#80 batari

Posted Sat May 14, 2011 1:24 PM

Would it be possible to make the DataFetchers access more than just Bank 6?

Probably not. Bank 6 is stored in internal SRAM for speed and the rest is in flash which is too slow. There are copy and fill functions to copy data from the binary to the display bank if you don't have quite enough space in bank 6.

#81 ScumSoft

Posted Sat May 14, 2011 4:49 PM

I see, alright then. About the copy and fill functions, I didn't see those defined in the header.

#82 batari

Posted Sun May 15, 2011 1:19 AM

Guess they didn't get added to the header.

The copy-data-to-fetcher function requires four bytes to be written to PARAMETER: the low byte of the data's address, the high byte of the data's address, the number of the fetcher to write the data to, and the number of bytes to copy, in that order. Then store 1 to CALLFUNCTION to perform the copy.

The address of the data is relative to the beginning of the .bin file, not including DPC+.arm. What I mean by this is bank 0 starts at $0000, bank 1 starts at $1000, and so on including the display data bank (bank 6) which maintains a ROM mirror of the original values at $6000-$6FFF if you ever need it.
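
As a small illustration of that addressing, the sketch below turns a bank number and an address inside that bank into the offset the copy function expects, then splits it into the two bytes written to PARAMETER. The function names are hypothetical and used only for illustration.

```python
BANK_SIZE = 0x1000    # each bank is 4K
DISPLAY_BANK = 6      # display data bank, mirrored in ROM at $6000-$6FFF

def bin_offset(bank, addr):
    """Offset of `addr` in `bank`, relative to the .bin start (DPC+.arm excluded)."""
    if not 0 <= bank <= DISPLAY_BANK:
        raise ValueError("bank must be 0..6")
    return bank * BANK_SIZE + (addr & (BANK_SIZE - 1))

def parameter_bytes(offset):
    """Low byte, then high byte, in the order they are stored to PARAMETER."""
    return offset & 0xFF, offset >> 8

offset = bin_offset(1, 0xF123)            # data assembled at $F123 in bank 1
print(f"${offset:04X}", [f"${b:02X}" for b in parameter_bytes(offset)])
```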

The fill function takes three values but four writes to PARAMETER, because the first value is written twice. Store the value to fill (twice), the number of the fetcher to write the data to, and the number of bytes to fill, in that order. Then store 2 to CALLFUNCTION to perform the fill.

The fetchers need to be set up prior to the copy/fill. The fetchers are not incremented during the copy/fill, and will still point to the beginning of their data when the function is complete.

If you ever need to reset PARAMETER for whatever reason, storing 0 in CALLFUNCTION will do that. This shouldn't normally be necessary.
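
To make the sequence concrete, here is a toy Python model of the protocol as described in this post. It is not the DPC+ firmware: the class and method names are hypothetical, the count of eight fetchers is an assumption, the byte count is taken literally, and rewinding PARAMETER after every call is an assumption.

```python
class DpcPlusModel:
    """Toy model of the PARAMETER/CALLFUNCTION protocol (not real firmware)."""

    def __init__(self, rom_image):
        self.rom = bytes(rom_image)        # .bin contents without DPC+.arm
        self.display = bytearray(0x1000)   # display data bank (bank 6)
        self.fetcher = [0] * 8             # assumption: 8 fetcher pointers
        self.params = []                   # bytes stored to PARAMETER so far

    def write_parameter(self, value):
        self.params.append(value & 0xFF)   # internal pointer advances per store

    def call_function(self, code):
        if code == 1:                      # copy: addr lo, addr hi, fetcher, count
            lo, hi, df, count = self.params[:4]
            src, dst = lo | (hi << 8), self.fetcher[df]
            for i in range(count):
                self.display[dst + i] = self.rom[src + i]
        elif code == 2:                    # fill: value, value, fetcher, count
            value, _, df, count = self.params[:4]
            dst = self.fetcher[df]
            for i in range(count):
                self.display[dst + i] = value
        elif code != 0:                    # 0 only resets PARAMETER
            raise ValueError("function not modelled")
        self.params.clear()                # assumption: rewound after every call


rom = bytes(range(256)) * 16               # dummy 4K standing in for bank 0
cart = DpcPlusModel(rom)
cart.fetcher[0] = 0x0100                   # like setting DF0LOW/DF0HI first
for byte in (0x10, 0x00, 0, 4):            # source offset $0010, fetcher 0, 4 bytes
    cart.write_parameter(byte)
cart.call_function(1)
print(cart.display[0x0100:0x0104].hex())   # 10111213
```

Note that `cart.fetcher[0]` is unchanged after the call, matching the statement that the fetchers are not incremented by the copy or fill.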

If you need code examples, let me know.

#83 ScumSoft

Posted Sun May 15, 2011 1:48 AM

Got it, I think, but I assume the maximum that can be copied at a time is 256 bytes? And that writing to PARAMETER increments a pointer for a DWORD load?

Example
lda #<TargetBuffer ;Bank 6
sta DF0LOW
lda #>TargetBuffer ;Bank 6
sta DF0HI

lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #>Source2Copy
sta PARAMETER 
lda #0 ;DataFetcher #0
sta PARAMETER 
lda #$FF ;256 bytes to copy max?
sta PARAMETER 
lda #1
sta CALLFUNCTION


#84 ScumSoft

Posted Wed May 25, 2011 4:01 PM

No reply on this for a while, just curious if what I posted above is how it's supposed to work because it doesn't seem to.

#85 batari

Posted Fri May 27, 2011 2:30 PM

No reply on this for a while, just curious if what I posted above is how it's supposed to work because it doesn't seem to.

Yes, PARAMETER increments internally with every store, and your example above seems correct, but with a possible snag.
lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #>Source2Copy
sta PARAMETER
The address from which to copy is not the literal assembled address of 2600 code but the offset into the binary file itself (less the 3k DPC+.arm). Therefore, it's unlikely that your addressing will work for both 2600 code and the DPC+ copy function.

A workaround is to purposely set RORGs for the 6 code banks as $3000, $5000, $7000, $9000, $B000, and $D000 then use some creative manipulation of the upper address byte:
lda #<Source2Copy
sta PARAMETER ;byte pointer increments on write?
lda #((>Source2Copy) & $0f) | (((>(Source2Copy - $2000)) / 2) & $70)
sta PARAMETER
In case anyone was wondering why I didn't start at $1000, that's because this can lead to a higher incidence of hazards when assembling fastfetch kernels here - for example, a JMP $xxA9 will fail under fastfetch when assembled at $1000-$1FFF but not at $3000-$FFFF.
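
The upper-byte expression can be sanity-checked outside the assembler. The sketch below is a Python rendering of the DASM expression (taking `>` as the high byte and `/` as integer division) and confirms that, for the six RORGs listed, it yields the high byte of the offset into the .bin file. The function name is hypothetical.

```python
RORGS = [0x3000, 0x5000, 0x7000, 0x9000, 0xB000, 0xD000]  # banks 0-5, from the post

def copy_source_high(addr):
    """Python rendering of ((>addr) & $0f) | (((>(addr - $2000)) / 2) & $70)."""
    hi = (addr >> 8) & 0xFF
    shifted_hi = ((addr - 0x2000) >> 8) & 0xFF
    return (hi & 0x0F) | ((shifted_hi // 2) & 0x70)

for bank, rorg in enumerate(RORGS):
    addr = rorg + 0x0ABC                      # arbitrary label inside the bank
    offset = (copy_source_high(addr) << 8) | (addr & 0xFF)
    assert offset == bank * 0x1000 + 0x0ABC   # bank N starts at N * $1000
    print(f"bank {bank}: ${addr:04X} -> ${offset:04X}")
```

The check depends on the RORG layout: with the banks assembled at other addresses, the `- $2000` adjustment would need to change accordingly.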

Edited by batari, Fri May 27, 2011 2:32 PM.


#86 ScumSoft

Posted Fri May 27, 2011 10:45 PM

Thanks for the info, I'll update my other project with the changes.

#87 stephena

Posted Sun May 29, 2011 6:00 AM

Stella 3.4 has been released, which includes support for running ARM code using this bankswitching scheme. Most of the people here were probably already running a pre-release that had that ability, but there are a few differences. First, some segfaults were fixed, so Stella doesn't suddenly crash anymore for no reason. Second, fatal errors in the ARM code are dealt with more cleanly. They're caught by the Stella debugger, with the error and current registers shown in a window. And you can choose to continue emulating the ROM or exit it. This is also much better than simply having Stella crash on a fatal error.

There's a thread created for Stella 3.4 in the Emulation forums. Please direct all bug reports there.

#88 ScumSoft

Posted Wed Jul 13, 2011 7:03 AM

I'm back to working on my DPC+ project, and I'm trying to put together a macro with this info. However, it tends to copy the address of the source data rather than the source data itself. At least it is copying something to the correct destination address, so that part works.

            MAC COPY2RAM
.source      SET {1}
.dest        SET {2}
.amount      SET {3}
.mode        SET {4}

            LDA #<.dest
            STA DF0LOW
            LDA #>.dest
            STA DF0HI
            
            lda #<.source
            sta PARAMETER ;byte pointer increments on write
            lda #((>.source) & $0f) | (((>(.source - $2000)) / 2) & $70)
            sta PARAMETER
            lda #0        ;Using DataFetcher #0
            sta PARAMETER
            lda #.amount  ;256 bytes copy max
            sta PARAMETER
            lda .mode
            sta CALLFUNCTION
            ENDM

I think I'm missing something obvious in the way I'm passing the address values.

#89 ScumSoft

Posted Sun Jul 17, 2011 9:54 AM

Ok, I seem to have figured it out.

COPY2RAM #<source,destination,amount,mode ;Have to pass low pointer of source data

            MAC COPY2RAM
.source      SET {1}
.destination SET {2}
.amount      SET {3}
.mode        SET {4}

            LDA #<.destination
            STA DF0LOW
            LDA #(>.destination) & $0F
            STA DF0HI

            lda #<.source
            sta PARAMETER ;byte pointer increments on write
            lda #((>.source) & $0f) | (((>(.source)) / 2) & $70)
            sta PARAMETER
            lda #0        ;Using DataFetcher #0
            sta PARAMETER
            lda #.amount  ;256 bytes copy max
            sta PARAMETER
            lda .mode
            sta CALLFUNCTION
            ENDM


#90 mos6507

Posted Sat Mar 9, 2013 11:38 PM

The ARM can do other things, but not while it is servicing the 6507. The ARM CPU runs at 70 MHz, but the I/O is much slower (around 10 MHz, I think). Also, for each 6507 cycle, the ARM must do a lot of work to decode the address, fetch the data, and place it on the bus in time to be read, so the 70 MHz doesn't go as far as you might think.

Chris


One of the big things the Chimera cartridge was going to have, which I don't think Harmony has, is a separate custom MMU of sorts that would separate the ARM from the VCS. A big part of the constant tweaking on Delicon's part was getting that separation to work right. There was fast RAM, which represented the ARM's internal RAM, and then there was external static RAM that was shared between the ARM and the VCS. This was going to allow the ARM to operate in true parallel fashion rather than having to simulate the real-time memory accesses of the 2600. The Harmony approach reflects Delicon's original idea of having the ARM watch the bus all the time, and it's great that it's possible, as it keeps things simple and cheap, but it incurs penalties.



