MIPS

9900 Z80A 6502

33 replies to this topic

#26 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 2,943 posts
  • Location:Denmark

Posted Wed Mar 9, 2016 10:49 PM

 This makes an average of 7.8125 cycles per byte.

 

I think we have a winner.  :) And it's not just a theoretical exercise - this code can be used when drawing vector graphics to a CPU RAM buffer before sending it to the VDP. Each frame you have to clear the buffer.



#27 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 2,943 posts
  • Location:Denmark

Posted Wed Mar 9, 2016 11:10 PM

Since STST doesn't do a read before write, would clearing VDP RAM using STST overrun the VDP?



#28 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • Topic Starter
  • 4,169 posts
  • Location:Denmark

Posted Thu Mar 10, 2016 12:39 AM

Since STST doesn't do a read before write, would clearing CPU RAM using STST overrun the VDP?

 

Wow, does it affect the VDP at all? Or do you mean clearing VDP RAM using STST?



#29 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • Topic Starter
  • 4,169 posts
  • Location:Denmark

Posted Thu Mar 10, 2016 1:01 AM

Too bad everyone is in such a hurry to add layers and get as far away from the CPU as possible these days.

 

If you mean that apparently only a few will know what's going on at lower levels, then yes.

 

Otherwise I think it's only a natural productive process to build layers (to abstract) and to build upon what others have built.

 

Even using an assembler is a step away from entering hexadecimal values by hand.

 

There's the danger of us ending up not knowing what the machines actually do with all the layers, and eventually the machines can plot to take control. Well, they will probably fight each other. Someday the 9900 will rule. Behind the innocent MIPS was a sleeping tiger.

 

;)



#30 Tursi OFFLINE  

Tursi

    Quadrunner

  • 5,349 posts
  • HarmlessLion
  • Location:BUR

Posted Thu Mar 10, 2016 3:52 AM

Since STST doesn't do a read before write, would clearing CPU RAM using STST overrun the VDP?


(Clearing VDP RAM) Absolutely and by a long shot. You need 8 microseconds between writes, that's 24 cycles. Our assertions that you can't overrun the VDP were based on moving data. STST is only 8 cycles, targeting the VDP would make it 12. Even CLR at 10 cycles is too fast - targeting the VDP makes it only 18.

Of course you can use it while the VDP is in blank - either vertical blank or display blanking enabled.

(edit: the fastest data move, assuming you could change a register dynamically, MOVB R1,*R2 is 14 for the opcode, 4 for the indirect, and 8 for the wait states, making 26 cycles. More commonly it'd be MOVB *R1+,*R2, which adds another 8 cycles for the indirect auto-increment and makes 34 cycles. But you could use the former for a direct-to-VDP clear, if there was a reason that was useful (or you had custom hardware, which was something I wanted to try ;) )

Edited by Tursi, Thu Mar 10, 2016 3:57 AM.
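
The cycle arithmetic in the post above can be checked with a short sketch. Every cycle count here is the one quoted in the post, not an independent measurement, and the 3 MHz clock is an assumption implied by "8 microseconds ... that's 24 cycles". The names `min_gap_cycles` and `overruns_vdp` are made up for illustration.

```python
# Sketch only: cycle counts are copied from the post above.
# Assumption: 3 MHz CPU clock, implied by "8 microseconds = 24 cycles".
CLOCK_HZ = 3_000_000
VDP_GAP_US = 8  # minimum time between VDP writes, per the post


def min_gap_cycles(clock_hz: int = CLOCK_HZ, gap_us: int = VDP_GAP_US) -> int:
    """CPU cycles that must elapse between two consecutive VDP writes."""
    return clock_hz * gap_us // 1_000_000


# Instruction cost when the destination is the VDP: base cycles plus the
# extras quoted in the post (wait states, addressing modes).
CYCLES_TO_VDP = {
    "STST": 8 + 4,
    "CLR": 10 + 8,
    "MOVB R1,*R2": 14 + 4 + 8,
    "MOVB *R1+,*R2": 14 + 4 + 8 + 8,
}


def overruns_vdp(cycles: int) -> bool:
    """True if back-to-back writes at this cost are faster than the VDP allows."""
    return cycles < min_gap_cycles()


if __name__ == "__main__":
    for name, cycles in CYCLES_TO_VDP.items():
        verdict = "too fast" if overruns_vdp(cycles) else "safe"
        print(f"{name:15s} {cycles:2d} cycles  {verdict}")
```

Under these figures STST and CLR fall below the 24-cycle minimum, while both MOVB forms stay above it. This matches the claim that moving data cannot overrun the VDP but a bare clear can.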


#31 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • Topic Starter
  • 4,169 posts
  • Location:Denmark

Posted Thu Mar 10, 2016 3:58 AM

(Clearing VDP RAM) Absolutely and by a long shot. You need 8 microseconds between writes, that's 24 cycles. Our assertions that you can't overrun the VDP were based on moving data. STST is only 8 cycles, targeting the VDP would make it 12. Even CLR at 10 cycles is too fast - targeting the VDP makes it only 18.

Of course you can use it while the VDP is in blank - either vertical blank or display blanking enabled.

 

Excellent. Great news. So, until proven otherwise, we still, after so many years, find new ways to improve performance.

 

:thumbsup:



#32 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 170 posts
  • Location:Beaverton, OR

Posted Fri Nov 9, 2018 2:26 PM

(Clearing VDP RAM) Absolutely and by a long shot. You need 8 microseconds between writes, that's 24 cycles. Our assertions that you can't overrun the VDP were based on moving data. STST is only 8 cycles, targeting the VDP would make it 12. Even CLR at 10 cycles is too fast - targeting the VDP makes it only 18.

Of course you can use it while the VDP is in blank - either vertical blank or display blanking enabled.

(edit: the fastest data move, assuming you could change a register dynamically, MOVB R1,*R2 is 14 for the opcode, 4 for the indirect, and 8 for the wait states, making 26 cycles. More commonly it'd be MOVB *R1+,*R2, which adds another 8 cycles for the indirect auto-increment and makes 34 cycles. But you could use the former for a direct-to-VDP clear, if there was a reason that was useful (or you had custom hardware, which was something I wanted to try ;) )

 

I've found you can write bytes to the VDP at 24 cycles per byte with this method:

LWPI >8C00   ; Set the workspace pointer so that R0 is at VDP Write Data register
LI R0,>1200  ; Write >12 to VDP
LI R0,>3400  ; Write >34 to VDP
LI R0,>5600  ; Write >56 to VDP
LI R0,>7800  ; Write >78 to VDP
; etc
LWPI >8300   ; Restore the default workspace pointer
RT           ; Return

When running from 8-bit RAM or cartridge ROM, each LI takes 24 cycles, at the expense of 4 bytes of instruction per byte written to the VDP.  By my estimates you could load about 2KB per 60Hz frame using this method, although the instructions would take up 8KB.  If performance is critical, the tradeoff might be worth it.  It might also be a good use case for SuperAMS memory.
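
The "about 2KB per frame, 8KB of instructions" estimate can be reproduced with a rough sketch. It assumes a 3 MHz clock (consistent with the "8 microseconds = 24 cycles" figure earlier in the thread) and that every cycle of the frame goes to LI instructions, which a real program would not manage. The function names are hypothetical.

```python
# Rough sketch of the estimate above.
# Assumptions: 3 MHz clock, and the entire frame is spent executing LI.
CLOCK_HZ = 3_000_000
FRAME_HZ = 60
CYCLES_PER_LI = 24  # per the post, when running from 8-bit RAM or cartridge ROM
BYTES_PER_LI = 4    # opcode word + immediate word


def bytes_per_frame(clock_hz: int = CLOCK_HZ, frame_hz: int = FRAME_HZ) -> int:
    """Upper bound on VDP bytes written in one frame, one LI per byte."""
    cycles_per_frame = clock_hz // frame_hz
    return cycles_per_frame // CYCLES_PER_LI


def code_size(data_bytes: int) -> int:
    """Bytes of instruction memory needed to write data_bytes to the VDP."""
    return data_bytes * BYTES_PER_LI


if __name__ == "__main__":
    n = bytes_per_frame()
    print(n, "bytes per frame,", code_size(n), "bytes of code")
```

With these assumptions the bound is 2083 bytes per frame and 8332 bytes of code, in line with the "about 2KB" and "8KB" figures.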



#33 RXB OFFLINE  

RXB

    River Patroller

  • 3,405 posts
  • Location:Vancouver, Washington, USA

Posted Fri Nov 9, 2018 5:29 PM

Seems we need a DSR and Device that uses RAM instead of VDP for Buffer I/O of data.

 

I thought the SCSI card still sets up a buffer, but on-board RAM is used instead?

 

So the VDP buffer is there just to make the OS happy; nothing really takes place other than the PAB being updated.

 

The buffer address in the PAB is not used as a VDP address; instead, that address in the PAB says where in RAM the data from the device should go.


Edited by RXB, Fri Nov 9, 2018 5:31 PM.


#34 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 2,943 posts
  • Location:Denmark

Posted Sat Nov 10, 2018 12:59 AM

 

I've found you can write bytes to the VDP at 24 cycles per byte with this method:

LWPI >8C00   ; Set the workspace pointer so that R0 is at VDP Write Data register
LI R0,>1200  ; Write >12 to VDP
LI R0,>3400  ; Write >34 to VDP
LI R0,>5600  ; Write >56 to VDP
LI R0,>7800  ; Write >78 to VDP
; etc
LWPI >8300   ; Restore the default workspace pointer
RT           ; Return

When running from 8-bit RAM or cartridge ROM, each LI takes 24 cycles, at the expense of 4 bytes of instruction per byte written to the VDP.  By my estimates you could load about 2KB per 60Hz frame using this method, although the instructions would take up 8KB.  If performance is critical, the tradeoff might be worth it.  It might also be a good use case for SuperAMS memory.

 

Cool. It could be useful if you want to, for instance, upload sprite patterns to the VDP on the fly.

 

It reminded me of this video where the 3D frames are stored as code instead of data in order to squeeze the maximum speed out of the MSX.
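
Storing data as code like this usually means generating the instruction stream with a build-time tool rather than typing it by hand. As a minimal sketch, assuming the LWPI/LI layout from the listing quoted above, a script could turn raw pattern bytes into assembler source. The function name `li_sequence` and its parameters are hypothetical, and the output is plain text that has not been run through any particular assembler.

```python
def li_sequence(data: bytes, vdp_wp: int = 0x8C00, restore_wp: int = 0x8300) -> list[str]:
    """Return assembler source lines that write `data` to the VDP, one LI per byte.

    Follows the listing in the thread: point the workspace at the VDP write
    data port so that R0 maps onto it, load each byte into the high byte of
    R0, then restore the workspace pointer and return.
    """
    lines = [f"LWPI >{vdp_wp:04X}"]
    for b in data:
        lines.append(f"LI R0,>{b:02X}00")
    lines.append(f"LWPI >{restore_wp:04X}")
    lines.append("RT")
    return lines


if __name__ == "__main__":
    # Hypothetical 8-byte pattern, used only to show the output shape.
    pattern = bytes([0x3C, 0x42, 0x81, 0x81, 0x81, 0x81, 0x42, 0x3C])
    print("\n".join(li_sequence(pattern)))
```

Each data byte costs one 4-byte LI in the generated source, so the output grows at four times the size of the input data.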






