
reboot delayed


SpiceWare


Last time around, cd-w made a comment that led me to do a test rewrite of one of the ball reposition kernels:

DS_GRP0         = DS1DATA
DS_GRP1         = DS2DATA
DS_HMP0_ENAM0   = DS3DATA
DS_HMP1_ENAM1   = DS4DATA
DS_HMMB_ENABL   = DS5DATA   ; HMxx for missile/ball, ENABL
DS_COLOR        = DS6DATA   ; COLUP0, COLUP1, COLUPF - only used during reposition
DS_SIZE         = DS7DATA   ; NUSIZ0, NUSIZ1, CTRLPF - only used during reposition
DS_JUMP         = DS8DATA

RealResblStrobe73:              ;   20
        lda #<DS_SIZE           ; 2 22
        sta CTRLPF              ; 3 25
        lda #<DS_GRP0           ; 2 27
        sta GRP0                ; 3 30 <- for next scanline, VDELP0 on
        lda #<DS_JUMP           ; 2 32
        sta NextKernel          ; 3 35
        sta HMCLR               ; 3 38 <- reset missile/ball HMOVE
        ldx DS_HMP0_ENAM0       ; 4 42
        stx HMP0                ; 3 45
        ldy DS_HMP1_ENAM1       ; 4 49
        sty HMP1                ; 3 52
        lda #<DS_HMMB_ENABL     ; 2 54
        sta HMBL                ; 3 57
        sta ENABL               ; 3 60 <- for next scanline, VDELBL on
        lda #<DS_COLOR          ; 2 62
        sta COLUPF              ; 3 65
        lda #<AMPLITUDE         ; 2 67
        sta AUDV0               ; 3 70
        sta RESBL               ; 3 73
        lda #<DS_GRP1           ; 2 75
        sta.w HMOVE             ; 4  3
        sta GRP1                ; 3  6
        stx ENAM0               ; 3  9
        sty ENAM1               ; 3 12
        jmp (NextKernel)        ; 5 17
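Here is a quick sanity check of the cycle counts above. This small C sketch is not part of the Draconian or CDF source. It adds up the per-instruction costs from the comments, starting at cycle 20 and wrapping at the 2600's 76-cycle scanline. It assumes that `jmp (NextKernel)` lands on a 3-cycle `jmp` in a jump table, which is the approach described later in this post. With that assumption, the next kernel is entered at cycle 20 again.

```c
#include <assert.h>
#include <stddef.h>

#define CYCLES_PER_LINE 76  /* 6507 cycles per 2600 scanline */

/* Cycle (0-75) at which a run of instructions ends, starting from
 * start_cycle and wrapping into the next scanline past cycle 75. */
static int end_cycle(int start_cycle, const int *cycles, size_t n)
{
    assert(start_cycle >= 0 && start_cycle < CYCLES_PER_LINE);
    int c = start_cycle;
    for (size_t i = 0; i < n; i++)
        c = (c + cycles[i]) % CYCLES_PER_LINE;
    return c;
}

/* Per-instruction costs of RealResblStrobe73, in order, plus the
 * assumed 3-cycle JMP in the jump table. */
static const int resbl73_cycles[] = {
    2, 3, 2, 3, 2, 3, 3,     /* CTRLPF, GRP0, NextKernel, HMCLR      */
    4, 3, 4, 3,              /* ldx/stx HMP0, ldy/sty HMP1           */
    2, 3, 3, 2, 3, 2, 3, 3,  /* HMBL, ENABL, COLUPF, AUDV0, RESBL    */
    2, 4, 3, 3, 3,           /* lda, sta.w HMOVE, GRP1, ENAM0, ENAM1 */
    5,                       /* jmp (NextKernel)                     */
    3                        /* jmp in the jump table (assumed)      */
};
#define RESBL73_LEN (sizeof resbl73_cycles / sizeof resbl73_cycles[0])
```

For example, `end_cycle(20, resbl73_cycles, 19)` gives 73 (the RESBL strobe), 21 instructions give 3 (HMOVE done), and the full list brings it back to 20.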
 


Basically, by making 3 of the datastreams hold 2 TIA updates each (the HMxx value in the upper nybble, the ENAxx value in the lower nybble), we could use a single load followed by two stores (lda/sta/sta). That squeezes in enough time to also update AUDV0 for the digitized samples used by the Voice Alerts.
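Here is a minimal C sketch of that packing, using a hypothetical helper `pack_hm_ena()` that is not taken from the CDF driver. HMxx only looks at bits D7-D4, and ENAM0/ENAM1/ENABL only look at bit D1. That means the same byte can be stored to both registers unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a horizontal-motion value and an enable
 * flag into one byte that can be written to both HMxx and ENAxx.
 * motion is the signed nybble HMxx expects (-8..+7; on the TIA positive
 * values move the object left). */
static uint8_t pack_hm_ena(int motion, int enable)
{
    assert(motion >= -8 && motion <= 7);
    uint8_t hm  = (uint8_t)(((unsigned)motion & 0x0Fu) << 4); /* D7-D4 for HMxx */
    uint8_t ena = enable ? 0x02 : 0x00;                       /* D1 for ENAxx   */
    return (uint8_t)(hm | ena);
}

/* The parts of the byte each TIA register actually uses. */
static uint8_t hm_bits(uint8_t b) { return (uint8_t)(b & 0xF0); }
static int     ena_bit(uint8_t b) { return (b & 0x02) != 0; }
```

For instance, "move right 1 pixel, object on" packs to $F2. HMxx sees $F0 and ENAxx sees bit 1 set.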

So as of Thursday around lunchtime I was planning to reboot Draconian yet again.


Then, later that night, this happened in the Bus Stuffing Demos topic:

 

On 3/30/2017 at 7:01 PM, ZackAttack said:

Could always enhance the driver to include a "fast jump" feature.

JMP FastJump would be detected and the 2 byte address is pulled from ARM memory instead of the ROM. Similar to how fast fetch works.


So, just as Fast Fetch let us turn this:

 

 lda DS_SIZE        ; 4  4
 sta CTRLPF         ; 3  7
 


into this, for a 2-cycle savings:

 

 lda #<DS_SIZE      ; 2  2
 sta CTRLPF         ; 3  5
 


Fast Jump would turn this:

 

    lda #<DS_JUMP      ; 2  2
    sta NextKernel     ; 3  5
    jmp (NextKernel)   ; 5 10 <- goes to jump table so we can do a single lda/sta for NextKernel instead of two for a 2 cycle savings
 
    ...
 
JumpTable:
     jmp NextKernelReal ; 3 13
 


to just this:

 

 jmp FASTJUMP       ; 3  3
 


That's a 10-cycle savings. If we could also update the other 2 HMxx registers (HMM0 and HMM1 during a ball reposition kernel), then the STA HMCLR would no longer be needed, freeing up another 3 cycles. With that in mind, I worked up another test rewrite, which resulted in this:
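Here is the cycle arithmetic written out as a small C sketch. The instruction costs are standard 6507 timings. The function names are only labels for this illustration.

```c
#include <assert.h>

/* Standard 6507 cycle costs for the instructions in these snippets. */
enum { LDA_IMM = 2, LDA_ABS = 4, STA_ZP = 3, JMP_ABS = 3, JMP_IND = 5 };

/* 4-cycle absolute load + store, versus Fast Fetch's 2-cycle immediate load + store */
static int fetch_cycles(int fast_fetch)
{
    return (fast_fetch ? LDA_IMM : LDA_ABS) + STA_ZP;
}

/* lda #<DS_JUMP / sta NextKernel / jmp (NextKernel) / jmp NextKernelReal,
 * versus a single jmp FASTJUMP */
static int jump_cycles(int fast_jump)
{
    return fast_jump ? JMP_ABS : LDA_IMM + STA_ZP + JMP_IND + JMP_ABS;
}

/* Cycles freed per kernel: Fast Jump, plus dropping sta HMCLR once
 * HMM0/HMM1 come from datastreams. */
static int cycles_freed(void)
{
    int saved = jump_cycles(0) - jump_cycles(1);  /* 13 - 3 = 10 */
    saved += STA_ZP;                              /* sta HMCLR   */
    assert(saved == 13);  /* sanity check against the numbers above */
    return saved;
}
```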

 

DS_GRP0     = DS1DATA
DS_GRP1     = DS2DATA
DS_HMP0     = DS3DATA
DS_HMP1     = DS4DATA
DS_MISSILE0 = DS5DATA   ; HMM0 and ENAM0
DS_MISSILE1 = DS6DATA   ; HMM1 and ENAM1
DS_BALL     = DS7DATA   ; HMBL and ENABL
DS_COLOR    = DS8DATA   ; color change for players and ball only
DS_SIZE     = DS9DATA   ; size change for all objects

RealResblStrobe73:              ;   20
        lda #<DS_SIZE           ; 2 22
        sta CTRLPF              ; 3 25
        lda #<DS_GRP0           ; 2 27
        sta GRP0                ; 3 30 <- for next scanline, VDELP0 on
        lda #<DS_BALL           ; 2 32
        sta ENABL               ; 3 35 <- for next scanline, VDELBL on
        sta HMBL                ; 3 38
        lda #<DS_HMP0           ; 2 40
        sta HMP0                ; 3 43
        lda #<DS_HMP1           ; 2 45
        sta HMP1                ; 3 48
        lda #<DS_COLOR          ; 2 50
        sta COLUPF              ; 3 53
        lda #<AMPLITUDE         ; 2 55
        sta AUDV0               ; 3 58
        ldx DS_MISSILE0         ; 4 62
        stx HMM0                ; 3 65
        ldy DS_MISSILE1         ; 4 69
        sta.w RESBL             ; 4 73
        sty HMM1                ; 3 76/0
        sta HMOVE               ; 3  3
        lda #<DS_GRP1           ; 2  5
        sta GRP1                ; 3  8
        stx ENAM0               ; 3 11
        sty ENAM1               ; 3 14
        SLEEP 3                 ; 3 17
        jmp FASTJUMP            ; 3 20
 

 

 


So we'd have time for the digitized audio, diagonal shots, fancy station cores, etc. that I was hoping to do using BUS Stuffing.

[Animated image: Draconian station cores]

This is a significant improvement, so cd-w is going to look into updating the CDF driver to see if he can squeeze in Fast Jump. Due to timing constraints in the driver, he's not sure it can be done. He also won't be able to work on it for a few weeks due to prior obligations, so I'm going to hold off on the reboot until we know which of the two reboot options I'll be using.

In the meantime I'll be working on the audio support in Stella for BUS and CDF.

 

 

 

Addendum:
[Image: revised cartridge layout]


20 Comments


Recommended Comments

Amazing! unbe-freakin-lievable!

This will also make Frantic spectacular.

 

Then we should collaborate on Astro Blaster 2600. It's a "talkie", where in-game speech matters even more than in Bosconian. I have permission to take Bob's 7800 movement routines, which match the arcade but need tweaking to compensate for the 2600 screen (for the 7800 that amounted to only one divide by 2). The TIA audio is already there too, like Scramble going from 7800 to 2600.

I could do a decent batari Basic DPC+ Astro Blaster kernel myself using AtariVox+, but CDF with samples and fewer canned-kernel constraints would be so much better.

Also, like DK Arcade 2600, I can do the Graphics, Sound, Speech Samples, Layout, but game logic and C programming are out of my wheelhouse.

 

If only we could all be as productive and fast as Bob DeCrescenzo was!


Yeah, can't wait to see what I'll be able to pull off in Frantic. While I'd love to use BUS, I have a feeling it's not going to pan out when room temperature is one of the factors in the current solution :(

 

Astro Blaster sounds good to me.

 

If only we could all be as productive and fast as Bob DeCrescenzo was!

Yeah, Bob really knocks them out! I have a feeling that once we get a few different kernels written for CDF, it'll be possible to make new games much faster, as developing in C is much faster than in 6507 assembly. The Draconian kernel alone could be used for numerous games.


That's so cool!!! I kind of wish that I had played Bosconian 'back in the day' now so I could appreciate this more...

 

@iesposta: Thank You! ...but "Was"? I'm not dead (yet) :P :D

I can't wait to get back to 7800 programming.


cd-w had me make a minor change to the CDF driver to add a check for a Fast Jump. The Fast Jump itself isn't implemented yet, so the driver just returns the jump address as normal. This test was a success, so it looks like we have enough time to implement Fast Jump.

However, the 2K driver only has room for 6 more instructions after the check, which is not enough. Fast Jump is such a significant improvement that we're willing to give up some space for it. The driver will most likely end up 256 bytes larger, so 2.25 KB. That'll reduce C Code & Data by 256 bytes, as well as C Variables & Stack by 256 bytes. This is still way better than the overhead for using DPC+.

I've attached a revised cartridge layout to the end of the blog post.


That's so cool!!! I kind of wish that I had played Bosconian 'back in the day' now so I could appreciate this more...

 

Bosconian was one of my favorites back-in-the-day. It was at most of the arcades that I frequented, but I don't think it ever really caught on like the more popular games of the day. Maybe the name was too weird. :ponder:


cd-w thinks we're good to go for the Fast Jump, so my plan of action is this:

  • Implement Fast Jump in Stella - this is already done; the size of the driver is irrelevant for implementing it within Stella.
  • Implement 3 voice audio in CDF
  • Implement audio sample support in CDF
  • Copy audio & sample support to BUS (they work the same; the data's even in the exact same memory locations)
  • Configure Stella so the CDF driver is now 2.25 KB (this will shift the locations of various things by 256 bytes, which is why I'll do audio, then audio->BUS, first). NO LONGER NEEDED
  • Reboot Draconian using Fast Jump

Do note I won't be able to get to any of this until Sunday evening at the earliest as I have a house full of company arriving Friday for the Houston Art Car Parade, and I need to finish getting the house ready!


cd-w burned the midnight oil last night and managed to revise CDF to such an extent that he freed up 304 bytes in the driver! He still needs to implement FastJump, but it looks like we'll be able to keep the driver at 2K :thumbsup:

 

It did change how the driver works. The CDF registers used to be at the start of the 4K cartridge space (so AMPLITUDE was at $1020, $3020, ... $F020):

 

DS0DATA     DS 1    ; $00
DS1DATA     DS 1    ; $01
DS2DATA     DS 1    ; $02
DS3DATA     DS 1    ; $03
DS4DATA     DS 1    ; $04
DS5DATA     DS 1    ; $05
DS6DATA     DS 1    ; $06
DS7DATA     DS 1    ; $07
DS8DATA     DS 1    ; $08
DS9DATA     DS 1    ; $09
DS10DATA    DS 1    ; $0A
DS11DATA    DS 1    ; $0B
DS12DATA    DS 1    ; $0C
DS13DATA    DS 1    ; $0D
DS14DATA    DS 1    ; $0E
DS15DATA    DS 1    ; $0F
DS16DATA    DS 1    ; $10
DS17DATA    DS 1    ; $11
DS18DATA    DS 1    ; $12
DS19DATA    DS 1    ; $13
DS20DATA    DS 1    ; $14
DS21DATA    DS 1    ; $15
DS22DATA    DS 1    ; $16
DS23DATA    DS 1    ; $17
DS24DATA    DS 1    ; $18
DS25DATA    DS 1    ; $19
DS26DATA    DS 1    ; $1A
DS27DATA    DS 1    ; $1B
DS28DATA    DS 1    ; $1C
DS29DATA    DS 1    ; $1D
DS30DATA    DS 1    ; $1E
DS31DATA    DS 1    ; $1F
AMPLITUDE   DS 1    ; $20
 
  ; Write Registers
SETMODE     DS 1    ; $21
CALLFN      DS 1    ; $22
RESERVED    DS 1    ; $23
DS0WRITE    DS 1    ; $24
DS1WRITE    DS 1    ; $25
DS2WRITE    DS 1    ; $26
DS3WRITE    DS 1    ; $27
DS0PTR      DS 1    ; $28
DS1PTR      DS 1    ; $29
DS2PTR      DS 1    ; $2A
DS3PTR      DS 1    ; $2B

Now there are just 4 registers, at the end:

DSWRITE     DS 1    ; $1FF0
DSPTR       DS 1    ; $1FF1
SETMODE     DS 1    ; $1FF2
CALLFN      DS 1    ; $1FF3

DSWRITE and DSPTR are hardcoded to use stream 0 (FastJump is likewise hardcoded to stream 31). The DS0DATA-DS31DATA and AMPLITUDE registers are now only accessible when Fast Fetch mode is on. You reach them with LDA #0 through LDA #31 and LDA #32, respectively.
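To make the new register layout concrete, here is a hypothetical, simplified C model of what the 6507 would see. It is not cd-w's driver code. The `Stream`/`CdfModel` types are made up for illustration. Two details are assumptions: that LDA immediates above #32 pass through untouched, and that FastJump reads its target low byte first (based on the 6507's little-endian addresses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STREAMS     32  /* DS0DATA..DS31DATA                   */
#define AMPLITUDE_SLOT  32  /* LDA #32 returns the audio amplitude */
#define JUMP_STREAM     31  /* FastJump's target comes from here   */

/* Hypothetical datastream: a buffer plus a read position. */
typedef struct {
    const uint8_t *data;
    size_t         pos;
} Stream;

typedef struct {
    Stream  streams[NUM_STREAMS];
    uint8_t amplitude;
    int     fast_fetch;     /* nonzero when Fast Fetch mode is on */
} CdfModel;

static uint8_t next_byte(Stream *s)
{
    assert(s->data != NULL);
    return s->data[s->pos++];
}

/* Value the 6507 sees for LDA #operand. */
static uint8_t lda_immediate(CdfModel *m, uint8_t operand)
{
    if (m->fast_fetch && operand < NUM_STREAMS)
        return next_byte(&m->streams[operand]);  /* LDA #0..#31     */
    if (m->fast_fetch && operand == AMPLITUDE_SLOT)
        return m->amplitude;                     /* LDA #32         */
    return operand;                              /* plain immediate */
}

/* Address the 6507 jumps to for JMP FASTJUMP: two bytes from stream 31,
 * low byte first (assumed). */
static uint16_t fast_jump_target(CdfModel *m)
{
    Stream *s = &m->streams[JUMP_STREAM];
    uint8_t lo = next_byte(s);
    uint8_t hi = next_byte(s);
    return (uint16_t)(lo | (hi << 8));
}
```

In this model, each `lda #<DS_SIZE` in a kernel pulls the next byte of its stream, and a `jmp FASTJUMP` pulls the next kernel's address from stream 31.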


cd-w did it again - FastJump is now implemented in the CDF driver! Works great on the 2600, however the 7800 has issues so there's still more to be done.


@PacManPlus. I don't ever recall seeing Bosconian, and I've pretty much seen and remember everything from 1972 on (in 1972 I was 5). I remember Pong, and I saw B&W video games switch to color (Galaxian and Pac-Man were the first color games I remember). I saw electro-mechanical pinball go digital, but I'm too young to remember electro-mechanical coin-op games.

 

@Spiceware. When you say 3-voice music is now generated on the fly, I assume we still have waveforms? I was just hoping it would scale all the way up to using samples, i.e. from a simple 32-byte waveform up to larger samples used as waveforms. Maybe not something huge like a 3-voice dog bark piano. I mean waveforms large enough to more closely match an organ, banjo, or HonkeyTonk Pi-an-oh, or even different instruments at once: bass guitar, square wave, piano?


By default the 3-voice music is the same as DPC+. There are two enhancements:

  • Selectable Waveform Size
  • Digital Sample Mode

By default the driver uses a shift value of 27, which results in a 32-byte waveform. You can set it anywhere from 20 to 31; the resulting size is 2^(32-shift) bytes:

// Set waveform size:
// 20 = 4096 bytes
// 21 = 2048 bytes
// 22 = 1024 bytes
// 23 = 512 bytes
// 24 = 256 bytes
// 25 = 128 bytes
// 26 = 64 bytes
// 27 = 32 bytes (DEFAULT)
// 28 = 16 bytes
// 29 = 8 bytes
// 30 = 4 bytes
// 31 = 2 bytes
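For anyone wondering how a shift value maps to a waveform size, here's a small C sketch. It assumes a DPC+-style voice with a 32-bit phase accumulator, where the top (32 - shift) bits select the waveform byte. `Voice` and `next_waveform_index()` are illustrative names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Waveform size in bytes for a shift value of 20-31: 2^(32 - shift). */
static uint32_t waveform_size(unsigned shift)
{
    assert(shift >= 20 && shift <= 31);
    return UINT32_C(1) << (32 - shift);
}

/* Hypothetical voice: a 32-bit phase accumulator advanced by `step` on
 * each audio update. Shifting right by `shift` keeps the top (32 - shift)
 * bits, i.e. an index from 0 to waveform_size(shift) - 1. */
typedef struct {
    uint32_t phase;
    uint32_t step;   /* larger step = higher pitch */
} Voice;

static uint32_t next_waveform_index(Voice *v, unsigned shift)
{
    v->phase += v->step;   /* unsigned, so it wraps modulo 2^32 */
    return v->phase >> shift;
}
```

Under this model, the shift only changes how finely one cycle of the waveform is divided. For the same step, the voice still sweeps through the whole table at the same rate, so the pitch stays the same while the waveform gets more detail.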

If you switch to Digital Sample Mode you get a single voice that plays packed samples (two AUDV0 values per byte) directly from ROM.
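Packing two samples per byte halves the ROM needed for a sample. Here is a sketch of the unpacking in C. The hypothetical `packed_sample()` assumes the high nybble plays first, which may not match the driver's actual order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: AUDV0 value (0-15) for sample n of a buffer that
 * packs two 4-bit samples per byte. Assumes the high nybble plays first. */
static uint8_t packed_sample(const uint8_t *buf, size_t len, size_t n)
{
    assert(n / 2 < len);
    uint8_t byte = buf[n / 2];
    return (n % 2 == 0) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
}
```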


You guys ROCK on my first and favorite gaming platform ever. I am constantly amazed at what you guys can get out of the old 2600!


Thanks! We definitely have a great group of people here working together to figure out ways to push the 2600 further than even I thought would be possible!

 

I'm still recuperating from a full weekend - family was visiting for the Houston Art Car Parade. I plan to resume working on Stella's support for CDF and BUS tomorrow (basically I need to finish the audio routines), and start the Draconian reboot utilizing CDF's new FastJump feature this weekend.


Wow! It can scale waveforms!

 

Helping with the Stay Frosty 2 Christmas music, I was amazed at how a tiny 32-byte waveform could sound like so many different instruments, apart from the sine, square, and triangle.

We didn't put too much time into custom waveforms, but you had put together several options for me to pick from: some sounded like woodwind instruments, others like keyboard instruments.

The opening SF2 song (which I didn't code) has a unique sound, like a finger-picked stringed instrument.

 

Scaled waveforms can add a whole new dimension of sound to a game, like spooky organ music for a scary game, or a honky-tonk piano for Mappy!


I plan to resume working on Stella's support for CDF and BUS tomorrow (basically I need to finish the audio routines), and start the Draconian reboot utilizing CDF's new FastJump feature this weekend.

 

Sadly this didn't happen as I caught a bug, most likely from one of the kids - they went through a couple boxes of facial tissue while here last weekend.


I love FastJump. :) I've stuffed routines into zeropage before to have a variable jump with tight timing... but zeropage is very limited in space.

FastJump does free a lot of cycles in the kernel. How many kernels are you using? If you have a lot of SLEEP 4s, you could use PLA instead to save a byte, as long as you don't care where the SP is (or what ends up in A).

Otherwise, one of my favorite tricks is to delay a cycle by using ZP,X addressing (with X set to a known value). That saves a byte over forcing absolute addressing (.w) on a zero-page register.

e.g.:

tsx ; X=$FF for example
sta RESP0-$FF,X ; store to RESP0 in 4 cycles, but takes only 2 bytes.


This gets powerful when you do it multiple times; in CAA I often use it to eliminate NOPs.

;--------------------------------------- Lines 57-59
ldx #LEFT_5 ;2 @2 X=$50
lda splsh_ColBall_17-$50,X ;4 @6
sta COLUP1-$50,X ;4 @10
lda #MISSILE_4_CLKS | QUAD_SIZE ;2 @12 $27
sta NUSIZ0-$50,X ;4 @16


I realize that you are using X in your kernels, but you might be able to save 10-20 bytes by rearranging your kernels to have X loaded from TSX before using "ldx DS_MISSILE0".
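Here is the wraparound behind these tricks as a small C sketch. On the 6502, a zero-page,X operand is added to X modulo 256, so the effective address never leaves page zero. `sta zp,X` is 2 bytes and 4 cycles, versus 3 bytes and 4 cycles for `sta.w`. The register values below are the standard TIA write addresses. `zp_x_address()` is just an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TIA write-register addresses used in the examples above. */
enum { NUSIZ0 = 0x04, COLUP1 = 0x07, RESP0 = 0x10 };

/* Effective address of a zero-page,X operand. The assembler keeps only the
 * low byte of the operand, and the 6502 adds X without carrying into the
 * high byte, so everything wraps modulo 256. */
static uint8_t zp_x_address(int operand, uint8_t x)
{
    assert(operand >= -0xFF && operand <= 0xFF);
    return (uint8_t)(operand + x);   /* conversion to uint8_t is mod 256 */
}
```

So `sta RESP0-$FF,X` with X=$FF and `sta COLUP1-$50,X` with X=$50 both land exactly on the intended register.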


FastJump is awesome.

 

There are 57 kernels: 55 reposition kernels (11 for each of the 5 objects), 1 non-reposition kernel, and the score/radar/lives/formation kernel.

 

That's just 1 of the 4 instances of SLEEP 4 in the entire game. Instead of sleeping, most of the kernels jump to another one in order to save ROM:

 

Resp0Strobe23:                  ;   20
        sta RESP0               ; 3 23
        SLEEP 3                 ; 3 26
        lda #DS_SIZE            ; 2 28
        sta NUSIZ0              ; 3 31  <- changes player size
FP028:  lda #DS_COLOR           ; 2 33
        sta COLUP0              ; 3 36
FP033:  lda #DS_HMP0            ; 2 38
        sta HMP0                ; 3 41
FP038:  lda #DS_HMP1            ; 2 43
        sta HMP1                ; 3 46        
FP043:  lda #DS_GRP0            ; 2 48
        sta GRP0                ; 3 51  <- for next scanline, VDELP0 on
FP048:  lda #DS_BALL            ; 2 53
        sta ENABL               ; 3 56  <- for next scanline, VDELBL on
FP053:  sta HMBL                ; 3 59
        lda #DS_MISSILE0        ; 2 61
FP058:  tax                     ; 2 63
        stx HMM0                ; 3 66
FP063:  lda #DS_MISSILE1        ; 2 68
        tay                     ; 2 70
        sty HMM1                ; 3 73
        lda #DS_GRP1            ; 2 75
        sta.w HMOVE             ; 4 79/3
        sta GRP1                ; 3  6  @0-21
        DIGITAL_AUDIO           ; 5 11
        stx ENAM0               ; 3 14  @0-21
        sty ENAM1               ; 3 17  @0-21
        jmp FASTJMP             ; 3 20         
        
Resp0Strobe28:                  ;   20   
        lda #DS_SIZE            ; 2 22
        sta NUSIZ0              ; 3 25  <- changes player size
        sta RESP0               ; 3 28
        jmp FP028               ; 3 31

Resp0Strobe33:                  ;   20   
        lda #DS_SIZE            ; 2 22
        sta NUSIZ0              ; 3 25  <- changes player size
        lda #DS_COLOR           ; 2 27
        sta COLUP0              ; 3 30
        sta RESP0               ; 3 33
        jmp FP033               ; 3 36
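The labels hint at why there are 11 reposition kernels per object. Suppose the variants strobe every 5 CPU cycles from cycle 23 to cycle 73, as Resp0Strobe23/28/33 and RealResblStrobe73 suggest; that spacing is an assumption here. Then 5 cycles is 15 color clocks, a gap that HMOVE's -8..+7 pixel range can bridge. A small C sketch of that arithmetic:

```c
#include <assert.h>

#define FIRST_STROBE    23  /* Resp0Strobe23                       */
#define STROBE_STEP      5  /* assumed CPU cycles between variants */
#define CLOCKS_PER_CPU   3  /* TIA color clocks per 6507 cycle     */

/* Cycle at which variant k (0-based) does its RESxx strobe. */
static int strobe_cycle(int k)
{
    assert(k >= 0);
    return FIRST_STROBE + STROBE_STEP * k;
}

/* How many variants it takes to reach a given final strobe cycle. */
static int variants_through(int last_strobe)
{
    assert(last_strobe >= FIRST_STROBE);
    return (last_strobe - FIRST_STROBE) / STROBE_STEP + 1;
}

/* Color clocks between adjacent variants' coarse positions. */
static int coarse_gap_pixels(void)
{
    return STROBE_STEP * CLOCKS_PER_CPU;
}
```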

FastJump is awesome.

 

There's 57 kernels, 55 reposition kernels (11 for each object), 1 non-reposition kernel, and the score/radar/lives/formation kernel.

 

That's just 1 of the 4 instances of SLEEP 4 in the entire game. Instead of sleeping, most of the kernels jump to another one in order to save ROM:

That's a lot of kernels, for sure. :) Very impressive. And the sleep doesn't have to be a SLEEP 4 for this trick to help:

 

Old:

Resp0Strobe23:                  ;   20
        sta RESP0               ; 3 23
        SLEEP 3                 ; 3 26
        lda #DS_SIZE            ; 2 28
        sta NUSIZ0              ; 3 31  <- changes player size

New (a minor savings of 1 byte):

Resp0Strobe23:                  ;   20
        sta RESP0               ; 3 23
        tsx                     ; 2 25  X=$FF
        lda #DS_SIZE            ; 2 27
        sta NUSIZ0-$FF,X        ; 4 31  <- changes player size

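I see some short-range jumps in your code there. It's worth checking whether you can set or clear the overflow flag ahead of time and use BVC/BVS as unconditional branches. A branch is 2 bytes versus 3 for a JMP. A taken branch also costs the same 3 cycles as a JMP as long as it doesn't cross a page, so you could save some bytes there.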
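Swapping a JMP for a BVC/BVS comes with two constraints, sketched in C below. First, the target has to be within a signed 8-bit offset of the next instruction. Second, a taken branch costs an extra cycle when it crosses a page, which matters in a cycle-counted kernel. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* A relative branch (BVC, BVS, ...) is 2 bytes; its offset is counted from
 * the next instruction and must fit in a signed byte. */
static int branch_reaches(uint16_t branch_addr, uint16_t target)
{
    int offset = (int)target - ((int)branch_addr + 2);
    return offset >= -128 && offset <= 127;
}

/* Cycles for a taken branch: 3, or 4 when the target is on a different
 * page from the next instruction. (JMP abs: always 3 cycles, 3 bytes.) */
static int taken_branch_cycles(uint16_t branch_addr, uint16_t target)
{
    assert(branch_reaches(branch_addr, target));
    uint16_t next = (uint16_t)(branch_addr + 2);
    return ((next & 0xFF00) == (target & 0xFF00)) ? 3 : 4;
}
```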


Most likely. My current plan is:

  • get the next batch of sound effects from iesposta in place
  • implement Spy Ship/Red Alert - those are the final pieces for the gameplay logic
  • create something on my web site so end-users can build and submit sectors. In September we'll have a vote or something for the most popular levels, which will end up as the Delta Quadrant
  • optimize, optimize, optimize to make more space for the final digital samples. Thomas plans to help with this; your help would be appreciated as well.

While all that's going on, I'll also be fine-tuning the difficulty ramp-up. I haven't worried too much about that yet, as the Spy Ship is MIA.


I can help just a little bit with optimizing as I don't have too much time.

 

I am curious whether you can run these kernels from RAM. I know you did that for the graphics in Space Rocks, but this would be the kernels instead. If the datastreams can fetch while running from RAM, then conceivably you could load the kernels into RAM at startup and change a few registers (e.g. RESPx, COLUPx). It could save a lot of ROM.
