
SID Emulation Re-Revisited: Atari Sid IV


ivop


Hi all,

 

A little over three years ago I released Atari Sid III. A few days ago, just when I wanted to get some sleep, I got an idea about how to improve its player routine. I got out of bed, started coding and here's the result :)

 

13kB of tables have been compressed to circa 512 bytes, including the decompression and noise generation routines. This improves load times tremendously.

 

The time spent in the timer IRQ handler has been reduced from 98 cycles down to 84 cycles (per scan line).
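This small Python sketch puts the 98 → 84 cycle saving in perspective. It rests on a few assumptions: one timer IRQ per scan line (the 15.6 kHz replay rate matches the PAL line rate), 114 CPU cycles per scan line and 312 lines per PAL frame. It also ignores ANTIC DMA and the once-per-frame softsynth work, so the "left over" figures are upper bounds.

```python
CYCLES_PER_LINE = 114        # CPU cycles per scan line on the Atari 8-bit
LINES_PER_FRAME_PAL = 312    # assumption: PAL machine


def irq_budget(irq_cycles, cycles_per_line=CYCLES_PER_LINE,
               lines_per_frame=LINES_PER_FRAME_PAL):
    """Share of each scan line taken by the IRQ, and what is left for other code."""
    left_per_line = cycles_per_line - irq_cycles
    return {
        "irq_share": irq_cycles / cycles_per_line,
        "left_per_line": left_per_line,
        "left_per_frame": left_per_line * lines_per_frame,
    }


v3 = irq_budget(98)
v4 = irq_budget(84)
print(f"v3: {v3['irq_share']:.1%} of the CPU, {v3['left_per_frame']} cycles/frame left")
print(f"v4: {v4['irq_share']:.1%} of the CPU, {v4['left_per_frame']} cycles/frame left")
```

Under these assumptions the cycles available outside the IRQ go from 16 to 30 per line, nearly double. Things like the waveform display have to fit into that remainder.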

 

Added multiple-song support: you can switch songs by pressing one of the three console keys (START, SELECT, OPTION).

 

Instead of using three Pokey channels, it now uses just one. That means the per-channel dynamic range is slightly smaller than in the previous version, but it sounds a little more balanced and it saves some precious cycles :)

 

Currently I waste 1248 cycles by visualizing the current waveform, but that's just to differentiate the "play" screen from version 3.

 

Because only one Pokey channel/timer is used now, the other three are free for some Pokey fun! And because there's a lot more CPU time left, one could have a 3-channel Pokey tune combined with a 3-channel SID tune. Ninja/GoatTracker + RMT :-)

 

Attached you'll find the full source code and a zip with a few sample songs (Cybernoid, Cybernoid II, Commando, Metal Warrior 2, Nintendo Metal).

 

There's still room for improvement, though. The noise sounds a bit metallic at times. This could be reduced by refilling (parts of) the noise tables every frame, but that is not implemented yet. It might also eliminate the ability to combine Pokey channels with the softsynth SID emulation, in which case you would have Pokey do the drums.
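The post doesn't say how the noise tables are generated. What follows is therefore only a sketch of what "refilling parts of the noise table every frame" could look like. It assumes a 23-bit LFSR with taps at bits 22 and 17 (the register reSID uses to model the SID noise waveform) and the v4 table range of 0-5. The function names, the 0x7FFFF8 seed and the 32-entry slice per frame are hypothetical choices.

```python
def lfsr23_step(state):
    """Advance a 23-bit Fibonacci LFSR with feedback from bits 22 and 17."""
    bit = ((state >> 22) ^ (state >> 17)) & 1
    return ((state << 1) & 0x7FFFFF) | bit


def refill_noise(table, start, count, state, levels=6):
    """Overwrite `count` entries from `start` (wrapping around the table)
    with values 0..levels-1 and return the new LFSR state."""
    size = len(table)
    for i in range(count):
        state = lfsr23_step(state)
        top = (state >> 15) & 0xFF          # top 8 bits of the 23-bit register
        table[(start + i) % size] = top * levels // 256
    return state


# Refresh a 256-entry table 32 entries at a time, e.g. one slice per frame.
noise = [0] * 256
state = 0x7FFFF8
for frame in range(8):
    state = refill_noise(noise, frame * 32, 32, state)
```

Refilling only a slice per frame keeps the per-frame cost bounded. The price is that the table contents change gradually rather than all at once.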

 

As for emulating the emulator: Altirra should work (I cannot test it, as my machine is way too slow). atari800 only works with a patch I recently posted to its mailing list, which implements read-modify-write instructions for Pokey registers.

 

Anyway, it sounds best on real hardware of course ;-)

 

The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable :grin:

 

Regards,

Ivo

 

atarisid4-src.zip

atarisid4-xex.zip


Nice! I'm going from memory, but I think the earlier 3-voice version had somewhat better sound quality.

 

Do you reckon that, with the freed-up cycles, it would now be possible to have an active display? It might mean something like a narrow OS mode 3, plus you'd probably need multiple versions of the playback loop to cater for the varying DMA loads.


Actually, I don't get why "just putting a value into a register on each of 3 Pokey channels" takes more CPU cycles than software-mixing 3 channels and writing the resulting value to one register...

 

 

If there is a lot of CPU time left, and Pokey channels are free...

How about using approx. "50%" of the CPU time and combining the player with the SIO loader routines?

Edited by emkay

The 19.2k isn't related to the scanline frequency. I was only toying with that idea because it's the default SIO rate (actually it's a bit less?), and IIRC the SID emulation is oriented towards one sample every 2 scanlines (?)

 

There's every chance higher rates are possible, probably at the cost of some sound fidelity.
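This quick Python check compares the rates under discussion. It assumes the usual CPU clocks (about 1.79 MHz on NTSC and 1.77 MHz on PAL), 114 cycles per scan line, and the standard SIO divisor of $28 (40) in POKEY's linked 16-bit mode, where the output rate is F / (2 × (N + 7)). The clock constants are rounded to the nearest hertz.

```python
NTSC_CPU_HZ = 1_789_773      # NTSC CPU clock, rounded to the nearest Hz
PAL_CPU_HZ = 1_773_447       # PAL CPU clock, rounded to the nearest Hz
CYCLES_PER_LINE = 114
SIO_DIVISOR = 0x28           # standard "19.2k" SIO setting (AUDF3/AUDF4 = $0028)


def line_rate(cpu_hz):
    """Scan lines per second; one IRQ per line gives this sample rate."""
    return cpu_hz / CYCLES_PER_LINE


def pokey_16bit_rate(cpu_hz, divisor):
    """Output rate of a linked 16-bit POKEY channel clocked at the CPU clock."""
    return cpu_hz / (2 * (divisor + 7))


for name, hz in (("NTSC", NTSC_CPU_HZ), ("PAL", PAL_CPU_HZ)):
    print(f"{name}: 1 sample/line {line_rate(hz):8.1f} Hz, "
          f"1 sample/2 lines {line_rate(hz) / 2:8.1f} Hz, "
          f"SIO {pokey_16bit_rate(hz, SIO_DIVISOR):8.1f} bit/s")
```

So the nominal "19.2k" works out to about 19.0k on NTSC and about 18.9k on PAL. Neither lines up with the roughly 15.6-15.7 kHz scan-line rate.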


Wow, is that my unmodified XL doing that?

 

That is awesome...

 

Totally well done Ivo...

 

And thanks for the updated Commando, Philsan... (edit: oops, it was in Ivo's file)

 

Ta muchly to all...

Edited by Mclaneinc

Some more info I probably should have put in the first post :)

 

Replay rate is 15.6 kHz, just like version 3 (version 2 was 7.8 kHz).

 

The extra cycles were saved by using a single INC IRQEN (a read-modify-write instruction) to clear and then re-enable the timer 1 interrupt bit.
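Why does a single INC IRQEN work? Here is a minimal Python model of it. It assumes NMOS 6502 read-modify-write behaviour, where the CPU writes the unmodified value back before writing the result. It also assumes the usual POKEY layout:

* Reading $D20E returns IRQST, where a pending interrupt reads as 0.
* Writing $D20E sets IRQEN, and writing 0 to a bit disables that source and clears its pending flag.
* Timer 1 is bit 0.

The class and function names are illustrative only.

```python
TIMER1 = 0x01  # bit 0 of IRQEN/IRQST


class PokeyIrq:
    """Only the IRQEN (write) / IRQST (read) register at $D20E."""

    def __init__(self):
        self.enabled = 0x00
        self.pending = 0x00           # 1 = pending; IRQST shows these bits inverted

    def read(self):
        return ~self.pending & 0xFF   # IRQST is active low: 0 means "pending"

    def write(self, value):
        self.enabled = value & 0xFF
        self.pending &= self.enabled  # writing 0 to a bit disables and clears it

    def timer1_underflow(self):
        if self.enabled & TIMER1:
            self.pending |= TIMER1


def inc_rmw(pokey, writes, dummy_write=True):
    """INC $D20E as the bus sees it: read, write old value back, write old + 1."""
    old = pokey.read()
    if dummy_write:                   # NMOS 6502 RMW instructions do this extra write
        pokey.write(old)
        writes.append(old)
    new = (old + 1) & 0xFF
    pokey.write(new)
    writes.append(new)


def acknowledge(dummy_write):
    """Enable timer 1, let it fire once, then execute INC IRQEN."""
    pokey = PokeyIrq()
    pokey.write(TIMER1)
    pokey.timer1_underflow()
    writes = []
    inc_rmw(pokey, writes, dummy_write)
    return pokey, writes


real, real_writes = acknowledge(dummy_write=True)
naive, naive_writes = acknowledge(dummy_write=False)
```

In the model, the intermediate write of $FE is what clears the pending timer 1 flag. The final $FF then re-enables it (and, assuming nothing else was pending, every other POKEY IRQ source too). An emulator that skips the dummy write leaves timer 1 pending, so the IRQ would fire again straight away. This is presumably the kind of difference the atari800 patch mentioned in the first post addresses.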

Also, I went back to a single channel, which indeed does slightly degrade the sound quality, but as Philsan said, imho that's acceptable if it leaves more CPU time for other things (like a Pokey player, PMG based scroller, or perhaps a SIO loader).

 

To answer emkay's question about why it actually saves time to add the channels instead of storing them to Pokey directly:

version 3:

lda $1234   ; 4
sta audc1   ; 4
lda $5678   ; 4
sta audc2   ; 4
lda $9abc   ; 4
sta audc3   ; 4

24 cycles

version 4:

lda $1234   ; 4
clc         ; 2
adc $5678   ; 4
adc $9abc   ; 4
sta audc1   ; 4

18 cycles

Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.

The tables in v4 are slightly adapted. Their range is now 0-5 instead of 0-7, which is why the quality is a little lower. Luckily, the SID chip has three channels, which means that adding three values in the range $10-$15 gives a result in the range $30-$3f, which is still volume-only :D
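The range argument can be checked exhaustively. This Python sketch models `lda / clc / adc / adc` as plain 8-bit addition. It confirms that any three table values in $10-$15 sum to $30-$3F, that bit 4 (the AUDC volume-only bit) is always set, and that the low nibble equals the sum of the three 0-5 levels. The `carry_in` parameter shows one way a leftover carry could hurt; it is not a claim about what causes the 7.8 kHz beep.

```python
from itertools import product

VOLUME_ONLY = 0x10                  # AUDC bit 4
TABLE_VALUES = range(0x10, 0x16)    # $10-$15: volume-only flag + level 0-5


def mix(a, b, c, carry_in=0):
    """Accumulator after lda a / adc b / adc c, with `carry_in` set before the
    first adc (0 models the clc). No intermediate carry occurs for these values."""
    return (a + b + c + carry_in) & 0xFF


results = {mix(a, b, c) for a, b, c in product(TABLE_VALUES, repeat=3)}
print(f"${min(results):02X}-${max(results):02X}")
```

The three 0-5 levels add up to at most 15, exactly the 4-bit volume range, so nothing spills out of the low nibble. Bit 5 also ends up set, but in volume-only mode the distortion bits don't matter. A single stray carry on $3F would give $40, which clears bit 4.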

 

As for the funny assembler format, basically, the source is a Unix shell script (works with zsh, bash, ksh).

 

Thanks for the feedback,

Ivo

 


Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.

The tables in v4 are slightly adapted. Their range is now 0-5 instead of 0-7, which is why the quality is a little lower. Luckily, the SID chip has three channels, which means that adding three values in the range $10-$15 gives a result in the range $30-$3f, which is still volume-only :D

Well, sometimes doing less is more ;)

The results show it's useful.

 

 

 

Are you interested in plugging the emulation into the SIO loader?

That's exactly the kind of thing that's missing ;)


I have been thinking of converting the source to a more reasonable format, but I keep putting it off because of the work involved :) This whole project started out as a test case for shasm65, which itself was just a fun project to see if it could be done (i.e. an assembler as a shell script).

 

Heaven, if you want to convert it yourself, go ahead. It'll probably help if you have an editor which has syntax highlighting for (ba)sh. Suddenly it becomes a lot more readable :D I remember that Tezz wanted to do something similar. Perhaps some work has already been done in that direction?

 

I have never written any polled SIO related code, so I'm not sure if I'm the right person to try combining the two. Also, I'm working on two "new graphics mode" projects at the moment :)

 

Edit: a short "manual" on how to get a SID converted is in the sid2gumby thread here on AtariAge. Once converted, the resulting binary works with v3, v4 and sid2gumby.

Edited by ivop

version 4:

lda $1234
clc
adc $5678
adc $9abc
sta audc1

18 cycles

 

Thanks for the insight. Do you really use absolute, non-ZP addressing here? Could you easily change this routine to use self-modifying code like this:

 

lda #BYTELOC1234   ; 2
clc                ; 2
adc #BYTELOC5678   ; 2
adc #BYTELOC9ABC   ; 2
sta audc1          ; 4

 

to go down to 12 cycles?


Here's the core of the IRQ routine (I skipped the duplicated code for clarity):

    .org 0x0000 $tempzp

L irq
    sta.z $saveA        # 6 + 3 = 9

count_lsb_v1=$(($_here+1))
    lda. 0                  # 2
freq_lsb_v1=$(($_here+1))
    adc. 0                  # 2
    sta.z $count_lsb_v1     # 3
    lda.z $count_msb_v1     # 3
freq_msb_v1=$(($_here+1))
    adc. 0                  # 2
    sta.z $count_msb_v1     # 3
                            # ---> 15
### REPEAT THE ABOVE TWO TIMES FOR SECOND AND THIRD CHANNEL

count_msb_v1=$(($_here+1))
table_msb_v1=$(($_here+2))
    lda $silence            # 4
    clc                     # 2
count_msb_v2=$(($_here+1))
table_msb_v2=$(($_here+2))
    adc $silence            # 4
count_msb_v3=$(($_here+1))
table_msb_v3=$(($_here+2))
    adc $silence            # 4
    sta $AUDC1              # 4
                            # ---> 18

    inc $IRQEN              # 4

saveA=$(($_here+1))
    lda. 0              # 2
    rti                 # 6
                        # ---> 8

                        # total: 9+3*15+18+4+8 = 84

Anything starting with a $ is a label, not a hex value; hex values are written as 0x..., as in C.

Mnemonics with a . (dot/period) appended use immediate addressing, and those with .z use zero page addressing. The whole routine runs in zero page.

freq_msb_* and table_msb_* are modified by the SID emulation/softsynth that runs once per frame. All the _here+1 stuff is similar to *+1 in other assemblers. It's all self-modifying code.
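The `name=$(($_here+1))` idiom (the equivalent of `name = *+1`) can be illustrated with a toy Python sketch; this is not shasm65 itself. Each mnemonic is a method that appends bytes at `here`, and a label can point at the operand byte of the next instruction. A three-opcode interpreter then runs the channel 1 low-byte update in zero page. `MiniAsm`, `operand_label` and `run` are hypothetical names; the opcodes are the standard 6502 ones ($A9 LDA #, $69 ADC #, $85 STA zp).

```python
class MiniAsm:
    """Toy one-pass assembler: every mnemonic appends its bytes at `here`."""

    def __init__(self, org=0x0000):
        self.here = org
        self.code = bytearray()
        self.labels = {}

    def _emit(self, *values):
        self.code.extend(v & 0xFF for v in values)
        self.here += len(values)

    def operand_label(self, name):
        # Same idea as `name=$(($_here+1))` or `name = *+1`: the address of
        # the operand byte of the instruction that is emitted next.
        self.labels[name] = self.here + 1
        return self.labels[name]

    def lda_imm(self, value):   # lda. value
        self._emit(0xA9, value)

    def adc_imm(self, value):   # adc. value
        self._emit(0x69, value)

    def sta_zp(self, addr):     # sta.z addr
        self._emit(0x85, addr)


def run(mem, pc, steps, carry=0):
    """Interpret just LDA #, ADC # and STA zp (all two-byte instructions)."""
    a = 0
    for _ in range(steps):
        op, arg = mem[pc], mem[pc + 1]
        if op == 0xA9:
            a = arg
        elif op == 0x69:
            total = a + arg + carry
            a, carry = total & 0xFF, total >> 8
        elif op == 0x85:
            mem[arg] = a
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
        pc += 2
    return a, carry


# Channel 1 low-byte phase update, assembled at $0000 (zero page).
asm = MiniAsm(org=0x0000)
count_lsb_v1 = asm.operand_label("count_lsb_v1")
asm.lda_imm(0)
freq_lsb_v1 = asm.operand_label("freq_lsb_v1")
asm.adc_imm(0)
asm.sta_zp(count_lsb_v1)

mem = bytearray(256)
mem[:len(asm.code)] = asm.code
mem[freq_lsb_v1] = 0x40          # what the per-frame softsynth would patch in

history = []
for _ in range(4):               # four IRQs, each starting with carry clear
    a, carry = run(mem, 0, 3)
    history.append(mem[count_lsb_v1])
```

Each pass stores the new low byte back into the operand of `lda. 0`. When it wraps, the carry comes out for the high-byte add, just as in the routine above.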

 

I don't see how I could use immediate loads as those values have to come from tables at some point. But perhaps you can think of a way to speed this up even more? :)

 

Side note: saving and restoring the accumulator could be omitted if one were to write a player routine directly for the softsynth engine (removing the need for SID register emulation) that only uses the X and Y registers :D
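This small Python tally of the routine uses the per-instruction cycle counts from its comments. It reproduces the 84-cycle total and shows the 5 cycles (sta.z plus lda.) that the side note would save. For comparison, standard 6502 timing tables give 7 cycles for the interrupt entry sequence and 6 for INC absolute. Plugging those in gives 87, so the real figure may be a few cycles higher than the comments suggest.

```python
# Cycle counts as written in the comments of the IRQ routine above.
ENTRY = 6 + 3                       # interrupt entry (as counted in the post) + sta.z $saveA
CHANNEL = 2 + 2 + 3 + 3 + 2 + 3     # lda./adc./sta.z/lda.z/adc./sta.z for one voice
MIX = 4 + 2 + 4 + 4 + 4             # lda/clc/adc/adc/sta $AUDC1 for three voices
ACK = 4                             # inc $IRQEN, as commented
EXIT = 2 + 6                        # lda. (restore A) + rti
SAVE_RESTORE = 3 + 2                # sta.z $saveA + lda. 0


def irq_cycles(entry=ENTRY, ack=ACK, save_restore=True):
    """Total cycles per IRQ for the three-voice routine."""
    total = entry + 3 * CHANNEL + MIX + ack + EXIT
    return total if save_restore else total - SAVE_RESTORE


print(irq_cycles(), irq_cycles(save_restore=False), irq_cycles(entry=7 + 3, ack=6))
```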

 


I was able to squeeze 4 channels into an IRQ-based player at 15.7 kHz once by updating the phase of only one channel at a time in round-robin fashion, i.e. 1 -> 2 -> 3 -> 4 -> 1. In the other three phases, the MSB of the phase was projected by a multiple of the MSB of the increment. This gives up to 3/256 phase error for 3 out of 4 samples, but that's not too audible with 4-bit samples. This is one of the IRQ routines:

.proc irq1
    sta asave       ;3

    ldy phase1hi    ;3
    lda wavtab,y    ;4+1
voltab1 = *-1
    sta audc1       ;4

    ldy phase2hi    ;3
    lda wavtab,y    ;4+1
voltab2 = *-1
    sta audc2       ;4
    
    ldy phase3hi    ;3
    lda wavtab,y    ;4+1
voltab3 = *-1
    sta audc3       ;4
    
    ldy phase4hi    ;3
    lda wavtab,y    ;4+1
voltab4 = *-1
    sta audc4       ;4

    asl irqen       ;6
    lda #0          ;2
phase1lo = *-1
    adc #0          ;2
freq1lo = *-1
    sta phase1lo    ;3
    lda #0          ;2
phase1hi = *-1
    adc #0          ;2
freq1hi = *-1
    sta phase1hi    ;3
    mva #irq2 $fffe ;6
    lda #0          ;2
asave = *-1
    rti             ;6
.endp

The main downside is that it requires a lot of zero page and twice as much storage for the samples, since they need two pages instead of one per volume level. Cost including interrupt overhead for 4 channels is 88-92 cycles. Going down to 3 channels would reduce to 77-80 cycles, and accumulating to just AUDC1 instead of AUDC1-3 would bring it down further to 69-71. Note that in this player the main routine was constrained not to use the Y register, so adding save/restore for that would cost 5 cycles.
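This Python sketch checks the error bound described above, assuming the following reading of the scheme. A voice whose phase was last updated k IRQs ago (k = 0..3) is looked up at phase_hi + k × freq_hi, dropping the low-byte contribution. The difference from the exact high byte is the dropped carry, floor((phase_lo + k × freq_lo) / 256), which never exceeds k. That caps the error at 3/256 of a cycle. The `doubled` helper reflects one plausible reason for needing two pages per volume level: an index of up to 255 + 255 can then run past the first page without masking. All names are hypothetical.

```python
import random


def exact_msb(phase, freq, k):
    """High byte of the 16-bit phase after k full updates."""
    return ((phase + k * freq) >> 8) & 0xFF


def projected_msb(phase, freq, k):
    """High byte projected from the high bytes only: phase_hi + k * freq_hi."""
    return ((phase >> 8) + k * (freq >> 8)) & 0xFF


def msb_error(phase, freq, k):
    """How far the projection lags the exact value, in 1/256ths of a cycle."""
    return (exact_msb(phase, freq, k) - projected_msb(phase, freq, k)) & 0xFF


def doubled(wave):
    """Two back-to-back copies of a 256-entry waveform."""
    return list(wave) + list(wave)


rng = random.Random(1)
worst = max(
    msb_error(rng.randrange(0x10000), rng.randrange(0x10000), k)
    for k in range(4)
    for _ in range(5000)
)
print("worst error seen:", worst)
```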

 

 


Ivop... basically, what is the way to go to get a self-composed SID tune onto the A8, then?

Use a PC tracker (GoatTracker, for example), then convert the tune to the player format?

It's kind of the missing "digi MOD tracker" for the A8, then ;)

 

 

 

It would be interesting to check whether the replay could run at 15.6 kHz during SIO. 7.8 kHz will work for sure with DMA on; when DMA is off, SIO gets even more time. As SIO uses POKEY, it shouldn't interfere as long as the timer handling is taken care of...


The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable :grin:

 

Oh man, this is fokking brilliant - it is almost like an assembler that assembles assemblies. You Prince of assemblages! Me bows in awe.
