SID Emulation Re-Revisited: Atari Sid IV

softsynth sid emulation

72 replies to this topic

#1 ivop OFFLINE  

ivop

    Moonsweeper

  • 435 posts
  • Location:The Netherlands

Posted Fri Jun 26, 2015 1:58 PM

Hi all,

 

A little over three years ago I released Atari Sid III. A few days ago, just when I wanted to get some sleep, I got an idea about how to improve its player routine. I got out of bed, started coding and here's the result :)

 

13kB of tables have been compressed to circa 512 bytes, including the decompression and noise generation routines. This improves load times tremendously.

 

The time spent in the timer IRQ handler has been reduced from 98 cycles down to 84 cycles (per scan line).
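
To put those two figures in perspective, here is a minimal sketch of the share of each scan line the handler takes. It assumes a scan line lasts 114 machine cycles and ignores cycles lost to memory refresh or display DMA; `irq_share` is a hypothetical helper name, not something from the attached source.

```python
# Assumption: one Atari 8-bit scan line lasts 114 machine cycles.
# Refresh and display DMA are ignored, so the real headroom is smaller.
CYCLES_PER_SCANLINE = 114


def irq_share(handler_cycles: int) -> float:
    """Fraction of one scan line consumed by the timer IRQ handler."""
    return handler_cycles / CYCLES_PER_SCANLINE


old_share = irq_share(98)   # handler cost quoted for the previous version
new_share = irq_share(84)   # handler cost quoted for this version
cycles_freed_per_line = 98 - 84

print(f"previous: {old_share:.1%} of a scan line")
print(f"current:  {new_share:.1%} of a scan line")
print(f"freed:    {cycles_freed_per_line} cycles per scan line")
```

Under these assumptions the handler drops from roughly 86% to roughly 74% of every scan line.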

 

Added multiple song support; you can switch songs by pressing one of the three console keys.

 

Instead of using three Pokey channels, it now uses just one. That means that the per channel dynamic range has decreased slightly from the previous version, but instead it sounds a little more balanced and it saves some precious cycles :)

 

Currently I waste 1248 cycles by visualizing the current waveform, but that's just to differentiate the "play" screen from version 3.

 

Because now only one Pokey channel/timer is used, the other three are free for some Pokey fun! And because there's a lot more CPU time left, one could have a 3 channel Pokey tune combined with a 3 channel Sid tune. Ninja/Goattracker + RMT :-)

 

Attached you'll find the full source code and a zip with a few sample songs (Cybernoid, Cybernoid II, Commando, Metal Warrior 2, Nintendo Metal).

 

There's still room for improvement though. The noise sounds a bit metallic at times. This could be reduced by refilling (parts of) the noise tables every frame, but this is not implemented yet, as it would possibly also eliminate the ability to combine Pokey channels with a softsynth SID emulation, in which case you have Pokey do the drums.

 

As for emulating the emulator, Altirra should work (I cannot test it, as my machine is way too slow); atari800 only works with a patch I recently posted to its mailing list, which implements Read-Modify-Write instructions for Pokey registers.

 

Anyway, it sounds best on real hardware of course ;-)

 

The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable :grin:

 

Regards,

Ivo

 

Attached Files



#2 Rybags ONLINE  

Rybags

    Quadrunner

  • 15,164 posts
  • Location:Australia

Posted Fri Jun 26, 2015 7:03 PM

Nice, I'm going on memory but I think the earlier 3-voice version had somewhat better sound quality.

 

Do you reckon having freed up cycles it'd be possible to now have an active display?  Even though it might mean something like a narrow OS mode 3, plus you'd probably need multiple versions of the playback loop to cater for the varying DMA loads.



#3 emkay OFFLINE  

emkay

    Quadrunner

  • 8,750 posts
  • What's up?
  • Location:Holy Grail ;)

Posted Fri Jun 26, 2015 9:14 PM

Actually, I don't get why "just put a value to a register / 3 Pokey channels" takes more CPU cycles than software mixing of 3 channels and calculating the resulting value before writing it to one register...


If there is much CPU time left, and POKEY channels free...
How about using approx. "50%" of CPU time, and put the player together with the SIO-loader routines?

Edited by emkay, Fri Jun 26, 2015 9:15 PM.


#4 Rybags ONLINE  

Rybags

    Quadrunner

  • 15,164 posts
  • Location:Australia

Posted Sat Jun 27, 2015 1:31 AM

I like the integrate with SIO idea... in fact it might actually work but likely in 19.2 k speed only with a poll-driven loader.



#5 Philsan OFFLINE  

Philsan

    River Patroller

  • 3,384 posts
  • New Orleans Saints Super Bowl XLIV Champions
  • Location:Switzerland

Posted Sat Jun 27, 2015 1:50 AM

You can compare the two versions.

If the 2015 version leaves more CPU time, I think it's acceptable.

 

Attached File  SID - Commando (2012) (Ivo van Poorten).xex   25.81KB   116 downloads

Attached File  SID - Commando (2015) (Ivo van Poorten).xex   7.2KB   141 downloads



#6 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sat Jun 27, 2015 3:09 AM

SID plus sio loader.... Yeah.... Gimme asap! :) finally we would equalise at least little bit to c64 ;)

#7 Rybags ONLINE  

Rybags

    Quadrunner

  • 15,164 posts
  • Location:Australia

Posted Sat Jun 27, 2015 3:21 AM

I reckon it could be done - having the playback bumped by a scanline for every incoming byte wouldn't be too bad.



#8 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sat Jun 27, 2015 3:34 AM

Now a dumb question... Is the 19200 somehow related to the scanline frequency? (Maybe I'm missing some info, but it seems so when looking at possible digital playback and 19200 baud.)

#9 MARIO130XE OFFLINE  

MARIO130XE

    Chopper Commander

  • 164 posts
  • Location:Germany

Posted Sat Jun 27, 2015 3:35 AM

wow, WOOOAAAHHH awesome!!!!



#10 Rybags ONLINE  

Rybags

    Quadrunner

  • 15,164 posts
  • Location:Australia

Posted Sat Jun 27, 2015 3:42 AM

The 19.2 k isn't related to scanline frequency - I was only toying with that idea as it's the default SIO rate (actually it's a bit less?) and IIRC the SID emulation is oriented towards one sample every 2 scanlines (?)

 

Every chance higher rates might be possible - probably a case of sacrificing fidelity with sound in doing so.
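
A rough sketch of how often a byte would arrive at that speed, measured in scan lines. It assumes 10 bits on the wire per byte (1 start, 8 data, 1 stop), the nominal 19200 baud rather than the slightly lower real rate, and about 15,600 scan lines per second; the function names are made up for illustration.

```python
# Assumptions, not taken from the thread's source code:
BITS_PER_BYTE = 10        # 1 start bit + 8 data bits + 1 stop bit
LINE_RATE_HZ = 15_600     # approximate scan lines per second


def bytes_per_second(baud: float) -> float:
    """Serial payload rate for a given baud rate."""
    return baud / BITS_PER_BYTE


def scanlines_per_byte(baud: float, line_rate_hz: float = LINE_RATE_HZ) -> float:
    """How many scan lines pass between two incoming bytes."""
    return line_rate_hz / bytes_per_second(baud)


print(bytes_per_second(19_200))
print(scanlines_per_byte(19_200))
```

With these numbers a byte arrives about once every eight scan lines, so bumping playback by one scan line per byte would disturb roughly one sample slot in eight.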



#11 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sat Jun 27, 2015 4:57 AM

thought of converting the source to MADS but it seems more difficult than thought... ;) strange assembler format :D



#12 Mclaneinc OFFLINE  

Mclaneinc

    River Patroller

  • 4,968 posts
  • Location:Northolt, UK

Posted Sat Jun 27, 2015 5:25 AM

Wow, is that my unmodified XL doing that?

 

That is awesome...

 

Totally well done Ivo...

 

And thanks for the updated Commando, Philsan... (edit: oops, it was in the Ivo file)

 

Ta muchly to all...


Edited by Mclaneinc, Sat Jun 27, 2015 5:42 AM.


#13 ivop OFFLINE  

ivop

    Moonsweeper

  • Topic Starter
  • 435 posts
  • Location:The Netherlands

Posted Sat Jun 27, 2015 7:55 AM

Some more info I probably should have put in the first post :)

 

Replay rate is 15.6 kHz, just like version 3 (version 2 was 7.8 kHz).
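
For reference, both rates fall out of playing one sample every one or two scan lines. The sketch below assumes 114 machine cycles per scan line and the nominal PAL and NTSC CPU clocks; `replay_rate_hz` is a hypothetical helper, not part of the player.

```python
# Assumptions: nominal CPU clocks and 114 machine cycles per scan line.
CYCLES_PER_SCANLINE = 114
CPU_CLOCK_HZ = {"PAL": 1_773_447, "NTSC": 1_789_773}


def replay_rate_hz(system: str, lines_per_sample: int = 1) -> float:
    """Sample rate when one sample is output every `lines_per_sample` scan lines."""
    return CPU_CLOCK_HZ[system] / (CYCLES_PER_SCANLINE * lines_per_sample)


for system in ("PAL", "NTSC"):
    every_line = replay_rate_hz(system, 1)
    every_other = replay_rate_hz(system, 2)
    print(f"{system}: {every_line:.0f} Hz per line, {every_other:.0f} Hz per two lines")
```

On a PAL machine this gives about 15.6 kHz and 7.8 kHz; NTSC comes out slightly higher.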

 

The extra cycles were saved by doing a single INC IRQEN (an RMW instruction) to clear and reset the timer 1 interrupt bit.

Also, I went back to a single channel, which indeed does slightly degrade the sound quality, but as Philsan said, imho that's acceptable if it leaves more CPU time for other things (like a Pokey player, PMG based scroller, or perhaps a SIO loader).

 

To reply to emkay why it actually saves time to add the channels instead of storing them to Pokey directly:

version 3:

lda $1234
sta audc1
lda $5678
sta audc2
lda $9abc
sta audc3

24 cycles


version 4:

lda $1234
clc
adc $5678
adc $9abc
sta audc1

18 cycles
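
The two totals can be checked by adding up standard 6502 timings for the addressing modes involved. This is only a tally of the snippets as posted, assuming absolute addressing with no page-crossing penalty; the table and names are illustrative.

```python
# Standard 6502 cycle counts for the instructions used in the two snippets
# (absolute addressing, no indexing, so no page-crossing penalty).
CYCLES = {
    ("lda", "abs"): 4,
    ("adc", "abs"): 4,
    ("sta", "abs"): 4,
    ("clc", "imp"): 2,
}

# Version 3: load each voice and store it to its own AUDC register.
V3 = [("lda", "abs"), ("sta", "abs")] * 3

# Version 4: sum the three voices and store once.
V4 = [
    ("lda", "abs"),
    ("clc", "imp"),
    ("adc", "abs"),
    ("adc", "abs"),
    ("sta", "abs"),
]


def total_cycles(sequence):
    """Sum the cycle cost of a list of (mnemonic, addressing mode) pairs."""
    return sum(CYCLES[instruction] for instruction in sequence)


print(total_cycles(V3), total_cycles(V4))
```

The saving comes from replacing two 4-cycle stores with a single 2-cycle `clc`: three loads and three stores become three reads, one `clc` and one store.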

Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.

The tables in v4 are slightly adapted. Its range is now 0-5 instead of 0-7, which is why the quality is a little less. Luckily, the SID chip has three channels, which means that adding three values in the range of $10-$15 gives a result in the range of $30-$3f which is still volume-only :D
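
A small sketch of that range argument, assuming AUDC bit 4 ($10) is the volume-only flag and that each table entry is $10 plus a level. It also shows what an extra +1 from a stale carry would do at the top of the range; the post does not say whether that is the full explanation for the beep, so treat that part as a guess.

```python
from itertools import product

VOLUME_ONLY_BIT = 0x10  # AUDC bit 4: volume-only (forced output) mode


def mixed_audc(levels, base=0x10, carry_in=0):
    """Value written to AUDC1 when three table entries (base + level) are added."""
    return sum(base + level for level in levels) + carry_in


def is_volume_only(value: int) -> bool:
    """True if the volume-only bit is still set in the summed value."""
    return bool(value & VOLUME_ONLY_BIT)


# Every combination of three voices with levels 0-5 (the v4 table range).
v4_sums = [mixed_audc(combo) for combo in product(range(6), repeat=3)]
print(hex(min(v4_sums)), hex(max(v4_sums)))
print(all(is_volume_only(v) for v in v4_sums))

# Levels 0-7 (the v3 range) would overflow past $3F and clear bit 4.
print(hex(mixed_audc((7, 7, 7))), is_volume_only(mixed_audc((7, 7, 7))))

# A stale carry (no CLC) adds one; at full volume that also clears bit 4.
print(hex(mixed_audc((5, 5, 5), carry_in=1)))
```

So 0-5 is the widest per-voice range for which three summed voices can never leave the $30-$3F window.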

 

As for the funny assembler format, basically, the source is a Unix shell script (works with zsh, bash, ksh).

 

Thanks for the feedback,

Ivo

 



#14 emkay OFFLINE  

emkay

    Quadrunner

  • 8,750 posts
  • What's up?
  • Location:Holy Grail ;)

Posted Sat Jun 27, 2015 8:12 AM

Sadly, the clc cannot be skipped. It'll start playing a 7.8kHz beep if you omit it.
The tables in v4 are slightly adapted. Its range is now 0-5 instead of 0-7, which is why the quality is a little less. Luckily, the SID chip has three channels, which means that adding three values in the range of $10-$15 gives a result in the range of $30-$3f which is still volume-only :D


Well, sometimes doing less is more ;)
The results show it's useful.



Are you interested in plugging the emulation into the SIO loader ?
Such stuff is exactly missing ;)

#15 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sat Jun 27, 2015 8:42 AM

I would die for a SIO SID loader ;) even if people say there's no need for track loaders anymore...

Ivop any chance of MADS format? Or should I try to convert myself? But
I guess each Sid needs not be run through the converter...

#16 ivop OFFLINE  

ivop

    Moonsweeper

  • Topic Starter
  • 435 posts
  • Location:The Netherlands

Posted Sat Jun 27, 2015 8:58 AM

I have been thinking of changing the source format to a more reasonable format, but have been putting it off every time because of the work involved :)  This whole project started out as a testcase for shasm65, which in itself was just a fun project to see if it could be done (i.e. an assembler as a shell script).

 

Heaven, if you want to convert it yourself, go ahead. It'll probably help if you have an editor which has syntax highlighting for (ba)sh. Suddenly it becomes a lot more readable :D I remember that Tezz wanted to do something similar. Perhaps some work has already been done in that direction?

 

I have never written any polled SIO related code, so I'm not sure if I'm the right person to try combining the two. Also, I'm working on two "new graphics mode" projects at the moment :)

 

Edit: a short "manual" on how to get a SID converted is in the sid2gumby thread here on AtariAge. Once converted, the resulting binary works with v3, v4 and sid2gumby.


Edited by ivop, Sat Jun 27, 2015 8:59 AM.


#17 Irgendwer OFFLINE  

Irgendwer

    Stargunner

  • 1,201 posts
  • Location:Germany

Posted Sat Jun 27, 2015 9:45 AM


version 4:

lda $1234
clc
adc $5678
adc $9abc
sta audc1

18 cycles

 

Thanks for the insight. Do you really use absolute, non-ZP addressing here? Could you easily change this routine to use self-modifying-code like this:

 

lda #BYTELOC1234
clc
adc #BYTELOC5678
adc #BYTELOC9ABC
sta audc1

 

to go down to 12 cycles?



#18 ivop OFFLINE  

ivop

    Moonsweeper

  • Topic Starter
  • 435 posts
  • Location:The Netherlands

Posted Sat Jun 27, 2015 10:22 AM

Here's the core of the irq routine (skipped code duplication for clarity):

    .org 0x0000 $tempzp

L irq
    sta.z $saveA        # 6 + 3 = 9

count_lsb_v1=$(($_here+1))
    lda. 0                  # 2
freq_lsb_v1=$(($_here+1))
    adc. 0                  # 2
    sta.z $count_lsb_v1     # 3
    lda.z $count_msb_v1     # 3
freq_msb_v1=$(($_here+1))
    adc. 0                  # 2
    sta.z $count_msb_v1     # 3
                            # ---> 15
### REPEAT THE ABOVE TWO TIMES FOR SECOND AND THIRD CHANNEL

count_msb_v1=$(($_here+1))
table_msb_v1=$(($_here+2))
    lda $silence            # 4
    clc                     # 2
count_msb_v2=$(($_here+1))
table_msb_v2=$(($_here+2))
    adc $silence            # 4
count_msb_v3=$(($_here+1))
table_msb_v3=$(($_here+2))
    adc $silence            # 4
    sta $AUDC1              # 4
                            # ---> 18

    inc $IRQEN              # 4

saveA=$(($_here+1))
    lda. 0              # 2
    rti                 # 6
                        # ---> 8

                        # total: 9+3*15+18+4+8 = 84

Names starting with a $ are labels, not hex values. Hex values are written as 0x...., similar to the C programming language.

Mnemonics with a . (dot/period) added are immediate, with .z are zero page. The whole routine runs in zero page.

freq_msb_* and table_msb_* are modified by the sid emulation/softsynth that runs once per frame. All the _here+1 stuff is similar to *+1 in other assemblers. It's all self modifying code.
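
For readers who do not want to decode the self-modifying code, here is a plain model of what one IRQ does, as I read the listing: each voice has a 16-bit phase counter, the frequency is added every sample, and the counter's high byte indexes a 256-byte table whose page is picked by the once-per-frame softsynth. The class, the table contents and `freq_word` are assumptions made for illustration, and the carry left over between additions is ignored.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 15_600  # replay rate quoted in the thread


def freq_word(tone_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> int:
    """16-bit phase increment for a tone (hypothetical helper)."""
    return round(tone_hz * 65536 / sample_rate_hz) & 0xFFFF


@dataclass
class Voice:
    freq: int            # 16-bit increment, patched once per frame
    table: List[int]     # 256 entries in the range $10-$15
    count: int = 0       # 16-bit phase counter

    def step(self) -> int:
        """Advance the phase by one sample and return the table entry."""
        self.count = (self.count + self.freq) & 0xFFFF
        return self.table[self.count >> 8]


def irq_tick(voices: List[Voice]) -> int:
    """One timer IRQ: advance all voices and return the value for AUDC1."""
    return sum(voice.step() for voice in voices)


silence = [0x10] * 256
square = [0x15 if i < 128 else 0x10 for i in range(256)]

voices = [Voice(freq_word(7800), square), Voice(0, silence), Voice(0, silence)]
outputs = [irq_tick(voices) for _ in range(4)]
print([hex(v) for v in outputs])
```

In the real routine the "table" and "index" are simply the high and low bytes of the operand of each `lda`/`adc $silence` instruction, which is why no index register is needed.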

 

I don't see how I could use immediate loads as those values have to come from tables at some point. But perhaps you can think of a way to speed this up even more? :)

 

Side note: saving and restoring the accumulator could be omitted if one was to write a player routine directly for the softsynth engine, removing the need for sid register emulation, and only use the X and Y register :D

 



#19 phaeron OFFLINE  

phaeron

    River Patroller

  • 2,251 posts
  • Location:USA

Posted Sat Jun 27, 2015 2:45 PM

I was able to squeeze 4-channels into an IRQ-based player at 15.7KHz once by only updating phase for one channel at a time in round-robin fashion, i.e. 1 -> 2 -> 3 -> 4 -> 1. In the other three phases, the MSB of the phase was projected by a multiple of the MSB of the increment. It gives up to 3/256 phase error for 3/4 samples, but that's not too audible with 4-bit samples. This is one of the IRQ routines:

.proc irq1
    sta asave       ;3

    ldy phase1hi    ;3
    lda wavtab,y    ;4+1
voltab1 = *-1
    sta audc1       ;4

    ldy phase2hi    ;3
    lda wavtab,y    ;4+1
voltab2 = *-1
    sta audc2       ;4
    
    ldy phase3hi    ;3
    lda wavtab,y    ;4+1
voltab3 = *-1
    sta audc3       ;4
    
    ldy phase4hi    ;3
    lda wavtab,y    ;4+1
voltab4 = *-1
    sta audc4       ;4

    asl irqen       ;6
    lda #0          ;2
phase1lo = *-1
    adc #0          ;2
freq1lo = *-1
    sta phase1lo    ;3
    lda #0          ;2
phase1hi = *-1
    adc #0          ;2
freq1hi = *-1
    sta phase1hi    ;3
    mva #irq2 $fffe ;6
    lda #0          ;2
asave = *-1
    rti             ;6
.endp

The main downside is that it requires a lot of zero page and twice as much storage for the samples, since they need two pages instead of one per volume level. Cost including interrupt overhead for 4 channels is 88-92 cycles. Going down to 3 channels would reduce to 77-80 cycles, and accumulating to just AUDC1 instead of AUDC1-3 would bring it down further to 69-71. Note that in this player the main routine was constrained not to use the Y register, so adding save/restore for that would cost 5 cycles.
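
The 3/256 error bound can be checked with a small model. This is my reading of the scheme, not the original player: a channel's 16-bit phase is only really updated every fourth sample (by four increments at once), and for the three samples in between the table index is projected as the phase high byte plus k times the increment's high byte. All names are hypothetical.

```python
def exact_index(phase: int, freq: int, k: int) -> int:
    """Table index if the full 16-bit phase were advanced k times."""
    return ((phase + k * freq) >> 8) & 0xFF


def projected_index(phase: int, freq: int, k: int) -> int:
    """Table index using only the high bytes, ignoring the low-byte carries."""
    return ((phase >> 8) + k * (freq >> 8)) & 0xFF


def max_projection_error(freq: int, updates: int = 1024) -> int:
    """Worst index error (in 1/256ths of a cycle) over many round-robin turns."""
    phase = 0
    worst = 0
    for _ in range(updates):
        for k in (1, 2, 3):  # the three samples between real updates
            error = (exact_index(phase, freq, k) - projected_index(phase, freq, k)) & 0xFF
            worst = max(worst, error)
        phase = (phase + 4 * freq) & 0xFFFF  # the real update, every 4th sample
    return worst


for freq in (0x0200, 0x00FF, 0x01FF, 0x1234):
    print(hex(freq), max_projection_error(freq))
```

The error is the carry out of the low bytes, at most (255 + 3 × 255) / 256 rounded down, which is 3. Because the projected index can run past 255, the sample data has to extend into a second page, which matches the storage cost mentioned above.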

 

 



#20 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sun Jun 28, 2015 12:39 AM

Ivop... basically, what is the way to go to get a self-composed SID onto the A8 then?



#21 snicklin OFFLINE  

snicklin

    River Patroller

  • 2,066 posts
  • Location:Australia

Posted Sun Jun 28, 2015 12:51 AM

Ta muchly to all...

 

:) Just remember that this is an international website!


Edited by snicklin, Sun Jun 28, 2015 12:52 AM.


#22 emkay OFFLINE  

emkay

    Quadrunner

  • 8,750 posts
  • What's up?
  • Location:Holy Grail ;)

Posted Sun Jun 28, 2015 1:27 AM

Ivop... basically, what is the way to go to get a self-composed SID onto the A8 then?


Use a PC tracker... Goat Tracker, for example, then import the tune to the player format?
It's somehow the missing "digi MOD Tracker" for the A8 then ;)



It would be interesting to check if the replay could use 15.6 kHz during SIO. 7.8 kHz will work for sure with DMA on. When the DMA is off, SIO gets even more time. As SIO is using POKEY, granting the timer handling, it shouldn't interfere...

#23 pirx OFFLINE  

pirx

    Moonsweeper

  • 371 posts
  • Location:Poland

Posted Sun Jun 28, 2015 2:09 AM

The source is still in my weird shasm65 format, as I based this on my previous code, but it should be fairly readable :grin:

 

Oh man, this is fokking brilliant - it is almost like an assembler that assembles assemblies. You Prince of assemblages! Me bows in awe.



#24 Mclaneinc OFFLINE  

Mclaneinc

    River Patroller

  • 4,968 posts
  • Location:Northolt, UK

Posted Sun Jun 28, 2015 5:39 AM

 

:) Just remember that this is an international website!

 

Yes indeed :)

 

Changed to

 

Thank you all :)



#25 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 10,338 posts
  • Location:Baden-Württemberg, Germany

Posted Sun Jun 28, 2015 6:18 AM

I still don't get the workflow for getting a SID file converted.



