Incidentally, when I got to emulating the pAPU (embedded in the NES 6502), I ran into the same crackling problems you are having. A big part of the problem was timing, plus a few issues with the standard circular buffer approach:
1) Timing, i.e. writing bytes to active parts of the buffer, and the buffer size itself. I got cleaner sound by creating an entirely new buffer each time and pointing to the new buffer. Also, if the code from the CPU (or pseudo CPU etc.) sets bits and bytes to change frequency, amplitude etc., then the real hardware probably doesn't change instantly from one waveform to another without a tiny bit of drop-off. I'm probably explaining this badly, so a simple example: imagine a square wave that suddenly changes to a sawtooth. Without some levelling / merging of the bytes around the change you will get an audio spike, because the output is half square, half sawtooth.
2) Changing from one waveform to another with an unclean join (the square-to-sawtooth case described in point 1).
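To make the "levelling / merge" idea in the two points above concrete, here is a minimal sketch of a linear crossfade between two waveforms. It is only an illustration, not what my emulator did and not a model of the real hardware: the function names, the 64-sample fade length and the floating-point sample format are all assumptions picked for the example.

```python
def square(phase):
    """Square wave for a phase in [0, 1): high for the first half, low for the second."""
    return 1.0 if phase < 0.5 else -1.0


def sawtooth(phase):
    """Sawtooth wave for a phase in [0, 1): ramps linearly from -1 up to +1."""
    return 2.0 * phase - 1.0


def render_with_crossfade(old_wave, new_wave, freq_hz, sample_rate,
                          total_samples, switch_at, fade_len):
    """Render samples that switch from old_wave to new_wave at index switch_at.

    The switch is blended linearly over fade_len samples (hypothetical
    parameter); fade_len=0 gives the hard cut that produces a spike.
    """
    out = []
    phase = 0.0
    step = freq_hz / sample_rate  # phase advance per output sample
    for n in range(total_samples):
        if n < switch_at:
            mix = 0.0                          # old waveform only
        elif n >= switch_at + fade_len:
            mix = 1.0                          # new waveform only
        else:
            mix = (n - switch_at) / fade_len   # partway through the fade
        out.append((1.0 - mix) * old_wave(phase) + mix * new_wave(phase))
        phase = (phase + step) % 1.0
    return out


# Same switch point, with and without the fade.
hard = render_with_crossfade(square, sawtooth, 100.0, 44100, 200, 30, 0)
soft = render_with_crossfade(square, sawtooth, 100.0, 44100, 200, 30, 64)
```

With the hard cut, the output jumps by well over half the full range in a single sample at the switch point, which is the kind of step that is heard as a click. With the fade, the same change is spread over 64 samples.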
I was not an expert on audio emulation at all though, so take what I am saying with a pinch of salt. Getting audio emulated perfectly is still a bit of a black art to me, and I gave up before I perfected audio in my NES emulator. I paid no attention to existing methods out there and just jumped in at the deep end: I used a circular buffer, created the standard waveforms (square, triangle etc.) and applied logic to change amplitude and frequency, then merged the separate audio channels over the top of each other using bit logic. It worked but wasn't great, and it lowered performance a lot. I guess a better way would have been to use a separate audio stream for each of the 4 channels the NES 6502 pAPU had and let DirectX merge them.
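For comparison with the bit-logic merge, here is a rough sketch of the more usual software approach: add the channels sample by sample and clamp the result. This is not what my code did, and it is not a model of the NES hardware mixer either. The function name and the signed 16-bit sample range are assumptions for the example.

```python
def mix_channels(channels, lo=-32768, hi=32767):
    """Mix equal-length channels by summing each sample position and clamping.

    channels is a list of lists of integer samples. lo and hi are the assumed
    output range (signed 16-bit here).
    """
    mixed = []
    for samples in zip(*channels):     # one tuple per sample position
        total = sum(samples)           # plain additive (linear) mix
        mixed.append(max(lo, min(hi, total)))
    return mixed


# Two hypothetical channels, two samples each.
quiet = mix_channels([[1000, -1000], [2000, -3000]])
loud = mix_channels([[30000], [30000]])   # the sum overflows, so it is clamped
```

Letting the audio API mix separate streams, as suggested above, pushes this summing step out of the emulator entirely.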
Edited by GadgetUK, Tue Mar 27, 2012 4:29 PM.