Digital Sounds by Tone Channel Volume Variations

newcoleco · December 12, 2009

The concept of using volume changes to produce digital sounds, instead of using volume only for tones and noise, is not new. It was used many times in the past on various vintage computers and consoles. Here, I'm gonna talk about technical information, a few ideas and audio samples based on my experiments with a real ColecoVision. Please note that the difference between PAL and NTSC doesn't matter because it's essentially about CPU and I/O port speeds, not about the internal clock frequency.

Side note: In 2009, as you probably know, I worked on a new devkit for SDCC, based first on Hi-Tech C, then I worked on personal projects and videos, and now I'm working on digital sounds. As you can see, even if I'm not releasing many Coleco games, I'm kinda very productive.

* WHY TALK ABOUT DIGITAL SOUNDS? *

First of all, I know about this subject; I used digital sounds in a ColecoVision project titled Jeepers Creepers, and in a few demo programs. Second, the major problem is not playing a digital sound but storing it; a high-quality digital sound needs a lot more space than a low-quality one.

So, in the quest for a good compromise between quality and quantity, I did a few experiments and I want to share them with you. Don't expect miracle results; we are talking here about kilobytes, not megabytes. But first, let's talk about the sound chip.

* VOLUME VS ATTENUATION *

You should know that inside the sound chip, the circuit is made to attenuate or mute the signal of each channel. So, we are not really talking about volume but about attenuation: the less a sound is attenuated, the louder it is. Let's say that the volume range is between 0 (mute) and 1 (full).

This is the correlation between attenuation and perceived volume, based on the technical information for the Texas Instruments SN76489A(N) sound chip. The first value is the attenuation value used by the sound chip; the second number is the volume I was talking about: 0 = muted, 1 = full volume.

0 -> 1.000000000000000

1 -> 0.794328234724282

2 -> 0.630957344480193

3 -> 0.501187233627272

4 -> 0.398107170553497

5 -> 0.316227766016838

6 -> 0.251188643150958

7 -> 0.199526231496888

8 -> 0.158489319246111

9 -> 0.125892541179417

10 -> 0.100000000000000

11 -> 0.079432823472428

12 -> 0.063095734448019

13 -> 0.050118723362727

14 -> 0.039810717055350

15 -> 0.000000000000000

As you can see, the resulting volume is not evenly distributed, and there is a big gap between attenuation 14 and 15 compared to the gap between 13 and 14, because 15 is a mute state instead of just a stronger attenuation. So, we have to deal with this situation.
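The values in the table above match a step of 2 dB per attenuation level, with 15 treated as silence. Here is a minimal Java sketch that reproduces them under that assumption; the class and method names are only for illustration.

```java
// Sketch: relative output level for each SN76489 attenuation value.
// Assumption: 2 dB per step, and value 15 is a true mute.
final class AttenuationLevels {
    /** Returns the relative level (0 = muted, 1 = full) for attenuation 0..15. */
    static double level(int attenuation) {
        if (attenuation < 0 || attenuation > 15) {
            throw new IllegalArgumentException("attenuation must be 0..15");
        }
        if (attenuation == 15) {
            return 0.0; // 15 mutes the channel instead of attenuating by 30 dB
        }
        // 2 dB per step as an amplitude ratio: 10^(-2n/20) = 10^(-n/10)
        return Math.pow(10.0, -attenuation / 10.0);
    }

    public static void main(String[] args) {
        for (int a = 0; a <= 15; a++) {
            System.out.printf("%2d -> %.15f%n", a, level(a));
        }
    }
}
```

Without the special case for 15, the formula would give about 0.0316 instead of 0, which is why the last gap is bigger than the others.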

!WARNING! From now on, instead of 10 to 15, I'll use A to F, the hexadecimal notation. I'll also talk about bytes, bits and nibbles, where 4 bits are a nibble and 2 nibbles are 1 byte.

In Jeepers Creepers, the loud yells and laughs are digital sounds, played at a low rate and quality, based on this attenuation table, where all 3 tone channels play the same "attenuation variations" at once. The exact same technique is used in Squish'em Sam and Sewer Sam. Note: we can't use the noise channel to produce digital sounds.
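To make "the same attenuation on all 3 tone channels" concrete, here is a small Java sketch that builds the three command bytes for the sound chip. The byte layout (latch bit, 2 channel bits, 1 type bit, 4 data bits) comes from the SN76489 datasheet; the class and method names are hypothetical, and the actual port write on the console is left out.

```java
// Sketch: attenuation command bytes for the SN76489.
// Names are illustrative; writing the bytes to the sound port is not shown.
final class AttenuationCommands {
    /** Byte that sets `attenuation` (0..15) on `channel` (0..2 = tones, 3 = noise). */
    static int attenuationByte(int channel, int attenuation) {
        if (channel < 0 || channel > 3 || attenuation < 0 || attenuation > 15) {
            throw new IllegalArgumentException("channel 0..3, attenuation 0..15");
        }
        // Bit 7 = latch, bits 6-5 = channel, bit 4 = 1 for attenuation, bits 3-0 = value.
        return 0x90 | (channel << 5) | attenuation;
    }

    /** The three bytes that put the same attenuation on all tone channels. */
    static int[] sameOnAllTones(int attenuation) {
        return new int[] {
            attenuationByte(0, attenuation),
            attenuationByte(1, attenuation),
            attenuationByte(2, attenuation)
        };
    }

    public static void main(String[] args) {
        for (int b : sameOnAllTones(0x0E)) {
            System.out.printf("$%02X ", b);
        }
        System.out.println();
    }
}
```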

* GETTING MORE NUANCES *

Only 16 possible attenuation values (15 levels plus mute) don't give many possibilities; it's like a strange 4-bit quality. How to get more nuances? Well, by combining different attenuations on each channel (tone channels only). A table based on this idea follows, but keep in mind that it's not the only possible solution.

This 46-entry attenuation table reduces the gaps and gives a pretty good result. The first 3 characters are the attenuation values for each channel, written in hexadecimal; the number between 0 and 1 is the resulting output volume (0 = muted, 1 = loudest).

F F F -> 0.000000000000000

E F F -> 0.013270239018450

E E F -> 0.026540478036900

E E E -> 0.039810717055350

D E E -> 0.043246719157809

D D E -> 0.046682721260268

D D D -> 0.050118723362727

C D D -> 0.054444393724491

C C D -> 0.058770064086255

C C C -> 0.063095734448019

B C C -> 0.068541430789489

B B C -> 0.073987127130959

B B B -> 0.079432823472428

A B B -> 0.086288548981619

A A B -> 0.093144274490809

A A A -> 0.100000000000000

9 A A -> 0.108630847059806

9 9 A -> 0.117261694119611

9 9 9 -> 0.125892541179417

8 9 9 -> 0.136758133868315

8 8 9 -> 0.147623726557213

8 8 8 -> 0.158489319246111

7 8 8 -> 0.172168289996370

7 7 8 -> 0.185847260746629

7 7 7 -> 0.199526231496888

6 7 7 -> 0.216747035381578

6 6 7 -> 0.233967839266268

6 6 6 -> 0.251188643150958

5 6 6 -> 0.272868350772918

5 5 6 -> 0.294548058394878

5 5 5 -> 0.316227766016838

4 5 5 -> 0.343520900862391

4 4 5 -> 0.370814035707944

4 4 4 -> 0.398107170553497

3 4 4 -> 0.432467191578089

3 3 4 -> 0.466827212602681

3 3 3 -> 0.501187233627272

2 3 3 -> 0.544443937244913

2 2 3 -> 0.587700640862553

2 2 2 -> 0.630957344480193

1 2 2 -> 0.685414307894889

1 1 2 -> 0.739871271309585

1 1 1 -> 0.794328234724282

0 1 1 -> 0.862885489816188

0 0 1 -> 0.931442744908094

0 0 0 -> 1.000000000000000

It's still not evenly distributed, but the noisy glitches due to the lack of nuances are almost gone. The question now is how to encode these attenuations without coding 3 nibbles each time. By using a dictionary, of course. We have 46 entries in the dictionary, and index values 0 to 63 fit in 6 bits, so we can think of this solution as a 6-bit quality.
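The 46 entries follow a regular pattern: start from F F F, then lower one channel at a time until all three sit at the next level. Each output value in the table equals the average of the three channel levels. A Java sketch of such a dictionary, assuming the 2 dB per step formula and using illustrative names, could look like this:

```java
// Sketch: build the 46-entry dictionary of three-channel attenuation combos.
// Assumptions: 2 dB per step, 15 = mute, output = mean of the three channels.
final class CombinedLevels {
    static double level(int att) {
        return att == 15 ? 0.0 : Math.pow(10.0, -att / 10.0);
    }

    /** Returns 46 rows {a1, a2, a3}, ordered from silence (F F F) to loudest (0 0 0). */
    static int[][] buildTable() {
        int[][] table = new int[46][];
        int n = 0;
        table[n++] = new int[] {15, 15, 15};
        for (int a = 14; a >= 0; a--) {
            table[n++] = new int[] {a, a + 1, a + 1}; // one channel moved up
            table[n++] = new int[] {a, a, a + 1};     // two channels moved up
            table[n++] = new int[] {a, a, a};         // all three at the new level
        }
        return table;
    }

    /** Combined output level of one row, between 0 and 1. */
    static double output(int[] row) {
        return (level(row[0]) + level(row[1]) + level(row[2])) / 3.0;
    }

    public static void main(String[] args) {
        for (int[] row : buildTable()) {
            System.out.printf("%X %X %X -> %.15f%n", row[0], row[1], row[2], output(row));
        }
    }
}
```

The row index (0 to 45) is the 6-bit value to store; the player only needs the table of rows, not the floating-point levels.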

* HOW TO ENCODE THE 6-BIT DATA? *

The first solution we can think of is to store four 6-bit values in 3 bytes and use bit manipulations to extract each value. This technique simply stores RAW data in as little space as possible. Let's suppose a 2-second audio sample at 22 kHz, which means approximately 44000 attenuations. This gives a total of 33000 bytes at 3 bytes for every 4 attenuations, and remember that a regular ColecoVision game is 32K maximum (32768 bytes). As you can see, we can't really use this solution as is; we have to use it with fewer attenuations, like 2 seconds at 11 kHz or 1 second at 22 kHz.

So, instead of encoding RAW data, we could think of compression techniques. But remember that the ColecoVision doesn't have enough RAM to decompress the data first and play it afterwards; it has to play the data while decompressing it, so we should not use complex compression methods.
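Here is a sketch of the RAW packing on the conversion-tool side, in Java. The bit order (most significant bit first, values packed back to back) is an assumption, since any order works as long as the player agrees with it; the names are illustrative.

```java
// Sketch: pack 6-bit values back to back (4 values in 3 bytes), MSB first.
// The bit order is an assumption; the player must use the same one.
final class SixBitPacking {
    static byte[] pack(int[] values) {
        byte[] out = new byte[(values.length * 6 + 7) / 8];
        int bitPos = 0;
        for (int v : values) {
            for (int b = 5; b >= 0; b--) {
                if (((v >> b) & 1) != 0) {
                    out[bitPos >> 3] |= (byte) (0x80 >> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    static int[] unpack(byte[] data, int count) {
        int[] out = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < 6; b++) {
                int bit = (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
                bitPos++;
            }
            out[i] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        // 2 seconds at 22 kHz: about 44000 values
        System.out.println(pack(new int[44000]).length + " bytes");
    }
}
```

The size check in `main` gives the 33000 bytes mentioned above, which is already more than a 32768-byte cartridge before adding the player.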

* A LOSSY COMPRESSION STRATEGY *

!WARNING! I don't know if you learned this in math class, but for me, a delta sign represents a difference, a step, between 2 values. So, sometimes I'll use the word delta instead of difference or step.

A lossy compression encodes only part of the data, getting rid of details through transformations or other means. By encoding only the 5 high bits of each 6-bit value, we get rid of the least important bit. Or, knowing that the table has only 46 entries, we can keep only the 32 most interesting values by cutting the extremities, where the big gaps are. Or, we can encode only the differences between successive 6-bit values, and limit the possibilities to only 32 delta values, again getting a 5-bit encoding.

The following is an example to show what I mean by compression with DELTAs:

TABLE OF SELECTED DELTAs

0 : add 0

1 : add 1

2 : add 4

3 : add 7

4 : add 10

5 : add 13

6 : add 16

7 : add 19

8 : sub 19

9 : sub 16

A : sub 13

B : sub 10

C : sub 7

D : sub 4

E : sub 1

F : Read a 6-bit value.

Let's suppose we want to compress the following data with a lossy compression involving delta :

$20 $2F $36 $39 $3A $3B $3A $39 $36 $2F $20 $11 $0A $07 $06 $07 $0A $11 $20

Let's encode it, step by step, with the table of selected deltas (only 16 for this example).

F $20 : read $20

6 : add 16 => $30 (error = +1)

3 : add 7 => $37 (error = +1)

1 : add 1 => $38 (error = -1)

1 : add 1 => $39 (error = -1)

1 : add 1 => $3A (error = -1)

0 : add 0 => $3A

E : sub 1 => $39

D : sub 4 => $35 (error = -1)

C : sub 7 => $2E (error = -1)

A : sub 13 => $21 (error = +1)

9 : sub 16 => $11

C : sub 7 => $0A

D : sub 4 => $06 (error = -1)

0 : add 0 => $06

1 : add 1 => $07

2 : add 4 => $0B (error = +1)

3 : add 7 => $12 (error = +1)

5 : add 13 => $1F (error = -1)

Let's compare :

ORIGINAL SIGNAL

$20 $2F $36 $39 $3A $3B $3A $39 $36 $2F $20 $11 $0A $07 $06 $07 $0A $11 $20

OUTPUT SIGNAL

$20 $30 $37 $38 $39 $3A $3A $39 $35 $2E $21 $11 $0A $06 $06 $07 $0B $12 $1F

ENCODED DATA

$F $20 $6 $3 $1 $1 $1 $0 $E $D $C $A $9 $C $D $0 $1 $2 $3 $5

ENCODED DATA, NIBBLES REARRANGED INTO BYTES

$F2 $06 $31 $11 $0E $DC $A9 $CD $01 $23 $5x
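The walkthrough above can be reproduced with a greedy encoder: for each sample, pick the delta that lands closest to it, starting from the value the decoder will have reconstructed (not from the original), so errors don't pile up. The Java sketch below is one way to do it and is not necessarily how my tool works; `MAX_ERROR`, the zero padding of the last nibble and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: greedy delta encoder/decoder for the 16-code table of the example.
final class DeltaCodec {
    // Codes 0..E map to these steps; code F announces a raw 6-bit value.
    static final int[] DELTAS = {0, 1, 4, 7, 10, 13, 16, 19, -19, -16, -13, -10, -7, -4, -1};
    static final int ESCAPE = 0xF;
    static final int MAX_ERROR = 2; // assumption: larger errors fall back to a raw value

    static List<Integer> encode(int[] samples) {
        List<Integer> codes = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < samples.length; i++) {
            int best = -1;
            int bestErr = Integer.MAX_VALUE;
            // The first sample has no predecessor, so it is always sent raw.
            for (int c = 0; c < DELTAS.length && i > 0; c++) {
                int next = current + DELTAS[c];
                int err = Math.abs(next - samples[i]);
                if (next >= 0 && next <= 63 && err < bestErr) {
                    best = c;
                    bestErr = err;
                }
            }
            if (best < 0 || bestErr > MAX_ERROR) {
                codes.add(ESCAPE);
                codes.add(samples[i]);
                current = samples[i];
            } else {
                codes.add(best);
                current += DELTAS[best]; // track what the decoder will hear
            }
        }
        return codes;
    }

    static int[] decode(List<Integer> codes) {
        List<Integer> out = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < codes.size(); i++) {
            int c = codes.get(i);
            if (c == ESCAPE) {
                current = codes.get(++i);
            } else {
                current += DELTAS[c];
            }
            out.add(current);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Packs codes as nibbles; a raw value after F takes two nibbles. Pads with 0. */
    static byte[] packNibbles(List<Integer> codes) {
        List<Integer> nibbles = new ArrayList<>();
        for (int i = 0; i < codes.size(); i++) {
            int c = codes.get(i);
            nibbles.add(c);
            if (c == ESCAPE) {
                int raw = codes.get(++i);
                nibbles.add(raw >> 4);
                nibbles.add(raw & 0x0F);
            }
        }
        byte[] out = new byte[(nibbles.size() + 1) / 2];
        for (int i = 0; i < nibbles.size(); i++) {
            out[i / 2] |= (byte) (nibbles.get(i) << (i % 2 == 0 ? 4 : 0));
        }
        return out;
    }
}
```

Fed with the original signal of the example, `encode` returns the same codes as the step-by-step list, and `decode` returns the output signal shown above.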

With 32 deltas instead of the 16 used in the example, we have 5-bit values to encode... packing them RAW (like in the example) is a solution, but it certainly needs a lot of bit manipulations to decode. Let's suppose again a 2-second audio sample at 22 kHz, which means approximately 44000 attenuations. This gives a total of 27500 bytes at 5 bytes for every 8 attenuations, and remember that a regular ColecoVision game is 32K maximum (32768 bytes).

* A LOSSLESS COMPRESSION STRATEGY *

How to keep all the data "as is" and still compress it? Well, if there are repetitions, there is a possibility of compression. RLE (Run-Length Encoding) is a very basic compression technique that simply says: repeat this a number of times, then repeat that another number of times, etc. A very simple form of LZSS (dictionary compression) could be used to repeat not only a single 6-bit value but a sequence of them. This of course needs more time and memory, so it can limit the playback speed, depending on how you implement the decoder (or player).

As for the RLE idea, you can use the 2 high bits of a byte for the number of repetitions (1 to 4) and the 6 low bits for the attenuation value, and get some compression compared to RAW data... but of course it depends on the data we want to encode.
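A sketch of this byte format in Java, with the run length stored as (repetitions - 1) in the 2 high bits. Storing the count minus one is my assumption for fitting 1 to 4 into 2 bits; the names are illustrative.

```java
import java.io.ByteArrayOutputStream;

// Sketch: one byte per run, 2 high bits = (run length - 1), 6 low bits = value.
final class TinyRle {
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < values.length) {
            int run = 1;
            while (run < 4 && i + run < values.length && values[i + run] == values[i]) {
                run++;
            }
            out.write(((run - 1) << 6) | (values[i] & 0x3F));
            i += run;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data) {
        int total = 0;
        for (byte b : data) {
            total += ((b & 0xFF) >> 6) + 1;
        }
        int[] out = new int[total];
        int n = 0;
        for (byte b : data) {
            int run = ((b & 0xFF) >> 6) + 1;
            for (int r = 0; r < run; r++) {
                out[n++] = b & 0x3F;
            }
        }
        return out;
    }
}
```

Each run costs 8 bits while a RAW value costs 6, so this format only saves space when the average run is longer than about 1.33 values.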

* A LOSSY AND LOSSLESS COMPRESSION STRATEGY *

Depending on what we want to do, the output of a lossy compression may be compressed again with a lossless compression, but this gives a two-level compression, which isn't good for the playback speed. However, this strategy has been used before, and not only for digital sounds and music.

Even if it's a two-level compression, it's the strategy I tried lately, with the following results.

A 3.8-second audio sample at 44 kHz, reduced to approximately 17.5 kHz (the speed limit of the player), and encoded with simple RLE in 3 bits and DELTAs in 5 bits. Total number of attenuations? Approximately 67000. ROM size including the player? Approximately 26000 bytes.

Sabrina - Boys Boys Boys (sndtest_sample.mp3)

ROM File (if you try with an emulator, try blueMSX at 79% CPU speed) : result_19.zip

Edited December 13, 2009 by newcoleco

youki · December 13, 2009

Very good result.

What software do you use to convert the sample rate?

I would need software that can convert to a very low rate, like 4-bit / 4 kHz.

newcoleco · December 13, 2009

Strange, I replied to Youki's question but somehow my message was lost in the process.

* TAKE 2 *

What I did is reduce the bitrate "on the fly" in my conversion tool written in Java, keeping the source audio file as a 16-bit signed stereo PCM WAV file. I used Java libraries and a bunch of my personal code based on what I know and what I needed. And that's pretty much it. Here are the imports:

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Date;
import java.util.Arrays;

import javax.sound.sampled.*;
import javax.sound.sampled.spi.FormatConversionProvider;

As for a tool to reduce the bitrate of an audio file, I remember once using "Cool Edit Pro", which was capable of doing this and let me save the result as "RAW data" instead of a complex audio file format. I'm not sure if you realize that a sample rate of about 4 kHz makes an audio file nearly unintelligible, since only frequencies up to half the sample rate survive (frequencies lost: 2001 Hz - 22000 Hz).
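To give an idea of what reducing "on the fly" can look like, here is a rough Java sketch. It is not my actual tool: it assumes raw 16-bit signed little-endian stereo frames (WAV header already skipped), an integer reduction factor, plain averaging as a crude low-pass filter, and a linear mapping of the sample range to 0..1. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: downmix stereo to mono, reduce the rate by an integer factor,
// and map each result to a level between 0 and 1.
final class PcmReducer {
    /** `pcm` holds 16-bit signed little-endian stereo frames (4 bytes per frame). */
    static double[] reduce(byte[] pcm, int factor) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        int frames = pcm.length / 4;
        double[] out = new double[frames / factor];
        for (int i = 0; i < out.length; i++) {
            long sum = 0;
            for (int f = 0; f < factor; f++) {
                sum += buf.getShort(); // left
                sum += buf.getShort(); // right
            }
            double mean = sum / (2.0 * factor);
            out[i] = (mean + 32768.0) / 65535.0; // -32768..32767 becomes 0..1
        }
        return out;
    }

    /** Index of the entry of `levels` closest to `value` (linear search). */
    static int nearestLevel(double value, double[] levels) {
        int best = 0;
        for (int i = 1; i < levels.length; i++) {
            if (Math.abs(levels[i] - value) < Math.abs(levels[best] - value)) {
                best = i;
            }
        }
        return best;
    }
}
```

With a 44 kHz source, a factor of 2 gives 22 kHz and a factor of 4 gives 11 kHz; `nearestLevel` then turns each value into an index in whatever level table the player uses.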

youki · December 14, 2009

Thanks for your answer.

Of course, the rate I gave was just an example; I just want a tool where I can choose what I want in terms of bits and frequencies.

I saw some VIC-20 demos that use digital sound and run on an unexpanded VIC-20. They are quite impressive and I love how the sound is rendered. Of course the sound is not clear at all, but combined with the music, the images, etc., I love the result. I guess it uses a very low sample rate and streams the data from a 1541.

For instance, in this demo (but there are a lot of others by PWP):

newcoleco · December 14, 2009

Sometimes, only 1 bit of volume information is enough.

Sometimes, a voice can be produced by playing notes. For example, in Coleco Reversi, you can hear "Nah! nah! nah!" if you try to play on a wrong space.

And there are other ways, like using a mathematical model of the human voice... and others I haven't even thought of.

As for the demo, I'm not sure which technique is used for the voice, but it streams data from a disk, which gives pretty much over 100K... they only used 16K for this robotic demo movie, and it seems they really pushed to make it fit in that size.

There are many ways... each time, the question is "what fits well for this project?"

Edited December 14, 2009 by newcoleco

youki · December 14, 2009

Here is a demo from the same author that uses only 5K, no disk drive; everything is in memory.

newcoleco · December 20, 2009

Well, I don't know if you have heard about E.S.S. or about the incredible work of Mozer on voice synthesis. In a few words, in the mid-80s, after 4 years of calculation and tests, a voice codec (vocoder) known today as Linear Predictive Coding (LPC) was licensed by ESS Technology and used as a single chip (in some voice modules) or as a software solution for video games of that time, including Impossible Mission ("ANOTHER VISITOR, STAY AWHILE, STAY FOREVER") and Ghostbusters ("GHOSTBUSTER HAHAHAHAHAAA! HE SLIMED ME!"). The result sounds robotic but clear, and can produce about 1 second of voice with less than 1 KB of data, which is 10 times less data than any codec I can imagine at the moment. I suspect a version adapted for the ColecoVision can be done, giving voice to future ColecoVision games without the need for an external module.

After LPC, CELP was developed, and the result, more natural, became a standard for streaming voice. Of course, better quality also means more data.

If you look for the keywords "commodore" and "speech box", you'll find more details about LPC and also a little interactive application that contains Commodore 64 voice samples based on the ESS technology.

* UPDATE *

I've just noticed that someone is trying to port Impossible Mission to the Oric computer and is examining the possibilities of adapting the C64 voice samples to his project.

http://forum.defence-force.org/viewtopic.php?t=499&highlight=speech&start=90

Edited December 20, 2009 by newcoleco

youki · December 21, 2009

Waoouuuh... Impossible Mission on Oric!!! I'm looking forward to trying it! I love the Orics. (I own 4!!)

Speaking about voice: do you know the adventure game "Le Manoir de Mortevielle"? It was a game originally born on the Sinclair QL, but later ported to the Atari ST, Amiga, PC, Amstrad CPC, C64 and Apple IIGS.

All versions, except the QL one (however, I'm not sure), included vocal synthesis; all the text (and there was a lot) was spoken in French (I don't know about the international version).

What is fantastic is that even on a PC XT with CGA and no sound card (just the beeper), you had the voice!

And also on the Amstrad CPC version, you had the voice, and the machine had only 64K.

If you have never tried that game, try it. The best version is the Atari ST one, but the CPC and PC versions are not bad, and I was really impressed to hear the voice clearly on the PC's beeper.

You can download the PC version freely from here (you will need DOSBox to run it): http://www.abandonware-france.org/ltf_abandon/ltf_jeu.php?id=107

PkK · December 25, 2009

What's the time scale here?

Philipp

newcoleco · December 26, 2009

It's taken from this ColecoVision sound experiment I did to check if I was right about the different levels of attenuation... I've lost the ROM file but it's essentially this:

cv_sound_attenuations.wav

It was done simply by setting all 3 tone channels to the same attenuation and using this pattern (remember that 15 is muted): 15, 14, 15, 13, 15, 12, 15, 11, 15, 10, 15, 9, 15, 8, 15, 7, 15, 6, 15, 5, 15, 4, 15, 3, 15, 2, 15, 1, 15, 0, 15.

The white noise in the background is just the static noise, of course. What is interesting is that the square wave is not square at all, because the signal tries to return to a "neutral state" all the time.

Edited December 26, 2009 by newcoleco
