Speech Synthesis on VCS

Max-T · March 28, 2004

Can anyone give me a link to some information on how the programmers were able to make the 2600 say "Quadrun?" I remember thinking it was a feat of the black arts when I heard my NES exclaim "Double Dribble" and "Blades Of Steel", but it fascinates me that the Atari could be made to synthesize anything close to a spoken word.

From what I understand, the TIA chip is only able to produce square-wave or noise type sounds. How then was it pushed to produce a perfectly distinguishable (if aliased) word like "Quadrun?"

If only I could afford to own one of those coveted cartridges for myself...

StanJr · March 28, 2004

If you are impressed by Quadrun, you HAVE to get Berzerk VE. That has the best voice of ANY 2600 game (or hack).

:spidey:

Max-T · March 28, 2004

Thanks very much for the tip -- I thought that the only games for the VCS with voice were Quadrun and Open Sesame. I'll have to put Berzerk VE at the top of my shopping list. I appreciate it!

Max

StanJr · March 28, 2004

Berzerk VE is a hack of Berzerk and used to be available here at AA. Albert might still run you a copy off if you ask nicely, but I'm not 100% on that so don't come after me if he won't.

:spidey:

Albert · March 28, 2004

Here's the game in our store:

Berzerk: Voice Enhanced

This is easily the best voice I've heard on the 2600, although from my understanding it was easier to reproduce because it's more of a mechanical, robotic voice than a regular human voice. Still, sounds pretty damn good.

..Al

chairmonkey4406 · March 28, 2004

I downloaded the ROM for Quadrun, and I don't hear the voice. When is it supposed to say it?

bjk7382 · March 28, 2004

I downloaded the ROM for Quadrun, and I don't hear the voice. When is it supposed to say it?

Right after you hit the button to start the game.

SS · March 28, 2004

Here are some zipped WAV files of the speech from "Berzerk VE", "Quadrun", and "Open Sesame". Are there any other games that use voice?

open_sesame.zip

quadrun.zip

berzerk_ve.zip

kisrael · March 28, 2004

You know, that's a darn good question. I'm amazed at sampling in general...actually, even polyphonic sound is a mystery to me, along with any artificial way of "making" syllable sounds.

StanJr · March 28, 2004

those are the only 3 VCS games with voice....so far.....

:spidey:

Dan Iacovelli · March 28, 2004

If I remember correctly from the early days of the Stella list, somebody did program a demo called Stella Says and it had speech in it too.

I don't remember how they did it though.

Blackbird · March 29, 2004

I'd like to know what the process is to convert the audio. The only thing I've even heard mentioned is that you have to convert the file to 4-bit audio, and even doing that's tough. Compiling it in a ROM must be even harder...

+Andrew Davie · March 29, 2004

I'd like to know what the process is to convert the audio. The only thing I've even heard mentioned is that you have to convert the file to 4-bit audio, and even doing that's tough. Compiling it in a ROM must be even harder...

I did speech on the NES (Three Stooges) and the process for '2600 is exactly the same. Speech is just a series of volumes for the speaker, played in sequence very rapidly. Your ear probably can handle frequencies (= changes in volume) up to about 20kHz (ie: 20,000 changes per second). However, human speech, if I recall correctly, comes in at about 4kHz. So to reproduce speech, all you really need to be able to do is change the volume about 4,000 times per second.

Now consider a typical '2600 display; it is 262 lines at 60Hz. That's 15,720 scanlines per second. If you changed the volume on EVERY scanline, you could pretty much pump out sounds of frequencies almost up to the human hearing limit. But that's not really necessary. All you really need to do is change the volume (say) once every 2 or 4 scanlines, giving you a sample rate of very roughly 8kHz or 4kHz.
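To make the scanline arithmetic concrete, here is a minimal Python sketch. The 262-line, 60Hz figures come from the post above; the function name update_rate is made up for illustration and is not taken from any 2600 tool.

```python
# Timing figures quoted in the post: 262 scanlines per frame, 60 frames per second.
SCANLINES_PER_FRAME = 262
FRAMES_PER_SECOND = 60


def update_rate(scanlines_per_update: int) -> float:
    """Volume updates per second when writing once every N scanlines."""
    return SCANLINES_PER_FRAME * FRAMES_PER_SECOND / scanlines_per_update


for n in (1, 2, 4):
    print(f"every {n} scanline(s): {update_rate(n):.0f} updates per second")
```

Updating every 2 or 4 scanlines works out to 7,860 or 3,930 updates per second, which is where the "very roughly 8kHz or 4kHz" figures come from.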

But what is 'speech'? It's a sound, just like anything else, consisting purely of a series of "volumes" at varying frequencies, all mixed together. It's amazing that the ear/brain can distinguish individual elements of a series of sounds 'mixed' together, but it can.

To play speech, all you really need to do is find the sample you want to play (a recording of a famous speech, say). Then you figure your sample rate. That is, how many times per second you are going to change the volume on the '2600. We already figured 4kHz was sufficient. So 4,000 times per second, sample the volume of the recording, and save the volume readings to an array. When we play back those readings on the '2600, we hear the original recording!
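A rough Python sketch of that sampling step, under some loud assumptions: the "recording" is faked with a 440Hz sine tone at 44,100 readings per second, the helper names are invented for this example, and it just picks the nearest reading with no filtering (a real conversion tool would likely low-pass filter first).

```python
import math


def make_tone(freq_hz, rate_hz, seconds):
    """Stand-in for a recording: a sine tone as floats in -1.0..1.0."""
    count = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(count)]


def resample_nearest(samples, src_rate_hz, dst_rate_hz):
    """Take one reading of the source every 1/dst_rate_hz seconds."""
    count = int(len(samples) * dst_rate_hz / src_rate_hz)
    return [samples[int(i * src_rate_hz / dst_rate_hz)] for i in range(count)]


recording = make_tone(440, 44100, 0.5)           # half a second of "recording"
readings = resample_nearest(recording, 44100, 4000)
print(len(recording), "source readings ->", len(readings), "readings at 4kHz")
```

The readings list is the "array" described above: one volume reading for every 1/4000th of a second of the original.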

There's just one other issue to consider; the resolution of each sample. Typically, we can hear quite a range of volume -- from very quiet to very loud. Actually, it's a logarithmic scale. But the point is, when we take a sample, we are representing it with a number - if we used a single byte, we could represent volume from 0 to 255. Now the Atari doesn't have 8 bits of volume fidelity; it has just 4 bits. So when we sample, instead of ending up with a volume from 0-255, we need to shrink that range down to 0-15. Not high fidelity, but still does the job.

There are a few other complications, for example, we're not REALLY dealing with 0-255 as our range, but instead -128 to 127 (0 is the midpoint of the 256-value range). Sound is represented as 'vibration' or oscillation around a 0-point, with positive and negative values. But the fundamentals are the same; convert your sample into individual volume elements. Downsize to the resolution of your output method, and then play back the samples at the appropriate rate.
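The range-shrinking step can be sketched in a few lines of Python. This assumes signed 8-bit input samples and simply keeps the top 4 bits; the name to_4bit is hypothetical, and a real converter might round or dither instead of truncating.

```python
def to_4bit(sample: int) -> int:
    """Map a signed 8-bit sample (-128..127) to a 4-bit volume (0..15)."""
    if not -128 <= sample <= 127:
        raise ValueError("expected a signed 8-bit sample")
    unsigned = sample + 128   # re-centre: -128..127 becomes 0..255
    return unsigned >> 4      # keep the top 4 bits: 0..255 becomes 0..15


print([to_4bit(s) for s in (-128, -1, 0, 64, 127)])  # prints [0, 7, 8, 12, 15]
```

Note that silence (0 in the signed range) lands at 8, the middle of the 0-15 range, so the output still oscillates around a midpoint.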

Once you can play *a* digitised speech sample, you can play back pretty much *any* digitised speech sample. It's not that tricky, really!

Cheers

A

Cootster · March 29, 2004

www.gooddealgames.com has an interview with Steve Woita where he discusses the process . . . (Basically, just as Andrew described it) . . .

I've done it on the Apple via an old Compute (or possibly Byte) type-in as well. Does anybody know where to find that program, and whether or not it works in emulation?

StanJr · March 29, 2004

I did speech on the NES (Three Stooges) and the process for '2600 is exactly the same.
A

And the speech on Three Stooges is pretty darn good, I might add.

:spidey:

pmpddytim · March 29, 2004

Prompted by Andrew's post above, I must say that Andrew Davie is the man. Whatever I read of his (Programming for Newbies, etc.), no matter how complicated it is, it's always easy to understand. If he neglects to write a book, the world is missing out.

Now, if he could only explain quantum physics to me.

Tim

kisrael · March 29, 2004

Prompted by Andrew's post above, I must say that Andrew Davie is the man. Whatever I read of his (Programming for Newbies, etc.), no matter how complicated it is, it's always easy to understand. If he neglects to write a book, the world is missing out.

Yeah, Andrew is quite amazing. Although I'm proud of 2600 101 (currently undergoing a revision) my own meager knowledge of 6502 and TIA is always eclipsed by his. Hopefully I can persuade him and some of the other [stella]ites to put in some entries in my next project, the 2600 Cookbook...

Andrew, what is the status of your Newbie forum? It seems to be a bit neglected...(I know how easy it is to let that happen as real world timesucks make their demands...)

foxglove9 · March 29, 2004

I did speech on the NES (Three Stooges) and the process for '2600 is exactly the same.

I always felt the speech in 3 Stooges for NES is what made it fun to play. Good work!

foxglove9 · March 29, 2004

berzerk ve.zip - 66 KB
quadrun.zip - 15 KB

open sesame.zip - 9 KB

Those links aren't working for me (they download as a .php file), probably incompatible with my browser. Is there any way I can download them directly, or would someone want to email them to me?

Max-T · March 29, 2004

Mr. Davie, thanks very much!

I'll be spending the next couple of days attached to online technical journals, 'cause you really went above and beyond the call answering my puny little post. But seriously, I do thank you, and I appreciate the time you took to help out a newbie.

Max

Paul Slocum · March 29, 2004

If I remember correctly from the early days of the Stella list, somebody did program a demo called Stella Says and it had speech in it too.

I don't remember how they did it though.

That was Eckhard, and my understanding is that's the same code used in Berzerk VE. I'm considering using it for the RPG if we have space left over.

-paul

+Andrew Davie · March 29, 2004

Andrew, what is the status of your Newbie forum? It seems to be a bit neglected...(I know how easy it is to let that happen as real world timesucks make their demands...)

Well it's probably sufficient to point out that here it is currently 1:30 in the morning and I'm working on an important project and I will be waking up in another 6 hours or so and putting on a suit and tie and going to my "real" job and I'm likely to be doing that routine for a few months to come, at least.

I'm just too busy to do all the things I have on my plate. I'd really really love to turn the Newbie forum into an actual book. Somebody find me a publisher and I promise I'll do just that. I like to think that I'm able to explain things in a way that helps people understand.

On the subject of speech synthesis, it is helpful to think about what we are actually DOING when we are making sounds. The bottom line is that we cause our eardrums to vibrate. Nothing more, nothing less. The eardrum is a simple membrane... it vibrates quickly, slowly, with large or small intensity. And (for all intents and purposes) with that mechanism alone we hear every sound that you care to name.

So recreating speech or music is simply a matter of making the eardrum move in the same way as the original sound made the eardrum move. But no matter HOW complex the original sound, it was still moving the eardrum back and forward... at different speed, with different intensity... but still the same basic back and forward oscillation.

If we can reproduce that oscillation then we hear the original sound. Think about a speaker in a hi-fi system. That can reproduce incredibly complex sounds, speech, music. But ultimately, it's just a simple cardboard cone with a membrane vibrating back and forth and causing pressure waves in the air which in turn cause our eardrums to vibrate back and forth... just like the original sound.

So the Atari... just needs to control the speaker (the vibrating membrane) in the same way. To vibrate it back and forth, we can simply change the volume of the sound we're sending to it. Loud, soft, loud, soft, loud, soft... we have an oscillating membrane! Do that once and you hear a click. Do it quicker and you hear a low buzz. Do it a bit quicker and you hear a higher toned buzz. Quicker still and it's higher pitched 'note'. Do it with varying intensity, and you hear a warbling sound. Do it from a digitised speech sample and you hear... speech!
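The loud/soft idea can be sketched in Python as a list of volume values. The 15,720 updates-per-second figure assumes one volume write per scanline (262 lines at 60Hz, as mentioned earlier in the thread), and both function names are made up for this example.

```python
def square_wave(half_period, count, loud=15, soft=0):
    """Volume values that stay loud for half_period updates, then soft, repeating."""
    return [loud if (i // half_period) % 2 == 0 else soft for i in range(count)]


def tone_hz(update_rate_hz, half_period):
    """Pitch of the buzz: one full loud/soft cycle takes 2 * half_period updates."""
    return update_rate_hz / (2 * half_period)


print(square_wave(2, 8))   # prints [15, 15, 0, 0, 15, 15, 0, 0]
for half_period in (32, 8, 2):
    print(half_period, "updates per half cycle ->", tone_hz(15720, half_period), "Hz")
```

Shrinking half_period is the "do it quicker" step: the same loud/soft pattern repeats more often per second, so the pitch goes up.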

Let's throw a bit of theory at you while you're reading... given any waveform (ie: a recording of someone speaking), that waveform consists of various frequency components (remember, we already noted that the human ear can hear up to 20kHz). Some of the speech has 4kHz bits, some of it is at 8kHz, etc. To reliably reproduce the waveform from a sampled subset of the original data, Shannon's Theorem states that you need to sample at a rate at least twice the highest frequency component of the original. So if we decide that we want to reproduce sound with frequency up to 8kHz, then we'd need to sample at 16kHz (ie: have 16,000 samples per second) to accurately reproduce the original sample.
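A small Python sketch of why the "twice the highest frequency" rule matters. The tone frequencies and function names here are chosen purely for illustration: a 3kHz tone sampled at only 4kHz produces exactly the same readings (apart from sign) as a 1kHz tone, so the two cannot be told apart on playback.

```python
import math


def min_sample_rate(highest_freq_hz):
    """Lowest sample rate the rule of thumb allows for a given top frequency."""
    return 2 * highest_freq_hz


def sample_tone(freq_hz, rate_hz, count):
    """Readings of a sine tone taken rate_hz times per second."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


rate = 4000                          # too slow for a 3kHz tone (needs 6kHz)
high = sample_tone(3000, rate, 16)
low = sample_tone(1000, rate, 16)
aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
print(min_sample_rate(8000), aliased)   # prints 16000 True
```

This is the aliasing that makes undersampled speech sound rough: anything above half the sample rate folds back down as a lower, wrong frequency.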

Even at 4 bits per sample (the resolution of the '2600 sound register), that equates to 8K-bytes of ROM per second of speech! So you can see, although sound is simple, it's also very expensive. Clever packing techniques (eg: a delta between successive samples, rather than absolute values for each) can reduce this burden, but they, of course, can cost you valuable processing time. So I would expect that sound which is compressed will probably play with blank screens, and uncompressed sound will be very short in duration.
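The ROM cost and the packing ideas can be sketched in Python. Everything here is an illustrative assumption rather than what any actual cartridge does: two 4-bit samples are stored per byte with the first in the high nibble, and the delta coder just stores differences, which only saves space if those differences are then stored in fewer bits than the samples.

```python
def rom_bytes_per_second(sample_rate_hz, bits_per_sample=4):
    """ROM needed for one second of uncompressed samples."""
    return sample_rate_hz * bits_per_sample // 8


def pack_nibbles(samples):
    """Pack 4-bit samples two per byte, first sample in the high nibble."""
    if any(not 0 <= s <= 15 for s in samples):
        raise ValueError("samples must be in the range 0..15")
    padded = list(samples) + [0] * (len(samples) % 2)   # pad odd lengths with 0
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def delta_encode(samples):
    """First value absolute, then the change from each sample to the next."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Rebuild the samples by keeping a running total of the deltas."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


print(rom_bytes_per_second(16000))                 # prints 8000
print(pack_nibbles([1, 2, 3]).hex())               # prints 1230
print(delta_encode([8, 9, 11, 10, 10]))            # prints [8, 1, 2, -1, 0]
```

The running total in delta_decode is the extra work the playback code has to do per sample, which is the processing-time cost mentioned above.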

Feel free to correct any errors in the above; it's rather late and clearly I'd rather be doing anything else but working right now.

Cheers

A

Blackbird · March 30, 2004

Wow, that really covers it! I think I sorta understand it now... heh. Thank you!

Nukey Shay · March 30, 2004

I've done it on the Apple via an old Compute (or possibly Byte) type-in as well. Does anybody know where to find that program, and whether or not it works in emulation?

The samples themselves should work in emulation...since you are just playing a speaker click at the proper frequency. Getting the samples would be the difficult part, because the program just used the cassette input to gather it. IIRC, it was printed in Creative Computing.

Nukey Shay · March 30, 2004

Found it...

http://www.atarimagazines.com/creative/v9n...speech_synt.php
