AtariVox

Feasibility of prerecorded speech packages for Stella


DirtyHairy


I have been wondering whether it would be possible to prerecord the phrases used by individual games and distribute them as a kind of "speech pack" for Stella. To all of you who have already written games that use the AtariVox: what kind of commands are you using? Are you sending them over to the AtariVox in a single "transaction", or is the transmission stretched over time? If all commands for an individual phrase were sent in a single transmission, we could just collect them and calculate a hash to identify the matching recording.
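
Something along these lines is what I have in mind on the emulator side. The names and the choice of hash (FNV-1a) are purely illustrative, not actual Stella code:

#include <cstdint>
#include <map>
#include <string>
#include <vector>

// FNV-1a over the buffered command bytes of one phrase.
static std::uint64_t hashPhrase(const std::vector<std::uint8_t>& bytes) {
  std::uint64_t h = 0xcbf29ce484222325ull;
  for (std::uint8_t b : bytes) {
    h ^= b;
    h *= 0x100000001b3ull;
  }
  return h;
}

struct SpeechPack {
  // hash of a phrase's command bytes -> recording file inside the pack
  std::map<std::uint64_t, std::string> recordings;

  // Returns the matching recording, or an empty string if the phrase is
  // unknown (playback could then fall back to synthesized speech).
  std::string lookup(const std::vector<std::uint8_t>& phraseBytes) const {
    auto it = recordings.find(hashPhrase(phraseBytes));
    return it == recordings.end() ? std::string() : it->second;
  }
};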


Sort of the way MAME used speech samples for games where it hadn't emulated the speech circuitry?

 

That's an interesting idea. We'd need the authors to generate a binary for recording the phrases (or words) for each of their games.

 

At least one game (Sync) doesn't use words, but rather makes the AtariVox "sing" along with what you're doing. I'm not sure how that would work. (And Seemo hasn't been around for quite some time.)

 

One of the things I want to document is how each game is currently using the AtariVox: voices, scores, settings, etc. I just have to plow through the game descriptions.


1 hour ago, DirtyHairy said:

 what kind of commands are you using? Are you sending them over to the atarivox in a single "transaction", or is the transmission stretched over time? 

We're sending single-byte commands (or 2-byte commands) that contain either phonemes, pauses, or some parameter that affects the speech quality.

 

The joystick port uses bit-banged serial at 19200 baud, so I think it's pretty rare to send more than a byte a frame. It's possible to do more, but a byte a frame will keep the speech buffer full, and CPU time is precious.
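
(For reference: assuming standard 8N1 framing, a byte is 10 bits on the wire, so at 19200 baud it takes roughly 0.52 ms versus a ~16.7 ms NTSC frame; the one-byte-per-frame pattern reflects the 2600's CPU budget rather than the serial bandwidth.)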

 

My spell&speak app won't work with your idea either (in addition to Nathan's observation about Sync), since all of its phrases are unique.

 

Nathan, check out toymailman's thread. There might be a few new ones to add, but I think it has most of them listed by vox functionality.


5 hours ago, RevEng said:

Nathan, check out toymailman's thread. There might be a few new ones to add, but I think it has most of them listed by vox functionality.

Thanks! I'd even posted in that thread, but like so much AtariVox info I'd forgotten where it was (or that it even existed). I'll add it to the Information section.


6 hours ago, RevEng said:

The joystick port uses bit-banged serial at 19200 baud, so I think it's pretty rare to send more than a byte a frame. It's possible to do more, but a byte a frame will keep the speech buffer full, and CPU time is precious.

Hm...

 

I suppose consecutive allophones are not independent, correct? So we cannot simply play the allophones one by one. How long is the dependency chain? More than just the next neighbor?

 

Without dependencies, we could simply play one MP3 per allophone. If the resulting sound depends on two allophones, we could delay output by one frame and stitch the MP3s together. So, as long as two (or more) candidate samples cannot be told apart, we split the MP3s (assuming that the initial part would sound identical in this case). Once we have found a unique sample, we play the remaining MP3.
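
As a hypothetical sketch of that one-frame delay (all names invented here, and the pair samples are only needed if such a dependency actually exists):

#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct AllophonePlayer {
  // Recording of an allophone as it sounds when followed by a specific
  // successor, keyed by (current, next).
  std::map<std::pair<std::uint8_t, std::uint8_t>, std::string> contextSamples;
  // Recording of an allophone on its own (or at the end of a phrase).
  std::map<std::uint8_t, std::string> plainSamples;
  std::optional<std::uint8_t> pending;  // allophone held back for one frame

  // Feed the next allophone each frame (nullopt at the end of a phrase);
  // returns the recording to start this frame, once the lookahead lets us
  // pick the right variant of the previous allophone.
  std::optional<std::string> onAllophone(std::optional<std::uint8_t> next) {
    std::optional<std::string> out;
    if (pending) {
      if (next) {
        auto it = contextSamples.find({*pending, *next});
        if (it != contextSamples.end()) out = it->second;
      }
      if (!out) {
        auto it = plainSamples.find(*pending);
        if (it != plainSamples.end()) out = it->second;
      }
    }
    pending = next;
    return out;
  }
};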

 

Sounds complicated, but is it even feasible?

 

Maybe we'd better look at the MAME solution?

Edited by Thomas Jentzsch

You're welcome, Nathan!

 

Thomas, as far as I can tell, consecutive phonemes are independent with the SpeakJet. The formant frequencies don't change depending on previous or following phonemes, and I haven't detected any other dependencies.

 

I was originally responding to DirtyHairy's question about using phrase samples. As for phoneme samples, the approach could work, but probably not with a comprehensive representation of the phonemes. A comprehensive capture would involve a lot of samples: number_of_phoneme_samples = phonemes * speeds * pitches * bends, which works out to 66,584,576 samples. If we estimate an average phoneme length of 0.1 s, that's about 77 days' worth of recording time.
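
As a back-of-the-envelope check (the ranges assumed below, 127 sound codes, 128 speeds, 256 pitches and 16 bends, are one reading of the SpeakJet parameters that reproduces that figure):

#include <cstdint>
#include <cstdio>

int main() {
  const std::uint64_t phonemes = 127, speeds = 128, pitches = 256, bends = 16;
  const std::uint64_t samples = phonemes * speeds * pitches * bends;
  const double seconds = samples * 0.1;   // ~0.1 s per phoneme
  std::printf("%llu samples, about %.1f days of audio\n",
              static_cast<unsigned long long>(samples),
              seconds / 86400.0);          // 66584576 samples, ~77.1 days
  return 0;
}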

 

Ideally, you'd also want to capture the samples starting and ending on the same phase of the fundamental frequency, so there's no "pop" of static between phonemes during playback, though I don't know how noticeable the pop would be in practice.
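
An alternative to phase-matched capture would be a short crossfade at each seam when the samples are stitched together. This is just a sketch of a generic audio trick, not anything the SpeakJet itself does:

#include <algorithm>
#include <cstddef>
#include <vector>

// Appends `next` to `out`, overlapping the last `fadeLen` samples of `out`
// with the first `fadeLen` samples of `next` using a linear crossfade.
void appendWithCrossfade(std::vector<float>& out,
                         const std::vector<float>& next,
                         std::size_t fadeLen) {
  fadeLen = std::min({fadeLen, out.size(), next.size()});
  const std::size_t start = out.size() - fadeLen;
  for (std::size_t i = 0; i < fadeLen; ++i) {
    const float t = static_cast<float>(i + 1) / static_cast<float>(fadeLen);
    out[start + i] = (1.0f - t) * out[start + i] + t * next[i];
  }
  out.insert(out.end(), next.begin() + fadeLen, next.end());
}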


Using programmatic speed and pitch changes would certainly put you back into workable numbers.
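
As a minimal sketch of applying a parameter at playback time instead of recording it, here is a naive linear-interpolation resampler. Note that this couples pitch and duration, so a real speech pack would want a proper time-stretch/pitch-shift; the function below only illustrates the idea:

#include <cstddef>
#include <vector>

// Resamples `in` by `ratio`: ratio > 1.0 raises the pitch (and shortens
// the sample), ratio < 1.0 lowers it.
std::vector<float> resample(const std::vector<float>& in, double ratio) {
  std::vector<float> out;
  if (in.size() < 2 || ratio <= 0.0) return out;
  for (double pos = 0.0; pos < static_cast<double>(in.size() - 1); pos += ratio) {
    const std::size_t i = static_cast<std::size_t>(pos);
    const double frac = pos - static_cast<double>(i);
    out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * in[i + 1]));
  }
  return out;
}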

 

The bends are described in the SpeakJet manual as follows: "The frequency Bend adjusts the output frequencies of the oscillators. This will change the voicing from a deep-hollow sounding voice to a High-metallic sounding voice."

 

I don't think you'd be able to do that with sample manipulation in the same way the SpeakJet does.

