Tursi

Voice Encoding and Playback (with NO speech synth)

Some time ago artrag mentioned a voice converter he had created that ran on the MSX and produced really nice voice at 60Hz. He was kind enough to adapt it for the PSG in the TI-99/4A and ColecoVision, and after lots of work-induced delays, I've put together a small package for those interested in using it.

With only three voices, the results are not quite as nice as on the MSX, but most speech is still quite legible. I've posted a sample on YouTube (it's just a quick and dirty test, but it includes both good and not-so-good samples, so you can hear the range).

https://www.youtube.com/watch?v=wkBShy-EFkI

Playback takes very little CPU (just unpacking 6 bytes per frame) and a fair amount of memory (360 bytes per second). It's quite good for adding short voice samples!

On the TI side it may not be quite as handy, since the Speech Synthesizer can do as well or better with some tuning, but it's still a nice option to have, and it enables speech without the Speech Synthesizer.

The actual converter runs under Matlab, so it requires the Matlab runtime and 64-bit Windows. Alternatively, the Matlab script is included, so you can run it on your choice of platform if you have the ability to run Matlab scripts. I experimented with Octave, and although it didn't run out of the box, I eventually got an early version processing there.

For playback, I've included assembly playback code for both the TI-99/4A and the ColecoVision (the ColecoVision code is hand-optimized from SDCC output and runs fine linked into C programs). There's also a VGM converter, with C source, in case you need VGM audio files (for instance, for my VGM compressor).

Anyway, hope you enjoy! Archive is posted on my site:

http://harmlesslion.com/software/artvoice

Loud female voices seem to work the best. Plenty of those around these parts if you believe another thread ;) It is quite an interesting demo, and I wonder if the algorithms can be improved. Did anybody ever use SAMS on other platforms?

Yes, right, it is very similar to SAMS. I used it on the Commodore years ago, and I also had a speech utility for GW-BASIC on my older DOS PC.

It sounds very 80s. :)

The sample rate is a bit too low to make it easy to understand. 'Beware I Live' is barely recognizable. I'm sure it's a frequency issue.
But some other sounds are pretty good. The higher sounds like the laugh are really good!

There's no sample rate in the traditional sense; that's how it works at 60Hz. ;) Instead, every frame it changes the tones being played on the sound chip. The muffling you hear comes from the fact that low-frequency voices have more harmonics, which makes choosing the "best" frequencies more difficult. The more voices you have, the more accurately you can reproduce the sound, but we only have three. The same goes for low-volume sounds: they are harder to distinguish from the background noise.

The samples that come from speech chips, specifically Sinistar's "Beware I Live" and "I am the Texas Instruments Home Computer", suffer from double compression: first from the original WAV file to LPC, then from LPC to the sound chip's limited selection of frequencies. Likewise, Megabyte in the middle (Yes yes yes) and Fluttershy at the end (whatever you want to do) suffer from being too quiet. If you play with it a bit, you can quickly predict what will work better than others. ;)

The issues are understood, but solving them is more difficult. Again, with more voices to throw at the issue, we can worry less about the "best" harmonic by just playing more of them, and it makes an audible difference. (It would be interesting to try a version that plays back on the FourTI card ;) ).

SAMS: do you guys mean SAM (Software Automatic Mouth)? It uses a similar concept. I started porting the C port of it to the TI some years ago, but at the time GCC wasn't up to the task. It may be able to handle it now. Here's the page of the group that ported the original 6502 code to C: http://hitmen.c02.at/html/tools_sam.html

A bit off topic, but among the VGM files for the music to the Sega Master System version of After Burner I noticed the attached file that contains speech (it can be played using VGMPlay). I didn't think VGM files could contain speech, so how does this work?

After Burner - 02 - Get Ready.vgm

The algorithm is in the .m file. The WAV file is resampled to 8 kHz and split into chunks of 1/60 of a second. Each chunk is transformed to the frequency domain via a DFT with enough points to guarantee a frequency resolution of 1 Hz.
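
A quick worked version of those numbers, assuming the extra DFT points come from zero-padding each chunk (the .m file is the authority on the exact method), with sample rate f_s, chunk length L and DFT size N:

$$
\begin{aligned}
L &= \frac{f_s}{60} = \frac{8000}{60} \approx 133 \text{ samples per chunk} \\
\Delta f &= \frac{f_s}{N} = 1\ \text{Hz} \;\Rightarrow\; N = f_s = 8000 \text{ DFT points} \\
N - L &\approx 8000 - 133 = 7867 \text{ zeros of padding per chunk}
\end{aligned}
$$

Zero-padding places the spectrum on 1 Hz bins; how well two nearby tones can be told apart is still governed by the 1/60 s window, roughly f_s / L ≈ 60 Hz.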

The algorithm looks for the 3 highest peaks in each segment and records their frequencies and amplitudes.

This information is encoded as SN768xx parameters and saved in the output file.

VGM files can contain cycle-accurate audio; that's how this one works. If you compressed it using my vgmcomp tool, you'd get a warning about lost resolution and wouldn't hear the speech in the resulting file.

The files compressed by this new tool are trivially converted to VGM (I included the tool ;) ) with no loss of resolution.

The TI really needs a programmable interrupt timer similar to the one on the CoCo 3.
Then you could select an appropriate playback rate for individual samples.
Twice the playback rate would offer pretty decent quality without slowing down the machine significantly.

Maybe a future RAM / Drive interface board will add such a thing.
It's not that difficult; I implemented the CoCo 3 timer in Verilog from the specs in a few hours, and I'm just learning Verilog.
