Voice Encoding and Playback (with NO speech synth)

Tursi · August 13, 2016

Some time ago artrag mentioned a voice converter he had created that ran on the MSX and produced really nice voice at 60Hz. He was kind enough to adapt it for the PSG in the TI-99/4A and ColecoVision, and after lots of work-induced delays, I've put together a small package for those interested in using it.

With only three voices, the results are not quite as nice as on the MSX, but most speech is still quite intelligible. I've posted a sample video on YouTube (it's just a quick and dirty test, but it demonstrates both good samples and not-so-good samples, so you can hear the range).

https://www.youtube.com/watch?v=wkBShy-EFkI

Playback takes very little CPU (just unpacking 6 bytes per frame) and a fair amount of memory (360 bytes per second). It's quite good for adding short voice samples!
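As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The constant and function names here are only illustrative and are not part of the package; the 6 bytes per frame and the 60 Hz frame rate come from the post.

```python
FRAME_BYTES = 6      # bytes unpacked per frame (figure from the post)
FRAME_RATE_HZ = 60   # one frame per 60 Hz update

def clip_size_bytes(seconds):
    """Approximate storage needed for a clip of the given length."""
    frames = round(seconds * FRAME_RATE_HZ)
    return frames * FRAME_BYTES

print(clip_size_bytes(1.0))   # 360
print(clip_size_bytes(2.5))   # 900
```

So a two-and-a-half second phrase costs a bit under 1 KB, which is why short samples are the sweet spot.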

On the TI side it may not be quite as handy, since the Speech Synthesizer can do as well or better with some tuning, but it's still a nice option to have and can enable speech without the speech synthesizer.

The actual converter runs under Matlab, so it requires the Matlab runtime and 64-bit Windows to execute. Alternatively, the Matlab script is included so you can run it on your choice of platform if you have the ability to run Matlab scripts. I experimented with Octave, and although it didn't run out of the box, I eventually got an early version processing there.

For playback, I've included assembly playback code for both the TI-99/4A and the ColecoVision (the ColecoVision code is hand-optimized from SDCC output and runs fine linked into C programs). There's also a VGM converter with C source in case you need VGM audio files (for instance, for my VGM compressor).

Anyway, hope you enjoy! Archive is posted on my site:

http://harmlesslion.com/software/artvoice

ti99iuc · August 13, 2016

Oh my.... i found this fantastic !! :D

the laugh is incredible ahahah

+OLD CS1 · August 13, 2016

Loud female voices seem to work the best. Plenty of those around these parts, if you believe another thread. It is quite an interesting demo, and I wonder if the algorithms can be improved. Did anybody ever use SAMS on other platforms?

ti99iuc · August 13, 2016

Yes, right, it is very similar to SAMS. I used it on the Commodore years ago, and I also had a speech utility for GW-BASIC on my older DOS PC.

JamesD · August 13, 2016

It sounds very 80s.

The sample rate is a bit low to make it easy to understand. 'Beware I Live' is barely recognizable. I'm sure it's a frequency issue.
But some other sounds are pretty good. The higher sounds like the laugh are really good!

Tursi · August 14, 2016

There's no sample rate in the traditional sense; it works by updating at 60 Hz. Every frame, it changes the tones being played on the sound chip. The muffling you hear comes from the fact that low-frequency voices have more harmonics, which makes choosing the "best" frequencies more difficult. The more voices you have, the more accurately you can reproduce the sound, but we only have three. The same goes for low-volume sounds: they are harder to distinguish from the background noise.
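To make the "change the tones every frame" idea concrete, here is a minimal Python sketch of the bytes an SN76489-style PSG expects when you set a tone period and a volume on each of its three tone channels. The function names are hypothetical, and this is not the playback code from the package, which is in assembly and unpacks its own 6-byte frame format (not shown here).

```python
def tone_bytes(channel, period):
    """Two bytes that set a 10-bit tone period on tone channel 0..2."""
    if not 0 <= channel <= 2:
        raise ValueError("tone channels are 0..2")
    if not 0 <= period <= 0x3FF:
        raise ValueError("period must fit in 10 bits")
    latch = 0x80 | (channel << 5) | (period & 0x0F)   # 1 cc 0 dddd (low 4 bits)
    data = (period >> 4) & 0x3F                        # 0 0 dddddd (high 6 bits)
    return bytes([latch, data])

def volume_byte(channel, attenuation):
    """One byte that sets 4-bit attenuation (0 = loudest, 15 = silent)."""
    if not 0 <= channel <= 2:
        raise ValueError("tone channels are 0..2")
    if not 0 <= attenuation <= 15:
        raise ValueError("attenuation must fit in 4 bits")
    return 0x80 | (channel << 5) | 0x10 | attenuation  # 1 cc 1 vvvv

def frame_writes(voices):
    """voices: three (period, attenuation) pairs, one per tone channel."""
    out = bytearray()
    for channel, (period, attenuation) in enumerate(voices):
        out += tone_bytes(channel, period)
        out.append(volume_byte(channel, attenuation))
    return bytes(out)

# One frame: three partials with different periods and loudness.
print(frame_writes([(0x1AC, 0), (0x0D6, 3), (0x06B, 6)]).hex())
```

Calling something like `frame_writes` once per 60 Hz tick with fresh values is the whole playback loop in spirit; nothing is streamed as PCM.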

The samples that come from speech chips, specifically Sinistar's "Beware I Live" and "I am the Texas Instruments Home Computer", suffer from double compression: first from the original WAV file to LPC, then from LPC to the limited frequency selection of the sound chip. Likewise, Megabyte in the middle (Yes yes yes) and Fluttershy at the end (whatever you want to do) suffer from being too quiet. If you play with it a bit, you can quickly predict which samples will work better than others.

The issues are understood, but solving them is more difficult. Again, with more voices to throw at the issue, we can worry less about the "best" harmonic by just playing more of them, and it makes an audible difference. (It would be interesting to try a version that plays back on the FourTI card.)

SAMS -- do you guys mean SAM (Software Automatic Mouth)? It uses a similar concept. I started porting a port of it to the TI some years ago, but at the time GCC wasn't up to the task. It may be able to do something with it now. Here's the page of the group that ported the original 6502 code to C: http://hitmen.c02.at/html/tools_sam.html

+OLD CS1 · August 14, 2016

Yeah, SAM. I fat-fingered the extra 'S'.

artrag · August 14, 2016

This is how it works on msx (using a dedicated SCC chip with 5 channels)

Asmusr · August 14, 2016

A bit off topic, but among the VGM files for the music to the Sega Master System version of After Burner I noticed the attached file that contains speech (it can be played using VGMPlay). I didn't think VGM files could contain speech, so how does this work?

After Burner - 02 - Get Ready.vgm

artrag · August 14, 2016

The algorithm is in the .m file. The WAV file is resampled to 8 kHz and segmented into chunks of 1/60 of a second. Each chunk is converted to the frequency domain via a DFT with enough points to guarantee a frequency resolution of 1 Hz.

The algorithm looks for the 3 highest peaks in the segment and records their frequencies and amplitudes.
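The steps described so far can be sketched roughly as follows. This is not the code from the .m file, just a pure-Python illustration under a few assumptions: the frame is already resampled to 8 kHz, the DFT is evaluated directly at 1 Hz steps (equivalent to zero-padding the 133-sample frame to 8000 points), and a "peak" is taken to be a local maximum of the magnitude spectrum. The function names and the 100 Hz lower search limit are made up for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000              # Hz, after resampling (from the post)
FRAME_LEN = SAMPLE_RATE // 60   # 133 samples, roughly 1/60 s

def spectrum_1hz(frame, f_lo=100, f_hi=3999):
    """DFT magnitude evaluated every 1 Hz from f_lo to f_hi.

    Returns (frequency, amplitude) pairs; amplitude is scaled so a pure
    sine of amplitude A shows up as roughly A at its own frequency.
    """
    mags = []
    for f in range(f_lo, f_hi + 1):
        w = -2j * math.pi * f / SAMPLE_RATE
        acc = sum(x * cmath.exp(w * n) for n, x in enumerate(frame))
        mags.append((f, abs(acc) * 2 / len(frame)))
    return mags

def top_peaks(frame, count=3):
    """The `count` strongest local maxima, strongest first."""
    mags = spectrum_1hz(frame)
    peaks = [
        mags[i] for i in range(1, len(mags) - 1)
        if mags[i - 1][1] < mags[i][1] >= mags[i + 1][1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:count]

# Synthetic test frame: three sines of decreasing amplitude.
frame = [
    1.0 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.8 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
    + 0.6 * math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE)
    for n in range(FRAME_LEN)
]
for freq, amp in top_peaks(frame):
    print(freq, round(amp, 2))
```

With only 133 real samples per frame, each partial smears across a lobe roughly 120 Hz wide, so the 1 Hz grid locates the top of a lobe precisely but cannot separate partials that sit close together. That fits the observation that voices with many closely spaced harmonics are harder to reproduce.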

The information is encoded as SN764xx parameters and saved in the output file.
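For readers unfamiliar with the chip, the mapping from a measured frequency and amplitude to PSG parameters can be sketched like this. The formulas are the standard ones for this PSG family (tone frequency = clock / (32 × period), 2 dB per attenuation step, 15 = silent); the 3.579545 MHz clock, the rounding choices, and the function names are assumptions for the example and may differ from what the converter actually does.

```python
import math

PSG_CLOCK_HZ = 3579545   # assumed NTSC colorburst clock

def freq_to_period(freq_hz):
    """Nearest 10-bit tone period for a frequency, clamped to 1..1023."""
    period = round(PSG_CLOCK_HZ / (32 * freq_hz))
    return max(1, min(1023, period))

def amp_to_attenuation(amp, full_scale=1.0):
    """Nearest 4-bit attenuation step (2 dB each); 15 means silent."""
    if amp <= 0:
        return 15
    db_down = -20 * math.log10(min(amp / full_scale, 1.0))
    return min(15, round(db_down / 2))

print(freq_to_period(440))       # 254
print(freq_to_period(50))        # 1023 (clamped)
print(amp_to_attenuation(0.5))   # 3
```

Note the clamp: with a 10-bit period and this clock, the lowest tone the chip can produce is about 109 Hz, and the frequency grid gets coarse at the high end, so the recorded peaks can only be approximated.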

Tursi · August 15, 2016

Asmusr wrote: "A bit off topic, but among the VGM files for the music to the Sega Master System version of After Burner I noticed the attached file that contains speech (it can be played using VGMPlay). I didn't think VGM files could contain speech, so how does this work?"

VGM files can contain very finely timed register writes (timing is specified in 44.1 kHz sample ticks), and that's how this one works. If you compressed it using my vgmcomp tool, you'd get a warning about lost resolution and would not hear the speech in the resulting file.

The files compressed by this new tool are trivially converted to VGM (I included the tool) with no loss of resolution.
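To show why the conversion is trivial, here is a hedged Python sketch of the VGM command stream for 60 Hz data. It is not the included C tool. It uses three commands from the VGM format: `0x50 dd` (write byte `dd` to the SN76489), `0x62` (wait 735 samples, i.e. 1/60 s at 44.1 kHz) and `0x66` (end of sound data). The function name is made up, the input is assumed to be the raw PSG bytes for each frame, and the VGM file header is left out.

```python
def vgm_commands(frames):
    """Build a VGM command stream (no header) from per-frame PSG writes.

    frames: iterable of bytes objects, each holding the raw bytes to
    send to the PSG during one 1/60 s frame.
    """
    out = bytearray()
    for writes in frames:
        for value in writes:
            out += bytes([0x50, value])   # PSG write
        out.append(0x62)                  # wait one 60 Hz frame
    out.append(0x66)                      # end of sound data
    return bytes(out)

print(vgm_commands([b"\x8c\x1a\x90", b"\x9f"]).hex(" "))
```

Because every update lands exactly on a frame boundary, the one-frame wait command is all the timing the stream needs. A file like the After Burner one instead relies on much shorter waits between writes, which is what a frame-based compressor cannot keep.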

JamesD · August 15, 2016

The TI really needs a programmable interrupt timer similar to the one on the CoCo 3.
Then you can select an appropriate playback rate for individual samples.
Twice the playback rate would offer pretty decent quality without slowing down the machine significantly.

Maybe a future RAM / Drive interface board will add such a thing.
It's not that difficult, I implemented the CoCo 3 timer in Verilog based on the specs in a few hours, and I'm just learning Verilog.

mikiex · October 1, 2016

Nice work, I wonder if you get any more improvement from updating twice a frame at 120hz?
