
Intellivoice is How Hard?


First Spear


With all of the recent independent and homebrew activity, I have not seen a cart that uses the Intellivoice, except for the Hover Bovver demo+ cart I bought many months ago.

 

I'm guessing it is a non-trivial operation, to say the least, to add voice functionality to a game. Can any of the development heavyweights comment on what it really takes to add Intellivoice features? I'm not talking about recording and compressing the sound; that is out-of-band stuff. I am talking about taking the voice in whatever format it is in and weaving it into a (16k?) game.

 

Thanks.


If you have the compressed voice data already, it's quite easy. SDK-1600 has shipped with an Intellivoice driver for many years now. Tag-Along 2 w/ Voice (using the AL2 samples) went into the tree around 12 years ago. I received legal permission from Microchip to distribute the AL2 samples for use with the Intellivoice.

 

I'm not sure why folks haven't picked it up. It's there, and it works.

 

The driver is under examples/library/ivoice.asm.

 

Call IV_INIT to initialize the Intellivoice.

Call IV_ISR from your ISR to "pump" the Intellivoice.

Call IV_PLAY to queue up a sample to be played; drop sample if the queue is full.

Call IV_PLAYW to queue the sample; wait for room in the phrase queue if queue is full.

Call IV_WAIT to wait for all samples to stop playing.

 

Construct your phrase table of custom phrases as IV_PHRASE_TBL. Look in examples/tagalong2v/tagalong2v.asm for an example phrase table.

 

The only other bit is that your program needs to set aside a small amount of RAM for the Intellivoice phrase queue.
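To make the queue behaviour described above a bit more concrete, here is a rough Python model of it. This is only a sketch under assumptions: it is not the code in ivoice.asm, the class and method names are hypothetical stand-ins for the IV_* routines, and the queue depth is a made-up value.

```python
from collections import deque


class PhraseQueue:
    """Toy model of a fixed-size phrase queue (hypothetical; not ivoice.asm)."""

    def __init__(self, depth=4):
        # depth is an assumed value, not taken from the real driver
        self.depth = depth
        self.pending = deque()
        self.spoken = []  # stands in for the speech hardware

    def isr(self):
        """Model of IV_ISR: each interrupt hands at most one phrase onward."""
        if self.pending:
            self.spoken.append(self.pending.popleft())

    def play(self, phrase):
        """Model of IV_PLAY: queue the phrase, or drop it if the queue is full."""
        if len(self.pending) >= self.depth:
            return False  # dropped
        self.pending.append(phrase)
        return True

    def playw(self, phrase):
        """Model of IV_PLAYW: wait for room (letting interrupts run), then queue."""
        while len(self.pending) >= self.depth:
            self.isr()
        self.pending.append(phrase)

    def wait(self):
        """Model of IV_WAIT: return once everything queued has been played."""
        while self.pending:
            self.isr()
```

In this model the only difference between play and playw is what happens when the queue is full: one returns immediately and loses the phrase, the other blocks until the interrupt handler has drained a slot.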



If I re-read the docs for the 3rd time, I might understand what needs to be done there. ;)

 

Thanks for putting the info out there. Maybe as I get a better grasp of IntyBASIC, Intellivoice support will be baked in and I might have a better chance of contributing something on that front.

 

Thanks!


  • 3 weeks later...

I too would like this. Would add another element to Dr. Chatterbox (Eliza).

 

On the Atari 800 we have a voice program called S.A.M. and we had hours of fun with the Eliza routine renamed "Dr. Sam".

 

Having voice for Eliza would be fun for me for that reason.


All of the SP0256-AL2 samples are available with SDK-1600. Maybe someone can find the US Naval Research Laboratory text-to-speech algorithm that so many 80s computers and the CTS256-AL2 used, and drive the allophones from that?

 

It was a pretty simple algorithm from what I recall, as the CTS256-AL2 was basically a TMS7040, which was a pretty limited processor, so it seems likely the CP-1610 can do it too in limited memory.

 

I did find one web page that purports to have a CTS256-AL2 emulator but the spelling on the page was awful and it gave off a vibe of "pre-infected virus-wrapped software" so I didn't download and try it. :-)

 

But hey, you're all armed with the Googles, so go research! There are a couple of papers known to exist that, if you can find them, should be huge shortcuts:

 

  • Janet May, "Allophone Speech Synthesis Technique," General Instrument, 1982
  • Elovitz et al., "Automatic Translation of English Text to Phonetics by Means of Letter-to-Sound Rules," United States Naval Research Laboratory Report 7948, 1976

 

 

 

O HAI, what have we here? A simplified US Naval Research algorithm, as employed by the Votrax-based Microvox, and explained by the always fun to read Steve Ciarcia!

 

http://books.google.com/books?id=zQWNinpbFx0C&pg=PA118&lpg=PA118&dq=us+naval+research+text+to+speech&source=bl&ots=7UlE2qhKHW&sig=wqbS42od6r92oDXcbIejrBcbBXg&hl=en&sa=X&ei=melIVIOYLMnH8AHP_IC4BA&ved=0CEsQ6AEwBg#v=onepage&q=us%20naval%20research%20text%20to%20speech&f=false

 

This is the algorithm you'd need to write to interpret / speak words that were typed in by the user.
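For anyone wondering what a letter-to-sound rule engine looks like in outline, here is a minimal Python sketch. It assumes the general scheme of context-sensitive rules: a left context, the letters to match, a right context, and the sounds to emit. The handful of rules and the output symbol names below are made up for illustration. They are not the NRL rule set and not verified SP0256-AL2 allophone labels.

```python
# Each rule: (left_context, match, right_context, output_symbols).
# Rules are tried in order; the first one that fits wins.
# Illustrative only: NOT the real NRL rules or real allophone names.
RULES = [
    ("", "LL", "", ["LL"]),
    ("", "H", "", ["HH"]),
    ("", "E", "", ["EH"]),
    ("", "L", "", ["LL"]),
    ("", "O", " ", ["OW"]),   # O at the end of a word
    ("", "O", "", ["AA"]),    # any other O
    ("", " ", "", ["PAUSE"]),
]


def to_phonetics(text):
    """Scan left to right, applying the first rule that matches at each position."""
    padded = " " + text.upper() + " "   # spaces mark the word boundaries
    pos = 1
    out = []
    while pos < len(padded) - 1:
        for left, match, right, symbols in RULES:
            end = pos + len(match)
            if (padded.startswith(match, pos)
                    and padded[:pos].endswith(left)
                    and padded.startswith(right, end)):
                out.extend(symbols)
                pos = end
                break
        else:
            pos += 1   # no rule matched: skip this character
    return out
```

With these toy rules, to_phonetics("hello") walks H, E, LL, then the word-final O. A real rule set would be much larger and would use character classes (vowel, consonant, and so on) in the contexts, but the scanning loop would look about the same.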

 

 

And speaking of Google Search, go to Google Image Search and search for "Atari Breakout". GO. NOW. I'll wait.


  • 5 weeks later...

Heeeeeeeey. Let's re-use this thread to discuss the Intellivoice. And not fill up the IntyBASIC thread with Voice chatter (pun intended).

 

One thing I'm trying to wrap my head around - the C64 (and other platforms) had software-based speech synth that was a heck of a lot easier to use and a lot more flexible. You were able to modify pitch and other attributes, simply by changing parameters around in your code. My question is why? Is this just because it was an entirely different design/implementation? At their heart, these speech engines all do the same thing (string together allophones). So I'm a bit mystified as to how SAM can be so much more powerful than the Intellivoice. Surely this isn't just due to having extra RAM? And I guess I'm curious if a person could write a primitive SAM for the INTV - or is the PSG just that much less capable than the SID in terms of what it can generate?


 

Well... SAM apparently ran on the Apple II, which only had 1 bit output for audio, so in theory it could run on anything. ;-)

 

The secret to SAM's phonemes, according to this page, was that most of them were built from two sine waves and a square wave, and that's apparently enough?

 

If someone cares to take the time to convert the SAM code to drive the PSG instead, maybe it won't sound too bad. Only thing is that the PSG outputs three square waves, so it'll probably sound a bit harsher.
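As a rough illustration of the difference, here is a small Python sketch that mixes two sine waves and a square wave, next to a version that uses three square waves the way a PSG-style chip would. It is only a sketch under assumptions: the frequencies, sample rate, and function names are hypothetical, and this is not SAM's actual synthesis code.

```python
import math


def square(phase):
    """Unit square wave: +1 for the first half of each cycle, -1 for the second."""
    return 1.0 if (phase % 1.0) < 0.5 else -1.0


def mix_sample(t, f1, f2, f3):
    """Two sine waves plus one square wave, averaged to stay within [-1, 1]."""
    return (math.sin(2 * math.pi * f1 * t)
            + math.sin(2 * math.pi * f2 * t)
            + square(f3 * t)) / 3.0


def square_only_sample(t, f1, f2, f3):
    """Same three frequencies, but every component is a square wave."""
    return (square(f1 * t) + square(f2 * t) + square(f3 * t)) / 3.0


def render(sample_fn, freqs, rate=8000, count=64):
    """Render `count` samples at `rate` Hz (both values are arbitrary choices)."""
    return [sample_fn(n / rate, *freqs) for n in range(count)]
```

The all-square version can only ever take four output levels, and each square wave brings along odd harmonics that a sine wave does not have, which is one plausible reason a straight port would sound harsher.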


 

 

The SID in the C=64 supported different output wave shapes, including rectangle pulse width modulation and various filters. The PSG can only output square waves, and any filtering or shaping must be done in software, to the extent that it is possible. Therefore, the PSG is much less capable than the SID.
There is a reason why SID is used by a lot of modern musicians to generate interesting synthesized sounds, yet outside the Intellivision culture and some embedded systems, the PSG is practically unheard of.
Not to mention that the C=64 allowed you more direct access to the hardware, especially the low-level timing of the VIC-II that controlled video output. The Intellivision's only source of timing available to the programmer is the VBLANK interrupt, which occurs at 60Hz.

 

 


 

Interesting. All samples I've heard are from the C=64. Indeed I had the SAM software for mine when I was 12 years old, and it was very versatile and impressive. I wonder how it actually sounds on an Apple ][, and if the output is anywhere near as intelligible as on a C=64.

 

In any case, based on the article on that page, it seems that apart from the generated output, the magic was in the special rules applied to the text-to-speech engine. The Intellivoice already provides a really good voice simulation; what freeweed seems to be requesting is a much simpler interface to string together the allophones, and you can't get much more intuitive than straightforward "text-to-speech" like in SAM.

 

-dZ.


Converting voice directly isn't easy.

 

But you can use the Intellivoice pre-recorded phrases and the SP0256-AL2 phonemes with the new IntyBASIC v1.0 :)

 

http://atariage.com/forums/topic/232063-intybasic-compiler-v10-crunchy-tasty/


 

Well, the US Naval Research algorithm (or a simplified form of it) is linked upthread... :)

 

EDIT: And, if the text-to-speech is done at compile time, not run time, I'm sure we could find an even better algorithm. The US Naval Research Labs algorithm is simple enough to run on a very low-end processor.


  • 3 years later...

 

Well, the US Naval Research algorithm (or a simplified form of it) is linked upthread... :)

 

EDIT: And, if the text-to-speech is done at compile time, not run time, I'm sure we could find an even better algorithm. The US Naval Research Labs algorithm is simple enough to run on a very low-end processor.

 

Sooo... reviving this old dead zombie thread....

 

Has anybody implemented this already? I know of at least one person who has (you know who you are).

 

I am hereby asking kindly if they would be willing to share their code instead of making others go through the same pain de novo, especially for those of us who are less talented in these technical schmechnical things. :)

 

-dZ.


Let me be clear, so as to avoid further confusion: I know of at least one person who implemented a mechanism to convert PCM voice samples into Intellivoice LPC-encoded data, whether it's the US Naval algorithm or something else. That is what goes to the heart of this thread's topic.

 

I humbly and kindly ask anybody who has done so, to please share it with the community.

 

dZ.


In any case, as anyone who reads INTVPROG already knows, I do have an LPC encoder. It does not work very well. It is difficult to use. It also incorporates 3rd party code with unclear licensing. I had distributed one or two copies privately in the past, and the results it produced were... not anything I'd put my name on. You can eventually get decent results out of it, but so far I've not been able to shepherd anyone else to get the same.

 

I've been working slowly over time to fix those various faults, but it's slow going as my time is limited and I have bigger fish to fry. For example, jzIntv will stop working entirely in the next MacOS once they drop 32-bit support, so I need to port jzIntv to a different graphics/sound library. This is turning out to be non-trivial.


That's not text-to-speech, and has nothing to do with the allophones or phonemes that seem to have dominated this thread.

Yes, I am glad you understand my question then. I never mentioned text-to-speech. The original question was about taking voice samples. Then it steered into text-to-speech (probably when you suggested it as an option).

 

It is obvious that I am not the only one who would like to be able to convert voice samples into Intellivoice data. Unfortunately I lack the necessary skills to do so, which is why I asked.

 

Your last couple of responses are not helping anybody get closer to that goal.

 

dZ.


Yes, I am glad you understand my question then. I never mentioned text-to-speech.

You must have a different definition of "this" than I do then. The post you quoted referred only to text-to-speech and the corresponding US Naval Research Labs algorithm.


 

The original question was about taking voice samples. Then it steered into text-to-speech (probably when you suggested it as an option).

 

It is obvious that I am not the only one who would like to be able to convert voice samples into Intellivoice data. Unfortunately I lack the necessary skills to do so, which is why I asked.

 

Your last couple of responses are not helping anybody get closer to that goal.

I don't particularly appreciate being publicly guilted into devoting my spare time differently than I already am with oblique pleas into the unknown. "Has anybody done this?" "I know of somebody who's done this, but I'm not saaaaaying whooooooo...." That reeks of innuendo and bully pulpit. Whatever happened to contacting me directly and asking "Do you have this? What would it take to get this released?" Instead, this comes across as "Somebody's holdin' out on us, let's guilt him into releasing."

 

I spend all of my free hobby time on Intellivision-based projects, whether it's preserving the development artifacts, reverse engineering new finds, developing and extending jzIntv, providing technical assistance, and so on. I've spent more hours than you realize trying to get the voice stuff to be reasonable, including researching code bases and algorithms, trying to reduce it to something better than a grad school project that's good enough for a simple demo.

 

Please, you're better than this.

 

 


Whatever. If you don't want to share, then don't. I asked the question openly in case someone else has done this, because many people have asked about it in the past and I thought maybe someone had succeeded.

 

If you want to feel guilt and get defensive, that's on you. You could just as well have ignored the question.


It's all in your approach.

 

If you'd come straight out and asked for LPC encoding #14 instead, and didn't play games about saying "I know of at least one person," followed by "anybody who has done so, please release" you would have come across rather differently. After all, the one person you know of is in the "anybody" list.

 

In any case, I do work with those who have contacted me directly. I can't release the tool I have, and I've explained why. I still am working on a tool I can release in the spare time I have. I'm also backlogged on projects and the backlog isn't getting smaller.


 

You seem to have a lot of angst lately, and take great pains to make sure to respond curtly to anything I ask in this forum whether directed at you or not. I'm getting tired of it. Stop making everything I say personal, it's not. :roll:

 

I specifically did not mention names to avoid turning this into a "shame game." I knew that if you were willing to share your code you would have, so I asked in general if anybody else had done it. You were under no obligation to respond at all. I'm sure I and everybody else would have much preferred that this thread had gone unanswered and been buried once again without a response, instead of turning into this stupid argument. Jeez.

 

-dZ.


So why did you bother saying "you know who you are"? Seriously, that's about as close to calling out someone without naming a name as you can get.

 

I'd like to think I have been pretty straightforward with my answers. If they're short, it's because I lack the time to make them longer.
