SPEECH TECHNOLOGY | The talking computer has always fascinated us. Ahmad Humeid looks at an interesting application that makes the web talk to you.

Science fiction movies have made the idea of a “computer that speaks to you” a very popular concept. That’s not to say that real-life implementations of computer speech technologies have ever approached the science fiction fantasies. Who remembers KITT, the sleek, black, customized Pontiac Trans-Am that could talk to its passenger Michael Knight (David Hasselhoff) in the 80’s TV series Knight Rider, which had us kids glued to the JTV screen on Fridays? Now, that car could not only talk; it had intelligent conversations with Michael, not to mention a personality of its own.

Compared to KITT, the text-to-speech capabilities of my computer are decidedly underdeveloped. Still, they are an improvement over the totally robotic speech capabilities of computers in the 80’s or early 90’s.

Science fiction might have raised our expectations of computer speech. But we should not forget that even the robotic-sounding voice of a computer can make a world of difference to people with impaired vision or other disabilities. Just think of the voice computer attached to the wheelchair of Professor Stephen Hawking, the leading astrophysicist and author of ‘A Brief History of Time’. It gave this genius a voice, a way to communicate with the world, and the possibility to lecture his students, none of which would have been possible without computer speech technology.

People with learning disabilities, dyslexia or impaired vision, as well as other audiences, are an important market for speech technologies. And as most of our information now exists in digital format, the potential of converting text to audio is greater than ever before. No time to read the newspaper? Why not listen to it in your car? In fact, podcast and news readers (like iPodderX for the Mac) can convert your favourite news feed into audio files for you to listen to on your computer or MP3 player.

The other day I was really surprised when a colleague pointed me to the site of a German newspaper that implemented a technology from ReadSpeaker, a Swedish company. What surprised me was the quality of the pronunciation and the expressiveness of the voice that was reading me the newspaper article from the web page. ReadSpeaker is available for web site owners who want to add speech to their sites. It can be up and running on any site within days, the company claims, and needs no extra programming effort on the part of the site’s publisher.

The company offers its service in English (both US and British pronunciation), German, French, Spanish, Dutch, Italian, Swedish and Brazilian Portuguese.
A ReadSpeaker enabled website has a small ‘Say It’ icon on every page which, when clicked, opens a popup window and starts reading the content on the page. All major browsers on a variety of platforms (Windows, Mac, Linux, Solaris) are supported with no plug-in download required.

The ReadSpeaker implementation of speech technology is notable because it works rather elegantly for both the site’s publisher and its reader. Of course, there are other companies, like AT&T, Cepstral and Acapela, who are active in the speech field in general and usually have demos of their technology on their sites.
When it comes to the Arabic language, we find that speech synthesis is not a common technology. The Arabic software company Sakhr has long been a leading name in this field.

One major challenge in Arabic is the fact that the language uses short vowel diacritics (the little marks over and under words) as a grammatical tool. Change the diacritics on a word and you change the meaning of the word or sentence. Synthesising fully diacritized text is not a problem. The problem is that everyday Arabic text is rarely diacritized. Native speakers of the language intuitively ‘guess’ the diacritization on the fly. Alas, a computer has no such human intelligence. That’s why a company like Sakhr had to invent a diacritization engine that takes, say, a normal newspaper text and intelligently diacritizes it in preparation for the speech synthesis engine. This, obviously, is a more complex procedure than synthesising English or French.
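To make the ambiguity concrete, here is a minimal Python sketch. It is not Sakhr’s engine or any real diacritization method; it only shows how two different words collapse into the same letters once the short-vowel marks are dropped. The function names `strip_diacritics` and `candidates_by_bare_form` are made up for this illustration, and the two sample words (kataba, ‘he wrote’, and kutub, ‘books’) are chosen here as examples.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Drop combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def candidates_by_bare_form(words):
    """Group fully diacritized words by their undiacritized spelling.

    Hypothetical helper: a real engine would also need context and
    grammar to pick the right candidate; this only exposes the choice.
    """
    groups = defaultdict(list)
    for word in words:
        groups[strip_diacritics(word)].append(word)
    return dict(groups)


# Two different words written with the same three letters (kaf, ta, ba).
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"          # kutub, "books"

bare = strip_diacritics(kataba)
groups = candidates_by_bare_form([kataba, kutub])

# Both words share one bare form, so the reader (or the engine) must guess.
print(bare == strip_diacritics(kutub), len(groups[bare]))
```

A speech engine that receives only the bare form has to pick one of the candidates before it can pronounce the word, which is the job the article attributes to the diacritization engine.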

For you to be able to ‘talk’ with a computer, it should be able not only to ‘speak’ but to listen too. Speech recognition technologies are the other half of this equation, and major advances have been made in this field too. Cell phones are an obvious focus of interest here.

Don’t expect to be chatting to your car soon, however. Apart from talking and listening, computers also need to be ‘intelligent’ to ‘understand’. That’s still an elusive dream. As far as attitude is concerned, well, I am sure that many of us would name ‘stubbornness’ as the basic personality trait of computers. Not very useful for a good conversation.



Comments

One response to “The talking web”

  1. Jad

    OddCast has an interesting service in the same field.
    http://www.oddcast.com/sitepal/