Speaking with One Voice

Voice-activated services are gaining attention as workers get increasingly mobile and always-available support is the rule rather than the exception.

AAA Auto Club South in the US is on the Web . . . but not all its members are there. “More and more people continue to use our Web site, but there are a heck of a lot of people in our demographic who still want to do business by phone,” says Mike Petrilli, senior vice president of information services. The auto club offers roadside assistance, insurance and travel services to 3.5 million members in Florida, Georgia and parts of Tennessee.

The club wanted to let members pay dues, check bills, change addresses and carry out other tasks more quickly over the phone, 24 hours a day. It also wanted to save money on call centres.

The answer was automated call centre services using speech recognition technology. But rather than build the system itself, the club turned to a carrier.

As workers become more mobile and companies rely more heavily on fast, always-available support to retain customers, services based on speech recognition are starting to fill the gap. Emerging industry standards are helping to open the market and pave the way for broader adoption, according to industry experts.

Speech recognition makes it practical to do things with a phone that would be too complicated using the 12-key keypad. In some cases, callers won't have to work their way through a hierarchy of options by pressing numbers or saying words. Although systems cannot yet understand anything a caller might say, callers no longer have to use specific words. Speech recognition can also trigger transactions without a live operator. Combined with speech-to-text and text-to-speech technology, it can support even more emerging applications. Several carriers have built these kinds of services and offer them to companies either on site or from a central facility.

The auto club outsourced its system to WorldCom, now known as MCI. It kept the HTML applications it uses on the Web for selected services and let MCI create an interface between those applications and a speech recognition system.

Now when club members move they can enter a change of address without using the Web or talking to an operator. The “voice portal” prompts the caller for the postal code first, and then matches the street name and address the caller states against a database of possible addresses in that area. Having something to compare the caller’s responses to aids in recognising the spoken words, Petrilli says.
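The benefit of narrowing the search first can be illustrated with a short Python sketch. It is a hypothetical illustration, not MCI's implementation: the postal codes and street lists are invented, and the standard library's `difflib` string matching stands in for the grammar constraints a real recogniser would apply to the audio itself. Once the postal code is known, the spoken street name only has to be compared against a handful of local candidates.

```python
import difflib
from typing import Optional

# Hypothetical data: postal code -> streets in that area. A real system
# would query the club's address database instead.
STREETS_BY_POSTCODE = {
    "32801": ["Orange Avenue", "Central Boulevard", "Church Street"],
    "30303": ["Peachtree Street", "Auburn Avenue", "Edgewood Avenue"],
}


def match_street(postcode: str, heard: str, cutoff: float = 0.8) -> Optional[str]:
    """Match the recogniser's best guess against streets in one postal code only.

    difflib similarity is a stand-in for real recognition grammars.
    """
    candidates = STREETS_BY_POSTCODE.get(postcode, [])
    # Compare in lower case, but hand back the canonical spelling.
    by_lower = {street.lower(): street for street in candidates}
    best = difflib.get_close_matches(heard.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[best[0]] if best else None


print(match_street("32801", "orange avenew"))  # close to a street in this area
print(match_street("30303", "orange avenue"))  # not a street in this area
```

The small candidate list is what makes the tolerant match safe: a slightly misheard street name is still resolved, while a name that does not exist in the caller's area is rejected rather than guessed.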

The club pays for MCI’s services on a per-transaction basis. Petrilli compares MCI’s piece of the system with a “black box” that he doesn’t have to worry about.

“We chose to use an outsourced model . . . primarily because we believe this technology is still pretty early in its life cycle,” Petrilli says. “The software is changing very quickly. Why sign up to maintain that when it’s really just a side job for us?”

Voice-activated services may matter most on the road, where users may have only one data-access device available: a mobile phone. The payoff can be higher productivity. For example, field service staff who have just finished a job can mark the job ticket as completed simply by calling an automated system, says Marcello Typrin, director of product marketing at speech software vendor Nuance Communications.

Providing a speech-based interface to applications is a good thing for companies to outsource to a carrier, says Mark Plakias, an analyst at Zelos Group. In most cases, voice access makes up only a small fraction of the use of an application such as ERP, he says.

“There’s no reason the enterprise should have to go out and buy a telephony platform to do this,” Plakias says. Typrin, on the other hand, says companies can save on operating expenses by owning their own equipment, and more are doing so as they become confident in the technology.

Companies and carriers are using speech recognition because it's getting better, according to analysts. More powerful processors and refined algorithms are at the core of the improvement. Now, at the application development level, two new specifications that extend current markup languages are helping companies and service providers get started.

Voice XML (VXML) is an XML-based language that lets developers at corporations and service providers build on work that has already been done to put applications and information on the Web. Version 1.0 was released in 2000, and the specification is now at Version 2.0. VXML has opened the voice-based market to new vendors, such as start-up VoiceGenie Technologies, while leading established vendors to offer VXML interpreter software as an alternative to their proprietary platforms.
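To give a sense of what a VXML application looks like, the Python sketch below uses the standard library's `xml.etree.ElementTree` to generate a minimal VoiceXML 2.0 document: a form with one field that collects a postal code using the built-in `digits` grammar, then submits it to a Web back end. The form ID, field name and submit URL are hypothetical, chosen to echo the auto club's change-of-address example. The point is that a voice dialogue is an ordinary document that a Web server can produce, which is how VXML lets developers reuse existing Web work.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_postcode_form(submit_url: str) -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    The form id, field name and submit_url are hypothetical examples.
    """
    vxml = ET.Element("vxml", version="2.0", xmlns=VXML_NS)
    form = ET.SubElement(vxml, "form", id="change_address")
    # The built-in "digits" grammar tells the recogniser to expect digits.
    field = ET.SubElement(form, "field", name="postcode", type="digits")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say your new postal code."
    # When the field is filled, send its value to the Web application.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", next=submit_url, namelist="postcode")
    return ET.tostring(vxml, encoding="unicode")


print(build_postcode_form("https://example.com/change-address"))
```

The resulting string would be served to a VXML interpreter in much the same way an HTML page is served to a browser.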

Meanwhile, the Speech Application Language Tags (SALT) specification, backed by Cisco, Intel and Microsoft, is also coming on the scene. SALT is a set of tags that extends existing markup languages, including HTML and XML. Microsoft released the first beta of its SALT-based Speech Server in July, but it is marketing the platform directly to corporations and not to service providers.

VXML is already helping developers get new voice-based services out more quickly and painlessly.

VoiceGenie’s product is an example of how the specification can work. The company makes middleware that runs on Linux. That middleware is the interface between speech recognition systems that process what a caller says and VXML applications that answer or carry out tasks the caller requests, says Eric Jackson, vice president of strategy and business development at VoiceGenie.

Traditionally, interfaces between speech recognition systems and back-end applications have come in the form of proprietary software that speech recognition platform vendors have written for their own systems, according to Zelos Group’s Plakias. The advent of VXML makes voice-enabled applications less dependent on the platforms on which they run. As soon as each platform maker provides a VXML interpreter to run VXML applications on its system, a single application can be adapted to all the platforms easily, experts say.
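The portability argument can be sketched in Python. The class names below are hypothetical, and the toy interpreter handles only `<prompt>` elements, but it shows the division of labour the experts describe: the application is a single VXML document, and each platform vendor supplies its own interpreter that maps that document onto its own speech hardware and software.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod

VXML_NS = "http://www.w3.org/2001/vxml"

# One hypothetical VXML application, written once.
APP = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block><prompt>Welcome to member services.</prompt></block>
  </form>
</vxml>"""


class VXMLInterpreter(ABC):
    """Hypothetical base class; each platform vendor would supply a subclass."""

    def run(self, document: str) -> list[str]:
        root = ET.fromstring(document)
        # Play every <prompt> in document order. The document is the same
        # on every platform; only speak() differs.
        return [self.speak((p.text or "").strip())
                for p in root.iter(f"{{{VXML_NS}}}prompt")]

    @abstractmethod
    def speak(self, text: str) -> str:
        """Platform-specific text-to-speech hook (simulated here)."""


class PlatformA(VXMLInterpreter):
    def speak(self, text: str) -> str:
        return f"[platform A] {text}"


class PlatformB(VXMLInterpreter):
    def speak(self, text: str) -> str:
        return f"[platform B] {text}"


for platform in (PlatformA(), PlatformB()):
    print(platform.run(APP))
```

Switching platforms means swapping the interpreter object; the `APP` document itself is untouched.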

The specification made it easier and less expensive for BBN Technologies, a unit of Verizon, to develop voice-activated systems for Verizon call centres that are also being offered to other carriers and corporations.

What It’s Good For

Speech recognition, sometimes used with text-to-speech functions, opens up a range of new services.

  • Automatic routing of support calls based on a request in the caller’s own words.
  • Voice-activated transactions such as purchases using a credit card number.
  • Caller authentication through comparing the voice to a “voiceprint”.
  • Outbound messages delivered to many recipients in the most appropriate form for the device they are using.
  • Multimodal applications that let users speak and listen, or read and write, different kinds of information on one device.
  • Voice-activated dialling.
  • Hands-free access by phone to e-mail and a personal address book.
  • Audio Short Message Service.
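The first item, routing a call from the caller's own words, can be sketched in Python using simple keyword overlap. This is a deliberately simplified, hypothetical stand-in: commercial call routing relies on statistical language models rather than keyword lists, and the queue names and keywords here are invented to echo the auto club's roadside, insurance and travel services. The sketch assumes the recogniser has already turned the caller's speech into text.

```python
import re

# Hypothetical routing table: queue name -> words that suggest that queue.
ROUTES = {
    "roadside": {"tow", "flat", "tyre", "tire", "battery", "breakdown", "stuck"},
    "insurance": {"claim", "policy", "accident", "premium", "insurance"},
    "travel": {"trip", "hotel", "map", "booking", "travel"},
    "membership": {"dues", "renew", "address", "card", "bill"},
}


def route_call(transcript: str, default: str = "operator") -> str:
    """Pick the queue whose keywords overlap most with the recognised words."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    scores = {queue: len(words & keywords) for queue, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # With no keyword hits at all, fall back to a live operator.
    return best if scores[best] > 0 else default


print(route_call("My car battery is dead and I need a tow"))
```

Falling back to an operator when nothing matches reflects the point made earlier in the article: systems cannot yet understand anything a caller might say.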

“With Voice XML, the application you build is really yours, regardless of what systems you want to deploy,” says Marie Meteer, director of call centre solutions at BBN. Once an application for a voice-activated service has been written, it doesn’t have to be rebuilt from scratch if BBN decides to change hardware and software platforms. That means companies and carriers can feel more confident about making an investment in speech recognition.

“These standards are growing and evolving over time, but the risks are relatively low, and at least you know them,” Meteer says.

VXML also gives carriers a larger pool of qualified developers, Meteer says. With proprietary platforms, at least one developer needs special training on that platform. By contrast, many HTML and XML developers can make the leap to working with VXML fairly easily, she says.

SALT boasts similar advantages and possibly an even bigger developer base. It is a lighter set of extensions to current markup languages than VXML and easier for HTML and XML developers to use, Plakias says.

Analysts and industry participants are optimistic that VXML and SALT, both of which have been submitted to the World Wide Web Consortium, won’t develop the type of rivalry between them that has stymied development in other areas. The two specifications are heading towards becoming one and might merge by the end of next year, according to Plakias.

Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.
