The Samsung Galaxy S4 will ship in 155 countries by the end of next month, and its real-time voice translation, meant to help people communicate across borders, may be one of its most ambitious features.
Translating voice conversations in real time has been a goal for devices and software for years, and Samsung claims to have reached it. With its built-in S-Translator app, the Galaxy S4 promises to capture words spoken in one language and then reproduce them in another language, all at the speed of a conversation. Samsung says it will offer support for 10 languages as soon as the phone hits the street.
There's a reason real-time voice translation has been in the works for so long. It involves three different processes, each at least moderately hard to do well. In 2008, networking colossus Cisco Systems promised its own translation system within a year, designed to go into its TelePresence videoconferencing platform. The company had to backtrack on that a year later, saying the job was harder than expected. Cisco is still working toward such a feature today.
Though S-Translator and other tools have improved and will keep getting smarter, technology hasn't yet eliminated the language barrier, analysts and researchers say. Languages are too complex and open to misinterpretation.
"Even if human beings are doing this, real-time translation remains pretty hard, and I don't think we've seen a breakthrough," Opus Research analyst Dan Miller said.
S-Translator is designed for text messaging and email as well as voice, but it's the face-to-face scenarios that generate the "wow" factor. At the Galaxy S4 launch at Radio City Music Hall, actors dramatized the capabilities of S-Translator with a skit in which an American backpacker asked a man in Shanghai which bus to take to a museum. The backpacker spoke the English question into his Galaxy S4 and it came back out in spoken Mandarin. After the man heard the question, he spoke an answer into the phone and his words were converted into English text.
Translating conversations requires three separate processes: converting speech to text, translating those written words into another language, and then converting the translated text back into speech, said Ananth Sankar, a distinguished engineer in Cisco's Collaboration and Technology Group. The first one is an especially hard nut to crack when it comes to natural conversations, according to Sankar.
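The three stages Sankar describes can be pictured as a simple chain. The following is only a minimal sketch of that chain in Python, not how S-Translator or Cisco's system is built: every name in it (`translate_speech`, the `toy_*` functions, `PHRASEBOOK`) is hypothetical, and the stand-in stages treat audio as UTF-8 bytes so the example can run without a real speech engine.

```python
from typing import Callable

# Hypothetical stage signatures; none of these names come from Samsung or Cisco.
Recognizer = Callable[[bytes, str], str]        # (audio, language) -> text
Translator = Callable[[str, str, str], str]     # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]       # (text, language) -> audio


def translate_speech(
    audio: bytes,
    source: str,
    target: str,
    recognize: Recognizer,
    translate: Translator,
    synthesize: Synthesizer,
) -> bytes:
    """Run the three stages in order: speech to text, text to text, text to speech."""
    text = recognize(audio, source)
    translated = translate(text, source, target)
    return synthesize(translated, target)


# Toy stand-ins (assumptions for illustration only).
PHRASEBOOK = {("en", "fr", "hello"): "bonjour"}


def toy_recognize(audio: bytes, lang: str) -> str:
    # Pretend the "audio" is already a UTF-8 transcript.
    return audio.decode("utf-8")


def toy_translate(text: str, source: str, target: str) -> str:
    # Look the phrase up; pass unknown text through unchanged.
    return PHRASEBOOK.get((source, target, text.lower()), text)


def toy_synthesize(text: str, lang: str) -> bytes:
    # Pretend the UTF-8 bytes are the synthesized audio.
    return text.encode("utf-8")


result = translate_speech(
    b"Hello", "en", "fr", toy_recognize, toy_translate, toy_synthesize
)
```

Because each stage feeds the next, a recognition error in the first step is carried through translation and synthesis, which is one way to see why Sankar singles out speech-to-text as the hardest part for natural conversation.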
At the heart of the problem is the way we talk to people, compared to the way we talk to audiences or to computers, Sankar said. He's talking about the "ums and ahhs," the false starts and self-corrections, that break up the fluency of natural speech. They make it much harder for software to interpret a conversation than a command or dictation, he said.
Text translation holds other challenges, and they vary by language, Sankar said. A common way to refine that process is a statistical approach, comparing large documents that humans have translated between two languages. But some pairs of languages have more translated documents available than others, so accuracy varies, he said.
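To give a rough feel for the statistical idea, the sketch below scores word pairs by how often they appear together in sentences that humans have already translated. It is a deliberately simplified illustration using the Dice coefficient on a three-sentence toy corpus, not the method any vendor mentioned here uses; the corpus and the function names `dice_table` and `best_match` are assumptions made for the example.

```python
from collections import Counter

# Toy parallel corpus (invented for illustration): English / French sentence pairs.
CORPUS = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
]


def dice_table(parallel):
    """Score each (source word, target word) pair by the Dice coefficient:
    2 * co-occurrences / (source occurrences + target occurrences)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in parallel:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update((s, t) for s in s_words for t in t_words)
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_match(word, scores):
    """Return the highest-scoring target word for a source word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


scores = dice_table(CORPUS)
house = best_match("house", scores)
```

With two example sentences, "house" lines up cleanly with "maison". "Book" appears only once, so "le" and "livre" score the same and the data cannot tell them apart, a small-scale version of why accuracy depends on how much translated material exists for a given language pair.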
Real-time translation is closer to reality in specific subject areas such as business, technology, law and travel assistance. For example, transcribing and translating prepared speeches that cover certain subjects is about 90 percent accurate in real time now, Sankar said. Impressively, Samsung says S-Translator's vocabulary is not limited to travel subjects.
Another caveat about translation is that supporting a long list of languages doesn't necessarily mean the system can translate freely among them all. Because the process relies on translations that have already been done, what matters are language pairs. In S-Translator, U.S. or U.K. English can be mutually translated with each of the eight other included languages: French, German, Italian, Latin-American Spanish, Brazilian Portuguese, Korean, Mandarin and Japanese. In addition, Korean, Mandarin and Japanese can be mutually translated among themselves.
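The pair structure described above can be written out as a small lookup. This is a sketch based only on the pairings listed in this article; the language codes and the names `SUPPORTED` and `can_translate` are assumptions chosen for illustration, not identifiers from Samsung's software.

```python
from itertools import combinations

# Language codes are illustrative labels, not codes taken from S-Translator.
ENGLISH = ["en-US", "en-GB"]
OTHERS = ["fr", "de", "it", "es-419", "pt-BR", "ko", "zh", "ja"]
EAST_ASIAN = ["ko", "zh", "ja"]


def build_pairs():
    """Collect every mutually translatable pair as an unordered frozenset."""
    pairs = set()
    # Either English variant pairs with each of the eight other languages.
    for en in ENGLISH:
        for other in OTHERS:
            pairs.add(frozenset((en, other)))
    # Korean, Mandarin and Japanese also pair among themselves.
    for a, b in combinations(EAST_ASIAN, 2):
        pairs.add(frozenset((a, b)))
    return pairs


SUPPORTED = build_pairs()


def can_translate(source: str, target: str) -> bool:
    """True if the two languages form a supported pair, in either direction."""
    return frozenset((source, target)) in SUPPORTED
```

Under these assumptions, `can_translate("ko", "ja")` is true while `can_translate("fr", "de")` is false: ten languages yield 19 supported pairs here, well short of the 45 that free translation among all ten would require.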
For translation, S-Translator uses Samsung's own technology, not Google Translate or any other third-party platform. One limitation of the system's real-time interpretation is that it requires a data connection to an online translation server, which can be a problem for a feature that's often used, as in Samsung's example, while traveling. Samsung isn't alone in this, according to Miller at Opus Research.
"Unless they're doing stuff locally on the device, a lot of companies are ignoring the fact that having a data connection can sometimes turn out to be pretty expensive," Miller said. A rare exception is Jibbigo, a startup that makes an app for iOS and Android. Jibbigo comes as a free app that calls out to a server, but users can buy modules for offline translation of specific sets of languages. S-Translator on the Galaxy S4 does include translations of certain useful phrases built into the phone.
Both speech recognition and translation are steadily getting better, Sankar and Miller said. For one thing, software based on statistical models learns from each translation it does, Miller said.
Cisco sees real-time translation of videoconferences on the horizon, Sankar said, though within the bounds of something like a prepared, formal discussion between two diplomats.
"We believe here at Cisco that there will come a day when we will be able to have a Web conference with somebody on the other side of the world, across language boundaries, in a very natural conversational and fluent style," Sankar said. The bad news is, his forecast for that accomplishment is five to 10 years.
"You just have to exhibit patience and understand that it is not perfect, but they constantly improve," Miller said.