A man from South Africa speaks Sepedi to a Peruvian woman who knows only Quechua, yet they can understand each other. It sounds like the universal translator of science fiction, but Google, Meta and others are locked in a battle to get as many languages as possible working with their AI models.
Meta chief Mark Zuckerberg announced on Wednesday that his firm now had a set of 200 languages that could be translated into one another, doubling the number in just two years.
Meta's innovation, trumpeted in 2020, was to break the link with English, long used as a go-between language because of the vast amount of available source material.
Instead, Meta's models go directly from, say, Chinese to French without passing through English.
In May, Google announced its own great leap forward, adding 24 languages to Google Translate after pioneering techniques to reduce noise in the samples of lesser-used languages.
Sepedi and Quechua were, of course, among them, so the Peruvian and the South African could now communicate, though so far only in text.
Researchers warn that the dream of a real-time conversation translator is still some way off.
Quantity vs quality
Both Google and Meta have business motivations for their research, not least because the more people use their tools, the better the data they can feed back into the AI loop.
They are also in competition with the likes of Microsoft, which has a paid-for translator, and DeepL, a popular web-based tool that focuses on fewer languages than its rivals.
The challenge of automatic translation is "particularly important" for Facebook because of the hate speech and inappropriate content it needs to filter, researcher Francois Yvon told AFP.
The tool would help English-speaking moderators, for example, to identify such content in many other languages.
Meta's promotional videos, however, focus on the liberating aspects of the technology, such as amateur chefs with recipes from far and wide at their fingertips.
But both companies are also at the forefront of AI research, and both accompanied their announcements with academic papers that highlight their ambitions.
The Google paper, titled "Building Machine Translation Systems for the Next Thousand Languages", makes clear that the firm is not satisfied with the 133 languages it already features on Google Translate.
However, as the cliché goes, quantity does not always mean quality.
European primacy
"We should not imagine that the 200x200 language pairs will be at the same level of quality," said Yvon of Facebook's model.
European languages, for example, would probably always have an advantage simply because more reliable source material exists for them.
As regular users of tools such as Google Translate and other automatic programmes will attest, the text produced can be robotic and mistakes are not uncommon.
While this may not be a problem for day-to-day uses like reading restaurant menus, it does limit the utility of those tools.
"When you're working on the translation of an assembly manual for a fighter jet, you can't afford a single mistake," said Vincent Godard, who runs French tech firm Systran.
And the ultimate nut to crack is inventing a tool that can seamlessly translate the spoken word.
"We're not there yet, but we're working on it," said Antoine Bordes, who runs Fair, Meta's AI research lab.
He said Meta's speech translation project works on far fewer languages at the moment.
"But the interest will be in connecting the two projects, so that one day we will be able to speak in 200 languages while retaining intonations, emotions, accents," he said.