Creating linguistic resources for automated translation

A major difficulty in developing automated language translation is that a system needs a fairly extensive vocabulary from which to learn before any degree of reliability or accuracy is possible. The LC-STAR project developed just such a vocabulary.


“First, we created large lexica for several language databases,” explains project coordinator Ute Ziegenhain of Siemens in Germany. “Secondly, we developed a demonstrator that could automatically translate speech to speech for output to another interface.”

Completed on 31 January 2005, the IST programme-funded LC-STAR project developed vocabularies, called lexica, and bodies of writings, called corpora, for some 13 languages in all, ranging from Italian and Greek to Arabic, Chinese, Hebrew and Russian. These linguistic databases comprise a minimum of 100,000 entries per language.

The lexica and corpora are needed to train systems for reliable, automated speech-to-speech translation (SST). Once developed, the various SST components (flexible speech recognition, high-quality text-to-speech synthesis and speech-centred translation) can be integrated into speech-driven interfaces embedded in mobile appliances and network servers.
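To illustrate how the three components fit together, the following is a minimal sketch in Python, not the project's actual software. Every name in it (`SpeechToSpeechPipeline`, `TOY_LEXICON`, and the `toy_*` functions) is hypothetical, and each stage is a trivial stand-in: the "audio" is simply text encoded as bytes, and translation is a word-by-word lookup in a three-entry lexicon rather than one with 100,000 entries or more.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical toy English-to-Spanish lexicon, for illustration only.
TOY_LEXICON: Dict[str, str] = {
    "beach": "playa",
    "hotel": "hotel",
    "ticket": "billete",
}


@dataclass
class SpeechToSpeechPipeline:
    """Chains recognition, translation and synthesis in that order."""
    recognize: Callable[[bytes], str]    # audio -> source-language text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> audio

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


def toy_recognize(audio: bytes) -> str:
    # Stand-in for speech recognition: treat the bytes as UTF-8 text.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Word-by-word lookup; words missing from the lexicon pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the text back to bytes.
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(toy_recognize, toy_translate, toy_synthesize)
```

Calling `pipeline.run(b"Beach hotel")` passes the input through all three stages in turn. Because each stage is just a function, a stand-in could in principle be swapped for a real recognizer, translator or synthesizer without changing the pipeline itself.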

The team also produced a working demonstrator called ‘Gaia’, a telephone server capable of translating between the project partners’ languages of English, Spanish and Catalan within a single register. LC-STAR focused on the tourism register, but Ziegenhain stresses that the system can be opened up to any domain if it is provided with sufficient vocabulary.

Results already in use

LC-STAR project results are already in use by Siemens within its own speech recognition and speech synthesis systems. They have also been supplied to the European Language Resources Association (ELRA) for further dissemination. ELRA makes available a variety of language resources for language engineering and the evaluation of language-engineering technologies.

In addition, LC-STAR vocabularies and machine-translation technology have been incorporated into the ongoing TC-STAR project. TC-STAR is a long-term effort (six years) focused on advanced research into core technologies for speech-to-speech translation – its goal is to make a breakthrough in reducing the gap between human and machine performance.

Media Contact

Tara Morris alfa

More Information:

http://istresults.cordis.lu/
