Archive for the ‘Translation technology’ Category

Poetic Machine Translation by Google

Friday, October 8th, 2010

Software engineers at Google have been delving into the world of automatic poetry translation. A paper entitled “Poetic” Statistical Machine Translation: Rhyme and Meter will be presented at the upcoming EMNLP (Empirical Methods in Natural Language Processing) conference. It addresses the progress made to date, the difficulties encountered and the considerations to be taken into account when tackling such a difficult topic.

French and English were the languages used for the program and there is an option for the user to select the target translation genre (such as sonnet). However, as Google point out, form rather than accuracy is the main focus at present, which unfortunately has a negative ‘impact on translation quality’, as MT is only capable of replicating either form or meaning.

The system is not yet available to the public, but the aforementioned paper gives further details of the work that has already been carried out. The ‘purely technical challenges around generating translations with fixed rhyme and meter schemes’ are discussed and the debate on whether to maintain the form of the source language text in the target translation is also addressed. Translation loss or ‘quality penalty’ when using MT for poetic translation is covered along with stress patterns and poetic form. Sub-sections include line-length, syllables and line breaks; stress and syllables for rhythmic poetry; meter (the ‘exact sequence of stressed and unstressed syllables’) for rhyming constraints; and the importance of avoiding computer-generated ‘self-rhyme’ (identical words used to produce the rhyme).
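To make two of these constraints concrete, here is a minimal sketch of a self-rhyme check and an exact meter check. It is not the method used in the paper: the function names and the encoding of stress as a string of "1" (stressed) and "0" (unstressed) are assumptions made purely for illustration.

```python
import string

# Assumed encoding: "1" = stressed syllable, "0" = unstressed syllable.
IAMBIC_PENTAMETER = "01" * 5


def last_word(line: str) -> str:
    """Return the final word of a line, lower-cased, without ASCII punctuation."""
    words = [w.strip(string.punctuation).lower() for w in line.split()]
    words = [w for w in words if w]
    return words[-1] if words else ""


def is_self_rhyme(line_a: str, line_b: str) -> bool:
    """True when both lines end on the identical word."""
    end_a, end_b = last_word(line_a), last_word(line_b)
    return end_a != "" and end_a == end_b


def fits_meter(stresses: str, meter: str = IAMBIC_PENTAMETER) -> bool:
    """True when a line's stress sequence matches the meter exactly."""
    return stresses == meter
```

A translation system working under such constraints would reject a candidate line pair for which `is_self_rhyme` returns true, and keep only lines whose stress sequence passes `fits_meter`. Working out the stress sequence of real text needs a pronunciation dictionary, which this sketch leaves out.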

Even though the paper deals with statistical machine translation and technical issues, it could be said that it has been responsible for producing some original poetry around this subject. The paper was submitted for review and feedback prior to its presentation at the EMNLP conference. However, there was one response that Google had not counted on – one of the reviews was written in verse! This review has been published online, along with further author and reviewer comments – also written in verse!

Google are well aware that the use of MT in this field is certainly in its infancy and the official blog even quotes poet Robert Frost who said ‘Poetry is what gets lost in translation’. Clearly, eschewing the use of human translators in this sector will only increase translation loss and misunderstanding. However, it will be interesting to see how these losses can be minimized over time . . . and we’ll be keeping an eye on the Review in Verse, as it really does seem to be a first!

Sources: Google Research Blog (Poetic Machine Translation); “Poetic” Statistical Machine Translation: Rhyme and Meter (Dmitriy Genzel, Jakob Uszkoreit, Franz Och); A Review in Verse: http://research.google.com/archive/papers/review_in_verse.html

Engkoo – Microsoft’s Chinese-English translation and language learning software

Tuesday, September 21st, 2010

Engkoo is Microsoft’s web-based language learning and machine translation service. Launched in 2009, it is a free resource aimed at helping Mandarin Chinese speakers to learn English. It also doubles up as a translation tool with a range of features including a Chinese / English dictionary; downloadable audio and video files; bilingual Chinese-English text comparison; text-to-speech software; and a phonetic search facility allowing users to find fuzzy matches. This online linguistic resource is one of the finalists in the prestigious Wall Street Journal’s 2010 Innovation Award – the winners of which are to be announced at a prize giving ceremony on 26 October 2010.

Engkoo uses web-mining technology to extensively search the internet for suitable bilingual content and like the online translation tool ‘Linguee’, the web-crawling process concentrates on professionally translated texts, such as those from the United Nations or multilingual news sites. This enables the software to provide bilingual Chinese-English comparison tables and as the source is cited, it also allows a credibility rating to be assigned to the translation. To date, Engkoo contains more than 10 million cross-referenced terms and receives more than 4 million hits per month.

Other useful services include the mouse-over and collocation features. The former allows users to hover over specific words in the source language text and, in turn, the corresponding word(s) are highlighted in the target text. The latter employs ‘part-of-speech wild cards’. Microsoft Research explains: ‘Users can find prepositions that typically follow the word “terrific” by simply searching for “terrific prep”. In this example, they could find sentences such as “I think it looks terrific on you”’.
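The idea of a part-of-speech wild card can be illustrated with a small sketch. Engkoo’s actual implementation is not described in the sources, so everything here is an assumption: the function name, the tiny hand-written preposition list (a real system would use a part-of-speech tagger) and the sample sentences.

```python
import string

# Hypothetical stand-in for a part-of-speech tagger: a short preposition list.
PREPOSITIONS = {"on", "in", "at", "for", "with", "to"}


def find_collocations(query: str, sentences: list[str]) -> list[tuple[str, str]]:
    """Return (preposition, sentence) pairs for a query such as 'terrific prep'."""
    word, wildcard = query.lower().split()
    if wildcard != "prep":
        raise ValueError("this sketch only supports the 'prep' wild card")
    hits = []
    for sentence in sentences:
        tokens = [t.strip(string.punctuation).lower() for t in sentence.split()]
        # Look at each pair of neighbouring words in the sentence.
        for first, second in zip(tokens, tokens[1:]):
            if first == word and second in PREPOSITIONS:
                hits.append((second, sentence))
                break
    return hits


SAMPLE = [
    "I think it looks terrific on you.",
    "That was a terrific meal.",
    "She is terrific at chess.",
]
matches = find_collocations("terrific prep", SAMPLE)
```

With the three sample sentences, `matches` holds the first and third sentences, paired with “on” and “at” respectively; the second is skipped because “terrific” is followed by a noun.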

As for fuzzy matches, users can carry out searches based on the phonetics of a word and how it is typically spoken by a language learner. For example, entering “shampin” into the software would bring up “champagne”.

Engkoo also makes full use of audio and visual components with its text-to-speech software and video option. Inputted text is output in an audio format that is also available as an MP3 download. The aim here is for the audio output to sound natural and to follow the intonation and stress patterns of the target language. Microsoft Research has reported that this is one of the most popular features. To provide learners with further help in regards to correct pronunciation, there are plans to include animated videos displaying the position of the tongue, for example, when pronouncing the word.

Slang and idiomatic expressions are also included in the ever-expanding database, and there is also talk of adding Japanese as an available language and mobile apps for people on the move.

Sources: Wall Street Journal, Engadget, Microsoft.com, Microsoft Research, www.rdmag.com, 1on1english.blog18.fc2.com

Stuck for words? Try Linguee, the new online translation tool

Wednesday, September 1st, 2010

A new multilingual online ‘dictionary’ called Linguee was launched in September 2010. Unlike automatic translators such as Google Translate, Linguee offers contextual translations by bringing the all-important human element into the translation process and citing the website and the source of the translated text. Touted as a translation ‘web crawler’ rather than an automatic translator, it’s really rather good and will surely be used by professional translators to help with their research.

Linguee is the brainchild of Gereon Frahling (who came up with the concept whilst working at Google Inc.). Software developer Leonard Fink was invited to join the project and the rest is history! The original German / English version of the site went live in May 2010 and already receives 600,000 daily searches and nearly 80,000 unique visitors every day.

These are impressive figures, but when you visit the site you will understand why. The interface is extremely user friendly and it searches for common phrases along with individual words. It is presented in the form of a two-column comparison table with the source language displayed on the left, and the target translations on the right. But probably the most important feature for translators is that it offers a contextual translation and also states the source of the translation and a link to the website from which it was taken. The frequency of the translation is also provided and there is a ‘comments’ function allowing people to leave feedback.

Linguee only deals with translations that have been carried out by human beings. Its bread and butter texts (like those of automatic translators) are those from the United Nations and the European Parliament, in other words, those that have already been professionally translated. Patent translations also get a look in as a translation source. However, with the controversial proposal for an EU-wide patent and the possible use of automatic translation in this sector, this source may well turn out to be less accurate in the future.

Focussing on quality rather than quantity, the Linguee website explains that out of one trillion sentences that have been run through the system, ‘only the top 0.01 per cent, i.e., 100 million translated sentences, are retained’. Currently, the language pairs available are English and German; English and Spanish; English and French; and English and Portuguese. Plans are currently underway to add further languages, including Mandarin Chinese, Japanese, Russian and Italian. The multilingual search facility is free to use at the moment, but it is thought that charges may apply in the future.
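The retention figure quoted above can be checked with a line of arithmetic. The sketch below assumes that ‘one trillion’ is meant in the short-scale sense (10 to the power of 12); the variable names are purely illustrative.

```python
from fractions import Fraction

TOTAL_SENTENCES = 10**12               # one trillion, short scale (assumption)
RETAINED_SHARE = Fraction(1, 10_000)   # 0.01 per cent written as an exact fraction

# Exact integer result: no floating-point rounding is involved.
retained = int(TOTAL_SENTENCES * RETAINED_SHARE)
```

Here `retained` comes to 100,000,000, which matches the 100 million sentences quoted by Linguee.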

Linguee received a glowing review from the French version of technology website TechCrunch. However, as one comment stated, as with any free dictionary, the translation should always be checked against a reliable source.

Language professionals checking out this new multilingual search facility may well be pleasantly surprised!

Sources: www.linguee.com; http://fr.techcrunch.com; www.blogs.ft.com/technology ; www.prweb.com

US survey highlights the perils of machine translation for medicine labelling

Monday, April 12th, 2010

What automatic machine translation makes up for in productivity, it certainly loses in accuracy. This may be a valid compromise when conversing with a friend over the Internet, but it is unacceptable in the medical sector where mistranslations could prove to be fatal.

The National Post reports that in 2009, a law came into effect in the US which required New York pharmacies to provide multilingual medicine labelling. In a city which includes an estimated 50% of citizens who speak a language other than English in the home, it was hoped that these steps would ensure an equality of care for those who do not speak English as a first language. Following this law, a study (published by Pediatrics® in April 2010) was carried out with the objective of evaluating the ‘accuracy of translated, Spanish-language medicine labels among pharmacies in a borough with a large Spanish-speaking population.’

The study covered pharmacies in the Bronx area of New York and the results provided information about how many pharmacies produced medicine labels in the Spanish language; how often machine translations were used; and the quality of the Spanish translation produced.

The results were astounding: 86% of pharmacies providing Spanish translations used machine translation with only 3% employing professional translators; 43% of the total labels evaluated contained incomplete translations and of an additional 6 labels studied, misspellings and grammatical errors resulted in a 50% error rate.

WSFA news reported on some of the translation errors. A common problem was ‘Spanglish’ – the mixture of Spanish and English – resulting in instructions which were difficult to read and a source of patient confusion. Mistranslations were another problem, e.g. the English word ‘once’ left untranslated on a label, where a Spanish speaker would read it as ‘eleven’ – a difference in meaning between English and Spanish which could cause a potential overdose. Misspellings included ‘poca’ instead of ‘boca’ (‘little’ and ‘mouth’ respectively in Spanish) and under the heading of poor translations, ‘Take 1.2 aldia give dropperfuls with juice eleven to day’ was a salient example.

There has been a call for standardisation and improvements in this area and patients have been advised to request the services of professional translators and interpreters to ensure complete understanding of the dosage instructions. But with only 3% of pharmacies employing professional translators to carry out this work, it would seem that adequate access to language services and the provision of accurate translations are not a high priority.

The survey was carried out by Iman Sharif and Julia Tse and their results concluded that the ‘quality of the translations was inconsistent and potentially hazardous’. The need for better regulations and funding in this domain was identified, which Sharif stated ‘is probably something that belongs within the health reform conversations’ (The National Post).

Medical translation is a highly qualified field and it is almost incomprehensible that machine translation is deemed an acceptable resource. Language professionals in the US and abroad will no doubt be interested to see whether these issues will be sufficiently covered in the health reform discussions currently underway, as any mistranslations here could well be the difference between life and death.

It’s not just a dog’s life!

Thursday, April 1st, 2010

Last month we saw how Bowlingual is allowing humans to communicate with man’s best friend, the dog. Today Google have gone one step further and have created an Android application: Translate for Animals.

Had this little girl had the Android app she wouldn't have had to shout.

So far the app, which will only be available on Android 1.6 handsets and above, can translate noises made by cats, dogs, birds, rabbits, guinea pigs, hamsters, tortoises, horses, chickens, sheep, donkeys, and pigs. For the time being your furry friends’ thoughts can only be translated into English, but there are plans to allow translation into Cantonese and a few other languages in the near future.

The app uses speech recognition and translation engines to analyse the sounds produced by your animal and compares them with the existing sounds in its animal linguistics database. Google have issued a disclaimer, stating that they will not take any responsibility “if you are offended or disappointed by what your chosen animal may say.” And they “… do not guarantee stimulating conversation”. Looks like Doctor Dolittle may be out of a job though!

If you want to find out more about the app, check out Google’s instructions.

Wii can do it! Spanish game fans to translate Japanese video game

Monday, March 15th, 2010

In an unprecedented collaboration between members of the Spanish gaming community and publishers of the Wii game Fragile Dreams: Ruins of the Moon, around fifty volunteers will translate the game’s script into Spanish. Due to be released in US and European markets on 16 and 19 March respectively, it is hoped that the Spanish language version will follow shortly.

With a script containing more than 35,000 words, the translation project is no mean feat. But when DSWii.es contacted Japanese publishers Rising Star Games with the idea, an advertisement for volunteer translators was posted and fifty eager recruits were enrolled. However, on accessing the translation website – fragile.blogocio.net – it appears that a grand total of 0% translation has been carried out to date. With just over a week until the European release, it would appear that initial estimates of translation timescales were highly optimistic!

In a press release, Hugo Fraga – Director of Content and Marketing at Blogocio Media SL (DSWii.es’s parent company) – stated that ‘The most important aspect of this translation is not that the game arrives in Spanish, but that this is the first time in history in which gamers will participate actively in the development process or the localisation of a title.’

Yet even though this collaboration has been hailed as the first officially sanctioned translation to be carried out by fans, it was still unclear from the website what form the Spanish language version would take. It is to be available as a free ‘digital download’ via DSWii.es and Rising Star’s Hoshi portal, and users were asking whether this meant a language patch that would install files on the console itself via the SD card. According to the comments page, it appears that the translated material will be a downloadable PDF booklet which can be printed out and consulted whilst playing the game. If this is indeed the case, surely a leaflet or PDF file is far removed from the culture of video games, in which attention is focussed on the screen. And with an estimated 400 million speakers of Castilian Spanish worldwide, surely a language patch would have been the best option.

Nevertheless, the volunteer translators have been applauded for their efforts, although the company itself has received some criticism for its failure to employ professional translators and programmers to create the patch. As an interesting aside, some users were pleased that playing video games in English had improved their language skills!

Crowdsourcing is an increasingly popular option for companies – but it is a risky path to follow. Undoubtedly, overall costs are less if volunteers are employed, but the importance of using professional translators must not be forgotten. Fragile Dreams: Ruins of the Moon sold over 26,000 copies in Japan in its first week of release which made it the second most popular video game at the time. It remains to be seen whether the figures will translate quite so well in Spain.

Microsoft unveils telephone capable of real-time translation

Tuesday, March 9th, 2010

The Translating! Telephone: an innovative blend of automatic speech recognition and machine translation; packed with text-to-speech and intelligent voice-recognition software; enhanced with a back translation tool and topped with archive and search facilities – this is not just automatic translation, this is Microsoft automatic translation! Described as a tantalizing glimpse into the future of real-time multilingual communication, this new language tool was certainly the item du jour for linguists at TechFest 2010.

TechFest is an annual event where developers from Microsoft Research facilities across the world meet to discuss innovative projects in progress, and The Translating! Telephone hails from the Speech Group at Microsoft Research Asia (MSR Asia). However, researchers have stressed that this project is still in its development stage and it could be a decade before it becomes ready for commercial use. Nevertheless, it has been mooted as a solution to language barriers in business and social environments where gist translations are preferable to no translation at all – not as a substitute for professional translators.

Microsoft’s research website explains how the tool combines three key technologies: speech recognition, machine translation and text-to-speech software. It is unclear as to which languages would be supported down the line, but the demo was carried out in German and English. Users connected by a Voice over Internet Protocol (VoIP) are able to speak in their native (or chosen) language which is recognised by automatic speech recognition (ASR) technology, transformed via automated translation and synthesised using text-to-speech.

Touted as a step towards unified communications, it certainly boasts some impressive features. Firstly, the inputted source language is almost simultaneously translated and output via audio format in the other user’s target language. Secondly, all speech is transcribed for verification, archiving and retrieval purposes and what is more, underneath this transcription is a back translation feature which appears as a table at the bottom of the screen – thus enabling users to check if the translation process is performing correctly. Finally, the transcription benefits from being ‘storable, browsable, searchable’ and cut-and-paste-able! As it is an intelligent piece of software, it is said that the translation quality will increase as the system learns the user’s voice.
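The chain described above (speech recognition, then machine translation, then text-to-speech, with a back translation logged alongside the transcript) can be sketched in a few lines. This is a toy illustration and not Microsoft’s code: every name is hypothetical, the ‘audio’ is plain text, and a two-entry phrase table stands in for a real translation engine.

```python
from dataclasses import dataclass

# Toy phrase tables standing in for a real MT engine (hypothetical data).
EN_TO_DE = {"good morning": "guten Morgen", "thank you": "danke"}
DE_TO_EN = {de: en for en, de in EN_TO_DE.items()}


@dataclass
class Turn:
    transcript: str        # what the recogniser heard
    translation: str       # what the other party receives
    back_translation: str  # translation rendered back into the source language


def recognise(audio: str) -> str:
    """Stand-in for ASR: the 'audio' is already text in this sketch."""
    return audio.strip().lower()


def translate(text: str, table: dict[str, str]) -> str:
    """Look the text up in a phrase table, flagging anything unknown."""
    return table.get(text, "[untranslated] " + text)


def synthesise(text: str) -> str:
    """Stand-in for text-to-speech: wraps the text in a marker."""
    return f"<audio:{text}>"


def relay(audio: str, log: list[Turn]) -> str:
    """Run one utterance through the chain and archive the result."""
    transcript = recognise(audio)
    translation = translate(transcript, EN_TO_DE)
    back = translate(translation, DE_TO_EN)
    log.append(Turn(transcript, translation, back))
    return synthesise(translation)


call_log: list[Turn] = []
spoken = relay(" Good morning ", call_log)
```

Comparing `transcript` with `back_translation` in each logged `Turn` is the check that the back translation table gives users: if the two differ, something was lost along the way. Because the log is plain data, it can also be stored and searched.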

It may well be a decade before The Translating! Telephone is market ready, but with the combination of voice-generating software, automatic translation and a user-friendly interface, it looks set to become a staple of international offices in the not too distant future.

Voice-generating technology hitting all the right notes

Friday, March 5th, 2010

Artificial voice generators generally receive a lot of bad press, but this week was an exception. Two developments in the communications market were announced to worldwide acclaim: a silent-speech device incorporating an automatic translation tool with a twist; and a bespoke voice synthesizer which was aired on the Oprah Winfrey Show.

Silence was certainly speaking volumes at the CeBIT trade fair in Germany this week when scientists from the Karlsruhe Institute of Technology (KIT) demonstrated a device capable of ‘lipreading’ and transforming these movements into speech. The technology in question is called Silent Sounds which according to AFP works by electromyography – ‘monitoring the muscular movements produced when we speak and converting them into electrical pulses that can then be turned into speech.’

Currently the device functions through a variety of electrodes attached to the skin but it is anticipated that within a decade, the technology will become an everyday feature of mobile phones once it can be integrated into handsets. It is said to be 99 per cent accurate at the moment, but its success with different accents or technical language remains to be seen.

However, Silent Sounds does boast another feature and that is the automatic translation application which translates the input language into an output language of the user’s choice. At the moment it is mainly European languages which are on the menu as the developers explained that support for Chinese, for example, would require more development to incorporate ‘tone’.

But this type of technology is also important for the medical world and could help improve the quality of life for people who no longer have the ability to speak due to an operation, illness, or accident. Such was the case for American film critic Roger Ebert who lost his voice four years ago following an operation. This week he unveiled a bespoke piece of voice-generating software on the Oprah Winfrey Show which has enabled him to speak again for the first time since the surgery that robbed him of his voice.

The device was developed by Edinburgh speech synthesis company, Cereproc, and what makes this machine stand out is that the computer-generated output sounds like Mr Ebert’s voice and not an electronic reproduction. The BBC reported how this was made possible through a process of accessing recordings of Mr Ebert’s voice, breaking these down into individual sounds, completing a transcription stage and finally reassembling everything. The user types out what he/she would like to say and the computer generates a ‘human’ voice. Mr Ebert commented that ‘It still needs improvements, but at least it sounds like me.’

These innovative technologies could well become commonplace in the future, and what may seem like science fiction today may be everyday communication tools once the products are market ready. For example, the ongoing work with the Silent Sounds device includes developing a system which is operable in offices. Budding MI5 agents, military personnel, cinema-goers wishing to communicate from inside the theatre and even commuters will surely be adding it to their wish lists.

Further development stages and lots of tweaking are undoubtedly the order of the day for these devices and the jury is still out on the degree of success with which the automatic translation application will deal with the nuances and complexities of language. However, from those who would prefer to use silent communication for security reasons to the truly life-changing experience of giving people their voice back, there is no doubt that voice-generating technology is certainly hitting all the right notes.

Police, Camera, Translation!

Monday, January 11th, 2010

In recent years, police departments and emergency services in the United States have been using handheld translation machines in a drive to enhance communication with non-English speakers. In December 2009, the Cincinnati City Council was in the news for its partnership with Latino Educational Assimilation Resource Network, Inc. – a non-profit organisation providing English/Spanish bilingual material to the department to help bridge the language divide with the population’s growing number of Spanish speakers. Hand-held translation machines, bilingual dictionaries, language tapes and medical books for healthcare professionals are just some examples of the language material made available. According to the US Census Bureau, more than 18 per cent of the American population speak a language other than English in the home: there are over 30 million speakers of Spanish, 2 million of Chinese and 700,000 of Russian. Overcoming the language barrier has never been so important in today’s multicultural society.

The ‘talking translators’, as they have been dubbed, are a welcome accessory for the police, emergency workers and healthcare professionals alike. The device costs $1,200 but is priceless if it means that lives could be saved or potentially volatile situations brought about by lack of communication are defused. It comprises a series of pre-programmed texts which, when selected, are output in audio format. There is also a loudspeaker function to broadcast the translated message free from distortion and within a range of 1 km. The user selects the desired text in English, such as the Miranda warning to advise a person of their rights, and the foreign-language equivalent is relayed. Unfortunately, it does not allow for two-way communication as there is no real-time translation programme. However, there is a ‘record’ facility which enables responses to be translated at a later date.
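In software terms, a one-way device of this kind behaves like a phrasebook lookup with a recorder attached. The sketch below is only an illustration of that idea: the class name, the phrase identifiers and the two sample Spanish phrases are assumptions, not details of the actual product.

```python
# Hypothetical phrasebook: phrase id -> language code -> stored translation.
PHRASEBOOK = {
    "where_hurt": {"en": "Where does it hurt?", "es": "¿Dónde le duele?"},
    "stay_calm": {"en": "Please stay calm.", "es": "Por favor, mantenga la calma."},
}


class TalkingTranslator:
    """One-way phrase device: plays stored phrases, records replies for later."""

    def __init__(self, phrasebook: dict[str, dict[str, str]]) -> None:
        self._phrasebook = phrasebook
        self.recordings: list[str] = []

    def play(self, phrase_id: str, language: str) -> str:
        """Return the stored translation; nothing is translated on the fly."""
        translations = self._phrasebook.get(phrase_id)
        if translations is None or language not in translations:
            raise KeyError(f"no stored phrase for {phrase_id!r} in {language!r}")
        return translations[language]

    def record(self, response: str) -> None:
        """Keep the reply untranslated so it can be translated at a later date."""
        self.recordings.append(response)


device = TalkingTranslator(PHRASEBOOK)
relayed = device.play("where_hurt", "es")
device.record("Me duele el brazo.")
```

The limitation noted above shows up directly in the design: `play` can only return what was stored in advance, and `record` merely saves the reply, so any answer still has to wait for a human or a separate system to translate it.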

According to the New York Times, the Los Angeles Police Department (LAPD) have been using a similar device called the Phraselator® since 2007 and it has become an invaluable tool for law enforcement officers as there are 224 languages spoken in the city. The impetus behind using such a translation tool stemmed from miscommunication and subsequent mismanagement of the MacArthur Park peaceful immigration procession in May 2007 which resulted in injuries to 250 protesters and journalists, and 18 police officers. Over $13 million in compensation was paid out by the LAPD as a result of the infamous incident. Taking into consideration the economic repercussions from such an event, the Phraselator®’s $2,500 price tag at that time would have seemed cheap in comparison.

The Phraselator® is manufactured by Voxtec, and was initially designed for use by US military personnel. According to the manufacturer’s website, the most current model – the Voxtec Phraselator® P2 – is the ‘most powerful one-way, handheld, speech-to-speech translation system available today’ and aids the ‘tactical and humanitarian needs’ of service personnel in war-torn societies and humanitarian fields.

There are other translation tools which are also penetrating the market, such as the software developed by Florida-based company Vcom3D and by DARPA (the research and development office of the U.S. Department of Defense).

Vcom3D’s military products include the Vcommunicator® language software, which is compatible with the Apple iPod. It functions by selecting an English phrase which is then converted into the desired foreign-language equivalent and communicated via a computer-animated 3D character incorporating culture-specific gestures. It is lightweight and discreet, robust, relatively cheap in terms of large-scale distribution and requires less training as troops are already familiar with iPods.

For two-way translation systems, DARPA’s work in progress is the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC). The DARPA website lists TRANSTAC’s objectives as enabling spontaneous communication in ‘real-world tactical situations’, being speaker-independent, utilizing an intuitive interface for ease-of-use, adapting easily and quickly to ambient noise and change of speaker, and being able to support new languages with a turnaround time of less than 100 days.

Language technology does not claim to be free from error and there is no disputing the importance of human translators and interpreters. But with reports of deaths of linguists in conflict zones on the increase (both in the field and also through targeted attacks because of their links with foreign troops), the innovations on the market today are rapidly becoming indispensable not only for peacekeeping operations, but for medical professionals, law enforcers and emergency personnel worldwide.

“Live” text or high-res PDFs and outlined eps files? – Solving the mystery.

Thursday, January 7th, 2010

Most clients who require DTP in Western European languages would prefer The Translation People to use the fonts supplied with the English template and supply back the artwork files with “live” text, and this causes no problems for the DTP operator.

Where issues do occur, however, is when the client needs DTP in Eastern European languages or languages which are written in non-Latin scripts, e.g. Russian, Greek, Japanese, Chinese, Arabic or Hebrew. More often than not these languages are not supported by fonts matching those used in the original template, and The Translation People are on hand to suggest suitable alternatives if the client has not already done so.

In terms of final output it is important to establish whether the client requires high-resolution PDF files for printing, or the original artwork files with or without linked outline eps files, as this may influence the choice of typeface to be used. If high-resolution PDFs or artwork with linked outlined eps files are required there is generally a wider choice of typefaces available, whereas artwork files with “live” text may cause problems with font licensing: font files are protected by copyright and cannot be freely distributed.

The advantage of high-resolution PDF files or artwork with linked outlined eps files is a wider choice of typefaces, as well as greater stability once the files reach the printing stage. Incorrectly installed non-standard typefaces may cause “live” text to corrupt, but the text in outline eps files or PDF files is treated as fixed objects, or paths, which eliminates the risk of corruption.

The Translation People will always advise the client on the best solution for choice of typeface and final output, based on their requirements.

Should you have a current DTP requirement, please contact us now.