Index
Information about this project
Abstract
Keywords
Reasons why I chose this topic
Introduction
1. Types of MT
1.1. Rule-based MT
1.1.1. Direct systems
1.1.2. Indirect systems
1.1.2.1. Transfer systems
1.1.2.2. Interlingua
1.2. MT based on the analysis of linguistic corpora
1.2.1. Example-based MT
1.2.2. Statistical MT
1.2.3. Neural MT
1.3. Hybrid translation
2. MT tools
2.1. Systran
2.2. Google
2.3. DeepL
3. Text excerpts translated with MT
3.1. The legal text
3.2. The scientific text
3.3. The technical text
3.4. The press article
4. Analysing and classifying the errors
5. Results
6. Pre-editing and post-editing
7. Advantages and disadvantages of MT
8. Conclusions
References
Endnotes
Note:
This project was conceived as my undergraduate dissertation during the 2017-2018 academic year. This is the translated version of the project, which was originally written in Spanish. For academic reasons, it was not published until 2021. The sections on the history of MT and the analysis of MT systems are still useful to students interested in learning about MT and the different types of MT systems.
Information about this project
Title: Machine translation
Student: Carmen Romero
Centre: Faculty of Translation and Interpreting - University of Granada, Spain. Department of Translation and Interpreting.
Presented in June 2018.
Abstract
This undergraduate dissertation is about machine translation tools from English into Spanish and about computer-assisted translation tools. The main goal is to identify the importance of these tools within the working environment of translators nowadays and to learn about their potential.
The first section of this dissertation consists of an introduction in which I justify the chosen subject. Next, the different types of MT are analysed and some aspects on the main online MT systems such as Google, Systran or DeepL are explained. Four documents of different nature have been selected in order to be translated with these same MT engines. The goal is to identify, analyse and classify the errors made by each MT system and to compare their performance. The processes of pre- and post-editing are explained through practical examples. Finally, the advantages and disadvantages of MT are presented, as well as an explanation on computer-assisted translation tools which already allow translators to use MT in their work environment.
Keywords
Machine Translation, computer-assisted translation, Google, Systran, DeepL, translation industry, pre-editing, post-editing
Reasons why I chose this topic
MT is a sub-field of computational linguistics, a field that is increasingly attractive because of the many possibilities that remain unexplored. Natural language processing (NLP) brings together efforts from different fields of knowledge, such as computer science and translation. In my opinion, research in these areas can lead to an interesting and promising career. MT offers many possibilities worth exploring, such as live translation during a phone call or instant translation of a restaurant menu. Hence my interest in these tools, which are changing translators’ professional work environment.
I also believe that translators cannot ignore the changes that the translation industry is undergoing, especially since the introduction of Machine Translation (from now on, MT) and computer-assisted translation (CAT) tools. Nowadays, the developers of the most advanced CAT software include MT systems in their products because they are aware of the advantages these offer to human translators. MT can be a useful tool for translators and for their employers, saving them time and money respectively. Even so, MT tools are still far from being able to translate any type of text, or to replace human translators, without post-editing of the content to obtain high-quality results.
Nowadays, the development of MT is a key factor in global communication and the elimination of linguistic barriers (Alcina, 2010). The results obtained in such a short time, especially since the introduction of neural networks, point towards the bright future that lies ahead for machine translation.
Introduction
I would like to state the goals of this dissertation:
(1) Classifying the existing types of MT systems.
(2) Explaining the most relevant resources that Systran, Google and DeepL offer to human translators.
(3) Translating different types of text using the free online MT engines offered by the above-mentioned companies, identifying and classifying the mistakes they make, suggesting a correct translation and comparing the results obtained.
(4) Explaining the processes of pre- and post-editing and showing some examples on how to obtain better translations by means of these techniques.
(5) Analysing the advantages and disadvantages of using MT.
(6) Identifying which computer-assisted translation software already allows users to translate using MT.
(7) Discussing the possibilities that MT offers and the most recent progress achieved in the field of MT.
First of all, Machine Translation and Computer-Assisted Translation need to be defined. ‘MT’ is the abbreviation of Machine Translation (Traducción Automática). According to Systran’s official website¹, MT is the process through which computer software is used to translate a text written in one natural language (for example, English) into another (for example, Spanish).
Computer-Assisted Translation (CAT) tools are the computer programmes used to increase the quality of the translation process and translators’ productivity². The goal of CAT tools is not to replace human translators but to help them while translating. This help can take the form of dictionaries, terminology glossaries or translation memories, which are linguistic databases that store previous translations so that they can be reused in the future³.
According to Poibeau:
The goal of machine translation is now considered mainly to be that of providing the user with some help, and, in some professional contexts, enabling him to decide whether a human translator needs to be called on or not (2017, p. 13).
Nowadays, great progress is being achieved in the development of MT. Even though the translator’s profession is far from disappearing, some people believe that the future threat to translators is embodied by MT. However, is it possible to translate without professional human translators? What kind of texts can be translated by MT software with a low error rate?
Many authors have taken an interest in the possibility of mechanising processes such as translation. Two of them are cited here:
The mechanization of translation has been one of humanity’s oldest dreams. In the twentieth century it has become a reality, in the form of computer programs capable of translating a wide variety of texts from one natural language into another. But, as ever, reality is not perfect (Hutchins and Somers, 1992b).
Machine Translation has become one of the latest topics in the field of translation. The improvements that have been reached lately are very important and many are rethinking the future of the profession. Some argue that the spread of MT will be the end of the profession, while others claim that professional translators can never be replaced (Santilli et al., 2016, p. 271).
Nowadays, MT is a field of interest to many specialists from different fields of knowledge: computer scientists and translators, but also specialists in artificial intelligence and cognitive sciences, linguists and philosophers. This is because the analysis of languages cannot be separated from the analysis of knowledge and reasoning, as translation requires a deep knowledge of the text to be translated. Transposition into another language is a difficult and delicate process that also involves subjective factors: a reader may find a translation inadequate and a translated text may vary drastically depending on the needs of the client or the nature of the text (adapted from Poibeau, 2017, pp. 2–11).
The overall quality that MT can provide is also a topic of intense debate: should it match the quality of a human translation or simply allow the user to understand a text written in a foreign language? It is clear that advances and updates in MT software are geared towards more natural and fluent results in the target language. To achieve this, MT tools have gone through successive stages since their origins, which will be explained in more detail in the following sections.
1. Types of MT
In this section, the existing types of MT are classified and an explanation of how each of them works is provided.
There are several classifications of MT systems depending on the chosen criteria. For example, systems can be bilingual if they are designed for a specific language pair or multilingual if they are designed for more than two languages. They can be unidirectional systems, where it is only possible to translate in one direction, or bidirectional if it is possible to translate in both directions (Hutchins and Somers, 1992b).
In this project, MT systems have been classified as follows:
(1) Rule-Based Machine Translation (RBMT), which includes direct and indirect systems. In turn, the latter are subdivided into transfer systems and interlingua.
(2) MT systems based on the analysis of linguistic corpora, which include statistical MT, example-based MT and neural MT.
(3) The concept of ‘hybrid MT’.
1.1. Rule-based MT
On its website, Systran provides an explanation of how this type of MT works:
Rule-based MT is based on countless built-in linguistic rules and millions of bilingual dictionaries for each language pair. The software analyses the syntactic structure of the text and creates a transitional representation from which the text is generated in the target language. This process requires extensive lexicons with morphological, syntactic and semantic information, as well as large rule sets. The software uses these complex rule sets and then transfers the grammatical structure from the source language to the target language. Rule-based MT provides good quality outside the specific domain and is predictable. This type of MT offers the advantage that dictionary-based customization ensures improved quality according to corporate terminology. However, translation results may lack the fluency expected by readers (Systran, no date a).
Rule-based MT systems are divided into direct and indirect systems.
1.1.1. Direct translation systems
These are the oldest systems and belong to the first generation of MT systems (Hutchins and Somers, 1992b). In direct systems, the source language is translated into the target language without an intermediate representation and without carrying out a syntactic or semantic analysis of the source text. The translation is carried out following these steps (adapted from Poibeau, 2017, p. 63):
- Morphological analysis. The system identifies word endings and reduces inflected forms to their uninflected base form (running → run).
- Bilingual dictionary lookup. The system looks up the words it has reduced to their uninflected form in the bilingual dictionary to find their equivalent in the target language.
- Reordering. Direct systems have a series of reordering rules to make the result in the target language more appropriate; for example, by reordering some particles or adjectives (Hutchins and Somers, 1992a). In a direct English-Spanish MT system, one of the reordering rules would be the change of order from adjective-noun in English to noun-adjective in Spanish: red car → coche rojo.
Finally, the system would produce the translated text (Hutchins and Somers, 1992b).
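The three steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the tables LEMMAS and BILINGUAL, the part-of-speech tags and the single reordering rule are hypothetical and are not taken from any real MT system.

```python
# Toy direct (first-generation) English-Spanish translator.
# All tables and rules below are hypothetical illustrations.

# Step 1 data: inflected form -> uninflected base form
LEMMAS = {"cars": "car", "running": "run"}

# Step 2 data: bilingual dictionary, base form -> (target word, part of speech)
BILINGUAL = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}


def analyse(word):
    """Morphological analysis: reduce a word to its base form."""
    word = word.lower()
    return LEMMAS.get(word, word)


def lookup(lemma):
    """Bilingual dictionary lookup; unknown words are copied unchanged."""
    return BILINGUAL.get(lemma, (lemma, "UNK"))


def reorder(tagged):
    """Reordering rule: English ADJ NOUN becomes Spanish NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def translate_direct(sentence):
    lemmas = [analyse(word) for word in sentence.split()]
    tagged = [lookup(lemma) for lemma in lemmas]
    return " ".join(word for word, _ in reorder(tagged))


print(translate_direct("red car"))
```

Because every word is handled in isolation, with no syntactic or semantic analysis, a sketch like this cannot resolve ambiguity or agreement, which is precisely the limitation attributed to direct systems.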
Editor’s note: This image is not included in this excerpt.
Picture 1. Stages of the translation process through direct MT systems. From Hutchins & Somers, An Introduction to Machine Translation (1992).
Available at: psychotransling.ucoz.com/_ld/0/13_hutchins_2.doc
The results of direct systems, without semantic disambiguation, were unsatisfactory (Poibeau, 2017, p. 64). These systems had many limitations, as they translated word by word and had only a few rules for rearranging the order of words. The translations were often mistaken and the syntactic structures were too similar to those of the source language. The failure of the first generation systems led to the development of more sophisticated linguistic models for translation (Hutchins and Somers, 1992a).
1.1.2. Indirect translation systems
These are known as the ‘second generation MT systems’ (Hutchins and Somers, 1992b). They are known as ‘indirect’ due to the fact that, during the translation process, transposition does not take place in one phase but consists of two or more. They are subdivided into transfer and interlingua systems.
1.1.2.1. Transfer systems
Transfer systems are more complex than direct translation systems, since they carry out syntactic analysis. This way, the system does not translate word by word and takes into account the syntactic structure of the source language to produce the translation into the target language (Poibeau, 2017, p. 27).
The translation process using transfer systems consists of three phases. This is how Hernández explains it (2002, p. 110):
- Analysis and representation. First, the source text is analysed in order to obtain an intermediate representation containing syntactic and/or semantic information, i.e. a representation of the structure of the source sentence.
- Transfer. Secondly, the transfer from the source language to the target language is carried out. Two different processes can be distinguished during this step:
a) A lexical transfer, i.e. the translation of the terms in the input sentence, usually using a bilingual dictionary;
b) A structural transfer, i.e. the application of a set of transformation rules to the structure resulting from the analysis phase, in order to achieve an equivalent structure in the target language.
- Generation. Finally, in the phase known as ‘generation’, the text of the sentence is reconstructed in the target language, based on the structure obtained in the transfer process.
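The three phases can be illustrated with a minimal Python sketch. It is a simplification under several assumptions: it only handles noun phrases made up of words in the hypothetical tables POS, LEXICAL, GENDER and FEMININE, and grouping words by part of speech is a crude stand-in for real syntactic parsing.

```python
# Toy transfer system for English-Spanish noun phrases.
# All tables are hypothetical; words outside them raise a KeyError.

POS = {"the": "DET", "red": "ADJ", "white": "ADJ", "car": "NOUN", "house": "NOUN"}
LEXICAL = {"the": "el", "red": "rojo", "white": "blanco", "car": "coche", "house": "casa"}
GENDER = {"coche": "m", "casa": "f"}
FEMININE = {"el": "la", "rojo": "roja", "blanco": "blanca"}


def analyse(sentence):
    """Analysis: build a source-language structure that groups words by role."""
    tree = {"DET": [], "ADJ": [], "NOUN": []}
    for token in sentence.lower().split():
        tree[POS[token]].append(token)
    return tree


def transfer(tree):
    """Transfer: lexical transfer, then structural transfer."""
    # a) Lexical transfer: translate each term with the bilingual dictionary.
    translated = {role: [LEXICAL[w] for w in words] for role, words in tree.items()}
    # b) Structural transfer: English DET ADJ NOUN -> Spanish DET NOUN ADJ.
    return [(role, w) for role in ("DET", "NOUN", "ADJ") for w in translated[role]]


def generate(target):
    """Generation: apply gender agreement and produce the target text."""
    noun = next(w for role, w in target if role == "NOUN")
    feminine = GENDER[noun] == "f"
    words = [FEMININE.get(w, w) if feminine and role != "NOUN" else w
             for role, w in target]
    return " ".join(words)


def translate_transfer(sentence):
    return generate(transfer(analyse(sentence)))


print(translate_transfer("the white house"))
```

Unlike a word-by-word approach, the transfer step here operates on a structure rather than on a flat string, so agreement can be resolved at generation time once the head noun is known.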
The main feature of transfer systems is the existence of an additional transfer module that projects intermediate representations of the source text onto intermediate representations of the target text. This transfer module can work at different levels of linguistic analysis (Hutchins and Somers, 1992, cited in Moreno, 2000).
Editor’s note: This image was removed due to copyright reasons.
Picture 2. The different stages of the transfer system. This transfer model is conceived for six pairs of languages. Retrieved from http://elies.rediris.es/elies9/3-2-2.htm
1.1.2.2. Interlingua
This type of MT performs the translation based on an intermediate conceptual representation known as ‘interlingua’ (Hernández, 2002, p. 110). Interlingua-based systems perform the translation process in two phases:
- Analysis phase. In this phase, a structure is assigned to the source text, using only source language information. This structure is a sentence in a universal language that represents the ‘meaning’ of the text (Moreno, 2000). This intermediate representation includes all the necessary information to produce the text in the target language without the need of going back to the source text (Hutchins and Somers, 1992b).
- Generation phase. In this phase, the meaning obtained in the interlingua is used to generate target text in any language (Zapata and Benítez, 2009, p. 119). The target language text is produced regardless of the source language (Moreno, 2000).
However, a universal interlingua, totally independent of any language, has not yet been created (Hutchins and Somers, 1992b).
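The two-phase design can be sketched as follows. This is a deliberately simplified illustration: the concept labels (CAR, HOUSE) and the per-language tables are hypothetical, and the sketch works word by word, whereas a real interlingua would have to represent the meaning of whole sentences.

```python
# Toy interlingua: each language only needs an analysis table and a
# generation table to and from language-neutral concepts.
# Concept labels and vocabularies are hypothetical.

ANALYSIS = {
    "en": {"car": "CAR", "house": "HOUSE"},
    "es": {"coche": "CAR", "casa": "HOUSE"},
    "fr": {"voiture": "CAR", "maison": "HOUSE"},
}

# Generation tables are the analysis tables inverted: concept -> word.
GENERATION = {
    lang: {concept: word for word, concept in table.items()}
    for lang, table in ANALYSIS.items()
}


def analyse(text, source_lang):
    """Analysis phase: uses only source-language information."""
    return [ANALYSIS[source_lang][word] for word in text.lower().split()]


def generate(concepts, target_lang):
    """Generation phase: never looks back at the source text."""
    return " ".join(GENERATION[target_lang][concept] for concept in concepts)


meaning = analyse("house", "en")
print(meaning, generate(meaning, "es"), generate(meaning, "fr"))
```

Note that the same analysed meaning feeds every target language; adding a language in this sketch means adding one analysis table rather than a component for each language pair.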
The most important problem with this approach is the choice of the vocabulary within the interlingua (Arnold et al., 1994, cited in Moreno, 2000), i.e. the primitive concepts of meaning representation. For example, [...] in Russian there is no translation for blue, but there is goluboi (light blue) and sinii (dark blue). What would be the interlingua representation for ‘blue’? These problems do not exist in a transfer system, since the discrepancies between languages are resolved in the transfer component of the language pair in question (Moreno, 2000).
Instead of undertaking the arduous task of developing an entirely artificial language, English often acts as an interlingua, or rather, a pivot language, acting as an intermediary between a source language and a target language (Poibeau, 2017, p. 28).
Direct, transfer and interlingua systems form a continuum, ranging from strategies that analyse the text only superficially to systems that aim to develop a completely artificial and abstract representation independent of any language (Poibeau, 2017, p. 29). This continuum is represented by ‘The Pyramid of Machine Translation’, also known as ‘the Vauquois triangle’, after Bernard Vauquois, a well-known French MT researcher of the 1960s:
Editor’s note: This image was removed due to copyright reasons.
In this other image, the steps followed by each rule-based MT system are shown in different colours. In red, the direct system; in green, the transfer system; and finally in blue, the interlingua system, where it is shown that a language other than A and B can act as a pivot language.
Editor’s note: This image was removed due to copyright reasons.
Picture 3. Vauquois’s Triangle. Retrieved from 1Global Translators.
Rule-based MT models have some advantages, as the translation result is predictable and there will always be consistency between the different versions (Systran, no date a). However, the main problem of rule-based technologies lies in the great difficulty of formalizing human linguistic knowledge by means of precise rules. Another problem is the high cost involved in the construction and maintenance of such systems, as well as in their adaptation to new domains or language pairs (Casacuberta and Peris, 2017, p. 67).
1.2. MT based on the analysis of linguistic corpora
In this section, MT systems based on the analysis of data from bilingual corpora are explained. This modality is subdivided into example-based MT, statistical MT and neural MT.
MT systems based on the analysis of linguistic corpora require a large amount of data to produce satisfactory results. Therefore, when the availability of computer-accessible texts increased considerably in the 1980s, new approaches to MT were developed (Poibeau, 2017, p. 91).
It is important to understand the notions defined below in order to know how this type of MT works. A linguistic corpus is a collection of real examples showing the use of a language, usually stored electronically. A corpus can consist of texts written in one language (monolingual corpus), but there are also corpora that compare two or more languages, i.e. bilingual or multilingual corpora (adapted and translated from the blog En la luna de Babel; Surià, 2016). If a corpus consists of several texts together with their corresponding translations, it is called a parallel corpus. In order to find the translation equivalents between the original text and the translated text, it is necessary to align both texts. As the company Acantho Ideas & Culturas explains on its website (2016), ‘document alignment consists of joining segments or sentences extracted from a source document with their corresponding translation extracted from the translated document. In this way, large translation memories can be created from parallel texts, i.e. from texts that have already been translated’. Aligning two sentences consists of establishing which words or groups of words in one language are equivalent to which in the other. This allows the system to treat house → ‘casa’ as equivalents (symmetric alignment: 1 word in the source language, 1 word in the target language), and likewise sneakers → ‘zapatillas de deporte’ (asymmetric alignment: 1 word in the source language, 3 words in the target language).
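As a small illustration of these notions, the sketch below stores aligned pairs and labels each one as symmetric or asymmetric. The function name alignment_type and the rule of comparing word counts on both sides are assumptions made for this example, generalising the 1:1 and 1:3 cases mentioned above.

```python
# Aligned word groups from a (hypothetical) parallel corpus:
# each pair links a source expression to its target equivalent.
ALIGNMENTS = [
    ("house", "casa"),
    ("sneakers", "zapatillas de deporte"),
]


def alignment_type(source, target):
    """Label a pair by comparing the number of words on each side."""
    n_source = len(source.split())
    n_target = len(target.split())
    kind = "symmetric" if n_source == n_target else "asymmetric"
    return f"{kind} ({n_source}:{n_target})"


for source, target in ALIGNMENTS:
    print(f"{source} -> {target}: {alignment_type(source, target)}")
```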
Parallel corpora are of great use to professional human translators. It is not surprising, therefore, that the efforts to develop MT systems attempted to exploit the advantages provided by the analysis of linguistic corpora.
1.2.1. Example-based MT
Example-based MT was introduced during the 1980s by Makoto Nagao (Poibeau, 2017, p. 109), a computer scientist who has contributed to various fields, including MT and natural language processing (Wikipedia, no date d).
The translation process using example-based MT consists of three phases:
- Corpus query. First, the system tries to find fragments of the sentence to be translated in the corpora for the source language. All relevant fragments are identified and stored.
- Search for equivalents. Second, the system searches for translation equivalents in the target language using aligned bilingual texts.
- Fragment combination. Finally, the system tries to combine the translation fragments to obtain a correct sentence in the target language (Poibeau, 2017, p. 110).
This process is well illustrated in the following practical example. Let us assume that we want to translate the sentence ‘Today is going to be a good day’ into Spanish and that a bilingual English-Spanish corpus is available with the following pairs of sentences:
Editor’s note: This image is not included in this excerpt.
The system infers that ‘hoy va a ser’ is a translation of the English sequence ‘today is going to be’, since this expression appears in examples 1 and 2. From examples 3 and 4, the system infers that ‘un buen día’ is a translation of ‘a good day’. The system combines both fragments to produce ‘hoy va a ser un buen día’ (adapted from Poibeau, 2017, pp. 112–113).
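The combination of fragments can be sketched in Python as follows. This is a minimal sketch under a strong assumption: the fragment pairs in EXAMPLES are hypothetical and are treated as already extracted from aligned sentence pairs, whereas a real example-based system would have to infer them from the corpus. The greedy longest-match strategy is likewise just one simple way to pick fragments.

```python
# Toy example-based MT: translate by reusing stored fragment pairs.
# The fragment pairs are hypothetical stand-ins for equivalents that a
# real system would infer from an aligned bilingual corpus.
EXAMPLES = {
    "today is going to be": "hoy va a ser",
    "a good day": "un buen día",
    "a long day": "un día largo",
}


def translate_ebmt(sentence):
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Corpus query: look for the longest known fragment starting at i.
        for j in range(len(words), i, -1):
            fragment = " ".join(words[i:j])
            if fragment in EXAMPLES:
                # Search for equivalents: take the stored translation.
                output.append(EXAMPLES[fragment])
                i = j
                break
        else:
            # No example covers this word: copy it unchanged.
            output.append(words[i])
            i += 1
    # Fragment combination: here, simple concatenation.
    return " ".join(output)


print(translate_ebmt("Today is going to be a good day"))
```

Concatenation only works here because the two fragments happen to be compatible; as soon as a fragment has several stored translations or needs agreement with its neighbours, a more elaborate combination step is required.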
In practice, however, translation problems are more difficult to solve. This system has some drawbacks: several translated fragments might be available in the target language for one fragment in the source text. It is not always obvious which one is the most appropriate. In addition, linking different text fragments is complicated, as they are sometimes not fully compatible with each other (Poibeau, 2017, pp. 113–114).
[...]
1 Available at http://www.systran.es/systran/tecnologia-de-traduccion/que-es-la-traduccion-automatica/.
2 Definition by MemoQ available at https://www.memoq.com/es/que-es-una-herramienta-tao
3 Definition by SDL available at https://www.sdltrados.com/es/solutions/translation-memory.html