
Tralogy 2011
Session 6 - Using Alignment to detect associated multiword expressions in bilingual corpora (3584) - Friday 4 March 2011, 14:40 - 15:00


Translating multiword expressions from one language to another requires first recognizing them as such. Bilingual multiword expressions are an issue when they are not exact word-to-word translations of each other. The following examples come from a French-English translation task: (1) phrasal verbs such as « to call in on », which becomes « rendre visite »; (2) « sorry to hear that », which a human translator renders as the simple « désolé que »; (3) most adverbial locutions, like « such as », equivalent to « de telle façon que » or « de manière à », etc.

Thus, Machine Translation (MT) either requires a rich bilingual multiword database, or tends to create or enrich a first set of associated multiword expressions. Most of the time, existing resources are incomplete, and an interesting way to improve coverage is to provide a tool that detects 'associable' multiword expressions in parallel corpora, that is, sets of texts that are translations of each other.

There is an extensive literature on alignment techniques that try to link sentences from a text in one language, called the source, to one or more sentences in the other language, called the target. Sentence alignment is the basic preliminary task that underlies all the other, more fine-grained ones. Word-to-word alignment has largely been dealt with by statistical systems. Multiword expressions have a granularity that lies between the word and the sentence. They are mostly phrasal, and sometimes show rather strong syntactic and lexical divergence. With the improvement of parsers, alignment methods using syntax have emerged. Syntax allows the translation task, among others, to focus on relevant phrase fragments and to link multiword units together. For instance, Ozdowska's AliBi system is based on dependency structures. The system of Groves, Hearne and Way uses syntactic trees with internal node alignments.

Bilingual terminology extraction, which consists in recognizing equivalent groups of words, also relies on syntax to extract patterns such as Noun-Verb, Adjective-Noun, Prepositional Noun Phrase, etc. (e.g. Claveau, 2009). Most of these multiword expressions can be reduced to collocations. A collocation is a multiword expression whose translation is subject to quite strong constraints (e.g., « to show respect » / « faire preuve de respect »). Seretan's method [Seretan, V. (1999)] recognizes numerous equivalent pairs of collocations through bilingual alignments whose POS tags are equivalent or close (even when the words are distant). But it only retrieves two-word collocations. Thus, there is a need for systems that can detect longer and more divergent collocations.

The method proposed in this article is an alignment process between pairs of sentences, strongly based on syntax. It relies on a rule-based system that combines partial alignments from a database through a non-iterative, graph-theory-based process. Multiword expression patterns built from examples help provide alignments with good coverage, which in turn detect new multiword expressions and enrich the database. The article sketches the state of the art in alignment, focusing on syntax-oriented systems, and describes the designed system as well as an experiment run on a corpus, with promising results.
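
As an illustration only, the idea of combining partial alignments through a graph can be sketched as follows. This is not the authors' system: it assumes that partial word-level links are already available (here they are hand-written, hypothetical data), treats them as edges of a bipartite graph, and takes each connected component as one aligned unit. Units with more than one word on either side are candidate associated multiword expressions. The function names `aligned_units` and `multiword_pairs` are invented for this sketch.

```python
from collections import defaultdict


def aligned_units(source, target, links):
    """Group word-level links into many-to-many units.

    `links` is a list of (source_index, target_index) pairs. Each connected
    component of the resulting bipartite graph becomes one aligned unit.
    """
    graph = defaultdict(set)
    for i, j in links:
        graph[("s", i)].add(("t", j))
        graph[("t", j)].add(("s", i))

    seen = set()
    units = []
    for node in sorted(graph):
        if node in seen:
            continue
        # Depth-first traversal collecting one connected component.
        stack = [node]
        seen.add(node)
        component = []
        while stack:
            cur = stack.pop()
            component.append(cur)
            for nxt in graph[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        src = [source[i] for side, i in sorted(component) if side == "s"]
        tgt = [target[j] for side, j in sorted(component) if side == "t"]
        units.append((src, tgt))
    return units


def multiword_pairs(units):
    """Keep only units that are not plain word-to-word translations."""
    return [(s, t) for s, t in units if len(s) > 1 or len(t) > 1]


# Hypothetical example sentence pair and hand-written partial links.
source = ["she", "called", "in", "on", "him"]
target = ["elle", "lui", "a", "rendu", "visite"]
links = [(0, 0), (4, 1), (1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]

units = aligned_units(source, target, links)
candidates = multiword_pairs(units)
```

With these links, « called in on » and « a rendu visite » end up in the same component, so the pair is reported as a candidate, while « she » / « elle » and « him » / « lui » remain plain word-to-word units. A syntax-based system such as the one described would derive the links from parse structures and rules rather than from a hand-written list.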

V. Prince, J. Segura, LIRMM

Humanities and Social Sciences, Engineering Sciences, Communication

All videos from the Tralogy 2011 event (46 videos)

00:01:29.44

2011-03-03

09:00 - 09:35 : Welcome address, G. Sentise

00:09:30.80

2011-03-03

09:35 - 09:45 : Introduction, A. Kowalska

00:22:20.44

2011-03-03

09:45 - 10:10 : Machine Translation at the European Commission : language tools a...

00:20:06.12

2011-03-03

10:10 - 10:35 : Machine Translators

00:26:26.96

2011-03-03

10:35 - 11:00 : Language technologies in European research & innovation progr...

00:20:06.12

2011-03-03

11:00 - 11:15 : Session 1-Terminologie et traduction, une complémentarité oubli...

00:10:52.12

2011-03-03

11:15 - 11:30 : Session 1-User-centred Views on Terminology Extraction Tools : Us...

00:12:38.92

2011-03-03

11:30 - 11:45 : Session 1- Future Technologies for the Legal Translator : the Law...

00:12:44.32

2011-03-03

11:45 - 12:00 : Session 1-La Traduction assistée par ordinateur dans le contexte...

00:09:08.83

2011-03-03

12:00 - 12:15 : Discussion Session 1 tralogy 2011

00:20:13.28

2011-03-03

14:00 - 14:20 : Session 2-Aspects humains des technologies langagières dans l’...

00:12:55.48

2011-03-03

14:20 - 14:40 : Session 2-La technologie au service de la traduction

00:12:08.52

2011-03-03

15:00 - 15:20 : Session2-On the Systematicity of Human Translation Processes

00:13:58.44

2011-03-03

15:20 - 15:40 : Session 2-La post-édition à la portée du traducteur

00:13:08.88

2011-03-03

15:40 - 16:00 : Session 2-Face à la nouvelle donne : l’émergence d’un tradu...

00:20:29.23

2011-03-03

16:00 - 16:30 : Discussion session 2 tralogy 2011

00:22:19.20

2011-03-03

16:30 - 16:50 : Session 3-Language technologies and translators’ training - con...

00:19:46.40

2011-03-03

16:50 - 17:10 : Session 3-La formation du traducteur au Canada : espoirs et réal...

00:12:48.36

2011-03-03

17:10 - 17:30 : Session 3-Creating Blended Resources for Translator Training

00:10:34.64

2011-03-03

17:30 - 17:50 : Session 3 -The Internet, Google Translate and Google Translator T...

00:13:53.68

2011-03-03

17:50 - 18:10 : Session 3-Corpora, online resources and technology in training tr...

00:15:01.88

2011-03-03

18:10 - 18:30 : Session 3-Translation and technology in a project-based learning ...

00:30:48.80

2011-03-03

18:30 - 19:00 : Discussion session 3 Tralogy 2011

00:26:07.76

2011-03-04

09:00 - 09:20 : Session 4-Spoken Language Translation

00:18:40.76

2011-03-04

09:20 - 09:40 : Session 4-The Translator’s Workstations revisited : A new parad...

00:07:05.84

2011-03-04

09:40 - 10:00 : Session 4-What is web-based machine translation up to ?

00:10:53.32

2011-03-04

10:00 - 10:20 : Session 4-Translation 2.0 : facing the challenges of the global e...

00:10:24.32

2011-03-04

10:20 - 10:40 : Session 4-Traduction et informatique dématérialisée : une réa...

00:12:09.76

2011-03-04

10:40 - 11:00 : Session 4-Speech Recognition, Machine Translation and Gesture Loc...

00:28:16.30

2011-03-04

11:00 - 11:15 : Discussion session 4 Tralogy 2011

00:25:03.72

2011-03-04

11:30 - 11:50 : Session 5-What is a better translation ? Reflections on six years...

00:11:28.08

2011-03-04

11:50 - 12:10 : Session 5-How can we measure machine translation quality ?,

00:09:26.04

2011-03-04

12:10 - 12:30 : Session 5-Unobtrusive methods for low-cost manual evaluation of m...

00:13:19.73

2011-03-04

12:30 - 12:45 : Discussion session 5 Tralogy 2011

00:18:11.96

2011-03-04

14:00 - 14:20 : Session 6-Improving MT coherence through text-level processing of...

00:24:49.20

2011-03-04

14:00 - 14:40 : Session 6-Que peut apporter au traducteur la linguistique de corp...

00:11:45.80

2011-03-04

14:40 - 15:00 : Session 6-Using Alignment to detect associated multiword expressi...

00:08:06.72

2011-03-04

15:20 - 15:40 : Session 6-Repérage automatique des équivalences traductionnelle...

00:07:45.84

2011-03-04

15:40 - 16:00 : session 6-Traduction Automatique et multilinguisme : Bonnes prati...

00:37:57.00

2011-03-04

16:00 - 16:05 : discussion session 6 Tralogy 2011

00:20:57.84

2011-03-04

16:05 - 16:15 : session 7-Translation and the New Digital Commons

00:15:32.12

2011-03-04

16:15 - 16:30 : Session 7-TAUS Data Association (TDA) : a member-governed industr...

00:12:25.72

2011-03-04

16:30 - 16:45 : Session 7-Language Resources for Translation and Multilingual Tec...

00:11:03.90

2011-03-04

16:45 - 17:00 : discussion session 7 Tralogy 2011

00:19:06.88

2011-03-04

17:00 - 17:15 : Conclusion-Techniques futures pour le traducteur professionne

00:20:14.88

2011-03-04

17:15 - 17:30 : J’ai connu ce qu’ignorent les Grecs
