I’m Eiji Aramaki from the University of Tokyo and ATR.

Presentation transcript:

Word Selection for EBMT based on Monolingual Similarity and Translation Confidence

Eiji Aramaki*,** Sadao Kurohashi*,** Hideki Kashioka** Hideki Tanaka**
* University of Tokyo ** ATR Spoken Language Translation Research Laboratories

I’m Eiji Aramaki from the University of Tokyo and ATR. The title of my talk is “Word Selection for EBMT based on Monolingual Similarity and Translation Confidence”.

EBMT Framework

An EBMT system requires a Translation Memory:
Input sentence → EBMT system (with Translation Memory) → Output sentence

We are working on example-based machine translation (EBMT). The basic idea is that, given an input sentence, similar translation examples are retrieved and combined to produce a translation. An EBMT system therefore requires a translation memory, which contains a large number of translation examples with correspondences.

EBMT Framework

If a highly parallel corpus is available:
Parallel Corpus → Translation Memory

If a highly parallel corpus is available, the construction of the TM is relatively easy. Most EBMT research so far has assumed such a situation, limiting its domain to, for example, computer manuals or travel conversations.

EBMT Framework

Input sentence → EBMT system (with a Translation Memory built from a Content-Aligned Corpus) → Output sentence

However, we cannot expect such highly parallel corpora to be available for wider domains. What is usually available are corpora that merely share the same content, such as newspapers and broadcast news. We call this type of corpus a “content-aligned corpus”. What we present in this talk is a method for realizing EBMT using a content-aligned corpus.

Outline

1: NHK News Corpus
2: How to build a Translation Memory (TM)
3: How to use the TM
4: Experiments
5: Conclusion

This is the outline. First I explain our corpus and how to build a translation memory (TM), then how to use it, and finally the experiments and the conclusion.

NHK News Corpus (40,000 article pairs)

NHK provides multi-lingual news services. Average number of sentences per article: Japanese = 5.2, English = 7.2.

We used the NHK bilingual news corpus, which consists of 40,000 bilingual article pairs. NHK is a Japanese broadcasting service that also provides English news programs, and the English articles are translations of the Japanese ones. This slide shows an example article pair (Japanese sentences with their aligned English translations):

田植えフェスティバル石川県輪島市で外国の大使や一般の参加者など千人あまりが急な斜面の棚田で田植えを体験する催しが行われました。
→ Ambassadors and diplomats from 37 countries took part in a rice planting festival on Sunday in small paddies on steep hillsides in Wajima, central Japan.
輪島市白米町(しろよねまち)には千枚田(せんまいだ)と呼ばれる大小二千百枚の棚田が急な斜面から海に向かって拡がっています。
→ About one-thousand people gathered at the hill, where some two-thousand 100 miniature paddies, called Senmaida, stretch toward the Sea of Japan.
田植え体験は農作業を通して米作りの意義などを考えていこうという地球環境平和財団の呼び掛けで開かれたもので、海外三十四ヵ国の大使や書記官、それに一般の参加者ら合わせておよそ千人が集まりました。
→ The event was organized by the private Foundation for Global Peace and Environment.
田植えに使われた苗は去年の秋、天皇陛下が皇居で収穫された稲籾から育てたものです。
→ The rice seedlings are grown from grain harvested by the Emperor at the Imperial Palace in Tokyo last autumn.
参加者たちは裸足になって水田に足を踏み入れ地元に伝わる田植え歌に合わせて慣れない手つきで苗を植えていました。
→ Barefoot participants waded into the paddies to plant the seedlings by hand while singing a local folk song about the practice of rice planting.
きょうの輪島市は雲が広がったもののまずまずの天気となり、出席された高円宮さまも海からの風に吹かれながら田植えに加わっていました。
→ (no English counterpart)
地球環境平和財団では今年の夏休みに全国の子どもたちを対象に草刈りや生きものの観察会を開く他、秋には稲刈体験を行なう予定にしています。
→ (no English counterpart)

NHK News Corpus (40,000 article pairs)

English articles are translated from the Japanese articles, but some phrases have no parallel expressions.

Since the English articles are translations of the Japanese articles, they basically share the same content, but the translation is not literal. Some phrases have no parallel expressions, and detailed information in a Japanese article may be omitted in the English article. For example, the final Japanese sentence of the previous example, which gives the schedule of next year’s event, is omitted in the English article. On the other hand, some information is added in the English article to help the understanding of non-Japanese listeners, such as the Imperial Palace “in Tokyo”.

Outline

1: NHK News Corpus
2: How to build a Translation Memory (TM)
3: How to select Translation Examples from TM
4: Experiments
5: Conclusion

Next, I will explain how to build a translation memory from such a content-aligned corpus.

Sentence Alignment

DP matching method using 5 translation dictionaries (200,000 entries in total).
Extract the 1-to-1 sentence pairs: 1:1 sentence pairs have higher accuracy than the others.

The first step in building the translation memory is sentence alignment. The system uses a conventional DP matching method with 5 translation dictionaries, which have 200,000 entries in total. After sentence alignment, we extract only the 1:1 sentence pairs, because they have higher accuracy than the other sentence pairs.
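As a rough illustration, the 1-to-1 extraction step above can be sketched as a dynamic program over sentence pairs scored by dictionary overlap. This is a hypothetical sketch, not the system’s actual implementation: the overlap score, the skip penalty, and the restriction to 1-1 / 1-0 / 0-1 beams are all assumptions.

```python
# Hypothetical sketch of dictionary-based DP sentence alignment.
# The real system's scoring and beam set may differ.

def overlap_score(ja_words, en_words, dictionary):
    """Fraction of Japanese content words whose dictionary translation
    appears in the English sentence (a stand-in for the real score)."""
    if not ja_words:
        return 0.0
    hits = sum(1 for w in ja_words
               if any(t in en_words for t in dictionary.get(w, ())))
    return hits / len(ja_words)

def align_sentences(ja_sents, en_sents, dictionary, skip_penalty=-0.5):
    n, m = len(ja_sents), len(en_sents)
    # dp[i][j] = best score aligning the first i Japanese / j English sentences
    dp = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == float("-inf"):
                continue
            if i < n and j < m:  # 1-to-1 alignment
                s = dp[i][j] + overlap_score(ja_sents[i], en_sents[j], dictionary)
                if s > dp[i + 1][j + 1]:
                    dp[i + 1][j + 1], back[i + 1][j + 1] = s, (i, j, "1-1")
            if i < n:            # Japanese sentence left unaligned
                s = dp[i][j] + skip_penalty
                if s > dp[i + 1][j]:
                    dp[i + 1][j], back[i + 1][j] = s, (i, j, "1-0")
            if j < m:            # English sentence left unaligned
                s = dp[i][j] + skip_penalty
                if s > dp[i][j + 1]:
                    dp[i][j + 1], back[i][j + 1] = s, (i, j, "0-1")
    # Backtrack and keep only the 1-to-1 pairs, as the talk describes
    pairs, i, j = [], n, m
    while back[i][j] is not None:
        pi, pj, kind = back[i][j]
        if kind == "1-1":
            pairs.append((pi, pj))
        i, j = pi, pj
    return sorted(pairs)
```

Sentences without a counterpart (like the last two Japanese sentences of the earlier example article) fall through the skip beams and are discarded.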

Phrase Alignment

Phrase alignment within the 1-to-1 aligned sentence pairs, using our method proposed in [Aramaki et al. 2001, MT-Summit VIII]:
1: Analysis of phrasal dependency structures
2: Estimation of basic phrasal correspondences with dictionaries
3: Expansion of phrasal correspondences with surrounding information

Example alignment: 苗は (the rice seedlings) / 去年の (of last year) / 秋 (autumn) / 天皇陛下が (by the Emperor) / 皇居で (at the Imperial Palace) / 収穫された (harvested) / 稲籾から (from grain) / 育てられたものです (are grown).

After that, we estimate phrase alignment within the 1-to-1 sentence pairs with this three-step method. For step 1 we used the Japanese parser KNP [Kurohashi 1994] and the English parser of [Charniak 2000]; step 2 estimates correspondences using dictionaries, and step 3 estimates further correspondences from the surrounding correspondences. (Today I will skip the details of the algorithm.)

Translation Example (TE) and Translation Memory (TM)

TE := a sentence pair that is structurally analyzed and aligned at the phrase level
TM := a collection of TEs

After phrase alignment, we have 1-to-1 sentence pairs whose dependency structures are analyzed and whose phrase alignment is estimated. We define such a sentence pair as a TE, and the translation memory is the collection of these TEs.
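The TE and TM definitions above can be sketched as minimal data structures. The field names and layout here are assumptions for illustration, not the paper’s implementation; the WCR threshold anticipates the filtering described on the following slides.

```python
# Minimal sketch of the TE / TM data structures; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Phrase:
    text: str   # surface form of the phrase
    head: int   # index of the parent phrase in the dependency tree (-1 = root)

@dataclass
class TranslationExample:
    source: list   # Japanese phrases with their dependency structure
    target: list   # English phrases with their dependency structure
    links: dict    # source phrase index -> (target phrase index, confidence)
    wcr: float     # word corresponding ratio of the whole sentence pair

@dataclass
class TranslationMemory:
    examples: list = field(default_factory=list)

    def add(self, te, wcr_threshold=0.3):
        # Only TEs whose WCR exceeds the threshold are stored in the TM
        if te.wcr >= wcr_threshold:
            self.examples.append(te)
```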

Evaluation of Alignment Accuracy

Sentence alignment. Evaluation data: 96 article pairs. Results: all (m-to-n) alignments = 60% (226/377); 1-to-1 alignments = 77% (111/145).
Phrase alignment. Evaluation data: 145 1-to-1 sentence pairs. Phrase alignment precision = 50%, which is not high enough to use for the TM.

We checked how well these methods work on a real content-aligned corpus, using 96 article pairs. The precision over all sentence alignments is 60%, but when we extract only the 1-to-1 alignments, as mentioned, the precision goes up to 77%. We then checked phrase alignment precision on the 145 1-to-1 sentence pairs: it is only 50%, so it is not a good idea to use the whole result for the TM.

WCR (Word Corresponding Ratio)

Discard sentence pairs with little word correspondence:

WCR = (# content words corresponded in dictionaries) / (# content words)

So we filter translation examples by the word corresponding ratio, defined by this formula. For example, if a Japanese sentence has 9 content words, its English sentence has 11 content words, and 6 and 8 of them respectively correspond in the dictionaries, then WCR = (6 + 8) / (9 + 11) = 0.7.
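The formula above can be sketched directly. Tokenization and content-word extraction are assumed to have happened already, and the dictionary lookup here is a simplification of the real system.

```python
# Sketch of the WCR computation from the slide's formula.
# Content-word extraction and dictionary structure are simplified assumptions.

def wcr(ja_content_words, en_content_words, dictionary):
    """WCR = (# content words corresponded in dictionaries) / (# content words),
    counting corresponded words on both the Japanese and English sides."""
    ja_hits = sum(1 for w in ja_content_words
                  if any(t in en_content_words for t in dictionary.get(w, ())))
    # English words covered by some Japanese word's dictionary translations
    en_translatable = {t for w in ja_content_words for t in dictionary.get(w, ())}
    en_hits = sum(1 for w in en_content_words if w in en_translatable)
    total = len(ja_content_words) + len(en_content_words)
    return (ja_hits + en_hits) / total if total else 0.0
```

With 6 of 9 Japanese and 8 of 11 English content words corresponded, this returns (6 + 8) / (9 + 11) = 0.7, matching the slide’s example.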

WCR & Phrase Alignment Precision

[Graph: x-axis = word corresponding ratio (WCR), y-axis = phrase alignment precision.]

This graph shows the relation between WCR and phrase alignment precision. As mentioned, when we use all 1-to-1 sentence pairs, the precision is 50% and the number of TEs is 70,000. When we use only the 1-to-1 sentence pairs with WCR above 30%, the precision rises to 66% with 30,000 TEs. So we stored the 1-to-1 sentence pairs whose WCR exceeds 30% in our translation memory.

Outline

1: NHK News Corpus
2: How to build a Translation Memory (TM)
3: How to select Translation Examples from TM
4: Experiments
5: Conclusion

So far I have talked about how to build the translation memory. Next, I explain our EBMT system, mainly how to select translation examples from the translation memory.

Translation Algorithm

This slide shows the translation algorithm of our EBMT system. (1) The input sentence is parsed and transformed into a phrase-based dependency structure. (2) For each phrase in the input sentence, a plausible TE is retrieved. In this talk, when a part of the input sentence and a part of the TE have an equal expression, the former is called I and the latter is called S, and the part of the TE corresponding to S is called T. (3) Finally, the English expressions of the T parts are combined to produce the final English translation.
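The three numbered steps can be sketched at a high level. This is an assumed skeleton, not the actual system: parse, score_te, and combine stand in for the parser, the TE scoring described on the next slides, and the combination/ordering component.

```python
# High-level sketch of the three translation steps; parse(), score_te(),
# and combine() are placeholders for components described elsewhere.

def translate(input_sentence, tm, parse, score_te, combine):
    phrases = parse(input_sentence)      # (1) phrase-based dependency structure
    selected = []
    for phrase in phrases:
        # (2) retrieve the TE with the highest score for this phrase
        # (the real system also skips conjunction/pronoun/determiner phrases)
        best = max(tm.examples, key=lambda te: score_te(phrase, te), default=None)
        if best is not None:
            selected.append((phrase, best))
    return combine(selected)             # (3) combine the target expressions
```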

Question

How do we select the most plausible example? We consider 3 relations among I, S, and T:
Equality between I and S
Surrounding similarity of I and S
Alignment confidence between S and T
These three relations determine the TE score.

Example of I-S-T

I will explain the 3 criteria with this example.
Input sentence (Japanese): アメリカは (America) / 輸出を (export) / 制限するように (restrict) / 働きかけてきました (recommend)
TE source (Japanese): 議会は (congress) / 輸入を (import) / 制限するように (restrict) / 働きかけた (recommend)
TE target (English): “The United States congress has issued a request that restricts exports”
Here the input phrase 制限するように is I, the equal TE phrase 制限するように is S, and the corresponding English part “that restricts” is T.

1: Equality between I & S

Equality := # of equal phrases in I & S
TE Score = ΣEQ

The equality term counts the phrases that are equal in I and S. Sometimes the phrases are slightly different, depending on conjugation type (e.g., 働きかけてきました vs. 働きかけた); such pairs still contribute to ΣEQ (the slide shows per-pair scores of 1.1 and 1.0).

2: Surrounding Similarity

TE Score = ΣEQ + ΣSIM

We define the “surrounding” as the phrases connected to the equal phrases. In this example, アメリカは / 議会は and 輸出を / 輸入を are surrounding phrase pairs, with similarities 0.6 and 0.8 respectively. Their similarity is calculated from a Japanese thesaurus and their part-of-speech types: a thesaurus match scores 0.3–0.8, and a POS match scores 0.3.

3: Confidence of Alignment

TE Score = (ΣEQ + ΣSIM) × ΣCONF

Some correspondences in a TE are estimated using dictionaries, and others are estimated only from surrounding information; the latter have lower accuracy. So the score is multiplied by a confidence factor that depends on the correspondence type: 1.0 for correspondences estimated with dictionaries, 0.5 for correspondences estimated from surrounding information.

Global Confidence (WCR)

TE Score = (ΣEQ + ΣSIM) × ΣCONF × WCR

We also use the word corresponding ratio of the TE, because it represents the confidence of the TE itself. So the score is further multiplied by the WCR.
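Putting the four components together, the final TE score can be sketched as below. The constants mirror the values quoted on the slides, but the data layout (flat lists of per-pair scores) is an assumption for illustration.

```python
# Hedged sketch of TE Score = (ΣEQ + ΣSIM) × ΣCONF × WCR.
# Constants follow the slides; the input representation is an assumption.

THESAURUS_SIM_MAX = 0.8   # thesaurus match scores 0.3-0.8
POS_SIM = 0.3             # part-of-speech match
CONF_DICT = 1.0           # correspondence estimated with dictionaries
CONF_SURROUND = 0.5       # correspondence estimated from surrounding info

def te_score(equal_scores, surrounding_sims, link_confidences, wcr):
    """equal_scores: per-pair equality scores for phrases equal in I and S (ΣEQ)
    surrounding_sims: similarities of the surrounding phrase pairs (ΣSIM)
    link_confidences: confidences of the S-T correspondences used (ΣCONF)
    wcr: word corresponding ratio of the whole TE (global confidence)"""
    return (sum(equal_scores) + sum(surrounding_sims)) * sum(link_confidences) * wcr
```

For the running example, one equal pair (1.0), surrounding similarities 0.6 and 0.8, one dictionary link (1.0) plus one surrounding-estimated link (0.5), and a WCR of 0.7 would give (1.0 + 1.4) × 1.5 × 0.7 = 2.52.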

Example

Input sentence (Japanese phrases): ドイツは / カンボジアに対する / 武力行使に / 抗議するため / 援助の / 凍結を / 延長することを / 決めました。The selected TEs supply English fragments such as “Germany”, “for the use of force”, “to protest”, “suspended aid to Cambodia”, “the suspension of assistance”, and “decided”.

This slide shows a real example. First, the input sentence is parsed, and for each phrase the TE with the highest score is selected; the system does not select TEs for conjunction, pronoun, or determiner phrases. In this example, the system selected 6 translation examples. (Phrases in the input sentence sometimes overlap; in such a case, the system decides which overlapping phrase to use by their TE scores.) Then the English expressions in the selected TEs are combined and an English dependency structure is constructed: the dependency relations within each TE are preserved, and the relations between TEs are estimated from the relations in the input sentence. Finally, the output order of the TEs is decided by a set of rules governing both the dependency relations and word order.

Outline

1: NHK News Corpus
2: How to build a Translation Memory (TM)
3: How to select Translation Examples from TM
4: Experiments
5: Conclusion

Next, the experiments.

Experiments (word selection task)

Evaluation data: 50 Japanese sentences from the NHK corpus (not used for the TM)
Gold-standard data: the 50 English sentences paired with the evaluation data
Evaluation: a human judge evaluates phrase by phrase, referring to the gold-standard data

Method                       Good  Bad  Accuracy
Proposed (EQ, SIM, CONF)      268   47   85.0%
Method A (EQ, CONF)           254   61   80.6%
Method B (EQ, SIM)            234   81   74.2%
Baseline (Dic., Freq.)        232   83   73.6%

As I mentioned, the modules for controlling conjugation and determiners are not yet implemented, so in these experiments we evaluated the system on the word selection task. For evaluation, we selected 50 sentence pairs from the NHK news corpus that were not used for the translation memory. Their Japanese sentences were translated by our system, and the selected translation examples were evaluated by hand, referring to the corresponding English sentences; each English expression was judged good or bad, phrase by phrase. The accuracy of the proposed method was 85.0%. To investigate the effectiveness of each component, we compared it with the proposed method without surrounding similarity (Method A, 80.6%), the proposed method without alignment confidence (Method B, 74.2%), and a baseline that uses only the dictionaries and selects the most frequent word or phrase (73.6%). The proposed method has the highest accuracy. We interpret this result as showing the importance of alignment confidence, because accuracy drops sharply without it.

Example: (joined) ⇔ “have been welcomed”

In this example, the Japanese word meaning “joined” is translated as “have been welcomed”, because the TE has much similarity with the input sentence:
キム・デジュン大統領は (President Kim Dae-jung) ⇔ 天皇皇后両陛下は (The Japanese Emperor and Empress)
現在 (now) ⇔ 昨夜 (last night)
歓迎式典に (at the ceremony) ⇔ 歓迎晩餐会に (at the reception dinner)
So, in spite of its low alignment confidence, this TE was selected. This is a good translation that matches our implicit knowledge.

Outline

1: NHK News Corpus
2: How to build a Translation Memory (TM)
3: How to select Translation Examples from TM
4: Experiments
5: Conclusion

Finally, the conclusion.

Conclusion

An EBMT system using a content-aligned corpus. Proposed methods:
TM construction: discard sentence pairs with too little word correspondence; rigorous phrase alignment
TE selection: source-language similarity; translation confidence
The accuracy of word selection was 85%.
Future work: completion of the remaining components; evaluation of full translations.

Let me conclude my talk. In this presentation, we described the realization of the entire EBMT process using a content-aligned corpus. The amount of content-aligned corpora greatly exceeds that of parallel corpora, but using them involves many difficulties; one of the key problems is how to select plausible translation examples. We proposed a new method for selecting translation examples based on source-language similarity and translation confidence, and in the word selection task the performance is highly accurate. We believe the experiment demonstrated the feasibility of this kind of EBMT. Future work includes building the remaining components (word ordering and handling of modality) and evaluating the output sentences.
