
A. Gelbukh (Ed.): CICLing 2001, LNCS 2004, pp. 197-205, 2001.
Springer-Verlag Berlin Heidelberg 2001
Intelligent Case Based Machine Translation System
Wang JianDe, Chen ZhaoXiong, and Huang HeYan
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100081, China
Abstract. The Interactive Hybrid Strategies Machine Translation (IHSMT) system was designed to address these translation problems. Through interaction, it forms a cooperative, interdependent relationship between human and machine. The system achieves hybrid-strategy translation by combining rule-based reasoning with case-based reasoning, thereby overcoming the shortcomings of any single strategy. This paper examines the learning mechanism of the system and proposes a learning model of human-machine tracking and memorizing (HMTM). The model stores the information produced during human-machine interaction in a memory base as cases for machine learning, gradually accumulating knowledge to improve the intelligence of the MT system.
1. Introduction
With the growth of international communication and the Internet, the language barrier is becoming a serious problem. The development of Machine Translation (MT) offers a good chance to solve it. MT has taken on great significance in academic research and shows good prospects in applications as diverse as translation of business letters and reports, translation of technical documents and articles, and speech-to-speech translation for business and travel. MT has become one of the most competitive international areas of high technology.
MT has now been developing for almost half a century. During these five decades, many kinds of MT application systems, such as on-line MT, domain-restricted MT, and machine-aided MT, have emerged to accommodate information processing and network development. But given the great demand and the low quality of translation, these systems are still far from practical. The problems are:
1. The quality of direct MT output is so low that humans must do heavy and difficult post-editing afterwards.
2. Machines cannot accumulate experience or improve their own ability, so they always make the same mistakes.
Case-based or example-based MT systems [1,2,5,6,9] can memorize sentences or chunks and use them to improve the MT system, but they cannot use them effectively.
2. IHSMT System
So far, no machine translation system can translate language as a human does. Although fully automatic translation is the goal of Machine Translation, interaction between human and computer is both necessary and an emphasis nowadays. The Interactive Hybrid Strategies Machine Translation (IHSMT) system was designed to solve these problems. It forms a cooperative, interdependent relationship between human and machine, which not only helps solve the inherent problems of MT but also makes feedback learning possible. At the same time, IHSMT achieves hybrid-strategy translation by combining rule-based reasoning with case-based reasoning, thereby overcoming the shortcomings of any single strategy. This paper examines the learning mechanism of the system and proposes a learning model of human-machine tracking and memorizing (HMTM) from the point of view of cognitive psychology. This model stores the information of human-machine interaction in a memory base as analogous cases through machine learning, and thus gradually accumulates knowledge to improve the intelligence of the MT system.
The system consists of six parts [3]: Knowledge Base, CBAMT (Case-Based Analogy MT), RBMT (Rule-Based MT), Human-Computer Interactive Interface, Post-Edit Interface, and Model Knowledge Acquisition Interface.
CBAMT (Case-Based Analogy MT). Features are abstracted from the input sentence and matched analogically against the Case Base; the most analogous case is found, and the translation is produced from that case.
RBMT (Rule-Based MT) [7]. Using rule-based reasoning and linguistic knowledge, the sentence is analyzed and the translation is produced.
Knowledge Base. The Feature Case Base (FCBS), which is word-aligned, stores all kinds of bilingual language models as natural-language sentences paired with their translations.
The Rule Base (RBS) stores the syntax rules and lexical rules used by the RBMT method.
Human-Computer Interactive Interface. The interface is object-oriented and visualized. Through it, the user can interactively analyze the sentence, select the target model, and obtain the translation together with the computer.
Model Knowledge Acquisition. When an English sentence is input into the system, the language models in the base are retrieved first. If the same sentence or model exists, the analogical result can be obtained directly. Otherwise, RBMT is used to translate the sentence. The result can then be edited with the bilingual mapping information; a source-target translation template is constructed and added to the FCBS.
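The acquisition loop just described (retrieve from the case base, fall back to rule-based translation, post-edit, then store the new template) can be sketched as follows. This is an illustrative toy, not the system's real API: the class name FCBS and the functions rbmt_translate and acquire are hypothetical stand-ins.

```python
class FCBS:
    """Toy Feature Case Base: maps source sentences/models to target templates."""
    def __init__(self):
        self.templates = {}

    def retrieve(self, sentence):
        return self.templates.get(sentence)

    def add(self, source, target):
        self.templates[source] = target


def rbmt_translate(sentence):
    # Stand-in for the rule-based MT component.
    return "<RBMT translation of: %s>" % sentence


def acquire(fcbs, sentence, post_edit=lambda t: t):
    """Return a translation, learning a new template on a case-base miss."""
    hit = fcbs.retrieve(sentence)
    if hit is not None:
        return hit                    # analogical result from the case base
    draft = rbmt_translate(sentence)  # otherwise translate by rules
    final = post_edit(draft)          # human post-editing via the interface
    fcbs.add(sentence, final)         # new source-target template joins the FCBS
    return final
```

On the second request for the same sentence, the template is served from the case base instead of being re-translated, which is the feedback-learning effect the paper describes.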
Compared with single-strategy machine translation, IHSMT has an open knowledge base, can learn from human-machine interaction, and thus enhances the intelligence of the system.
3. Lazy Learning (LL) Strategy and HMTM Learning Model
With the Lazy Learning algorithm, analogous cases in memory are compared, classified [10], and then memorized. Cases are then organized at different levels of abstraction, including functional words and syntactic and semantic features, which are used to compare and retrieve similar cases. These levels are also used to generate cases with Complex Features from concrete cases, which not only extends the applicability of concrete cases but also reduces the memory footprint of the system. A "useful life" is assigned to each case to limit the growth of the memory base.
Lazy learning is a kind of analogical machine learning. Cases are stored in the TM (Translation Memory) in classes. Because neither rules nor decision trees are used, the method is simple: when a new model is input, the most similar stored sample is found according to the distance between examples.
In Lazy Learning, every sample has its own property matrix, which defines a model space. Similar samples belong to the same model space, and similarity can be measured by the distance between the matrices.
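The paper does not spell out the distance metric, so the sketch below uses a simple mismatch count over property slots (a Hamming-style distance) as a stand-in; property_distance and nearest are illustrative names.

```python
def property_distance(feats_a, feats_b):
    """Number of feature slots on which the two samples disagree."""
    keys = set(feats_a) | set(feats_b)
    return sum(1 for k in keys if feats_a.get(k) != feats_b.get(k))


def nearest(sample, memory):
    """Return the stored sample with the smallest property distance."""
    return min(memory, key=lambda m: property_distance(sample, m))
```

For example, {"pos": "V", "tense": "pres"} and {"pos": "V", "tense": "past"} differ on one slot, so their distance is 1; a sample is at distance 0 only from itself.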
Machine learning is a central problem of Artificial Intelligence: it allows knowledge to be applied to similar situations. The machine learning mechanism of the system accumulates knowledge and experience and builds the Knowledge Base used to improve translation, reducing the need for human intervention. The system's lazy learning uses case-based reasoning and analogical reasoning; its model frame consists of the following parts:
CBR (Case-Based Reasoning) Model: these models analyze and search the features of the sentence model;
Complex Feature Abstraction Model: abstracts models from the TM (Translation Memory).
The MT result is modified through the object-oriented interface and output as the final result. At the same time, an analogical sample is produced for the learning system and sent to the TM. The system's classifier uses case-based reasoning to abstract a feature matrix from each sample. After searching for a similar case in the case database, if the distance exceeds the analogy threshold the procedure returns; otherwise the sample is added to the cluster of that class. All cases in the TM are obtained in this way. As the cases increase, the coverage of the case database grows and the probability of finding a result rises. Thus, through lazy learning, the intelligence of the machine increases and the human post-editing workload is reduced.
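The classify-and-store step above can be sketched as a small routine: search the existing clusters, join the nearest one when the sample is close enough, otherwise open a new class. The threshold value and the crude word-set distance are illustrative assumptions, not the system's actual parameters.

```python
def token_distance(a, b):
    """Symmetric difference of word sets, as a crude sentence distance."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa ^ sb)


def classify(sample, clusters, distance, threshold=2):
    """Add sample to the closest cluster, or start a new one."""
    best, best_d = None, None
    for cluster in clusters:
        d = distance(sample, cluster[0])  # compare against the class prototype
        if best_d is None or d < best_d:
            best, best_d = cluster, d
    if best is not None and best_d <= threshold:
        best.append(sample)               # similar enough: join the cluster
        return best
    new_cluster = [sample]                # too far from everything: new class
    clusters.append(new_cluster)
    return new_cluster
```

A near-duplicate sentence joins an existing cluster, while an unrelated sentence starts a new class, mirroring the growth of the case database described above.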
4. Algorithm of Case-Based Reasoning
Sentence 1: It is easy to identify some simple writing rules and strategies that can improve the performance of MT system.
Sentence 2: It is easy to identify some simple writing rules that can improve the performance of almost any general-purpose MT system.
These two sentences are similar. If both are saved in the case database, storage space is wasted and search performance suffers, so we extract a model from the sentence cases:
Model: It is easy to identify NP (ABS) that can improve the performance of NP MT system.
ABS represents a feature of the NP, indicating an abstract noun.
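One hypothetical way to derive such a model from the two similar sentences above is to keep the tokens they share and collapse each differing span into an NP slot. The sketch below does this with a token-level diff; real slot typing (such as the ABS feature) would need the syntactic analysis that this toy version omits.

```python
import difflib


def abstract_template(sent_a, sent_b, slot="NP"):
    """Merge two similar sentences into one model with slot markers."""
    ta, tb = sent_a.split(), sent_b.split()
    sm = difflib.SequenceMatcher(a=ta, b=tb)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(ta[i1:i2])   # shared words survive verbatim
        else:
            out.append(slot)        # differing material becomes a slot
    return " ".join(out)
```

Applied to Sentence 1 and Sentence 2 above, this yields a model with two NP slots, matching the shape of the model shown in the paper.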
There are two ways to acquire Complex Features:
1. Automatic case-based learning;
2. With the help of an expert.
To keep the CFS stable and efficient, we design restricted grammar rules, which prevent ill-formed expressions.
The algorithm consists of two parts: the lexical learning algorithm LexLearning() and the phrase/sentence learning algorithm SentLearning().
Algorithm 1. Lexical learning function
LexLearning(Inst)
Input: Inst - analogical sample
BEGIN
  while (SearchWordMemory(Inst, patt) == 1) {
    /* a similar model pattern patt was found */
    if (WordDist(Inst, patt) == 0)
      /* distance between Inst and patt is 0 (equal) */
      add 1 to the probability of patt;
    else {
      if (patt.LivingTime > Live_Time) {
        delete the pattern patt; search the next model;
      }
      if (WordDist(Inst, patt) < 1) {
        /* same word, different property */
        add the property of Inst into patt
        in the TM (Translation Memory);
        return;
      }
    }
  }
  Add Inst to the TM
END
Algorithm 2. Sentence learning function
SentLearning(Inst)
Input: Inst - analogical sample
Variables:
  AbsFea - abstract properties of the sentence in Inst
  DiffArray - difference array storing the differences between Inst and the models in the database
  FeaCollect - the result of the search
  Distance - distance between Inst and patt
  Threshold - the threshold of property distance
Notes: WordDist(Inst, patt) returns the distance between Inst and patt; Match(Inst, patt, DiffArray) calculates the distance between Inst and patt; AbstractFeat(AbsFea, Inst) abstracts the sample features of Inst.
BEGIN
  AbstractFeat(AbsFea, Inst);
  /* abstract the properties of Inst */
  i = 0;
  FeaCollect = SearchSentMemory(Inst, AbsFea);
  if (FeaCollect is empty) {
    build a new class A, add Inst into class A, and return;
  }
  while (FeaCollect is not empty) {
    find the analogical model patt in the TM;
    execute the fuzzy matching, calculate the property
    distance, and add the result to DiffArray;
    if (patt.LivingTime > Live_Time) {
      delete the model patt; search the next model;
    }
    Distance = Match(Inst, patt, DiffArray);
    if (Distance == 0)
      add 1 to the probability property of patt, i++;
  }
  Find the shortest distance D in DiffArray, and
  map it to the corresponding sample model;
  if (D > Threshold) {
    build a new class A, add Inst to class A, return;
  }
  if (i > 5 and all values are less than the threshold
      and the human-machine check returns YES) {
    if (AbsFeatPattern(NewPatt, DiffArray) == success)
      replace all cases in the class of patt with NewPatt;
    else
      add Inst into the class of patt;
  }
  else
    add Inst into the class of patt;
END
Consider a group of samples: 'He studies in Beijing University.' 'He studies in my school.' 'I study in Beijing.'
First, the CFS function abstracts the common feature 'study in'. The differing parts are 'he' vs. 'I', and 'Beijing University' vs. 'my school' vs. 'Beijing'. Then, reducing over these differences, we obtain the CFS variables A(x) and A(y), where A represents the grammatical class and x and y represent the semantic properties. Finally, we obtain the CFS: A(x) study in A(y).
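This generalization over several samples can be sketched as repeated pairwise merging: shared tokens are kept, differing positions become labelled variables. Lemmatization here is faked with a tiny lookup table, and all names (generalize, complex_feature, LEMMAS) are illustrative; a real system would use the lexical analyzer's output.

```python
import difflib
from functools import reduce

LEMMAS = {"studies": "study", "He": "he"}  # illustrative only


def lemmatize(sentence):
    return [LEMMAS.get(w, w) for w in sentence.rstrip(".").split()]


def generalize(pat_a, pat_b):
    """Merge two token patterns, turning differing spans into VAR slots."""
    sm = difflib.SequenceMatcher(a=pat_a, b=pat_b)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        out.extend(pat_a[i1:i2] if op == "equal" else ["VAR"])
    # collapse adjacent VARs produced by successive merges
    return [t for i, t in enumerate(out)
            if t != "VAR" or i == 0 or out[i - 1] != "VAR"]


def complex_feature(sentences):
    """Reduce a group of samples to one pattern with named variables."""
    patterns = [lemmatize(s) for s in sentences]
    common = reduce(generalize, patterns)
    names = iter("xyz")
    return " ".join("A(%s)" % next(names) if t == "VAR" else t
                    for t in common)
```

Run on the three samples above, this produces the pattern "A(x) study in A(y)", matching the CFS derived in the text.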
Case-based reasoning is realized as analogical matching that uses the function words and syntactic features of the compared cases, retrieving the best case by measuring similarity. For the matching algorithm, a hybrid method based on multiple functional levels and word-driven matching is described. While performing the matching, it builds a function of the differences between the compared cases. Matching results show that this kind of analogical matching reflects not only the surface similarity of characters but also the deep similarity between cases, so it achieves high accuracy at low complexity. Furthermore, owing to the multi-level structure of the memory base, the matching can also exploit the abstract features of cases.
5. Experiment and Evaluation of the Learning Algorithm
For the construction of case-based analogical MT, we propose and implement a method of building similar translations by setting up direct transfer operators between the case and the target translation. Experiments show that this method avoids building a syntax tree for the target language, so it is simple to apply and gives good results when similarity is greater than 80%.
1. Time of feature abstraction. We selected 300 typical sentences of varying lengths. The time of model feature abstraction includes sentence pre-processing and lexical analysis. As Chart 1 shows, the time of feature abstraction increases with the number of words, and the Lazy Learning method is quicker than the ordinary one.
2. The efficiency of learning in the system. To evaluate the efficiency of learning properly, we adjusted the learning function and then tested and analyzed the time and accuracy of classification.
We built a TM (Translation Memory) of 1000 sentences and used 200 test sentences. We measured the time spent learning both without classification (sentences stored directly into the TM) and with Lazy Learning (sentences stored into different clusters).
As the case data in the TM increase, the time used in learning grows. Compared with simple learning, Lazy Learning takes about 40% more time.
To evaluate precision, we tested the TM with 3000 sentences at thresholds of 80% and 60%; the precision of the system is shown below.
Table 1. Learning results of the system

  Threshold   Cluster error, %   Classification error, %
  80%         9.78               6.32
  60%         12.4               8.63

The capacity of the TM affects translation time: as the TM grows, translation takes longer. With 3000 sentences, the translation time is 0.5 s; when the number increases to 10000, it is 0.7 s.
Judging from the performance of the IHSMT system, the tracking-and-memorizing learning model, which combines case-based analogical reasoning with rule-based reasoning, effectively improves the translation quality of an ordinary MT system. The larger the memory base, the less user interaction is involved and the more intelligent the MT system becomes.
References
1. David B. Leake. CBR in Context: The Present and Future. In: Case-Based Reasoning: Experiences, Lessons, & Future Directions. AAAI Press / MIT Press.
2. Hideo Watanabe. A Similarity-Driven Transfer System. COLING-92, pp. 770-774.
3. Huang Heyan, Chen Zhaoxiong, Song Jiping. The Design and Implementation Principle of an Interactive Hybrid Strategies Machine Translation System: IHSMTS. Proceedings of the International Conference on Machine Translation & Computer Language Information Processing, Beijing, June 1998, pp. 270-276.
4. Lambros Cranias, Harris Papageorgiou, Stelios Piperidis. A Matching Technique in Example-Based Machine Translation. COLING-94, pp. 100-104.
5. Matthias Heyn. Integrating Machine Translation into Translation Memory Systems. In: EAMT Workshop (1996), pp. 111-124.
6. S. Sato. Example-Based Machine Translation. Ph.D. Thesis, Kyoto University, 1991.
7. Sergei Nirenburg, Jaime Carbonell, Masaru Tomita, Kenneth Goodman. Machine Translation: A Knowledge-Based Approach. Morgan Kaufmann Publishers, 1992.
8. Sergei Nirenburg. Two Approaches to Matching in EBMT. TMI-93, pp. 47-50.
9. Shlomo Argamon, Ido Dagan, Yuval Krymolowski. A Memory-Based Approach to Learning Shallow Natural Language Patterns. cmp-lg/9806011, 1998.
10. Wilpon, J. & L. Rabiner. A Modified k-Means Clustering Algorithm for Use in Isolated Word Recognition. IEEE Trans. ASSP-33.
11. Walter Daelemans. Memory-Based Lexical Acquisition and Processing. cmp-lg/9405018, 1994.
12. Riesbeck, C., Schank, R. Inside Case-Based Reasoning. Lawrence Erlbaum Associates, 1989.
