Summary of the paper

Title Is my Judge a good One?
Authors Olivier Hamon
Abstract This paper aims at measuring the reliability of judges in MT evaluation. The scope is two evaluation campaigns from the CESTA project, during which human evaluations were carried out on fluency and adequacy criteria for English-to-French documents. Our objectives were threefold: observe inter-judge agreement, observe intra-judge agreement, and study the influence of the evaluation design implemented specifically for the needs of the campaigns. Indeed, a web interface was developed specifically to support the human judgments and store the results, but some design changes were made between the first and the second campaign. Given the low agreement observed, the judges' behaviour was analysed in that specific context. We also asked several judges to repeat their own evaluations a few times after the first judgments made during the official evaluation campaigns. Although the judges did not seem to agree fully at first sight, a less strict comparison revealed strong agreement. Furthermore, the evolution of the design during the project appears to have contributed to the difficulties judges encountered in maintaining a consistent interpretation of quality.
Language
Topics Evaluation methodologies, Machine Translation, Speech-to-Speech Translation
Full paper Is my Judge a good One?
Bibtex @InProceedings{HAMON10.402,
  author = {Olivier Hamon},
  title = {Is my Judge a good One?},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }