Title |
Evaluating Human-Machine Conversation for Appropriateness |
Authors |
Nick Webb, David Benyon, Preben Hansen and Oli Mival
Abstract |
Evaluation of complex, collaborative dialogue systems is a difficult task. Traditionally, developers have relied upon subjective feedback from the user, and parametrisation over observable metrics. However, both models place some reliance on the notion of a task; that is, the system is helping the user achieve some clearly defined goal, such as booking a flight or completing a banking transaction. It is not clear that such metrics are as useful when dealing with a system that has a more complex task, or even no definable task at all, beyond maintaining and performing a collaborative dialogue. Working within the EU-funded COMPANIONS programme, we investigate the use of appropriateness as a measure of conversation quality, the hypothesis being that good companions need to be good conversational partners. We report initial work in the direction of annotating dialogue for indicators of good conversation, including the annotation and comparison of the output of two generations of the same dialogue system.
Language |
English
Topics |
Dialogue, Evaluation methodologies, Usability, user satisfaction |
Full paper  |
Evaluating Human-Machine Conversation for Appropriateness |
Bibtex |
@InProceedings{WEBB10.115,
  author = {Nick Webb and David Benyon and Preben Hansen and Oli Mival},
  title = {Evaluating Human-Machine Conversation for Appropriateness},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}