Summary of the paper

Title CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation
Authors Keyan Zhou, Aijun Li, Zhigang Yin and Chengqing Zong
Abstract CASIA-CASSIL is a large-scale corpus base of Chinese human-human naturally-occurring telephone conversations in restricted domains. The first edition consists of 792 90-second conversations belonging to tourism domain, which are selected from 7,639 spontaneous telephone recordings in real scenarios. The corpus is now being annotated with wide range of linguistic and paralinguistic information in multi-levels. The annotations include Turns, Speaker Gender, Orthographic Transcription, Chinese Syllable, Chinese Phonetic Transcription, Prosodic Boundary, Stress of Sentence, Non-Speech Sounds, Voice Quality, Topic, Dialog-act and Adjacency Pairs, Ill-formedness, and Expressive Emotion as well, 13 levels in total. The abundant annotation will be effective especially for studying Chinese spoken language phenomena. This paper describes the whole process to build the conversation corpus, including collecting and selecting the original data, and the follow-up process such as transcribing, annotating, and so on. CASIA-CASSIL is being extended to a large scale corpus base of annotated Chinese dialogs for spoken Chinese study.
Language LR national/international projects, organizational/policy issues
Topics Corpus (creation, annotation, etc.), Discourse annotation, representation and processing, LR national/international projects, organizational/policy issues
Full paper CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation
Bibtex @InProceedings{ZHOU10.248,
  author = {Keyan Zhou and Aijun Li and Zhigang Yin and Chengqing Zong},
  title = {CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }