Title |
CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation |
Authors |
Keyan Zhou, Aijun Li, Zhigang Yin and Chengqing Zong |
Abstract |
CASIA-CASSIL is a large-scale corpus base of naturally occurring Chinese human-human telephone conversations in restricted domains. The first edition consists of 792 90-second conversations in the tourism domain, selected from 7,639 spontaneous telephone recordings made in real scenarios. The corpus is now being annotated with a wide range of linguistic and paralinguistic information at multiple levels. The annotations cover 13 levels in total: Turns, Speaker Gender, Orthographic Transcription, Chinese Syllable, Chinese Phonetic Transcription, Prosodic Boundary, Stress of Sentence, Non-Speech Sounds, Voice Quality, Topic, Dialog-act and Adjacency Pairs, Ill-formedness, and Expressive Emotion. This rich annotation should be especially useful for studying Chinese spoken language phenomena. This paper describes the whole process of building the conversation corpus, including collecting and selecting the original data and the follow-up steps such as transcription and annotation. CASIA-CASSIL is being extended into a large-scale corpus base of annotated Chinese dialogs for the study of spoken Chinese. |
Language |
LR national/international projects, organizational/policy issues |
Topics |
Corpus (creation, annotation, etc.), Discourse annotation, representation and processing, LR national/international projects, organizational/policy issues |
Full paper  |
CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation |
Bibtex |
@InProceedings{ZHOU10.248,
  author = {Keyan Zhou and Aijun Li and Zhigang Yin and Chengqing Zong},
  title = {CASIA-CASSIL: a Chinese Telephone Conversation Corpus in Real Scenarios with Multi-leveled Annotation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |