Summary of the paper

Title ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus
Authors Claire Brierley and Eric Atwell
Abstract We have previously reported on ProPOSEL, a purpose-built Prosody and PoS English Lexicon compatible with the Python Natural Language ToolKit. ProPOSEC is a new corpus research resource built using this lexicon, intended for distribution with the Aix-MARSEC dataset. ProPOSEC comprises multi-level parallel annotations, juxtaposing prosodic and syntactic information from different versions of the Spoken English Corpus, with canonical dictionary forms, in a query format optimized for Perl, Python, and text processing programs. The order and content of fields in the text file is as follows: (1) Aix-MARSEC file number; (2) word; (3) LOB PoS-tag; (4) C5 PoS-tag; (5) Aix SAM-PA phonetic transcription; (6) SAM-PA phonetic transcription from ProPOSEL; (7) syllable count; (8) lexical stress pattern; (9) default content or function word tag; (10) DISC stressed and syllabified phonetic transcription; (11) alternative DISC representation, incorporating lexical stress pattern; (12) nested arrays of phonemes and tonic stress marks from Aix. As an experimental dataset, ProPOSEC can be used to study correlations between these annotation tiers, where significant findings are then expressed as additional features for phrasing models integral to Text-to-Speech and Speech Recognition. As a training set, ProPOSEC can be used for machine learning tasks in Information Retrieval and Speech Understanding systems.
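The twelve-field record layout listed in the abstract could be read in Python along the following lines. This is a minimal sketch, not the authors' code: the field identifiers, the `parse_record` helper, and the tab delimiter are all assumptions (the abstract does not name a delimiter), and the sample row holds dummy values, not corpus data.

```python
# Field order follows the abstract; the identifiers are illustrative names,
# not taken from the ProPOSEC distribution.
FIELDS = (
    "aix_marsec_file",       # (1) Aix-MARSEC file number
    "word",                  # (2) word
    "lob_pos",               # (3) LOB PoS-tag
    "c5_pos",                # (4) C5 PoS-tag
    "aix_sampa",             # (5) Aix SAM-PA phonetic transcription
    "proposel_sampa",        # (6) SAM-PA transcription from ProPOSEL
    "syllable_count",        # (7) syllable count
    "stress_pattern",        # (8) lexical stress pattern
    "content_function_tag",  # (9) default content or function word tag
    "disc_syllabified",      # (10) DISC stressed and syllabified transcription
    "disc_with_stress",      # (11) alternative DISC representation
    "aix_phonemes_tonic",    # (12) nested arrays of phonemes and tonic stress marks
)


def parse_record(line, delimiter="\t"):
    """Split one line of the text file into a dict keyed by FIELDS.

    The tab delimiter is an assumption; pass the real one if it differs.
    """
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Placeholder row built from dummy values, purely to exercise the parser.
sample = "\t".join(f"value{i}" for i in range(1, 13))
record = parse_record(sample)
print(record["word"])
```

Keeping every field as a string leaves the interpretation of the stress pattern and the nested phoneme arrays to later processing, since their exact encoding is not specified in the abstract.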
Language Prosody
Topics Corpus (creation, annotation, etc.), Part of speech tagging, Prosody
Full paper ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus
Bibtex @InProceedings{BRIERLEY10.749,
  author = {Claire Brierley and Eric Atwell},
  title = {ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }