Title |
ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus |
Authors |
Claire Brierley and Eric Atwell |
Abstract |
We have previously reported on ProPOSEL, a purpose-built Prosody and PoS English Lexicon compatible with the Python Natural Language ToolKit. ProPOSEC is a new corpus research resource built using this lexicon, intended for distribution with the Aix-MARSEC dataset. ProPOSEC comprises multi-level parallel annotations, juxtaposing prosodic and syntactic information from different versions of the Spoken English Corpus, with canonical dictionary forms, in a query format optimized for Perl, Python, and text processing programs. The order and content of fields in the text file are as follows: (1) Aix-MARSEC file number; (2) word; (3) LOB PoS-tag; (4) C5 PoS-tag; (5) Aix SAM-PA phonetic transcription; (6) SAM-PA phonetic transcription from ProPOSEL; (7) syllable count; (8) lexical stress pattern; (9) default content or function word tag; (10) DISC stressed and syllabified phonetic transcription; (11) alternative DISC representation, incorporating lexical stress pattern; (12) nested arrays of phonemes and tonic stress marks from Aix. As an experimental dataset, ProPOSEC can be used to study correlations between these annotation tiers, where significant findings are then expressed as additional features for phrasing models integral to Text-to-Speech and Speech Recognition. As a training set, ProPOSEC can be used for machine learning tasks in Information Retrieval and Speech Understanding systems. |
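The twelve-field record layout listed in the abstract could be read in Python along the following lines. This is only a minimal sketch: the abstract does not state the field delimiter, so the pipe separator, the field identifiers, and the function name `parse_record` are all assumptions made for illustration, and the sample record is built from placeholders rather than real corpus data.

```python
# Hypothetical identifiers for the 12 fields, in the order given in the abstract.
FIELDS = [
    "file_number",         # (1) Aix-MARSEC file number
    "word",                # (2) word
    "lob_pos",             # (3) LOB PoS-tag
    "c5_pos",              # (4) C5 PoS-tag
    "aix_sampa",           # (5) Aix SAM-PA phonetic transcription
    "proposel_sampa",      # (6) SAM-PA phonetic transcription from ProPOSEL
    "syllable_count",      # (7) syllable count
    "stress_pattern",      # (8) lexical stress pattern
    "cf_tag",              # (9) default content or function word tag
    "disc",                # (10) DISC stressed and syllabified transcription
    "disc_stress",         # (11) alternative DISC representation
    "aix_phonemes_tonic",  # (12) nested arrays of phonemes and tonic stress marks
]


def parse_record(line, sep="|"):
    """Split one text line into a dict keyed by the field names above.

    The separator is an assumption; pass a different `sep` if the
    distributed file uses another delimiter. Splitting stops after 11
    separators so that the final, nested field is kept intact even if
    it happens to contain the separator character.
    """
    parts = line.rstrip("\n").split(sep, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


# Placeholder record (not real corpus data): each field holds its own name.
sample = "|".join(f"<{name}>" for name in FIELDS)
record = parse_record(sample)
```

Keeping every value as a string avoids guessing at the internal encoding of fields such as the stress pattern or the nested arrays; any conversion (for example, turning the syllable count into an integer) would be a further step once the actual file format has been checked.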
Language |
Prosody |
Topics |
Corpus (creation, annotation, etc.), Part of speech tagging, Prosody |
Full paper  |
ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus |
Bibtex |
@InProceedings{BRIERLEY10.749,
  author = {Claire Brierley and Eric Atwell},
  title = {ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |