Summary of the paper

Title From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
Authors Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani, Dave Graff and Mike Ciul
Abstract The Arabic Treebank (ATB) Project at the Linguistic Data Consortium (LDC) has embarked on a large corpus of Broadcast News (BN) transcriptions, and this has led to a number of new challenges for the data processing and annotation procedures that were originally developed for Arabic newswire text (ATB1, ATB2 and ATB3). The corpus requirements currently posed by the DARPA GALE Program, including English translation of Arabic BN transcripts, word-level alignment of Arabic and English data, and creation of a corresponding English Treebank, place significant new constraints on ATB corpus creation, and require careful coordination among a wide assortment of concurrent activities and participants. Nonetheless, in spite of the new challenges posed by BN data, the ATB’s newly improved pipeline and revised annotation guidelines for newswire have proven to be robust enough that very few changes were necessary to account for the new genre of data. This paper presents the points where some adaptation has been necessary, and the overall pipeline as used in the production of BN ATB data.
Language Part of speech tagging
Topics Corpus (creation, annotation, etc.), Parsing, Part of speech tagging
Full paper From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
Bibtex @InProceedings{MAAMOURI10.558,
  author = {Mohamed Maamouri and Ann Bies and Seth Kulick and Wajdi Zaghouani and Dave Graff and Mike Ciul},
  title = {From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }