LREC 2010 Proceedings

Summary of the paper

Title	Partial Parsing of Spontaneous Spoken French
Authors	Olivier Blanc, Matthieu Constant, Anne Dister and Patrick Watrin
Abstract	This paper describes the process and the resources used to automatically annotate a French corpus of spontaneous speech transcriptions in super-chunks. Super-chunks are enhanced chunks that can contain lexical multiword units. This partial parsing is based on a preprocessing stage of the spoken data that consists of reformatting and tagging utterances that break the syntactic structure of the text, such as disfluencies. Spoken specificities were formalized through a systematic linguistic study of a 40-hour-long speech transcription corpus. The chunker uses large-coverage and fine-grained language resources for general written language that have been augmented with resources specific to spoken French. It consists of iteratively applying finite-state lexical and syntactic resources and outputting a finite automaton representing all possible chunk analyses. The best path is then selected by a hybrid disambiguation stage. We show that our system reaches scores that are comparable with state-of-the-art results in the field.
Language	MultiWord Expressions & Collocations
Topics	Parsing, Speech resource/database, MultiWord Expressions & Collocations
Full paper	Partial Parsing of Spontaneous Spoken French
Bibtex	@InProceedings{BLANC10.554, author = {Olivier Blanc and Matthieu Constant and Anne Dister and Patrick Watrin}, title = {Partial Parsing of Spontaneous Spoken French}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }