Title |
Partial Parsing of Spontaneous Spoken French |
Authors |
Olivier Blanc, Matthieu Constant, Anne Dister and Patrick Watrin |
Abstract |
This paper describes the process and the resources used to automatically annotate a French corpus of spontaneous speech transcriptions with super-chunks. Super-chunks are enhanced chunks that can contain lexical multiword units. This partial parsing is based on a preprocessing stage of the spoken data that consists of reformatting and tagging utterances that break the syntactic structure of the text, such as disfluencies. Spoken specificities were formalized through a systematic linguistic study of a 40-hour speech transcription corpus. The chunker uses large-coverage, fine-grained language resources for general written language that have been augmented with resources specific to spoken French. It iteratively applies finite-state lexical and syntactic resources and outputs a finite automaton representing all possible chunk analyses. The best path is then selected by a hybrid disambiguation stage. We show that our system reaches scores comparable with state-of-the-art results in the field. |
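The abstract describes a chunker whose output is a finite automaton of all candidate chunk analyses, from which a single best path is then chosen. The following is a minimal Python sketch of that final selection step only, assuming the automaton has been flattened into an acyclic lattice whose edges are token spans. The `Chunk` class, the cost values, and the ranking rule (fewest chunks first, so that analyses using multiword units win, then lowest summed cost) are hypothetical stand-ins chosen for illustration; they are not the authors' hybrid disambiguation procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    start: int    # index of the first token covered
    end: int      # index one past the last token covered
    label: str    # hypothetical chunk tag, e.g. "NP", "VP"
    cost: float   # hypothetical model cost (lower is better)


def best_path(n_tokens: int, chunks: List[Chunk]) -> Optional[List[Chunk]]:
    """Pick one chunk sequence covering tokens 0..n_tokens in the lattice.

    Paths are ranked lexicographically on (number of chunks, summed cost).
    Token positions are already in topological order, so a single
    left-to-right dynamic-programming pass is enough.
    Returns None if no sequence of chunks covers the whole input.
    """
    inf = (float("inf"), float("inf"))
    best = [inf] * (n_tokens + 1)               # best (count, cost) reaching each position
    back: List[Optional[Chunk]] = [None] * (n_tokens + 1)
    best[0] = (0, 0.0)

    outgoing = {}                               # chunks grouped by start position
    for c in chunks:
        if not 0 <= c.start < c.end <= n_tokens:
            raise ValueError(f"invalid span: {c}")
        outgoing.setdefault(c.start, []).append(c)

    for pos in range(n_tokens):
        if best[pos] == inf:                    # position unreachable, skip it
            continue
        for c in outgoing.get(pos, []):
            cand = (best[pos][0] + 1, best[pos][1] + c.cost)
            if cand < best[c.end]:
                best[c.end] = cand
                back[c.end] = c

    if best[n_tokens] == inf:
        return None

    path = []                                   # follow back-pointers from the end
    pos = n_tokens
    while pos > 0:
        c = back[pos]
        path.append(c)
        pos = c.start
    return list(reversed(path))


# Hypothetical lattice for "il mange des pommes de terre" (6 tokens):
# "des pommes de terre" can be one chunk containing the multiword unit
# "pommes de terre", or two chunks "des pommes" + "de terre".
lattice = [
    Chunk(0, 1, "NP", 0.1),   # il
    Chunk(1, 2, "VP", 0.2),   # mange
    Chunk(2, 4, "NP", 0.5),   # des pommes
    Chunk(4, 6, "PP", 0.4),   # de terre
    Chunk(2, 6, "NP", 0.6),   # des pommes de terre
]
path = best_path(6, lattice)
print([(c.label, c.start, c.end) for c in path])
```

With this ranking, the three-chunk analysis that keeps "pommes de terre" inside one noun chunk is preferred over the four-chunk split. A real system would replace the toy costs with scores from its own lexical resources and statistical model.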
Language |
MultiWord Expressions & Collocations |
Topics |
Parsing, Speech resource/database, MultiWord Expressions & Collocations |
Full paper  |
Partial Parsing of Spontaneous Spoken French |
Bibtex |
@InProceedings{BLANC10.554,
  author = {Olivier Blanc and Matthieu Constant and Anne Dister and Patrick Watrin},
  title = {Partial Parsing of Spontaneous Spoken French},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}