Summary of the paper

Title Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin
Authors Marco Passarotti and Felice Dell'Orletta
Abstract The creation of language resources for less-resourced languages like the historical ones benefits from the exploitation of language-independent tools and methods developed over the years by many projects for modern languages. Along these lines, a number of treebanks for historical languages started recently to arise, including treebanks for Latin. Among the Latin treebanks, the Index Thomisticus Treebank is a 68,000 token dependency treebank based on the Index Thomisticus by Roberto Busa SJ, which contains the opera omnia of Thomas Aquinas (118 texts) as well as 61 texts by other authors related to Thomas, for a total of approximately 11 million tokens. In this paper, we describe a number of modifications that we applied to the dependency parser DeSR, in order to improve the parsing accuracy rates on the Index Thomisticus Treebank. First, we adapted the parser to the specific processing of Medieval Latin, defining an ad-hoc configuration of its features. Then, in order to improve the accuracy rates provided by DeSR, we applied a revision parsing method and we combined the outputs produced by different algorithms. This allowed us to improve accuracy rates substantially, reaching results that are well beyond the state of the art of parsing for Latin.
Language
Topics Parsing, Corpus (creation, annotation, etc.)
Full paper Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin
Bibtex @InProceedings{PASSAROTTI10.178,
  author = {Marco Passarotti and Felice Dell'Orletta},
  title = {Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }