Title
Error Correction for Arabic Dictionary Lookup
Authors
C. Anton Rytting, Paul Rodrigues, Tim Buckwalter, David Zajic, Bridget Hirsch, Jeff Carnes, Nathanael Lynn, Sarah Wayland, Chris Taylor, Jason White, Charles Blake III, Evelyn Browne, Corey Miller and Tristan Purvis
Abstract
We describe a new Arabic spelling correction system intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined to create a weighted finite-state transducer that calculates the likelihood that an input string could correspond to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed as the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students' speech errors and used it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline in overall mean reciprocal rank (MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic.
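The abstract describes ranking dictionary citation forms by how plausibly each could have been misheard or mistyped as the learner's query, and scoring the ranking against a Levenshtein baseline with mean reciprocal rank. The Python sketch below is a hypothetical illustration of that idea, not the authors' implementation: a plain dynamic-programming edit distance with lowered substitution costs for a few confusable letter pairs stands in for their weighted finite-state transducer. The Buckwalter-style toy entries, the cost values, and the function names `edit_distance`, `rank`, and `mean_reciprocal_rank` are all assumptions made for illustration.

```python
"""Toy ranking of dictionary citation forms for a learner's query.

ASSUMPTIONS (hypothetical, not from the paper): a plain dynamic-programming
edit distance stands in for the authors' weighted finite-state transducer,
and the confusion costs below are invented toy values, not the
student/teacher ratings the real system uses.
"""

from typing import Dict, FrozenSet, List, Optional, Sequence

# Hypothetical substitution costs for confusable letter pairs, written in
# Buckwalter-style transliteration (S = emphatic s, H = pharyngeal h,
# D = emphatic d). Lower cost = more easily confused.
CONFUSION_COST: Dict[FrozenSet[str], float] = {
    frozenset(("s", "S")): 0.2,
    frozenset(("h", "H")): 0.2,
    frozenset(("d", "D")): 0.3,
}


def edit_distance(query: str, form: str,
                  costs: Optional[Dict[FrozenSet[str], float]] = None,
                  indel: float = 1.0) -> float:
    """Levenshtein distance; with `costs`, confusable substitutions are cheaper.

    With costs=None every substitution costs 1 (the plain Levenshtein baseline).
    """
    costs = costs or {}
    prev = [j * indel for j in range(len(form) + 1)]  # row for the empty query
    for i, qc in enumerate(query, start=1):
        cur = [i * indel]
        for j, fc in enumerate(form, start=1):
            sub = 0.0 if qc == fc else costs.get(frozenset((qc, fc)), 1.0)
            cur.append(min(prev[j] + indel,      # delete qc from the query
                           cur[j - 1] + indel,   # insert fc into the query
                           prev[j - 1] + sub))   # substitute (or match)
        prev = cur
    return prev[-1]


def rank(query: str, dictionary: Sequence[str],
         costs: Optional[Dict[FrozenSet[str], float]] = None) -> List[str]:
    """Citation forms sorted by distance; ties keep dictionary order (sorted is stable)."""
    return sorted(dictionary, key=lambda form: edit_distance(query, form, costs))


def mean_reciprocal_rank(rankings: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the correct form; a query scores 0 if the form is absent."""
    total = 0.0
    for ranked, target in zip(rankings, gold):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(gold)


if __name__ == "__main__":
    dictionary = ["sabir", "Sabr", "Habl"]   # toy entries, not from the paper
    query, target = "sabr", "Sabr"           # learner typed plain s for emphatic S
    weighted = rank(query, dictionary, CONFUSION_COST)
    baseline = rank(query, dictionary)
    print(weighted)                                    # ['Sabr', 'sabir', 'Habl']
    print(baseline)                                    # ['sabir', 'Sabr', 'Habl']
    print(mean_reciprocal_rank([weighted], [target]))  # 1.0
    print(mean_reciprocal_rank([baseline], [target]))  # 0.5
```

In this toy case the baseline ties "sabir" and "Sabr" at distance 1 and keeps dictionary order, while the lowered s/S cost moves the intended form to the top. In the system the abstract describes, such weights would instead be derived from the keyboard, phonetic, and dialectal error modules and the learner and teacher confusability ratings.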
Language
Authoring tools, proofing
Topics
Lexicon, lexical database, Information Extraction, Information Retrieval, Authoring tools, proofing
Full paper
Error Correction for Arabic Dictionary Lookup
Bibtex |
@InProceedings{RYTTING10.440,
  author    = {C. Anton Rytting and Paul Rodrigues and Tim Buckwalter and David Zajic and Bridget Hirsch and Jeff Carnes and Nathanael Lynn and Sarah Wayland and Chris Taylor and Jason White and Blake, III, Charles and Evelyn Browne and Corey Miller and Tristan Purvis},
  title     = {Error Correction for {Arabic} Dictionary Lookup},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}