Title |
Corpora for Automatically Learning to Map Natural Language Questions into SQL Queries |
Authors |
Alessandra Giordani and Alessandro Moschitti |
Abstract |
Automatically translating natural language into machine-readable instructions is one of the most interesting and challenging tasks in Natural Language (NL) Processing. This problem can be addressed by using machine learning algorithms to generate a function that finds mappings between natural language and programming language semantics. For this purpose, suitable annotated and structured data are required. In this paper, we describe our method to construct and semi-automatically annotate these kinds of data, consisting of pairs of NL questions and SQL queries. Additionally, we describe two different datasets obtained by applying our annotation method to two well-known corpora, GeoQueries and RestQueries. Since we believe that syntactic levels are important, we also generate and make available relational pairs represented by means of their syntactic trees whose lexical content has been generalized. We validate the quality of our corpora by experimenting with them and our machine learning models to derive automatic NL/SQL translators. Our promising results suggest that our corpora can be effectively used to carry out research in the field of natural language interfaces to databases. |
Language |
Knowledge Discovery/Representation |
Topics |
Corpus (creation, annotation, etc.), Question Answering, Knowledge Discovery/Representation |
Full paper  |
Corpora for Automatically Learning to Map Natural Language Questions into SQL Queries |
Bibtex |
@InProceedings{GIORDANI10.724,
  author    = {Alessandra Giordani and Alessandro Moschitti},
  title     = {Corpora for Automatically Learning to Map Natural Language Questions into SQL Queries},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |