Title |
Cultural Heritage: Knowledge Extraction from Web Documents |
Authors |
Eva Sassolini and Alessandra Cinini |
Abstract |
This article presents the use of NLP techniques (text mining, text analysis) to develop specific tools for creating linguistic resources related to the cultural heritage domain. The aim of our approach is to create tools for building an online knowledge network, automatically extracted from text materials concerning this domain. We experimented with a methodology that divides the automatic acquisition of texts, and consequently the creation of the reference corpus, into two phases. In the first phase, online documents were extracted from lists of links provided by human experts. All documents extracted from the web by means of an automatic spider were stored in a repository of text materials. On the basis of these documents, automatic parsers create the reference corpus for the cultural heritage domain. Relevant information and semantic concepts are then extracted from this corpus. In the second phase, all these semantically relevant elements (such as proper names, names of institutions, names of places, and other relevant terms) were used as the basis for a new search strategy for text materials from heterogeneous sources. In this phase, specialized crawlers (TP-crawler) were also used to work on a bulk of text materials available online. |
Language |
Named Entity recognition |
Topics |
Information Extraction, Information Retrieval, Text mining, Named Entity recognition |
Full paper  |
Cultural Heritage: Knowledge Extraction from Web Documents |
Bibtex |
@InProceedings{SASSOLINI10.415,
  author = {Eva Sassolini and Alessandra Cinini},
  title = {Cultural Heritage: Knowledge Extraction from Web Documents},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |