Summary of the paper

Title Text Cluster Trimming for Better Descriptions and Improved Quality
Authors Magnus Rosell
Abstract Text clustering is potentially very useful for exploration of text sets that are too large to study manually. The success of such a tool depends on whether the results can be explained to the user. An automatically extracted cluster description usually consists of a few words that are deemed representative for the cluster. It is preferably short in order to be easily grasped. However, text cluster content is often diverse. We introduce a trimming method that removes texts that do not contain any, or a few of the words in the cluster description. The result is clusters that match their descriptions better. In experiments on two quite different text sets we obtain significant improvements in both internal and external clustering quality for the trimmed clustering compared to the original. The trimming thus has two positive effects: it forces the clusters to agree with their descriptions (resulting in better descriptions) and improves the quality of the trimmed clusters.
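The trimming idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `trim_cluster`, the `min_matches` threshold, the whitespace tokenization, and the sample texts are all assumptions made for illustration, since the abstract does not specify how matches are counted.

```python
def trim_cluster(texts, description, min_matches=1):
    """Split a cluster into (kept, removed) texts.

    A text is kept when it contains at least `min_matches` distinct
    words from the cluster description. The threshold and the simple
    whitespace tokenization are illustrative assumptions.
    """
    wanted = {word.lower() for word in description}
    kept, removed = [], []
    for text in texts:
        tokens = set(text.lower().split())
        if len(wanted & tokens) >= min_matches:
            kept.append(text)
        else:
            removed.append(text)
    return kept, removed


# Hypothetical cluster and description, made up for this example.
cluster = [
    "the striker scored a late goal",
    "the goal keeper saved the match",
    "parliament passed the budget",
]
description = ["goal", "match"]

kept, removed = trim_cluster(cluster, description, min_matches=1)
strict_kept, strict_removed = trim_cluster(cluster, description, min_matches=2)
```

With `min_matches=1` only the text that shares no word with the description is removed. Raising the threshold to 2 trims more aggressively and keeps only texts that contain both description words.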
Language Knowledge Discovery/Representation
Topics Document Classification, Text categorisation, Text mining, Knowledge Discovery/Representation
Full paper Text Cluster Trimming for Better Descriptions and Improved Quality
Bibtex @InProceedings{ROSELL10.540,
  author = {Magnus Rosell},
  title = {Text Cluster Trimming for Better Descriptions and Improved Quality},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }