Title |
Text Cluster Trimming for Better Descriptions and Improved Quality |
Authors |
Magnus Rosell |
Abstract |
Text clustering is potentially very useful for exploration of text sets that are too large to study manually. The success of such a tool depends on whether the results can be explained to the user. An automatically extracted cluster description usually consists of a few words that are deemed representative for the cluster. It is preferably short in order to be easily grasped. However, text cluster content is often diverse. We introduce a trimming method that removes texts that do not contain any, or a few of the words in the cluster description. The result is clusters that match their descriptions better. In experiments on two quite different text sets we obtain significant improvements in both internal and external clustering quality for the trimmed clustering compared to the original. The trimming thus has two positive effects: it forces the clusters to agree with their descriptions (resulting in better descriptions) and improves the quality of the trimmed clusters. |
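The trimming idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `trim_cluster`, the `min_hits` threshold, the whitespace tokenization, and the sample texts are all assumptions made here for illustration, and the abstract does not state the actual criterion used.

```python
def trim_cluster(texts, description, min_hits=1):
    """Keep texts that contain at least `min_hits` distinct description words.

    Hypothetical helper: the threshold and tokenization are assumptions,
    not details taken from the paper.
    """
    wanted = {word.lower() for word in description}
    kept = []
    for text in texts:
        # Naive whitespace tokenization; a real system would likely
        # apply stemming or lemmatization first.
        tokens = set(text.lower().split())
        if len(tokens & wanted) >= min_hits:
            kept.append(text)
    return kept


# Made-up cluster and description, for illustration only.
cluster = [
    "stock market prices fell",
    "the market rallied on bank news",
    "football season opens tonight",
]
description = ["market", "stock", "bank"]

trimmed = trim_cluster(cluster, description, min_hits=1)
```

Raising `min_hits` makes the trimming stricter, so fewer texts remain but each one shares more words with the cluster description.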
Language |
Knowledge Discovery/Representation |
Topics |
Document Classification, Text categorisation, Text mining, Knowledge Discovery/Representation |
Full paper  |
Text Cluster Trimming for Better Descriptions and Improved Quality |
Bibtex |
@InProceedings{ROSELL10.540,
  author = {Magnus Rosell},
  title = {Text Cluster Trimming for Better Descriptions and Improved Quality},
  booktitle = {Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |