Abstract
To alleviate sparseness in short text clustering, considerable research has investigated external information such as Wikipedia to enrich the feature representation, which requires extra work and resources and may introduce inconsistency. Sparseness leads to weak connections between short texts, so similarity information is difficult to measure. We introduce a special term-specific document set, the potential locality set, to capture weak similarity. Specifically, for any two short documents within the same potential locality, the Jaccard similarity between them is greater than 0. In other words, the adjacency graph based on these weak connections within a potential locality is a complete graph. Further, a locality-sensitive term weighting scheme is proposed based on the potential locality set. Experimental results show that the proposed approach builds a more reliable neighborhood for short text data. Compared with a state-of-the-art algorithm, the proposed approach obtains better clustering performance, which verifies its effectiveness.
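The complete-graph property can be illustrated with a minimal sketch. It assumes, since the abstract does not give the formal definition, that the potential locality set of a term is simply the set of documents containing that term; the helper names `jaccard` and `potential_locality_sets` and the toy documents are hypothetical and are not taken from the paper. The locality-sensitive weighting scheme itself is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_locality_sets(docs):
    """Map each term to the indices of the documents containing it.

    Assumption: this is one plausible reading of "term-specific
    document set"; the paper's exact definition may differ.
    """
    localities = defaultdict(set)
    for idx, tokens in enumerate(docs):
        for term in tokens:
            localities[term].add(idx)
    return dict(localities)


# Toy short documents, already tokenized into term sets (hypothetical data).
docs = [
    {"apple", "iphone", "release"},
    {"apple", "pie", "recipe"},
    {"iphone", "battery", "life"},
    {"pie", "crust"},
]

localities = potential_locality_sets(docs)

# Every pair of documents in the same locality shares at least the
# locality's term, so their Jaccard similarity is strictly positive
# and the induced adjacency graph is complete.
for term, members in localities.items():
    for i, j in combinations(sorted(members), 2):
        assert jaccard(docs[i], docs[j]) > 0
```

Under this reading, documents 0 and 1 fall into the locality of "apple" even though they share only one of five distinct terms, which is the kind of weak connection the abstract refers to; documents 0 and 3 share no term and therefore never appear in a common locality.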
Notes
- 1. 83% of text-based recommender systems in the domain of digital libraries use tf-idf; see https://en.wikipedia.org/wiki/Tf-idf.
- 2. idf has a probabilistic interpretation in terms of the odds that t occurs in d.
Acknowledgement
The work described in this paper was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11300715], and a grant from City University of Hong Kong [Project No. 7004674].
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zheng, CT., Qian, S., Cao, WM., Wong, HS. (2017). Locality-Sensitive Term Weighting for Short Text Clustering. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_46
DOI: https://doi.org/10.1007/978-3-319-70087-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70086-1
Online ISBN: 978-3-319-70087-8
eBook Packages: Computer Science (R0)