Locality-Sensitive Term Weighting for Short Text Clustering

Zheng, Chu-Tao; Qian, Sheng; Cao, Wen-Ming; Wong, Hau-San

doi:10.1007/978-3-319-70087-8_46

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

To alleviate sparseness in short text clustering, much prior work draws on external information such as Wikipedia to enrich the feature representation, which requires extra effort and resources and may introduce inconsistency. Sparseness leads to weak connections between short texts, so similarity information is difficult to measure. We introduce a special term-specific document set, the potential locality set, to capture weak similarity. Specifically, for any two short documents within the same potential locality set, the Jaccard similarity between them is greater than 0. In other words, the adjacency graph built on these weak connections within a potential locality set is a complete graph. Further, a locality-sensitive term weighting scheme is proposed based on the potential locality set. Experimental results show that the proposed approach builds a more reliable neighborhood for short text data. Compared with another state-of-the-art algorithm, the proposed approach obtains better clustering performance, which verifies its effectiveness.
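
The property stated in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not give the formal definition, that the potential locality set of a term is simply the set of documents containing that term; the function names and the toy documents are hypothetical and are not taken from the paper. Under that assumption, any two documents in the same set share at least the defining term, so their Jaccard similarity is strictly positive.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_locality_sets(docs):
    """Map each term to the indices of the documents that contain it.

    Assumed definition: the abstract only says the set is term-specific.
    """
    localities = defaultdict(set)
    for idx, tokens in enumerate(docs):
        for term in tokens:
            localities[term].add(idx)
    return dict(localities)


# Toy short texts, already tokenized into sets (hypothetical data).
docs = [
    {"python", "list", "sort"},
    {"python", "dict", "merge"},
    {"java", "list", "iterator"},
    {"python", "java", "interop"},
]
localities = potential_locality_sets(docs)

# Within each set, every pair of documents shares the defining term,
# so the pairwise Jaccard similarity is greater than 0 (complete graph).
for term, members in localities.items():
    for i, j in combinations(sorted(members), 2):
        assert jaccard(docs[i], docs[j]) > 0
```

Documents 1 and 2 in this toy data share no term and therefore never appear in the same set, which is the kind of pair a plain Jaccard neighborhood would leave unconnected. The sketch does not attempt the locality-sensitive weighting scheme itself, which the abstract does not specify.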

Notes

1. 83% of text-based recommender systems in the domain of digital libraries use tf-idf; see https://en.wikipedia.org/wiki/Tf-idf.
2. idf has a probabilistic explanation of the odds that t occurs in d.
3. http://jwebpro.sourceforge.net/data-web-snippets.tar.gz.
4. https://github.com/jacoxu/StackOverflow.
5. https://github.com/xiaohuiyan/BTM.

Acknowledgement

The work described in this paper was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11300715], and a grant from City University of Hong Kong [Project No. 7004674].

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
Chu-Tao Zheng, Sheng Qian, Wen-Ming Cao & Hau-San Wong

Corresponding author

Correspondence to Hau-San Wong.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

Cite this paper

Zheng, CT., Qian, S., Cao, WM., Wong, HS. (2017). Locality-Sensitive Term Weighting for Short Text Clustering. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_46

DOI: https://doi.org/10.1007/978-3-319-70087-8_46
Published: 24 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70086-1
Online ISBN: 978-3-319-70087-8
eBook Packages: Computer Science, Computer Science (R0)
