Unsupervised Keyphrase Extraction from Scientific Publications

Papagiannopoulou, Eirini; Tsoumakas, Grigorios

doi:10.1007/978-3-031-24337-0_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

We propose a novel unsupervised keyphrase extraction approach that filters candidate keywords using outlier detection. It starts by training word embeddings on the target document to capture semantic regularities among the words. It then uses the minimum covariance determinant estimator to model the distribution of non-keyphrase word vectors, under the assumption that these vectors come from the same distribution, indicative of their irrelevance to the semantics expressed by the dimensions of the learned vector representation. Candidate keyphrases only consist of words that are detected as outliers of this dominant distribution. Empirical results show that our approach outperforms state-of-the-art and recent unsupervised keyphrase extraction methods.
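The outlier-filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word vectors are synthetic stand-ins for embeddings trained on the target document, the helper name `outlier_words` is hypothetical, and the chi-square cutoff on the robust distances is an assumed, commonly used thresholding rule rather than one stated in the abstract. Only the use of the minimum covariance determinant (MCD) estimator comes from the text; here it is taken from scikit-learn's `MinCovDet`.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet


def outlier_words(words, vectors, quantile=0.975, seed=0):
    """Return words whose vectors are outliers of the MCD-fitted distribution.

    Hypothetical helper: the chi-square cutoff is an assumption of this
    sketch, not a detail taken from the paper.
    """
    # Robustly estimate location and covariance of the dominant
    # (assumed non-keyphrase) distribution.
    mcd = MinCovDet(random_state=seed).fit(vectors)
    # Squared robust Mahalanobis distance of every word vector.
    d2 = mcd.mahalanobis(vectors)
    # Under normality, squared distances roughly follow chi2 with df = dimension.
    cutoff = chi2.ppf(quantile, df=vectors.shape[1])
    return [w for w, dist in zip(words, d2) if dist > cutoff]


# Synthetic stand-in for document-trained embeddings (assumption):
# 200 "ordinary" words around the origin, 10 planted words far away.
rng = np.random.default_rng(0)
dim = 5
common = rng.normal(0.0, 1.0, size=(200, dim))
planted = rng.normal(8.0, 1.0, size=(10, dim))
vectors = np.vstack([common, planted])
words = [f"w{i}" for i in range(200)] + [f"key{i}" for i in range(10)]

# Words flagged as outliers would form the pool from which
# candidate keyphrases are built.
candidates = outlier_words(words, vectors)
print(len(candidates), "candidate words flagged")
```

In this sketch the planted words lie far from the bulk of the data, so they are flagged together with a small fraction of ordinary words that fall in the tail of the fitted distribution. How the flagged words are then combined into and ranked as keyphrases is not covered by the abstract and is left out here.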

Author information

Authors and Affiliations

Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
Eirini Papagiannopoulou & Grigorios Tsoumakas

Corresponding author

Correspondence to Eirini Papagiannopoulou.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Papagiannopoulou, E., Tsoumakas, G. (2023). Unsupervised Keyphrase Extraction from Scientific Publications. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_16

DOI: https://doi.org/10.1007/978-3-031-24337-0_16
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)
