On Emerging Entity Detection

Färber, Michael; Rettinger, Achim; El Asmar, Boulos

doi:10.1007/978-3-319-49004-5_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Included in the following conference series:

European Knowledge Acquisition Workshop

Abstract

While large Knowledge Graphs (KGs) already cover a broad range of domains to an extent sufficient for general use, they typically lack emerging entities that are just starting to attract public interest. This disqualifies such KGs for tasks like entity-based media monitoring, since a large portion of news inherently covers entities that have not been noted by the public before. Such entities are unlinkable, which ultimately means that they cannot be monitored in media streams. This is the first paper that thoroughly investigates all types of challenges that arise from out-of-KG entities for entity linking tasks. Through large-scale analysis of news streams, we quantify the importance of each challenge for real-world applications. We then propose a machine learning approach that tackles the most frequent but least investigated challenge, i.e., when entities are missing from the KG and cannot be considered by entity linking systems. We construct a publicly available benchmark data set based on English news articles and editing behavior on Wikipedia. Our experiments show that predicting whether an entity will be added to Wikipedia is challenging. However, we can reliably identify emerging entities that could be added to the KG according to Wikipedia’s own notability criteria.

A. Rettinger—The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611346.

Notes

1. This fact results from our empirical analysis; see Sect. 2.2 for more details.
2. Emerging relates to trending: Entities can emerge only once. Once they have become notable, any (repeated) increase in public interest is just a trend.
3. See https://en.wikipedia.org/wiki/Wikipedia:Notability.
4. See http://people.aifb.kit.edu/mfa/emerging-entity-detection/.
5. As we are interested in novel/emerging entities, we do not consider deletions of entities or surface forms within \(\varDelta t\).
6. The remaining few entities are not parseable by the Stanford parser.
7. Given the set of 300 novel entities manually tagged as named entities, 95 of them were classified as type Person, 51 as type Location, 27 as type Organization, and 24 as type Event (a subtype of Misc).
8. For 11,639 of those 41,579 novel entities, however, only the Wikipedia title or redirects changed (due to typo correction or outsourcing of parts of a page). That is, on average over 700 entities that are “really” novel are inserted into Wikipedia each day. For the task of Emerging Entity Detection (see Sect. 4), we only consider really novel entities that emerge (i.e., that recently gained public interest for the first time).
9. See http://trec-kba.org/, requested June 26, 2016.
10. An entity is here understood as a “noun phrase that could have a Wikipedia-style article if there were no notability or newness considerations, and which would have semantic types” [12].
11. Note that any text annotation method for Wikipedia could have been applied here.
12. See http://people.aifb.kit.edu/mfa/emerging-entity-detection.
13. We also experimented with aggregating all features for each NP series, but this did not yield better evaluation results.
14. See http://dumps.wikimedia.org/other/pagecounts-raw/.
15. We also evaluated machine learning algorithms specialized for imbalanced and time-series data, such as cost-sensitive AdaBoost, a cost-sensitive one-class classifier, and recurrent neural networks. However, this did not yield better results.
16. See more information on our website.
17. Given the Wikipedia status of 2015-04-04 as the reference KG.
18. Some of those entities were inserted later.
19. Investigations revealed that the already existing Wikipedia entities were not annotated by X-LiSA because no suitable surface forms were available for those entities. In most of those cases, the entity was a person, and only the family name was mentioned in the news article and extracted, whereas the set of known surface forms from Wikipedia contained only the full name of the entity. Resolving those issues is left to future work.
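
The surface-form mismatch described in note 19 can be illustrated with a minimal sketch. It is not the authors' implementation and does not use X-LiSA: the toy dictionary, the entity identifier `Jane_Doe`, and both function names are hypothetical, and the family-name expansion is only one conceivable heuristic for the issue the note leaves to future work.

```python
# Hypothetical toy data: the KG knows only the full name as a surface form.
kg_surface_forms = {
    "jane doe": "Jane_Doe",
}


def link(mention, surface_forms):
    """Return the KG entity for an exact surface-form match, else None."""
    return surface_forms.get(mention.strip().lower())


def add_family_name_forms(surface_forms):
    """Add the last token of each multi-word form as an extra surface form.

    Illustrative heuristic only (an assumption, not the paper's method).
    A family name shared by several entities is skipped as ambiguous.
    """
    candidates = {}
    for form, entity in surface_forms.items():
        tokens = form.split()
        if len(tokens) > 1:
            candidates.setdefault(tokens[-1], set()).add(entity)

    extended = dict(surface_forms)
    for family_name, entities in candidates.items():
        if len(entities) == 1 and family_name not in extended:
            extended[family_name] = next(iter(entities))
    return extended


# A news article mentions only the family name, so the exact lookup fails
# and an existing entity looks like an out-of-KG entity.
unlinked = link("Doe", kg_surface_forms)

# With the extended surface forms, the same mention is resolved.
linked = link("Doe", add_family_name_forms(kg_surface_forms))
```

Skipping shared family names is a deliberate choice in this sketch: adding an ambiguous short form would trade a missed link for a wrong one.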

References

1. Ben-Hur, A., Horn, D., Siegelmann, H.T., Vapnik, V.: Support vector clustering. J. Mach. Learn. Res. 2, 125–137 (2002)
2. Bunescu, R., Pasca, M.: Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), pp. 9–16, Trento, Italy (2006)
3. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E., Mitchell, T.: Toward an architecture for never-ending language learning (2010)
4. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
5. Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 Joint Conference on EMNLP-CoNLL, pp. 708–716, Prague, Czech Republic. Association for Computational Linguistics, June 2007
6. Dredze, M., McNamee, P., Rao, D., Gerber, A., Finin, T.: Entity disambiguation for knowledge base population. In: Proceedings of the 23rd International Conference on Computational Linguistics, COLING 2010, Stroudsburg, PA, USA, pp. 277–285. Association for Computational Linguistics (2010)
7. Dutta, A., Meilicke, C., Stuckenschmidt, H.: Enriching structured knowledge with open information. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Republic and Canton of Geneva, Switzerland, pp. 267–277 (2015)
8. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Stroudsburg, PA, USA, pp. 1535–1545. Association for Computational Linguistics (2011)
9. Gottipati, S., Jiang, J.: Linking entities to a knowledge base with query expansion. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Stroudsburg, PA, USA, pp. 804–813. Association for Computational Linguistics (2011)
10. Hoffart, J., Altun, Y., Weikum, G.: Discovering emerging entities with ambiguous names. In: Proceedings of the 23rd International Conference on World Wide Web, WWW 2014, New York, NY, USA, pp. 385–396. ACM (2014)
11. Ji, H., Nothman, J., Hachey, B., Florian, R.: Overview of TAC-KBP2015 tri-lingual entity discovery and linking (2015)
12. Lin, T., Etzioni, O.: No noun phrase left behind: detecting and typing unlinkable entities. In: Proceedings of the 2012 Joint Conference on EMNLP and CoNLL, EMNLP-CoNLL 2012, Stroudsburg, PA, USA, pp. 893–903. ACL (2012)
13. Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, New York, NY, USA, pp. 509–518. ACM (2008)
14. Nakashole, N., Tylenda, T., Weikum, G.: Fine-grained semantic typing of emerging entities. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 1488–1497 (2013)
15. Parada, C., Sethy, A., Dredze, M., Jelinek, F.: A spoken term detection framework for recovering out-of-vocabulary words using the web. Paragraph 10(71.24), 323K (2010)
16. Soboroff, I., Harman, D.: Novelty detection: the TREC experience. In: HLT 2005, Stroudsburg, PA, USA, pp. 105–112. ACL (2005)
17. Trampuš, M., Novak, B.: Internals of an aggregated web news feed. In: Proceedings of the Fifteenth International Information Science Conference IS SiKDD 2012, pp. 431–434 (2012)
18. Wang, C., Chakrabarti, K., Cheng, T., Chaudhuri, S.: Targeted disambiguation of ad-hoc, homogeneous sets of named entities. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012, New York, NY, USA, pp. 719–728. ACM (2012)
19. Wu, Z., Song, Y., Giles, C.L.: Exploring multiple feature spaces for novel entity discovery. In: AAAI 2016, AAAI - Association for the Advancement of Artificial Intelligence, February 2016
20. Yosef, M.A., Bauer, S., Hoffart, J., Spaniol, M., Weikum, G.: HYENA: hierarchical type classification for entity names. In: COLING 2012, pp. 1361–1370 (2012)
21. Zhang, L., Färber, M., Rettinger, A.: xLiD-Lexica: cross-lingual linked data lexica. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 2101–2105. ELRA (2014)
22. Zhang, L., Rettinger, A.: X-LiSA: cross-lingual semantic annotation. PVLDB 7(13), 1693–1696 (2014)
23. Zhao, S., Li, C., Ma, S., Ma, T., Ma, D.: Combining POS tagging, lucene search and similarity metrics for entity linking. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds.) WISE 2013. LNCS, vol. 8180, pp. 503–509. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41230-1_44

Author information

Authors and Affiliations

Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Michael Färber, Achim Rettinger & Boulos El Asmar

Corresponding author

Correspondence to Michael Färber.

Editor information

Editors and Affiliations

Linköping University, Linköping, Sweden
Eva Blomqvist
University of Bologna, Bologna, Italy
Paolo Ciancarini
University of Bologna, Bologna, Italy
Francesco Poggi
University of Bologna, Bologna, Italy
Fabio Vitali

About this paper

Cite this paper

Färber, M., Rettinger, A., El Asmar, B. (2016). On Emerging Entity Detection. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_15

DOI: https://doi.org/10.1007/978-3-319-49004-5_15
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49003-8
Online ISBN: 978-3-319-49004-5
eBook Packages: Computer Science, Computer Science (R0)
