Abstract
We present a declarative approach, implemented in a comprehensive open-source framework based on DBpedia, to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, translations and taxonomies (hyponyms, hyperonyms, synonyms, antonyms) for each lexical word. The main focus is on flexibility with respect to Wiktionary's loose schema and on configurability towards its differing language editions. This is achieved by a declarative mediator/wrapper approach. The goal is to allow languages to be added through configuration alone, without any programming, thus enabling domain experts to adapt wrappers swiftly and with few resources. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases such as disambiguation and machine translation. By offering a linked data service, we hope to extend DBpedia's central role in the LOD infrastructure to the world of Open Linguistics.
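The central idea of the approach is that language-specific knowledge lives in configuration while the extraction code stays generic. The Python sketch below illustrates that separation under simplifying assumptions: the two configuration dictionaries, their regular-expression patterns, the entry-naming scheme and the relation names are all hypothetical, and the `lemon:`-prefixed predicates only loosely echo the lemon model rather than serializing it faithfully. It is not the framework's actual configuration format.

```python
import re

# Hypothetical wrapper configurations: all per-language knowledge is data.
# The patterns are simplified assumptions, not the real framework's format.
CONFIGS = {
    "en": {
        "language": re.compile(r"^==\s*([^=]+?)\s*==$"),
        "pos": re.compile(r"^===\s*(Noun|Verb|Adjective|Adverb)\s*===$"),
        "section": re.compile(r"^====\s*([^=]+?)\s*====$"),
        "definition_section": None,  # senses sit directly under the POS heading
        "definition": re.compile(r"^#\s+(.+)$"),
        "relations": {"Synonyms": "synonym", "Antonyms": "antonym",
                      "Hyponyms": "hyponym", "Hypernyms": "hypernym"},
    },
    "de": {
        "language": re.compile(r"^==.*\(\{\{Sprache\|([^}]+)\}\}\)\s*==$"),
        "pos": re.compile(r"^===\s*\{\{Wortart\|([^|}]+)"),
        "section": re.compile(r"^\{\{(Bedeutungen|Synonyme|Gegenwörter|Unterbegriffe)\}\}$"),
        "definition_section": "Bedeutungen",
        "definition": re.compile(r"^:\[\d+\]\s*(.+)$"),
        "relations": {"Synonyme": "synonym", "Gegenwörter": "antonym",
                      "Unterbegriffe": "hyponym"},
    },
}

# Wiki links such as [[canine|canines]]; only the link target is captured.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract(title, wikitext, lang):
    """Apply one edition's configuration to a page; this code is language-agnostic."""
    cfg = CONFIGS[lang]
    triples = []
    language = pos = section = None
    senses = 0
    for line in (raw.strip() for raw in wikitext.splitlines()):
        if m := cfg["language"].match(line):
            language, pos, section = m.group(1), None, None
        elif m := cfg["pos"].match(line):
            pos, section, senses = m.group(1), None, 0
        elif m := cfg["section"].match(line):
            section = m.group(1)
        elif language and pos:
            entry = f"{title}-{language}-{pos}"  # hypothetical naming scheme
            if section == cfg["definition_section"] and (m := cfg["definition"].match(line)):
                senses += 1
                sense = f"{entry}-{senses}"
                triples.append((entry, "lemon:sense", sense))
                triples.append((sense, "lemon:definition", m.group(1)))
            elif section in cfg["relations"]:
                for target in LINK.findall(line):
                    triples.append((entry, cfg["relations"][section], target))
    return triples


# Toy page excerpts in the style of the English and German editions.
EN_PAGE = """==English==
===Noun===
# A domesticated carnivorous mammal.
#: The dog barked.
# A worthless person.
====Synonyms====
* [[hound]], [[canine|canines]]
"""

DE_PAGE = """== Hund ({{Sprache|Deutsch}}) ==
=== {{Wortart|Substantiv|Deutsch}}, {{m}} ===
{{Bedeutungen}}
:[1] ein Haustier
{{Synonyme}}
:[1] [[Köter]], [[Töle]]
"""

if __name__ == "__main__":
    for t in extract("dog", EN_PAGE, "en") + extract("Hund", DE_PAGE, "de"):
        print(t)
```

In this sketch, supporting a further edition means adding one more entry to `CONFIGS`; `extract` itself does not change, which mirrors the goal of adding languages by configuration rather than programming.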
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hellmann, S., Brekle, J., Auer, S. (2013). Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud. In: Takeda, H., Qu, Y., Mizoguchi, R., Kitamura, Y. (eds) Semantic Technology. JIST 2012. Lecture Notes in Computer Science, vol 7774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37996-3_13
DOI: https://doi.org/10.1007/978-3-642-37996-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37995-6
Online ISBN: 978-3-642-37996-3
eBook Packages: Computer Science, Computer Science (R0)