Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud

  • Conference paper
Semantic Technology (JIST 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7774)

Abstract

We present a declarative approach, implemented in a comprehensive open-source framework based on DBpedia, for extracting lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, translations and taxonomies (hyponyms, hyperonyms, synonyms, antonyms) for each lexical word. The main focus is on flexibility with respect to Wiktionary's loose schema and on configurability for its differing language editions. This is achieved by a declarative mediator/wrapper approach. The goal is to allow languages to be added by configuration alone, without programming, thus enabling domain experts to adapt wrappers swiftly and with few resources. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases such as disambiguation or machine translation. By offering a linked data service, we hope to extend DBpedia’s central role in the LOD infrastructure to the world of Open Linguistics.
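The idea of adding a language "just by configuration" can be illustrated with a minimal sketch. It is not the framework's actual configuration format, which the abstract does not describe. The names `CONFIGS` and `extract`, the property labels, and the regular expressions are all hypothetical. The sample wiki markup is simplified, and the sketch ignores section context, which a real wrapper would need.

```python
import re

# Hypothetical per-language wrapper configuration: each entry maps a
# property label to a line pattern with one capture group. Supporting a
# new language edition means adding an entry here, not writing new code.
CONFIGS = {
    "en": {
        "pos": r"^===\s*(Noun|Verb|Adjective)\s*===$",
        "definition": r"^#\s+(.+)$",
        "synonym": r"^\*\s*\[\[([^\]]+)\]\]$",
    },
    "de": {
        "pos": r"\{\{Wortart\|(\w+)\|Deutsch\}\}",
        "definition": r"^:\[\d+\]\s+(.+)$",
    },
}


def extract(word, wikitext, lang):
    """Generic extractor: applies the configured patterns line by line and
    returns (word, property, value) triples. Contains no language-specific
    logic; everything language-dependent lives in CONFIGS."""
    rules = {prop: re.compile(pat) for prop, pat in CONFIGS[lang].items()}
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        for prop, rx in rules.items():
            match = rx.search(line)
            if match:
                triples.append((word, prop, match.group(1)))
                break  # first matching rule wins for this line
    return triples


# Simplified, made-up page fragments for illustration only.
EN_PAGE = """==English==
===Noun===
# A domesticated feline.
====Synonyms====
* [[kitty]]
"""

DE_PAGE = """=== {{Wortart|Substantiv|Deutsch}} ===
:[1] ein Haustier
"""

if __name__ == "__main__":
    for triple in extract("cat", EN_PAGE, "en"):
        print(triple)
    for triple in extract("Katze", DE_PAGE, "de"):
        print(triple)
```

The same `extract` function serves both editions. Only the configuration entry differs, which is the property the declarative approach aims for.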

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hellmann, S., Brekle, J., Auer, S. (2013). Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud. In: Takeda, H., Qu, Y., Mizoguchi, R., Kitamura, Y. (eds) Semantic Technology. JIST 2012. Lecture Notes in Computer Science, vol 7774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37996-3_13

  • DOI: https://doi.org/10.1007/978-3-642-37996-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37995-6

  • Online ISBN: 978-3-642-37996-3

  • eBook Packages: Computer Science, Computer Science (R0)
