Abstract
With the explosive growth of geological data, considerable research has focused on accurately retrieving specific information from massive datasets and on fully exploiting the knowledge latent in unstructured data. Current research on unstructured content retrieval mostly ignores the association between semantics and knowledge, or considers associations at only a single granularity, so concepts that share the same semantics but are expressed in different forms are lost during retrieval. To address these problems, this paper makes the following main contributions: (1) It defines a decision rule for splitting unstructured geological survey data into content fragments and constructs a more fine-grained semantic description model of geological text. (2) It presents a multi-constraint fusion feature weighting model to extract thematic feature items from the content fragments. (3) Across three granularities (document, content-item, and feature-item), same-granularity and cross-granularity associations are merged to construct a multi-granularity geological text hypernetwork model. (4) Experiments verify that the proposed approaches can improve the precision and recall of unstructured content retrieval.
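To make the idea of a multi-granularity hypernetwork more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that nodes are `(granularity, name)` pairs, that a hyperedge is simply a set of such nodes, and that retrieval expands a query term through same-granularity feature associations before following cross-granularity edges to documents. The class `Hypernetwork`, the helper `documents_for`, and all node names and edge contents are hypothetical.

```python
from collections import defaultdict


class Hypernetwork:
    """Toy hypergraph: each hyperedge links any number of (granularity, name) nodes."""

    def __init__(self):
        self.edges = {}                    # edge id -> frozenset of member nodes
        self.incidence = defaultdict(set)  # node -> ids of the edges that contain it

    def add_edge(self, edge_id, nodes):
        members = frozenset(nodes)
        self.edges[edge_id] = members
        for node in members:
            self.incidence[node].add(edge_id)

    def neighbors(self, node, granularity=None):
        """Nodes sharing at least one hyperedge with `node`, optionally filtered by granularity."""
        found = set()
        for edge_id in self.incidence.get(node, ()):
            found |= self.edges[edge_id]
        found.discard(node)
        if granularity is not None:
            found = {n for n in found if n[0] == granularity}
        return found


def documents_for(net, term):
    """Expand a feature term to associated features, then collect linked documents."""
    start = ("feature", term)
    related = {start} | net.neighbors(start, "feature")
    docs = set()
    for feature in related:
        docs |= net.neighbors(feature, "document")
    return sorted(name for _, name in docs)


net = Hypernetwork()
# Cross-granularity edges: a document, one content fragment, and its feature items.
net.add_edge("e1", [("document", "report_A"), ("content", "A_sec2"),
                    ("feature", "granite"), ("feature", "intrusion")])
net.add_edge("e2", [("document", "report_B"), ("content", "B_sec1"),
                    ("feature", "granitoid")])
# Same-granularity edge: two feature items assumed to be semantically associated.
net.add_edge("e3", [("feature", "granite"), ("feature", "granitoid")])

print(documents_for(net, "granite"))
```

In this toy setup, a query for "granite" also reaches the document that only mentions "granitoid", which illustrates how merging same-granularity and cross-granularity associations can recover concepts expressed in different forms.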
Acknowledgments
This work was funded by the National Natural Science Foundation of China (Grant No. 41671400) and the National Key Research and Development Program of China (Grant Nos. 2017YFB0503600, 2018YFB0505500, 2017YFC0602204, 2018YFB0505504). We thank the National Engineering Research Center of Geographic Information System for providing hardware support.
Additional information
Communicated by: H. Babaie
About this article
Cite this article
Zhuang, C., Li, W., Xie, Z. et al. A multi-granularity knowledge association model of geological text based on hypernetwork. Earth Sci Inform 14, 227–246 (2021). https://doi.org/10.1007/s12145-020-00534-w
DOI: https://doi.org/10.1007/s12145-020-00534-w