Abstract
As RDF data continue to gain popularity, we witness the fast growing trend of RDF datasets in both the number of RDF repositories and the size of RDF datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate and object). One of the grant challenges for managing these huge RDF data is how to execute RDF queries efficiently. In this paper, we address the query processing problems against the billion triple challenges. We first identify some causes for the problems of existing query optimization schemes, such as large intermediate results, initial query cost estimation errors. Then, we present our block-oriented dynamic query plan generation approach powered with pipelining execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan for queries is chosen by identifying the processing blocks of queries. We group the join patterns sharing a join variable into building blocks of the query plan since executing them first provides opportunities to reduce the size of intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques, such as sideways information passing and semi-join, to further reduce the size of intermediate results, improve the query processing cost estimation and speed up the performance of query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on dynamic programming based static query processing strategies.
Similar content being viewed by others
References
Abadi DJ, Marcus A, Madden SR, Hollenbach K (2007) Scalable semantic web data management using vertical partitioning. In: Proceedings of VLDB 2007. ACM, pp 411–422
Atre M, Chaoji V, Zaki MJ, Hendler JA (2010) Matrix bit loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of WWW 2010. ACM, pp 41–50
Balkesen C, Teubner J, Alonso G, Özsu MT (2013) Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: Proceedings of ICDE’13, IEEE Computer Society
Bernstein PA, Chiu D-MW (1981) Using semi-joins to solve relational queries. J Assoc Comput Mach 28(1):25–40
Binna R, Gassler W, Zangerle E, Pacher D, Specht G (2010) Spiderstore: exploiting main memory for efficient RDF graph representation and fast querying. In: Proceedings of workshop on semantic data management (SemData@VLDB) 2010
Chebotko A, Lu S, Fotouhi F (2009) Semantics preserving SPARQL-to-SQL translation. Data Knowl Eng 68(10):973–1000
Harth A, Umbrich J, Hogan A, Decker S (2007) YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of ISWC/ASWC2007. Springer, Berlin, pp 211–224
Hartig O, Bizer C, Freytag J-C (2009) Executing SPARQL queries over the web of linked data. In: Proceedings of ISWC 2009. Springer, Berlin, pp 293–309
Hartig O, Heese R (2007) The SPARQL query graph model for query optimization. In: Proceedings of ESWC 2007. Springer, Berlin, pp 564–578
Huang J, Abadi DJ, Ren K (2011) Scalable SPARQL querying of large RDF graphs. PVLDB 4(11):1123–1134
Ives ZG, Taylor NE (2008) Sideways information passing for push-style query processing. In: Proceedings of ICDE 2008
Janik M, Kochut K (2005) BRAHMS: a workbench RDF store and high performance memory system for semantic association discovery. In: Proceedings of ISWC 2005. Springer, Berlin, pp 431–445
Kim C, Sedlar E, Chhugani J, Kaldewey T, Nguyen AD, Blas AD, Lee VW, Satish N, Dubey P (2009) Sort vs. hash revisited: fast join implementation on modern multicore CPUs. PVLDB 2(2):1378–1389
Kossmann D, Stocker K (2000) Iterative dynamic programming: a new class of query optimization algorithms. ACM Trans Database Syst 25(1):4382
LUBM (2005) http://swat.cse.lehigh.edu/projects/lubm/
MonetDB (2010) Overview. http://monetdb.cwi.nl/
Neumann T, Weikum G (2009) Scalable join processing on very large RDF graphs. In: Proceedings of SIGMOD 2009. ACM, pp 627–639
Neumann T, Weikum G (2010a) The RDF-3X engine for scalable management of RDF data. VLDB J 19(1):91–113
Neumann T, Weikum G (2010b) x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. PVLDB 3(1–2):256–263
Rohloff K, Schantz RE (2010) High-performance, massively scalable distributed systems using the mapreduce software framework: the shard triple-store. In: Proceedings of international workshop on programming support innovations for emerging distributed applications 2010 (PSI EtA ’10). ACM
Selinger PG, Astrahan MM, Chamberlin DD, Lorie RA, Price TG (1979) Access path selection in a relational database management system. In: Proceedings of SIGMOD’79. Springer, Berlin, p 2334
Semantic Web Challenge (2012) http://challenge.semanticweb.org/2012/
Stocker K, Kossmann D, Braumandl R, Kemper KA (2001) Integrating semi-join-reducers into state of the art query processors. In: Proceedings of ICDE 2001, pp 575–584
Stocker M, Seaborne A, Bernstein A, Kiefer C, Reynolds D (2008) SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of WWW 2008. ACM, pp 595–604
SWEO Community Project (2010). Linking open data on the semantic web. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Udrea O, Pugliese A, Subrahmanian VS (2007) Grin: a graph based RDF index. In: Proceedings of the 22nd AAAI conference on artificial intelligence, pp 1465–1470
UniProt (n.d.). UniProt RDF distribution. ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/
W3C (2008) SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/
Weiss C, Karras P, Bernstein A (2008) Hexastore: sextuple indexing for semantic web data management. PVLDB 1(1):1008–1019
Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of SIGMOD 2004. ACM, pp 335–346
Yuan P, Liu P, Wu B, Liu L, Jin H, Zhang W (2013) TripleBit: a fast and compact system for large scale RDF data. PVLDB 6(7):517–528
Zou L, Mo J, Chen L, Özsu MT, Zhao D (2011) gStore: answering SPARQL queries via subgraph matching. PVLDB 8:482–493
Acknowledgments
The research is supported by National Science Foundation of China (61073096) and National High Technology Research and Development Program of China (863 Program) under Grant No. 2012AA011003. Ling Liu acknowledges the partial support of her research from Grants of NSF CISE NetSE program, SaTC program and I/UCRC Fundamental Research Program as well as Intel ISTC on Cloud Computing.
Author information
Authors and Affiliations
Corresponding author
Appendices
LUBM queries
-
PREFIX rdf: \(<\)http://www.w3.org/1999/02/22-rdf-syntax-ns#\(>\)
-
PREFIX ub: \(<\)http://www.lehigh.edu/\(\sim \)zhp2/2004/0401/univ-bench.owl#\(>\)
- Q1: :
-
SELECT ?x ?y ?z WHERE { ?y ub:teacherOf ?z . ?y rdf:type ub:Assistant-Professor . ?z rdf:type ub:Course . ?x ub:takesCourse ?z . ?x rdf:type ub:GraduateStudent . ?x ub:advisor ?y . }
- Q2: :
-
SELECT ?x ?y ?z WHERE { ?x rdf:type ub:AssistantProfessor . ?y rdf:type ub:Department . ?x ub:worksFor ?y . ?z rdf:type ub:UndergraduateStudent . ?z ub:memberOf ?y . ?z ub:advisor ?x . }
- Q3: :
-
SELECT ?x ?y WHERE { ?x rdf:type ub:AssistantProfessor . ?x ub:worksFor ?y . [] ub:memberOf ?y . ?y rdf:type ub:Department . }
- Q4: :
-
SELECT ?x ?y WHERE { ?x rdf:type ub:FullProfessor . ?y rdf:type ub:UndergraduateStudent . ?y ub:advisor ?x . ?x ub:worksFor [] . }
- Q5: :
-
SELECT ?x ?y ?z WHERE { ?z ub:subOrganizationOf ?y . ?y rdf:type ub:University . ?z rdf:type ub:Department . ?x ub:memberOf ?z . ?x rdf:type ub:GraduateStudent . ?x ub:undergraduateDegreeFrom ?y . }
- Q6: :
-
SELECT ?x ?y ?z WHERE { ?y ub:teacherOf ?z . ?y rdf:type ub:FullProfessor . ?z rdf:type ub:Course . ?x ub:takesCourse ?z . ?x rdf:type ub:UndergraduateStudent . ?x ub:advisor ?y . }
- Q7: :
-
SELECT ?x ?y WHERE { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse [] . ?y rdf:type ub:AssistantProfessor . ?x ub:advisor ?y . }
UniProt queries
-
PREFIX rdf: \(<\)http://www.w3.org/1999/02/22-rdf-syntax-ns#\(>\)
-
PREFIX rdfs: \(<\)http://www.w3.org/2000/01/rdf-schema#\(>\)
-
PREFIX uni: \(<\) http://purl.uniprot.org/core/ \(>\)
-
PREFIX taxon: \(<\) http://purl.uniprot.org/taxonomy/ \(>\)
- Q1: :
-
SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism [] . ?annotation rdf:type [] . ?annotation uni:range ?range . }
- Q2: :
-
SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism taxon:9606 . ?annotation rdf:type ?type . ?annotation rdfs:comment [] . }
- Q3: :
-
SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?annotation rdf:type \(<\) http://purl.uniprot.org/core/Transmembrane_An-notation \(>\) . }
- Q4: :
-
SELECT ?b ?ab WHERE { ?b rdf:type uni:Protein . ?a uni:replaces ?ab . ?ab uni:replacedBy ?b . }
- Q5: :
-
SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism taxon:9606 . ?annotation rdf:type \(<\) http://purl.uni-prot.org/core/Disease_Annotation \(>\) . ?protein uni:modified “2008-07-22” . }
- Q6: :
-
SELECT ?a ?vo WHERE { ?a rdfs:seeAlso ?vo . ?a uni:classifiedWith \(<\) http://purl.uniprot.org/keywords/67 \(>\) . ?b uni:annotation ?annotation . ?b rdf:type uni:Protein . ?a uni:replaces ?ab . ?ab uni:replacedBy ?b . }
- Q7: :
-
SELECT ?annotation ?a WHERE { ?annotation rdf:type \(<\) http://purl.uniprot.org/core/-Transmembrane_Annotation \(>\) . ?annotation uni:range ?range . ?annotation rdfs:comment ?text . ?a rdfs:seeAlso ?vo. ?a uni:classifiedWith \(<\) http://purl.uniprot.org/keywords/67 \(>\) . ?a uni:annotation ?annotation . }
BTC 2012 queries
-
PREFIX geo: \(<\) http://www.geonames.org/ \(>\)
-
PREFIX pos: \(<\)http://www.w3.org/2003/01/geo/wgs84_pos#\(>\)
-
PREFIX dbpedia: \(<\) http://dbpedia.org/property/ \(>\)
-
PREFIX dbpediares: \(<\) http://dbpedia.org/resource/ \(>\)
-
PREFIX owl: \(<\)http://www.w3.org/2002/07/owl#\(>\)
-
PREFIX rdf: \(<\)http://www.w3.org/1999/02/22-rdf-syntax-ns#\(>\)
- Q1: :
-
SELECT distinct ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/1999/02/-22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/-1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . }
- Q2: :
-
SELECT ?a ?l WHERE { ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:deathPlace ?l . ?l pos:lat ?lat . }
- Q3: :
-
SELECT ?p ?l WHERE { ?p dbpedia:name [] . ?p dbpedia:deathPlace ?l . ?p dbpedia:spouse ?c . ?p \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/-ontology/Person \(>\) . ?l pos:long ?long . }
- Q4: :
-
SELECT ?a ?b WHERE { ?a dbpedia:-spouse ?b . ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?c owl:sameAs ?c2 . ?c2 pos:long [] . ?a dbpedia:deathPlace ?c . }
- Q5: :
-
SELECT distinct ?a ?c ?c2 WHERE { ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c2 pos:lat ?lat . ?c2 pos:long “-4.256901” . }
- Q6: :
-
SELECT distinct ?a ?b ?c WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) [] . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c dbpedia:name ?d . }
- Q7: :
-
SELECT distinct ?a ?b ?lat ?long WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/-1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:place-OfBirth ?c . ?b dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c2 pos:lat ?lat . ?c2 pos:long ?long . }
Rights and permissions
About this article
Cite this article
Yuan, P., Xie, C., Jin, H. et al. Dynamic and fast processing of queries on large-scale RDF data. Knowl Inf Syst 41, 311–334 (2014). https://doi.org/10.1007/s10115-013-0726-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-013-0726-7