Query-Oriented Answer Imputation for Aggregate Queries

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2019)

Abstract

Data imputation is a well-known technique for repairing missing data values, but it can incur a prohibitive cost when applied to large data sets. Query-driven imputation offers a better alternative, as it allows for fixing only the data that is relevant to a query. We adopt a rule-based query rewriting technique for imputing the answers of analytic queries that are missing or incorrect due to data incompleteness. We present a novel query rewriting mechanism that is guided by partition patterns, which are compact representations of complete and missing data partitions. Our solution strives to infer the largest possible set of missing answers while improving the precision of incorrect ones.
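
As a rough illustration of the idea, and not the paper's actual formalism, the sketch below assumes that a partition pattern is a tuple over the partitioning attributes in which `*` acts as a wildcard, and that an imputation rule simply names a donor partition whose aggregate stands in for a missing one. The names `covers`, `status`, `answer_sum`, and `donor_of`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

WILDCARD = "*"
Partition = Tuple[str, ...]  # assumed layout: (week, floor)


def covers(pattern: Partition, partition: Partition) -> bool:
    """True if each pattern position is a wildcard or equals the partition value."""
    return len(pattern) == len(partition) and all(
        p == WILDCARD or p == v for p, v in zip(pattern, partition)
    )


def status(partition: Partition,
           complete: List[Partition],
           missing: List[Partition]) -> str:
    """Classify a partition using the complete and missing pattern sets."""
    if any(covers(p, partition) for p in complete):
        return "complete"
    if any(covers(p, partition) for p in missing):
        return "missing"
    return "unknown"


def answer_sum(partition: Partition,
               data: Dict[Partition, List[float]],
               complete: List[Partition],
               missing: List[Partition],
               donor_of: Dict[Partition, Partition]) -> Optional[float]:
    """SUM for one partition; impute from a complete donor when it is missing."""
    state = status(partition, complete, missing)
    if state == "complete":
        return sum(data.get(partition, []))
    if state == "missing":
        # Hypothetical rule: reuse the aggregate of a donor partition,
        # but only when the donor itself is known to be complete.
        donor = donor_of.get(partition)
        if donor is not None and status(donor, complete, missing) == "complete":
            return sum(data.get(donor, []))
    return None  # no trustworthy answer can be derived


# Toy data: measurements per (week, floor) partition.
data = {("w1", "f1"): [10.0, 12.0], ("w2", "f1"): [11.0, 9.0], ("w1", "f2"): [5.0]}
complete = [("w1", WILDCARD)]   # every floor of week w1 is complete
missing = [("w2", "f2")]        # week w2, floor f2 is known to be missing
donor_of = {("w2", "f2"): ("w1", "f2")}

direct = answer_sum(("w1", "f1"), data, complete, missing, donor_of)
imputed = answer_sum(("w2", "f2"), data, complete, missing, donor_of)
unknown = answer_sum(("w2", "f1"), data, complete, missing, donor_of)
```

Only the partitions a query touches are inspected, which is the point of the query-driven approach: the partition `("w2", "f1")` holds data but is not declared complete, so the sketch declines to answer for it rather than return a possibly incorrect sum.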

Notes

  1. We omit attribute names when they’re not necessary for understanding.

Acknowledgement

This work has partially been supported by the EBITA collaborative research project between the Fraunhofer Institute and Sorbonne Université.

Author information

Corresponding author

Correspondence to Bernd Amann.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hannou, FZ., Amann, B., Baazizi, MA. (2019). Query-Oriented Answer Imputation for Aggregate Queries. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_19

  • DOI: https://doi.org/10.1007/978-3-030-28730-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28729-0

  • Online ISBN: 978-3-030-28730-6

  • eBook Packages: Computer Science, Computer Science (R0)
