AQPrius: Offline Approximate Query Processing Enhanced by Error Assessment using Bootstrap Sampling
Feng Yu, Sabin Maharjan, Lucy Kerns, Xiangjia Min, Abdu Arslanyilmaz, Michelle Zhu
Pages - 30 - 47     |    Revised - 31-08-2024     |    Published - 01-10-2024
Volume - 18   Issue - 3    |    Publication Date - October 2024
KEYWORDS
Approximate Query Processing, Bootstrap Sampling, Big Data.
ABSTRACT
In this work, we present AQPrius, an offline approximate query processing (AQP) engine that efficiently answers complex analytic queries on large datasets. Unlike existing systems that employ online AQP schemes, AQPrius employs an offline AQP scheme, which has two advantages: (1) it does not require high-end hardware or expensive auxiliary data structures such as indices or hash tables; (2) the synopses it collects are reusable for future queries on the same database, which can save significant computing resources. However, error assessment for offline AQP systems remains a challenging problem. The contributions of this research are fourfold. First, AQPrius is an offline AQP engine that can quickly answer common analytic queries that include selection conditions, join conditions, and aggregate functions, speeding up complex query processing on big data. Second, AQPrius enables error assessment using a non-parametric statistical method, namely bootstrap sampling, which provides the standard error of a query estimate. Third, using the bootstrap standard error, we extend the traditional offline AQP system from a single-point query estimate to a range estimate, that is, a bounded answer presented as a confidence interval (CI). Finally, the system is developed in the Rust programming language, which can prevent many security issues and potential vulnerabilities. We evaluate AQPrius using the well-known TPC-H benchmark. The experimental results show that AQPrius can rapidly generate accurate bounded query answers for various test queries with selection and join conditions.
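The error-assessment idea in the abstract can be illustrated with a minimal Rust sketch. It is not the AQPrius implementation: the scale-up SUM estimator, the function names (`estimate_sum`, `bootstrap_se`, `confidence_interval`), the small xorshift generator, and the normal-approximation multiplier 1.96 are all assumptions chosen to keep the example self-contained. The sketch treats a stored synopsis as a uniform random sample of one column, resamples it with replacement to obtain a bootstrap standard error, and turns the single-point estimate into a 95% confidence interval.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// Assumption: a real system would use a vetted RNG. The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..n (slight modulo bias, acceptable for an illustration).
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical scale-up estimator: SUM over a table with `table_rows` rows,
/// estimated from a uniform random sample of one column.
fn estimate_sum(sample: &[f64], table_rows: usize) -> f64 {
    let mean = sample.iter().sum::<f64>() / sample.len() as f64;
    mean * table_rows as f64
}

/// Bootstrap standard error: resample the synopsis with replacement
/// `replicates` times (at least 2) and take the standard deviation
/// of the re-computed estimates.
fn bootstrap_se(
    sample: &[f64],
    table_rows: usize,
    replicates: usize,
    rng: &mut XorShift64,
) -> f64 {
    let n = sample.len();
    let mut estimates = Vec::with_capacity(replicates);
    let mut resample = vec![0.0; n];
    for _ in 0..replicates {
        for slot in resample.iter_mut() {
            *slot = sample[rng.next_index(n)];
        }
        estimates.push(estimate_sum(&resample, table_rows));
    }
    let mean = estimates.iter().sum::<f64>() / replicates as f64;
    let var = estimates.iter().map(|e| (e - mean).powi(2)).sum::<f64>()
        / (replicates as f64 - 1.0);
    var.sqrt()
}

/// 95% interval under a normal approximation (an assumption of this sketch).
fn confidence_interval(estimate: f64, se: f64) -> (f64, f64) {
    const Z_95: f64 = 1.96;
    (estimate - Z_95 * se, estimate + Z_95 * se)
}

fn main() {
    // Toy synopsis: 100 sampled values standing in for a 10,000-row table.
    let sample: Vec<f64> = (1..=100).map(|v| v as f64).collect();
    let table_rows = 10_000;
    let mut rng = XorShift64(0x9E3779B97F4A7C15);

    let est = estimate_sum(&sample, table_rows);
    let se = bootstrap_se(&sample, table_rows, 1_000, &mut rng);
    let (lo, hi) = confidence_interval(est, se);
    println!("SUM estimate = {:.0}, SE = {:.0}, 95% CI = [{:.0}, {:.0}]", est, se, lo, hi);
}
```

Because the resampling touches only the stored synopsis, the original table is never rescanned, which is consistent with the reusability advantage the abstract attributes to offline AQP.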
Dr. Feng Yu
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
fyu@ysu.edu
Mr. Sabin Maharjan
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Lucy Kerns
Statistics and Mathematics, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Xiangjia Min
Bioinformatics and Plant Biology, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Abdu Arslanyilmaz
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Michelle Zhu
School of Computing, Montclair State University, Montclair, NJ 07043 - United States of America

