Join Size Estimation Subject to Filter Conditions

Join Size Estimation Subject to Filter Conditions

David Vengerov, Andre Cavalheiro Menck, Mohamed Zait, Sunil P. Chakkappen

02 September 2015

In this paper, we present a new algorithm for estimating the size of equality join of multiple database tables. The proposed algorithm, Correlated Sampling, constructs a small space synopsis for each table, which can then be used to provide a quick estimate of the join size of this table with other tables subject to dynamically specified predicate filter conditions, possibly specified over multiple columns (attributes) of each table. This algorithm makes a single pass over the data and is thus suitable for streaming scenarios. We compare this algorithm analytically to two other previously known sampling approaches (independent Bernoulli Sampling and End-Biased Sampling) and to a novel sketch-based approach. We also compare these four algorithms experimentally and show that results fully correspond to our analytical predictions based on derived expressions for the estimator variances, with Correlated Sampling giving the best estimates in a large range of situations.


Venue : VLDB 2015

External Link: http://www.vldb.org/pvldb/vol8/p1530-vengerov.pdf