Friday, October 29, 2021
Sort columns are great for file skipping when your predicates are selective on a single dimension, but what do you do when you need to filter on multiple dimensions? Adding more hierarchical sort columns has diminishing returns, because each additional column only orders rows that tie on the columns before it. Enter Z-Ordering. Instead of sorting our data by a single column, we order it by a value constructed from a combination of multiple columns. This combined z-value is constructed so that rows with similar z-values tend to have similar values in each of the contributing columns. This lets us write data files that are selective over any of the columns from which the z-value is constructed. In this presentation we'll go over the basics of the math behind the computation and how we are implementing it in Apache Iceberg.
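To make the idea of a combined z-value concrete, here is a minimal sketch of bit interleaving, the usual way to build one. It is an illustration only, not the Apache Iceberg implementation: the function name `z_value` and the `bits` parameter are hypothetical, and the sketch assumes every column holds small non-negative integers. Real columns (signed numbers, floats, strings) would first need to be mapped to byte representations that preserve their ordering.

```python
from typing import Sequence


def z_value(values: Sequence[int], bits: int = 8) -> int:
    """Interleave the low `bits` bits of each value into one integer.

    Hypothetical helper for illustration; assumes non-negative integers.
    Bit i of column j lands at position i * n + j of the result, so the
    most significant bits of every column end up near the top of the
    z-value.
    """
    n = len(values)
    z = 0
    for i in range(bits):
        for j, v in enumerate(values):
            z |= ((v >> i) & 1) << (i * n + j)
    return z


# Sort a 4 x 4 grid of (x, y) rows by their z-value.
points = [(x, y) for x in range(4) for y in range(4)]
by_z = sorted(points, key=z_value)
```

In this sketch the first four rows of `by_z` are (0, 0), (1, 0), (0, 1) and (1, 1). A data file holding just those rows would have a narrow min/max range on both `x` and `y`, which is what allows it to be skipped by a predicate on either column.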