Dremel
BigQuery with Jordan Tigani
Google pioneered large-scale data analysis with the MapReduce paper. Since then, Google’s approach to analytics has evolved rapidly, marked by papers such as Dataflow and
Dremio with Tomer Shiran
The MapReduce paper was published by Google in 2004. MapReduce is a programming model for large-scale data processing on large clusters of commodity hardware. The MapReduce
Columnar Data: Apache Arrow and Parquet with Julien Le Dem and Jacques Nadeau
Column-oriented data storage allows us to access all of the entries in a database column quickly and efficiently. Columnar storage formats are mostly relevant today for performing large