Tag: Dremel

BigQuery with Jordan Tigani

http://traffic.libsyn.com/sedaily/BigQuery.mp3

Large-scale data analysis was pioneered by Google with the MapReduce paper. Since then, Google’s approach to analytics has evolved rapidly, marked by papers such as Dataflow and Dremel. Dremel combined column-oriented storage on a distributed file system with a novel way of processing queries. A single Dremel query is distributed across a tree of servers, starting with the root server, splitting into the intermediate servers,

Continue reading…
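The tree-shaped query execution described in the excerpt above can be sketched in a few lines of Python. This is only an illustration, not Dremel's actual implementation: the excerpt is cut off after the intermediate servers, so the leaf level that scans data, the dictionary layout of the tree, and the `execute` function are all assumptions made for this sketch.

```python
# Hypothetical sketch of a tree of servers answering "SELECT COUNT(x), SUM(x)".
# Node layout and function names are assumptions, not Dremel's real API.

def execute(node):
    """Return a (count, total) partial aggregate for the subtree under node."""
    if "values" in node:
        # Assumed leaf server: scans its local chunk of the column.
        values = node["values"]
        return (len(values), sum(values))
    # Root or intermediate server: fan the query out to the children,
    # then merge their partial aggregates on the way back up.
    partials = [execute(child) for child in node["children"]]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return (count, total)


# A root server with two intermediate servers, each with two leaves.
tree = {
    "children": [
        {"children": [{"values": [1, 2]}, {"values": [3]}]},
        {"children": [{"values": [4, 5, 6]}, {"values": []}]},
    ]
}

count, total = execute(tree)
print(count, total)
```

Because each level only forwards small partial aggregates, no single server in this sketch ever has to hold the full data set.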

Dremio with Tomer Shiran

http://traffic.libsyn.com/sedaily/Dremio.mp3

The MapReduce paper was published by Google in 2004. MapReduce is a programming model for large-scale data processing on large clusters of commodity hardware. The MapReduce paper marked the beginning of the “big data” movement. The Hadoop project is an open source implementation of the MapReduce paper. Doug Cutting and Mike Cafarella wrote software that allowed anybody to use MapReduce,

Continue reading…

Columnar Data: Apache Arrow and Parquet with Julien Le Dem and Jacques Nadeau

http://traffic.libsyn.com/sedaily/columnardata_edited_fixed.mp3

Column-oriented data storage allows us to access all of the entries in a database column quickly and efficiently. Columnar storage formats are mostly relevant today for performing large analytics jobs. For example, if you are a bank and you want the sum of all of the financial transactions that took place on your system in the last week, you don’t want

Continue reading…