Tag Dremio

Dremio with Tomer Shiran

http://traffic.libsyn.com/sedaily/Dremio.mp3

The MapReduce paper was published by Google in 2004. MapReduce is a programming model that describes how to do large-scale data processing on large clusters of commodity hardware. The MapReduce paper marked the beginning of the “big data” movement. The Hadoop project is an open source implementation of the MapReduce paper. Doug Cutting and Mike Cafarella wrote software that allowed anybody to use MapReduce,

Continue reading…

Columnar Data: Apache Arrow and Parquet with Julien Le Dem and Jacques Nadeau

http://traffic.libsyn.com/sedaily/columnardata_edited_fixed.mp3

Column-oriented data storage allows us to access all of the entries in a database column quickly and efficiently. Columnar storage formats are mostly relevant today for performing large analytics jobs. For example, if you are a bank, and you want to get the sum of all of the financial transactions that took place on your system in the last week, you don’t want

Continue reading…