Databricks Unity Catalog with Zeashan Pappa

Data catalogs are one way to address the tension between wanting to use all the data for business advantage and needing to govern all the data for compliance. Today, Zeashan Pappa, a

Data Mechanics: Data Engineering with Jean-Yves Stephan

Apache Spark is a popular open source analytics engine for large-scale data processing. Applications can be written in Java, Scala, Python, R, and SQL. These applications have flexible

Data Lakehouse with Michael Armbrust

A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical

Spark Geospatial Analytics with Ram Sriharsha

Phones are constantly tracking the location of a user in space. Devices like cars, smart watches, and drones are also picking up high volumes of location data. This location data is also

Spark and Streaming with Matei Zaharia

Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), a working set of data that sits in memory