Large-scale data analysis was pioneered by Google, with the MapReduce paper. Since then, Google’s approach to analytics has evolved rapidly, marked by papers such as Dataflow and Dremel.
Dremel combined a column-oriented, distributed file system with a novel way of processing queries. A single Dremel query is distributed into a tree of servers, starting with the root server, splitting into the intermediate servers, and ending with the leaf servers talking to the file system. Once the data is pulled from the file system into the leaves, the data propagates back to the root server, and is shuffled along the way so that the root server receives a sorted response.
When Google started turning its internal services into customer-facing cloud products, the effort to productize Dremel began, and BigQuery was born. Jordan Tigani is an engineering lead who works on BigQuery, and he joins the show to discuss the evolution of the data warehouse.
Large scale distributed queries still can take a long time–but queries get faster every year. Queries that required a nightly Hadoop job 10 years ago can be viewed in a frequently updated user-facing dashboard. Power users of BigQuery talk about the speed and the query interface as being two of its most valuable differentiating features. As the job of a large scale data analyst becomes less technically intensive, tools like BigQuery will continue to rise in popularity.
We have done some great shows about Google papers like Spanner, Dremel, and Dataflow. To find these old episodes, you can download the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help.
Shout out to today’s featured contributor Shreyans Sheth. Shreyans has worked on the Software Engineering Daily search API, and has also helped us understand open source best practices, which we are still learning. Thanks again Shreyans for your work.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Amazon Redshift powers the analytics of your business–and Intermix.io powers the analytics of your Redshift. Intermix.io gives you the tools you need to analyze your Amazon Redshift performance and improve the toolchain of everyone downstream from your data warehouse. The team at Intermix has seen so many Redshift clusters, they are confident they can solve whatever performance issues you are having. Go to intermix.io/sedaily
to get a free 30-day trial. Intermix collects all your Redshift logs and makes it easy to figure out what’s wrong so you can take action. All in a nice, intuitive dashboard. Go to intermix.io/sedaily
to start your free 30-day trial.
A thank you to our sponsor, Datadog, a cloud monitoring platform bringing full visibility to dynamic infrastructure and applications. Create beautiful dashboards, set powerful, machine learning–based alerts, and collaborate with your team to resolve performance issues. Datadog integrates seamlessly with more than 200 technologies, including Google Cloud Platform, AWS, Docker, PagerDuty, and Slack. With fast installation and setup, plus APIs and open source libraries for custom instrumentation, Datadog makes it easy for teams to monitor every layer of their stack in one place. But don’t take our word for it—start a free trial today & Datadog will send you a free T-shirt! Visit softwareengineeringdaily.com/datadog
to get started.
Dice helps you accelerate your tech career. Whether you’re actively looking for a job or need insights to grow in your role, Dice has the resources you need. Dice’s mobile app is the fastest and easiest way to get ahead. Search thousands of tech jobs – from software engineering to UI/UX to product management. Discover your worth with Dice’s Salary Predictor based on your unique skill set. Uncover new opportunities with Dice’s new career pathing tool which can give you insights about the best types of roles to transition to – and the skills you’ll need to get there. Manage your tech career and download the Dice Careers app on Android or iOS today. So check out Dice and support Software Engineering Daily, go to Dice.com/sedaily
. Thanks to Dice for being a sponsor of Software Engineering Daily.
Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily
, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily
to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.