Tag: Data Engineering

Data Science at Spotify with Boxun Zhang

“I normally try to sit together or very close to a product team or engineering team. And by doing so, I get very close to the source of all kinds of challenging problems.”

Data Engineering with David Drummond and Austin Ouyang

“We want people to be able to pick up whatever tool it is and really push themselves to get something done with it in a short amount of time, because that’s ultimately what they need to do as a data engineer in the industry.”

Machine Learning and Technical Debt with D. Sculley

“Changing anything changes everything.”

Technical debt, the compounding cost of changes to software architecture, can be especially challenging in machine learning systems.

Galvanize Data Science with Jonathan Dinu and Ryan Orban

“There’s not enough data scientists out there, and every company wants them to do everything. So, you really have to focus on ‘How can I be most impactful with the limited time and resources I have?’ ”

Netflix Genie with Tom Gianos

“Sometimes there’s a misconception that Genie is a job scheduling platform… Genie really represents our extraction layer, from what our computational resources are, to our end user jobs.”

Genie is an open-source tool that provides job and resource management for the Hadoop ecosystem in the cloud.

Bridging Data Science and Engineering with Greg Lamp

Current infrastructure makes it difficult for data scientists to share analytical models with the software engineers who need to integrate them. Yhat is an enterprise software company tackling the challenge of how data science gets done. Their products enable companies and users to easily deploy data science environments and translate analytical models into production code.

Applied Data Science with Edwin Chen

“A lot of data science teams – if you ask them what their ten most important questions are… a lot of people can’t even come up with those.”

Many companies find themselves drowning in data. In the pursuit of actionable insights, the quantity of data matters far less than asking the right questions.

Data Science Overview with Yad Faeq

Data science is a broad topic with numerous subfields such as data engineering and machine learning. Yad Faeq returns to the podcast to discuss data science at a high level, and rescue Software Engineering Daily from the threat of the hype vortex.

Teaching Data Science with Vik Paruchuri

There is a need for more data scientists to make sense of the vast amounts of data we produce and store. Dataquest is an in-browser platform for learning data science that is tackling this problem.

Vik Paruchuri is the founder of Dataquest. He was previously a machine learning engineer at edX and, before that, a U.S. diplomat.

Apache Kafka with Guozhang Wang

http://traffic.libsyn.com/sedaily/guozhang_kafka.mp3

Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log. Kafka serves as the central repository for data streams in a distributed system. Guozhang Wang is an engineer at Confluent, which offers a stream data platform built using Kafka. Questions include: What is a central repository for data streams? How does Kafka improve transportation between systems? How does Kafka allow for richer
