Tag: Data Science

Data Science at Spotify with Boxun Zhang

“I normally try to sit together or very close to a product team or engineering team. And by doing so, I get very close to the source of all kinds of challenging problems.”

Data Engineering with David Drummond and Austin Ouyang

“We want people to be able to pick up whatever tool it is and really push themselves to get something done with it in a short amount of time, because that’s ultimately what they need to do as a data engineer in the industry.”

Machine Learning and Technical Debt with D. Sculley

“Changing anything changes everything.”

Technical debt, referring to the compounding cost of changes to software architecture, can be especially challenging in machine learning systems.

Treehouse with Ryan Carson

“[As adults] we get overly serious, and a lot of the fun goes out of learning, so I think what we’re trying to do is make Treehouse delightful.”

Galvanize Data Science with Jonathan Dinu and Ryan Orban

“There’s not enough data scientists out there, and every company wants them to do everything. So, you really have to focus on ‘How can I be most impactful with the limited time and resources I have?’ ”

Netflix Genie with Tom Gianos

“Sometimes there’s a misconception that Genie is a job scheduling platform… Genie really represents our abstraction layer, from what our computational resources are, to our end user jobs.”

Genie is an open-source tool that provides job and resource management for the Hadoop ecosystem in the cloud.

Bridging Data Science and Engineering with Greg Lamp

Current infrastructure makes it difficult for data scientists to share analytical models with the software engineers who need to integrate them. Yhat is an enterprise software company tackling the challenge of how data science gets done. Their products enable companies and users to easily deploy data science environments and translate analytical models into production code.

Kaggle with Ben Hamner

Data science competitions are an effective way to crowdsource the best solutions for challenging datasets. Kaggle is a platform for data scientists to collaborate and compete on machine learning problems with the opportunity to win money from the competitions’ sponsors.

Applied Data Science with Edwin Chen

“A lot of data science teams – if you ask them what their ten most important questions are… a lot of people can’t even come up with those.”

Many companies find themselves drowning in data. The quantity of data matters far less than the right questions in the pursuit of actionable insights.

Data Science Overview with Yad Faeq

Data science is a broad topic with numerous subfields, such as data engineering and machine learning. Yad Faeq returns to the podcast to discuss data science at a high level and to rescue Software Engineering Daily from the threat of the hype vortex.

Teaching Data Science with Vik Paruchuri

There is a need for more data scientists to make sense of the vast amounts of data we produce and store. Dataquest is an in-browser platform for learning data science that is tackling this problem.

Vik Paruchuri is the founder of Dataquest. He was previously a machine learning engineer at edX and, before that, a U.S. diplomat.

Data Science at Pivotal with Sarah Aerni

Data science is saving and improving lives by leveraging sensor data and machine learning. Pivotal makes software platforms and database products to enable enterprises to make use of their data.

Sarah Aerni is principal data scientist at Pivotal.

Sysadmin vs Scientist

Dima Korolev, Engineer and Data Scientist, via Quora

Here are the two approaches to data science, which I call the Sysadmin approach and the Scientist approach.

Sysadmin approach: Use the knowledge obtained by reading Apache logs, nginx logs, systemd logs, cron logs, etc. A good sysadmin would open the log file, press page down and watch it, stopping and scrolling back on anomalies. A great sysadmin would make a couple iterations of…

Apache Spark Creator Matei Zaharia Interview

Apache Spark is a fast and general engine for big data processing. Matei Zaharia created Spark and is a co-founder of Databricks, a company using Spark to power data science.

Questions: What was the motivation behind creating Spark? How much faster is a Spark job than a Hadoop job? What is the relationship between streaming and batch processing? Is Spark’s core advantage over Storm…