Tag Cloudera

Competition in the Open Source Ecosystem

From Eric Sammer’s answer via Quora: At Cloudera we regularly work on open source code right alongside our competitors. I tend to joke that the engineers at our competitors are effectively our coworkers. Since the question specifically asks about how one deals with code (rather than the working relationship), I’ll focus on that. Honestly though, that’s probably the less interesting part. In (almost) all ways, our engineers shed their

Continue reading…

Kudu with Todd Lipcon

“If you have an architecture where you’re periodically trying to dump from one system to the other and synchronize, you can simplify your life quite a bit by just putting your data in this storage system called Kudu.”

Continue reading…

Replacing Hadoop with Joe Doliner

“There are a lot more people who have the problem that Hadoop solves than there are people using Hadoop.”

Pachyderm is a containerized data analytics platform that seeks to replace Hadoop.

Continue reading…

Big Data: Fundamental Answers

Fundamental questions as big as data itself loomed at the beginning of Big Data Week. Some answers: How do customers of multiple managed big data companies deal with the heterogeneity? Confluent provides Kafka, Rocana provides ops, Databricks gives you data science, Cloudera and Hortonworks give you everything else. Each company has a proprietary layer meshed with open-source software. Generally, the more proprietary software you are running, the more you will need

Continue reading…

Streaming vs Batch: The Differences

Sean Owen, Director, Data Science @ Cloudera, via Quora: Although people use the word in different ways, Hadoop refers to an ecosystem of projects, most of which are not processing systems at all. It contains MapReduce, which is a very batch-oriented data processing paradigm. Spark is also part of the Hadoop ecosystem, I’d say, although it can be used separately from things we would call Hadoop. Spark is a batch

Continue reading…

Cloudera Chief Technologist Eli Collins Discusses Streaming, Batch, Business, and Open-Source

Podcast audio: http://traffic.libsyn.com/sedaily/eli_cloudera.mp3

Cloudera allows enterprises to leverage their data through its Hadoop platform. Eli Collins is the Chief Technologist at Cloudera. Topics include: changes to Hadoop since Cloudera’s founding; Cloudera’s usage of Spark, Docker, and other open-source technologies; how enterprises use batch and streaming together; Cloudera’s open-source policy; Should Frito Lay open source its chip-making abilities?; how collaboration occurs between big, competing companies; the growth of increasingly

Continue reading…