Tag: Batch

Apache Beam with Frances Perry

http://traffic.libsyn.com/sedaily/Apache_Beam__Edited.mp3

Unbounded data streams create difficult challenges for our application architectures. The data never stops coming, and we are forced to assume that we will never know if or when we have seen all of our data. Some streaming systems give us the tools to deal partially with unbounded data streams, but we have to complement those streaming systems with batch processing, in a

Continue reading…

Demystifying Stream Processing with Neha Narkhede

“Systems are giving up correctness for latency, and I’m arguing that stream processing systems have to be designed to allow the user to pick the tradeoffs that the application needs.”

Continue reading…

Stream Processing with Satish Mittal

“We still need to see in the long run how much of community and industry adoption is there. Because at the end of the day, these are the single two most important things which define and determine the success of any platform.”

Continue reading…

Databases: Fundamental Answers

Database Week began with a set of fundamental questions. What is a database? Every interviewee during Database Week has given a different answer to the question of "What is a database?" (SE Daily, @software_daily, August 21, 2015). One definition: “an application component for storing and retrieving data”. All of the different database companies have this functionality. But the similarities end there. RethinkDB pushes data to the application. MemSQL is a faster, proprietary version

Continue reading…

Transactions and Analytics with VoltDB’s Ryan Betts

http://traffic.libsyn.com/sedaily/voltdb_rbetts.mp3

Streaming pipelines and in-memory analytics are difficult to support with old database systems. VoltDB provides streaming analytics with transactions.

Questions: How does VoltDB exemplify Michael Stonebraker’s thesis that one size does not fit all? What is the difference between OLTP and Streaming? How does VoltDB serve the common Zookeeper-Kafka-Storm-Cassandra stack? What trends and requirements among OLTP and OLAP systems are changing most

Continue reading…

Hadoop Ops: Rocana CTO Eric Sammer Interview

http://traffic.libsyn.com/sedaily/rocana_esammer.mp3

Rocana applies big data, advanced analytics, and visualizations to dev ops in order to guide users to the root causes of problems. Eric Sammer is the co-founder and CTO of Rocana. At Cloudera, he served as an Engineering Manager responsible for tools and partner integrations. Within that role, he developed many of Cloudera’s best practices for developing large, distributed, data processing infrastructure. Questions include: Does

Continue reading…

Streaming vs Batch: The Differences

Sean Owen, Director, Data Science @ Cloudera, via Quora:

Although people use the word in different ways, Hadoop refers to an ecosystem of projects, most of which are not processing systems at all. It contains MapReduce, which is a very batch-oriented data processing paradigm. Spark is also part of the Hadoop ecosystem, I’d say, although it can be used separately from things we would call Hadoop. Spark is a batch

Continue reading…

Cloudera Chief Technologist Eli Collins Discusses Streaming, Batch, Business, and Open-Source

http://traffic.libsyn.com/sedaily/eli_cloudera.mp3

Cloudera allows enterprises to leverage their data through its Hadoop platform. Eli Collins is the Chief Technologist at Cloudera. Topics include: changes to Hadoop since Cloudera’s founding; Cloudera’s usage of Spark, Docker, and other open-source technologies; how enterprises use batch and streaming together; Cloudera’s open-source policy; whether Frito Lay should open source its chip-making abilities; how collaboration occurs between big, competing companies; the growth of increasingly

Continue reading…