Tag Kafka

Kafka at NY Times with Boerge Svingen

http://traffic.libsyn.com/sedaily/KafkaatNYT.mp3

The New York Times is a newspaper that evolved into a digital publication. Across its 166-year history, The Times has been known for longform journalistic quality, in addition to its ability to quickly churn out news stories. Some content on the New York Times is old but timeless “evergreen” content. Readers of the New York Times website are not only looking for the

Continue reading…

Mesos, Kubernetes, and Infrastructure of the Future with Dharmesh Kakadia

http://traffic.libsyn.com/sedaily/mesos-and-kubernetes_edited_1.mp3

Mesos and Kubernetes are tools for distributed systems management. Kubernetes is built with an emphasis on running services, whereas Mesos is commonly used for a wider variety of workloads, including data infrastructure like Spark and Kafka. Mesos can also be used as a platform to provide resource management for Kubernetes. Dharmesh Kakadia is the author of Apache Mesos Essentials, and has spent time

Continue reading…

Kafka Event Sourcing with Neha Narkhede

http://traffic.libsyn.com/sedaily/event_sourcing_edited.mp3

When a user of a social network updates her profile, that profile update needs to propagate to several systems that want to know about such an update: search indexes, user databases, caches, and other services. When Neha Narkhede was at LinkedIn, she helped develop Kafka, which was deployed at LinkedIn to help solve this very problem. Using Kafka as an event queue, LinkedIn adopted

Continue reading…
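The propagation problem described in this excerpt can be illustrated with a minimal in-memory sketch. It is not Kafka's API or LinkedIn's implementation; the names `EventLog` and `Consumer` and the event fields `user_id` and `profile` are hypothetical, chosen only to show several downstream systems reading the same append-only log at their own pace.

```python
class EventLog:
    """Append-only list of events (hypothetical stand-in for a Kafka topic)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Events are only ever added at the end; the index is the offset.
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        # Return every event at or after the given offset.
        return self._events[offset:]


class Consumer:
    """A downstream system that tracks its own read position in the log."""

    def __init__(self, name):
        self.name = name
        self.offset = 0
        self.state = {}

    def poll(self, log):
        # Apply each unseen event in order, then advance the offset.
        for event in log.read_from(self.offset):
            self.state[event["user_id"]] = event["profile"]
            self.offset += 1


log = EventLog()
search_index = Consumer("search-index")
cache = Consumer("cache")

log.append({"user_id": 1, "profile": {"title": "Engineer"}})
log.append({"user_id": 1, "profile": {"title": "Staff Engineer"}})

# Each consumer reads independently and converges on the same view.
search_index.poll(log)
cache.poll(log)
```

Because each consumer keeps its own offset, a new downstream system could be added later and replay the log from offset 0 without the producer changing.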

Scaling Email with J.R. Jasperson

http://traffic.libsyn.com/sedaily/scaling_emails_Edited.mp3

“As the scale continues to increase, certain effects of architecture become less and less efficient.”

When you spend money online, you expect a receipt to come in your email. When you register for a new web site, you need to verify your sign-up in your email. These types of emails are called “transactional email” and sending these types of email at scale

Continue reading…

Apache Kafka’s Uses and Target Market

From Nicolae Marasoiu’s answer via Quora: Kafka is a high-performance messaging system which provides an immutable, linearizable, sharded log of messages. Throughput and storage capacity scale linearly with nodes. Kafka can push astonishingly high volume through each node, often saturating disk, network, or both, while keeping a low CPU utilization. You would use Kafka in scenarios of asynchronous communication and processing pipelines, predominantly in distributed systems, cloud & big data,

Continue reading…
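The idea of a sharded, append-only log from this excerpt can be sketched in a few lines. This is a toy in-memory model, not Kafka's client API; the class `PartitionedLog` is hypothetical, and CRC32 is used here only as a convenient deterministic hash (an assumption, not the partitioner Kafka itself uses).

```python
import zlib


class PartitionedLog:
    """Toy sharded log: each partition is an independent append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Records are only appended; the returned offset is the position
        # of the record within its partition.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        # Return an immutable snapshot of records from offset onward.
        return tuple(self.partitions[partition][offset:])


log = PartitionedLog(num_partitions=4)
first = log.append("user-1", "signed up")
second = log.append("user-1", "updated profile")
```

Records that share a key land in the same partition, so their relative order is preserved; ordering across different partitions is not defined in this sketch.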

Demystifying Stream Processing with Neha Narkhede

“Systems are giving up correctness for latency, and I’m arguing that stream processing systems have to be designed to allow the user to pick the tradeoffs that the application needs.”

Continue reading…

Apache Flink with Stephan Ewen

“My bet is that there is going to be a big shift towards streaming technologies in the future.”

Apache Flink is an open-source framework for distributed stream and batch data processing.

Continue reading…

Databases: Fundamental Answers

Database Week began with a set of fundamental questions.

What is a database?

Every interviewee during Database Week has given a different answer to the question of "What is a database?" (SE Daily, @software_daily, August 21, 2015)

One definition: “an application component for storing and retrieving data”. All of the different database companies have this functionality. But similarities end there.

- RethinkDB pushes data to the application
- MemSQL is a faster, proprietary version

Continue reading…

Transactions and Analytics with VoltDB’s Ryan Betts

http://traffic.libsyn.com/sedaily/voltdb_rbetts.mp3

Streaming pipelines and in-memory analytics are difficult to support with old database systems. VoltDB provides streaming analytics with transactions.

Questions

- How does VoltDB exemplify Michael Stonebraker’s thesis that one size does not fit all?
- What is the difference between OLTP and Streaming?
- How does VoltDB serve the common Zookeeper-Kafka-Storm-Cassandra stack?
- What trends and requirements among OLTP and OLAP systems are changing most

Continue reading…

Streaming SQL with PipelineDB CEO Derek Nelson

http://traffic.libsyn.com/sedaily/pipelinedb_derek.mp3

PipelineDB is a streaming SQL database. Derek Nelson is the CEO of PipelineDB.

Questions

- What are continuous views?
- Why is PipelineDB a good fit for the Kafka+Storm+HBase-type architecture?
- How does PipelineDB affect the application tier or the browser tier?
- What are the latency guarantees for how long it takes raw data streams to be converted into the refined queries provided by a continuous view?

Continue reading…

Big Data: Fundamental Answers

Fundamental questions as big as data itself loomed at the beginning of Big Data Week. Some answers: How do customers of multiple managed big data companies deal with the heterogeneity? Confluent provides Kafka, Rocana provides ops, Databricks gives you data science, Cloudera and Hortonworks give you everything else. Each company has a proprietary layer meshed with open-source software. Generally, the more proprietary software you are running, the more you will need

Continue reading…

Apache ZooKeeper with Flavio Junqueira

http://traffic.libsyn.com/sedaily/fpj_zookeeper.mp3

Apache ZooKeeper enables highly reliable distributed coordination. Flavio Junqueira is a committer and PMC member of Apache ZooKeeper, and former VP of ZooKeeper.

Questions include:

- Why is master election so important in Hadoop?
- How does a new user begin working with ZooKeeper?
- How do nodes “watch” each other?
- Should ZooKeeper be used as a message queue or notification system?
- What is ZooKeeper’s place in a data center

Continue reading…

Apache Kafka with Guozhang Wang

http://traffic.libsyn.com/sedaily/guozhang_kafka.mp3

Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log. Kafka serves as the central repository for data streams in a distributed system. Guozhang Wang is an engineer at Confluent, which offers a stream data platform built using Kafka.

Questions include:

- What is a central repository for data streams?
- How does Kafka improve transportation between systems?
- How does Kafka allow for richer

Continue reading…

Cloudera Chief Technologist Eli Collins Discusses Streaming, Batch, Business, and Open-Source

http://traffic.libsyn.com/sedaily/eli_cloudera.mp3

Cloudera allows enterprises to leverage their data through its Hadoop platform. Eli Collins is the Chief Technologist at Cloudera.

Topics include:

- changes to Hadoop since Cloudera’s founding
- Cloudera’s usage of Spark, Docker, and other open-source technologies
- how enterprises use batch and streaming together
- Cloudera’s open-source policy
- Should Frito Lay open source its chip-making abilities?
- how collaboration occurs between big, competing companies
- the growth of increasingly

Continue reading…