Tag: Kafka

Kafka Design Patterns with Gwen Shapira

Kafka is at the center of modern streaming systems. Kafka serves as a database, a pub/sub system, a buffer, and a data recovery tool. It’s an extremely flexible tool, and that flexibility has led to its use as a platform for a wide variety of data-intensive applications. Today’s guest is Gwen Shapira, a product manager at Confluent. Confluent is a company that

Continue reading…

Streaming Architecture with Tugdual Grall

At a big enough scale, every software product produces lots of data. Whether you are building an advertising technology company, a social network, or a system for IoT devices, you have thousands of events coming in at a fast pace that you want to aggregate, study and act upon. For the last decade, engineers have been learning to store and process these vast

Continue reading…

Machine Learning Deployments with Kinnary Jangla

Pinterest is a visual feed of ideas, products, clothing, and recipes. Millions of users browse Pinterest to find images and text that are tailored to their interests. Like most companies, Pinterest started with a large monolithic application that served all requests. As Pinterest’s engineering resources expanded, some of the architecture was broken up into microservices and Dockerized, which made the system easier to

Continue reading…

Kafka at NY Times with Boerge Svingen

The New York Times is a newspaper that evolved into a digital publication. Across its 166-year history, The Times has been known for long-form journalistic quality, in addition to its ability to quickly churn out news stories. Some New York Times content is old but timeless “evergreen” content. Readers of the New York Times website are not only looking for the

Continue reading…

Kafka in the Cloud with Neha Narkhede

Apache Kafka is an open-source distributed streaming platform. Kafka was originally developed at LinkedIn, and the creators of the project eventually left LinkedIn and started Confluent, a company that is building a streaming platform based on Kafka. Kafka is very popular, but it is not easy to deploy and operate. That is why Confluent has built a Kafka-as-a-service product, so that managing Kafka is

Continue reading…

Managed Kafka with Tom Crayford

Kafka is a distributed log: producers publish messages to it, and consumers read those messages. We’ve done many shows about Kafka as a key building block for distributed systems, but we often leave out the complexities of setting up and monitoring Kafka. A Kafka deployment can be complex to manage. Tom Crayford is an engineer at

Continue reading…

PANCAKE STACK Data Engineering with Chris Fregly

Data engineering is the software engineering that enables data scientists to work effectively. In today’s episode, we explore the different sides of data engineering: the data science algorithms that need to be processed and the implementation of software architectures that enable those algorithms to run smoothly. The PANCAKE STACK is a 12-letter acronym that Chris Fregly gave to a collection of data engineering technologies

Continue reading…

Kafka Event Sourcing with Neha Narkhede

When a user of a social network updates her profile, that update needs to propagate to several systems that want to know about it: search indexes, user databases, caches, and other services. When Neha Narkhede was at LinkedIn, she helped develop Kafka, which was deployed at LinkedIn to help solve this very problem. Using Kafka as an event queue, LinkedIn adopted

Continue reading…
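The propagation problem described above can be sketched with a toy model: one append-only log of profile-update events, and several downstream systems that each read it at their own pace and build their own view. This is a minimal in-memory Python sketch under stated assumptions; `EventLog`, `Consumer`, and the event fields `user_id` and `profile` are hypothetical names, not Kafka's API or LinkedIn's implementation. A real deployment would use a Kafka topic and consumers rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventLog:
    """Append-only list of events; a hypothetical stand-in for one Kafka topic partition."""
    events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1


class Consumer:
    """A downstream system (search index, cache, ...) that tracks its own read offset."""

    def __init__(self, name: str, log: EventLog) -> None:
        self.name = name
        self.log = log
        self.offset = 0                   # next offset this consumer will read
        self.view: Dict[int, dict] = {}   # latest profile seen per user_id

    def poll(self) -> None:
        """Apply every event this consumer has not seen yet, in log order."""
        while self.offset < len(self.log.events):
            event = self.log.events[self.offset]
            self.view[event["user_id"]] = event["profile"]
            self.offset += 1


log = EventLog()
search_index = Consumer("search-index", log)
cache = Consumer("cache", log)

log.append({"user_id": 42, "profile": {"title": "Engineer"}})
search_index.poll()
cache.poll()

log.append({"user_id": 42, "profile": {"title": "Senior Engineer"}})
search_index.poll()  # the cache has not polled again yet, so it lags behind

print(search_index.view[42])  # {'title': 'Senior Engineer'}
print(cache.view[42])         # {'title': 'Engineer'}
```

Because each consumer keeps its own offset, a slow consumer such as the cache simply lags and later catches up by reading from where it stopped; the producer never needs to know who is reading.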

Kafka Streams with Jay Kreps

Kafka Streams is a library for building streaming applications that transform input Kafka topics into output Kafka topics. With numerous streaming frameworks already out there, why do we need yet another one? To quote today’s guest, Jay Kreps: “the gap we see Kafka Streams filling is less the analytics-focused domain these frameworks focus on and more building core applications

Continue reading…
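The input-topic-to-output-topic pattern can be illustrated with word count, a common first example for stream processors. The sketch below is a toy in Python, not the Kafka Streams API (which is a Java library): plain lists stand in for topics, and a `Counter` stands in for the local state that lets counts accumulate across records. The function name `word_count` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def word_count(input_topic: Iterable[Tuple[str, str]],
               counts: Counter) -> Iterator[Tuple[str, int]]:
    """Turn (key, line) input records into (word, running count) output records.

    `counts` stands in for a local state store: it persists across calls,
    so counts keep accumulating as more input arrives.
    """
    for _key, line in input_topic:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


state = Counter()
output_topic = list(word_count([("k1", "Kafka streams"), ("k2", "kafka topics")], state))
print(output_topic)
# [('kafka', 1), ('streams', 1), ('kafka', 2), ('topics', 1)]

# A later batch continues from the stored counts rather than starting over.
print(list(word_count([("k3", "Kafka")], state)))
# [('kafka', 3)]
```

Emitting a new output record for every update, rather than a single final total, mirrors the idea that the output is itself a stream that other applications can consume.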

Scaling Email with J.R. Jasperson

“As the scale continues to increase, certain effects of architecture become less and less efficient.” When you spend money online, you expect a receipt to arrive in your email. When you register for a new website, you need to verify your sign-up through email. These types of emails are called “transactional email,” and sending them at scale

Continue reading…

Apache Kafka’s Uses and Target Market

From Nicolae Marasoiu’s answer via Quora: Kafka is a high-performance messaging system that provides an immutable, linearizable, sharded log of messages. Throughput and storage capacity scale linearly with nodes. Kafka can push astonishingly high volume through each node, often saturating disk, network, or both, while keeping CPU utilization low. You would use Kafka in scenarios of asynchronous communication and processing pipelines, predominantly in distributed systems, cloud & big data,

Continue reading…
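The “sharded log” in the answer above can be modeled in a few lines: records are hashed by key to a partition, each partition is an append-only list, and readers address records by (partition, offset). This is a toy Python sketch; `PartitionedLog` is a hypothetical class, and it hashes keys with CRC32 for simplicity, whereas Kafka's default partitioner hashes key bytes with murmur2. It leaves out replication, retention, and everything else that makes Kafka distributed.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


class PartitionedLog:
    """Toy sharded, append-only log with per-partition offsets (hypothetical)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. CRC32 is used here for simplicity;
        # Kafka's default partitioner uses a murmur2 hash instead.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return records from `offset` onward; reading never removes them."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
for i in range(3):
    log.append("user-42", f"event-{i}")
log.append("user-7", "signup")

p = log.partition_for("user-42")
print([value for key, value in log.read(p, 0) if key == "user-42"])
# ['event-0', 'event-1', 'event-2']
```

Because every record for a given key lands in the same partition, per-key order is preserved even though the log as a whole is split across partitions.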

Demystifying Stream Processing with Neha Narkhede

“Systems are giving up correctness for latency, and I’m arguing that stream processing systems have to be designed to allow the user to pick the tradeoffs that the application needs.”

Continue reading…

Apache Flink with Stephan Ewen

“My bet is that there is going to be a big shift towards streaming technologies in the future.”

Apache Flink is an open-source framework for distributed stream and batch data processing.

Continue reading…

Databases: Fundamental Answers

Database Week began with a set of fundamental questions. What is a database? As SE Daily (@software_daily) tweeted on August 21, 2015: “Every interviewee during Database Week has given a different answer to the question of ‘What is a database?’” One definition: “an application component for storing and retrieving data.” The databases from all of these different companies have this functionality, but the similarities end there. RethinkDB pushes data to the application. MemSQL is a faster, proprietary version

Continue reading…

Transactions and Analytics with VoltDB’s Ryan Betts

Streaming pipelines and in-memory analytics are difficult to support with older database systems. VoltDB provides streaming analytics with transactions.

Questions:
How does VoltDB exemplify Michael Stonebraker’s thesis that one size does not fit all?
What is the difference between OLTP and streaming?
How does VoltDB serve the common ZooKeeper-Kafka-Storm-Cassandra stack?
What trends and requirements among OLTP and OLAP systems are changing most

Continue reading…

Streaming SQL with PipelineDB CEO Derek Nelson

PipelineDB is a streaming SQL database. Derek Nelson is the CEO of PipelineDB.

Questions:
What are continuous views?
Why is PipelineDB a good fit for the Kafka+Storm+HBase-type architecture?
How does PipelineDB affect the application tier or the browser tier?
What are the latency guarantees for how long it takes raw data streams to be converted into the refined queries provided by a continuous view?

Continue reading…

Big Data: Fundamental Answers

Fundamental questions as big as data itself loomed at the beginning of Big Data Week. Some answers: How do customers of multiple managed big data companies deal with the heterogeneity? Confluent provides Kafka, Rocana provides ops, Databricks gives you data science, Cloudera and Hortonworks give you everything else. Each company has a proprietary layer meshed with open-source software. Generally, the more proprietary software you are running, the more you will need

Continue reading…

Apache ZooKeeper with Flavio Junqueira

Apache ZooKeeper enables highly reliable distributed coordination. Flavio Junqueira is a committer and PMC member of Apache ZooKeeper, and a former VP of ZooKeeper.

Questions include:
Why is master election so important in Hadoop?
How does a new user begin working with ZooKeeper?
How do nodes “watch” each other?
Should ZooKeeper be used as a message queue or notification system?
What is ZooKeeper’s place in a data center

Continue reading…

Apache Kafka with Guozhang Wang

Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log. Kafka serves as the central repository for data streams in a distributed system. Guozhang Wang is an engineer at Confluent, which offers a stream data platform built using Kafka.

Questions include:
What is a central repository for data streams?
How does Kafka improve transportation between systems?
How does Kafka allow for richer

Continue reading…

Cloudera Chief Technologist Eli Collins Discusses Streaming, Batch, Business, and Open-Source

Cloudera allows enterprises to leverage their data through its Hadoop platform. Eli Collins is the Chief Technologist at Cloudera.

Topics include:
changes to Hadoop since Cloudera’s founding
Cloudera’s usage of Spark, Docker, and other open-source technologies
how enterprises use batch and streaming together
Cloudera’s open-source policy
Should Frito Lay open source its chip-making abilities?
how collaboration occurs between big, competing companies
the growth of increasingly

Continue reading…