Stream Processing at Uber with Danny Yuan
“Be aggressive in vision, but conservative in operation.”
Uber is a transportation company that constantly collects a high volume of temporal and spatial data from its users' devices. At any given time, the engineers and data scientists at Uber need to be able to query the system and understand what is going on with drivers and riders.
The unique real-time engineering requirements of Uber lead to an interesting architecture. Danny Yuan joins us today to discuss Uber’s data engineering stack and how the company makes use of its streaming data.
- What are the application requirements for the Uber stream processing marketplace?
- What types of data are you ingesting, and how do you transform it before you can analyze it?
- Why can’t you store the data in a key-value store?
- Have you looked into Alluxio and what are your thoughts on it?
- How do you use Samza and Spark differently?
- Were there any tradeoffs you made when you built your streaming pipeline under a tight deadline?
- How do you know whether a new piece of data infrastructure technology is worthwhile or not?
- Why are there so many stream frameworks out there?