Sean Owen, Director, Data Science @ Cloudera via Quora
Although people use the word in different ways, Hadoop refers to an ecosystem of projects, most of which are not processing systems at all. It contains MapReduce, which is a very batch-oriented data processing paradigm.
Spark is also part of the Hadoop ecosystem, I’d say, although it can be used separately from things we would call Hadoop. Spark is a batch processing system at heart too. Spark Streaming is a stream processing system.
To me a stream processing system:
A batch processing system to me is just the general case, rather than a special type of processing, but I suppose you could say that a batch processing system:
I sometimes hear streaming used as a sort of synonym for real-time . Real-time stuff usually takes the form of needing to respond to an event in milliseconds, as in a synchronous API. This isn’t streaming to me.