Stream Processing vs. Complex Event Processing

From Srinath Perera’s answer via Quora:

I will try to describe the current status (as of 2015) rather than give a definition. If you are looking for a definition, the best would be What’s the Difference Between ESP and CEP?

As the above picture shows, technically CEP is a subset of Event Stream Processing. However, stream processing engines and CEP engines are quite different, and they come from different backgrounds. The use cases they target, and the issues they choose to handle or not handle, are different.

Stream processing engines let you create a processing graph and inject events into it. Each operator processes events and sends them on to the next operators. In most stream processing engines, such as Storm and S4, users write code to create the operators, wire them up in a graph, and run them. The engine then runs the graph in parallel across many computers.
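To make the processing-graph idea concrete, here is a minimal single-process sketch in Python. It is not the API of Storm, S4, or any other engine: the `Operator` class, `run_graph`, and the sensor events are all hypothetical names made up for illustration, and a real engine would also distribute the operators across machines.

```python
class Operator:
    """One node in the graph: applies fn to each event and forwards the result."""

    def __init__(self, fn):
        self.fn = fn
        self.downstream = []

    def to(self, other):
        """Wire this operator to another one; returns `other` so calls can be chained."""
        self.downstream.append(other)
        return other

    def emit(self, event):
        result = self.fn(event)
        if result is None:  # by convention here, None means "drop the event"
            return
        for op in self.downstream:
            op.emit(result)


def run_graph(source, events):
    """Inject events one by one into the first operator of the graph."""
    for event in events:
        source.emit(event)


def parse_line(line):
    sensor, temp = line.split(",")
    return {"sensor": sensor, "temp": float(temp)}


collected = []

# User-written operators: parse -> filter -> sink
parse = Operator(parse_line)
hot_only = Operator(lambda e: e if e["temp"] > 30.0 else None)
sink = Operator(lambda e: collected.append(e["sensor"]))

parse.to(hot_only).to(sink)
run_graph(parse, ["a,25.0", "b,31.5", "c,40.0"])
```

After the run, `collected` holds the sensors whose readings passed the filter. The point is that the user writes each operator and the wiring by hand, and the engine only takes care of running the graph.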

In contrast, CEP engines let users write queries in a higher-level language, such as a SQL-like query language. They come from stock-market-related use cases, where they have to generate a response within milliseconds. CEP engines also have built-in operators such as time windows and temporal event sequences (see Patterns for Streaming Realtime Analytics).
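As a rough illustration of what a built-in time window does, the sketch below computes a running average over the last N seconds of events. It is written by hand in Python with hypothetical names (`SlidingTimeWindow`, `add`), and it assumes events arrive in timestamp order. A CEP engine would expose the same behavior as a single operator in its query language.

```python
from collections import deque


class SlidingTimeWindow:
    """Keeps the events from the last `length_seconds` and reports their average."""

    def __init__(self, length_seconds):
        self.length = length_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add an event, expire old ones, and return the current window average."""
        self.events.append((timestamp, value))
        self.total += value
        # Expire events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.length:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


window = SlidingTimeWindow(length_seconds=10)
averages = [window.add(t, v) for t, v in [(0, 10.0), (5, 20.0), (12, 30.0)]]
```

By the third event (timestamp 12) the first one (timestamp 0) is more than 10 seconds old and has been expired, so the final average covers only the last two values.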

The following are a few common differences between the two types of engines found in practice (I am generalizing, but they will help you understand).

  1. Stream processing engines tend to be natively distributed and parallel (tens to hundreds of nodes), whereas CEP engines tend to be more centralized (two or a few nodes).
  2. With most stream processing engines, you need to write code. They also force you to implement higher-order operators like windows, temporal patterns, and joins yourself, while CEP engines support them natively. CEP engines often have a SQL-like query language.
  3. Due to their stock-market-based history, CEP engines are tuned for low latency. They often respond within a few milliseconds, and sometimes in under a millisecond. In contrast, most stream processing engines take close to a second to generate results.
  4. Stream processing engines focus on reliable message processing, while CEP engines have often opted to throw away some events when a failure happens and continue. (Yes, this is changing now with new use cases.)
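To give a feel for the second point, here is a hedged sketch of the kind of temporal pattern a stream processing user would have to code by hand: "event A followed by event B within a time limit". The function name `followed_by` and the login event names are hypothetical, and the matching policy (only the most recent A is remembered, and it is consumed by the next B) is one simple choice among several that a CEP engine would let you configure.

```python
def followed_by(events, first, second, within):
    """Return (t_first, t_second) pairs where `second` occurs at most
    `within` seconds after the most recent unmatched `first`.

    `events` is an iterable of (timestamp, kind) tuples in timestamp order.
    """
    matches = []
    last_first = None  # timestamp of the most recent unmatched `first` event
    for timestamp, kind in events:
        if kind == first:
            last_first = timestamp
        elif kind == second and last_first is not None:
            if timestamp - last_first <= within:
                matches.append((last_first, timestamp))
            last_first = None  # the pending `first` is consumed either way
    return matches


events = [
    (0, "login_failed"),
    (2, "login_ok"),      # 2 s after the failure: matches
    (10, "login_failed"),
    (30, "login_ok"),     # 20 s after the failure: too late
]
matches = followed_by(events, "login_failed", "login_ok", within=5)
```

Even this small pattern needs explicit state and expiry logic. In a CEP engine the same intent is typically stated declaratively as a sequence with a time bound.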

CEP engines have been around for a long time. Their history goes back to the 90s (see CEP Market players – end of 2014 – from Paul Vincent). They have been used in a decent number of real-world use cases. However, they were niche and expensive.

In the aftermath of Big Data taking off, people started to look for a streaming analytics solution that is like Hadoop. Storm was created at that time, and it is a Stream processing engine. It mirrored the MapReduce model, where you can write some code and attach it to a processing graph. It stole the limelight and outshone CEP solutions.

Meanwhile, CEP was pretty much excluded from the spotlight. It is worth noting that analysts did not do that. For example, CEP is mentioned in this 2008 Gartner report, and it has been mentioned ever since (https://eventprocessing.wordpres…).

Now another trend, IoT, might bring CEP back into the limelight. This is due to three main reasons.

  1. IoT data are time series data, where the data is autocorrelated. CEP is much better placed to handle them.
  2. Most IoT use cases deal with real life. If you are to act on those insights, you need them very fast. CEP has an advantage in turnaround time.
  3. Most IoT use cases are complex, and they go beyond calculating or aggregating data. Those use cases need support for complex operators like time windows and temporal query patterns.

Having said that, most IoT use cases will have very high event rates, so whatever event technology is used in them needs to be able to scale up. This is something stream processing does better than CEP. These two markets are merging, and it is not yet clear which one will win.

At the same time, the differences are diminishing. IBM InfoSphere, which is a stream processing engine, has had CEP-like operators for a long time. WSO2 CEP can now accept SQL-like queries and runs on top of Apache Storm. SQLstream is a CEP engine that is highly parallel. My belief is that we will end up with a combination of both, and we will all be better off for it.