Stream processing

The stream processing component consists of three main sub-components:

  • The Broker: collects and holds the events or data streams coming from the data collection agents
  • The Processing Engine: transforms, correlates, and aggregates the data, and performs other necessary operations
  • The Distributed Cache: maintains common datasets across all distributed components of the Processing Engine
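To make the division of responsibilities concrete, here is a minimal, hypothetical in-memory sketch of the three sub-components and how data flows between them. The class names and the event shape are illustrative stand-ins, not the API of any real broker or cache product:

```python
from collections import deque

class Broker:
    """Stand-in for a message broker: holds events until they are consumed."""
    def __init__(self):
        self._queue = deque()
    def publish(self, event):
        self._queue.append(event)
    def poll(self):
        return self._queue.popleft() if self._queue else None

class DistributedCache:
    """Stand-in for a distributed cache: common state shared by all engine workers."""
    def __init__(self):
        self._store = {}
    def incr(self, key, amount=1):
        self._store[key] = self._store.get(key, 0) + amount
        return self._store[key]
    def get(self, key):
        return self._store.get(key, 0)

class ProcessingEngine:
    """Consumes events from the broker, transforms and aggregates them, updates the cache."""
    def __init__(self, broker, cache):
        self.broker, self.cache = broker, cache
    def run_once(self):
        event = self.broker.poll()
        if event is not None:
            # Transform: normalize the key, then aggregate a running count per user.
            self.cache.incr(event["user"].lower())
        return event

broker, cache = Broker(), DistributedCache()
engine = ProcessingEngine(broker, cache)
for user in ["Alice", "bob", "ALICE"]:
    broker.publish({"user": user})
while engine.run_once() is not None:
    pass
print(cache.get("alice"))  # → 2
```

In a real deployment, each class would be replaced by a distributed system (for instance, a broker cluster, a fleet of engine workers, and an external cache), but the data flow stays the same: agents publish to the broker, the engine polls and processes, and shared state lives in the cache.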

These sub-components of the stream processing component are depicted in more detail in the diagram that follows:

There are a few key attributes that the stream processing component should provide:

  • Distributed components, offering resilience to failures
  • Scalability, to cater for the growing needs of an application or sudden surges in traffic
  • Low latency, to meet the overall SLAs expected from such applications
  • Easy operationalization, so that evolving use cases can be supported
  • Built for failure: the system should recover from inevitable failures without any event loss, and should be able to reprocess from the point at which it failed
  • Easy integration with off-heap/distributed caches and data stores
  • A wide variety of operations, extensions, and functions to meet the business requirements of the use case
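The "built for failure" attribute above is usually achieved by checkpointing consumed offsets against a replayable log, so a restarted worker resumes from its last committed position instead of losing events. The sketch below is a hypothetical illustration of that pattern; in practice the checkpoint would live in durable external storage rather than an object attribute:

```python
class ReplayableLog:
    """Stand-in for a broker that retains events, so they can be re-read after a crash."""
    def __init__(self, events):
        self.events = list(events)
    def read_from(self, offset):
        # Events are not deleted on read; any offset can be replayed.
        return list(enumerate(self.events[offset:], start=offset))

class CheckpointingConsumer:
    def __init__(self, log):
        self.log = log
        self.committed_offset = 0   # durable checkpoint (external storage in reality)
        self.processed = []
    def run(self, fail_at=None):
        for offset, event in self.log.read_from(self.committed_offset):
            if offset == fail_at:
                raise RuntimeError("simulated crash before commit")
            self.processed.append(event)
            # Commit only after successful processing: at-least-once delivery.
            self.committed_offset = offset + 1

log = ReplayableLog(["e0", "e1", "e2", "e3"])
consumer = CheckpointingConsumer(log)
try:
    consumer.run(fail_at=2)      # crash while handling e2
except RuntimeError:
    pass
consumer.run()                   # restart: resumes at offset 2, no events lost
print(consumer.processed)        # → ['e0', 'e1', 'e2', 'e3']
```

Because the offset is committed only after an event is fully processed, a crash can cause an event to be handled more than once but never skipped; frameworks that need exactly-once semantics layer deduplication or transactional commits on top of this basic scheme.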

These aspects should be considered when identifying and selecting a stream processing application/framework for a real-time use case implementation.