Practical Real-time Data Processing and Analytics
Shilpi Saxena, Saurabh Gupta
Updated: 2021-07-08 10:23:51
Cover Page
Title Page
Copyright
Practical Real-Time Data Processing and Analytics
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Introducing Real-Time Analytics
What is big data?
Big data infrastructure
Real-time analytics – the myth and the reality
Near real-time solution – an architecture that works
NRT – The Storm solution
NRT – The Spark solution
Lambda architecture – analytics possibilities
IOT – thoughts and possibilities
Edge analytics
Cloud – considerations for NRT and IOT
Summary
Real Time Applications – The Basic Ingredients
The NRT system and its building blocks
Data collection
Stream processing
Analytical layer – serve it to the end user
NRT – high-level system view
NRT – technology view
Event producer
Collection
Broker
Transformation and processing
Storage
Summary
Understanding and Tailing Data Streams
Understanding data streams
Setting up infrastructure for data ingestion
Apache Kafka
Apache NiFi
Logstash
Fluentd
Flume
Tapping data from source to the processor – expectations and caveats
Comparing and choosing what works best for your use case
Do it yourself
Setting up Elasticsearch
Summary
Setting up the Infrastructure for Storm
Overview of Storm
Storm architecture and its components
Characteristics
Components
Stream grouping
Setting up and configuring Storm
Setting up Zookeeper
Installing
Configuring
Standalone
Cluster
Running
Setting up Apache Storm
Installing
Configuring
Running
Real-time processing job on Storm
Running job
Local
Cluster
Summary
Configuring Apache Spark and Flink
Setting up and a quick execution of Spark
Building from source
Downloading Spark
Running an example
Setting up and a quick execution of Flink
Build Flink source
Download Flink
Running example
Setting up and a quick execution of Apache Beam
Beam model
Running example
MinimalWordCount example walkthrough
Balancing in Apache Beam
Summary
Integrating Storm with a Data Source
RabbitMQ – messaging that works
RabbitMQ exchanges
Direct exchanges
Fanout exchanges
Topic exchanges
Headers exchanges
RabbitMQ setup
RabbitMQ – publish and subscribe
RabbitMQ – integration with Storm
AMQPSpout
PubNub data stream publisher
String together Storm-RMQ-PubNub sensor data topology
Summary
From Storm to Sink
Setting up and configuring Cassandra
Setting up Cassandra
Configuring Cassandra
Storm and Cassandra topology
Storm and IMDB integration for dimensional data
Integrating the presentation layer with Storm
Setting up Grafana with the Elasticsearch plugin
Downloading Grafana
Configuring Grafana
Installing the Elasticsearch plugin in Grafana
Running Grafana
Adding the Elasticsearch datasource in Grafana
Writing code
Executing code
Visualizing the output on Grafana
Do It Yourself
Summary
Storm Trident
State retention and the need for Trident
Transactional spout
Opaque transactional spout
Basic Storm Trident topology
Trident internals
Trident operations
Functions
map and flatMap
peek
Filters
Windowing
Tumbling window
Sliding window
Aggregation
Aggregate
Partition aggregate
Persistence aggregate
Combiner aggregator
Reducer aggregator
Aggregator
Grouping
Merge and joins
DRPC
Do It Yourself
Summary
Working with Spark
Spark overview
Spark framework and schedulers
Distinct advantages of Spark
When to avoid using Spark
Spark – use cases
Spark architecture – working inside the engine
Spark pragmatic concepts
RDD – the name says it all
Spark 2.x – advent of data frames and datasets
Summary
Working with Spark Operations
Spark – packaging and API
RDD pragmatic exploration
Transformations
Actions
Shared variables – broadcast variables and accumulators
Broadcast variables
Accumulators
Summary
Spark Streaming
Spark Streaming concepts
Spark Streaming – introduction and architecture
Packaging structure of Spark Streaming
Spark Streaming APIs
Spark Streaming operations
Connecting Kafka to Spark Streaming
Summary
Working with Apache Flink
Flink architecture and execution engine
Flink basic components and processes
Integration of source stream to Flink
Integration with Apache Kafka
Example
Integration with RabbitMQ
Running example
Flink processing and computation
DataStream API
DataSet API
Flink persistence
Integration with Cassandra
Running example
FlinkCEP
Pattern API
Detecting pattern
Selecting from patterns
Example
Gelly
Gelly API
Graph representation
Graph creation
Graph transformations
DIY
Summary
Case Study
Introduction
Data modeling
Tools and frameworks
Setting up the infrastructure
Implementing the case study
Building the data simulator
Hazelcast loader
Building Storm topology
Parser bolt
Check distance and alert bolt
Generate alert bolt
Elasticsearch Bolt
Complete topology
Running the case study
Load Hazelcast
Generate Vehicle static value
Deploy topology
Start simulator
Visualization using Kibana
Summary