In this tutorial series, we will learn how to use Structured Streaming, Spark's stream processing engine built on Spark SQL. In Structured Streaming, a data stream is treated as an unbounded table: arriving data is appended to this table as new rows at every trigger interval (for example, every 1 second or every 3 seconds). Spark's stream processing model is therefore similar to its batch processing model, with the trigger interval acting like a batch interval, and we can write streaming queries just as we write batch queries.
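To make the unbounded-table idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the names `arrivals`, `unbounded_table`, and `word_count` are made up for illustration, and real Structured Streaming updates results incrementally rather than keeping and rescanning the whole table.

```python
from collections import Counter


def word_count(table):
    """A 'batch' query: count the words in every row currently in the table."""
    return Counter(word for row in table for word in row.split())


# Hypothetical input: the rows that arrive during each trigger interval.
arrivals = [
    ["spark streaming"],       # trigger 1
    ["structured streaming"],  # trigger 2
    [],                        # trigger 3: no new data
    ["spark sql", "spark"],    # trigger 4
]

unbounded_table = []  # conceptually, this table grows forever
results = []          # query result after each trigger

for new_rows in arrivals:
    # Arriving data is appended to the table as new rows.
    unbounded_table.extend(new_rows)
    # The query is the same one we would run on a static (batch) table.
    results.append(word_count(unbounded_table))

for trigger, result in enumerate(results, start=1):
    print(f"trigger {trigger}: {dict(result)}")
```

The point of the sketch is that `word_count` knows nothing about streaming: it is written once against a table, and the loop simply re-evaluates it each time new rows are appended.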

Here we need to distinguish between Spark Streaming and Structured Streaming. Put simply, Spark Streaming is the older API, built on RDDs, while Structured Streaming is the newer API, built on Datasets/DataFrames. The newer API is recommended because it offers more up-to-date functionality, so this tutorial series focuses only on Structured Streaming. However, if you want to know more about Spark Streaming, please visit here.

November 20, 2018