Lambda architecture is a data processing architecture introduced by Nathan Marz [1]. It combines the advantages of batch processing and stream processing to handle large amounts of data effectively. Lambda architecture consists of three layers: the Batch layer, the Speed layer, and the Serving layer.

The Batch layer acts like a ‘data lake’: it stores all collected data and processes it with batch processing. The batch interval, which is the time between two consecutive batch runs (such as 30 minutes, 3 hours or 1 day), must be defined in advance. Batch processing is useful in Lambda Architecture because the collected data may contain duplicates or unnecessary information, so an intermediate step is needed to preprocess and clean the raw data. Batch processing can also address the late-arrival problem, where data arrives late because of transmission disruptions or because it was collected long after it was posted. In addition, the batch results are recomputed regularly, which improves the accuracy of the system and corrects inaccurate results previously produced by real-time processing.
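To make the cleaning and late-arrival handling of the Batch layer more concrete, here is a minimal Python sketch. It assumes a hypothetical record format of `(event_id, key, event_time)` and a simple view that counts events per key and time bucket; the names `bucket_start` and `compute_batch_view` are illustrative only and do not come from any framework. Because each run recomputes the view from the whole master dataset, duplicates are removed and late records are counted in the bucket of their event time.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical raw record: (event_id, key, event_time).
# The same event_id may be delivered more than once.
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Floor a timestamp to the start of its time bucket."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


def compute_batch_view(master_dataset: list[tuple[str, str, datetime]],
                       interval: timedelta) -> Counter:
    """Recompute event counts per (key, time bucket) from ALL stored data."""
    seen: set[str] = set()
    view: Counter = Counter()
    for event_id, key, event_time in master_dataset:
        if event_id in seen:  # cleaning step: drop duplicate deliveries
            continue
        seen.add(event_id)
        # Bucket by event time rather than arrival time, so a record that
        # arrived late is still counted in the bucket it belongs to.
        view[(key, bucket_start(event_time, interval))] += 1
    return view


# Usage: the batch job reruns over the full master dataset every interval.
master_dataset = [
    ("e1", "page_a", datetime(2019, 1, 9, 10, 5)),
    ("e1", "page_a", datetime(2019, 1, 9, 10, 5)),   # duplicate delivery
    ("e2", "page_a", datetime(2019, 1, 9, 10, 40)),
    ("e3", "page_b", datetime(2019, 1, 9, 9, 50)),   # arrived late
]
view = compute_batch_view(master_dataset, timedelta(minutes=30))
# One count each for page_a@10:00, page_a@10:30 and page_b@09:30.
```

Recomputing from scratch is slower than updating incrementally, but it is what lets each batch run repair mistakes in earlier results.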

The Speed layer is responsible for processing data in real time. It complements the Batch layer, which has high latency and is therefore unable to process newly received data immediately. However, the results of the Speed layer are usually less accurate than those of the Batch layer because of the limited processing time. At this layer, we can use open-source tools such as Apache Storm, Apache Spark or Apache Flume.
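The incremental style of the Speed layer can be sketched as follows. This is a single-process toy rather than Storm or Spark code: the class `SpeedLayer`, its methods, and the `batch_cutoff` parameter (the arrival time up to which the newest batch view already includes events) are assumptions for illustration. Each event updates the real-time view in constant time without any cleaning, and `expire` drops the events that the latest batch run has absorbed.

```python
from collections import Counter
from datetime import datetime


class SpeedLayer:
    """Toy speed layer: keeps a real-time view (event count per key) for
    events that the latest batch view does not cover yet."""

    def __init__(self) -> None:
        self._recent: list[tuple[datetime, str]] = []  # (arrival_time, key)
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str, arrival_time: datetime) -> None:
        # Constant-time incremental update; no deduplication or cleaning,
        # so this view can be slightly wrong until the next batch run.
        self._recent.append((arrival_time, key))
        self.realtime_view[key] += 1

    def expire(self, batch_cutoff: datetime) -> None:
        """Discard events that arrived at or before batch_cutoff, i.e. the
        events already included in the newest batch view."""
        self._recent = [(t, k) for t, k in self._recent if t > batch_cutoff]
        self.realtime_view = Counter(k for _, k in self._recent)


# Usage
speed = SpeedLayer()
speed.on_event("page_a", datetime(2019, 1, 9, 10, 0))
speed.on_event("page_a", datetime(2019, 1, 9, 10, 1))
speed.on_event("page_b", datetime(2019, 1, 9, 10, 2))
# A batch run covering everything that arrived up to 10:01 has just finished:
speed.expire(datetime(2019, 1, 9, 10, 1))
```

Keeping only recent events is what keeps the Speed layer small and fast; anything older is the Batch layer's responsibility.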

The Serving layer is responsible for storing the outputs of the Batch layer and the Speed layer, so database tools such as Apache Cassandra, MongoDB or Elasticsearch can be used for this layer.
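As a lightweight stand-in for Cassandra, MongoDB or Elasticsearch, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table, column and function names are hypothetical. The `PRIMARY KEY` on `item_key` gives each view an index for fast point lookups, a freshly computed batch view replaces the old one inside a single transaction, and the real-time view is updated incrementally.

```python
import sqlite3

# sqlite3 stands in for a real serving database such as Cassandra.
# Table, column and function names are hypothetical.
_LOOKUP_SQL = {
    "batch": "SELECT n_events FROM batch_view WHERE item_key = ?",
    "realtime": "SELECT n_events FROM realtime_view WHERE item_key = ?",
}


def open_serving_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY creates an index on item_key, so lookups avoid full scans.
    conn.execute("CREATE TABLE batch_view "
                 "(item_key TEXT PRIMARY KEY, n_events INTEGER NOT NULL)")
    conn.execute("CREATE TABLE realtime_view "
                 "(item_key TEXT PRIMARY KEY, n_events INTEGER NOT NULL)")
    return conn


def publish_batch_view(conn: sqlite3.Connection, view: dict[str, int]) -> None:
    """Swap in a freshly computed batch view within one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM batch_view")
        conn.executemany(
            "INSERT INTO batch_view (item_key, n_events) VALUES (?, ?)",
            view.items())


def add_realtime(conn: sqlite3.Connection, key: str, delta: int) -> None:
    """Incrementally update the real-time view written by the speed layer."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO realtime_view (item_key, n_events) "
                     "VALUES (?, 0)", (key,))
        conn.execute("UPDATE realtime_view SET n_events = n_events + ? "
                     "WHERE item_key = ?", (delta, key))


def lookup(conn: sqlite3.Connection, view: str, key: str) -> int:
    """Point lookup in the 'batch' or 'realtime' view; missing keys count as 0."""
    row = conn.execute(_LOOKUP_SQL[view], (key,)).fetchone()
    return row[0] if row else 0


# Usage
conn = open_serving_store()
publish_batch_view(conn, {"page_a": 5, "page_b": 2})
add_realtime(conn, "page_a", 1)
```

The batch table is only ever replaced as a whole, while the real-time table receives small, frequent updates; that split mirrors the different write patterns of the two layers.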

The way Lambda Architecture works can be summarized in the following five steps [1]:

1. All collected data is passed to both the Batch layer and the Speed layer

2. The Batch layer performs two tasks: storing data and processing it to produce batch views

3. The Serving layer indexes batch views for quick access

4. The Speed layer processes only recent data in real time and produces real-time views

5. The system answers users’ queries by combining batch views and real-time views.
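The five steps can be condensed into a toy, single-process Python sketch. Everything here (the `LambdaSketch` class, a count-per-key view, deduplication by `event_id`) is an assumption chosen for illustration; a real deployment spreads these roles across separate systems. The example also shows how the batch view corrects the real-time result: a duplicated event is counted twice by the speed layer but only once after the batch run.

```python
from collections import Counter


class LambdaSketch:
    """Toy, single-process walk-through of the five Lambda Architecture steps.
    Views count events per key; all names are hypothetical."""

    def __init__(self) -> None:
        self.master_dataset: list[tuple[str, str]] = []  # append-only (event_id, key)
        self.batch_view: dict[str, int] = {}             # indexed by key (step 3)
        self.realtime_view: Counter = Counter()

    def ingest(self, event_id: str, key: str) -> None:
        """Step 1: send each event to both layers."""
        self.master_dataset.append((event_id, key))  # step 2: batch layer stores it
        self.realtime_view[key] += 1                  # step 4: speed layer, no dedup

    def run_batch(self) -> None:
        """Steps 2-3: recompute the batch view from all data and index it by key."""
        latest = {event_id: key for event_id, key in self.master_dataset}  # dedup
        self.batch_view = dict(Counter(latest.values()))
        # In this single-threaded toy, the batch view now covers every event
        # ingested so far, so the real-time view can be reset.
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        """Step 5: merge the batch view and the real-time view."""
        return self.batch_view.get(key, 0) + self.realtime_view[key]


lam = LambdaSketch()
lam.ingest("e1", "page_a")
lam.ingest("e1", "page_a")    # duplicate delivery
lam.ingest("e2", "page_b")
before = lam.query("page_a")  # 2: the speed layer counted the duplicate
lam.run_batch()
after = lam.query("page_a")   # 1: the batch view removed it
```

Clearing the whole real-time view only works because the batch run here is instantaneous; with a long-running batch job, only the events the new batch view actually covers should be discarded.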

[1] “Lambda Architecture » λ lambda-architecture.net.” [Online]. Available: http://lambda-architecture.net [Accessed: 09-Jan-2019].

January 9, 2019
ITechSeeker