In this post, we will implement the fourth part of our system, which is the Serving Layer of Lambda architecture. We will use Akka Http to create a REST API that allows users to retrieve the processing results by accessing /getHashtagCount (more details about Akka Http can be found in An example of using Akka Http):

In the getViews() function, we read the processing results of both Batch Layer (hashtag_batchview table) and Speed Layer (hashtag_realtimeview table), then combine them into a list of Hashtag objects (hashtagList) as follows (we run the query on each individual table as Cassandra not yet supports queries like JOIN or UNION):

However, both hashtag_batchview table and hashtag_realtimeview table usually contain data with the same “hashtag” (the values of “count” are often different). Therefore, hashtagList will contain Hashtag objects that have the same “value” but different “count”. So we need to combine these object into one of which “count” is the sum of each object’s “count”. This process can be done by using .groupBy(), .map() and .sum as follows:

Thus, the full code of the Serving Layer is as follows:

Run AkkaServer.start(), then go to http://localhost:8080/getHashtagCount, we get the following result:

As already mentioned in Implementing the Batch Layer of Lambda Architecture, we run Batch Processing every 30 minutes and during this time, the hashtag_batchview table remains unchanged. However, the number of hashtag count is always updated due to changes in the hashtag_realtimeview table. Therefore, if we wait a while and then visit http://localhost:8080/getHashtagCount again, we will see that the number of hashtags and the count value of some hashtags are increased.

March 17, 2019