This week, we continue to look at data processing patterns used to build event-triggered stream processing applications, the use cases those patterns relate to, and how you would go about implementing them with Wallaroo.
The purpose of these posts is to help you understand the data processing use cases that Wallaroo is best designed to handle and how you can go about building them right away.
I will be looking at the Wallaroo application builder, the part of your application that hooks into the Wallaroo framework, and some of the business logic of the pattern.
You should also check out all the posts in the “Wallaroo in Action” category.
Pattern: Analyzing Trends
What makes stream processing different from alternatives like batch processing is that we continuously run our application logic over data as it comes in, rather than running that logic at periodic intervals.
Similar to triggering alerts based on an event, sometimes you want to perform more detailed analysis of the events in your application. We can all benefit from more monitoring or from testing user interface improvements.
Some specific examples could include:
- A/B testing
- Analyzing rage clicking in your application
- Determining infrastructure load based on user location
- Sentiment analysis of reviews for your product
- Updating the “Most Popular” filter for an e-commerce website
In this post we continue to use Twitter’s tweet API, as in the previous Wallaroo posts “Real-time Streaming Pattern: Preprocessing for Sentiment Analysis” and “Identifying Trending Twitter Hashtags in Real-time with Wallaroo.”
Like in the more detailed “Identifying Trending Twitter Hashtags in Real-time with Wallaroo” post, imagine we are creating an application showing the trending hashtags on Twitter.
The application itself requires only two computations: a stateless computation to extract the hashtags from a tweet, and a stateful computation to keep a count of those hashtags.
Wallaroo Application Builder
```python
ab.new_pipeline("Analyzing Trends",
                wallaroo.TCPSourceConfig(in_host, in_port, in_decoder))
ab.to_parallel(extract_hashtags)
ab.to_stateful(compute_hashtags, HashtagState, "hashtags state")
ab.to_sink(wallaroo.TCPSinkConfig(out_host, out_port, encoder))
return ab.build()
```
```python
ab.new_pipeline("Analyzing Trends",
                wallaroo.TCPSourceConfig(in_host, in_port, in_decoder))
```
We define a new pipeline named "Analyzing Trends" and the source of the data; in this case it’s coming from a socket over TCP.
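The `in_decoder` passed to `TCPSourceConfig` is responsible for turning the raw bytes read off the socket into a message your computations can work with. As a minimal sketch, assuming each framed message is simply a UTF-8 encoded tweet body (your actual framing and message format may differ):

```python
# Hypothetical decoder sketch: the TCP source hands us the bytes of one
# framed message; here we assume it is a UTF-8 encoded tweet body.
def in_decoder(bs):
    return bs.decode("utf-8")
```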
```python
ab.to_parallel(extract_hashtags)
```

Our first computation is one that calls extract_hashtags in parallel. As in past examples, extract_hashtags doesn’t modify state, so by calling to_parallel we’re able to send the computation to all the available workers.
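A minimal sketch of what extract_hashtags might look like, assuming the decoder hands it the tweet text as a plain string (the real implementation in the trending-hashtags example may differ):

```python
import re

# Hypothetical stateless computation: given the text of a tweet,
# return the list of hashtags it contains.
def extract_hashtags(tweet_text):
    return re.findall(r"#\w+", tweet_text)
```

Because the function is a pure transformation of its input, Wallaroo is free to run it on any worker without coordination.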
ab.to_stateful(compute_hashtags, HashtagState, "hashtags state")
This takes a computation, a state class, and a name. HashtagState describes how we define our state. You can find more information on State in our documentation.
```python
ab.to_sink(wallaroo.TCPSinkConfig(out_host, out_port, encoder))
return ab.build()
```
Lastly, we tell Wallaroo the host and port to send the results to. This could be another application listening on that host and port, or a Kafka topic. In the case of our Twitter trending hashtags example, this data was eventually rendered on a webpage so the user could see the top 10 hashtags in real time.
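The `encoder` passed to `TCPSinkConfig` does the reverse of the decoder: it turns each result into bytes to write to the socket. A minimal sketch, assuming the stateful step emits a list of `(hashtag, count)` pairs and the downstream consumer reads newline-delimited text (the real example's wire format may differ):

```python
# Hypothetical sink encoder: serialize the top-hashtags list as one
# newline-terminated line of "tag:count" pairs.
def encoder(top_hashtags):
    line = ", ".join("%s:%d" % (tag, count) for tag, count in top_hashtags)
    return (line + "\n").encode("utf-8")
```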
There are many different use cases for stream processing, and I hope this provided a good overview of how you could go about integrating Wallaroo into your infrastructure.
If you’re interested in running the example application from the “Identifying Trending Twitter Hashtags in Real-time with Wallaroo” post, you can find the repository here.
Wallaroo’s lightweight API gives you the ability to construct your data processing pipeline and run whatever application logic you need to power your application.
Did you enjoy this post?
Subscribe to our blog and be updated each time we add a new post.