How I built a real-time Machine Learning system with Kafka, Elasticsearch, Kibana, and Docker

We will design and build a real-time sentiment analysis and hate detection system.

This is a project that I made in the “Turn Language into Action, Natural Language Hackathon by Expert.ai”.

I have always been interested in real-time systems and have always wondered how things work under the hood.

HOW? 🤔

So, I found this hackathon to be a perfect opportunity for me to learn and build something new.

Well then, let’s ROLL!!!

Project Architecture

This is what the complete pipeline looks like. Don’t worry, I will cover everything in detail.

Figure: Project architecture

But before we move on with the tools and architecture, let me talk about our data sources.

I used the Twitter API for real-time tweets, streaming them with Python’s tweepy library. In addition, I used NewsAPI for daily news articles.

I used Docker to run all the necessary tools as containers for this project.
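To give a rough idea of the NewsAPI side, here is a minimal sketch that uses only the standard library. It assumes the public `top-headlines` endpoint and a `country` filter. The helper names (`build_url`, `extract_articles`, `fetch_articles`) and the set of fields I keep are illustrative assumptions, not the exact code from my repo.

```python
import json
import urllib.parse
import urllib.request

NEWSAPI_URL = "https://newsapi.org/v2/top-headlines"


def build_url(api_key, country="us"):
    """Build the request URL; the country filter is an illustrative choice."""
    query = urllib.parse.urlencode({"country": country, "apiKey": api_key})
    return f"{NEWSAPI_URL}?{query}"


def extract_articles(payload):
    """Keep only the fields the rest of the pipeline needs (assumed selection)."""
    return [
        {
            "source": (article.get("source") or {}).get("name"),
            "title": article.get("title"),
            "description": article.get("description"),
            "published_at": article.get("publishedAt"),
        }
        for article in payload.get("articles", [])
    ]


def fetch_articles(api_key):
    """Call NewsAPI over HTTPS and return the trimmed article records."""
    with urllib.request.urlopen(build_url(api_key), timeout=10) as response:
        return extract_articles(json.load(response))
```

Calling `fetch_articles("<your key>")` needs a valid NewsAPI key and network access. The two helper functions can be tried on any dictionary shaped like the API response.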

Now let’s talk about each component.

Apache Kafka

For ingesting the real-time data, I have used Apache Kafka.

Now, what is Apache Kafka? Well…

Apache Kafka (Kafka) is an open source, distributed streaming platform that enables (among other things) the development of real-time, event-driven applications. — IBM

Since I used Python, I worked with Kafka through the kafka-python client, which makes it relatively easy.

Using a KafkaProducer, I send the messages from Twitter and NewsAPI to a KafkaConsumer through two Kafka topics: one for the tweets and one for the news articles.
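Here is a minimal sketch of the producer side with kafka-python. The topic names (`tweets`, `news`), the broker address, and the helper names are assumptions I picked for illustration. The real values depend on how you configure your Kafka container.

```python
import json

# Hypothetical topic names: one topic per data source.
TOPICS = {"twitter": "tweets", "newsapi": "news"}


def serialize(record):
    """Encode a dict as UTF-8 JSON bytes before it goes onto the topic."""
    return json.dumps(record).encode("utf-8")


def publish(producer, source, record):
    """Send one record to the topic that belongs to its source."""
    return producer.send(TOPICS[source], record)


def make_producer(bootstrap_servers="localhost:9092"):
    """Create a kafka-python producer (imported lazily; needs a running broker)."""
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=serialize,
    )


def main():
    """Example run: publish one message per source, then flush."""
    producer = make_producer()
    publish(producer, "twitter", {"text": "example tweet"})
    publish(producer, "newsapi", {"title": "example headline"})
    producer.flush()  # block until buffered messages are delivered
```

Keeping `publish` separate from `make_producer` means the routing logic can be tried with any object that has a `send` method, without a broker running.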

The KafkaConsumer then calls the Machine Learning service to classify the sentiment of the news articles and detect hate speech in the tweets.
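The consumer side could look roughly like the sketch below. The topic names, the record fields (`text`, `title`), and the two callables `detect_hate` and `classify_sentiment` are placeholders standing in for the Machine Learning service, so treat them as assumptions rather than the actual interface.

```python
import json

# Hypothetical topic names and record fields, chosen for illustration.
TWEETS_TOPIC = "tweets"
NEWS_TOPIC = "news"


def handle_message(topic, payload, detect_hate, classify_sentiment):
    """Route one decoded message to the matching ML call and attach the result."""
    if topic == TWEETS_TOPIC:
        return {**payload, "hate": detect_hate(payload["text"])}
    if topic == NEWS_TOPIC:
        return {**payload, "sentiment": classify_sentiment(payload["title"])}
    raise ValueError(f"unexpected topic: {topic}")


def consume(detect_hate, classify_sentiment, bootstrap_servers="localhost:9092"):
    """Yield enriched records as they arrive; needs kafka-python and a running broker."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TWEETS_TOPIC,
        NEWS_TOPIC,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:  # each message exposes .topic and .value
        yield handle_message(message.topic, message.value, detect_hate, classify_sentiment)
```

Passing the two ML calls in as arguments keeps the consumer independent of which service does the classification, so the Expert.ai API could be swapped for a local model without touching the Kafka code.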

Machine Learning service

Expert.ai turns language into data so teams can make better decisions.

Since I built this project as a part of the Expert.ai hackathon, I have used their API for sentiment analysis/classification and hate detection.

However, you can always use your own TensorFlow or PyTorch model. Hugging Face also has some very relevant models for sentiment classification, and they are straightforward to set up. You should check them out!
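If you want to try the Hugging Face route, a minimal sketch could look like this. It assumes the `transformers` package is installed and uses the library’s default `sentiment-analysis` pipeline, which downloads a model on first use. This is not what I used in the project, and the `classify` helper and its output fields are my own illustrative choices.

```python
def load_classifier():
    """Load the default sentiment pipeline (needs `transformers` and a model download)."""
    from transformers import pipeline

    return pipeline("sentiment-analysis")


def classify(classifier, text):
    """Run one text through the classifier and flatten the result into a plain dict."""
    prediction = classifier(text)[0]  # the pipeline returns a list with one dict per input
    return {
        "text": text,
        "label": prediction["label"],
        "score": round(float(prediction["score"]), 4),
    }
```

A typical call would be `classify(load_classifier(), "some headline")`. The labels you get back depend on the model the pipeline loads.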

I am using the Sentiment Analysis and Hate Speech Detection APIs from the Expert.ai NL API.

Elasticsearch

Okay, we have the classified data. Now what?

We have to store that data somewhere to use it for further analytics. I used Elasticsearch to store it and Kibana to visualize it.

You might ask, why Kibana?

Let me introduce you to the ELK stack.

“ELK” is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch. — Elastic.co

Elasticsearch, Logstash, and Kibana go hand in hand in most data engineering or data ingestion use cases, but I omitted Logstash to keep the pipeline simple and focused on its goal.

You can always add Logstash and scale the pipeline further as needed.

That is enough about the ELK stack. Let’s jump into the Elasticsearch design.

Like databases, Elasticsearch has “indexes” (also called indices). An index stores JSON documents according to a mapping, which plays the role that a schema plays in other databases.

The mapping describes the fields in the JSON documents along with their data types, as well as how each field should be indexed.
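To make the idea of a mapping concrete, here is a small sketch. The index name, the field list, and the helper functions are hypothetical, and the keyword arguments (`mappings=`, `document=`) follow the 8.x Elasticsearch Python client. Older client versions expect a `body=` argument instead.

```python
# Hypothetical index name and fields for the classified news articles.
NEWS_INDEX = "news-sentiment"

NEWS_MAPPING = {
    "properties": {
        "title": {"type": "text"},        # analyzed for full-text search
        "source": {"type": "keyword"},    # exact value, usable in aggregations
        "sentiment": {"type": "keyword"},
        "score": {"type": "float"},
        "published_at": {"type": "date"},
    }
}


def ensure_index(es):
    """Create the index with its mapping if it does not exist yet."""
    if not es.indices.exists(index=NEWS_INDEX):
        es.indices.create(index=NEWS_INDEX, mappings=NEWS_MAPPING)


def store(es, record):
    """Index one classified record, rejecting fields the mapping does not declare."""
    unknown = set(record) - set(NEWS_MAPPING["properties"])
    if unknown:
        raise ValueError(f"fields missing from the mapping: {sorted(unknown)}")
    return es.index(index=NEWS_INDEX, document=record)
```

Here `es` would be an `Elasticsearch` client connected to your container. I use `keyword` for fields such as `sentiment` because Kibana charts group on exact values, while `text` suits free-form titles that you want to search.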

Databases ~ Indexes

The above image will give you a better idea of how Elasticsearch indexes compare to databases in MySQL or PostgreSQL.

Kibana

Done storing the messages in the Elasticsearch indexes? Okay, great! We can finally visualize the stored data and get more insights from it.

We use Kibana for that.

Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack.

Kibana Dashboard

This is what my final Kibana dashboard looks like. You can check out the code at my GitHub repo.

⭐ Feel free to leave a star if you like the project.

This part covers only the overview of the project along with its architecture. I’ll soon add the coding section in a separate part, so stay tuned for that.


That’s all folks. See you soon 👋

Happy coding.