Apache Kafka & Confluent Cloud Crash Course for Beginners | AMG Inc | Skillshare

Apache Kafka & Confluent Cloud Crash Course for Beginners

AMG Inc, Technologist


Lessons in This Class

    • 1. Introduction to Course (3:42)
    • 2. What is Streaming Analytics (5:27)
    • 3. What is Apache Kafka (3:02)
    • 4. Architecture of Kafka (4:40)
    • 5. Managed Solution Kafka (1:38)
    • 6. Quick Starting with Confluent Cloud (22:02)
    • 7. Introduction to KSQLDB (1:53)
    • 8. Using KSQLDB (19:02)
    • 9. Overview of Kafka Streams (2:47)
    • 10. Introduction to Kafka Connect (2:29)
    • 11. Project1 Book Catalog: Overview (0:57)
    • 12. Project1 Book Catalog: Create Producer (2:55)
    • 13. Project1 Book Catalog: Create Topic (1:08)
    • 14. Project1 Book Catalog: Produce Messages (1:29)
    • 15. Project1 Book Catalog: Create Consumer (2:07)
    • 16. Project1 Book Catalog: Consume Messages (2:56)
    • 17. Project2 PacMan: Overview (1:08)
    • 18. Project2 PacMan: Configure and Run (6:32)
    • 19. Project2 PacMan: Setting up Pre Requisites (2:30)
    • 20. Project2 PacMan: Setting up KSQLDB (4:53)
    • 21. Project2 PacMan: Run Project and Sink Data (11:14)


109 Students

-- Projects

About This Class

Welcome to the Confluent Cloud introductory course. Confluent uses Apache Kafka to stream events in real time, which has become a necessity for the majority of Fortune 500 companies. Real-time event streaming is how many companies make decisions at the right time and avoid significant financial losses. We cover the fundamentals of each of these technologies, with core concepts illustrated through use-case examples. This course is designed for students at the initial stage of learning cloud computing and event streaming, and is best suited for those who want to start their career in this area.

This course focuses on what Confluent is and how it can be used to stream events and data in real time. It also includes practical hands-on lab exercises, which cover a major part of deploying and orchestrating applications.

We use a combination of the command line/terminal interface and programming to launch an application of your choice as a microservice architecture. The programming part mainly involves writing shell scripts and then using command-line commands to execute them and get the results we want.

Even if you don’t have any previous experience using any of these technologies, you will still be able to get 100% of the benefit from this course.

Meet Your Teacher


AMG Inc

Technologist

Teacher

Our company's goal is to produce the best online courses in data science and cloud computing technologies. Below is our team of experienced instructors.

Instructor 1

Adnan Shaikh has been in the information technology industry for the past 18 years and has worked for Fortune 500 companies in North America. He has held roles ranging from Data Analyst, Systems Analyst, Data Modeler, and Data Manager to Data Architect with various technology companies.

He has worked with Oracle, SQL Server, DB2, MySQL, and many other relational and non-relational databases over the course of his career. He has written multiple data access applications using complex SQL queries and stored procedures for critical production systems.

He has a master's degree from Northwestern University of Chica...

Level: Beginner



Transcripts

1. Introduction to Course: Hi, welcome to the introduction to the Confluent Cloud course. My name is Shannon, and I look forward to our time together in this course. For the duration of this course, you'll be focusing on the following. First, we're going to teach you in-demand event streaming abilities in Confluent through hands-on lab exercises. We will also teach you the use of the Confluent Cloud environment, which is extremely easy to use and very functional. The key concepts in this course are explained with appropriate visuals so that they sink in easily and you understand them quickly. We're also going to have a capstone project at the end of the course. You will find regular quizzes during the length of this course to test your knowledge of what you have learned, as well as to cement some fundamentals as we go. Meet our team of instructors who have created and designed this course. Together, we have just over ten years of experience in practical implementations of technology and in teaching technical courses at the university level. Our team members have specializations in the areas of information technology, software engineering, data science, and design. Here is the learning roadmap that we will be following for this course. We're going to start off with Apache Kafka and teach you everything you need to know about it in order to work with Confluent. The next step is actually showing you Confluent, what it's all about, and setting it up for you. After that, we're going to take a look at ksqlDB, how we use it, and what its applications are. Moving on from there, we're going to explore streams, what they are, the libraries involved, and how you can work with them. Lastly, we're going to look at Kafka Connect, how we can work with it, and how we can link our projects externally. After we're done with all these concepts, we're going to go ahead and create two projects. Kafka is a data streaming service that allows a producer to write messages to a topic that are then consumed by a consumer. Though this concept is extremely simple, in practice things can get a little confusing and lost along the way. That's why we have tried to explain the concepts the way we did, with visual explanations. Confluent is a cloud-based event streaming service. It provides the environment, in the form of clusters, for event streaming, and provides very nice visualization of operations. That said, you're going to be seeing this screen a lot for the duration of this course. Beyond the theory and explanations, a large part of this course is hands-on. We're also going to be working a lot in the CLI; we're going to see how we can run our Confluent Cloud and Kafka using the CLI. And we're going to set up a project where we're going to make something amazing and see how it produces all that information into our cluster. Event streaming is a big buzz phrase among Fortune 500 companies today; these are just some of the companies that need event streaming experts to fill their day-to-day operations and critical software development needs. We look forward to having you join our course, and we promise that this course will help you build your foundation of event streaming and Confluent knowledge. This will help you make your resume stand out and demand a competitive salary in the marketplace.

2. What is Streaming Analytics: Hello, and welcome to this course where I will be teaching you how to use Confluent in your event streaming applications.
In this first section, we'll be discussing Apache Kafka and everything you need to know about it in order to understand and use Confluent as a service. First, we need to understand what the buzzword streaming analytics even means and why it's so popular these days. What is streaming analytics? In layman's terms, it's the constant processing of data as it's generated and received in real time. This data is found in most places. For example, the most common form of data analyzed in streaming analytics is social media information. The moment you perform something on social media, that event is streamed in real time to be processed in order to show you content that is more relevant to your likes, as opposed to things you wouldn't like to see. In the past, and actually in many places still today, data was sent in batches rather than as a continuous stream. Data was collected up to a time interval, and when the end of the interval arrived, the data was sent as one large chunk to be processed. Though this process is much cheaper than real-time data streaming, it does not give real-time insights and cannot be used for instantaneous actions and real-time graph generation. Streaming analytics was created to solve this problem with batch processing, so that real-time decisions could be made and actions could be taken. Now let's look at how streaming analytics works. The first part of event streaming is the event that's produced. This event could be produced by some device or application, or any other producer for that matter; it's essentially just a message. That message is sent to a hub, which queues the message behind other messages and makes it ingestible by the streaming analytics service. Within the streaming analytics phase, it's decided what to do with that message: whether to present it on a graph on some dashboard, or to send a command to perform a certain action, or even just to simply store that message. That's all decided within the stream analytics part. Lastly, the message is sent onwards from the stream analytics service to be consumed by the end consumer. The whole process may seem a bit technical, and you might be wondering at this point why even go through such a process in the first place. There's a fair number of advantages to using streaming analytics. Using data visualization, even someone who can't read the raw numbers can make sense of how things are progressing. Furthermore, it gives you a competitive edge, as you get real-time insights and can make moves faster than your competition may even realize. It also gives you a much deeper and clearer insight into operations, as you know what operations are being performed in real time. The ability to get information as soon as it's produced can both create as well as identify lost opportunities in the business world and more. Lastly, identifying trends with streaming analytics can help mitigate losses by alerting the company to the right things at the right time. A lot of companies in the market today employ streaming analytics, with even more adopting it each and every day. Most organizations are discovering their need for real-time continuous data and how much they stand to gain from it. It may not be apparent on the surface, but many industries are natural candidates for data streaming. For example, consider the following industries. In finance, it's critical to know if fraud is being committed in real time; otherwise it could lead to a huge loss for the company.
Furthermore, real-time market analysis is a pivotal element in business decisions. In e-commerce, let us look at the father of all e-commerce sites, Amazon. Amazon deals with an absolutely huge amount of data, and it uses that data to continually adjust the deals and prices of its products to optimize sales. Furthermore, it helps in product recommendations for customers as well as in optimizing the logistics of the entire operation. Amazon actually has its own product for this; it's called Amazon Kinesis. In sports, let's take a look at Formula One. Each Formula One car produces gigabytes of data through the sheer number of instruments that are installed. During the race alone, teams collect a tremendous amount of data that they need in order to make real-time decisions that can make or break the race for them. The same logic also applies to online gaming, where technical details and player data are used to optimize the game's performance and fine-tune the overall experience.

3. What is Apache Kafka: When you say the term streaming analytics, you will find that it is mostly associated with Apache Kafka. That's because Apache Kafka is the most mainstream data streaming service, the one most companies use for their data streaming requirements. Let's explore what Apache Kafka is. In textbook terms, Apache Kafka is a data streaming service. It is open source and used by thousands of companies for streaming analytics. In a simpler definition, it provides the service we discussed back when we looked at that flow diagram of how streaming analytics works. What Apache Kafka does is take events or messages from a producer. Now, that producer could be an IoT device or a smartphone, and it pushes them to Apache Kafka. Apache Kafka then manages them in such a way that they can be given to many consumers. It eliminates the need for individual data streams from each producer to each consumer, and essentially makes it so that each producer and consumer only has a single data stream that is unaffected by external factors. Let's look at an example for a better understanding. Say that you have two producers and three consumers. Without Apache Kafka, each producer needs a stream to each consumer; with two producers and three consumers, that gives us a total of six streams to manage. These streams are coupled, which is to say that they may choke due to external factors. A slow consumer may affect the producer. Adding consumers would affect the producers. The failure of a consumer would block an entire stream. There are many weaknesses in a system like this. Let's take the system from before and add three more producers and two more consumers. Without Apache Kafka, we would need to maintain and manage 25 streams. Now that's a lot of streams and a lot of maintenance, which also means high costs as well as added risk. With Apache Kafka, we would only need ten streams and have the added assurance that those streams are decoupled and will not be affected by a producer's or consumer's ability to perform its tasks. From the example, you should now be able to tell how advantageous Apache Kafka is and how important the service it provides is. The benefits of Kafka are that it is very fast, easily scalable, endlessly reliable, extremely durable, and best of all, it's open source. However, you can also find it as a managed service, as we will be seeing later during this course.

4. Architecture of Kafka: Now that we know what Apache Kafka is and what it's used for, let's see how it works on the inside.
Within Apache Kafka, you will find the following components: producers, consumers, brokers, topics, partitions, clusters, and ZooKeeper. We'll be looking at each component separately and seeing what it does. Producers: these are the elements that produce the events or messages that are sent to Apache Kafka. As we discussed previously, these could be a device or an application. The messages that producers send are written to a topic with similar messages inside. It's important to understand that multiple producers can exist within the same application. All the messages that a producer produces are sent straight to the Kafka cluster. Consumers: these are the guys at the opposite end from the producer. They're there to take the messages from the producer and read, or in a more technical sense, consume them. Consumers subscribe to topics, and when messages are sent to those topics, the consumer consumes those messages as they come. When working with Apache Kafka, multiple consumers may consume the same message from the same topic. Though messages are consumed, they aren't destroyed after the process; that's one of the beauties of Apache Kafka. Consumers may consume different types of messages from the clusters as well. Furthermore, consumers know exactly where the data they need to consume is located within the Kafka cluster. Brokers: when we discussed how producers send messages to Apache Kafka, where they actually send those messages to are the brokers within the Kafka cluster. Brokers are the Kafka servers that receive and then store these messages for the consumers to take and consume. A Kafka cluster may have multiple brokers, and each broker manages multiple partitions, as we will see soon. Topics: these are simply the defined channels through which the data is streamed. Producers produce their messages to topics, and consumers subscribe to topics to consume the messages within them. It's basically just a means to compartmentalize and organize messages and order them by their particular traits. Topics have unique identifying names within the Kafka cluster, and there can be any number of topics; there's no defined limit. Partitions: topics are divided into these partitions, and partitions are replicated to other brokers. With the help of this, multiple consumers may read from a topic in parallel. With partitions, producers can add keys to their messages to control which partition a message goes to. Otherwise, messages are distributed in a round-robin pattern, where one partition gets one message, the next partition gets the next one, and so on and so forth. Keys allow the producer to control the order of message processing, which can be handy if the application requires that control over records. Just like topics, there's no defined limit on partitions, provided the cluster's processing capacity can handle and manage it. Clusters: these are the systems that manage the brokers. It is essentially the entire architecture of brokers that we call a cluster. Messages are written to topics, which are within brokers, which are within clusters. Those messages are then read by consumers following the same hierarchy. ZooKeeper: this element is responsible for managing and coordinating the Kafka cluster, sort of like a conductor to an orchestra. It notifies every node in the system when a topology change occurs; this can be the joining of a new broker or even the failing of one.
ZooKeeper also enables leadership elections among brokers for topic-partition pairs, to determine which broker should be the leader for a particular partition and which ones hold replicas of that data. In layman's terms, ZooKeeper manages and coordinates everything that the Kafka cluster does and provides a fail-safe for a rainy day.

5. Managed Solution Kafka: Let's look at the managed solutions available in the market for Apache Kafka. At this time, in 2021, there are a number of managed solution providers for Apache Kafka, one of them being Confluent. Apart from that, we also have Azure HDInsight, AWS Kinesis, Instaclustr, and more. There are many advantages to paying a managed service provider for Apache Kafka as opposed to using the open-source flavor of it. There are many hassles associated with open-source Kafka, and a lot of management and maintenance needs to be done regularly; there are so many operations that need to be performed just to make things run. With a managed service, you don't need to worry about performing any operations, hence the term NoOps. Furthermore, you don't have to download any files or manage any files locally; it's all done in the cloud. Lastly, you can be sure that you'll be running Kafka smoothly without any hitches. Think of using open source as a free car engine. It's powerful and can generate a lot of torque, but by itself, it can't really do anything, not until you construct the rest of the parts needed in order to make the engine useful. Managed solutions, by comparison, are like fully fledged cars ready to drive: no hassle of setting much of anything up, just turn a key and get going. It's that convenience that a lot of companies look at when considering managed solutions.

6. Quick Starting with Confluent Cloud: Kick-starting with Confluent Cloud. We're going to kick things off by getting some basic tasks done within Confluent. We're going to set up a cluster, then a topic, and then we're going to see how we can create producers and consumers that will send messages to and from that topic using the Confluent Cloud CLI. In order to get started with Confluent, the first thing we're going to have to do is go to the Confluent website, which is just confluent.io. Once there, you can see the option to get started for free. When you click on it, you'll be presented with your deployment options, and you can simply choose one deployment option and start free. Here you can fill out your full name, your company, email, country, pick a password, and just click on this button. When you do that, as it says here, you will receive $400 to spend within Confluent Cloud during your first 60 days. When you fill out this form and click Start Free, it'll send a link to your email where you will have to confirm ownership of and access to that email, from which point you can log in to Confluent. Since we already have an account with Confluent, we won't be doing this; instead, we'll be logging in. In order to do that, what we're going to have to do is go back to the Get Started page. It just says at the bottom here: I have an account, log in. There we go. Open this up in a new tab where we can log in. There we go. That's my email, I click Next; that's my password, I click Next. Now, this is the point where Confluent is going to start giving you a tutorial on how you are supposed to set up your clusters within Confluent Cloud. This is an excellent tutorial; if you want to take the time, I do recommend that you go through it and read up on it.
Though, for our projects, we will be providing all the materials that you will need in order to create them. There's this button right here that says Create Cluster; go ahead and click it. Now you'll be presented with three different options for creating a cluster: you have your Basic cluster, your Standard cluster, and your Dedicated one. Now, the Basic one is the one that we will be using, because as you can see, it has no base cost associated with it, and it has plenty of usage limits. You have ingress of 100 MB/s, egress of 100 MB/s, and storage of up to 5,000 GB. It can support high data flow; we won't be going anywhere near these numbers in our projects, so we can just select the Basic type. For the sake of knowledge, it's good to know that if you went for the Standard option, it would cost you $1.50 per hour, and it can support more storage; actually, you can take it up to unlimited storage. That's really the only benefit you're going to get out of taking Standard over Basic. When you go over to Dedicated, you can see that as you progress along this bar, you get more and more benefits, and then you reach a point where they tell you to get in touch so they can work something out for dedicated clusters. But since we're not going for either of those options, let's just get a Basic cluster going. Then you come to the region area, where they ask you which region you want to select. Over here you can see how Confluent is actually partnered with AWS, GCP, and Azure, and you can see that you have server selections from all of these: you can choose an AWS server, a GCP server, or an Azure server, whichever one you like, or whichever services you're already using. Ideally, if you want to connect Kafka to a service that you already have running, you should create the cluster in a region on a provider that you already use. So, for example, say you have an Azure IoT Hub and you want to use Confluent for the event streaming; then you should probably use Microsoft Azure to host your Confluent cluster. Generally, it makes things smoother and easier, there are fewer hassles in the connect process, and more than that, it's just easy to manage and it reduces the latency. For our projects, since we don't really have anything else going, we can just choose a region: Asia, Singapore, there we go. We don't need multi-zone availability; single zone works perfectly fine for us, so that's exactly the one we're going to go for. Now, when you come to this third page, it'll prompt you and ask you for a credit card. It will charge that credit card initially, but you will be refunded the amount; it needs a credit card to verify. That way, in case you go over your limit or you go beyond the 60 days, it sends you a notification to let you know: hey, if you want to keep using Confluent, let us know, we'll charge it to the card, and we'll keep whatever clusters and projects you have going. So this cluster is perfectly fine. We can call this cluster cluster_0; that works for us. And we can just launch this cluster. There we go. With this, we've set up our Confluent cluster. Now let's proceed with making a topic within our Kafka cluster, as well as logging in to our Confluent Cloud through the CLI. For that, let's head over to Confluent right now. From here, we can just go back to the Get Started link and just log in. There we go.
That is my email, so I will hit Next; that is my password, I will log in. There we go. Once you have a cluster set up, this is what you're going to see inside Confluent. It's going to show you that you have one cluster set up; it is live, it is working. So just go ahead and click on this cluster that you've made. Once you're inside, you will see the dashboard. You'll see that it gives you a bunch of options: you can pipe data in, configure a client if you want to, or set up the CLI. We're going to be setting up the CLI soon, but before that we're going to set up a topic. Let's go over to Topics right here and create a topic. We can give this topic any name; we can say that it holds subjects, why not? We could make it school courses, but we'll go with subjects, and we can just leave the default of six partitions; it won't really change much of anything, so we can just create it with its defaults. We head over to Messages, so we can produce a new message to this topic. You can see the sample value there, with Mountain View, CA and all the rest of it. Since we don't want to send any technical message for the time being (at this point, we're just trying to understand what topics are, what they do, and how messages are sent back and forth and how we'd see them), let's forget about all these technical bits in the value section and just give it any value. Say we give it the subject of English; that's always a fun subject. We can set the key to 1 and produce that. There we go: English, partition three, awesome. Then we can produce another one; we can say that we want another subject, math. And I guess the last one would be science; we say that we want our final message to be science. There we go. Now we have three messages set up. These are just three subjects; three messages under the topic of subjects, as we can see here. And we just want to pass these three messages along, so that we have science, math, and English as our subjects. Now that we've set up our topic and we have a few messages inside it, we should probably go ahead and set up our CLI so that we can then set up a consumer. The instructions are really simple. If you go over here onto the CLI setup page, all the instructions are right there in front of you and you can just copy these commands over. If you're working on a Mac, this should be fairly simple, as this command will work out of the box; there won't be any problems. However, if you are like me and are trying to set up the Confluent Cloud CLI on a Windows machine, you do need a Linux terminal in order to do this. If you are on Windows 11, getting an Ubuntu terminal is pretty easy. All you have to do is go to the Microsoft Store. From here, you can just search for Ubuntu, and you will get the Ubuntu app as well as Ubuntu 20.04 LTS and 18.04 LTS. Since I already have this installed, I just have to go ahead and open it. I do have to advise you, though, that if you are going to set up Ubuntu like this, you will need the Windows hypervisor enabled if you are working with an Intel machine, or, if you are working with an AMD machine, you need to enable SVM to get this to work. Okay, so now that we have the Ubuntu terminal open, it's probably time to run our CLI commands. However, I won't be running this curl command, since I've already run it and I don't want to fall into the problem of duplicate directories.
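
The CLI setup page shows the exact install command for your environment; a commonly documented form of that curl step for the standalone ccloud binary looks roughly like the sketch below. Treat it as an assumption and prefer whatever your own Setup CLI page shows.

    # Install the ccloud binary into /usr/local/bin (form commonly shown in Confluent docs)
    curl -L --http1.1 https://cnfl.io/ccloud-cli | sh -s -- -b /usr/local/bin

    # Afterwards, check that it is installed and up to date
    ccloud update
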
You should probably go ahead and run this first; it should give you a ccloud installation afterwards. When that happens, all you need to do is run a simple command that says ccloud update, and it'll check for updates to ccloud to see if there are any. There should probably be a major update for me, but I'm not going to be updating it right now. Yeah, there we go: the only update is a major version and I don't have time for that right now; I'm going to leave it for afterwards. I think it's time now to log in. Another thing we have to do from inside our CLI, or terminal, is gain access to our Confluent Cloud from in here. In order to do that, we need to have some means of communicating with our Confluent Cloud. To communicate with Confluent Cloud, you need to head over to Data Integration on the left side, and you'll see a little tab here that says API keys. Over here, you can create an API key. It'll give you two options here: you can either have global access or you can have granular access. For this, I would recommend that you go for global access. Once you have global access, you can go ahead and use these credentials to connect your client, the Ubuntu terminal. Now, we can give this almost any description; for the most part, I can just call this the SkillCurb key. If I want to, I can just download this and continue. It even gives you a little text file with all the details of your key. Now that I have it and my API key is made, I'm just going to go back over to my Ubuntu terminal. Now it's time to log in to our Confluent Cloud using the Ubuntu terminal. For that, I'm just going to run ccloud login --save. From here it's going to ask me for my Confluent credentials. I'm just going to put them in: my account email, support at SkillCurb. Once I put that in, it's going to ask me for my password; I'm just going to pause real quick and put that in. Okay, now that I have my password in, I'm just going to press Enter. It should log me in to my Confluent Cloud, and it has written those credentials to the netrc file. Now that I'm logged in to Confluent, it's time to see our cluster in the CLI, just so you can know that we are actually logged in; it's always nice to assure yourself that you can see the cluster. For that, all we need to do is write ccloud kafka cluster list to see what we can find. And as you can see, we only made one cluster, and we can see its ID right here. So this is just the one cluster we have, and we want to use this cluster by default; we don't want to have to put the ID in every single time we use it. So what we can do to make it our default cluster is just write ccloud kafka cluster use and then this ID, lkc-z2mps in my case. There we go: it has set our one Kafka cluster as the default and active cluster for our current working environment. Okay, so now that we have our default cluster set within the terminal, within the CLI, now is a good time to bring in the API key that we made so that we can link the two together. In order to link the API key, we write ccloud api-key store, and what we need to store here is the API key, which I'm just copying over here, then a space, and paste.
After that, it's going to ask me (it's a little difficult to see, but in the bottom corner there it's saying): what's your API secret? I'm copying over that API secret right now. And there we go: now it has stored the API secret, and it has stored the API key for me to use. Now that I have the API key in, I want to tell my CLI: okay, for a given resource, I want to use this API key by default. What we're essentially doing is making this API key the default for the cluster we just set. In order to do that, we just run ccloud api-key use; we need that little key from before, so we can put that in, and then we can add --resource; again, we can just take our resource ID from here and copy it over. And now we have set our API key as the active API key for the one cluster that we have set up. With this done, we're basically all set up to go ahead and start consuming, and even producing, messages to and from our Confluent Cloud. Now, in order to consume messages, I would recommend we open a new Ubuntu terminal so that we can see it separately. So we can just open one more of those and bring it to the side right here. Now, to consume from a Kafka cluster, or from Confluent in our case, what we need to do is write a command and specify the topic from which we want to consume. In our case that becomes ccloud kafka topic consume --from-beginning, where we specify that we want everything to come from the beginning, and we feed it subjects. There we go. Now you can see that we have English, math, and science. These are the subjects that we actually put into Kafka ourselves. So this is the exciting bit of Kafka that we've done: we have made messages, and now we can see those messages as a consumer. So far we've seen that we've connected to our Confluent cluster, we've set up messages in our topic, and we've consumed those messages. That's all well and good, but I think now we should move onwards from the consumer side over to the producer side and produce messages for our cluster. Before, we did this from the GUI within Confluent: we just opened up the page, went to the topic, and put the messages in on that really nice GUI website. But we should know how to do it from the CLI, and within the CLI we can see it working in real time. So, in order to do it from the CLI, the command we would use is ccloud kafka topic produce, with the key and delimiter options, where we use a colon as the delimiter, and then the value format; for our example, since we're working with strings, we can say that this is a string. And we just specify the topic, which in our case is subjects. When we press Enter, there we go: it says that it is starting the Kafka producer. From here, you can actually start producing messages, and we should see those messages arrive at the consumer on the right side of the screen the moment we produce them. For example, let's say that we want to add more subjects. We can go for geography. We produce geography, and there you go, it got added straight into our consumer. We say geography is good and all, but I also want to learn history. Fair enough; there we go, history showed up on the right side. Then we say, for example, that that's still not enough: I want to do international affairs, or global affairs, why not. We can make this one global affairs, and it just keeps showing up.
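
To recap, the whole CLI session from this lesson looks roughly like the following sketch. It is based on the older standalone ccloud binary used in the video; the newer unified Confluent CLI uses confluent in place of ccloud, and some flag names may differ, so treat the exact flags as assumptions and check ccloud help if anything does not match.

    # Log in and remember the credentials (written to ~/.netrc)
    ccloud login --save

    # List clusters, then make ours the default so we don't repeat the ID
    ccloud kafka cluster list
    ccloud kafka cluster use lkc-xxxxx          # use your own cluster ID

    # Store the API key/secret created in the Cloud UI and make it the default for that cluster
    ccloud api-key store YOUR_API_KEY YOUR_API_SECRET
    ccloud api-key use YOUR_API_KEY --resource lkc-xxxxx

    # Terminal 1: consume everything from the beginning of the "subjects" topic
    ccloud kafka topic consume --from-beginning subjects

    # Terminal 2: produce new messages to "subjects"
    # (the lesson also passes key, delimiter, and string value-format options here)
    ccloud kafka topic produce subjects
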
And that's just the beauty of Apache Kafka; that is why it's such a nice technology: the moment we produced a message, the consumer got the message. The moment an update was made, the consumer was alerted to it, the consumer knew about it, and it could consume the updated messages immediately. With this, we have now gotten a grip on the Confluent CLI. Now we know how it works: we can make a producer and we can make a consumer, all within the CLI. We've set up a cluster within our Confluent Cloud, and we've set up a topic. We have covered virtually all of the fundamentals when it comes to working with Apache Kafka. Later on, we'll also see how to work with streams, and we'll also take a look at what ksqlDB is.

7. Introduction to KSQLDB: Section three, ksqlDB. Introduction to ksqlDB. What is ksqlDB? Well, ksqlDB is a real-time event streaming database that is built on top of Apache Kafka and Kafka Streams. Now, what Kafka Streams is, is a topic that we'll be discussing later, but for now, all you need to understand is this: the way a stream is defined, at the really fundamental level, is the flow of data from a producer to a consumer. ksqlDB takes this flow and links it to a relational database model using SQL commands. There are many use cases for ksqlDB, and it's used a lot in the industry because of how simple and easy it is to use. How does ksqlDB work? ksqlDB separates its distributed computing layer from its storage layer, for which it employs Kafka. ksqlDB itself handles all the tasks associated with the data, whether that's filtering, processing, or streaming, and then it leaves the storage to Kafka to figure out and manage. How do we interact with ksqlDB? Well, this can be done in a number of ways: we can do it through REST APIs, we can do it straight through code, or we can do it in the ways that we have defined in this course. The first way that we're going to communicate with ksqlDB is using the ksqlDB CLI. The second way we are going to interact with ksqlDB is using the Confluent Cloud UI that we see on Confluent Cloud itself.

8. Using KSQLDB: Using ksqlDB. Okay, so to get started with using ksqlDB, the first thing we have to do is set up a cluster within which we will be working. When you go to your Confluent Cloud, into your Environments page, you should be seeing something like this; it may look a bit different if you have one or two clusters set up already. Since we don't have a cluster set up, we're just going to go into our default environment here and set up a cluster. It says Create cluster on my own; just go for a Basic cluster and begin the configuration. You can set the region to whatever you want it to be, then just set single zone and continue, because you'll see the base cost of this is $0 an hour, and for this demonstration we don't really need anything too fancy. So we're just going to go with that. Once that's done, you can just launch your cluster, and it should launch really quickly. It's just done setting up; we're creating the cluster, and provisioning might take a second or two. Once you're done, it's going to give you a screen that says: look, do you want to set up maybe a client or something else? You don't really want to do that right now, so you can just close it here, and it should bring you to the overview of your cluster. Once you're in this overview, you'll see a little menu on the left side here.
Now, in order to work with ksqlDB, it's kind of a given that you need a topic to play around with, to create tables and streams from, so that we have a means of getting information to and from, and a reference for where that information is going to come from. If we go to Topics, there should be, yep, no topics set up here. And of course, your cluster will be ready to use in one to two minutes; like I said, provisioning takes a bit of time. So what I'm going to do is pause the video here, come back when the cluster is completely provisioned, and then we're going to go ahead and set up a topic. Okay, so now the cluster has been provisioned and is allowing me to create a topic, so I'm going to go ahead and create a topic here. I can call this topic anything; for the sake of this demonstration, we could call it subjects, why not, or we can just call it, say, users, because with users you can have a bunch of information and it's generally easier to create a stream from it. So I'll just create this with the defaults, and it should have this users topic ready to go the moment you set it up. Now that you have your topic set up, it's time to actually go into ksqlDB and set up a ksqlDB app that we're going to use to manipulate this topic and actually work with it. So you go over to ksqlDB; yep, it'll give you the ability to create it with the tutorial, or you can just create the application yourself. We're just going to go ahead and create the application ourselves, give it global access, and continue. You can call it whatever you want; it doesn't make a blind bit of difference. I'm going to go with the smallest application size so that there's not too much cost on my Confluent Cloud, and I'm just going to launch it. Now, you should know at this point that setting up ksqlDB applications does take a bit more time; in fact, it takes a lot more time than provisioning a Confluent Kafka cluster. This process can take four to five minutes. So what I'm going to do now is pause the video here and come back when it is done provisioning the ksqlDB app. You might see, when it's done provisioning your ksqlDB app, that you encounter an error where the cluster says it cannot connect to this ksqlDB app. You don't need to panic; that's still a part of provisioning. As soon as the provisioning is complete, that error should go away on its own. Now that we have that cleared up, I'm going to pause the video now, let this provision, and come back when it's all done. Okay, so our ksqlDB app is now up and running; it has been provisioned. We can go inside and take a quick look at it, and you can see what a ksqlDB app looks like on the inside. You will have an editor here that will allow you to run your SQL queries, and next to it you will see the Flow tab, which will show you exactly how things are working: what streams are feeding into what tables, what tables are feeding into what streams, and so on and so forth. You can see your streams (the list of all the streams in this ksqlDB app), the list of all your tables, as well as your queries. Apart from that, you can see how your ksqlDB app is doing and how well it's performing. You can check out different settings for your ksqlDB app; you'd need to know some of these settings in case you want to connect to it.
For example, this endpoint here: you'll often be using an endpoint to connect to ksqlDB apps when you have to work with ksqlDB. Lastly, you'll have CLI instructions at the very end that will allow you to log in to ksqlDB and to your Confluent Cloud and use your ksqlDB app from the CLI. Now that we have this ksqlDB app up and running, it's time to generate some information into our topic and then make some streams and tables from that information. So let's go back over to our cluster overview. Now we're going to see how we can generate some information into our users topic. Confluent actually has a pretty neat feature if you want to generate information or messages into a topic: it has a bunch of pre-made data generators that will keep on sending information into a topic. To set up a data generator, what we're going to do is go into Data Integration and over to Connectors. When you come into Connectors here, you'll see all these connectors. We're looking for Datagen, so we're just going to search for the Datagen Source connector; this one is from Confluent itself, not a third party, so it's not going to run into any issues in setup. Now that you're here, you have to set up a bunch of things. The first thing, of course, is your Datagen Source connector name; fine, fair enough, it doesn't really matter. We use a Kafka API key to connect our connector. So do we have an API key set up? At this point, we don't really need to worry about that, because we have a little link here that will generate a Kafka API key and secret for us in real time. What we can do is click it, and there we go, it's given us an API key and a secret. We can just call this our datagen key, and it'll download that key, as it's done right there, and it's going to put in the credentials for you. You don't have to go through any trouble, and there's no risk of making a mistake typing them in. Now that we have that done, you have to move over to topics; our topic is users, right there. Then you have to pick the format of your output messages, so we can say: give us the values in JSON. Lastly, this is the important bit, where it says Datagen details: here you have to choose what template of information you want to feed into your topic. If you open this dropdown menu, you'll see a bunch of different datagen templates available to us. We have clickstream, clickstream codes, users, credit cards, inventory, orders, pageviews, product, purchases, ratings, and so on and so forth. The funny thing is, there is a datagen in here exactly for users. So what we're going to do is click on this datagen, and then there's the default interval for the messages that are sent, which is basically the gap between messages: it has sent one message, so when does it send the next one? By default, the value is 1000. So if you leave this completely empty, it's just going to keep sending messages every second, that is, every 1,000 milliseconds. I want to put 1000 here to show you that it's going to be sending messages every second, but even if I don't do this, it will still send messages every second. So if you plan on keeping it that way, you may leave this field empty. I'm putting the value in here just so that we know that we have defined this value of one second per message.
Now, after this is done, we can move over to the number of tasks that this connector is allowed to perform. With each task, the cost of running the connector goes up. You'll see that if you choose more than one task, you have to upgrade your Confluent account to make more tasks work. So we're not going to go into that; one task is plenty for this example. We're just going to make it one task, leave everything else, and just go Next. When you do this, it's going to tell you: look, for every task hour, it's going to charge you this much, and for every GB of data you use, it's going to charge you this much. You have $400 in your trial account in Confluent, so running this for a few minutes isn't going to be a problem whatsoever. So don't worry too much about the running cost of this and just launch. Next to the Launch button, of course, is a data preview that allows you to see what sort of information you'd be getting, but I'd like to show you that in the topic itself as opposed to showing it there. So now that you've made your connector, it's going to be provisioning for a few minutes, just like ksqlDB did. What I'm going to do is pause the video here, come back when it's provisioned, and then I'll show you the information it has fed into the topic. Okay, so setting up the connector for my data generator was taking longer than I had anticipated, and I assume it was because I had used that name before and I had a client that was already connected to that connector name. I had to go back, delete the data generator that I made, and make a new one called my data generator. Now that that's up and running, let's go inside and see what it's doing. If you go into my data generator, you will see that, well, so far it has produced 82 messages, and it has that one task going, which is running. I'm just going to close this little notification here that's telling me that my data generator is running. If you go into Settings, you will see all the configuration that you gave it. You can even come back and change this information if you need to, but I would not recommend it, because it could completely make your stream go topsy-turvy; all your configurations would have to change accordingly, and it would just be a mess. So I wouldn't recommend that you make any changes to this unless you absolutely need to. Now that we know how the data generator is working, let's go over to our topics. Yes, let's go over to our users topic. But before that, we can see that it now has a value in its production field, which is to say that this topic is actually receiving something. What that is, let's go inside and take a look. If we go inside, into Messages, we should see what messages have been produced here. As you come in, it shows you the messages that have so far been generated to this users topic, and there you go, you just received another one. If we open up this latest one, you can see that there are actually a bunch of values associated with a message. If we just look at the values here, we have the register time, the user ID, the region ID, and the gender. If we go into the header, we can see how the key was generated, the task ID, the current iteration, and last but not least the key for this message, which is user_3. We have this data being generated into our topic now, so we can go ahead into ksqlDB and set up a stream for this data. So let's go into the app.
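
The Cloud Console walks you through this configuration step by step, but it can help to see it in one place. The same connector expressed as a JSON config looks roughly like this (a sketch: property names follow the Confluent Cloud Datagen Source connector and may differ by connector version, and the name, key, and secret shown are placeholders):

    {
      "name": "my-data-generator",
      "connector.class": "DatagenSource",
      "kafka.api.key": "YOUR_API_KEY",
      "kafka.api.secret": "YOUR_API_SECRET",
      "kafka.topic": "users",
      "output.data.format": "JSON",
      "quickstart": "USERS",
      "max.interval": "1000",
      "tasks.max": "1"
    }
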
Now we can actually set up a stream from the editor here. I'm going to make the stream by hand so that I can explain each element of the stream as we go about making it. The first thing we have to write is CREATE STREAM. Actually, if you type CREATE, you will see that it gives you two options, and when you do CREATE STREAM, you can see that it gives you a bunch of options here. So if you just do CREATE STREAM, then we can call this stream whatever we want; we could call it users_stream. Now we have to give it the fields that it has to take. What we can do for that is go into Topics, users, Messages, and just take a basic look at what messages we have inside; pick any one of these messages and open it up. We can see we have the register time, user ID, region ID, and gender. These are going to go in as VARCHAR, and this one, the register time, is going to go into, I believe, a BIGINT. So I'm just going to put it in as a BIGINT and hope that it's not going to crash on me as I do that. I'm just going to open up a notepad here; we're just going to store this information in the notepad for when I go back to ksqlDB and set up my stream. Now we can do CREATE STREAM users_stream. We can give it registertime as a BIGINT, we can give it userid as VARCHAR, we can give it regionid, which is also VARCHAR, and lastly we can give it gender, which is also VARCHAR. When you've done this, you've defined what fields are going to go into the stream. Now you have to tell it what topic it is going to receive this information from and what the value format is going to be. So now we add WITH, KAFKA_TOPIC equals 'users', and we can say that the value format, VALUE_FORMAT, is going to be 'JSON'. Now that we've done that, we can just put a semicolon at the end of it, set auto.offset.reset to earliest so that it gets the values from the very beginning of the topic's existence, so to speak, and just run the query. There we go. If this has run for you, the status is success, which means that we did get our stream created. If it worked for you, you can go over to your Flow, and you should see a new stream set up here called users_stream. And if you open up users_stream, well, it's not showing us anything here, but if we go here and query the stream, like so, with earliest, we should get a few messages. Yep, there we go, we're actually getting a fair number of messages. But then again, it is receiving all the messages from the beginning, so it's going to take its time and give us all of these messages. If we want, we can stop this query at any point. There we go. Now we can just use anything that we get from here; all of this information, we have it in our stream, and it's flowing. Now we've set up a stream. If we go over to Data Integration and then to Clients, we can see that we actually have a connector producer and a consumer set up that is taking all of this information; if you go to Topics, yes, it is taking it from the users topic. So your stream is now functioning as both a producer and a consumer. And if you go over to your Stream Lineage, you'll see that your data generator is generating information into your users topic, which is then being consumed by these two consumers, your custom apps, if you will, that are consuming the information. Now that we've done this, we've set up a stream.
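
Written out in full, the statement assembled in this lesson, plus the query used to check it, looks like this (column names follow the users datagen template; the stream name is the one chosen in the lesson):

    -- Define a stream over the existing "users" topic
    CREATE STREAM users_stream (
      registertime BIGINT,
      userid       VARCHAR,
      regionid     VARCHAR,
      gender       VARCHAR
    ) WITH (
      KAFKA_TOPIC  = 'users',
      VALUE_FORMAT = 'JSON'
    );

    -- Read from the earliest offset and watch rows arrive as the connector produces them
    SET 'auto.offset.reset' = 'earliest';
    SELECT * FROM users_stream EMIT CHANGES;
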
Now we're going to go ahead and see how we can set up a table, so let's go ahead and do that. Let's go into our ksqlDB app again. Now it has a query set up for you, ready to go: SELECT * FROM users_stream EMIT CHANGES, which would show you everything that is inside the users stream. But we don't want to do that right now; we want to go ahead and create a table. So what we're going to do now is get rid of the query that they've given us, and we write CREATE TABLE. We can keep the naming along the same lines, so we can call this users_table, because we're using the same topic and we're not changing much of anything here. We can give it userid as VARCHAR, and because it's a table, if you know basic SQL, you know how it works: you have to associate a primary key with the tables you create. So we're going to set our userid as the PRIMARY KEY for this table, which is the identifier for each row that lets us know it's unique and hasn't been repeated. Then we can set registertime as BIGINT again, gender as VARCHAR, and regionid as VARCHAR, and we set the WITH clause the same as before: KAFKA_TOPIC equal to 'users', and the value format, again VALUE_FORMAT, becomes 'JSON'. There we go. Then we can just put the semicolon at the end and run the query. There we go, and it's given us a success message, which means that our table has been created. So now, if we go over to our Flow again, we will see that we have a users_table set up for us. And if we go into the table, we can go in here and query it. If we run the query, we see that it's already got all this information inside it that it has gotten from the topic. All we had to do was set up a table against the topic, and it generated this entire table for us, just off the bat; we didn't have to do any more configuration than that one SQL statement we had to write to set this table up. With this, you've effectively created a stream as well as a table. With that, you've effectively set up a ksqlDB app, as well as a stream, a table, and a topic to which data is being generated, too. Now you know how to set these elements up in Confluent Cloud and get going with them.

9. Overview of Kafka Streams: Section four, Kafka Streams. Overview of Kafka Streams. What is Kafka Streams? Well, if you look at Kafka Streams, it is basically just a Java library that helps us do what we've essentially been doing up to this point, but in a much easier way. Streams defines the complete flow from producer to topic to consumer, and it allows you to create that flow with significantly less code than you would need otherwise if you were going about it differently. Kafka Streams is extremely fault tolerant because of the distributed nature of Kafka clusters; we could also say that streams themselves have that distributed property. Consider the following example of how you create a consumer and a producer in a program. First you would have to create the consumer, then you would have to create the producer, subscribe to the topic, which in this case is widgets, and get your consumer record out of it, which in our case comes down to the color red. We just want a consumer to consume the message that the producer is going to send: that we have a widget that is of the color red. Now consider this code.
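
The code on the slide isn't captured in the transcript, but the Kafka Streams version of that widgets example looks roughly like the following sketch (Widget and widgetSerde are assumed to be defined elsewhere, for example a small POJO with a JSON serde, and the exact slide code may differ):

    // The "four lines" referred to below: the same widgets flow expressed with the Kafka Streams DSL.
    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, Widget> widgets = builder.stream("widgets", Consumed.with(Serdes.String(), widgetSerde));
    KStream<String, Widget> redWidgets = widgets.filter((key, widget) -> "red".equals(widget.getColour()));
    redWidgets.to("widgets-red", Produced.with(Serdes.String(), widgetSerde));
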
The funny thing is, you would think that these four lines are completely different from what you've just seen, or that maybe they complement the code you saw before. But actually they do the same thing: Streams abstracts away all of that programming, takes a lot of the housekeeping off your hands, and puts all of it into a stream. That is how Streams makes coding concise and easy to work with for developers.

Though it might be all well and good that Kafka Streams lets us write what would be longer code in a shorter form, why should we even use it? Well, at the end of the day, Kafka Streams does more than simply make code concise. Streams allows us to perform certain tasks on our data that we couldn't perform otherwise, such as filtering, joining and aggregation. These are tasks we want our streams to do because we want our data to become more meaningful for the end consumer. What we actually achieve through Kafka Streams is that we take the data being produced and consumed and give it a lot more meaning, so that once it is analysed and looked into, we can make proper sense of it and possibly even use it for dashboards and analytics later on.

10. Introduction to Kafka Connect: This section is an introduction to Kafka Connect. What is Kafka Connect? Kafka Connect is a component of Apache Kafka, and it's probably the most important component if you plan on using Kafka and its services alongside a setup you already have. Kafka Connect allows you to connect your Kafka cluster and your Kafka services to a variety of different services; it has connectors for most data services out there, whether that's Amazon, GCP or Hadoop. Kafka Connect provides streaming integration between Kafka and other services, and it allows you to connect both data sources and data sinks to your Kafka cluster. You can have data coming into your Kafka cluster from a data source via Kafka Connect, or you can have data going from your Kafka cluster to a data sink via Kafka Connect, that data sink of course being a service that is not Kafka. In our case, for the project we're going to build, we're going to set up a data sink in Amazon S3.

Now, what are the use cases of Kafka Connect? As we discussed previously, Kafka Connect lets you integrate with outside services that are already being used. You already have data in those services, and it would be pointless to create a brand new cluster and then populate it with all the information you already had; instead, it's easier to connect Kafka to what you currently have. The second thing is that Kafka Connect provides what is basically seamless integration with your services: for the most part you just set it up in the GUI, decide what form of data you want, and you don't really run into many configuration problems. The last major benefit of using Kafka Connect is that it opens up a lot of opportunities in the domain of data streaming: you can get that data stream going, run queries on it, turn it into streams, and turn it into much more meaningful data than before. So yes, there are a lot of uses for Kafka Connect, and we will be exploring it in our second project in this course.
11. Project1 Book Catalog: Overview: Okay, so to get this project started, let's get an overview of what we're going to be doing today. The first thing we're going to do is create a producer in Python that will produce our messages to our topic. The second thing is to go into our Confluent cluster and create the topic that will receive those messages from the producer. Once we have these two things set up, we'll publish messages to the topic so that it's nicely populated for when we create our consumer. After this, we'll set up our consumer, and then we'll be ready to start consuming messages from our Kafka topic within Confluent. It's going to be a really simple project, not too technical, so without further ado, let's get into it.

12. Project1 Book Catalog: Create Producer: Now it's time to see how to set up our producer in Python. Now that we have our entire configuration ready to go, it's time to create the producer that will produce messages to our Kafka cluster. For that, let's make a new file here; we can call it producer.py. There we go, we have a producer file ready to go. Let's just open up our producer. For this producer there's a bit of code that I've already written for you; I'm going to paste it in here and explain what it does. The top bit of the code parses the command line and parses the configuration. What that means is it figures out which cluster I have to send my messages to, whether I actually have an API key that allows me to do that, whether I have the correct secret, and so on; the config parser handles all of that. After that, we create an instance of a producer from that configuration. Next we're going to send messages, and when we send messages there's always a possibility that a message doesn't get delivered for one reason or another. So for that case we have a delivery callback that says: if you ran into a problem, print the error message and tell the user what the problem was; otherwise, if the message was sent, tell the client or the user "I produced your event to your topic with this key and this value". Now, down at the bottom here is our payload, lines 34 through 36 in my file, where our topic is books and we have a list of user IDs against a list of products. How are we going to create messages? Right here, lines 38 to 44, where we loop over a range of ten, and for each of those ten messages we randomly allocate a user ID to a book. For example, it could be Jack and Twilight, Michael and The Lord of the Rings, or Hannah and Eragon, and some pairings might even come up twice or three times; it's completely random, but we'll get ten messages out of it. This is a really simple, really light producer that's going to produce ten messages to the topic in our Kafka cluster. Let's save this file and close it.

13. Project1 Book Catalog: Create Topic: Okay, so now that we have our producer set up, it's time to go and create the topic that we said we'd be producing our messages to.
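Before we click over to Confluent Cloud, here is a condensed sketch of the kind of producer just described, using the confluent_kafka library that the project's configuration is written for. Treat it as an approximation rather than a verbatim copy of the file built in the video: the user and book lists are illustrative, and it assumes your config.ini keeps the connection settings in a [default] section.

```python
#!/usr/bin/env python
from random import choice
from argparse import ArgumentParser, FileType
from configparser import ConfigParser
from confluent_kafka import Producer

if __name__ == "__main__":
    # Parse the command line: the only argument is the path to the config file
    # (bootstrap servers, security settings, cluster API key and secret).
    parser = ArgumentParser()
    parser.add_argument("config_file", type=FileType("r"))
    args = parser.parse_args()

    # Parse the configuration; assumes the settings live under [default].
    config_parser = ConfigParser()
    config_parser.read_file(args.config_file)
    config = dict(config_parser["default"])

    # Create the Producer instance from that configuration.
    producer = Producer(config)

    # Delivery callback: reports success or failure for every message sent.
    def delivery_callback(err, msg):
        if err:
            print(f"ERROR: Message failed delivery: {err}")
        else:
            print(f"Produced event to topic {msg.topic()}: "
                  f"key = {msg.key().decode('utf-8')} value = {msg.value().decode('utf-8')}")

    # The payload: the topic plus illustrative user IDs (keys) and books (values).
    topic = "books"
    user_ids = ["jack", "hannah", "michael"]
    products = ["twilight", "lord_of_the_rings", "eragon", "harry_potter", "animal_farm"]

    # Ten messages, each pairing a random user with a random book.
    for _ in range(10):
        producer.produce(topic, choice(products), choice(user_ids), callback=delivery_callback)

    # Wait for outstanding deliveries and their callbacks before exiting.
    producer.poll(10000)
    producer.flush()
```

You would run it with the path to the configuration file as its only argument, which is exactly what we do in the "Produce Messages" lesson below.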
So I'm just going to open up a new window here and head over to Confluent to create the topic we'll be sending our messages to. Let's log into our Confluent Cloud. There we go. Let's head over into our cluster, and from in here we can go into Topics on the left side, where we should see our topics. As you can see, at the moment we do not have a topic for books, so we need to add that topic right away. We can call it books, as we named it in our program. Once we hit create, boom, it's created; there's no lag in that creation. With that set up, all we need to do now is move forward.

14. Project1 Book Catalog: Produce Messages: Now that our topic is ready and can receive messages, it's time to produce messages to it. Producing messages to your Kafka cluster using the method we're using today is actually pretty simple. We move into our project directory, and from there we activate our virtual environment with its bin/activate script. Now, there's one extra step we have to do here: because we created our files on Windows, Unix doesn't really like them as they are, so we have to make a small conversion using the dos2unix command. We run dos2unix on producer.py and on config.ini. Once we've done that, we can chmod the producer file with u+x, and now we can run our producer. There we go, it has produced events to the topic books: Jack took Eragon, Jack took the Harry Potter book, Michael took Animal Farm, and so on. We have ten keys and ten values, ten messages that we have sent to our topic within our Kafka cluster.

15. Project1 Book Catalog: Create Consumer: Now that we've produced messages to our topic and it's populated, it's time to set up a consumer to consume those messages. Setting up a consumer is really simple. All we have to do is move over into our Python project and create a new file; we can call this file consumer.py. Once you've done that, just head inside and paste in the piece of code that you'll find in the description. This is really simple, really basic consumption of messages from the Kafka cluster. The first thing we do, as with the producer, is parse the configuration and the command line. After that we create an instance of a consumer, and as we create it, we want it to pick up messages from the very top, from the very beginning of all the messages. We set the offset, basically: we define a reset_offset function and take the offset all the way back to the beginning, OFFSET_BEGINNING. After this we subscribe to the topic, which is books, of course, the same as for the producer, and we pass the reset function as the on_assign callback, so as we subscribe we go all the way back. What happens next is a continuous process that runs until it's interrupted, where we constantly poll and check whether there's a message waiting for us. If there is, we print "consumed event from topic" and so on; if there isn't, we just print a waiting message; and if we run into an error, we print the error message. The last bit is the keyboard interrupt, which says that if you want to stop this, you can Ctrl+C and it will end. This is about as basic and bare-bones a consumer as you can get.
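For reference, here is a condensed sketch of a consumer along those lines, again using confluent_kafka. It is an approximation of the file described in the video, not a verbatim copy; in particular, the fallback group.id is just a placeholder in case your config file doesn't already supply one.

```python
#!/usr/bin/env python
from argparse import ArgumentParser, FileType
from configparser import ConfigParser
from confluent_kafka import Consumer, OFFSET_BEGINNING

if __name__ == "__main__":
    # Parse the command line and the configuration file, exactly as in the producer.
    parser = ArgumentParser()
    parser.add_argument("config_file", type=FileType("r"))
    args = parser.parse_args()

    config_parser = ConfigParser()
    config_parser.read_file(args.config_file)
    config = dict(config_parser["default"])
    # A consumer needs a group.id; your config file may already provide one --
    # this fallback name is only a placeholder.
    config.setdefault("group.id", "book-catalog-consumer")

    consumer = Consumer(config)

    # on_assign callback: push every assigned partition back to the beginning
    # so the consumer replays the topic from its very first message.
    def reset_offset(consumer, partitions):
        for p in partitions:
            p.offset = OFFSET_BEGINNING
        consumer.assign(partitions)

    # Subscribe to the topic; reset_offset runs when partitions are assigned.
    topic = "books"
    consumer.subscribe([topic], on_assign=reset_offset)

    # Poll in a loop until interrupted with Ctrl+C.
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                print("Waiting...")
            elif msg.error():
                print(f"ERROR: {msg.error()}")
            else:
                print(f"Consumed event from topic {msg.topic()}: "
                      f"key = {msg.key().decode('utf-8')} value = {msg.value().decode('utf-8')}")
    except KeyboardInterrupt:
        pass
    finally:
        # Leave the consumer group and commit final offsets.
        consumer.close()
```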
And with that we've also set up our topic subscription within the consumer, so we're killing two birds with one stone there.

16. Project1 Book Catalog: Consume Messages: Now that our consumer is set up and our topic has messages in it, it's time to consume those messages and see what consumption looks like in real time. Producing and consuming messages in real time is a fairly simple process. Again, because we made our consumer.py file on Windows, we have to convert it for Unix to read it, so we run dos2unix on consumer.py. Now that that's done, we can chmod our consumer and then run consumer.py with the config. There we go: our consumer picked up all the messages we sent from our producer earlier and has read them. Another cool thing is that we can do this in real time. Say, for example, I run my producer off to the side: the moment those messages are produced, the consumer gets them and consumes them in the exact order they were produced, and if we run the producer again, we see it again. But that's not the most interesting part. The most interesting thing is that if you produce messages in real time, you still see them and you can still read them. Let's head over to our Confluent Cloud console, move this window to the side, go over to Topics and into books. As we can see, there's some production going on in here; it shows that we have activity. If we go to produce a new message, we can give it any value, say "back to the future" (even though that's not a book, it's a movie, but I digress), and leave the key as it is. The moment you produce that message, boom, you get it in your consumer in real time: an event with the value "back to the future" and the key 18. And with this, you have successfully created an application that both produces and consumes events, or messages, using Confluent and Apache Kafka.

17. Project2 PacMan: Overview: So here's a brief overview of what we're going to see in our application that makes it a streaming application. The first thing we're going to have is two topics, user_game and user_losses, and these two topics will be the basis on which we create two streams. These two streams feed their information into two tables. The user_losses stream feeds its information into the losses-per-user table, which records how many times you have lost all of your lives in the game. The user_game stream feeds its information into the stats-per-user table, which records your highest score and how many times you have actually played the game and lost; that information about how many times you've lost is fed through the losses-per-user table into the stats-per-user table. Finally, the stats-per-user table is queried to generate the scoreboard, which shows you not only your score but the scores of everybody who has played the game, so you can compare your stats with everybody else's.

18. Project2 PacMan: Configure and Run: Since I use Windows Subsystem for Linux, I can go straight into my project directory on Windows and see the files as they are.
So, looking at our streaming Pac-Man application, we can see that we have a bunch of files and folders in here. The first thing we have to do is go into our config file, which should be demo.cfg. If we open it up, it should have our AWS access key and our AWS secret key right there. What this is, is the access you define to your AWS account. Where would you find your AWS access key and AWS secret key? Let's go ahead and see. We minimise this, come over to AWS and search for IAM, then head over to it. Once you're there, you can look at the users in your access management, so go over to Users, and from here you can add a user. When you add a user, you give it a username, then choose access key instead of password, which will assign it an access key ID and a secret access key. So I add a new user, call it skillcurve, and go next. You can add it to a user group for admins, or if you don't have that set up, as you wouldn't on a new account, you can create a group, assign it administrator access and call it the admins user group. Select it, then move on to tags; you can add any key here. I'll just call it filter key, that's honestly all I need, and assign it the same value. Go next, and it will create the user. The moment it does, it gives you a success page and lets you download the credentials for your user as a CSV. Apart from that, it also shows you the access key ID and the secret access key; take note of these, because you will need them for this project. I'd recommend you download the CSV: when you open it up, it gives you exactly what you need, your access key and your secret access key, which are both things you need to feed into your configuration. I already have an account set up in my configuration, so I'm not going to feed this new access key and secret access key into it; I'm just going to close this, keep what I have, and move on to the next thing.

Now that that's done, let's go back over to our folder and our files. Our demo.cfg is good, so we close that and come back to the streaming Pac-Man project. The next thing we have to look at is our stack-configs folder. We need to make sure there is nothing inside the stack-configs folder, because this configuration is what Terraform is going to use to set the entire application up. The first time you run the start script, as I'm about to, you'll see that it creates a lot of things for you, and then the next time you run it, Terraform actually provisions and creates all the resources. Now we can go over to our terminal, cd into the streaming Pac-Man project and run our start script. If you haven't already done this, you first have to chmod u+x your start.sh, then do the same for stop.sh, and the same for the create-ksqldb-app script; these are the three shell files you need to give yourself permission to run. Now that you've done that, all you need to do is run the start script, ./start.sh, and there we go, it's off and running.
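While the script runs, one optional aside that is not part of the original project: if you want to confirm that the access key and secret you pasted into demo.cfg are actually valid before Terraform tries to use them, a short boto3 check does the job. It assumes boto3 is installed (pip install boto3), and the key values shown are placeholders, not real credentials.

```python
import boto3

# Paste the same values you put into demo.cfg -- these placeholders are not real keys.
ACCESS_KEY = "AKIA...YOUR_ACCESS_KEY"
SECRET_KEY = "YOUR_SECRET_ACCESS_KEY"

# STS GetCallerIdentity succeeds for any valid credential pair and tells you
# which AWS account and IAM user the keys belong to.
sts = boto3.client(
    "sts",
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])
```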
It's going to give you a prompt saying, look, this is going to create resources that will cost something, so are you sure you want to proceed? And because we have to, we say yes, we do want to proceed. While the script is running, I'm going to go back here and check our S3 page, because once the script has finished running we're actually going to see an S3 bucket created here. I'm also going to go to the Environments page on our Confluent Cloud console, and it's already set up; that's actually kind of neat, it has already created a Confluent environment for you. We can see that right here: it's creating the Confluent Cloud stack for the environment and setting the Kafka cluster as the active cluster for that environment ID. We can actually check this: open up a new terminal and run ccloud environment list, and you can see that the environment it just created is the streaming Pac-Man environment for this application. So I'm going to let this run, and I'll be back when the start script has finished running; this takes around five to ten minutes.

Okay, so the start script has finished running. Let me take you all the way up to the top here, from where we started. What it did was create a user and give it all the permissions it needs to create all the resources required to get the application going. It did everything it had to do properly until it came to the KSQL part, where it crashed. I've set the project up so that it's supposed to crash here. Let's see in the next section what we need to do to make this work properly.

19. Project2 PacMan: Setting up Pre Requisites: Before we start this project, I'd like to give a huge shout-out to Ricardo Ferreira for the original Pac-Man game streaming application. It's his application that we're going to be running and using today, which I have tweaked and modified a bit so that we have to perform certain tasks in our Confluent Cloud ourselves in order to get the whole thing working. Now that we've got that aside, it's time to see what we actually need to get the project going today. The first thing we're going to need is a Confluent Cloud account. If you do not have one, I'd recommend you go and get a trial account, which should give you about $400 of credit, more than enough to make this whole project work. The second thing you're going to need is an Amazon AWS account; again, you can set up a trial account, which for the short duration we need it is plenty for the project we're making today. Once we're done with the project, you should see one S3 bucket created here for you, and then we will be creating another S3 bucket for our data sink. The last thing you're going to need is Terraform. Terraform is what is going to create the application for us and provision the resources we need in AWS S3. To set up Terraform on your Linux machine, you just have to run these three commands; if you haven't already, I'd recommend you go ahead and install Terraform onto your local system. Besides this, we just need two more things. We need to check that we have Java, so we run the java -version command; we need Java 11 or greater for this project, so please ensure that your JDK version is 11 or higher.
The second thing we need is Maven, so we check our Maven version, and yes, we're running Apache Maven 3.6.3. If you've been following this tutorial up to this point, you should have the confluent and ccloud CLIs installed: just run ccloud, there we go, and if we look at confluent, there we go. If you haven't already logged into your ccloud, I'd recommend you do that now with ccloud login --save. What the --save flag does is save your credentials in the .netrc file. When you run ccloud login, it prompts you to put in your credentials, or if you already have them saved in the .netrc file, it just logs you in. Now that we have our environment completely set up, let's go ahead and start making our project.

20. Project2 PacMan: Setting up KSQLDB: Okay, so now that we've run our script, it actually hasn't done everything it was supposed to do; it has left some of the work to us, and we're going to go ahead and do those little bits of work so we can move on and get the thing actually working. The bit we have to do now can be done from the Confluent Cloud GUI. We go into the new environment it created for us, then into the cluster, and from within the cluster we should see a ksqlDB app already set up and created for us. If we go into this ksqlDB app, it shows that we don't have any streams going at the moment, just the standard one; there's no real flow, just the one default stream you're given with every ksqlDB app, and no tables. Furthermore, if we go into Topics, we'll see that we have no topics either, just the one processing-log topic that you get. So I'm going to add a topic here and call it user_game, and create it with the defaults. Then I go back to Topics and create another topic called user_losses. Now I have the two topics I said we would have at the beginning, in the overview of this project.

Now that we have these two topics, it's time to make the streams and tables that correlate with them. So I go back into ksqlDB, and in the little editor here I'm going to run some KSQL statements. Those statements are available right here in the statements.sql file. If you open it up, you'll see the five statements that create two streams, user game and user losses, and three tables: losses per user, stats per user and summary stats. We can copy all of this, come back into ksqlDB, paste it right here and run the query. There we go, it's run the query and created the streams and the tables for us. So if we go back into our Flow tab, we should see, there we go, that we have the user losses stream feeding the losses-per-user table, then the user game stream, which feeds the stats-per-user table and also feeds a summary-stats table. Summary stats is just an extra table that gives us a general overview of what's going on in the game at any given time. If we go to Streams we'll see the two streams, and under Tables we'll see the three tables.

Now that we've done this, it's time to give our service account permission to manipulate the topics however it needs to. For that, we go back into our Ubuntu terminal. First we select our environment: we run ccloud environment list, then ccloud environment use followed by the environment ID we saw earlier in the list.
Then we run ccloud kafka cluster list and ccloud kafka cluster use followed by the cluster ID from that list (the lkc- value). Now that we're in there, we have to run ccloud kafka acl create --allow, and for that we need a service account. So before anything else we run ccloud service-account list, which shows us the ID of the service account the script created. Then we run ccloud kafka acl create --allow --service-account followed by that ID, with --operation READ, --operation WRITE and --operation DESCRIBE, and --topic user_game, and it sets up exactly that for us. Then we run the same command for --topic user_losses, and that's done too. Now that we've done everything we had to do on the ksqlDB side, we can run the application again and get the game started.

21. Project2 PacMan: Run Project and Sink Data: Now let's get the game started by running the start script again, ./start.sh. It's going to start using Terraform to create the entire project. Once Terraform has been successfully initialised, it goes ahead and creates the project. Again, this is a process that takes its time, so I'm going to pause the video here and come back when it's done. There we go. The start script has finished running, and it ends by giving you an output with a URL that says your Pac-Man game is at this address, which we can open in a new tab. Before this, we didn't see any buckets in our S3, and as I said, we'd be seeing one the moment this was over. If we refresh, there we go: we have one bucket here whose name starts with streaming-pacman, which was created for us by the application and which basically holds the streaming Pac-Man website; you can see all the website files right here. I'm just going to go back to Amazon S3 and then to the game. I'll put in a name, we can call it skill curve, and hit play. That takes us into the Pac-Man game we're using for our event streaming. I'm going to build up a bit of a score and then go ahead and die, so that we generate all the possible events and can see exactly what they look like on our dashboard.

There we go, that's our scoreboard. You might notice that this score is actually pretty big even though you just saw me launch the application; that's because I paused the video and played the game for a while to generate a bit more score and get a real flow of messages going, so that we can see how we would store all of that in our S3 using Kafka Connect. So if we come to our S3 now, you can of course see that we have our bucket here. But if we come back over to our Kafka cluster, you'll see Data Integration and a section here called Connectors. What you can do here is set up a connector to any data sink, or to any place you want to source information from if you want to bring data into your Kafka cluster. In our case, we're going to be sending data out to a sink, so we're going to use this one here called Amazon S3 Sink. When we do that, it asks what you actually want to store. I'm going to give it all my topics, except the first one, the default processing-log topic it gave us; we don't want that one, but apart from that we want all of them. Then we just continue.
Now it asks how you want your S3 sink connector to access the cluster: do you want to use an existing API key you have for this cluster, or choose a different kind of access? In our case, we're just going to create our API credentials right here. When you do that, it downloads the key file for you and sets it up by default, so you don't have to do anything, and you can name it whatever you like; I'll call it pacman-s3-sink. Then we continue. Now, this next part is actually important: you need your Amazon access key ID, your Amazon secret access key, and your Amazon S3 bucket name. So I'm going to go over to my S3. Pay attention to the region: it says us-west-2, and if I make my other S3 bucket in a different region, it's probably going to give me an error. It's good practice to keep all the elements of a project in the same region. So I'm going to create a bucket, call it my-pacman-sink, and put it in us-west-2. I don't have to change anything else here, I can just create the bucket. After it creates it, there we go, my-pacman-sink is there and ready to go. I'll copy the name, my-pacman-sink, and bring it over here. Your Amazon access key ID and secret access key are the same ones you used in the configuration file, so you can go into your config, pick the Amazon access key and secret out of demo.cfg, copy the key over and paste it here, then copy the secret over and paste that there. Then we continue.

Now it asks what format you want your data to come in as: Avro, Protobuf, JSON and so on. I'm going to make the input JSON, because that's the easiest and it's very neatly defined, so it's really easy to see what's inside your file, and the output message format will also be JSON. Once you've chosen those two, it asks how regularly you want your S3 bucket to be populated: do you want it done by the hour or every day? For this example, I'm going to have it done by the hour, because we're not going to keep it running for too long; we'll have it set up for a few minutes, see how the data goes in, and that will be the end of it. You don't need to go into the advanced configuration. If you wanted to, you could change many properties, the descriptions of which are all here; hover over the little "i" and it tells you exactly what each one is. I wouldn't mess around with it too much unless I had specific configuration requirements for my project, and since we don't, we're just going to ignore it and move on. At this stage it asks, in terms of sizing, how many tasks you want the connector to run. If you define more than one task it asks you to upgrade, and each task is charged, about $0.034 per hour in my case, so we're just going to leave this at one and continue. Finally, it gives you a summary of everything you've asked for and checks that you're sure. You can set your connector name here; I'm not going to change it, I don't see the point.
And we're just going to launch. Keep in mind the connector pricing: for each task you're charged this much per hour, and then for your usage you're charged this much per hour on top, which adds up. When you launch it, you can see exactly how much you're going to be charged, so keep it in mind; it doesn't matter for now since it's a trial account, but if you're actually using Confluent with your own money, this can build up pretty quickly. Now our connector is being provisioned, and of course that process takes time, so I'm going to go back to the cluster and its connectors page, pause the video here, and come back when it's all done.

There we go, our connector is now up and running. There's a small message here that says your S3 sink connector 0 is now running, with a status of Running, and if we go in, yes, it says running. If you run into any problems setting up this connector, you'd see the error in this area, where it would say, look, we've run into this particular problem, so you can figure it out. Now, we have this connector set up, but what has it actually sent to our S3 sink? If we go into our S3 sink, at the moment it has nothing; in a few minutes it's actually going to send something. It has its one task running, and so far it hasn't reported much of anything. So I'll play the game a bit more, pause the video, come back, and then we'll see what it has actually sent to our S3 data sink.

Okay, so I played the game a bit on a different account. If we go over to the scoreboard, yep, it shows a second account with a score of 1,000 next to the old one. If we go back over to our S3 bucket now, we'll see that it has a topics folder inside, and inside that, a folder for each topic it has received information for. If we go into user_game, it's broken down by year, month, day and hour, and for this hour it has received these JSON files. Let's open one of them and see what's actually inside it. In my case it gets downloaded first, and when we open it up we can see all the information it has shared with us in this JSON: there was this user playing the game, and they generated this much score. This is how we can use a data sink to store all the information that is consumed from our Kafka cluster: it all gets saved, and if we need to query it or check on it later, it's all available to us here in our data sink.

Now that we've done this, we know how the sink works and we've seen it running. This is actually using up a lot of resources, so now I'm going to show you how to destroy all the resources you have created. The first thing you want to do is go into your connector and say, look, I do not want this connector running anymore, so delete it: you copy the name into the connector-name field and confirm, and your S3 sink connector is deleted, gone; you no longer have this connector available to you. Then, if you go back to your Ubuntu terminal over here, you just have to run the stop script, ./stop.sh. It tells you that it's going to destroy all the resources it made; you say yes, okay, go ahead and do that, and it proceeds to destroy all the resources it created.
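While the teardown runs, one last optional aside that isn't shown in the video: the check we did by clicking through the S3 console could also have been done from code. A short boto3 listing like the sketch below would print the objects the sink connector wrote. The bucket name is the my-pacman-sink bucket we created earlier, and the topics/user_game/ prefix assumes the connector's default "topics" directory with time-based partitioning, so adjust both to match your own settings.

```python
import boto3

# Uses your default AWS credentials and region (e.g. from `aws configure`).
s3 = boto3.client("s3")

# The S3 sink writes under <topics.dir>/<topic>/...; with the defaults used in this
# project that means keys like topics/user_game/year=.../month=.../day=.../hour=.../*.json
response = s3.list_objects_v2(Bucket="my-pacman-sink", Prefix="topics/user_game/")

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```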
This is a process that takes its time to run as well, so I'm just going to pause the video here and come back when it's all done. There we go: it has now successfully run the stop script and destroyed all the resources it allocated to make this project. If we go over to our S3 now, we'll find that the bucket the script created for the project is gone, and we can delete the sink bucket we made ourselves by first emptying it out; you type "permanently delete" into the confirmation field, and once that's done, you can exit that screen and delete the bucket itself. For that, you just copy the bucket name and paste it in here. And that's done. And if you go over to your Confluent Cloud console and into Environments, you'll find that the environment is completely gone. With that, you have successfully run and completed project two. Well done.