Transcripts
1. Introduction to Course: Hi, welcome to the Introduction to Confluent Cloud course. My name is Shannon, and I look forward to our time together in this course. For the duration of this course, you'll be focusing on the following. First, we're going to teach you in-demand event streaming skills in Confluent through hands-on lab exercises. We will also teach you the use of the Confluent Cloud environment, which is extremely easy to use and very functional. The key concepts in this course are explained with appropriate visuals so that they cement easily and you understand them quickly. We're also going to have a capstone project at the end of the course. You will find regular quizzes throughout the course to test your knowledge of what you have learned, as well as to cement some fundamentals as we go. Meet our team of instructors, who created and designed this course. Together, we have just over ten years of experience in practical implementations of technology and in teaching technical courses at the university level. Our team members have specializations in the areas of information technology, software engineering, data science, and design. Here is the learning roadmap we will be following for this course. We're going to start off with Apache Kafka and teach you everything you need to know about it in order to work with Confluent. The next step is actually showing you Confluent, what it's all about, and how to set it up. After that, we're going to take a look at ksqlDB, how we use it, and what its applications are. Moving on from there, we're going to explore streams: what they are, the libraries involved, and how you can work with them. Lastly, we're going to look at Kafka Connect, how we can work with it, and how we can link our projects to external systems. After we're done with all these concepts, we're going to go ahead and create two projects. Kafka is a data streaming service that allows a producer to write messages to a topic, from which they are consumed by a consumer. Though this concept is extremely simple, in practice things can get a little confusing along the way. That's why we try to explain the concepts the way we do, with visual explanations. Confluent is a cloud-based event streaming service. It provides the environment, in the form of clusters, for event streaming, and it provides very nice visualization of operations. That said, you're going to be seeing this screen a lot for the duration of this course. Beyond the theory and explanations, parts of this course are hands-on. We're going to be working a lot in the terminal, seeing how we can run our Confluent Cloud and Kafka using the CLI. And we're going to set up a project where we build something and see how it produces information into our event streaming pipeline. Event streaming is in big demand at Fortune 500 companies today; these are just some of the companies that need event streaming experts to fill their day-to-day operations and critical software development needs. We look forward to having you join our course, and we promise that this course will help you build your foundation of event streaming and Confluent knowledge. This will help you make your resume stand out and command a competitive salary in the marketplace.
2. What is Streaming Analytics: Hello, and welcome to this course, where I will be teaching you how to use Confluent in your event streaming applications. In this first section, we'll be discussing Apache Kafka and everything you need to know about it in order to understand and use Confluent as a service. First, we need to understand what the buzzword "streaming analytics" even means and why it's so popular these days. What is streaming analytics? In layman's terms, it's the constant processing of data as it's generated and received in real time. This data is found in most places. For example, the most common form of data analyzed in streaming analytics is social media information. The moment you perform an action on social media, that event is streamed in real time to be processed, in order to show you content that is more relevant to your likes as opposed to things you wouldn't like to see. In the past, and actually in many places still today, data was sent in batches rather than as a continuous stream. Data was collected up to a time interval, and when the end of the interval arrived, the data was sent as one large chunk to be processed. Though this process is much cheaper than real-time data streaming, it does not give real-time insights and cannot be used for instantaneous actions or real-time graph generation. Streaming analytics was created to solve this problem with batch processing, so that real-time decisions could be made and actions could be taken. Now let's look at how streaming analytics works. The first part of event streaming is the event that's produced. This event could be produced by some device or application, or any other producer for that matter; it's essentially just a message. That message is sent to a hub, which queues the message behind other messages and makes it ingestible by the streaming analytics service. Within the streaming analytics phase, it's decided what to do with that message: whether to present it on a graph on some dashboard, to send a command to perform a certain action, or simply to store the message. That's all decided within the stream analytics part. Lastly, the message is sent onwards from the stream analytics service to be consumed by the end consumer. The process may seem a bit technical, and you might be wondering at this point why even go through such a process in the first place. There's a fair number of advantages to using streaming analytics. Using data visualization, even someone who can't read the raw numbers can make sense of how things are progressing. Furthermore, it gives you a competitive edge, as you get real-time insights and can make moves faster than your competition may even realize. It also gives you a much deeper and clearer insight into operations, as you know what operations are being performed in real time. The ability to get information as soon as it's produced can both create opportunities and identify lost ones in the business world, and more. Lastly, identifying trends with streaming analytics can help mitigate losses by alerting the company to the right things at the right time. A lot of companies in the market today employ streaming analytics, with even more adopting it each and every day. Most organizations are discovering their need for real-time, continuous data and how much they stand to gain from it. It may not be apparent on the surface, but many industries are natural candidates for data streaming. For example, consider the following industries. In finance, it's critical to know if fraud is being committed in real time; otherwise it could lead to a huge loss for the company. Furthermore, real-time market analysis is a pivotal element in business decisions. In e-commerce, let's look at the father of all e-commerce sites, Amazon. Amazon deals with an absolutely huge amount of data, and it uses that data to continually adjust the deals and prices of its products to optimize its sales. It also helps with product recommendations for customers, as well as with optimizing the logistics of the entire operation. Amazon actually has its own product for this, called Amazon Kinesis. In sports, let's take a look at Formula One. Each Formula One car produces gigabytes of data through the sheer number of instruments installed on it. During the race alone, teams collect a tremendous amount of data that they need in order to make real-time decisions that can make or break the race for them. The same logic also applies to online gaming, where technical details and player data are used to optimize the game's performance and fine-tune the overall experience.
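Before we move on to Kafka itself, here is a toy Python sketch of the produce, queue, process, consume flow described earlier, with an in-memory queue standing in for the ingestion hub. It is only an illustration of the idea and nothing more; a real platform like Kafka does this durably, at scale, and across machines.

    import queue

    hub = queue.Queue()          # stands in for the ingestion hub
    results = queue.Queue()      # what the end consumer will read

    def produce(event):
        hub.put(event)           # a device or app emits an event

    def process():
        # the "streaming analytics" step: decide what to do with each message
        while not hub.empty():
            event = hub.get()
            if event["type"] == "like":
                results.put({"action": "recommend", "item": event["item"]})

    def consume():
        while not results.empty():
            print("end consumer received:", results.get())

    produce({"type": "like", "item": "cooking videos"})
    produce({"type": "scroll", "item": "ad"})
    process()
    consume()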
3. What is Apache Kafka: When you hear the term streaming analytics, you will find that it is mostly associated with Apache Kafka. That's because Apache Kafka is the most mainstream data streaming service, the one most companies use for their data streaming requirements. Let's explore what Apache Kafka is. By the textbook definition, Apache Kafka is a data streaming service. It is open source and used by thousands of companies for streaming analytics. In a simpler definition, it provides the service we discussed back when we looked at that flow diagram of how streaming analytics works. What Apache Kafka does is take events, or messages, from a producer. That producer could be an IoT device or a smartphone, and it pushes the messages to Apache Kafka. Apache Kafka then manages them in such a way that they can be given to many consumers. It eliminates the need for individual data streams from each producer to each consumer, and essentially makes it so that each producer and consumer only has a single data stream that is unaffected by external factors. Let's look at an example for a better understanding. Say that you have two producers and three consumers. Without Apache Kafka, each producer needs a stream to each consumer; with two producers and three consumers, that gives us a total of six streams to manage. These streams are coupled, which is to say that they may choke due to external factors. A slow consumer may affect the producer. Adding consumers would affect the producers. The failure of a consumer would block an entire stream. There are many weaknesses in a system like this. Let's take the system from before and add three more producers and two more consumers. Without Apache Kafka, with five producers and five consumers, we would need to maintain and manage 25 streams. Now that's a lot of streams and a lot of maintenance, which also means high costs as well as added risk. With Apache Kafka, we would only need ten streams, one per producer and one per consumer, and we'd have the added assurance that those streams are decoupled and will not be affected by any single producer's or consumer's ability to perform its tasks. From this example, you should now be able to tell how advantageous Apache Kafka is and how important the service it provides is. The benefits of Kafka are that it is very fast, easily scalable, extremely reliable, extremely durable, and, best of all, open source. However, you can also find it as a managed service, as we will be seeing later during this course.
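As a quick sanity check on the stream counts in that example, the arithmetic is simply producers times consumers without a broker, versus producers plus consumers with one. In Python:

    def streams_without_kafka(producers, consumers):
        # every producer connects directly to every consumer
        return producers * consumers

    def streams_with_kafka(producers, consumers):
        # each client only maintains one stream to the cluster
        return producers + consumers

    print(streams_without_kafka(2, 3))  # 6
    print(streams_without_kafka(5, 5))  # 25
    print(streams_with_kafka(5, 5))     # 10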
4. Architecture of Kafka: Now that we know what Apache Kafka is and what it's used for, let's see how it works on the inside. Within Apache Kafka, you will find the following components: producers, consumers, brokers, topics, partitions, clusters, and the ZooKeeper. We'll be looking at each component separately and seeing what it does. Producers. These are the elements that produce the events, or messages, that are sent to Apache Kafka. As we discussed previously, these could be a device or an application. The messages that producers send are written to a topic, alongside similar messages. It's important to understand that multiple producers can exist within the same application. All the messages that a producer produces are sent straight to the Kafka cluster. Consumers. These are the parties at the opposite end from the producer. They're there to take the messages from the producer and read, or in a more technical sense, consume them. Consumers subscribe to topics, and when messages are sent to those topics, the consumer consumes those messages as they come. When working with Apache Kafka, multiple consumers may consume the same message from the same topic. Though messages are consumed, they aren't destroyed in the process; that's one of the beauties of Apache Kafka. Consumers may consume different types of messages from the cluster as well. Furthermore, consumers know exactly where the data they need to consume is located within the Kafka cluster. Brokers. When we discussed how producers send messages to Apache Kafka, what they actually send those messages to are the brokers within the Kafka cluster. Brokers are the Kafka servers that receive and then store these messages for the consumers to take and consume. A Kafka cluster may have multiple brokers, and each broker manages multiple partitions, as we will see soon. Topics. These are simply the defined channels through which the data is streamed. Producers produce their messages to topics; consumers subscribe to topics to consume the messages within them. It's basically just a means to compartmentalize and organize messages and order them by their particular traits. Topics have unique identifying names within the Kafka cluster, and there can be any number of topics; there's no defined limit. Partitions. Topics are divided into partitions, which are replicated to other brokers. With the help of partitions, multiple consumers may read from a topic in parallel. With partitions, producers can also add keys to their messages to control which partition a message goes to. Otherwise, messages just go around in a round-robin pattern, where one partition gets one message, the next partition gets the next one, and so on and so forth. Keys allow the producer to control the order of message processing, which can be handy if the application requires that control over records. Just like topics, there's no defined limit on partitions, as long as the cluster's processing capacity can handle and manage them. Clusters. These are the systems that manage the brokers; it is essentially the entire architecture of brokers that we call a cluster. Messages are written to topics, which are within brokers, which are within clusters; those messages are then read by consumers following the same hierarchy. The ZooKeeper. This element is responsible for managing and coordinating the Kafka cluster, sort of like a conductor to an orchestra. It notifies every node in the system when a topology change occurs. This can be the joining of a new broker or even the failing of one. The ZooKeeper also enables leadership elections between brokers for topic-partition pairs, to determine which broker should be the leader for a particular partition and which ones hold replicas of that data. In layman's terms, the ZooKeeper manages and coordinates everything that the Kafka cluster does and provides a fail-safe for a rainy day.
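To tie the partition and key discussion to code, here is a small hedged sketch using the confluent-kafka Python client, the same client we'll use again in the first project. The broker address and topic name are placeholders, not values from this course.

    from confluent_kafka import Producer

    # Placeholder connection details; substitute your own cluster settings.
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # No key: messages are spread across the topic's partitions (round-robin style).
    producer.produce("orders", value=b"order-created")

    # With a key: all messages for the same key land on the same partition,
    # so their relative order is preserved for that key.
    producer.produce("orders", key=b"customer-42", value=b"order-created")
    producer.produce("orders", key=b"customer-42", value=b"order-paid")

    producer.flush()  # wait until the queued messages have been delivered

The point to notice is that the two keyed messages for customer-42 are guaranteed to land on the same partition, and therefore to be read in the order they were produced.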
5. Managed Solution Kafka: Let's look at the managed solutions available in the market for Apache Kafka. At this time, in 2021, there are a number of managed solution providers for Apache Kafka, one of them being Confluent. Apart from that, we also have Azure HDInsight, AWS Kinesis, Instaclustr, and Aiven. There are many advantages to paying a managed service provider for Apache Kafka as opposed to using the open source flavor of it. There are many hassles associated with open source Kafka, and a lot of management and maintenance is required on a regular basis. There are so many operations that need to be performed just to make things run. With a managed service, you don't need to worry about performing any of those operations, hence the term NoOps. Furthermore, you don't have to download or manage any files locally; it's all done on the cloud. Lastly, you can be sure that you'll be running Kafka smoothly without any hitches. Think of using open source Kafka as a free car engine. It's powerful and can generate a lot of torque, but by itself it can't really do anything, not until you construct the rest of the parts needed to make the engine useful. Managed solutions, by comparison, are like fully fledged cars ready to drive. There's no hassle of setting much of anything up; just turn the key and get going. It's that convenience that a lot of companies look at when considering managed solutions.
6. Quick Starting with Confluent Cloud: Kick-starting Confluent Cloud. We're going to kick things off by getting some basic tasks done within Confluent. We're going to set up a cluster, then a topic, and then we're going to see how we can create producers and consumers that will send messages to and from that topic using the Confluent Cloud CLI.

In order to get started with Confluent, the first thing we have to do is go to the Confluent website, which is just confluent.io. Once there, you can see the option to get started for free. When you click on it, you'll be presented with your deployment options, and you can simply choose a deployment option and start free. Here you can fill out your full name, your company, email, country, pick a password, and just click on this button. When you do that, as it says here, you will receive $400 to spend within Confluent Cloud during your first 60 days. When you fill out this form and click Start Free, it sends a link to your email, where you have to confirm ownership of and access to that email, from which point you can log in to Confluent. Since we already have an account with Confluent, we won't be doing this; instead we'll be logging in to Confluent. In order to do that, we go back to the Get Started page. It says at the bottom here, "I have an account, log in." There we go. Open this up in a new tab where we can log in. There we go. That's my email, I click Next; that's my password, I click Next. Now this is the point where Confluent is going to start giving you a tutorial on how you are supposed to set up your clusters within Confluent Cloud. This is an excellent tutorial; if you want to take the time, I do recommend that you go through it and read up on it. For our projects, though, we will be providing all the materials that you will need in order to create them.

There's a button right here that says Create Cluster. Go ahead and click it. Now you'll be presented with three different options for creating a cluster: you have your Basic cluster, your Standard cluster, and your Dedicated one. The Basic one is the one we will be using, because as you can see, it has no base cost associated with it, and it has plenty of usage headroom: ingress of 100 MB/s, egress of 100 MB/s, and storage of up to 5,000 GB. It can support high data flow; we won't be going anywhere near these numbers in our projects, so we can just select the Basic type. For the sake of knowledge, it's good to know that if you went for the Standard option, it would cost you $1.50 per hour, and it can support more storage; actually, you can take it up to unlimited storage. That's really the only benefit you get from taking Standard over Basic. When you go over to Dedicated, you can see that as you move along the bar, you get more and more benefits, until you reach a point where they tell you to get in touch so they can arrange something for Dedicated clusters. But since we're not going for either of those options, let's just get a Basic cluster going. Then you come to the region area, where they ask you what region you want to select. Here you can see how Confluent is actually partnered with AWS, GCP, and Azure, and you have server selections from all of these: you can choose an AWS server, a GCP server, or an Azure server, whichever one you like, or whichever service you're already using. Ideally, if you want to connect Kafka to a service you already have running, you should create the cluster in a region on the provider you already use. For example, say you have an Azure IoT Hub and you want to use Confluent for the event streaming; then you should probably use Microsoft Azure for hosting your Confluent cluster. Generally, it makes things smoother and easier, there are fewer hassles in the connect process, and more than that, it's just easier to manage and it reduces latency. For our projects, since we don't really have anything else going, we can go straight ahead and just choose a region: Asia, Singapore, there we go. We don't need multi-zone availability; single zone works perfectly fine for us, so that's exactly the one we're going to go for. Now when you come to the third page, it prompts you for a credit card. It will charge that credit card initially, but you will be refunded the amount; it just needs a credit card for verification. That way, in case you go over your limit or go beyond the 60 days, it sends you a notification to let you know: hey, if you want to keep using Confluent, let us know, we'll charge it to the card and keep whatever clusters and projects you have going. So this cluster is perfectly fine. We can call it cluster_0, that works for us, and we can just launch this cluster. There we go. With this, we've set up our Confluent cluster.

Now let's proceed with making a topic within our Kafka cluster, as well as logging in to our Confluent Cloud through the CLI. For that, let's head over to Confluent right now. From here, we can just go back to the Getting Started link and log in. There we go. That is my email, so I will hit Next; that is my password, and I will log in. There we go. Once you have a cluster set up, this is what you're going to see inside Confluent. It shows you that you have one cluster set up; it is live, it is working. So just go ahead and click on the cluster that you've made. Once you're inside, you will see the dashboard. It gives you a bunch of options: you can pipe data in, configure a client, or set up the CLI. We're going to be setting up the CLI soon, but before that we're going to set up a topic. Let's go over to Topics right here and create a topic. We can give this topic any name; we can call it subjects, why not? We can leave the rest at the defaults, with the default of six partitions; it won't really change much of anything, so we can just create it with its defaults. We head over to Messages, where we can produce a new message to this topic. You can see the sample message has a Mountain View, CA address and all the rest of it. Since we don't want to send any technical message for the time being, at this point we're just trying to understand what topics are, what they do, how messages are sent back and forth, and how we'd see them. Let's forget about all these technical bits of JSON in the value section; we can just give it any value. Say we give it the subject of English, that's always a fun subject, and we can set the key to one. We can produce that. There we go: English, on partition 3, awesome. Then we can produce more. We can add another subject, math. And I guess the last one would be science; we say that we want our final message to be science. There we go. Now we have three messages set up. These are just three subjects, three messages under the topic of subjects in our case, as we can see here. And we just want to pass these three messages along: we have science, math, and English as our subjects. Now that we've set up our topic and we have a few messages inside it, we should go ahead and set up our CLI so that we can then set up a consumer.

The instructions are really simple. If you go over to the Set up CLI page, all the instructions are right there in front of you, and you can just copy the commands over. If you're working on a Mac, this should be fairly simple; the command will work out of the box and there won't be any problems. However, if you are like me and are trying to set up the Confluent Cloud CLI on a Windows machine, you do need a Linux terminal in order to do this. If you are on Windows 11, getting an Ubuntu terminal is pretty easy. All you have to do is go to the Microsoft Store and search for Ubuntu; you will get the Ubuntu app as well as Ubuntu 20.04 LTS and 18.04 LTS. Since I already have this installed, I just have to go ahead and open it. I do have to advise you, though, that if you are going to set up Ubuntu like this, you need virtualization enabled: on Intel machines you need the Windows hypervisor features turned on, and on AMD machines you need to enable SVM to get this to work. Okay, so now that we have the Ubuntu terminal open, it's probably time to run our CLI commands. However, I won't be running this curl install command, since I've already run it and I don't want to run into the problem of duplicate directories. You should go ahead and run it first; it should leave you with ccloud installed afterwards. When that happens, all you need to do is run a simple command, ccloud update, and it checks for updates to ccloud to see if there are any. There should probably be a major update for me, but I'm not going to update right now. Yeah, there we go: the only update is a major version, and I don't have time for that right now; I'll leave it for afterwards. I think it's time to log in now.

Another thing we have to do from inside our CLI, or terminal, is gain access to our Confluent Cloud from in here. In order to do that, we need some means of communicating with our Confluent Cloud. To communicate with Confluent Cloud, you need to head over to Data Integration on the left side, and you'll see a little tab there that says API keys. Over here, you can create an API key. It gives you two options: you can either have global access or granular access. For this, I would recommend that you go for global access. Once you have global access, you can use these credentials to connect your client, the Ubuntu terminal. We can give this key almost any description; I can just call it the Skill Curve key. If I want to, I can download it and continue. It even gives you a little text file with all the details of your key. Now that my API key is made, I'm just going to go back over to my Ubuntu terminal. Now it's time to log in to our Confluent Cloud using the Ubuntu terminal. For that, I'm just going to run ccloud login --save. From here it asks me for my Confluent credentials. I'm just going to put in my email, our support address at Skill Curve. Once I put that in, it asks me for my password. I'm going to pause real quick and put that in. Okay, now that I have my password in, I press Enter, and it should log me in to my Confluent Cloud. It has also written those credentials to the netrc file. Now that I'm logged in to Confluent, it's time to see our cluster in the CLI, just so you know that we are actually logged in; it's always nice to assure yourself that you can see the cluster. For that, all we need to do is write ccloud kafka cluster list to see what we can find. As you can see, we only made one cluster, and we can see its ID right here. This is just the one cluster we have, and we want to use it by default; we don't want to have to put the ID in every single time we use it. What we can do to make it our default cluster is write ccloud kafka cluster use and then the cluster ID, lkc-z2mps in my case. There we go. It has set our one Kafka cluster as the default and active cluster for our current working environment. Okay, so now that we have our default cluster set up within the terminal, within the CLI, now is a good time to bring in the API key we made so that we can link the two together. In order to link the API key, we run ccloud api-key store, and what we pass is the API key, which I'm just copying over and pasting. After that, it prompts me; it's a little difficult to see, but in the bottom corner it's asking: what's your API secret? I'm copying over that API secret right now. And there we go: it has stored the API secret and the API key for me to use. Now that I have the API key in, I want to tell my CLI: for a given resource, use this API key by default. What we're essentially doing is making this API key the default for the cluster we just set. In order to do that, we run ccloud api-key use followed by the key. We also need that little flag like before, --resource, and again we can just take our resource ID from here and copy it over. And now we have set our API key as the active API key for the one cluster we have set up. With this done, we're basically all set up to go ahead and start consuming and even producing messages to and from our Confluent Cloud.

Now, in order to consume messages, I would recommend we open a new Ubuntu terminal so that we can see it separately. We can just open one more of those and bring it to the side right here. Now, to consume from a Kafka cluster, or from Confluent in our case, what we need to do is write a command and specify the topic we want to consume from. In our case, that becomes ccloud kafka topic consume --from-beginning, where we specify that we want everything to come from the beginning, and we feed it subjects. There we go. Now you can see that we have English, math, and science: the subjects that we actually put into Kafka ourselves. This is the exciting bit of what we've done with Kafka, that we have produced messages and now we can see those messages as a consumer. So far we've seen that we've connected to our Confluent cluster, we've set up messages in our topic, and we've consumed those messages. That's all well and good.
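If you'd like to do the same consumption from code rather than the CLI, here is a hedged sketch using the confluent-kafka Python library, which we'll also use in the first project. The bootstrap server, API key, and API secret are placeholders for your own cluster endpoint and the key we just created.

    from confluent_kafka import Consumer

    consumer = Consumer({
        # Placeholders: use your cluster's bootstrap endpoint and your API key/secret.
        "bootstrap.servers": "YOUR_BOOTSTRAP_SERVER:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "YOUR_API_KEY",
        "sasl.password": "YOUR_API_SECRET",
        "group.id": "subjects-reader",
        "auto.offset.reset": "earliest",  # read from the start, like --from-beginning
    })

    consumer.subscribe(["subjects"])
    try:
        while True:
            msg = consumer.poll(1.0)      # wait up to a second for a message
            if msg is None:
                continue
            if msg.error():
                print("consumer error:", msg.error())
                continue
            print("consumed:", msg.value().decode("utf-8"))
    finally:
        consumer.close()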
But I think now we should move onwards from the consumer side over to the producer side and produce messages to our cluster. Before, we did this from the GUI within Confluent: we just opened up the page, went to the topic, and put the messages in on that really nice GUI website. But we should know how to do it from the CLI, and within the CLI we can see it working in real time. In order to do it from the CLI, the command we use is ccloud kafka topic produce, with flags to parse the key and set the delimiter to a colon, and with the value format set to string, since we're working with strings; then we just specify the topic, which in our case is subjects. When we press Enter, there we go, it says that it is starting the Kafka producer. From here, you can actually start producing messages, and we should see those messages arrive at the consumer on the right side of the screen the moment we produce them. For example, let's say we want to add more subjects. We can go for geography. We produce geography, and there you go, it got added straight into our consumer. We say geography is good and all, but I also want to learn history, why not. There we go, history showed up on the right side. Then we say that's still not enough; I want to add international affairs, or global affairs, why not. We make it global affairs, and it just keeps showing up. That's just the beauty of Apache Kafka, and why it's such a nice technology: the moment we produced a message, the consumer got the message; the moment an update was made, the consumer was alerted to it, knew about it, and could consume the updated messages immediately. With this, we have now gotten a grip on the Confluent CLI. Now we know how it works: we can make a producer and a consumer, all within the CLI. We've set up a cluster within our Confluent Cloud, and we've set up a topic. We have covered most of the fundamentals when it comes to working with Apache Kafka. Later on, we'll also be seeing how to work with streams, and we'll be taking a look at what ksqlDB is.
7. Introduction to KSQLDB: Section three, ksqlDB. Introduction to ksqlDB. What is ksqlDB? Well, ksqlDB is a real-time event streaming database that is built on top of Apache Kafka and Kafka Streams. Now, what Kafka Streams is, is a topic that we'll be discussing later. For now, all you need to understand is this: the way a stream is defined, at the really fundamental level, is the flow of data from a producer to a consumer. ksqlDB takes this flow and links it to a relational database model using SQL commands. There are many use cases for ksqlDB, and it's used a lot in the industry because of how simple and easy it is to use. How does ksqlDB work? ksqlDB separates its distributed computing layer from its storage layer, for which it employs Kafka. ksqlDB itself handles all the tasks associated with the data, whether that's filtering, processing, streaming, whatever; it then leaves the storage to Kafka to figure out and manage. How do we interact with ksqlDB? This can be done in a number of ways: we can do it through REST APIs, we can do it straight through code, or we can do it in the ways we have defined in this course. The first way we're going to communicate with ksqlDB is using the ksqlDB CLI. The second way we are going to be interacting with ksqlDB is using the Confluent Cloud UI that we see on Confluent Cloud itself.
8. Using KSQLDB: Using ksqlDB. Okay, so to get started with using ksqlDB, the first thing we have to do is set up a cluster within which we will be working. When you go to your Confluent Cloud, into your Environments page, you should be seeing something like this, or maybe a bit different if you have one or two clusters set up already. Since we don't have a cluster set up, we're just going to go into our default environment here and set up a cluster. It says Create cluster on my own; just go for a basic cluster and begin the configuration. You can set the region to whatever you want it to be, set single zone, and continue, because you'll see the base cost of this is $0 an hour, and for this demonstration we don't need anything too fancy. So we're just going to go with that. Once that's done, you can just launch your cluster, and it should launch really quickly. And it's just done setting up the cluster; provisioning might take a second or two. Once you're done, it gives you a screen and says, look, do you want to set up maybe a client or something else? You don't really want to do that right now, so you can just close it out here, and it should bring you to the overview of your cluster. Once you're in this overview, you'll see a little menu on the left side here. Now, in order to work with ksqlDB, it's kind of a given that you need a topic to play around with, to create tables and streams from, so that we have a means of getting information to and from, and a reference for where that information is going to come from. If we go to Topics, there should be, yep, no topics set up here. And of course, your cluster will be ready to use in one to two minutes; like I said, provisioning takes a bit of time. So what I'm going to do is pause the video here, come back once the cluster is completely provisioned, and then we'll go ahead and set up a topic.

Okay, so now the cluster has been provisioned and is allowing me to create a topic, so I'm going to go ahead and create a topic here. I can call this topic anything. For the sake of this demonstration, we could call it subjects, why not, or we can just call it, say, users, because with users you can have a bunch of information, and it's generally easier to create a stream from it. So I'll just create this with the defaults, and it should have this users topic ready to go the moment you set it up. Now that you have your topic set up, it's time to actually go into ksqlDB and set up a ksqlDB app that we're going to use to manipulate this topic and actually work with it. So we go over to ksqlDB. Yep, it gives you the ability to create it with the tutorial, or you can just create the application yourself. We're going to go ahead and create the application ourselves, give it global access, and continue. You can call it whatever you want; it doesn't make a blind bit of difference. I'm going to go with the smallest application size so that there's not too much cost on my Confluent Cloud, and I'm just going to launch it. You should know at this point that setting up ksqlDB applications does take a bit more time; in fact, it takes a lot more time than provisioning a Confluent Kafka cluster. This process can take up to four or five minutes. So what I'm going to do now is pause the video here and come back when it is done provisioning the ksqlDB app. You might see, when it's done provisioning your ksqlDB app, that you encounter an error where the cluster says it cannot connect to this ksqlDB app. You don't need to panic; that's still a part of provisioning. As soon as the provisioning is complete, that error should go away on its own. Now that we have that cleared up, I'm going to pause the video now, let this provision, and come back when it's all done.

Okay, so our ksqlDB app is now up and running; it has been provisioned. We can go inside and take a quick look at it, and you can see how, on the inside, this is what a ksqlDB app looks like. You have an editor here that allows you to run your SQL queries. Next to it you will see the Flow tab, which shows you exactly how things are working: what streams are feeding into what tables, what tables are feeding into what streams, and so on and so forth. You can see your streams, the list of all the streams in this ksqlDB app, the list of all your tables, as well as your queries. Apart from that, you can see how your ksqlDB app is doing and how well it's performing. You can check out different settings for your ksqlDB app; you'd need to know some of these settings in case you want to connect to it. For example, this endpoint here: you'll often be using an endpoint to connect to ksqlDB apps when you have to work with ksqlDB. Lastly, you have CLI instructions at the very end that allow you to log in to ksqlDB, log in to your Confluent Cloud, and use your ksqlDB app from the CLI.

Now that we have this ksqlDB app up and running, it's time to generate some information into our topic and then make some streams and tables from that information. So let's go back over to our cluster overview. Now we're going to see how we can generate some information into our users topic. Kafka actually has a pretty neat feature if you want to generate information, or messages, into a topic: there are a bunch of pre-made data generators that will keep sending information into a topic for as long as they're required to. To set up a data generator, we're going to go into Data Integration and over to Connectors. When you come into Connectors, you'll see all these connectors here. We're looking for Datagen, so we're just going to search for the Datagen Source connector; this one comes from Confluent itself, not a third party, so it's not going to run into any issues in setup. Now that you're here, you have to set up a bunch of things. The first thing, of course, is your Datagen Source connector name; fine, fair enough, it doesn't really matter. We can use a Kafka API key to connect to our connector. So do we have an API key set up? At this point we don't really need to worry about that, because we have a little link here that will generate a Kafka API key and secret for us in real time. We can click it, and there we go, it's given us an API key and a secret. We can just call this our datagen key, and it downloads that key, as it's done right there, and it puts in the credentials for you. You don't have to go through any trouble, and there's no risk of making a mistake inputting them. Now that we have that done, we move over to topics; our topic is users, right there. Next you have to pick the format of your output messages, so we can say give us the values in JSON. Lastly, this is the important bit, where it says Datagen details: here you have to choose what template of information you want to feed into your topic. If you open this dropdown menu, you'll see there are actually a bunch of different datagens available to us. We have clickstream, clickstream codes, clickstream users, credit cards, inventory, orders, pageviews, product, purchases, ratings, and so on and so forth. But the nice thing is, there is a datagen in here exactly for users. So we're going to click on this datagen and then set the interval between messages, that is, the delay between one message being sent and the next. By default the value is 1000 milliseconds, so if you leave this completely empty, it's just going to keep sending messages every second. I'm going to put 1000 in here to show you that it will be sending messages after every second, but even if I don't do this, it will still send messages every second. So if you plan on keeping it that way, you may leave this field empty; I'm putting the value in just so we know we have defined this value of one second per message. After this is done, we move over to the number of tasks that this connector is allowed to perform. With each task, the cost of running the connector goes up, and you'll see that if you choose more than one task, you have to upgrade your Confluent account to make more tasks work. We're not going to go into that; one task is plenty for this example, so we're just going to make it one task, leave everything else as it is, and go Next. When you do this, it tells you: look, for every task hour it's going to charge you this much, and for every GB of data you use, it's going to charge you this much. You have $400 in your trial account in Confluent, so really, running this for a few minutes isn't going to be a problem whatsoever. So don't worry too much about the running cost of this, and just launch. Next to the Launch button, of course, is a data preview that allows you to see what sort of information you'd be getting, but I'd like to show you that in the topic itself as opposed to showing it there. Now that you've made your connector, it's going to be provisioning for a few minutes, just like ksqlDB did. So I'm going to pause the video here, come back when it's provisioned, and then show you the information it has fed into the topic.

Okay, so setting up the connector for my data generator was taking longer than I had anticipated, and I assume it was because I had used that name before and I had a client that was already connected to that connector name. I had to go back, delete the data generator I had made, and make a new one called My Data Generator. Now that that's up and running, let's go inside and see what it's doing. If you go into My Data Generator, you will see that, well, so far it has produced 82 messages, and it has that one task going, which is running. I'm just going to close this little information box here that's telling me that my data generator is running. If you go into Settings, you will see all the configuration that you gave it. You can even come back and change this information if you need to, but I would not recommend it, because it could completely make your stream go topsy-turvy; all your configurations would have to change accordingly, and it would just be a mess. So I wouldn't recommend making any changes to this unless you absolutely need to. Now that we know how the data generator is working, let's go over to our topics, to our users topic. Before that, we can see that it now has a value in its production field, which is to say that this topic is actually receiving something. What that is, let's go inside and take a look. If we go inside, into Messages, we should see what messages have been produced here. As you come in, it shows you the messages that have so far been generated to this users topic, and there you go, you just received another one. If we open up this latest one, you can see that there are actually a bunch of values associated with a message. If we look at the values, we have the registered time, the user ID, the region ID, and the gender. If we go into the header, we can see how the key was generated, the task ID, the current iteration, and last but not least the key for this message, which is user_3. We have this data being generated into our topic now, so we can go ahead and go into ksqlDB and set up a stream for this data. So let's go into the app.

Now we can actually set up a stream from the editor here. I'm going to make the stream by hand so that I can explain each element of the stream as we go about making it. The first thing we have to write is CREATE STREAM. Actually, if you just type CREATE, you'll see that it gives you two options, and when you do CREATE STREAM, it gives you a bunch of options here. So if we just do CREATE STREAM, we can call this stream whatever we want; we could call it users_stream. Now we have to give it the fields it has to take. What we can do for that is go into Topics, users, Messages, and take a basic look at what messages we have inside; pick any one of these messages and open it up. We can see we have registertime, userid, regionid, and gender. Most of these are going to go in as VARCHAR, while registertime is going to go into, I believe, a BIGINT, so I'm just going to put it in as a BIGINT and hope that it doesn't crash on me as I do that. I'm just going to open up a notepad here and store this information in it for when I go back to ksqlDB to set up my stream. Now we can write the statement: CREATE STREAM users_stream (registertime BIGINT, userid VARCHAR, regionid VARCHAR, gender VARCHAR). When you've done this, you've defined what fields are going to go into the stream. Now you have to tell it what topic it is going to receive this information from and what the value format is going to be. So we add WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON'), and now that we've done that, we can just put a semicolon at the end of it. We can set auto.offset.reset to earliest so that it gets the values from the very beginning of the topic's existence, so to speak, and just run the query. There we go. If this has run for you, the status is success, which means that we did get our stream created. If it worked for you, you can go over to your Flow tab and you should see a new stream set up here called users_stream. If you open up users_stream, well, it's not showing us anything here, but if we go here and query the stream, like so, with earliest, we should get a few messages. Yep, there we go, we're actually getting a fair few messages. But then again, it is receiving all the messages from the beginning, so it's going to take its time and give us all of these messages. If we wanted to, we could even stop this query at any point. There we go. Now we can use anything that we get from here; all of this information we have in our stream. So now we've set up a stream. If we go over to our Data Integration and over to Clients, we can see that we actually have a connector producer and a consumer set up that is taking all of this information; if you go to Topics, yes, it is taking it from the users topic. So your stream is functioning now as both a producer and a consumer. And if you go over to your Stream Lineage, you'll see that your data generator is generating information into your topic users, which is then being consumed by these two consumers, your custom apps if you will, that are consuming the information.

Now that we've done this, we've set up a stream. Next we're going to see how we can set up a table, so let's go ahead and do that. Let's go into the ksqlDB app again. In our ksqlDB app, it now has a query set up for you, ready to go: SELECT * FROM users_stream EMIT CHANGES, which would show you everything that is inside the users stream. But we don't want to do that right now; we want to go ahead and create a table. So what we're going to do now is get rid of the query they've given us, and we write CREATE TABLE. We can keep it along the same lines and call this one users_table, because we're using the same topic; we're not changing much of anything here. We give it userid VARCHAR, and because it's a table, if you know basic SQL, you know how it works: you have to associate a primary key with the tables you create. So we're going to set our userid as the PRIMARY KEY for this table, which is the identifier for each row that lets us know it's unique and hasn't been repeated. Then we set registertime as BIGINT again, gender as VARCHAR, and regionid as VARCHAR, and set the WITH clause the same as before: KAFKA_TOPIC='users', and the value format, VALUE_FORMAT, again becomes JSON. There we go. Then we just put the semicolon at the end and run the query. There we go, it's given us a success message, which means that our table has been created. So now if we go over to our Flow tab again, we will see that we have a users_table set up for us. And if we go into the table, we can go in here and query it. If we run the query, we see that it's already got all this information inside it, which it's gotten from the topic. All we had to do was set up a table against the topic, and it generated this entire table for us, just off the bat. We didn't have to do any more configuration than that one line of SQL we had to write to set this table up. With this, you've effectively created a stream as well as a table. With that, you've effectively set up a ksqlDB app, as well as a stream, a table, and a topic to which data is being generated. Now you know how to set these elements up in Confluent Cloud and get going with them.
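If you ever want to generate test data from code instead of the Datagen connector, here is a hedged sketch in Python that produces fake user records with the same fields we saw in the topic (registertime, userid, regionid, gender), one per second. The broker address is a placeholder, the field values are made up, and a Confluent Cloud cluster would additionally need the SASL settings and API key shown earlier; the real Datagen connector's values will of course differ.

    import json
    import random
    import time

    from confluent_kafka import Producer

    # Placeholder connection settings; Confluent Cloud also needs SASL_SSL credentials.
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    regions = ["Region_1", "Region_2", "Region_3"]
    genders = ["MALE", "FEMALE", "OTHER"]

    # Emit one fake user record per second, mirroring the fields in the users topic.
    for i in range(10):
        record = {
            "registertime": int(time.time() * 1000),
            "userid": f"User_{i}",
            "regionid": random.choice(regions),
            "gender": random.choice(genders),
        }
        producer.produce("users", key=record["userid"], value=json.dumps(record))
        producer.poll(0)   # serve delivery callbacks
        time.sleep(1)

    producer.flush()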
9. Overview of Kafka Streams: Section four, Kafka Streams. Overview of Kafka Streams. What are Kafka Streams? Well, if you look at Kafka Streams, it is basically just a Java library that helps us do what we've essentially been doing up to this point, but in a much easier way. A stream defines the complete flow from producer to topic to consumer, and the library allows you to create that flow with significantly less code than you would need otherwise, if you were going about it differently. Kafka Streams applications are extremely fault tolerant because of the distributed nature of Kafka clusters; we could also say that streams themselves have that distributed property. Consider the following example of how you create a consumer and a producer in a program. First you would have to create the consumer, then you have to create the producer, subscribe to the topic, which in this case is widgets, and get your consumer record out of it, which in our case comes out to the color red. We just want a consumer to consume the message that the producer is going to send: that we have a widget that is of the color red. Now consider this code. The funny thing is, you might think that these four lines are completely different to what you've just seen before, or that maybe they complement the code you just saw. But actually, it does the same thing. It just abstracts all of that programming, takes a lot of that housekeeping away from you, and puts all of it into a stream. That is how Streams makes coding concise and easy for developers to work with. Though it might be all well and good that Kafka Streams gives us the ability to write long code in a shorter form, why should we even use Kafka Streams? Well, at the end of the day, Kafka Streams does more than simply make code concise. Streams allow us to perform certain tasks on our data, tasks that we couldn't perform otherwise, such as filtering, joining, and aggregation. These are tasks that we want our streams to do, because we want our data to become more meaningful for the end consumer. Through Kafka Streams, what we actually achieve is to take the data that is being produced and consumed and give it a lot more meaning, so that it can actually be analyzed and looked into. We can make proper sense of that data and possibly even use it for dashboards and for analytics later on.
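We can't reproduce the on-screen Java snippets here, but one way to appreciate what Kafka Streams abstracts away is to look at the manual consume-transform-produce loop it replaces. The sketch below is that manual version in Python with the confluent-kafka client, using placeholder topic names (widgets in, red-widgets out); in Kafka Streams itself, this whole loop collapses into a couple of declarative lines of Java that read the stream, filter it, and write it back out.

    from confluent_kafka import Consumer, Producer

    # Placeholder local config; a Confluent Cloud setup would add SASL credentials.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "widget-filter",
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    consumer.subscribe(["widgets"])

    # The hand-written version of "filter the stream and forward the result":
    # consume, check a condition, produce to another topic. Runs until interrupted.
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        if b"red" in msg.value():                  # keep only red widgets
            producer.produce("red-widgets", value=msg.value())
            producer.poll(0)                       # serve delivery callbacks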
10. Introduction to Kafka Connect: Section five, Kafka Connect. Introduction to Kafka Connect. What is Kafka Connect? Well, Kafka Connect is a component of Apache Kafka, and it's probably the most important component of Apache Kafka if you plan on using Kafka and its services with a setup that you already have. Kafka Connect allows you to connect your Kafka cluster and your Kafka services to a variety of different services. It has connectors for most data services out there, whether that's Amazon, GCP, or Hadoop. Kafka Connect provides streaming integration between Kafka and other services, and it allows you to connect both data sources and data sinks to your Kafka cluster. You can have data coming in from a data source to your Kafka cluster via Kafka Connect, or you can have data going from your Kafka cluster to a data sink via Kafka Connect, that data sink of course being in a service that is not Kafka. In our case, for the project we're going to make, we're going to set up a data sink within Amazon S3. Now, what are the use cases of Kafka Connect? Well, as we discussed previously, Kafka Connect allows you to integrate with outside services that are already being used. You already have data on these services, and it would be pointless to go ahead and make a brand new cluster and then populate that cluster with all the information you already had. Instead, it's easier to just connect it with what you currently have going. The second thing is that Kafka Connect provides what is basically seamless integration with your services, for the most part. You mostly just need to set it up in the GUI, and you don't really have many problems in terms of configuration; you just decide how you want that data and what form you want it in, and that's that, for the most part. The last major benefit of using Kafka Connect is that it opens up a lot of opportunities in the domain of data streaming for you. You can now get that data stream going and actually run queries on it; you can turn it into streams and into much more meaningful data than before. So yes, there are a lot of uses for Kafka Connect, and we will be exploring it in our second project in this course.
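To give a feel for what a connector definition looks like in practice, here is a hedged sketch, in Python, of registering an S3 sink connector with a self-managed Kafka Connect worker over its REST API. The worker URL, bucket, region, and topic are placeholders, and the property names follow the self-managed Confluent S3 sink connector; in Confluent Cloud, which is what we'll actually use for the project, you supply the equivalent settings through the UI instead.

    import requests

    # Placeholder values: a local Connect worker, an example bucket and topic.
    connect_url = "http://localhost:8083/connectors"

    connector = {
        "name": "s3-sink-example",
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": "orders",
            "s3.bucket.name": "my-example-bucket",
            "s3.region": "us-east-1",
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            "flush.size": "1000",
            "tasks.max": "1",
        },
    }

    response = requests.post(connect_url, json=connector)
    response.raise_for_status()
    print(response.json())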
11. Project1 Book Catalog: Overview: Okay, so to get this project started, let's just get an overview of what we're going to be doing today. The first thing we're going to do is create a producer in Python that is going to produce our messages to our topic. The second thing we're going to do is go into our Confluent cluster and create a topic that is going to receive our messages from the producer. Once we have these two things set up, we're then going to publish messages to our topic so that we can populate it nicely for when we create our consumer. After this, we're going to go ahead and set up our consumer, and then we'll be ready to start consuming messages from our Kafka topic within Confluent. It's going to be a really simple project, not too technical. So without further ado, let's get into it.
12. Project1 Book Catalog: Create Producer: Now it's time to see how to set up our producer in Python, because now that we have our entire configuration ready to go, it's time to create the producer that is going to produce messages for us to our Kafka cluster. For that, let's make a new file here; we can call it producer.py. There we go, so now we have a producer file ready to go. Let's open it up and get rid of these welcome guides.

Now, for this producer there's a bit of code that I've already written for you. I'm just going to paste it in here and explain what it does. The top bit of the code parses the command line and the configuration. What that means is that it's just figuring out which cluster I have to send my messages to, whether I actually have an API key that allows me to do that, whether I have the correct secret, and so on; the config parser handles all of that. After that, we create an instance of a Producer, built from that configuration. Now, when we send messages, there is always a possibility that a message doesn't get sent for whatever reason. For that case we have a delivery callback method that says: if you ran into any problems, print the error message and tell the user what the problem was; otherwise, if the message was sent, tell the client or the user, hey, I produced your event to the topic, whatever your topic is, with this key and this value.

If you come down to the bottom here, this is our payload, lines 34 through 36, where our topic is books and we have a bunch of user IDs against a bunch of products. How are we going to create messages? Right here, from lines 38 to 44, where we just take a range of ten: for ten messages, we randomly allocate a user ID to a book. For example, it could be Jack and Twilight, it could be Michael and Lord of the Rings, it could be Hannah and Eragon, and some pairings might even happen twice or thrice. It's completely random, but we'll get ten messages out of it. This is a really simple, really easy, really light producer that's going to produce ten messages for us to our Kafka cluster, into our topic. Let's just save this file and close it.
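Since the pasted file goes by quickly on screen, here is a minimal sketch of what it does, following the walkthrough above. The exact user IDs, the book titles, and the [default] section name in config.ini are illustrative, so treat this as a sketch of the idea rather than a copy of the course file.

```python
#!/usr/bin/env python
# producer.py -- a minimal sketch of the producer described above.
from argparse import ArgumentParser, FileType
from configparser import ConfigParser
from random import choice

from confluent_kafka import Producer

if __name__ == '__main__':
    # Parse the command line and the configuration file: which cluster to talk
    # to, and the API key and secret that let us do that.
    parser = ArgumentParser()
    parser.add_argument('config_file', type=FileType('r'))
    args = parser.parse_args()

    config_parser = ConfigParser()
    config_parser.read_file(args.config_file)
    config = dict(config_parser['default'])

    # Create the Producer instance from that configuration.
    producer = Producer(config)

    # Delivery callback: report failures, otherwise confirm what was produced.
    def delivery_callback(err, msg):
        if err:
            print(f'ERROR: Message failed delivery: {err}')
        else:
            print(f'Produced event to topic {msg.topic()}: '
                  f'key = {msg.key().decode("utf-8")} value = {msg.value().decode("utf-8")}')

    # The payload: the books topic, some user IDs, and some book titles.
    topic = 'books'
    user_ids = ['jack', 'michael', 'hannah']
    books = ['Twilight', 'Lord of the Rings', 'Eragon', 'Harry Potter', 'Animal Farm']

    # Ten messages, each pairing a random user ID with a random book.
    for _ in range(10):
        producer.produce(topic, choice(books), choice(user_ids), callback=delivery_callback)

    # Wait for outstanding deliveries to complete before exiting.
    producer.poll(10000)
    producer.flush()
```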
13. Project1 Book Catalog: Create Topic: Okay, so now that we have our producer set up, it's time to go and create the topic that we said we are going to produce our messages to from the producer. I'm just going to open up a new window here, and let's go over to Confluent Cloud and create that topic to which we will be sending our messages. Let's log into Confluent Cloud. There we go. Let's head over into our cluster, and from in here we can go into Topics on the left side, and over here we should see our topics. Now, as you can see, at the moment we do not have a topic set up for books, so we just need to add that topic. We can call it books, exactly as we named it in our program. Once we hit create, boom, it's created; there is no lag for this creation. Now that that's set up, all we really need to do is move forward.
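The UI is the quickest way to do this, but as an optional aside, the same confluent-kafka Python package we use for the producer also includes an admin client that can create the topic from code. This is just a hedged sketch rather than a course step; it assumes the same config.ini used elsewhere in the project, and the partition and replication counts are illustrative defaults.

```python
# create_topic.py -- optional: create the books topic from code instead of the UI.
from configparser import ConfigParser

from confluent_kafka.admin import AdminClient, NewTopic

config_parser = ConfigParser()
config_parser.read('config.ini')                       # same connection settings as the producer
admin = AdminClient(dict(config_parser['default']))

# create_topics() returns a dict of topic name -> future.
futures = admin.create_topics([NewTopic('books', num_partitions=6, replication_factor=3)])
for topic, future in futures.items():
    try:
        future.result()                                # block until the broker confirms creation
        print(f'Created topic {topic}')
    except Exception as e:
        print(f'Failed to create topic {topic}: {e}')
```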
14. Project1 Book Catalog: Produce Messages: Now that our topic is ready and can receive messages, it is time to produce messages to it. Producing messages to your Kafka cluster using the method we're using today is actually pretty simple. We're just going to move over into our project directory, and from there we go into our virtual environment using source environment/bin/activate. Now, there's one extra step we have to do here, because we made our files on Windows and Unix doesn't really like those files a lot: we have to do a small conversion using the dos2unix command. So we run dos2unix on producer.py and on config.ini. Once we've done that, we can chmod our producer file, chmod u+x producer.py, and now we can run our producer. There we go: it has produced events to the topic books. We see Jack took Eragon, Jack took Harry Potter, Michael took Animal Farm, and so on. We have ten keys and ten values, ten messages that we have sent to our topic within our Kafka cluster.
15. Project1 Book Catalog: Create Consumer: Now that we've produced messages to our topic and it is populated, it is time to set up a consumer to consume those messages. Setting up a consumer is really simple. All we have to do is move over into our Python project and create a new file; we can call this file consumer.py. Once you've done that, just head inside and paste in the piece of code that you should find in the description. This is really simple, really basic consumption of messages from the Kafka cluster.

The first thing we do, of course, just as we did with the producer, is parse the configuration and the command line. After that, we create an instance of a Consumer. As we create this instance, we want it to pick up messages from the very top, from the very beginning of all the messages, so we set the offset: we define a reset_offset function and take the offset all the way back to the beginning. After this, we subscribe to the topic; we have our topic, books, of course, the same as for the producer, and we pass reset_offset as the on_assign callback, so as we subscribe we go all the way back. What happens next is a continuous process that runs until it is interrupted: we are constantly polling and checking whether there is a message waiting for us. If there is, we get that print of "Consumed event from topic" and so on; if there isn't, we just get a "Waiting" message; and if we run into an error, we get the error message. The last bit is just the keyboard interrupt, which says that if you want to stop this, you can press Ctrl+C and it will end. This is about as basic and bare-bones a consumer as you can get, and we've also set up our topic subscription within the consumer, so we're killing two birds with one stone there.
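Again, since the pasted file goes by quickly on screen, here is a minimal sketch of what it does, mirroring the walkthrough above. The [default] and [consumer] section names in config.ini are illustrative, so treat this as a sketch rather than a copy of the course file.

```python
#!/usr/bin/env python
# consumer.py -- a minimal sketch of the consumer described above.
from argparse import ArgumentParser, FileType
from configparser import ConfigParser

from confluent_kafka import Consumer, OFFSET_BEGINNING

if __name__ == '__main__':
    # Parse the command line and the configuration file, just as in the producer.
    parser = ArgumentParser()
    parser.add_argument('config_file', type=FileType('r'))
    args = parser.parse_args()

    config_parser = ConfigParser()
    config_parser.read_file(args.config_file)
    config = dict(config_parser['default'])
    config.update(config_parser['consumer'])     # e.g. group.id, auto.offset.reset

    consumer = Consumer(config)

    # When partitions are assigned, rewind them to the very first offset so we
    # re-read everything already sitting in the topic.
    def reset_offset(consumer, partitions):
        for p in partitions:
            p.offset = OFFSET_BEGINNING
        consumer.assign(partitions)

    consumer.subscribe(['books'], on_assign=reset_offset)

    # Poll in a loop until interrupted with Ctrl+C.
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                print('Waiting...')
            elif msg.error():
                print(f'ERROR: {msg.error()}')
            else:
                print(f'Consumed event from topic {msg.topic()}: '
                      f'key = {msg.key().decode("utf-8")} value = {msg.value().decode("utf-8")}')
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()
```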
16. Project1 Book Catalog: Consume Messages: Now that our consumer is set up and ready, and our topic has messages in it, it is time to consume them and see what that consumption looks like in real time. Actually producing and consuming messages in real time is a fairly simple process. Again, because we made our consumer.py file on Windows, we have to convert it over for Unix to read it, so we run dos2unix on consumer.py to change it over to the Unix format. Now that we've done that, we can chmod our consumer and then run consumer.py with the config. There we go: our consumer picked up all those messages that we sent from our producer earlier and has read them.

Another cool thing about this is that we can actually do it in real time. Say, for example, I run my producer from the side. The moment those messages are produced, the consumer gets them and consumes them in the exact order they were produced. And if we run it again, we'll see it again. But that's not the most interesting part. The most interesting part is that if you made these messages in real time, you'd still see them and you could still read them. Let's head over to our Confluent Cloud, move this window to the side, go over to our topics, and go to books. As we can see, we've got some production going on in here; it shows that we have activity. If we go to produce a new message, we can call it anything; we can give it a value of, say, Back to the Future. Even though that's not a book, it's a movie; I digress. We can leave the key as it is, and the moment you produce that message, boom, you get it in your consumer in real time: an event with the value Back to the Future and the key 18. And with this, you have successfully created an application that both produces and consumes events, or messages, using Confluent and Apache Kafka.
17. Project2 PacMan: Overview: So here's a brief overview of what we're going to be seeing in our application that makes it a streaming application. The first thing we're going to have is two topics, user_game and user_losses. These two topics will be the basis upon which we create two streams, and these two streams are going to feed their information into two tables. The user_losses stream will feed its information into the losses-per-user table, which records how many times you have lost all of your lives in the game. The user_game stream will feed its information into the stats-per-user table, which records your highest score and how many times you have actually played the game and lost; that information about how many times you've lost the game is fed through the losses-per-user table into the stats-per-user table. Finally, the stats-per-user table is queried to generate the scoreboard, which shows you not only your own score but the scores of everybody who has played the game, so you can compare your stats with everybody else's.
18. Project2 PacMan: Configure and Run: Since I use Windows Subsystem for Linux, I can go straight into my project directory on Windows and see the files as they are. Looking at our streaming Pac-Man application, we can see that we have a bunch of files and folders in here. The first thing we have to do is go into our config file, which should be demo.cfg. If we open this up, it should have our AWS access key and our AWS secret key right there. What this is, is the access that you define to your AWS account.

Where would you find your AWS access key and your AWS secret key? Well, let's go ahead and see that. We minimize this, come over to AWS, and just search for IAM, then head over there. Once you're here, you can look at the users in your access management, so just go over to Users, and from here you can add a user. When you add a user, you give it a username and choose an access key as the credential type instead of a password, which will assign you an access key ID and a secret access key. So I'll add a new user, call it skillcurve, and go next. You can add it to a user group for admins, or, if you don't have that set up, as you wouldn't on a new account, you can just create a group, assign it administrator access, and call it the admins user group. Select it, then move on to tags, where you can add any key you like with the same value; that's all I need, honestly. Just go next, and it will create this user. The moment it does that, it gives you a success page and allows you to download the credentials for your user as a CSV; apart from that, it also shows you the access key ID and the secret access key. Take note of these, because you will need them for this project. I would recommend you download the CSV; when you open it up, it gives you exactly what you need, your access key and your secret access key, which are the two things you need to feed into your configuration. I already have an account set up in my configuration, so I'm not going to feed this new access key and secret access key into it; I'm just going to close this and move on to the next thing.

Now that that's done, let's go back over to our folder and back to our file. Our demo.cfg is good, so we're just going to close that, and now we're coming back to streaming Pac-Man. The next thing we have to look at is our stack-configs folder. We need to make sure that there is nothing inside the stack-configs folder, because this configuration is what Terraform is going to use in order to set the entire application up. The first time you run the start script, as I'm about to, you're going to see that it creates a lot of things for you, and then the next time you run that script is when Terraform actually provisions and creates all the resources.

Now we can go over to our terminal, cd into our streaming Pac-Man project, and run our start script. If you haven't already done this, you first have to chmod u+x your start.sh, then do the same for your stop.sh, and the same for create_ksqldb_app.sh; these are the three shell files that you need to give yourself permission to run. Now that you've done that, all you need to do is run the start script, so we just run ./start.sh. And there we go, it just runs for you. It gives you a prompt that says, look, listen, this is going to create resources that are going to cost something, so are you sure you want to proceed? And because we have to, we just say yes, we do want to proceed.

While the script is running, I'm just going to go back here and come back to our S3 page, because once we've finished running our script we are actually going to see an S3 bucket created here. I'm also going to go into the Environments page on our Confluent Cloud, and it's already set up; that's actually kind of neat. It has already set up a Confluent environment for you. We can see that right here: creating the Confluent Cloud stack for the environment, and setting the Kafka cluster as the active cluster for the new environment. We can actually check this: open up a new terminal and run ccloud environment list, and you can see that the new environment in the list is the streaming Pac-Man environment we just created for this application. So I'm going to let this run and I'll be back when the start script has actually finished running; this takes up to five to ten minutes.

Okay, so the start script has finished running. What has it done? Let me just take you all the way up to the top here, from where we started. Yep, what it did was create a user and give it all the permissions to do exactly what it needs to do, to create all the resources it needs in order to get the application going. It did everything it had to do properly until it came to the KSQL part, where it crashed; I set it up so that it's supposed to crash here. Let's see in the next section what we're supposed to do to make this work properly.
19. Project2 PacMan: Setting up Pre Requisites: Before we start this project, I'd like to give a huge shout-out to Ricardo Ferreira for the original Pac-Man game streaming application. It is his application that we're actually going to be running and using today; I have tweaked and modified it a bit so that we have to perform certain tasks on our Confluent Cloud in order to get the whole thing working.

Now that we've gotten that aside, it's time to see what we actually need to get the project going today. The first thing we're going to need is a Confluent Cloud account. If you do not have a Confluent Cloud account, I would recommend you go and get one; a trial account should give you about $400 of credit, which is more than enough to make this whole project work. The second thing you're going to need is an Amazon AWS account. Again, you can set up a trial account, which, for the short duration it gives you, is plenty for the project we are going to be making today. Once we're done with the project, you should see one S3 bucket created for you here, and then we will be creating another S3 bucket for our data sink. The last thing you're going to need is Terraform. Terraform is what is going to create the application for us and provision the resources that we need in AWS and S3. To set up Terraform on your Linux machine, you just have to run these three commands; if you haven't already, I would recommend you go ahead and install Terraform onto your local system.

Besides this, we just need two more things. We need to check if we have Java, so we run the java -version command; we need Java 11 or greater for this project, so please ensure that your JDK version is 11 or higher. The second thing we need is Maven, so we check our Maven version, and yes, we are running Apache Maven 3.6.3. If you've been following this tutorial up to this point, you should have confluent and ccloud installed; you can just run ccloud to check, there we go, and the same for confluent. If you haven't already logged into your ccloud, I would recommend that you do that now: you run ccloud login --save, and what this does is save your credentials in the .netrc file. When you run ccloud login, it prompts you to put in your credentials, or, if you already have your credentials saved in the .netrc file, it just logs you in. Now that we have our environment completely set up, let's go ahead and start making our project.
20. Project2 PacMan: Setting up KSQLDB: Okay, so now that we've run our script, it has actually not done everything it was supposed to do. It has left some of the work to us, so we're just going to go ahead and do those little bits of work that we were supposed to do, and then move on and get the thing actually working. The bit we have to do now, we can do from the Confluent Cloud GUI. We go into the new environment it created for us, and then into the cluster. From within the cluster we should see a ksqlDB app already set up and created for us. If we go into this ksqlDB app, it shows us that we don't have any streams going at the moment, just the standard one; there's no real flow, just the one default stream that you are given with every ksqlDB app, and no tables. Furthermore, if we go into Topics, we will see that we have no topics, just the one processing log that we get.

I'm just going to add a topic here; I'm going to call it user_game and create it with the defaults. Then I go back to Topics and create another topic called user_losses. Now that I've done that, I have two topics, the two topics that I said I was going to have back in the overview of this project. Now that we have these two topics, it's time to make the streams and tables that correlate with them. So I'm going to go back into ksqlDB, and in this little editor here I'm going to run some KSQL commands. Those commands are available right here in the statements.sql file. You can open this up, and you'll have these five statements that create two streams, user_game and user_losses, and three tables: losses_per_user, stats_per_user, and summary_stats. We can just copy all of this, come back into ksqlDB, paste it right here, and run the query. There we go, it has run the query and created the streams and the tables for us. If we go back into our Flow tab, we should see, there we go, that we have this user_losses stream that creates the table losses_per_user, then the user_game stream, which creates the table stats_per_user and also feeds a table of summary stats; summary_stats is just an extra table that shows us a general overview of what's going on in the game at any time. If we go to Streams again, we will see the two streams, and if we go to Tables, we will see these three tables.

Now that we've done this, it's time to give permission to our service account to manipulate the topics however it needs to. For that, we're going to go back into our Ubuntu terminal. First we go into our environment: run ccloud environment list, then ccloud environment use with the ID of the new environment. Then we run ccloud kafka cluster list and ccloud kafka cluster use with the ID of the cluster in that environment. Now that we're in here, we have to create the ACLs with ccloud kafka acl create --allow. For that we need a service account, so before anything else we run ccloud service account list, which gives us the service account ID. Then we run ccloud kafka acl create --allow --service-account with that ID, --operation READ, as well as --operation WRITE and --operation DESCRIBE, and finally --topic user_game. And it has done just that for us. Then we run the same thing for the topic user_losses, and it has done that for us too. So now that we've done everything we had to do on the ksqlDB side, we can just run the application again and get the game started.
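As an optional aside, you don't have to use the web editor to look at these tables. A ksqlDB app in Confluent Cloud also exposes a REST API, so something like the hedged sketch below could pull a few rows from the stats-per-user table; the endpoint URL, the API key and secret, and the exact table name are placeholders you would replace with the values from your own ksqlDB app.

```python
# scoreboard_query.py -- a hedged sketch only: query the stats table over the
# ksqlDB REST API instead of the web editor. All values below are placeholders.
import requests

KSQLDB_ENDPOINT = 'https://your-ksqldb-app.confluent.cloud:443'   # shown in the ksqlDB app settings
KSQLDB_API_KEY = 'your-ksqldb-api-key'
KSQLDB_API_SECRET = 'your-ksqldb-api-secret'

# A push query with a LIMIT terminates on its own, which keeps the demo simple.
payload = {
    'ksql': 'SELECT * FROM STATS_PER_USER EMIT CHANGES LIMIT 5;',
    'streamsProperties': {},
}

response = requests.post(
    f'{KSQLDB_ENDPOINT}/query',
    auth=(KSQLDB_API_KEY, KSQLDB_API_SECRET),
    headers={'Accept': 'application/vnd.ksql.v1+json'},
    json=payload,
    stream=True,
)
response.raise_for_status()

# The /query endpoint streams results back in chunks; print each line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```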
21. Project2 PacMan: Run Project and Sink Data: Now let's get the game started by running the start script again, so we run ./start.sh. It's just going to start using Terraform to create the entire project; once Terraform has been successfully initialized, it goes ahead and creates the project. Again, this is a process that takes its time, so I'm going to pause the video here and come back when it's done. There we go. The start script has finished running, and it ends up giving you an output with a URL that says your Pac-Man game is at this URL here. We can open this in a new tab. Before this, we didn't see any buckets in our S3, and as I said, we would be seeing one the moment this was over. So if we refresh, there we go: we have one bucket here whose name starts with streaming-pacman, and this was created for us by the application. It holds, basically, the streaming Pac-Man website; you can see all the website files right here.

I'm just going to go back from Amazon S3 and over to the game. I'll put in a name, so we can just call it skillcurve, and we can play. When we do that, it takes us into the Pac-Man game that we're using for our event streaming. I'm going to generate a bit of a score and then go ahead and die, so that we can see all the possible events and exactly what they look like on our event dashboard. There we go, that's our scoreboard. As you might notice, this score is actually pretty big, and you just saw me launch the application; well, that's because I paused the video, went ahead, and played the game for a bit to generate a bit more score and an actual flow of messages, so that we can go ahead and see how we would store all of that in S3 using Kafka Connect.

So if we come to our S3 now, you can see of course that we have our bucket here. But if we come back over to our Kafka cluster, over here you'll see Data integration, and under it a section called Connectors. What you can do here is set up a connector to any data sink, or to any sort of place you want to source information from, if you want to bring data into your Kafka cluster. In our case, though, we're going to be sending data out to a sink, so we're going to use this one here called Amazon S3 Sink. When we do that, it's going to ask, okay, what do you actually want to store? Well, I'm just going to give it all my topics: the first one, the one it gave us by default, we don't want, but apart from that we want all of them. So we just continue here. Now it asks how you want to access your S3 sink connector: do you maybe want to use an existing API key that you have for this cluster, or do you want to choose a different kind of access? In our case, what we're going to do is just create our API credentials here. When you do that, it downloads the key file for you just like that and sets it up, so the defaults are fine and you don't have to do anything; you can give it any name you like, so I'll just call it pacman-s3-sink. Then we can continue.

Now, here is where it actually gets important: you need your Amazon access key ID, your Amazon secret access key, and your Amazon S3 bucket name. So what I'm going to do is go over to my S3. Pay attention to the region: it says us-west-2, and if I make my other S3 bucket in a different region, it's probably going to give me an error; it's good practice to keep all the elements of a project in the same region. So what I'm going to do now is create a bucket, call it my-pacman-sink, and put it in us-west-2. I don't have to change anything else here; I can just create the bucket. After it creates the bucket, there we go, you have my-pacman-sink, all there and ready to go. I'm just going to copy this name, my-pacman-sink, and bring it over to here. Your Amazon access key ID and your Amazon secret access key are the same ones you used for the configuration of the demo.cfg file, so if you go into your config, you can pick out your Amazon access key and secret from that file; I can just copy the key over, paste it here, copy my secret over, and paste that there. And then I can just continue.

Now it's going to ask how you want your data to come in: do you want it maybe as Avro, as Protobuf, or as JSON? I'm going to make it JSON, because that's the easiest way and it's also very neatly defined, so it's really easy to see what's inside your files; and for the output message format, also JSON. Once you've done these two things, it asks how regularly you want your S3 to be populated: do you want it done by the hour, or every day? For this example, I'm going to have it done by the hour, because we're not going to have it set up for too long; we're just going to have it running for a few minutes, see how the data goes in, and then that'll be the end of it. You don't need to go into the advanced configuration. If you wanted to, you could change many properties there, the descriptions of which are right here; if you hover over the little i, it tells you exactly what each one is. I wouldn't mess around with this too much unless I had specific configuration requirements for my project, but since we do not, we're just going to ignore it and move on. At the next stage, it asks, in terms of sizing, how many tasks you want it to run. It will ask you to upgrade if you want more than one, and for each task it's going to charge you $0.034 per hour, in my case. So we're just going to set this to one and continue. Finally, it gives you all that information back and says, look, this is what you've asked of us, are you sure? You can set your connector name here; I'm not going to change it, I don't see the point. We're just going to launch. Just keep in mind your connector pricing: for each of your tasks you're charged this much per hour, and then on top of that you're charged for your usage per hour, which adds up. When you launch it, you can see exactly how much you're going to be charged, so keep it in mind; it doesn't matter for now because this is a trial account, but if you're actually using Confluent on your own money, this can build up pretty quickly.

Now our connector is being provisioned, and of course the process takes time, so what I'm going to do is go back to the cluster and to the connector, pause the video here, and come back when this is all done. There we go, so now our connector is up and running. There's a small message here that says your S3 sink connector 0 is now running; it's got that Running status, and if we go in here, yes, it says running. If you run into any problems in setting up this connector, you would see the error in this area, where it would say, look, we've run into this and this problem, so, you know, figure it out. Now we've got this connector set up, but what has it actually sent to our S3 sink? If we go into our S3 sink, at the moment it's got nothing; in a few minutes it's actually going to send something. It has got its task running, and so far it hasn't sent much of anything. So I'll just play the game a bit more, pause the video, come back, and we'll see then what it has actually sent to our S3 data sink.

Okay, so I played the game a bit on a different account. If we go over to the scoreboard, yep, it's got the second account with a score of 1,000 alongside the old one. But if we go back over to our S3 bucket now, which is here, we'll see that it's got this topics folder inside, and if we go inside, it's got folders for each thing that it has received information for. If we go into user_game, it's got the year, the month, the day, and the hour, and in this hour it has received these JSON files. Let's open one of these JSON files and see what it actually has inside it. If we just open this up, in my case it's probably going to get downloaded first, and then we can open it. We can see that it has shared all this information with us in this JSON: there was this user playing the game, and he generated this much score. This is how we can use a data sink to store all the information that is consumed from our Kafka cluster. It all gets saved, and if we need to query it or check on it later on, it's all available to us here in our data sink.

Now that we've done this, we know how the sink works and we've seen it running. This is actually taking up a lot of resources to run, so what I'm going to do now is show you how you can destroy all the resources that you have created. The first thing you want to do is go into your connector and say, look, I do not want this connector running anymore, so delete it. You copy the connector name into the confirmation field, and there you go, confirm: your S3 sink connector is deleted, gone; you do not have this connector available to you anymore. Then, if you go back to your Ubuntu terminal over here, you just have to run the stop script, so you'd say ./stop.sh. It tells you that it's going to destroy all the resources it has made, and you say yes, okay, go ahead and do that, and it proceeds to destroy all the resources it created. This is a process that takes its time to run as well, so I'm just going to pause the video here and come back when it's all done. There we go. It has now successfully run the stop script and destroyed all the resources that it allocated to make this project. If you go over to our S3 now, we will find that the S3 bucket the script created for our project is gone, and we can delete the sink bucket we made ourselves by first emptying it out; you type "permanently delete" into the field to confirm. Once you've done that, you can exit this and delete the bucket itself; for that, you just need to copy the name and paste it over here. Yeah, that's done. And if you go over to your Confluent Cloud and go to the Environments page, you will find that the environment is completely gone. With that, you have successfully run and completed project two. Well done.
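One last optional aside before we wrap up: while the sink was still running, you could also have pulled those JSON records back out with a few lines of code instead of clicking through the S3 console. The sketch below is just an illustration for next time; the bucket name and topic prefix are the ones from this demo, so adjust them to whatever you use.

```python
# read_sink.py -- a small sketch: list and read the JSON records the S3 sink
# connector wrote. Bucket name and prefix below are from this demo; adjust them.
import json

import boto3

BUCKET = 'my-pacman-sink'
PREFIX = 'topics/user_game/'            # the connector lays files out by topic, then year/month/day/hour

s3 = boto3.client('s3')                 # uses the same AWS credentials as the rest of the project

# Walk every object under the prefix and print the records inside it.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        body = s3.get_object(Bucket=BUCKET, Key=obj['Key'])['Body'].read()
        # With the JSON output format, each file holds one JSON record per line.
        for line in body.decode('utf-8').splitlines():
            if line.strip():
                print(json.dumps(json.loads(line), indent=2))
```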