Prometheus Alerting and Monitoring | Sean Bradley | Skillshare


Prometheus Alerting and Monitoring

Sean Bradley, Course Instructor


Lessons in This Class

26 Lessons (3h 15m)
    • 1. Prometheus Course Introduction

    • 2. Install Prometheus

    • 3. Pointing A Domain Name

    • 4. Reverse Proxy Prometheus with Nginx

    • 5. Add SSL to Prometheus Reverse Proxy

    • 6. Add Basic Authentication to the Prometheus User Interface

    • 7. Scrape Target Basics

    • 8. Install an External Node Exporter

    • 9. Deleting a Time Series

    • 10. PromQL Example Queries

    • 11. Recording Rules

    • 12. Alerting Rules

    • 13. Install Prometheus Alert Manager

    • 14. Install Send Only SMTP Server

    • 15. Configure Alert Manager to Send Alerts from Prometheus

    • 16. Add the Prometheus Alert Manager UI

    • 17. Install Grafana

    • 18. Setup the Prometheus Datasource

    • 19. Setup Prometheus Dashboards

    • 20. SNMP in Prometheus

    • 21. Install the SNMP Exporter

    • 22. Install a Second External SNMP Daemon

    • 23. Setup SNMP for a CISCO Switch

    • 24. SNMP Exporter Configuration Generator

    • 25. Generate HUAWEI SNMP Exporter Module

    • 26. Finishing Up






About This Class

Learn and Build Your First Prometheus Alerting and Monitoring System for Your Infrastructure Today.

We learn the basics of Prometheus so that you can get started as soon as possible. Follow the exercises, try them out for yourself, and then see it all working.

In this course we will quickly build a bare bones Prometheus server from scratch, in the cloud.

We will keep it simple and set it up on a default, unrestricted, un-customised Ubuntu 20.04 LTS server. You will then be able to match what you see in the videos and copy/paste directly from my documentation and see the same result. Once you have the basic experience of seeing Prometheus work, you will be able to problem solve in a more directed manner, and apply your knowledge to other operating systems in the future.

At the end of the course, you will have a basic Prometheus setup, which will be in the cloud, behind a reverse proxy, with SSL, a domain name, Basic Authentication, with several custom recording rules, several alerting rules, several node exporters local and external, with alert manager using a send only SMTP server, a Grafana install and the Prometheus datasource configured and some dashboards.

Meet Your Teacher


Sean Bradley

Course Instructor


Hello, I'm Sean.

For over 20 years I have been an IT professional developing and managing real time, low latency, high availability, asynchronous, multi threaded, remotely managed, fully automated, monitored solutions in the education, aeronautical, banking, drone, gaming and telecommunications industries.

I have also created and written hundreds of Open Source GitHub Repositories, Medium Articles and YouTube video tutorials.




1. Prometheus Course Introduction: Welcome to my course on Prometheus. Prometheus is a very popular and extremely capable tool for monitoring your infrastructure and applications. I demonstrate many things in this course, with all example commands provided for you to easily copy and paste. This is a learn-by-example course where I demonstrate all the concepts discussed, so that you can see them working and then try them out yourself as well. The course comes with accompanying documentation that you can access for free. All commands entered in this course are included in that documentation website, so you can quickly copy and paste them; for example, if you need the example alerting rules, you can just copy them from there. In this course we'll start off by installing Prometheus and verifying that it started as a service. We'll host it behind a reverse proxy, Nginx, with SSL and basic authentication. We'll install a node exporter and start it as a service, create many example queries, install a second, external node exporter, set up the firewalls, create several recording rules, create alerting rules, and install the Prometheus Alert Manager. We'll then get some alerts, install an SMTP server for the Alert Manager, and configure the Prometheus Alert Manager to use the new SMTP server. At the end you'll have your own dedicated, working Prometheus alerting and monitoring server, ready for you to take to the next level. So once again, this is a learn-by-example course, with all example commands available for you to copy and paste. I demonstrate them working, and you'll be able to do the same. Thanks for taking part in my course, and I'll see you there.

2. Install Prometheus: Okay, so welcome to my course on Prometheus. We're going to start by setting up a dedicated Prometheus server. I'm going to install mine on Ubuntu 20.04 LTS (long-term support), so you will need a Linux server from somewhere.
I recommend using a Linux server that you can throw away, one that you can destroy, break and start again. In the welcome email I've given you a link to DigitalOcean, where you currently get an offer of $100 credit for 60 days. That means you can use $100 worth of DigitalOcean servers for 60 days at no cost. You only pay for what you use over that, and you're very unlikely to exceed it. The server that I will use for my Prometheus server only costs $5 a month. If you don't yet have a DigitalOcean account and you use my link, it won't cost you anything, and you can delete the server at the end if you don't want it anymore. Okay, so just to show you how easy it is to get a server from DigitalOcean, because you're going to need a Linux server, I'm going to quickly create myself a brand new Ubuntu 20.04 LTS server. In DigitalOcean: Create, Droplets, Ubuntu 20.04 LTS, Basic, $5 a month. Remember, yours will be free if you have just registered using my link and don't already have an account. Choose a region; Frankfurt is good enough for me. I have an SSH key already uploaded, so I'm going to use that, but if you don't, you can choose the password option and create your own password. Name it anything you like; prometheus is a good name. Then Create Droplet. That is just the bare minimum you need. In one minute I will have a new server in Frankfurt with 1 GB of RAM and a 25 GB SSD that I can use as my dedicated Prometheus server. This is the minimum spec, but it will work. Okay, so that's the IP address, and I can copy it. On Windows I'm using PuTTY, so I put that IP address in the host name field with port 22, which is the default for SSH, and I'll call the session Prometheus and save it. Since I have an SSH key, I'm just going to link that up, go back and save it again. If you used the password option when you created your server, you would be asked for that password when you log in. But I used the SSH key option, so I can now open the session and log in as root.
And I've just logged on to my new server that I'm going to use as my dedicated Prometheus server. You don't have to use DigitalOcean; I'm just showing it to you because it's really easy and you can get started right away. It's not going to cost you anything if you don't already have a DigitalOcean account. There are many providers of servers that you can use, such as AWS, GCP, Azure and Hetzner, so you have a lot of choice, but if you just want to make it really easy for yourself, then just use my link. I also recommend using Ubuntu 20.04 LTS so that what you see will exactly resemble what I do in the course. Prometheus will work on almost every server architecture there is, but there will be differences, and since there are something like 600 different versions of Linux, I'm not going to be able to cover all of them. So I'm going to put it on Ubuntu 20.04, which I find the least restrictive and the easiest to use. Excellent. On my brand new dedicated server I'll create a simple Prometheus setup that allows me to get alerts through email, with just the minimum components needed to do that. The first thing we can do is install Prometheus straight away through the APT package manager that you get with Ubuntu. Run sudo apt update so that we have a record of the latest packages. Then run sudo apt install prometheus and type yes. Okay, so that has installed Prometheus 2.15.2. That apt install set up two processes: the Prometheus process and also a node exporter process. The Prometheus process listens on port 9090 and the node exporter listens on port 9100. Those services are now both running, so we can check the Prometheus status. There we go, it says active (running). Press Ctrl+C to exit that. We can also check the node exporter status, prometheus-node-exporter: active (running). There we go. Ctrl+C. The install has also created a user for us called prometheus, so run ps -u prometheus.
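The install and status checks described so far can be collected into a short script. This is only a sketch, assuming a throwaway Ubuntu 20.04 server like the one in the lesson. The function names are made up for illustration, and systemctl with --no-pager is used in place of the service command so that the status output does not wait for Ctrl+C.

```shell
#!/bin/sh
# Sketch of the install steps from this lesson. Nothing runs until you call
# the functions yourself on a throwaway Ubuntu 20.04 server.

# install_prometheus: refresh the package index, then install Prometheus.
# In the lesson this one install also started a node exporter service.
install_prometheus() {
    sudo apt update
    sudo apt install -y prometheus
}

# check_prometheus: show both services and the processes of the new user.
check_prometheus() {
    sudo systemctl --no-pager status prometheus
    sudo systemctl --no-pager status prometheus-node-exporter
    ps -u prometheus    # processes owned by the prometheus user
}

# On the server:
#   install_prometheus && check_prometheus
```

Wrapping the commands in functions means the file can be sourced and read without changing the machine; the commented line at the end is the part that does the work.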
And that user is running two processes, one called prometheus and the other called prometheus-node; that's the node exporter. Since I'm using an unrestricted Ubuntu 20.04 from DigitalOcean, I can do anything I want on it. There is no default firewall configured, so ports 9090 and 9100 are already open and I can visit those through a browser. I'll use the IP address that I got from DigitalOcean; yours will be different. Visit port 9090 and it takes me straight to my dedicated Prometheus server. This is only very basic at the beginning and there isn't a lot to see, but have a good look through if you want; we'll get onto all the features as we progress. The other service that was started, the node exporter, is on port 9100. So visit the IP address, colon 9100, and that is the node exporter endpoint with the metrics path, /metrics. Prometheus will query this 9100 /metrics endpoint at intervals and timestamp the values that it finds, and you'll be able to view these back through Prometheus as time series data. Your servers, depending on where you got them from, may have firewall rules blocking ports 9090 and 9100. You'll have to manage that depending on your service provider's technique; for example, AWS is different from doing it manually, and since there are many ways of managing firewalls, check your firewalls if you can't access either of those ports. Going back to 9090, this also has a metrics endpoint. So 9100 and 9090 both return a very similar metrics endpoint, and Prometheus will be reading the data from both of those. Now, going back to the main user interface: my new Prometheus server is on the internet, so I'm going to do some extra things to lock it down a bit. I'm going to give it an SSL certificate. I'm also going to give it a domain name.
And I'm also going to set up basic authentication so that you can't access it unless you have a username and password. Those things are optional, but if you have a Prometheus server or node exporters that are accessible from the internet, then I'd advise you to lock them down as much as you can, because, as you can see, the default installs of Prometheus and the node exporter are unencrypted and do not require any authentication to access. Excellent. In the next videos I'll set up my domain name, SSL and authentication. To make managing those things easier, I'll set it all up behind an Nginx reverse proxy, so that you will have some exposure to that as well. Excellent.

3. Pointing A Domain Name: Okay, so my Prometheus server is now running, and this is how I access it. It is accessible from the internet, so I'm going to lock it down a little bit. These are optional things. They might not be important for you while you are learning, but I'm showing them to you anyway. I'm going to give it a domain name, so I don't have to use the IP address anymore. I'm going to install an SSL certificate. I'm also going to manage those things using an Nginx reverse proxy, which will also allow me to add basic authentication. For the first step: this is how I access my Prometheus user interface at the moment, with the IP and the port. In this video I'm going to set up a domain name that points to that IP address. So I'll go to my domain name provider, where I already own a domain name, and add a subdomain called prometheus, giving it an A record for that IP address. I'll need a domain name for the step where I install an SSL certificate. If you don't have a domain name, maybe a friend, a colleague or your workplace can lend you one. Otherwise, in my documentation I've provided a link to the service I use, where you can search for a domain name. So I'll search for "my prometheus server".
If you scroll down, .xyz is quite cheap, as is .club, and there are many others. But since I already have a domain name set up, I'm just going to go straight to it and add a subdomain. This is the configuration tool of my domain name provider; your domain name provider's tools may be different. I already have an A record there; that's the main domain. So I'll now create another A record for my host. The host name will be prometheus, and the IP address that it will point to will be that one, like that. Then just press the tick. After a minute or two, though sometimes it can take a lot longer, I should be able to access my Prometheus UI using a domain name. All I've done so far is point a domain name to the IP address. So let's check that in the browser. Very good. That URL for the domain now points to my Prometheus user interface. In the next video we'll set that up behind the Nginx reverse proxy so that I don't have to use the port number anymore, and we'll add an SSL certificate as well. Excellent.

4. Reverse Proxy Prometheus with Nginx: So now I'm going to add a reverse proxy in front of Prometheus, which means that later it will be easier to manage the SSL certificate and to add an authentication layer over the top of the default, unrestricted Prometheus web user interface. In the end, we'll be able to access the server using just the URL, HTTP and my Prometheus subdomain, with no port number. So SSH onto your server; I'm already there. We'll install Nginx with apt, and type yes. That has already started, and we can see that it's active (running). Ctrl+C. Now we need to cd to a folder called /etc/nginx/sites-enabled. ls shows one file in there, called default, which describes the default web page that Nginx is now hosting on our IP address. So let's visit that IP address by itself in a browser.
It says Welcome to nginx. That is just the default welcome page that you get when you install Nginx, and it's being hosted on port 80, which is the default, so you don't have to type it. Okay, so what I'm going to do is create a specific configuration for Prometheus. We use this line: sudo nano prometheus. This opens a text editor that is quite easy to use in Linux. Now copy and paste the example from the documentation: copy it, go back into nano, and if you right-click the mouse in PuTTY, it pastes at the cursor. There we go, and I can use the cursor keys to move up and down. It's going to listen on port 80. The server name we have to change, so I set it to my Prometheus subdomain. Now when I visit that URL in the browser, the default location's proxy_pass forwards the request on to http://localhost:9090. 9090 is the port that Prometheus is listening on. Press Ctrl+X to save that, and yes. We can test that the configuration is okay before we restart Nginx, with nginx -t, and it says the configuration file test is successful. The file that it reports as okay is the main Nginx configuration file, but that file links to the sites-enabled folder and loads all the configurations in it, so our test is successful. Okay, so we need to restart Nginx: restart, then check its status. Very good, active (running). Now we can visit that URL, HTTP and my Prometheus subdomain, in the browser, and it takes me straight to Prometheus using just the URL, with no port number. In the next video we'll set up SSL. Excellent.

5. Add SSL to Prometheus Reverse Proxy: Okay, let's add SSL to the reverse proxy. I already have a domain name set up, and I'm going to get a free certificate using Certbot. Certbot will install a Let's Encrypt SSL certificate for free. Ensure your domain name has propagated before running Certbot. Your domain and IP will be different from mine.
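For reference, the site file from the previous lesson, which Certbot will go on to modify, can be sketched as below. This is a minimal sketch rather than the exact file from the course documentation: prometheus.example.com is a placeholder domain, and the script writes to a scratch file (the SITE_FILE variable is an assumption of this example) so that it can be inspected before anything is copied into /etc/nginx.

```shell
#!/bin/sh
# Sketch: write a minimal reverse-proxy site file to a scratch location.
# prometheus.example.com is a placeholder; substitute your own domain name.
DOMAIN="prometheus.example.com"
SITE_FILE="${SITE_FILE:-${TMPDIR:-/tmp}/prometheus.nginx.conf}"

cat > "$SITE_FILE" <<EOF
server {
    listen 80;
    server_name $DOMAIN;

    location / {
        # Forward every request to the local Prometheus process.
        proxy_pass http://localhost:9090/;
    }
}
EOF

echo "wrote $SITE_FILE"

# On the server, copy it into place, test the syntax, then restart:
#   sudo cp "$SITE_FILE" /etc/nginx/sites-enabled/prometheus
#   sudo nginx -t && sudo service nginx restart
```

Running nginx -t before the restart follows the lesson: a syntax error is reported while the old configuration is still being served.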
Note that it might take some time for the DNS record to propagate across the internet. Certbot will check that your domain resolves to that IP address from several locations in the world. I have already written out my instructions on what I'm going to do with Certbot, but if you visit the Certbot website (the link is in my documentation), you can select which server you're running. We're running Nginx on Ubuntu 20.04, and it then tells us the instructions. The instructions are going to be slightly different depending on which operating system and which web server you're using. For me it's sudo snap install --classic certbot. So SSH onto your server; I'm on mine. Classic Certbot is installed. Excellent. Next, sudo certbot --nginx. You need to agree to the terms; the next yes or no is up to you. It has now scanned my Nginx configuration and found a website, my Prometheus subdomain, so I will just select number 1 and press Enter. It's now obtaining a new certificate and performing challenges. It has given me a certificate: congratulations, your certificate and chain have been saved at /etc/letsencrypt/live/, followed by the domain name and fullchain.pem, and there's the private key there as well. Certbot will auto-renew these certificates for me, so I don't really have to worry about what's going to happen on the expiry date, because Certbot will do it for me. You've got some instructions here if you want to look at that, but it works; I've never had a problem with it. Okay, so now, after installing the SSL certificate, the HTTPS URL will work. If I open a browser and enter HTTPS and my Prometheus subdomain, it takes me straight to my Prometheus user interface, and it now has a padlock. The other thing is that if you type HTTP and the subdomain, it forwards automatically to HTTPS by default. Okay, now let's have a look at the changes that Certbot made to the configuration that we created before in /etc/nginx/sites-enabled.
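The two Certbot commands from this lesson, and the place where the certificate ends up, can be sketched as follows. The function names and the prometheus.example.com domain are placeholders for illustration, and the Certbot run itself is interactive, so it is left for you to call on the server.

```shell
#!/bin/sh
# cert_path DOMAIN: print where Let's Encrypt keeps the certificate chain
# for DOMAIN, matching the path reported at the end of the Certbot run.
cert_path() {
    printf '%s\n' "/etc/letsencrypt/live/$1/fullchain.pem"
}

# get_certificate: the two commands used in the lesson. Call it on the
# server once the DNS record for your domain has propagated.
get_certificate() {
    sudo snap install --classic certbot   # install Certbot as a classic snap
    sudo certbot --nginx                  # interactive: agree, then pick the site
}

cert_path prometheus.example.com
```

The last line prints /etc/letsencrypt/live/prometheus.example.com/fullchain.pem, which is the path to look for in the rewritten Nginx site file.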
So copy that line and SSH onto your server: sudo nano /etc/nginx/sites-enabled/prometheus. The configuration has a whole lot more lines added to it, and it has been modified slightly. The block we wrote before, listening on port 80 with my Prometheus server name, now returns a 404; that's managed by Certbot. If the host equals my domain, it returns a 301 redirect to the HTTPS version. That's what you're seeing there. If we scroll up, we can see the new port that was created: listen 443, managed by Certbot, the SSL certificate at /etc/letsencrypt/live/ followed by the domain name and fullchain.pem, then the private key, and a few other settings added by Certbot. So Certbot has rewritten our Nginx configuration for us to support the SSL certificate, and it has left the location proxy_pass to 9090 in place. Ctrl+X to exit; I don't have to save because I didn't change anything. Excellent. Okay, so now my Prometheus UI is using SSL. In the next video we'll set up basic authentication so that you cannot access this unless you have a valid username and password. Excellent.

6. Add Basic Authentication to the Prometheus User Interface: Okay, so everything's good: SSL with a padlock, and the connection is secure. It all looks pretty good. But if I were to share that URL with anybody, they could see my Prometheus user interface without needing to log in. So in this video we'll set up basic authentication for the Prometheus user interface. SSH onto your server and cd to the folder /etc/nginx. I'm on my server: cd /etc/nginx. That's where the Nginx configuration lives. We need to use a program called htpasswd. Depending on which operating system you're on, there are different ways of installing it, but I'm on Ubuntu, so I'm going to use sudo apt install apache2-utils, and yes. Now to create a password file for a user called admin. That's this line here; right-click to paste: htpasswd -c /etc/nginx/.htpasswd admin.
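The password file steps above can be sketched as one function. The function name is made up for illustration; the commands are the ones used in the lesson, with sudo added on the assumption that you may not be logged in as root.

```shell
#!/bin/sh
# create_htpasswd: install the htpasswd tool and create a password file
# for a user called admin. Call it on the server; htpasswd prompts for
# the password twice.
create_htpasswd() {
    sudo apt install -y apache2-utils             # provides htpasswd
    sudo htpasswd -c /etc/nginx/.htpasswd admin   # -c creates the file
    ls -a /etc/nginx                              # -a lists the hidden .htpasswd
}

# On the server:
#   create_htpasswd
```

Note that -c overwrites an existing file, so it only belongs on the first user; leave it off when adding further users to the same file.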
Now we can enter any password we like, and confirm it. Then ls, or actually ls -a, which will show the hidden files and folders. There we go: .htpasswd, that's the file that has been saved. You can open it and inspect it if you like. Now to edit the Prometheus configuration in Nginx sites-enabled; copy that line. There we go. Right at the beginning, we're going to add these two lines: auth_basic "Protected Area", which is just a string that will show up on the login prompt and that you can modify if you want to, and auth_basic_user_file, a pointer to the file that we just created, /etc/nginx/.htpasswd. So just down here, I'm going to add a blank line and right-click to paste. There it goes: server, server name, auth_basic "Protected Area", auth_basic_user_file. Okay, good. Ctrl+X, yes. Let's check that the configuration is okay: syntax is OK. Let's restart and check the status. There we go, active (running). Ctrl+C. Now, back in the browser, refresh, and I'm asked to enter a username and password. So I enter admin and the password. Very good, now I can access it. If you were to try that URL in a different browser, you wouldn't be able to log in unless you knew the username and password. So there we go, much better. Very good. However, the server is not fully protected yet. We can still access it using the IP address and the port number, colon 9090. If I open a completely new browser and enter that, I can go straight to my Prometheus server, because port 9090 is still open externally. We can use iptables to block port 9090. Okay, so iptables: what I'm doing here is allowing port 9090 for just localhost and then dropping everything else. Now nothing can access port 9090 externally. Let's check that those changes exist in the iptables rules list. There we go: INPUT, ACCEPT localhost anywhere, TCP port 9090, and DROP anywhere port 9090. Okay, so in my Firefox browser, if I refresh that page, it will now time out.
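The pair of iptables rules described above, accept from localhost and then drop everything else, can be sketched as a small function that takes the port number. The function name and the IPTABLES variable are assumptions of this example; setting IPTABLES=echo previews the rules without applying them.

```shell
#!/bin/sh
# Set IPTABLES=echo to preview the rules instead of applying them.
IPTABLES="${IPTABLES:-sudo iptables}"

# restrict_port PORT: accept TCP connections to PORT from localhost and
# drop connections to PORT from anywhere else. Order matters: the ACCEPT
# rule has to be appended before the DROP rule.
restrict_port() {
    $IPTABLES -A INPUT -p tcp -s localhost --dport "$1" -j ACCEPT
    $IPTABLES -A INPUT -p tcp --dport "$1" -j DROP
}

# On the server:
#   restrict_port 9090
#   sudo iptables -L    # list the rules to confirm both were added
```

The same function covers any other port that should only be reachable locally. Rules added this way live in memory only, so they need to be saved or re-applied if the server reboots.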
There we go: the connection has timed out. It's not over yet, though. Port 9100 is still open; that's the node exporter. Let's block that as well, because we only want those two ports to be accessible by localhost. It's the same thing again with iptables, now for 9100: we're accepting localhost access to 9100 and dropping everything else. Very good. And we can verify that the rules exist: ACCEPT localhost 9100 and DROP anywhere 9100. Same thing again: I shouldn't be able to access that from any browser now, even with authentication, because we haven't set up a rule for port 9100 in our Nginx configuration. So here's a new browser. It will time out eventually. This one is still thinking... okay, the request timed out. So, perfect. My Prometheus server now has basic authentication. Excellent.

7. Scrape Target Basics: Okay, so I've logged into Prometheus now, and the first page we get taken to is this Graph page. This expression field is a dynamic field that gives you a list of properties that match what you're typing. You can type the word go and you get a whole lot of properties about the Go runtime, which is what Prometheus is written in. Or you could type node, and you'd start seeing a whole lot of properties about the node, or process, or prometheus. If I look at one of those, like go_threads, in more detail and press Execute, I see two results: one where the instance is localhost:9090 and the other where the instance is localhost:9100, with job prometheus and job node. So where are all of these different properties that we're seeing here, such as another one, prometheus_build_info, coming from? When we installed Prometheus, it set up two services for us: the Prometheus service listening on port 9090 and a node exporter listening on port 9100. They have two endpoints that were already set up: /metrics on 9090 and /metrics on 9100.
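The /metrics endpoints mentioned above return plain text: comment lines starting with #, and one metric name followed by a value per line. The sketch below shows how such a response can be read from the command line. The sample text is made up for illustration and stands in for a real response, and the two function names are assumptions of this example.

```shell
#!/bin/sh
# On the server the real text comes from the endpoint itself, for example:
#   curl -s http://localhost:9090/metrics
# sample_metrics prints a small made-up response in the same format.
sample_metrics() {
cat <<'EOF'
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 12
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.2e+08
EOF
}

# metric_value NAME: read metrics text on stdin and print the value of the
# first line whose first field is exactly NAME. Comment lines never match
# because their first field is "#". Metrics that carry labels in braces
# are not handled by this simple match.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

sample_metrics | metric_value go_threads
```

With the sample text this prints 12. Against a live server, replace sample_metrics with the curl command shown in the comment.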
And if I go to the Status section here and look at Targets, those are the two targets that are already configured in Prometheus: 9100 /metrics and 9090 /metrics. One is the endpoint that gives us information about the Prometheus process, and the other has the statistics that the node exporter is collecting. Okay, so let's look at these URLs more specifically. Take the Prometheus one here and copy it. I've SSH'd onto my Prometheus server, and if I type curl followed by that URL and press Enter, I get a response from the metrics endpoint. Here we go. Each line is a key, a space and then a value, and there are lots of them. Here's another one, a key and a value. There are also lots of comments that will hopefully tell you what each of these keys means. There are many of them, but take any one, for example prometheus_tsdb_reloads_failures_total. If I paste that into the graph expression and press Execute, it shows up there. It's currently 0, just like it says in the curl output. So Prometheus is reading that metrics endpoint at a particular interval, which has been set already and which we'll look at in a moment, and keeping a history of those values in a time series database. Another good example to look at would be process_virtual_memory_bytes. If I actually type that in, you see the dynamic list filters down to help me understand which one I should select. Press Execute and I'm seeing the current values. I can also look at that as a graph and zoom out a little bit, to 12 hours. At regular intervals Prometheus is taking a snapshot of the current value from the metrics endpoint and storing it in the database with a timestamp. Okay, so going back to Targets and looking at these again, there are labels: instance localhost:9090, job equals prometheus, and job equals node up here.
That one is for the node exporter metrics. Prometheus knows about these things because they were set automatically in the configuration when we used apt install. If we look here in Command-Line Flags, there is one called config.file. That's the location of the configuration that Prometheus is using, and it's a YAML file. So we can open that file up and inspect it: sudo nano /etc/prometheus/prometheus.yml. That's the configuration file that Prometheus is using. Starting from the beginning, there are several properties. The scrape interval is set to 15 seconds and the evaluation interval is also 15 seconds, so every 15 seconds our metrics endpoints are being scraped by Prometheus. Alerting: we haven't set up an alerting component yet. Although it says there is an alerting target here, there isn't actually anything set up on port 9093 yet. Rule files: we haven't set those yet; we'll look at that later. But scrape_configs is the important thing here. There are two job names. The first is prometheus, and further down there is another job name called node; that's a scrape target at the address localhost:9100. The Prometheus scrape target is localhost:9090. They both use the /metrics endpoint by default, even though it's not written in this section. The Prometheus scrape target here has the scrape interval and scrape timeout overridden to five seconds. If you comment those out, like so, then it would use the global scrape interval of 15 seconds. And if you comment that out too, the default scrape interval is actually every minute. But I'm not going to do that; I'll leave it how it was set up when it was installed for now. Just note that those are options for you if you think you need them. Okay, so this is a minimal setup: target 9090 and target 9100. Looking back in Targets, that's what we're seeing here. Suppose we comment out, say, that whole node section there, then exit, save, yes, and restart Prometheus.
Run sudo service prometheus restart, go back to this page and refresh it. After a few moments, once it has finished coming up, we only have one target that it knows about: 9090 /metrics. Going back into the YAML, there it is: localhost:9090, job name prometheus. So I'm going to put the node exporter one back; that's the one that was installed automatically. Ctrl+X, yes, restart, then refresh a few times and wait for it to finish starting up. There we go, the node endpoint is back again. At every interval, as set, Prometheus gets a new snapshot of the data and stores it in the time series database, and it does that for both of those endpoints, 9100 and 9090. If we look at the Configuration page here, this is virtually the same as the file that we just opened, prometheus.yml, but with a few more details explicitly listed for us, such as the scrape interval and scrape timeout. So right now this Prometheus server is only monitoring the metrics of two endpoints, Prometheus and the node exporter. The node exporter metrics tell you a lot about the performance of the computer, such as CPU, disk, network I/O and memory. The Prometheus metrics tell you about the time series database plus a few other Prometheus server metrics. So let's look at a few of the properties that we can get from the node and Prometheus endpoints. go_threads is a good one: type go_threads and Execute. The go_threads property is returned by both of the metrics endpoints, listening at 9090 and 9100, so we can see on a graph the number of threads running in each of those processes. As I move along the time axis, I can see the number of threads for one process here, and down here for the other one; where there were seven threads running in one, there were a few fewer in the other. Another property we can look at, which is applicable to both endpoints, is process_virtual_memory_bytes. We can see it here, and I can zoom out a little: two hours, six hours, 12 hours, one day.
I've had this running for a day, and memory use has been slowly going upwards and then down again. Let's look at something more specific to just Prometheus. Typing prometheus lists groups such as engine, HTTP, notifications, remote storage, rules, service discovery, target and TSDB. TSDB is more specific to the time series database that Prometheus uses, so we can look at prometheus_tsdb_storage_blocks_bytes and Execute. It's slowly using more bytes over time as it starts to fill up with data. There are many values specific to just the node exporter as well, such as node_context_switches_total; Execute, and you can see that's a counter, ever-increasing. node_disk_read_bytes_total, Execute. We have properties about operating-system-level performance: node_load1, node CPU, node_netstat_Tcp_InSegs. There we go, in segments, and we can see out segments too. And then node_reboot_required; Execute, and it's either 1 or 0. In the last two days I didn't need a reboot, but all of a sudden, within the last 12 hours, I do. So very quickly you can see that Prometheus is now collecting data from both of those metrics endpoints and storing those values in a time series database, with a timestamp for each value that it has retrieved. Status, Targets: there are the basic properties of those endpoints, and those endpoints are set in the YML that we configured. And under Configuration there is the global configuration. That's the data from the YML plus a few other things that weren't explicitly set in the YML but were taken from defaults: scrape interval 15 seconds and scrape timeout ten seconds globally; scrape interval five seconds and scrape timeout five seconds where they were overridden in the YML; and evaluation interval 15 seconds. Status, Command-Line Flags, config.file: that's where you can find the YML. Okay, so those are some of the basics, so that you understand where this data is coming from. In the next video we'll install a second node exporter on another computer, so there'll be the node exporter running locally and another one on another server.
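The scrape settings walked through in this lesson can be sketched as a small configuration file. This is a reconstruction from the values mentioned in the lesson, not a copy of the file that apt installs, and it leaves out the alerting and rule file sections. The script writes to a scratch path (the CONF variable is an assumption of this example) so that the real /etc/prometheus/prometheus.yml is not touched.

```shell
#!/bin/sh
# Sketch: write the scrape settings described in the lesson to a scratch file.
CONF="${CONF:-${TMPDIR:-/tmp}/prometheus-sketch.yml}"

cat > "$CONF" <<'EOF'
global:
  scrape_interval: 15s      # default for every job
  evaluation_interval: 15s  # how often rules are evaluated

scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s     # overrides the global value for this job
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node          # uses the global 15s interval
    static_configs:
      - targets: ['localhost:9100']
EOF

echo "wrote $CONF"
```

Neither job lists a metrics path, because /metrics is the default, which matches what the Targets page shows.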
You can install as many node exporters as you want. Here in this image I've drawn it as if, hypothetically, there are four other node exporters on other servers. They can be anything; it doesn't matter what you've got running on that server, whether web servers, databases or Docker containers. You can put a node exporter on it and monitor the performance of the server. And as you add each node exporter, you open up the YML and create a new record for it, with the address where Prometheus can find it and start polling its values. Excellent.
8. Install an External Node Exporter: Okay, so now in this video, I'm going to install a node exporter on a different server. I'm going to use another Ubuntu 20 server, and it's going to be the same server where I host my documentation. I'm going to install a Prometheus node exporter on it. Okay, so I SSH onto my other server; this is my second server. And I'm going to install the Prometheus node exporter on it. There we go: apt install prometheus-node-exporter. Yes. Then check the status. It's active (running). All OK. So if I open a browser and type the server address followed by :9100/metrics, straight away it's returning the metrics of the node exporter running on my server. There's lots of information: go_goroutines, et cetera. If I search in there for go_threads, I've got go_threads 5. And there are many node properties, such as node_cpu_seconds_total, user, system. Now, we can set that up straight away in Prometheus, and I'll do that now. So copy that URL, or your URL, and SSH onto your Prometheus server. Okay, so this is my Prometheus server. Open the YML, sudo nano /etc/prometheus/prometheus.yml, and we'll add the scrape target. There are multiple ways of doing this. You can create a new job name and name it node sbcode, for example, and then just fill in the static configs, et cetera, with the target. Or a simpler way is to add another target to the existing job. I right-click to paste, because I copied that.
And I don't need that part, or that part; /metrics is just presumed by default. So I have two targets down here in my job name node. So exit: Ctrl+X, yes. Now we can restart Prometheus. But a good thing to do whenever we change the Prometheus configuration is to check that the configuration is valid, and we get a tool for that called promtool. It was also installed when we installed Prometheus. promtool check config, and then we just pass in the path to the configuration file, /etc/prometheus/prometheus.yml, and it says: checking the YML, success. Success. Now I can safely restart Prometheus: sudo service prometheus restart. OK. Now, in the Prometheus user interface, I'll go to Targets. In my node job I have two endpoints. The new one is my sbcode server on :9100/metrics, with the instance, port 9100, the job, the scrape duration, and the last scrape, a few seconds ago. So that's just the basics; straight away, you can add as many node exporters as you want. But I have several problems with that. One of them is that it's exposed on the internet and anyone can access it. So there are several options. I can block port 9100 using iptables. For example, I can enter those two commands there: iptables, input, accept for localhost, and drop everything else. I could also, beforehand, create a specific rule that allows just the IP address or domain name of my Prometheus server to access port 9100 on my sbcode metrics server, and then continue on with those. What I'm going to do, since I also have an Nginx proxy running on my sbcode server, because it's a web server and I'm using Nginx as well, is just enter these two lines. I'm going to allow localhost only on 9100, then drop 9100 for everything else. OK, so on my sbcode server: iptables, localhost accept, and drop everything else. So let's just check those rules. There we go: accept localhost 9100, and drop anywhere else 9100. Now, going into Prometheus.
Again, we can now see that we have a problem there, because we have blocked port 9100. Now, like I said beforehand, I could have added an accept rule for my Prometheus server. It's important that I run that line before I run the line describing the drop. Otherwise it will drop all the connections before it can work out that the connection is actually OK. That's an iptables thing. Anyway, I'm not doing that; I'm using Nginx, since I have it installed. I also have an SSL certificate installed for my domain name, so that's another reason why I'm using Nginx. These things are optional, me using an Nginx proxy and blocking ports using iptables. But if you're going to have Prometheus running on the internet, you need to be thinking about things like that. I think there are a lot of Prometheus processes running on the internet and they're not secure. Okay, so I'm going to open up my Nginx configuration for my website now. The process is very similar to how we set up Nginx for the Prometheus server. I have set the server name, and I have several other properties in this configuration which are unrelated to Prometheus. But I've also installed an SSL certificate, see the lines managed by Certbot, on port 443. And it also has a redirect, a 301 redirect, so any HTTP request gets automatically converted to HTTPS there. And that is working perfectly as expected. So what I'm going to do is create a new location for metrics: location /metrics. I have some example code down here which you can copy. If I just delete that, then right-click: location /metrics. I'm going to allow the remote IP of my Prometheus server, which happens to be that IP address. I'm denying everything else, and I'm proxy passing to http://localhost:9100/metrics. Okay, so if the IP address does not match, you won't be able to access the metrics endpoint. Ctrl+X to save that, yes. Let's check the configuration. Okay, and it says the syntax is OK. Now let's restart Nginx.
All right, we've got active (running). Now, if I change that URL to my sbcode domain /metrics, I get a 403 Forbidden. Perfect. But if I was to call that from my Prometheus server, using curl for example, that would be okay; it returns a response. So my Prometheus server is able to call that URL, HTTPS, sbcode domain, /metrics. So now a few other things. Let's go back into the Prometheus YML, /etc/prometheus/prometheus.yml. In the scrape targets, I can now simply write my sbcode domain, and that will work. Now, by default, the scheme that this job will use is HTTP. If we look at the Status, Configuration page here and look at job name node, just after metrics_path it says scheme: http, and then there are the static configs. That's still the old value, because I haven't restarted the Prometheus service since my last change. So what's going to happen is that it's not going to request https:// but http:// for my sbcode domain. And since my Nginx configuration has a 301 redirect here, that doesn't matter. Even though the Prometheus server is going to be querying http://, sbcode domain, /metrics, my Nginx web server is going to redirect that to HTTPS automatically. That's probably a bit much to take in. But if you have mixed endpoints, where some are HTTP and some are HTTPS, you're going to understand that problem much better. Anyway, this is all optional. So Ctrl+X; I didn't make any changes there. So I'm going to save with these changes now: targets, my sbcode domain. Ctrl+X, yes. Check the syntax using promtool: promtool check config /etc/prometheus/prometheus.yml. Good. And restart Prometheus. Very good. Now, going back into Prometheus, Targets, refresh a few times until the last scrape has been detected. There we go, 1.47 seconds ago. OK, so note that it says http as the scheme here, and it's got port 80 written there. I didn't write that; that's the default. But my web server is serving that over HTTPS anyway, because on the web server side the request is redirected to HTTPS by the 301 redirect, and Prometheus follows the redirect.
Just a little bit more background information about what I've been talking about. If I was to go into the Prometheus config and say scheme: http and then have a static config, and then say scheme: https again and have another static config, that wouldn't work, because you cannot have multiple schemes in one job. You could create two jobs with different names, one being node and the other node-https, with a different scheme like that. But the reason why it works for me using the default scheme, HTTP, is that I have the 301 redirect in my Nginx proxy. Excellent. If that doesn't make sense, remember this is all optional. Ctrl+X. I didn't actually make any changes I want to save, so I'm pressing N for no. Now let's have a look at what we have in Prometheus now. So Graph: let's search for go_threads, for example. Execute, and I have three instances now: localhost:9090, localhost:9100, and sbcode. Excellent. So the values are 9, 10 and 6. We can look at it on a graph. And there are values here for sbcode, 9100 and 9090. Now, I'm also seeing a value from when I had my endpoint set up as sbcode on port 9100, back at the beginning of the video. We can see that down here in blue. And this one here, which is more of a cyan color, is sbcode on port 80, whereas that one is on port 9100. Now, I no longer need any metrics that were recorded for that instance. So in the next video, I'll show how you can delete data from the TSDB database that Prometheus uses. And just another metric you might find useful: scrape_duration_seconds, Execute. We can see here that the scrape duration was 10 seconds for the 9100 instance. That's probably because I blocked port 9100 and it was timing out. Anyway, that no longer exists, and in the next video I'll show you how you can delete those things. Excellent.
9. Deleting a Time Series: Okay, deleting a time series. So far, I have one time series down here, this one, sbcode on port 9100.
I want this one deleted, because this is what I set up as an experiment, for example, and I no longer want it. Okay, so I've just typed go_threads in there and I've made my time range larger so that it shows up. I can actually delete anything in the time series database: for instance, instance equals sbcode on port 9100, or anything else. I could delete everything that was go_threads, or job equals node. Really, anything that's common across any time series, I can delete. So for this example, I'll delete everything where instance equals sbcode on port 9100. Now, before you can delete anything, you need to enable the admin API. So what we do is SSH onto the server, and we need to edit a file: sudo nano /etc/default/prometheus. This is a new folder, /etc/default. When we installed Prometheus, it created this file as well. Press Enter. And you'll see here there's a property called ARGS, and it doesn't contain anything. All the options you have are listed down here in the comments. The one that I want is web.enable-admin-api. So I'm just going to highlight that so that it copies into the clipboard, scroll the cursor up, and right-click. And there we go: ARGS, hyphen hyphen web.enable-admin-api. Okay, so now I can save that: Ctrl+X, yes. Now restart Prometheus: restart. Check the status. All good. Now let's go into Status, Command-Line Flags, and we'll see down here that web.enable-admin-api is now true, whereas before it was false. Now, after enabling the API, we can run a local curl command that will delete whatever we put into the match property. So I'm going to copy that line, clear, and paste. So curl, POST, because we're posting data, to localhost:9090, /api/v1/admin/tsdb/delete_series, with match equals instance equals sbcode on port 9100. So I'm matching that whole line of string there. Going back to the graph, it's the instance equals sbcode on port 9100 part, including the curly braces. So if I just enter that, that's done.
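To make the shape of that delete request a little more concrete, here is a minimal Python sketch that builds the same kind of POST request without sending it. It assumes the admin API has been enabled as described above; the instance value example-host:9100 and the helper name build_delete_series_request are hypothetical, used only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_delete_series_request(base_url: str, matchers: list[str]) -> Request:
    """Build (but do not send) a POST request for the TSDB delete_series admin endpoint."""
    # Each series selector is passed as a repeated match[] query parameter,
    # URL-encoded so the braces and quotes survive the trip.
    query = urlencode([("match[]", matcher) for matcher in matchers])
    url = f"{base_url}/api/v1/admin/tsdb/delete_series?{query}"
    return Request(url, method="POST")


# Hypothetical instance label value, for illustration only.
request = build_delete_series_request(
    "http://localhost:9090", ['{instance="example-host:9100"}']
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it, so only do that against a server where you really do want the matching series removed.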
And if I re-execute that now, there are only three time series in this query, so that old time series has been removed from the TSDB database. Now, you can delete on all kinds of matches; it doesn't matter. I could have just said delete go_threads if I wanted to. We're not going to do that. Also, there is a retention period with Prometheus, and the default is 15 days. If I was happy to wait 15 days, then all the data would have been deleted anyway. So in 15 days my data would have been deleted and I wouldn't have needed to do that manually, but the option to delete manually is there for you if you want it. The options for the retention period are also there: storage.tsdb.retention, size and time. They are currently showing as zeros here, but the default is 15 days. You could also change that by editing the default properties. You could have multiple properties in there. Here I'll search to find what I want. There we go: storage.tsdb.retention; here it's 15 days already. But I can highlight that, take the cursor back up, and click. I could say equals one year, for example. If you have multiple properties in there, just put a space in between. Anyway, I'm not going to change the retention period. And since I've finished deleting that time series, I'm just going to put it back to normal, which is that. Ctrl+X, yes, restart, and check the status. Now that curl command from before, where I tried to delete something, will fail: status error, unavailable, admin APIs disabled. Very good, excellent. And if I refresh that, web.enable-admin-api now equals false. Excellent.
10. PromQL Example Queries: Okay, so we have enough scrape targets now to start some example queries. The language that you use for querying information is called PromQL. In Prometheus, you can view the data as a graph, as tabled data, or in an external system such as Grafana, Zabbix, or many others. So just a quick example: scrape_duration_seconds.
You put that into the expression field here in Prometheus. scrape_duration_seconds, Execute. It brings up three results, which are called instant vectors. This is the last value that it knows about, and we're in the console view there; the Console tab is active. Now, we've got instance equals localhost:9090, that's the Prometheus service; 9100 is a node job, and sbcode on port 80 is another node job. So we set those up already. We can look at that as a graph, and it shows three lines, one for each. Down here there's a legend describing which one is which. If I wanted to look at just one of those at a time, I can add a filter for any of the values that appear here in the curly braces, such as job equals node, for example. And now it's only showing me the two where job equals node. Or job equals prometheus, and it shows me the one for Prometheus. Or I can say where instance equals localhost:9100, and it gives me just the one for localhost:9100. I didn't filter by job this time, as you can see, but I could have added job equals node and got the same result, so I don't really need to do that. So very quickly you can see that you can select a metric; there are many to choose from that Prometheus already knows about. And if you get multiple results, you can filter them further by the labels that you can see here in the curly braces, so instance equals sbcode, for example. Now, to make this a little more complicated, we can use regular expressions. So go back to the Console tab and we'll search for something called node_cpu_seconds_total. Execute. And it's given me a lot of results. Okay, so I can graph that as well, and there are actually many, many lines. But let's just say, going back to the console, that I wanted just the ones with the word irq. There are several of those: there's irq, and there's softirq. I could add a regular expression.
So that's the filter, the curly braces, where mode, for example, where mode equals, with a tilde to indicate that it's a regular expression, double quotes, dot star irq. I finish off the double quotes, press Enter, and now it shows me just the results where the mode matches the regular expression, here anything and then the letters irq. Now, regular expressions can be quite complicated, but if you know them, they are very useful. That's just an introductory example. Prometheus uses the RE2 style of regular expression, and if you want to know more about that, you can press that link there. But it's a very common format. Dot star means anything. So for example, I could say soft dot star, and that will give me softirq; it doesn't matter what comes after the word soft. Dot star is like a wildcard for anything except a new line. Okay, anyway, we can graph that; that's it in the graph. Now, to discuss data types: there are three main data types. There's the scalar, and scalars are numbers, for example -1, or 12.34, or 1234, et cetera. There are instant vectors, which are like an array of scalars, but timestamped, so that's a time series. And then there are range vectors, and that's an array of instant vectors. So what we've been seeing so far are instant vectors. When I do an Execute there, I'm seeing two instant vectors in the response, and I can graph those straight away. So there are two lines being graphed there, two instant vectors. And in the other example, Execute, there we go, that's one instant vector, filtered by localhost:9100. And we can also see that as a result in the Console tab: one instant vector. Range vectors are an instant vector split up into smaller instant vectors. As an example, take scrape_duration_seconds with the instance localhost:9100. We can break that up into segments going over a period of one minute, press Enter, and here there are four instant vectors now. So this is a range vector consisting of four instant vectors. Graph.
And we can't see a range vector as a graph. It says: invalid expression type "range vector" for range query, must be scalar or instant vector. So going back to the console, also note here that it has split that up into four instant vectors, all 15 seconds apart. Look at that: 792 and then 807, that's 15 seconds. And then 822, that's another 15 seconds, and 837, another 15 seconds. If I just remove this filter and look at scrape_duration_seconds over one minute for all of the instances that I know about, this one here is listing many more instant vectors. It's splitting them up into smaller groups, such as 39, 44, 49. These are five seconds apart, whereas these are 15 seconds apart, with only three results down there. But if I do it again, that's four. Okay, so what decides by default how a range vector splits up the instant vectors is here in Status, Configuration: the scrape interval, and the default is 15 seconds. But down here on job name prometheus, we have overridden it to be a scrape interval of five seconds. So that's why, when we look at this, we see this one for job prometheus being broken up by default into five-second instant vectors over a period of one minute. If I change that to two minutes, we'd get even more. And there we go, the node ones are all 15 seconds apart. But to override the default in the query, you can add a colon and say 15 seconds, such as that. So now it's a range vector split up into 15-second instant vectors. There we go, and we still can't graph that. It doesn't matter. What we need to do to be able to graph that is to convert the range vector back into an instant vector or a scalar. So the example I'll use for that is a function called rate. Okay, so I'll just wrap what I have there, and actually I will filter it by instance localhost as well, so that I have just one to work with. There we go.
If I wrap that whole expression in the rate function, put brackets around it, and Execute, it's turned it back into an instant vector. So we can now view that as a graph. There we go: rate of scrape_duration_seconds, instance localhost:9100, one minute and 15 seconds. The rate function is described on this Prometheus documentation page about functions. So I'll just scroll down there. There's rate. rate needs a range vector, so that's why we wrapped rate around that range vector query. It calculates the per-second average rate of increase of the time series in the range vector. Now, that sentence is not that easy to understand if you've never seen it before. So a good way to understand what rate is actually doing is to use a different metric. The metric I'll use is node_netstat_Tcp_InSegs, filtered by localhost:9100. Okay, so that is just an instant vector. There we go. Now, it's a counter, so it's a number just going upwards all the time. But I want to see that as the rate of change. So what I need to do is convert that into a range vector first by putting in the square brackets, like that. One minute is a good range to use. My default scrape interval is already 15 seconds, which is the default, so I'm not going to put anything else in there. Now it's telling me there's a problem with the range vector. So we wrap it in the rate function, like that. Execute, and now it shows me the rate of change. So let's just compare that. Removing that, it's just a counter forever going upwards, but every now and then the speed changes a little bit. I want to understand the speed of change, or the rate of change. So I convert it to a range vector first and then wrap rate around it. Now it's the rate of change over time. Excellent. So that's a way to understand rate. It's good for converting counters, which are just a snapshot of the latest value, into something that shows you the rate over time, like that.
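As a rough illustration of "per-second average rate of increase", here is a small Python sketch that does the arithmetic on a handful of counter samples. It is a simplification, not Prometheus's implementation: the real rate function also extrapolates to the edges of the time window, which this sketch leaves out. The function name simple_rate and the sample values are made up for the example.

```python
def simple_rate(samples):
    """Approximate a per-second rate from counter samples.

    samples: list of (timestamp_seconds, counter_value) tuples, oldest first.
    Returns None when there are fewer than two samples or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went backwards, so treat it as a reset to zero
            # and count everything since the reset.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# Four hypothetical samples, 15 seconds apart, from an ever-increasing counter.
window = [(0, 100), (15, 130), (30, 160), (45, 220)]
print(simple_rate(window))
```

The counter itself only ever climbs, but the rate shows how quickly it was climbing over the window, which is what makes the changes in speed visible on a graph.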
Looking at all the functions here on the Prometheus page, PromQL supports a lot of functions. All of these require either an instant vector or a range vector. absent there, or abs, which converts to the absolute value, wants an instant vector, whereas absent_over_time wants a range vector. So for all of these different functions, it's important to understand which kind of vector it wants. So ceil here wants an instant vector; changes wants a range vector; then day_of_month, delta, deriv. Okay, there are many of them. Another one you might want to try is sum, so we can sum values. go_threads is a good example. If I do go_threads, Execute, it's showing me go_threads for those three instances; there are three numbers there. We can sum that: sum, Execute, and it's showing 26. If I just change the time range, six hours, 12 hours, one day, we can see that the number of threads does change over time. With these functions, we can have functions within functions, and that leads on to subqueries. So going back to where we were before, let's just go back to basics and have a simple instant vector, which shows the numbers just continually rising. But you can see there is a rate of change over time, and I want to make that more prominent. So convert it to a range vector as we did, one minute. Okay, so it's now a range vector. We can't graph that, but we can see it in the console. Let's convert that to a rate, like so. That's now the rate. Now we'll wrap another function around that, called ceil. Here in the documentation, ceil needs an instant vector. Okay, so our rate already converts it to an instant vector; it takes in a range vector and creates a new instant vector. So that should work straight away. ceil. Before I press Execute, the values here are 3.0444, 3.7. They are floats. ceil will round each one up to the next whole number. So Execute. So each number is now a whole number: 5, 5, 4, 5, 4, 6 there.
So that's what ceil does. Now, we can wrap another function around that, called deriv, like so. But deriv, if I find it in the documentation, wants a range vector. So before the previous function, ceil, can be used in deriv, we need to convert its result to a range vector. So let's put the square brackets there, one minute. And another peculiarity is that you need to give it the resolution. So you can use a colon and say 15 seconds if you like, or we could just leave the colon with nothing after it and then Execute. And now it's turned into a graph a little more like that, like a waveform. That's deriv. Now, that's just an introduction to PromQL. We were showing you that you can take a simple value, add a filter to it, convert it to a range vector, then do a rate on it, get the ceil value, and then create a derivative graph from it. Remember, when looking at the functions, a good thing to be aware of is whether they need a range vector or an instant vector, and whether they themselves produce range vectors or instant vectors. Now, as you progress with Prometheus, you'll want to experiment with Grafana. The good thing about Grafana and Prometheus is that they are very well integrated together. You can just take this query directly and execute it in Grafana. I have already set up Prometheus as a data source in Grafana, and we'll discuss this later on in the course. But just to show it already: I'm in the Explore tab here, that's this one down here. I can paste that query in straight away and Run Query, and I get the same information as the graph. This means that I can now create more visually appealing dashboards in Grafana about all my Prometheus node exporters, or anything else that I've got connected through Prometheus. That there is a PromQL query being executed from Grafana. So that was an introduction to PromQL, the Prometheus query language. Excellent.
11. Recording Rules: OK.
Recording rules. In Prometheus, under Status, Rules, there are no rules defined. In this video, we'll create some recording rules. A recording rule is an expression that you can create yourself, a custom expression that will run and appear as a metric like anything else. So we're going to create our own custom metric here using the recording rule functionality. For the example I'm going to use, I'm going to calculate what percentage of memory is free. Already in the node exporter I get node_memory_MemFree_bytes. If I execute that, I see the MemFree bytes values for both of my node exporter instances. Now, I want to see that written as a percentage. To start with, I can add an operator. This is a PromQL operator. I'm now doing division: MemFree bytes divided by node_memory_MemTotal_bytes, and Execute. And now it's showing me MemFree bytes divided by MemTotal bytes, using that operator there. More about PromQL operators: you have several operators you can choose from. You can add metrics together, subtract, multiply, divide, take the modulus, and raise to a power, exponentiation. They are used on scalars or instant vectors. There we go, so that's what we've got there. Now, that's still not a percentage. I'm going to convert that to a percentage by typing 100 minus, 100 times. So that's a scalar, that's an instant vector, and that's an instant vector. And 100 minus. Execute. So on my localhost, 88% free, and on sbcode, 90% free. That's good. Now, that's a complicated expression to have to write every day if I want to check it. So instead, I'll create a recording rule which will run at regular intervals and recalculate that for me. And I'm going to name it something unique that will show up in this list, and that I can also type in here. So what we need to do is create a new file; we can call it anything we like. prometheus_rules.yml. I'm going to put it in the /etc/prometheus folder. So SSH onto your server, cd /etc/prometheus, ls -lh. We can see what is in there already.
I'm going to create a new file: sudo nano prometheus_rules.yml. It's an empty file. Into that, paste this text here; I'll just right-click, and just move it along. Okay, so groups: you can have multiple groups in there. The first one is called, name, custom_rules. I could have a second group if I wanted to, called, name, second group. I'm not going to do that. And in group one I've got rules. The record: it's going to be called node_memory_MemFree_percent. Now, that is the name that will appear where I begin to type the letters, and it will also appear in this drop-down. When you create a name for your metric, it must conform to certain rules. It must contain only ASCII letters, digits, underscores or colons, and nothing else. That's the regex that needs to work for your metric name. You can't have inverted commas or anything like that, or symbols such as a question mark. Alphanumeric characters, colon or underscore. And that's the expression there, the one that I just typed into the expression field, exactly that. And that's the value it works out, and that's what the graph looks like. So let's save it and verify that the syntax is OK. Ctrl+X, yes. We can use promtool to check the syntax: promtool check rules prometheus_rules.yml. It says checking rules, success, one rule found. Excellent. Now, Prometheus isn't going to recognize this rules file unless we link it in the Prometheus YAML. So let's open up prometheus.yml. Scroll down to the rule files section, which is just here, and we'll create a new entry: prometheus_rules.yml. That's just the local YML file we just created. Ctrl+X, yes. Now let's just double-check that the configuration file is OK: promtool check config prometheus.yml. It says success, one rule found. Okay, so let's restart Prometheus: sudo service prometheus restart. Excellent. If I refresh the graphs and start typing in mem, it's not there for a moment, so refresh again. There we go, MemFree_percent has now appeared. Execute.
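The naming rule described above for recorded metrics can be checked with a few lines of Python before a name goes into a rules file. This is a small sketch using the classic metric name pattern from the Prometheus data model documentation; the helper name is_valid_metric_name and the candidate names other than node_memory_MemFree_percent are just illustrations.

```python
import re

# Classic Prometheus metric name pattern: ASCII letters, digits, underscores
# and colons, where the first character must not be a digit.
METRIC_NAME_PATTERN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True when the whole name matches the classic metric name pattern."""
    return METRIC_NAME_PATTERN.fullmatch(name) is not None


for candidate in [
    "node_memory_MemFree_percent",
    "job:requests:rate5m",
    "mem free?",
    "9lives",
]:
    print(candidate, is_valid_metric_name(candidate))
```

promtool check rules remains the authoritative check for the file as a whole; this only shows why names with spaces, question marks or a leading digit are rejected.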
So that's all I need to type now: MemFree_percent. It also exists in this drop-down: MemFree_percent, so there's a new entry in the drop-down as well. Now, Status, Rules: custom_rules, that was the name of the group. Record, that's the name of the metric, node_memory_MemFree_percent, which we set there. And expression, that's the expression there. OK. So the state is OK, last evaluation 13 seconds ago. Refresh: a few seconds ago. Refresh: two seconds ago. So every 15 seconds it is re-evaluated. What decides how often our rule is re-evaluated is here in Status, Configuration: there is a global evaluation interval of 15 seconds. Okay? There we go: MemFree_percent, Execute, and Graph. Very good. It has only just begun recording. Okay, let's create another rule which is slightly more complicated. This time I'm going to calculate how much space is free in the file system. So for the file system, there is node_filesystem_free_bytes divided by node_filesystem_size_bytes. That's the first query. But look at the console: it's showing me all the mount points there. I'm just going to filter by just that mount point: curly braces, mountpoint, and the same there, curly braces, mountpoint. Okay, so I'm using a filter now and an operator. So that's good. That's just a number, so I'm going to convert that to a percentage: 100 times. Execute. OK, so 89% and 54% disk free, device vda1. So we can graph that. Let's set up a recording rule for that. Okay, let's reopen our Prometheus rules YAML: sudo nano prometheus_rules.yml. I'm going to create a second rule, called record. There we go; copy that, and scroll back that way. I'm naming it node_filesystem_free_percent, and that's the exact expression that I just typed before. Ctrl+X, yes. Check that it's okay. Since we've already linked the rules YML in the Prometheus YML, we can just check the config of the Prometheus YAML, like that: promtool check config prometheus.yml. It says two rules found, success. Excellent. Restart, very good. Check the status. Very good. Now, Status, Rules.
Okay, let's go and search for node_filesystem_free_percent. There we go; straight away, graph it. It has only just begun. There's a little bit of information down below as well: node_filesystem_free_percent and node_memory_MemFree_percent. These are the two new rules that we just created.
12. Alerting Rules: Okay, alerting rules. Alerting rules are very similar to recording rules. They are like extended recording rules. So in this video, we'll create an alerting rule in Prometheus. On the Alerts page there's nothing in here: 0 inactive, 0 pending and 0 firing, no alerting rules defined. So let's create an alerting rule. I'm going to use the existing Prometheus rules YAML from the last lesson and add a new rules group called alert_rules. So copy that, SSH onto your server, go into the /etc/prometheus folder, or wherever your Prometheus files are stored, and open prometheus_rules.yml. Under the groups down here, I'm going to paste, by right-clicking, the other information from my documentation. Okay, so it's a second group, and this one's called alert_rules. Then the rules. It's not a record this time, but an alert. I'm going to call it InstanceDown. The expression is up equals 0. That's like up equals false. In the graph section, if I simply type up, like that, and Execute, it shows me whether each of these instances is up or not. A 1 means true. Graph that, and we can see that one of them was 0 for a small moment in time there; that meant it was switched off. So I'm going to alert when one of these instances is a 0, which means it's not up. So it's a 1 or a 0, true or false. Expressions in alert rules should evaluate to true or false. Then for, one minute. Okay, that's optional. Go back to Alerts. for describes how long it will sit in a pending state.
When we finish setting up the alert, it will sit in the inactive state. As soon as something goes wrong, it will go into the pending state for a period of time before it goes into the firing state. for describes how long it sits in the pending state, and I'm saying one minute. Then labels, severity critical. That can be anything you like. You can say severity warning, or A, B, C, or 1, 2, 3; it doesn't really matter. It's just a label with a value that will show up in this section, and you'll see that you can use that information later on to decide how you handle the alert. Annotations are the text that you will see for the alert. So the summary here says Instance, then $labels.instance inside two sets of curly braces, then down. $labels.instance comes from the instance label you see on each series, such as the localhost instance. In the next line, description, I'm saying something similar: $labels.instance of job $labels.job, each inside two sets of curly braces. $labels.job is the job, so it would print node or prometheus, depending on which one was down, followed by has been down for more than one minute. Very good. So that is an alerting rule. We can save that: Ctrl+X, yes. Check that the syntax is OK using promtool check config prometheus.yml: three rules found. One of those is an alerting rule. Let's restart Prometheus. Go back to Alerts, and we have one inactive alert called InstanceDown. If I click that, I can see the different properties. The expression is up == 0, and when that matches, the alert will move into pending and then into firing, depending on how long for is. The default for for is 0, so if you left that line out completely, the alert would go straight from inactive to firing. Then labels, severity critical; annotations, summary and description, with $labels.instance and $labels.job in the templating braces there; and InstanceDown, which was the name of my alert. Now I'm going to make that alert go into the pending state.
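Put together, the alerting rule described above might look roughly like the following sketch. The exact spelling of the group and alert names, and the wording of the annotation strings, are assumptions based on what is read out in the lesson.

```yaml
groups:
  # ... the existing recording rule group stays above this one ...
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0          # matches every scrape target that is currently down
        for: 1m                # optional; time spent in pending before firing
        labels:
          severity: critical   # free-form label, usable later for routing
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
```

Leaving out the for line makes the alert go straight from inactive to firing the first time the expression matches.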
If I look at Graph again, type up and execute, I'm going to turn off one instance here, localhost:9100, on my server: sudo service prometheus-node-exporter stop. Excellent. So that's stopped now. After some time, that will no longer be a one, it will be a 0. So let's go into Alerts and have a look. Now, this page doesn't automatically refresh, so we should refresh it. There we go, it's in the pending state: InstanceDown, up == 0, for one minute, state pending. After one minute, it's going to go into firing. Before that, I click Show Annotations. It shows the summary and the description with those values filled in: instance localhost:9100 of job node, from $labels.job and $labels.instance. Refresh that again; one minute must be almost up. And there we go, firing. Now, InstanceDown, one active, severity critical here, and the labels: alertname InstanceDown, instance, job, severity critical. All of this information can be used further on. State firing, active since. Let's look at Graph: up, execute, 1, 0, 1. Graph: it went down just a few moments ago. If I turn it back on with start, then within 15 seconds, because that's how long each evaluation period is, it goes back to inactive. And there we go: inactive, one InstanceDown, there are no problems.
I'm going to create another alert now, which is going to use one of the recording rules that we created in the last lesson. I'm going to call it DiskSpaceFree10Percent. So I'm going to be checking whether node_filesystem_free_percent is less than or equal to ten. Have a look at that now, where I execute it for my two node exporters. Neither of those is less than 10% right now, so neither of them is going to fire, but I'm going to create this alert anyway. Okay, so copy this whole section here with Copy to Clipboard. Let's go into the Prometheus rules YAML with nano, scroll down, right-click, and it will paste the next alerting rule. YAML is very whitespace sensitive, so you need to have exactly the correct number of spaces indenting each section.
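The section being pasted here, which the lesson goes on to read through, might look roughly like this sketch. The alert name spelling and the annotation wording are assumptions, and the use of $value in the description is a guess at what the garbled transcript describes.

```yaml
groups:
  - name: alert_rules
    rules:
      # ... the InstanceDown rule from earlier stays here ...
      - alert: DiskSpaceFree10Percent
        # Uses the recording rule from the previous lesson.
        expr: node_filesystem_free_percent <= 10
        # No "for" line, so the alert goes straight from inactive to firing.
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} has 10% or less free disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% free."   # wording assumed
```

Each nesting level is indented by two spaces here; what matters to YAML is that sibling keys line up exactly.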
So alert, DiskSpaceFree10Percent. The expression is node_filesystem_free_percent less than or equal to ten. That acts as a true or false test. If I put that in now and execute it, there is nothing less than or equal to ten. If I change it to greater than or equal to ten, it returns every node_filesystem_free_percent series where the value is greater than or equal to ten. But I want to alert for less than or equal to ten. Here, I don't have the for option, so it's going to go straight from inactive to firing. The severity this time is warning, not critical. Annotations: the summary is Instance $labels.instance has 10% or less of free disk space, and the description, $labels.instance has only and so on, is a little more descriptive than the summary. Ctrl+X, yes. Let's check the syntax: promtool check config prometheus.yml, four rules found, everything good. Restart, all good. In Alerts, I have two inactive alerts now, and you can see that if DiskSpaceFree10Percent was ever true, it would go straight into the firing state with severity warning and those annotations. So that's the basics of alerting rules. They also show up here in the Status, Rules section, so we can see state OK, last evaluation that long ago. There are the custom rules, and down below, the alerting rules. Excellent.
13. Install Prometheus Alert Manager: Okay, excellent. So we've set up two alerts now. In order to know that these alerts are firing, I'd have to check this page regularly and even refresh it from time to time to see if anything is firing. Rather than doing that, I'd rather get alerted through email or some other messenger service such as Slack or Telegram, for example. So to do that, I will install another service called the Alertmanager. The Alertmanager will listen locally on port 9093. So it's another process, just like Prometheus and the node exporter that are already running locally. We'll install that now.
The Alertmanager is quite a sophisticated little process in itself and can do many, many things. I'm going to keep it simple, but just so you know how sophisticated it is, it can handle grouping, inhibition, silences, client behaviour and high availability. Anyway, let's get started. From my documentation: sudo apt install prometheus-alertmanager. Copy that, SSH onto the server, paste and press Enter. Very good. Okay, so that has installed the Alertmanager process for us. Check the status: it's already active. It is running as the prometheus user, so the prometheus user is now running three processes locally: the main Prometheus process, the node exporter, and now also the Alertmanager. If I cd into the /etc/prometheus folder and run ls -lh, there are new files here: the Alertmanager YAML and the Alertmanager templates. There we go. Now, the Alertmanager is exposing an endpoint on 9093, so we can visit that in our browser: your domain or IP, colon 9093. It takes us to a default page. I can look at metrics, reload, healthy and ready. So metrics: okay, we could start reading that in Prometheus if we set that up, and I'll set that up in a moment. Reload is not allowed. Healthy, OK. And ready, OK. The Alertmanager can serve its own user interface, and there is more information about that here if that's something you want to read more about. But anyway, I'm just going to manage this through the configuration file and Prometheus for now. I don't want this URL to be accessible externally, so I'm going to block port 9093 with iptables: accept localhost only and drop everything else. Then verify: there we go, 9093 has the accept and drop rules. Now we can verify that it still works locally by using curl. There we go, and that's just the HTML that we saw before. That works. We can also check the metrics endpoint. Excellent. Now, in Prometheus, in Status, Configuration, the alerting section already has something set up for 9093.
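For orientation, the relevant parts of prometheus.yml at this point might look like the sketch below: the alerting section that is already present, plus the optional scrape job for the Alertmanager's own metrics that the lesson adds next. The job name alertmanager is an assumption, and the other scrape jobs are omitted.

```yaml
# prometheus.yml (only the relevant parts)
global:
  evaluation_interval: 15s   # how often rules are evaluated, as seen in Status, Configuration

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # where Prometheus sends firing alerts

scrape_configs:
  # ... existing jobs stay as they are ...
  - job_name: alertmanager              # job name assumed
    static_configs:
      - targets: ['localhost:9093']     # Prometheus scrapes /metrics on this target
```

The same address appears twice for different purposes: under alerting it is where alerts are delivered, and under scrape_configs it is where metrics are collected from.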
So we don't have to change the YAML for the Alertmanager to start reacting to alerts. But what I am going to do, since the Alertmanager has its own metrics endpoint, is set up another scrape target in scrape_configs so that we can get metrics from the Alertmanager in Prometheus. So let's open prometheus.yml. I'm already in the /etc/prometheus folder. Below global, the Alertmanager target is already set to localhost:9093, so there is now something listening on that port, whereas just a few moments ago there wasn't. Then there are the scrape targets. This is an optional thing to set up, and I'm going to set up a new one. I place my cursor where I want it to start, highlight an existing job, and right-click, so that pastes a new entry. I call it alertmanager as the job name. If you press Ctrl+K, it deletes lines, so I delete the lines I don't need, including the second static config. For targets, localhost:9093. Prometheus will scrape localhost:9093/metrics, just like it does for the other static configs in here. Excellent. Ctrl+X, yes. Check the syntax is OK using promtool check config prometheus.yml. All good. Let's restart Prometheus. Okay, all good. Refresh Graph: if I type up and execute, I now get up for the Alertmanager as well. So if the Alertmanager goes down, I will at least see the alert in the Alerts section. In Targets, the alertmanager job now appears as one up, localhost:9093. Okay, excellent. So now we have the Alertmanager set up on the same server, listening on port 9093. In the next few videos, we'll do some more with this so that we can actually get some alerts in our email inbox. Excellent.
14. Install Send Only SMTP Server: Okay, so we have the Alertmanager running, and if Prometheus detects any problems, the Alertmanager will now get that message. Right now, though, the Alertmanager doesn't do anything with it.
I'm going to set up the Alertmanager to send over SMTP so that I get an email. But before the Alertmanager can send that, it's going to need an SMTP server. If you already have an SMTP server set up somewhere on the internet, I recommend using that. You may have to do some whitelisting in order to let your server send an SMTP message. But if you don't have an SMTP server, I'm going to show you how to set up a local SMTP server on the server itself. This SMTP server is going to be a minimal setup, and the only thing it can do is send emails. It can't receive emails. For that, I'm going to use a program called Postfix. Okay, so there's the command to install Postfix. SSH onto your Prometheus server and run apt install mailutils; that will install Postfix on an Ubuntu machine. Okay, yes. It gives you a configuration screen. Press Enter for OK. We're going to select Internet Site, which is the default, and press Enter. The system mail name will be my domain name, so this will work best if you have a domain name. Okay. We should now edit the main configuration file, so sudo nano /etc/postfix/main.cf. Scroll down. I change this line to inet_interfaces = loopback-only, like that, and inet_protocols to ipv4. My server is configured to use IPv4 only anyway, but I have found that IPv6 can cause problems, so I just keep it simple with IPv4. Ctrl+X to save, yes. Restart Postfix with sudo and check the status. Very good, it's active. Right now you can probably send an email from this server using this command here. But before we try that: since the internet is full of spam email, many email providers will block your ability to send email by default, and there are several things that we can do to our server so that it can send emails. Every email provider is different and will require different things before they allow your email address to receive an email from a server.
The most significant thing I find that you can do is make sure that reverse DNS works for your domain name. To demonstrate that, I'm going to log on to a different server. This is my Zabbix server, and I'm going to type host followed by my Prometheus domain name. That tells me the IP address. Now check that the reverse lookup works: host followed by the IP address. It says not found, NXDOMAIN. That means that if I send an email from my Prometheus server, it is very unlikely to be accepted by almost any email provider. So that's why I say that if you already have an SMTP server set up in your corporation, then use that. Otherwise, I'm going to show you how I do this. I organized my server through DigitalOcean. This is the server I'm using for Prometheus, the one that has featured in the videos so far. On DigitalOcean, I click the name there, type my Prometheus domain name and press Update. That will then change the domain settings for my server, and after some time a reverse lookup for that IP address will return my Prometheus domain name. I just need to give that some time to propagate through the internet. Okay, let's try that again. It seems pretty stable now. Hopefully my email provider thinks the same. Okay, so back on my Prometheus server, I'm now going to try to send an email to one of my inboxes. Copy this line here with Copy to Clipboard and paste it on the Prometheus server, where my SMTP server has now been set up. I'm going to send it to that address, and it's going to be sent from an admin address at my Prometheus domain. So: echo, this is the body, piped to mail with -s, this is the subject, then -a with a From header for the admin address, and then the to address. Press Enter. Okay, so straight away that has made it into my inbox. If that doesn't work for you, you can try to find it in your spam folder and then whitelist it as not being spam. That worked for me straight away; I just had to make sure that the reverse lookup worked.
Okay, so in the next video, I will configure my Alertmanager to use the local SMTP server that was just set up in this video. Note that setting up an SMTP server is very problematic for most people. If you have one already set up in your corporation, then I recommend you use that. You will need the IP address and the port number of the SMTP endpoint. Okay, excellent.
15. Configure Alert Manager to Send Alerts from Prometheus: Okay, excellent. So we've set up an SMTP server, and now we need to configure it in the Alertmanager's YAML. So cd to the folder where the Alertmanager YAML is stored: cd /etc/prometheus, then ls -lh, and there we go, the Alertmanager YAML. Before we make a change to that file, I want to create a backup of it. So I copy the Alertmanager YAML and save it as an original copy, like that. Okay, so I've got a backup that I can use for reference later. Let's open up the Alertmanager YAML. There's a lot of information in here to take in, all about groups and how routes are processed, such as matching regular expressions on the different labels so you can do different things, and setting up different inhibition rules. There are comments there that you can read. You can set up various receivers, so one receiver can get an email and another receiver can get emails and pager alerts if you set up PagerDuty. There are quite a few examples there; one for the DB admins, maybe. But I'm going to keep this really simple, because there's a lot of information to take in. I delete the whole lot. In my documentation here, I have just the bare minimum you need, so copy that and paste it in by right-clicking. Now, in the route, I've set one receiver called smtp-local, and this is the configuration for smtp-local down here. There is my to address and my from address. I don't require TLS, as it's just a local SMTP server. I don't need to set the auth username, password, secret or identity.
If you were using Office 365 or some other service, for example, you would probably uncomment those and put in your username and password. But I don't need to do that. For my local SMTP server, the smarthost is localhost, port 25. If you use a corporate SMTP server, the port and the address will be different from that; maybe port 587. send_resolved is true. That means I will also get an alert when the alert stops firing, that is, when it is resolved. That's all you need for a local SMTP server. Ctrl+X, yes. Now, with the Alertmanager we also get a tool called amtool, which allows us to check the config in the Alertmanager YAML. Note that in amtool, unlike promtool, check-config has a hyphen in the middle. So press Enter, and it says checking, success, and found one receiver. Excellent. Let's restart the Alertmanager and check the status. It's all good. Now, in Prometheus Alerts, I have two inactive alerts. Let's turn off a node exporter so that I get an InstanceDown alert. Okay, so I'm just stopping one of the node exporters. After some time that will go into the pending state and then into the firing state. So it's pending, and once it goes into the firing state, I will get an email alert in my inbox. Okay, so it's now firing, with the annotations. It was in the pending state for one minute, and so I should have that sitting in my inbox. Okay, so in my inbox I now have an alert: InstanceDown, firing, localhost:9100, node, critical. And here's some more information: alertname InstanceDown, instance, job, severity critical, and the annotations. There we go. Now, I will also get a resolution for this, since I configured send_resolved true. So I restart the node exporter with start down there. Okay, and now it's back to inactive for InstanceDown. So in my inbox I now have the resolution email. Let me show that. There we go: resolved, InstanceDown. That's the annotation, but it is now resolved.
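The bare minimum Alertmanager YAML described in this lesson might look roughly like the sketch below. Both email addresses are hypothetical placeholders, since the real ones are not given in the transcript, and the receiver name spelling and the commented-out auth values are assumptions.

```yaml
# alertmanager.yml: minimal sketch for a local send-only SMTP server
route:
  receiver: smtp-local                   # every alert goes to this one receiver

receivers:
  - name: smtp-local
    email_configs:
      - to: you@example.com              # hypothetical address
        from: admin@prometheus.example.com   # hypothetical address
        require_tls: false               # local SMTP server, no TLS
        # Uncomment and fill in for a hosted SMTP service:
        # auth_username: 'user'
        # auth_password: 'password'
        # auth_secret: 'secret'
        # auth_identity: 'identity'
        smarthost: localhost:25          # a corporate server may use another host and port, such as 587
        send_resolved: true              # also send an email when the alert is resolved
```

With a single receiver and no sub-routes, label values such as severity do not affect routing yet; they become useful once more routes are added.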
Now, that can take several minutes in total to demonstrate the whole process, so I have fast-forwarded the video at moments. In these alert emails, there are two links. There is this one here, View in Alertmanager. We haven't installed a user interface for the Alertmanager; I'll do that in the next video. But down here there is the Source option, and that can point to the Prometheus user interface. Since I've set up my Prometheus user interface at a public URL, it can be useful for me to configure how this link works. Right now, if you look at the link in the bottom corner of the page, it says prometheus:9090. I can set that to be HTTPS, my Prometheus domain, slash graph. So I'll show you how to do that now. This is an option for you if you want it; right now, the alerts are working and I'm getting them in my inbox. To configure the URL in that email, we can set it in the /etc/default properties for Prometheus. So SSH onto your server and run that command. Now, in ARGS here, I already have one setting from a previous video. You can either have that or not; it's up to you. I'm going to add another variable. If I have more than one variable, I should have a space separating them. So I add this one here, web.external-url equals, and I set that to my external URL, HTTPS and my Prometheus domain. Ctrl+X, yes. Restart Prometheus. Very good. Let's create a problem: I'll switch off a node exporter. There we go, I'm stopping a node exporter. Now wait for the next alert to come along in my inbox, and the link at the bottom will have my external URL in it. Okay, so there's a new message now from my admin address. Here it is, and it says it's firing. Now look at this link down at the bottom here, if you can read the URL: it says HTTPS, my Prometheus domain, and so on.
By clicking that, it's taken me straight to Prometheus, and it's showing me the query that is being evaluated to decide whether there is an alert firing: up == 0. And there's the matching instance, localhost:9100, job node, value 0. If I look at that alert here: InstanceDown, expression up == 0, severity critical, and there are all the labels, and the state is firing. Excellent. Note that it does take quite a while to test all of those features, especially because the alert sits in a pending state for one minute. Okay, so excellent. In the next video, I'll show you how to set up the user interface for the Alertmanager. This is an optional thing; I'm showing you just so that you know. Okay, excellent.
16. Add the Prometheus Alert Manager UI: Okay, so we're getting alerts and we have fixed this Source link down here, so if you click it, it takes you straight to Prometheus and you can see the expression that was firing. Now, to fix this button here, View in Alertmanager. The Alertmanager that you get when you run apt install on Debian-based systems such as Ubuntu doesn't come with a user interface. So there are two problems: the URL that you go to when you click the button isn't configured, which you can see at the bottom left of the window down there, and even if it was configured, it wouldn't actually go anywhere, because the page doesn't exist. When you visit the user interface at port 9093, it tells you that the Debian package of the Alertmanager does not include a web application. That is, it doesn't include a user interface. So in this video, I'll show you two things. One is how to get a version of the Alertmanager that contains a user interface. Or you could just change the template that creates this email, so that the link is either not there or is replaced by the URL in the Source link down here. To get a version of the Alertmanager with the user interface, you can download a prebuilt binary from GitHub.
So what we do is stop the existing Prometheus Alertmanager that we already set up. SSH onto the server; I'm stopping the Alertmanager. Now, I want to download the binary with the same version number as my existing Alertmanager. To find out what version of Alertmanager you have, you can ask amtool for its version, and it says 0.15.3. So I'm going to download the prebuilt binary for version 0.15.3, because I've already created a configuration that is suitable for that version. Okay, so I've created a link down here; Copy to Clipboard, right-click. It's a wget from GitHub, Prometheus Alertmanager releases, download version 0.15.3, Alertmanager 0.15.3 for my architecture, linux-amd64, tar.gz. To see all the different downloads you can get from GitHub, just copy that section of the URL there, the Alertmanager releases, and you'll get more information, such as links to each of the binaries for your operating system. So now I untar the tarball I just downloaded, Alertmanager 0.15.3. Then I cd into that folder and run ls -lh. The new binary, alertmanager, is just here. This is the file I want. I don't need the YAML or amtool, because I've already configured those. So now I'm going to copy that binary to another folder and set it up as a service. There we go: copy alertmanager to /usr/local/bin/alertmanager. I cd to that folder just to be sure it's there: /usr/local/bin, ls. There it is; it's just a binary file. Now to create the service: sudo nano /etc/systemd/system/alertmanager.service. We're creating a new configuration because we're going to run this Alertmanager as a service so that it continues to run in the background, like the existing Prometheus Alertmanager already did, but this Alertmanager has a user interface. Okay, in that file, copy this with Copy to Clipboard and right-click so that it pastes: Prometheus Alertmanager service, type simple, user prometheus.
We already have a prometheus user on our server, so I'm just going to use that. ExecStart is /usr/local/bin/alertmanager, which points to the binary that we just copied to that folder, /usr/local/bin. The config file is the Alertmanager YAML in /etc/prometheus. This is the Alertmanager YAML that we've been working on in the last few videos, so I'm just pointing the config to that same file. Then web.external-url. This will be used when creating the link in the email here. Right now, the bottom left shows prometheus:9093, which isn't a valid URL on the internet or on my internal network. So the URL behind that button, and even this bottom one down here that says Sent by Alertmanager, will be prefixed with that setting. Now, depending on whether you use Nginx as a reverse proxy or not, your URL is going to be different; it might be the URL of your Alertmanager. But since I have already set up Nginx with SSL, I'm going to use my real address, and I'm going to create a new path in Nginx for the Alertmanager. I haven't done this yet. So that's the URL, my Prometheus domain slash alertmanager, that will appear here when I click that button. As for the web route prefix, I'm just leaving that as it is. Ctrl+X, yes. Okay, now I can start this service: alertmanager start, and status, active (running). So that's a new service that we created, and this time I called it alertmanager, not prometheus-alertmanager. The original prometheus-alertmanager that I installed using apt install still exists, but right now it's switched off, and I'm now using this new one from the binary I downloaded instead. Also, you can't run both of those at the same time unless you configure one to use a different port rather than 9093. Okay, Ctrl+C to get out of that. I'm just going to visit the local URL to show you that it's there, 127.0.0.1:9093. And this is just the code of the home page printed out for us.
There isn't much to see, and I can't actually view this across the web at the moment, because I blocked port 9093 using iptables. So I'm going to go straight into Nginx now and create that new location for the Alertmanager. Okay, so this is the sites-enabled Prometheus config that we created right at the beginning of the course. The existing location proxy-passes to 9090. I'm going to create another location, so I highlight that and right-click, which pastes a new copy of it. I call it alertmanager, ending with a slash, and it's going to proxy_pass to the internal port 9093. Alright, very good. Ctrl+X, yes. Check the Nginx config: OK. Restart Nginx, and it's all good. Now I can visit that in the browser: my Prometheus domain, slash alertmanager. And it takes me straight to the Alertmanager user interface. Because I have SSL on my Nginx configuration for the Prometheus domain, it straight away has a padlock there and it's using the certificate. I also have basic authentication configured for that website. Since I was already logged on, this browser does not ask me for a password, but if I go to a different browser, such as Chrome here, it is now asking me for a username and password. So either way, it's got a level of protection. There's not a lot to see in the user interface. It looks pretty nice, and it shows us our configuration. These are the settings that I set up, these ones, this one and this one, require_tls false; everything else is a default that the Alertmanager has added, such as resolve_timeout five minutes and a few other properties, and here is the smtp-local receiver with its email configs. Okay, so now, since my service, if we look at the configuration again, has web.external-url set to that address, any new alerts are going to have that as the URL prefix. So let's create an alert. I'm going to stop the local node exporter there; press Enter.
And then, after some time, I'll check my new email alert to see what it looks like. In the Alertmanager: so far the Alertmanager has picked up that there is an InstanceDown. I haven't got the email just yet. Source takes us straight to Prometheus. I just have to wait for the email. And there is my new alert, with the View in Alertmanager link. Down at the bottom there, it's hard to read, but the URL is now correct. If I click that, it takes me straight to the Alertmanager user interface. There I can see the info and look at the source in Prometheus: up, localhost, value 0. Excellent. Now, creating the Alertmanager user interface is a lot of work just so that you can have that button working. Instead, you may prefer just to change the email template that draws that button. For that, I have logged on to the server using another program called WinSCP. In the folder /usr/share/prometheus/alertmanager, there is the default template file. If you double-click that, mine opens in Visual Studio Code, but you can configure that. Now, the Alertmanager URL here: this is the template. It creates a variable called alertmanager URL, and then uses that value in further places throughout the template. In the HTML, there is an a href with the Alertmanager URL, a style, and the text View in Alertmanager. You can change these templates. There is no really good documentation on what's going on here, but if you're familiar with HTML templating and various web technologies, you'll understand what you're seeing. That's just one option you have, which might be easier for you than installing a new Alertmanager binary that contains a user interface. Okay, so if you do change the template, remember that it is very easy to break. Test one thing at a time by generating a new alert. So if you change any of the text, generate a new alert so that you know whether to undo what you just did. It's not easy. I've spent quite a lot of hours trying to modify my template at times.
I've currently modified mine to say View in Prometheus, and it just points to the Prometheus Alerts page. Also note that with the prebuilt binary that we set up at the beginning of this video, you can't edit that template file directly, but you can change the configuration in the service configuration. If you do change the template file, the easiest thing to do is to not use the Alertmanager that we installed in this video, the one called alertmanager. So stop that, and use the one that was created using apt install, that one there, prometheus-alertmanager. Start that one, because it's already configured to use the template file that sits here, in /usr/share/prometheus/alertmanager, the default template. Anyway, those are some of your options with the Alertmanager user interface and whether you want to modify the email template. Excellent.
17. Install Grafana: Okay, so earlier in the course I demonstrated using PromQL from Grafana. Grafana and Prometheus work very well together, so I'm going to show you how to set up a Grafana server for yourself, and then we will connect it to our Prometheus server. So the next few videos will be about Grafana. Okay, so we need a server. You can use your existing Prometheus server and install Grafana on it, but I'm going to use a brand new Ubuntu 20.04 LTS and install Grafana on that. SSH onto the server where you're going to install Grafana and run sudo apt update. Okay, I've SSHed onto the server that I'm going to use for Grafana, and the update is done. I now need to install the dependencies that Grafana needs. Normally adduser will already be there, but libfontconfig1 might not be. The processing triggers section can take a minute sometimes. Now to download the binary. Okay, I'm downloading the release, Grafana 7.2.0, amd64, deb. That's done. Now to run the Debian package manager on that file. Okay, that's done. I can now start the Grafana process on the server, so I right-click to paste that, and check the status.
And there we go, it's active (running). So open your browser and put in the IP address of your server, colon 3000, and you're taken straight to the Grafana login page. The username and password are admin, lowercase, and admin, lowercase. It wants a new password; enter one and submit. Very good. That's a new Grafana server that you can use. Now, this won't be a full course on Grafana. I do have a course on Grafana if you're interested. Excellent.
18. Setup the Prometheus Datasource: Okay, so now we'll set up the Prometheus data source in Grafana. Grafana needs to know about the data source so that it knows which protocol to use when querying that data. So Add data source there, Prometheus, Select. Now the URL of your Prometheus server: it may be on port 9090, maybe at an IP address. But I set up my Prometheus server with a domain, with SSL, and behind the Nginx proxy, so my address is simply HTTPS and my Prometheus domain. Access is Server (default). I'm also using basic auth, because we set a username and password on it; the user was admin, and the password is the one set earlier. For the HTTP method I will use POST. Save and test. Okay, data source is working. So now that's working, we can go straight to the Explore tab. With Prometheus selected in the drop-down there, we can use this metrics drop-down, and a lot of information already gathered from our Prometheus server is filling it. So we have everything about our node exporters and about Prometheus on the server. With up, I can see everything which is up on my Prometheus server, and these values are all one. The equivalent of this page in Prometheus is the Graph page with Execute. Grafana is really just a more visual user interface for Prometheus. Let's have a look at one of the previous examples, go_threads for example. Run Query, and there are the thread counts. We can look at that over a particular range, say 24 hours or two days. Very good, excellent.
In the next video, I'll show you how to set up some dashboards for Prometheus inside Grafana. 19. Setup Prometheus Dashboards: Okay, excellent. Now let's create some predefined dashboards so that we don't have to keep typing queries. Go to Configuration, Data Sources, and select the Prometheus data source that we've already set up. Here there's a tab called Dashboards. Select that and select this one here, Prometheus 2.0 Stats. Import that, and now click it. Straight away, we have a dashboard about the properties of our Prometheus server. If you want to look at a graph in more detail, you can hover over it and press V to view it full screen. You can also edit these graphs by pressing E if you want. And if you do change something and you do break something, it doesn't really matter. You can go to Configuration, Data Sources, go back to Dashboards and select Re-import. Now that's a dashboard for Prometheus properties. Now let's create a dashboard specifically for the node exporters. Okay, so we'll import a dashboard from the community. So go to Configuration and select Plugins here. Find more plugins on Grafana.com. Now click this Dashboards link at the top there, and for Data Source select Prometheus. The particular dashboard I want is already pre-selected for me. I'm going to install this one here, this third one down. This is the English version of this dashboard up here. If you can speak Chinese, then you'd probably prefer that one. But I can't read or speak Chinese, so I'm going to use the English version, which is this third one just down here. So click that. There are some previews of the dashboard that we can look at. What we need is this number here, 11074, so copy that. Go back into Grafana. Go to Dashboards, Manage, select Import, paste that ID, 11074, and press Load. Now that has downloaded the configuration from the website. Select the default data source, which is Prometheus. Press Import.
Okay, and straight away we have it. I have configured three node exporter scrape targets now on my server, so those three targets appear in this dropdown here, which I can open and close. And I can see more specifically the properties of any particular node exporter, that one or that one, or even that one. Excellent. So there is plenty to look at. I recommend setting up as many node exporters as you can on your Prometheus server, that's in the scrape targets in the prometheus.yml. Come back to Grafana, and there will be a new row for every single node exporter that it finds. Excellent, plenty to look at. Excellent. 20. SNMP in Prometheus: Okay, SNMP in Prometheus. SNMP is an advanced topic. You don't need to know SNMP in order to use Prometheus. But if you are going to use Prometheus seriously, you're going to want to know about SNMP, because it is another option for you. Now SNMP stands for Simple Network Management Protocol. The main reason for using SNMP is in those situations where you cannot install a node exporter on the device that you want to monitor, but the device can support SNMP. SNMP has been around for many, many years, since the 1980s. And many devices that you buy off the shelf will support SNMP, such as routers, switches, printers, servers and workstations. And especially in the cases of routers, switches, printers and network storage devices, you cannot install a node exporter on them. So that's where SNMP becomes useful. So looking at the diagram, that's the Prometheus server. We've installed several node exporters, and we know all about that. We're getting plenty of properties from those and they work extremely well. But in relation to SNMP, what we would do is install an SNMP exporter instead and configure that to point to devices that are exposing SNMP properties, such as switches, routers, printers, or even the local workstation.
So think of it as essentially the same. Prometheus is requesting data from exporters, or Prometheus is requesting data through an SNMP exporter, which is then querying SNMP daemons running on various devices. Okay, so if you have the option to install a node exporter, do that first. SNMP is quite complicated, and the node exporter already exposes a whole lot of useful information in terms of monitoring a server. Okay, so if you're running Prometheus on a corporate network, there's a really good chance that you already have switches, routers and printers that you can query SNMP data from. But if you're doing this course from home, you probably don't have that. So what I'll do in this first section about SNMP is show you how to set up an SNMP daemon on our local Prometheus server, and then we can query that using Prometheus via the SNMP exporter. So the first part of the section on SNMP will be about setting up an SNMP daemon locally so that we can see what SNMP is. So log onto your server and we'll install what's called an SNMP daemon. We'll also install some SNMP tools. That is using apt. If you're using CentOS 7, then the command is slightly different, but it's the same from then on. Okay, so I'm using Ubuntu, so sudo apt install snmp and snmpd, the SNMP daemon. My Prometheus server is now running an SNMP daemon, which is exposing properties very similar to the way a node exporter is serving properties, but it's using the UDP protocol on port 161. Okay, now to verify that the SNMP daemon is running, copy that and paste, and the service is running, Simple Network Management Protocol daemon. There we go, Control C. Now we can do some simple tests. snmpwalk, that comes from the SNMP tools that we also installed. So I'll copy that snmpwalk, and that's the response. You should see it showing a list of values with that root prefix, so 1.3.6.1.2.1.1. See iso there? That also equals 1.
So 1.3.6.1.2.1.1, dot, and anything after it. If I scroll up, there are quite a few. We can also use snmpget to get one of those values explicitly. So I just chose that particular value there, and it gave me the name of the operating system. Now, these are called OIDs. There are potentially hundreds of thousands of those. You don't need to know what those are, but with familiarity with SNMP, you'll begin to start recognizing them. The important thing at this stage is that these things work: that you can install the SNMP daemon, that you can get its status and it's active (running), Control C, and that snmpwalk works. And that is a very simple one, using the version 2c protocol with the public community on the local server 127.0.0.1 and, for example, that OID, which gives a response, and that the snmpget also works. Just some examples there. Now, if you're familiar with SNMP, you probably noticed that I didn't install the MIBs downloader. The downloader is unnecessary at this point in time, so that's why I'm not installing it. The time you'll need MIB files on your server is if you're generating the SNMP configuration that the SNMP exporter will use, but we'll talk about that in the next videos. So right now, just these three things need to work. Okay, excellent. So in the next video, we'll install the SNMP exporter, then set it up so that we can read that in Prometheus. Excellent. 21. Install the SNMP Exporter: Okay, now to install the SNMP exporter that will query the SNMP daemon that we just set up in the last video. Once the SNMP exporter is installed and running, we can configure it in the prometheus.yml. Now I'm going to install the SNMP exporter on the same server as my Prometheus server. You don't have to do this. I'm just doing that because I have it available, but you can install it anywhere you like. Now, there is an easy way to install the SNMP exporter, and that is apt install prometheus-snmp-exporter.
But since I will later be generating SNMP configuration, I'm going to be using the latest version of the Prometheus SNMP exporter. If you use the APT method to install it, you're going to get version 0.16, whereas the latest version at the time of making this video was 0.19. Okay, so to see which version is in the APT cache: it's 0.16.1, which is quite old. To see what the latest version is, visit that link, and it is version 0.19.0. I'm going to manually install that version. I'm using Ubuntu on an AMD64, so I'm going to be downloading this file here, amd64. To find out what architecture you have, you can use the uname command, and here it says it's x86_64. There is no x86_64 in this list, but you can consider that the same as amd64. So linux-amd64. If you're using Mac OS X, you'd probably download the darwin amd64 or 386. If you're using Windows, you'd download the windows 386 or amd64. I prepared the download link already so I can just copy that. But if you didn't have it, you could right-click that, copy the link, go onto your system, wget, right-click to paste it in. It's downloading. It's quite a large download. Okay, so ls, there is a new file there called snmp_exporter-0.19.0.linux-amd64. I need to untar that. That's the same as unzipping on Windows. The command is there, tar xvfz and the snmp_exporter file name, enter. ls again, and there's a new folder now called snmp_exporter, that one. So cd into that, ls -lh, and that shows that there are four new files in that folder. That green indicates that it is an executable file. See rwxr-xr-x down here. Now I'm going to set up this file as a Linux service. So I'm going to copy the snmp_exporter file to /usr/local/bin/snmp_exporter and copy the YML file as well, snmp.yml, to /usr/local/bin/snmp.yml. Okay, now to cd to that folder, ls -lh, and we can see that those new files are there. Now to set up the service: sudo nano /etc/systemd/system/snmp-exporter.service.
Enter. In that new file, copy this script and paste. Description is Prometheus SNMP Exporter Service, Type simple, User prometheus. That user already exists on my system, so I won't have any problems, but if you were installing the SNMP exporter on a different server and you didn't have that user prometheus, you could always create that user using this command here, sudo useradd --system prometheus. That will create a system-only user that we can use to run the SNMP exporter service. Now when this service is started, it will run /usr/local/bin/snmp_exporter. That's the file that we just copied. And that will use the configuration file at /usr/local/bin/snmp.yml, which was another file we just copied. Okay, excellent. Control X to save, yes. Now we can start that service using these three commands. Often when you modify a service configuration, you need to reload the systemctl daemon, so I'm just going to do that quickly. snmp-exporter start, and check its status. And it says active (running). We now have the SNMP exporter running as a service on our Linux server. That means we can exit this session and it will still run in the background. Okay, now the SNMP exporter can be accessed through the browser on port 9116. So my domain name or IP, yours will be different, colon 9116. And that's the SNMP exporter user interface. It's rather basic, but we can see that we can at least access it. The first thing we can do is change that IP to point to the localhost, and it will try to find an SNMP daemon running at 127.0.0.1. The module field we'll look at in a moment; it could be blank or it could be all kinds of stuff. Just press submit. And right now we get several values, being snmp_scrape_duration_seconds, snmp_scrape_walk_duration_seconds and, last, sysUpTime. Those values won't be visible in Prometheus as yet, until we make changes to the prometheus.yml and restart Prometheus. But as you can see, we have an SNMP exporter running now on this server down here on port 9116, /snmp.
And I just told it to query 127.0.0.1, which points to that daemon that we installed in a previous video. We can check out the config of the SNMP exporter. There it is. It is a massive file and it is full of configurations that you'd expect to be common configurations. I'm actually going to open this using Visual Studio Code. So I've opened a program, I'm on Windows, called WinSCP, and this is very good for looking at the file system on a Linux server. So snmp.yml, that's a file that we just copied. If I double-click that, on my system it opens in Visual Studio Code. Okay, so I'm able to minimize certain nodes of that YAML individually. Going back here, when this was first opened up, it had module if_mib. That is the module as described in the configuration there. if_mib is doing an SNMP walk on those OIDs and an SNMP get on that OID. And in the metrics here, for every OID that matches these numbers, it will create a new property named after it: sysUpTime, or ifNumber, which stands for interface number, or ifIndex, interface index, each with an OID. There is a lot of reading. This file is generated using the SNMP exporter generator, which we'll discuss later on. It warns that manual changes will be lost, so there's no point in modifying this file. Okay, so if we look further down, there are several preconfigured modules such as cisco_wlc and ddwrt. I'm not going to demonstrate those working in this video, but I'm just showing you that these things exist. And later on you'll be able to create your own modules. For example, raritan there. I copy that, I can type raritan in there, and submit. And if there was an SNMP daemon at 1.2.3.4, it would do all the walks and the gets for whatever was configured for the raritan module. Walks and gets, and here it says there is an error, because nothing exists at 1.2.3.4. But in the very first demo, when I first started, I had if_mib configured. Submit, and now it's giving me a response.
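As a rough picture of what a module in snmp.yml holds, the Python sketch below models a cut-down module with its walk list, its get list and a few metrics, and then looks up which metric name an OID returned by a device would fall under. The dictionary is an illustrative stand-in and not the real generated file, and the longest-prefix lookup is an assumption about the general idea rather than the exporter's actual code.

```python
# Cut-down, illustrative stand-in for one module of snmp.yml (not the real file).
IF_MIB = {
    "walk": ["1.3.6.1.2.1.2", "1.3.6.1.2.1.31.1.1"],
    "get": ["1.3.6.1.2.1.1.3.0"],
    "metrics": [
        {"name": "sysUpTime", "oid": "1.3.6.1.2.1.1.3"},
        {"name": "ifNumber", "oid": "1.3.6.1.2.1.2.1"},
        {"name": "ifIndex", "oid": "1.3.6.1.2.1.2.2.1.1"},
        {"name": "ifInOctets", "oid": "1.3.6.1.2.1.2.2.1.10"},
    ],
}


def parts(oid):
    """Split a dotted OID into a tuple of integers."""
    return tuple(int(p) for p in oid.strip(".").split("."))


def metric_for(oid, module):
    """Return the metric whose OID is the longest prefix of `oid`, or None."""
    target = parts(oid)
    best_name, best_len = None, 0
    for metric in module["metrics"]:
        prefix = parts(metric["oid"])
        if target[: len(prefix)] == prefix and len(prefix) > best_len:
            best_name, best_len = metric["name"], len(prefix)
    return best_name


print(metric_for("1.3.6.1.2.1.2.2.1.10.2", IF_MIB))  # value for interface index 2
print(metric_for("1.3.6.1.2.1.1.3.0", IF_MIB))
print(metric_for("1.3.6.1.2.1.25.1.1.0", IF_MIB))    # not part of this module
```

Comparing OIDs as tuples of numbers rather than as strings matters here, because as plain text the ifIndex OID ending in .1 would also look like a prefix of the ifInOctets OID ending in .10.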
But since in this case I've installed snmpd on the local system, it's not going to give me all the values that are being asked for in the if_mib walk. Good. So let's just try these out and I'll show you how to fix this up. Okay, so this is a little bit more about SNMP. Let's just try this manually on the server to see what happens. So I'm copying that OID, 1.3.6.1.2.1.2, Control copy. On my server: snmpwalk, version 2c, community public, 127.0.0.1, right-click, 1.3.6.1.2.1.2. It says No Such Object available on this agent at this OID. When you first install an SNMP daemon on a Linux server such as Ubuntu or CentOS, it's not going to return anything with that prefix. It returns other values, such as under 1.3.6.1.2.1.1, see, I am getting a result there, or even 1.3.6.1.2.1.25, it's giving me some values. But we want it to return values under 1.3.6.1.2.1.2, as is being asked for here in the if_mib walk. So to do that, sudo nano /etc/snmp/snmpd.conf. We need to edit a file called snmpd.conf. It's important that it's the d one. If you scroll down, this is the configuration for the local daemon that's listening on 127.0.0.1. It's also listening on IP version 6. I don't want that actually, so I'm just going to comment that out like so. Now, in my examples a few moments ago, the snmpwalk was returning everything prefixed with 1.3.6.1.2.1.1 and 1.3.6.1.2.1.25, which I demonstrated. We want to see OIDs prefixed with 1.3.6.1.2.1.2 as well. So what we can do is either add a new line that allows 1.3.6.1.2.1.2 and everything with that prefix, or I can just comment out that second line and delete that last number there. And now it's going to return everything prefixed with 1.3.6.1.2.1, and that should also include those. So I save this file, Control X, yes, and then restart the SNMP daemon, and just double-check its status. Okay, good. The snmpwalk for that OID, 1.3.6.1.2.1.2, will now work. There we go.
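The effect of that view change can be sketched in a few lines of Python. This is only a model of the prefix rule, under the assumption that the default view allows two subtrees similar to the ones shown in the walk above; it is not how snmpd is implemented, and the exact default subtrees on your system may differ.

```python
def parts(oid):
    """Split a dotted OID such as '.1.3.6.1' into a tuple of integers."""
    return tuple(int(p) for p in oid.strip(".").split("."))


def in_subtree(oid, root):
    """True if `oid` equals `root` or sits anywhere beneath it."""
    o, r = parts(oid), parts(root)
    return o[: len(r)] == r


def visible(oid, view):
    """True if any included subtree of the view covers `oid`."""
    return any(in_subtree(oid, root) for root in view)


# Assumed views for illustration: a restrictive default and the widened one.
DEFAULT_VIEW = [".1.3.6.1.2.1.1", ".1.3.6.1.2.1.25.1"]
WIDENED_VIEW = [".1.3.6.1.2.1"]

sys_descr = "1.3.6.1.2.1.1.1.0"          # system description
if_in_octets = "1.3.6.1.2.1.2.2.1.10.1"  # interface counter under 1.3.6.1.2.1.2

for name, view in (("default", DEFAULT_VIEW), ("widened", WIDENED_VIEW)):
    print(name, visible(sys_descr, view), visible(if_in_octets, view))
```

With the default view the interface counter is hidden, which matches the No Such Object response, and with the widened view both OIDs are returned.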
I'm now getting a whole bunch of counters that are related to the if_mib module here. Now, that's interface: that if stands for interface. On my Linux server, I have two network interfaces, and we can see them written here, the Red Hat, Inc. Device 0001, and that one, lo, there. So they're the actual OIDs for that. But now, since the snmpd has been configured to allow more OIDs to be returned, if I refresh this page, it's giving me a lot more information. Very, very good. Now it's time to configure the prometheus.yml to look at this local SNMP exporter. So that's what I'll be doing, configuring the Prometheus process here, the YML file, to look at the SNMP exporter, which is capable of reading data from the SNMP daemon running locally. Okay, so open the Prometheus YML file, prometheus.yml. Okay, scroll down to the bottom. I'm going to add a new job name called snmp. So copy all that, plus the whitespace at the beginning. I don't need those three dots there; they're just indicating that there's some content beforehand. Now, right-click and there we go. So if I look at that, here we have in our scrape_configs for Prometheus a new job name called snmp. The metrics path endpoint is /snmp. And if I look at this again and just press submit and look at the URL, that's the URL: my Prometheus domain, colon 9116, /snmp, with target and module. In the params, the module is if_mib. So remember, if we hadn't changed the view configuration in the snmpd.conf file, we wouldn't actually see much data being returned for that module. The targets: it is 127.0.0.1. Now down here, we need all of that. What this part down here does, I'll show you in a moment. But first let's Control X to save our new configuration. And also remember, when you're writing these YMLs, that it's very whitespace sensitive, so use spaces rather than tabs if you write these manually. Let's check the syntax is OK: promtool check config /etc/prometheus/prometheus.yml.
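The scrape job described above can be pictured as a small piece of label rewriting. The Python sketch below assumes the job uses the relabelling pattern commonly shown in the SNMP exporter's documentation, where the listed target becomes the target URL parameter and the instance label, and the address that Prometheus actually scrapes is replaced by the exporter's address. The function names are hypothetical and this only simulates the outcome; it is not Prometheus code.

```python
from urllib.parse import urlencode

EXPORTER = "127.0.0.1:9116"  # where the SNMP exporter is assumed to listen


def relabel(target, module="if_mib", exporter=EXPORTER):
    """Simulate the three relabel steps applied to one listed target."""
    labels = {"__address__": target, "__param_module": module}
    labels["__param_target"] = labels["__address__"]  # device goes into ?target=
    labels["instance"] = labels["__param_target"]     # device shown as instance
    labels["__address__"] = exporter                  # scrape the exporter instead
    return labels


def scrape_url(labels, metrics_path="/snmp"):
    """Build the URL that would be requested for the relabelled target."""
    query = urlencode(
        {"module": labels["__param_module"], "target": labels["__param_target"]}
    )
    return f"http://{labels['__address__']}{metrics_path}?{query}"


local = relabel("127.0.0.1")
print(local["instance"], scrape_url(local))

# 192.0.2.10 is a documentation-only address standing in for a second device.
other = relabel("192.0.2.10")
print(other["instance"], scrape_url(other))
```

Every device in the targets list ends up as its own instance, while all requests go to the one exporter, which is why adding another device later only needs one more line under targets.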
And it says no problems, success and success. Restart and check its status. Very good, it's active (running). Control C. Now to visit Prometheus. Okay, so log in to your Prometheus server, Status, Targets. Down here: snmp, 1/1 up, and that's the endpoint, http://127.0.0.1:9116/snmp, job snmp, instance 127.0.0.1. Now this URL here, I'll come back to that in a moment. But what we can do now is go into Graph and we can start seeing some SNMP information. snmp_scrape_duration_seconds, execute, and there we go: instance 127.0.0.1, job snmp. We can graph that as well and we'll start to see more information over time. Other things that you'll see in there: if, so interface, ifInOctets for example, execute, and we've got data beginning five minutes ago. And ifOutOctets. Those things would be familiar to you if you have used SNMP many times before. SNMP is usually used for monitoring network devices, and nearly every single SNMP device will tell you how many octets it's transmitting in and out. Now, all these different names here, ifIndex, ifHighSpeed, broadcast packets, last change, MTU, et cetera, are all set in this snmp.yml file. For example, let's look at ifHCOutOctets. So let's just copy that. Control F to find that, there we go, out octets, that's the OID. So the SNMP exporter has found that OID, type counter, that's the help information, et cetera. ifHCOutUcastPkts. And that's where these values are being set. ifSpeed, ifSpeed, there we go. Very good. Okay, so this SNMP exporter URL is public. You could block that using iptables if you want. Port 9116 could also be added as a location in my Nginx configuration so that it would have the basic authentication in front of it. I'm not going to do that. Okay, so now going back here to Status, Targets, down here, this endpoint URL I was talking about.
We can change what is written there, because right now if I press that link in the browser, it's just going to time out, because 127.0.0.1 doesn't mean anything to my local browser. So to do that, let's go back into the prometheus.yml, go down to the bottom, and it's this value here, replacement. So I can set it to my Prometheus domain name and port 9116, that's the URL as shown in the user interface. Control X, yes, restart. Status, refresh that page, unknown as it's starting up. There it goes, state is up, and the endpoint URL is now my Prometheus domain name, colon 9116. Now if I click that, it's showing me the data directly in the browser. That's optional, you don't need that. Now in the next video, I'll set up another SNMP daemon on another server, just to show you what multiple SNMP devices look like in Prometheus. Excellent. 22. Install a Second External SNMP Daemon: Okay, so we have this set up now: the snmpd, the SNMP exporter, and Prometheus can read the data that the SNMP exporter is collecting from the SNMP daemon. Here, I'm going to set up a second SNMP daemon on another server. It doesn't matter where it is, it's just anywhere else in the world. It's going to be an Ubuntu server as well, Ubuntu 20, but it could be anything you like, Windows or CentOS or Mac. In this case, I'm using an operating system simply because it will be easier for you to replicate, rather than using a dedicated SNMP device such as a switch or router, et cetera. I'll do that in the next video. But in this video, it will just be a workstation-style SNMP daemon. Okay, so the SNMP exporter will be polling two SNMP daemons at the end of this video, that one and another SNMP daemon on another server somewhere else. Okay, looking at the Status, Targets in the Prometheus user interface again, here at snmp, I've got 1/1 up. At the end of this video, I'll have 2/2 up, so there will be another endpoint added. So I have gone out and organized myself another server. It could be anything. It's also on the Internet.
I have ordered this from DigitalOcean as well and created a new droplet. It's a different server from everything else I've used. What I'll need on it is snmpd. So I'm only installing the SNMP daemon this time, not the tools or MIBs or anything like that, just the daemon process. Okay, yes. That last step can take a minute to process sometimes. So the SNMP daemon is running on that other external server. Status is active (running). There we go, Control C. Now I need to configure the snmpd to allow external connections to it. But first I'll just demonstrate the problem from my Prometheus server. Okay, so SSH onto my Prometheus server, where the SNMP exporter is running. The server that's running the SNMP exporter is going to need to be able to make SNMP gets and walks to that other server. So if I was to do something like snmpwalk, version 2c, the community public, the IP address of my other server, which happens to be that, and I'm just going to query all OIDs and see what I get. Now it's going to time out, because the snmpd on my new server is only listening on 127.0.0.1. Okay, so going back to the other server, I need to edit the SNMP configuration. So sudo nano /etc/snmp/snmpd.conf. It's important that it's the one with the d in it. If I scroll down, here is the agent address. If I just delete that whole section, such as that, and just type udp:161, what that's going to do is listen on all interfaces on the server, so also the external IP address, UDP port 161. Okay, so that section that I commented out last time, that's related to IPv6, which I don't want to use in this case. And that explicitly said listen on the interface 127.0.0.1, but now I've removed that. It's going to listen on all interfaces, UDP 161. Okay, going down further, same thing as I did in the last video: I want to allow the possibility of querying OIDs prefixed with that number. So I'm just going to delete that and comment out that.
So that will give me everything related to the IF-MIB, because I'll do an if_mib query. Okay, very good, that's all I need. So Control X to save that, yes. Restart, check its status. Very good. Control C. Let's try that again. Back on my Prometheus server where the SNMP exporter is running, let's try the snmpwalk again. And now I get a response to that snmpwalk. I put in the full stop there, the period; that just gives me all OIDs. I could have said give me the snmpwalk from, say, that. Note that the iso at the beginning there is actually the same as just saying 1. So there we go. And now it's given me everything from iso.3.6.1.2.1.1.9.1, as was written there. But here I'm just saying give me everything from dot. Very good. Okay, so now I have that. I have another SNMP daemon running on another server somewhere which is external from my Prometheus server. Now I'm going to configure the prometheus.yml to tell the SNMP exporter about this other SNMP daemon that I have here on this other IP address. Okay, so on my Prometheus server, prometheus.yml. Let's go down to the bottom. And in the scrape_configs section, right down at the bottom under job name snmp, targets: press spaces, because a tab will cause a problem, and put in the IP address of this other SNMP daemon, wherever it is. For me, that was that number starting with 157. Very good. Everything else can stay the same. Press Control X to save, yes. Let's check the syntax of the YML is OK using promtool. Success, success, no problems found. Do a restart and check its status. Very good, all good. Control C. Go back into Prometheus, Status, Targets. And as a few seconds pass, these will become up. Down here in my snmp, 1/2, unknown, refresh. There we go. I have two endpoints now in my snmp scrape config, each with instance equaling a different IP address: instance 127.0.0.1, and an instance starting with 157. So I can go down to Graph. Now I type in.
ifInOctets, for example, execute, and I have ifInOctets for all of my interfaces on both servers, 127.0.0.1, 157, et cetera. So that is the SNMP exporter now querying SNMP statistics from two different SNMP daemons on different servers. One is local and the other one is somewhere else on the internet. Okay, so that works. Now, that SNMP daemon I've set up on the other server, that other server is on the internet. And since I've allowed all IP addresses to access that SNMP daemon, I'm going to need to block access so that other servers can't get access to it. So I'm going to make some iptables rules down here to only allow my Prometheus server to access it. So on my other server, where I put this other SNMP daemon: iptables -L, just to see if there's something there. There are no rules blocking anything, so I'm going to add some for SNMP. So this first one. Okay, so the domain or IP that I'm going to allow access to port 161 will be my Prometheus domain name. That's the server that I have the SNMP exporter running on. It's going to be allowed UDP port 161, accept. And the next one, I'm just going to allow localhost. This is optional; you might have no need for localhost to query the SNMP daemon. But there you go, you can do it if you want. Then drop everything else on UDP port 161. And now iptables -L. Very good. Enter, and the rules are there. So now the only server that can query the SNMP daemon on this server is my Prometheus server. On to the next one. There we go. We have two SNMP daemons being queried through the SNMP exporter, which is set up here in the prometheus.yml. Okay, so in the next video, I'm going to set up SNMP on a specific hardware device, this time being a Cisco switch, so that you can see how it's done. Excellent. 23. Setup SNMP for a CISCO Switch: Okay, in this video, I'm going to set up Prometheus to query an actual SNMP device, this time being my Cisco switch.
Earlier on, I said the main reason to use SNMP is in the situations where you cannot install a node exporter on the device, and common devices where you cannot install node exporters are routers, switches, printers, network storage devices, and other devices that you might find on a network. I've got some old switches, a 24-port Cisco Catalyst 2950. It's one of those. They are quite cheap to buy if you wanted to get yourself one. I don't recommend them in today's world, but they're good for learning. They're not very fast and they use a lot of electricity. Before I configure Prometheus to query SNMP from this switch, I have ensured that the switch is working, that it's connected and that I can do certain things. So number one, it needs to have SNMP enabled. So the device needs to support SNMP, and you will need a default gateway set on it as well. So I've set my default gateway to be the IP address of my main network router that is connected to the internet. So it looks a little bit like this. This is my Prometheus server, on the internet or a different network than my switch. Between my switch and my Prometheus server, I have another router, which is my Internet broadband router. And I've configured an external rule on it to allow requests to port 161 through to the Cisco switch on its internal 192.168.1 address. Okay, so I have several devices connected to that switch, and that's my internal network. To show that my switch is actually configured, I'm going to telnet onto it and show the running config. This is a Cisco switch, so if you have a Cisco switch, you will recognize those commands. Otherwise you'll have to refer to your device's documentation for the actual commands to use for your device. But anyway, if I just scroll down a little bit, I can see here I have interface Vlan1 set up with IP address 192.168.1.1. That's the IP address of my switch on my internal network.
And it has a default gateway set, on the same 192.168.1 network, and that is my router, which connects to the Internet. And I have a community set up, which is public, and it's read-only at least. So I can do SNMP queries to this SNMP device from the external network, which is the Internet in my case. So exit that. Okay, so since my Prometheus server will be doing the SNMP queries, I need to make sure at least that snmpwalk and snmpget work from that Prometheus server. So I SSH onto my Prometheus server. Let's try this one, an snmpwalk. Now to change the IP address to my external IP address, the one that this router is accessible via, which happens to be currently that IP address. So if I just press enter, okay, here's an snmpwalk on my switch from my Prometheus server. And it's told me that it's a Cisco in the name, plus some more information. And these are the OIDs, et cetera. So that works. And the snmpget would work as well, because that's one of the OIDs. I just queried dot there, but I got it back in the walk anyway. So that is good. I can verify that my Prometheus server can query SNMP on my Cisco switch, which is on the internal network. Now we should configure the YML for Prometheus to tell it about this new IP address, which is up there. So I open the prometheus.yml and scroll down. And very, very quickly, I can add a new scrape target, pressing spaces just in there. That's the IP address. It's on the /snmp metrics path and it's using the if_mib module. The if_mib module is the most generic module you can use for SNMP devices, so at least start there. Okay, so save that, Control X, yes. I can verify the syntax is OK using promtool. Yes. Restart, check its status. All good. And let's view that in Prometheus now. So I'm logging onto my Prometheus server, Status, Targets. And down here at the bottom is the new target. So it's working, state is up.
We can now query that in Prometheus. Graph: the first one we can try is ifInOctets. There we go, execute, and it's showing me a new metric for every single interface on my switch. So there is 10, 11, 12, 13, 3, 4, 5, et cetera. And there is also one for the Vlan1. There we go, Vlan1, ifInOctets and values. We can see here, for instance, interface 2 has that many in octets, and 3 and 4 have that many in octets. So these are just counters showing how much data has been transported, in octets. So that's working, that's very good. My Prometheus server is querying the switch. So we have this Prometheus SNMP exporter and it's querying one of my switches on my network, and this time it's on an external network, so I had to create the extra firewall rule on my router here. Anyway, since I installed Grafana in an earlier video, I am going to view this in Grafana. My Grafana server already has a Prometheus data source set up. And if I go to Explore and I type in snmp_scrape_duration_seconds, for example, I'll see a new row for my new router there. So we'll have a lot of extra properties I can query. ifInOctets, run query. And there we go. It's showing me all the in octets as a table here. So rather than using Explore here, we can create a dashboard that is already set up to query the if_mib module through the SNMP exporter. To find that, it's on the Grafana community site. I have a link down here. We'll look for data source Prometheus. So I type in if, and there we go, SNMP Stats for the SNMP exporter. So let's look at that. And that's ID 11169, so I'll copy that. And this is what it will look like. So somebody's already created that for us, this user here. Thank you. So copy that ID, Copy to Clipboard, back into Grafana, import with that ID, load. And it has pulled the SNMP Stats information for us. Connect it to the Prometheus data source, import.
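Because values such as ifInOctets are cumulative counters, a traffic graph is really showing how fast the counter grows between samples. The Python sketch below shows that arithmetic for two readings, under the assumption of a 32-bit SNMP counter that wraps back to zero. The function name and sample numbers are made up for illustration, and Prometheus's own rate function handles a decreasing counter differently, treating it as a reset.

```python
def octets_per_second(prev, curr, seconds, counter_bits=32):
    """Average octets per second between two readings of an SNMP counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = curr - prev
    if delta < 0:
        # The counter passed its maximum and started again from zero.
        delta += 2 ** counter_bits
    return delta / seconds


# Two made-up readings taken 15 seconds apart.
print(octets_per_second(1_000, 16_000, 15))

# A reading pair that straddles the 32-bit wrap-around.
print(octets_per_second(4_294_967_000, 704, 10))

# Multiply by 8 to express the first result in bits per second.
print(octets_per_second(1_000, 16_000, 15) * 8)
```

On busy interfaces a 32-bit counter can wrap quickly, which is one reason the 64-bit ifHC counters mentioned earlier exist.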
And straight away, we have a dashboard for all our SNMP devices that we can access through the SNMP exporter. So this is early days; after a day or two these will look much more interesting. But I can see all my interfaces here for my switch, with their in octets and out octets, as a graph down here. Anyway, Grafana is another large subject if you're interested, but there's enough information in there to experiment and make sense of what's actually going on. Very good.
Okay, so now I've set up my Cisco switch in Prometheus, and that was a very easy device to set up. When I configured it in the YAML, here, prometheus.yml, I have one job named snmp, and it's using the module if_mib. All three of these targets are using if_mib, the most generic module that the SNMP exporter can use. If I look at the snmp.yml, there it is, and these are the OIDs that it's walking and getting. Now, if you have a different kind of device, for example a Cisco WLC or a Raritan, or even a printer that you want to connect to, you would use the printer_mib module or the raritan module, or anything else that is more appropriate to your device. I'm using if_mib; it's the most generic. I think every SNMP device that you'll find will at least support those OIDs. So in that case, that's the module there. You can create a new job and, for example, edit it to use a different module. I've set these up in a very minimal way, but as you can see, I'm getting useful information from that now. Okay, so that's SNMP for a dedicated hardware device. Excellent.

24. SNMP Exporter Configuration Generator: Okay, the SNMP exporter configuration generator. We can generate custom modules that the SNMP exporter can use, and we do this by generating a customized snmp.yml using a program called the SNMP exporter config generator. Okay, on my Prometheus server, that's the SNMP exporter, and I've already set it up to look at several SNMP devices.
So I've got the SNMP daemon running on two different servers, one local plus one external. And I've also got it looking at my Cisco switch, which is on a private network down here. Now, when I configured those scrape targets in prometheus.yml, I said use the if_mib module here, and the if_mib module is doing those walks and that get. These are the OIDs. Now, if I had another SNMP device, say one that wasn't in this list, for example a Huawei 5G router, I would want to create a specific module for that. I could actually use if_mib; it might serve the purpose. But in the next video, I'll actually generate a custom module for this YAML.
Now, another option: you could just modify these individually and add your own OIDs, so 1.3.6, et cetera, if you knew what the OID was, and that is something that I would do myself. But here there is a warning. It says this file was auto-generated using the SNMP exporter generator, and manual changes will be lost. So what I'll do is demonstrate the SNMP exporter generator.
Now, using the SNMP exporter generator is quite a problematic process. You're going to experience a lot of issues, and you might want to try something again or reinstall. So I don't recommend running this on any of your existing servers, so not any of these here. Just create a brand new server somewhere else that is not one of these, because that way you can just delete it and start again if you've made some mistakes or want to change something. So I've already gone out and organized myself a brand new Ubuntu 20 server to set up the SNMP exporter generator on.
Okay, so the first thing to do with a brand new server: sudo apt update. Good. Now to install the dependencies the generator will need. The most important one there is the libsnmp-dev package. With that, we have everything we need in order for the generator to read SNMP MIB files and create an snmp.yml for us. I'll copy that and paste, press Enter, yes.
Okay, that can take a minute or two. Now to install the Go compiler, golang: apt install golang. Yes. Okay, let's check the version: go version. Okay, very good, that's installed. The SNMP exporter generator will be compiled using the Go compiler. Clear.
We need to download the code, so go get. That's the project code downloaded, and it used the Go compiler that we just installed. Now to cd into the path where the code was saved. We can copy that, paste that, press Enter. And if we run ls -lh, it will list the contents of that folder for us: go, src, github.com, prometheus, snmp_exporter, generator. Okay, the important file in there that we'll be editing afterwards is that one. We will look at it afterwards.
Okay, now we need to build this project: go build. Then ls -lh again. And here there's a new file called generator. So that file has been successfully created and built. It has execute permissions there, so very good.
Now, make mibs. When I run make mibs, it downloads a whole bunch of MIB files from the internet. MIBs are like a lookup dictionary for OIDs. So instead of writing OIDs in snmpget and snmpwalk, you can use MIB descriptions. I'll show a few examples of those in a moment. Okay, so that's done. If we run ls -lh again, we can see there's a new folder called mibs.
Alright, so I've just logged onto that server using WinSCP, since I'm on Windows, and this is very good for navigating folders. I'm in the root folder here: go, src, github.com, prometheus, snmp_exporter. This is the project we just downloaded. Now go into the generator folder. And here I'm in the same folder as I am in PuTTY, so I'm now viewing it here in WinSCP. There's the new file, generator, that was created with execute permissions, and the new mibs folder. In mibs there are loads of MIB files that the generator will use to generate the snmp.yml. And there are loads of Cisco version 2 ones as well. So this is my existing snmp.yml on my server, and I didn't need to generate that.
I already got that when I installed the SNMP exporter earlier on, but we can regenerate that file now. First, we need to create a variable that the system can use: export MIBDIRS=mibs. That is telling the Bash command prompt here to look in the mibs folder for MIB files. Those were the things that I just showed. Now we can run the generator: ./generator generate. Okay, and that has finished. ls -lh again, and look at the date down here. snmp.yml: this new file has just been generated. Okay, so we can look at that.
I'm going to open that using WinSCP. Just refresh that. There we go, snmp.yml. And I've got my WinSCP set to use Visual Studio Code, so it's much nicer for me to look at. I can see this brand new snmp.yml has been created. Now, I could copy the contents of that YAML over to the other YAML that I'm using on my Prometheus server, the one where the SNMP exporter is running on the same server. But it's unnecessary, since that one is already good. The new version has some extra modules, liebert_pdu and servertech_sentry4, but I don't need those, so I'm not going to copy this snmp.yml onto my other server. If you wanted to, you could do that.
Okay, so now the first part of the process is working. The SNMP exporter generator program works. That's important. Now, how does the generator know which files to read, which modules to create, and which OIDs to use? In the same folder here, in the generator folder, you have generator.yml. Open that up in VS Code: modules, if_mib. That's one we've been seeing quite a lot. It describes which MIBs to query: sysUpTime, interfaces, and ifXTable. Those are the MIB descriptions, but they have equivalent OIDs. You could use OIDs in that place, or MIB descriptions. So in the output for if_mib, the generator has taken the generator.yml and decided that sysUpTime, which also happens to be that OID, is going to be a get, and that interfaces and ifXTable are both suitable for an SNMP walk.
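The idea that a MIB is a lookup dictionary from names to OIDs can be sketched in a few lines of Python. The three names are the ones the if_mib module walks in generator.yml; the numeric OIDs are their standard values as far as I know, and the helper function itself is purely hypothetical, not part of the generator.

```python
# A tiny stand-in for what a MIB provides: names mapped to numeric OIDs.
MIB_NAMES = {
    "sysUpTime": "1.3.6.1.2.1.1.3",
    "interfaces": "1.3.6.1.2.1.2",
    "ifXTable": "1.3.6.1.2.1.31.1.1",
}


def to_oid(name_or_oid: str) -> str:
    """Return the numeric OID for a MIB name; pass numeric OIDs through."""
    if name_or_oid.replace(".", "").isdigit():
        return name_or_oid
    return MIB_NAMES[name_or_oid]


if __name__ == "__main__":
    # A walk list may mix MIB names and numeric OIDs.
    for entry in ["sysUpTime", "interfaces", "1.3.6.1.2.1.31.1.1"]:
        print(entry, "->", to_oid(entry))
```

This is why the generated snmp.yml can contain only numeric OIDs: the name lookup happens once, at generation time.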
The generator has put those OIDs in there. So the snmp.yml was generated using OIDs, which means that we don't need to have the MIBs downloaded or installed on the server that is running the SNMP exporter. But the generator.yml has the option of using MIB names or OIDs, via the lookups. That capability, as already mentioned, was installed when we installed the package libsnmp-dev. The server that I just set up the SNMP exporter generator on is dedicated to this purpose only. If I make mistakes, I can delete that server and start again, and it wouldn't be a problem. I wouldn't be breaking my existing Prometheus servers or anything else. Okay, excellent. At least make sure the SNMP exporter configuration generator is working on your system before you continue to write a customized module. Excellent.

25. Generate HUAWEI SNMP Exporter Module: Okay, now let's create a custom SNMP exporter module. I'll create one for a Huawei device, since in the existing snmp.yml there isn't anything for Huawei. I've been using the if_mib module, which is very generic and is suitable for most situations. Or you could use printer_mib if it's a printer; that's quite generic as well. So when creating a custom exporter module, you'll need to do some research on the MIB that you're going to use, because you want to know which OIDs are most appropriate for walking and getting. Well, I have done some research. I've gone onto the Huawei website, and I can see that there are a lot of more specific MIBs down here, but I'm going to create something quite generic. I looked through the example OIDs on the Huawei page here, and a lot of them begin with this prefix, 1.3.6.1.4.1. And also, if I go further down, 1.3.6.1.2.1. So first I'm going to create an SNMP module which is very broad, which will be similar to the if_mib module, but it will search for those particular things, 1.3.6.1.2.1 and 1.3.6.1.4.1 both.
Moving on, we have to download the Huawei MIB that I'm going to use. I'm going to have to put it into the generator's mibs folder, so cd to that folder. I'm on the specific computer that I've created for SNMP exporter generating, and I'm going to cd to that folder. This computer is the one that was set up in the last video. ls: all the MIBs that were installed, and that's them there. Now to download the specific Huawei MIB. I'm going to get it from circitor.fr. You'll need to research where you get your MIBs from. This site has always worked for me, so I'm going to copy that line and now just download that file and save it locally in the same folder as HUAWEI-MIB. Now ls -lh, just to confirm it is there. There it is, the HUAWEI-MIB is in the folder now, so I can use that.
Okay, so now to go back down to the generator folder: cd .., ls -lh. And there we go. I'm now going to edit the generator.yml, that one there, generator.yml, to tell it about the Huawei MIB that I've just added to the mibs folder. I can use nano to do that. But since I have WinSCP installed, if I open generator.yml, it will open in VS Code, and that makes it much more visual and easier for me to manage.
Okay, so I can see here there's the configuration for if_mib, cisco_wlc, apcups. All of those modules that you see in the snmp.yml are configured here in generator.yml. If I look at if_mib, it says do a walk on sysUpTime, interfaces, and ifXTable. That's a generic configuration that will be used to generate the snmp.yml. So what I'm going to do is create a new entry in here for the Huawei device.
Now, on the official SNMP exporter generator repository, if I scroll down to the bottom, it describes the file format of the generator.yml. So: modules, module name, walk, and a few other things such as version, for version 2 SNMP, and authentication, for example the community, where public is the default.
And if you're using SNMP version 3, you have a few other options as well. Now, I'll be using version 2 in this example. Anyway, I'm going to start to write it. So I'm going to name my module huawei and give it a walk. Like I said before, those common OIDs: 1.3.6.1.2.1 and 1.3.6.1.4.1. Now, this is very generic. This will return a lot of metrics. I'm doing this because I want to get as much information as I can about the Huawei MIB, which I can then refine in the next step. If you knew exactly which OIDs you wanted to walk and get, then you could just do what they do down here with cisco_wlc or apcups, for example. But at the moment we don't know enough about the Huawei MIB to know exactly which OIDs we want to walk or get. So I'm just going to pull in a lot of information with these prefixes.
Now, this is just one way of doing this. Another way: you can go to the Huawei website, look at your specific MIB-related documentation, decide what the most appropriate OIDs are, and do it that way. But this is just a generic example. Okay, so remember, that's going to return a lot of information, but I'll refine that in the next step.
So I'm going to add an override, which is like that: overrides, ifType, type EnumAsInfo. That just writes out the enumerations that it finds as the MIB description. I will also put in the version and the authentication. That is written like this: version 2, and the community, public. That's the default, so that was unnecessary, but you might have a different community, such as mycommunity. Anyway, it will write that into the final snmp.yml. So I'm going to leave it as public, just so you can see it. Okay, Ctrl+S to save.
Let's generate a new snmp.yml using this information. Okay, so I need to set the variable, MIBDIRS=mibs. So there it is, I've just set that there. And now to run ./generator generate. Okay, so that has created a new module for Huawei. Let's look at that in VS Code. Okay, so refresh. Now, snmp.yml.
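The generator.yml entry described above can be sketched as the text that this small Python helper prints. The layout (walk, version, auth with a community, and the ifType override) follows what is narrated in the lesson; the helper function is hypothetical, and newer generator releases may lay out authentication differently, so check the file format section of the generator's README for the version you are running.

```python
def render_generator_module(name, walk, version=2, community="public"):
    """Return generator.yml text for one module, as narrated in the lesson."""
    lines = ["modules:", f"  {name}:", "    walk:"]
    lines += [f"      - {oid}" for oid in walk]
    lines += [
        f"    version: {version}",
        "    auth:",
        f"      community: {community}",
        "    overrides:",
        "      ifType:",
        "        type: EnumAsInfo",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two broad prefixes used for the first, very generic attempt.
    print(render_generator_module("huawei", ["1.3.6.1.2.1", "1.3.6.1.4.1"]), end="")
```

Printing the text rather than writing the file keeps the sketch safe to run; in practice the entry would be added by hand under the existing modules key in generator.yml.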
It's quite a large file for the huawei module here, because those are very generic OIDs that I put in. It has created heaps and heaps of metrics for us, about temperature, battery, and so on. It's hard to know what is appropriate for your particular Huawei device. But if you don't really know which OIDs you want to put in, then you can do this: be very generic, and then refine the OIDs more. So I'm going to do that now. I'm going to refine those. Just from experience, I know that that is a good OID to refine. So I'm going to go to 1.43, so it's not so generic. And just looking through, I'm going to speed up the video and settle on a few that I want.
Okay, so I've just collected a few OIDs which I think are going to be maybe appropriate. That's still too many. Anyway, I was just storing those in the snmp.yml temporarily. I go into the modules here, and you can replace that, and just make sure it's formatted correctly. There are too many there; I'm not going to use that. I'm going to maybe ignore that 1.2.1 one as well, and perhaps that one as well, and just generate OIDs for that. I don't know which OIDs are really appropriate for the particular Huawei device you might have. I'm just showing you that this is what you'll need to do. There's no easy solution for this. You need to choose what you want the SNMP exporter to query. Okay, so save that and regenerate the snmp.yml using the command line. Excellent.
Now reopen that. It's a much smaller file now. Okay, so there's the new huawei module information. It's still quite a lot of information to be pulling from a device, so I'll probably refine that even more. Let's just say I only wanted that information. Save that, close that, regenerate. And that's even smaller. Okay, so now my module is walking these OIDs, which include the system description and so on. It's a whole lot of information that you'll really have to do the research on.
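Why a broad prefix returns heaps of metrics and a longer prefix returns fewer can be shown with a small Python sketch of OID subtree matching. The sample OIDs are assumptions chosen for illustration (a system description, an interface counter, and a made-up entry under Huawei's enterprise number, 2011), and the functions are hypothetical helpers, not part of the generator.

```python
def in_subtree(oid: str, prefix: str) -> bool:
    """True if oid equals prefix or lies underneath it in the OID tree."""
    oid_parts = oid.split(".")
    prefix_parts = prefix.split(".")
    # Compare whole components, so 1.3.6.1.2.10 is not under 1.3.6.1.2.1.
    return oid_parts[:len(prefix_parts)] == prefix_parts


def walked(oids, prefixes):
    """Keep only the OIDs that a walk of the given prefixes would cover."""
    return [oid for oid in oids if any(in_subtree(oid, p) for p in prefixes)]


if __name__ == "__main__":
    sample = [
        "1.3.6.1.2.1.1.1.0",       # sysDescr.0
        "1.3.6.1.2.1.2.2.1.10.1",  # ifInOctets for interface index 1
        "1.3.6.1.4.1.2011.2.1",    # made-up OID under the Huawei enterprise tree
    ]
    print(len(walked(sample, ["1.3.6.1.2.1", "1.3.6.1.4.1"])), "covered by the broad walk")
    print(len(walked(sample, ["1.3.6.1.2.1.1"])), "covered by the refined walk")
```

Refining the walk list in generator.yml is the same operation applied to the real device: each extra component in a prefix cuts away everything outside that subtree.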
What you actually want to pull into your module is up to you; I'm just demonstrating that this is the kind of thing you'll need to do. Anyway, now that I have a new huawei module in my snmp.yml, I can copy the snmp.yml over to my Prometheus server, where I'm running the SNMP exporter, and restart the SNMP exporter. It will then be able to use the new huawei module that I just created in the snmp.yml. Just to show you how I might approach that process: I have logged onto my Prometheus server, where the live snmp.yml is, and I can open that up in Visual Studio Code. That's the new one. Select all, copy, and paste, save. Log in to my server, restart the SNMP exporter. And there we go. So it's aware of the new huawei module.
I'll open up prometheus.yml and go down to the SNMP configs. Now I'm going to create a new job here. So I'm just going to copy all that by highlighting it, and then just right-click; it pastes it from the clipboard there. I'll call this SNMP Huawei, for example; module, huawei; relabel configs. I don't have an IP address that is actually a Huawei device, so I'm just using something as an example there. And I'm just going to use another example here, just anything. But that's how you would use the new module. I don't have one, so I'm just creating a hypothetical version. Very good.
Generating a custom SNMP exporter module: there's a lot of research involved, but you can see that what you need to do is perhaps start off quite generic and then refine which OIDs you're going to walk or get in the generator.yml there. For example, I might decide that the only really valuable one would be that. And I could save, regenerate, reopen, and have a look at the output to see whether it was adequate. You could also be more specific. For example, I can say I want that one, and that one. Only you will know. Excellent. Anyway, as time goes on, there may be more examples where people have created their own modules specifically for the Prometheus SNMP exporter.
And you might find those on GitHub, or just by doing a search on your favorite search engine. That was just a hypothetical example. Okay, excellent.

26. Finishing Up: Okay, so excellent. We have achieved quite a lot in such a small amount of time. We didn't get too involved in the finer details of Prometheus in this course. We did just enough to be able to put this together. So we've got our own dedicated Prometheus server with recording and alerting rules. And we have an external node exporter running; we can create as many of these as we like. We could have hundreds of them if we wanted, on our various servers. And I've also managed to demonstrate getting an alert through email. Now, this is just the beginning of Prometheus; you can do millions of things with it. I mean, now you know whether you want to take Prometheus further or not. A good place to look, if you want something more specific in Prometheus, is this default port allocations page. It summarizes really all the different kinds of exporters that you can get for Prometheus now. So anyway, if you have any questions, leave them in the Q&A section under the video most related to your question. I'll try to answer, and depending on how that goes, I'll either adjust the content of the course or even add it to my other Prometheus medium and advanced courses. So thanks for taking part.