Elasticsearch 7 and the Elastic Stack: Hands On | Frank Kane | Skillshare

Elasticsearch 7 and the Elastic Stack: Hands On

Frank Kane, Founder of Sundog Education, ex-Amazon

98 Lessons (8h 33m)
    • 1. Intro: Installing and Understanding Elasticsearch

    • 2. Installing Elasticsearch [Step by Step]

    • 3. Please follow me on Skillshare!

    • 4. Intro to HTTP and RESTful API's

    • 5. Elasticsearch Basics: Logical Concepts

    • 6. Elasticsearch Overview

    • 7. Term Frequency / Inverse Document Frequency (TF/IDF)

    • 8. Using Elasticsearch

    • 9. What's New in Elasticsearch 7

    • 10. How Elasticsearch Scales

    • 11. Quiz: Elasticsearch Concepts and Architecture

    • 12. Section 1 Wrapup

    • 13. Intro: Mapping and Indexing Data

    • 14. Connecting to your Cluster

    • 15. Introducing the MovieLens Data Set

    • 16. Analyzers

    • 17. Import a Single Movie via JSON / REST

    • 18. Insert Many Movies at Once with the Bulk API

    • 19. Updating Data in Elasticsearch

    • 20. Deleting Data in Elasticsearch

    • 21. [Exercise] Insert, Update and Delete a Movie

    • 22. Dealing with Concurrency

    • 23. Using Analyzers and Tokenizers

    • 24. Data Modeling and Parent/Child Relationships, Part 1

    • 25. Data Modeling and Parent/Child Relationships, Part 2

    • 26. Section 2 Wrapup

    • 27. Intro: Searching with Elasticsearch

    • 28. "Query Lite" interface

    • 29. JSON Search In-Depth

    • 30. Phrase Matching

    • 31. [Exercise] Querying in Different Ways

    • 32. Pagination

    • 33. Sorting

    • 34. More with Filters

    • 35. [Exercise] Using Filters

    • 36. Fuzzy Queries

    • 37. Partial Matching

    • 38. Query-time Search As You Type

    • 39. N-Grams, Part 1

    • 40. N-Grams, Part 2

    • 41. Section 3 Wrapup

    • 42. Intro: Importing Data

    • 43. Importing Data with a Script

    • 44. Importing with Client Libraries

    • 45. [Exercise] Importing with a Script

    • 46. Introducing Logstash

    • 47. Installing Logstash

    • 48. Running Logstash

    • 49. Logstash and MySQL, Part 1

    • 50. Logstash and MySQL, Part 2

    • 51. Logstash and S3

    • 52. Elasticsearch and Kafka, Part 1

    • 53. Elasticsearch and Kafka, Part 2

    • 54. Elasticsearch and Apache Spark, Part 1

    • 55. Elasticsearch and Apache Spark, Part 2

    • 56. [Exercise] Importing Data with Spark

    • 57. Section 4 Wrapup

    • 58. Intro: Aggregation

    • 59. Aggregations, Buckets, and Metrics

    • 60. Histograms

    • 61. Time Series

    • 62. [Exercise] Generating Histogram Data

    • 63. Nested Aggregations, Part 1

    • 64. Nested Aggregations, Part 2

    • 65. Section 5 Wrapup

    • 66. Intro: Using Kibana

    • 67. Installing Kibana

    • 68. Playing with Kibana

    • 69. [Exercise] Log analysis with Kibana

    • 70. Section 6 Wrapup

    • 71. Intro: Analyzing Log Data with the Elastic Stack

    • 72. FileBeat and the Elastic Stack Architecture

    • 73. X-Pack Security

    • 74. Installing FileBeat

    • 75. Analyzing Logs with Kibana Dashboards

    • 76. [Exercise] Log analysis with Kibana

    • 77. Section 7 Wrapup

    • 78. Intro: Elasticsearch Operations and SQL Support

    • 79. Choosing the Right Number of Shards

    • 80. Adding Indices as a Scaling Strategy

    • 81. Index Alias Rotation

    • 82. Index Lifecycle Management

    • 83. Choosing your Cluster's Hardware

    • 84. Heap Sizing

    • 85. Monitoring

    • 86. Elasticsearch SQL

    • 87. Failover in Action, Part 1

    • 88. Failover in Action, Part 2

    • 89. Snapshots

    • 90. Rolling Restarts

    • 91. Section 8 Wrapup

    • 92. Intro: Elasticsearch in the Cloud

    • 93. Amazon Elasticsearch Service, Part 1

    • 94. Amazon Elasticsearch Service, Part 2

    • 95. The Elastic Cloud

    • 96. Section 9 Wrapup

    • 97. Wrapping Up

    • 98. Let's Stay in Touch


About This Class

New for 2019! Elasticsearch 7 is a powerful tool not only for powering search on big websites, but also for analyzing big data sets in a matter of milliseconds! It's an increasingly popular technology, and a valuable skill to have in today's job market. This comprehensive course covers it all, from installation to operations, with over 90 lectures including 8 hours of video.

We'll cover setting up search indices on an Elasticsearch 7 cluster (if you need Elasticsearch 5 or 6 - we have other courses on that), and querying that data in many different ways. Fuzzy searches, partial matches, search-as-you-type, pagination, sorting - you name it. And it's not just theory, every lesson has hands-on examples where you'll practice each skill using a virtual machine running Elasticsearch on your own PC.

We'll explore what's new in Elasticsearch 7 - including index lifecycle management, the deprecation of types and type mappings, and a hands-on activity with Elasticsearch SQL. We've also added much more depth on managing security with the Elastic Stack, and how backpressure works with Beats.

We cover, in depth, the often-overlooked problem of importing data into an Elasticsearch index. Whether it's via raw RESTful queries, scripts using Elasticsearch APIs, or integration with other "big data" systems like Spark and Kafka, you'll see many ways to get Elasticsearch started from large, existing data sets at scale. We'll also stream data into Elasticsearch using Logstash and Filebeat, commonly referred to as the "ELK Stack" (Elasticsearch / Logstash / Kibana) or the "Elastic Stack".

Elasticsearch isn't just for search anymore - it has powerful aggregation capabilities for structured data. We'll bucket and analyze data using Elasticsearch, and visualize it using the Elastic Stack's web UI, Kibana.

You'll learn how to manage operations on your Elastic Stack, using X-Pack to monitor your cluster's health, and how to perform operational tasks like scaling up your cluster, and doing rolling restarts. We'll also spin up Elasticsearch clusters in the cloud using Amazon Elasticsearch Service and the Elastic Cloud.

Elasticsearch is positioning itself to be a much faster alternative to Hadoop, Spark, and Flink for many common data analysis requirements. It's an important tool to understand, and it's easy to use! Dive in with me and I'll show you what it's all about.

Meet Your Teacher

Frank Kane

Founder of Sundog Education, ex-Amazon


Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.


1. Intro: Installing and Understanding Elasticsearch: Let's dive right in. In the real world, you'll probably be using Elasticsearch on a cluster of Linux machines, so we'll be using Linux in this course, Ubuntu in particular. Now, if you don't have an Ubuntu system handy, that's totally OK. I'm going to walk you through setting up a virtual machine on your Windows or Mac PC that lets you run Ubuntu inside your existing operating system. It's actually really easy to do. Once we've got an Ubuntu machine up and running, we'll install Elasticsearch, and just for fun we'll create a search index of the complete works of William Shakespeare and mess around with it. After that, we'll take a step back and talk about Elasticsearch and its architecture at a high level, so you have all the basics you need for later sections of this course. Roll up your sleeves and let's get to work.

2. Installing Elasticsearch [Step by Step]: Let's dive in and get Elasticsearch installed on your home PC so you can follow along in this course if you'd like to. Elasticsearch is going to be running on an Ubuntu Linux system for this course, but if you don't already have an Ubuntu system sitting around, that's okay. I'm going to show you how to install VirtualBox on your Mac or Windows PC, which will allow you to run Ubuntu right on your own desktop within a little virtual environment. Once we have Ubuntu installed inside VirtualBox, we'll install Elasticsearch on it, and after that we'll load the complete works of William Shakespeare into Elasticsearch and see if we can successfully search them. That's a lot to do in one lecture, but I'll get you through it. Let's talk briefly about system requirements: pretty much any PC should be able to handle this. You don't need a ton of resources for Elasticsearch.
If you do run into trouble, however, make sure that you have virtualization enabled in your BIOS settings, and specifically make sure that Hyper-V virtualization is off if that's an option in your BIOS. But just try following along first; you shouldn't need to tinker with your BIOS settings unless you run into problems. Also, be aware that the antivirus program called Avast is known to conflict with VirtualBox, so if you use Avast, you'll need to switch to a different one or turn it off while using this course. With that, let's dive in and get this set up. If you're running on a Windows or Mac PC, you first need to install a virtual environment to run Ubuntu within, and to do that we're going to use VirtualBox. If you're already on Ubuntu, you don't need this step. Head on over to virtualbox.org and click on the big friendly download button. It's free software. I'm on Windows, so I'm going to download the Windows version of the binary. Once the installer for your operating system is downloaded, go ahead and run it. There's nothing special about it, really; just go through the steps it walks you through. Choose where you want to install it; all these defaults are A-OK. It will interrupt your network connection while installing, so make sure you're OK with that, and go ahead and install. Give it the permissions it needs, and off it goes. That was easy. Let's hit Finish, and VirtualBox is sitting there waiting for us to add some virtual machines to it. Next, we need to download an operating system to run within our virtual environment. For that, head on over to ubuntu.com.
From the Ubuntu home page, just look for the download link. We want Ubuntu Server, and we're looking for the 18.04 long-term support (LTS) version. Go ahead and wait for that to download; this will take longer because it's a much bigger download. We'll come back when that's done. The image for the Ubuntu installation media has downloaded successfully. Just make sure you know where it went on your PC, and we'll go back to VirtualBox and set it up. From the VirtualBox Manager, go to the Machine menu and select New to add a new virtual machine. We'll give it a name: Elasticsearch. I'm going to change the machine folder to another drive that has more space on it, because my C drive is almost full. Make sure that you're using a hard drive that has plenty of space for this. For me, my E drive has the most room, and you'll need to create a folder there to put this in. I already have a VirtualBox VMs folder, so let's create a new folder inside it called elasticsearch7 and select that folder. Linux is the correct type, but we want the version to be Ubuntu (64-bit). Hit Next. I'm going to allocate about half of my physical memory to this virtual machine; for me, that's 8 gigabytes, or 8192 megabytes. Next, we'll create a virtual hard disk for it. We're going to need about 20 gigabytes for that, so again, make sure we're putting it someplace that has plenty of room. The defaults are fine, and dynamically allocated is fine. We're going to select a different location for this disk; make sure it's on a drive that has plenty of space. I'm going to increase the size from 10 gigabytes to 20 gigabytes, because 10 just isn't enough. It doesn't have to be exact. All right, now that we've got things set up, let's kick it off by hitting the big friendly green Start button. Now we need to select the ISO image that we downloaded.
Click on the folder icon, navigate to your downloads, and select the Ubuntu 18.04.2 server image, and off it goes. After a couple of minutes of booting up, we're in the installation menu. Go ahead and select your language. For me, that's English, and my keyboard layout is also English (US). Just hit Enter to accept the Done selections there, and hit Enter again. We want to install Ubuntu, and the network configuration should be fine. We do not have a proxy server, so again hit Enter to skip past that. The default mirror address is fine, so hit Enter, and we'll use the entire virtual disk. Remember, this is the virtual disk we're setting up; there's no risk of corrupting your main operating system's disk. So hit Enter again, and one more time, and everything looks fine, so hit Enter again to accept the Done selection there. Now we need to use the down arrow to select Continue, confirm that yes, we're sure we want to do this, and hit Enter. Type in your name; let's say your name is student. For your server's name, use whatever you want; es7 sounds good to me. The username is student and the password is password; again, use whatever you want here. When you're done, hit Tab to select the Done option and hit Enter. We do want the OpenSSH server, so hit the space bar to select that, then hit Tab and Enter to select Done. We'll install the software we need by hand, so hit Tab to go to the Done option there and hit Enter again. Now it's off doing its thing, installing, so we have to wait for it to finish. Several minutes later, the initial installation is done and it's asking us to reboot, so go ahead and hit Enter. Don't worry about those failed messages; they're perfectly OK. Hit Enter again, and that will reboot, hopefully into our brand-spanking-new Ubuntu environment. After a minute, it looks like it's finished booting up.
Just hit Enter to get a login prompt, and we'll log in as the user named student that we set up, with the password we also set up during installation. And we're in. Cool. So we have Ubuntu up and running; now we just need to install Elasticsearch itself on our new system. If you'd like to follow along with written instructions from this point, you can head over to my website at sundog-education.com/elasticsearch, where you'll find written steps for what we're about to do. If you prefer to follow along with the video, you can do that as well. So here we go. Now that we're logged in, let's start by telling Ubuntu where to find the Elasticsearch binaries. Type in:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

That's -q followed by an uppercase O, not a zero, then a space and a lone dash. Make sure you pay attention to capitalization, spaces, and everything else here; one wrong keystroke and it won't work. The pipe symbol is Shift+Backslash. So again, double-check everything: make sure you got all the spaces right, all the dashes are where they should be, everything uppercase is uppercase, and that's an O and not a zero. Type in your password, and step one is done. Next, type:

sudo apt-get install apt-transport-https

Next, we'll add the 7.x package repository, because this is Elasticsearch 7:

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

And finally:

sudo apt-get update && sudo apt-get install elasticsearch

That should actually go out and install Elasticsearch, and it appears to have worked. Now that we've installed Elasticsearch, we need to configure it. To do that, we'll say:

sudo vi /etc/elasticsearch/elasticsearch.yml

We need to make the following changes. Use the arrow keys to move down to where it says node.name, move over to the end of that line, and hit the i key to enter insert mode in the vi editor. Then backspace to get rid of the comment character, and name our node node-1. Very creative. Keep scrolling down; next, we're looking for the network.host setting. There it is. Uncomment that and change it to 0.0.0.0, just to make sure that everything works fine in our virtual environment. Next, go down to the Discovery section, uncomment discovery.seed_hosts, and change it from "host1" and "host2" to "127.0.0.1" inside quotation marks, just like that. Finally, go to cluster.initial_master_nodes, uncomment that as well, and set it to just "node-1", because we only have one node in our little virtual cluster. That should do the job. Hit the Escape key to get out of insert mode, and then type :wq to write and quit. Now we're ready to actually start Elasticsearch up:

sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
sudo /bin/systemctl start elasticsearch.service

Enabling the service makes sure that Elasticsearch boots up automatically when we start our machine in the future. It generally takes a minute or two for Elasticsearch to actually start up successfully. We can test whether it's up and running yet with:

curl -XGET 127.0.0.1:9200

Right now we're getting "connection refused" because it hasn't started up yet, so I'm just going to try this again in another minute or two, and when it actually gives me back a successful response, we'll know that we're ready to move forward. After a couple of minutes, I got back this response instead of a connection refused message. Once you see this, you know you're ready to keep going; you should get this default response back, which ends with "You Know, for Search". So now that we have Elasticsearch running, we need some data to search. Let's download the complete works of William Shakespeare and import them. First, type in the following:

wget http://media.sundog-soft.com/es7/shakes-mapping.json

This just defines the schema of the data that we're about to install. Now that we've downloaded that data type mapping, let's submit it to Elasticsearch:

curl -H "Content-Type: application/json" -XPUT 127.0.0.1:9200/shakespeare --data-binary @shakes-mapping.json

Here, shakespeare will be the name of our index. That submits the mapping into Elasticsearch, so it knows how to interpret the data that we're about to give it. Now let's download the actual works of William Shakespeare:

wget http://media.sundog-soft.com/es7/shakespeare_7.0.json

That's everything Shakespeare ever wrote, in JSON format. Let's submit it to our index:

curl -H "Content-Type: application/json" -XPOST '127.0.0.1:9200/shakespeare/_bulk' --data-binary @shakespeare_7.0.json

We'll talk about what this is all doing later on.
Right now, I just want to get you up and running and doing something cool. It's going to chew on the entire works of William Shakespeare and index them into Elasticsearch. That will, of course, take a few minutes, so we'll come back when that's done. All right. It took about 15 minutes for all that data to get indexed, but compared to the amount of time it probably took William Shakespeare to write all of that, I guess that's nothing, right? Let's hit Enter to get a nice, clean prompt back and get some payoff from all this work. We've done a lot so far today: we've installed an Ubuntu system running in a virtual environment on your own PC, installed Elasticsearch from scratch, and indexed the entire works of William Shakespeare. So let's actually search that data now. Let's issue a command that searches for "to be or not to be" and see what play it came from. I think you might know the answer, but let's just see that it works. Type curl -H "Content-Type: application/json" -XGET '127.0.0.1:9200/shakespeare/_search?pretty' followed by -d and another single quote, which opens the request body. Basically, what we're saying so far is that I'm going to send a JSON request to our Elasticsearch server running on 127.0.0.1, against the shakespeare index, and I'm going to issue a search query and get the results back nicely formatted ("pretty"). Hit Enter, and we'll start the body of our request with an open curly bracket. Hit Enter, type "query": followed by another open curly bracket, hit Enter again, and type "match_phrase": and another curly bracket, then "text_entry": "to be or not to be". You see what's going on here.
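For comparison, here is a sketch of the same search built in Python with only the standard library. It assumes the shakespeare index from this lecture and its text_entry field. The function names phrase_search and summarize_hits are hypothetical. summarize_hits further assumes the usual hits.hits[]._source layout of a search response and the play_name and speaker fields of this data set.

```python
import json
import urllib.request


def phrase_search(index, field, phrase, host="127.0.0.1:9200"):
    """Build the request typed above: a match_phrase query with pretty-printed results."""
    body = {"query": {"match_phrase": {field: phrase}}}
    req = urllib.request.Request(
        f"http://{host}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        method="GET",  # same verb as the curl command; the JSON query travels in the body
    )
    req.add_header("Content-Type", "application/json")
    return req


def summarize_hits(response):
    """Pull (play_name, speaker, text_entry) out of each hit in a parsed search response."""
    return [
        (h["_source"]["play_name"], h["_source"]["speaker"], h["_source"]["text_entry"])
        for h in response["hits"]["hits"]
    ]
```

To run it against the VM, json.load(urllib.request.urlopen(phrase_search("shakespeare", "text_entry", "to be or not to be"))) returns the parsed response, which you can then pass to summarize_hits.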
Basically, we're sending a query to Elasticsearch to match the phrase "to be or not to be" in the text_entry field. Then we close off those curly brackets, one, two, and three of them, and add a final single quote to close that off. Let's see what we get back. It worked! Cool. You can see that "to be or not to be" came back from the play named Hamlet, the speaker was Hamlet, and the full line was "To be, or not to be: that is the question". And apparently Elasticsearch has chosen to be. During this lecture, we have successfully set it up from scratch on your own little Ubuntu system. Now that we have Elasticsearch running, we can start to learn more about how it works, experiment with it, and do more and more with it. So keep on going, guys; it's about to get interesting. If you're done for now, however, the safe way to shut this down is to go to the Machine menu of your virtual machine's window and select ACPI Shutdown. That sends a shutdown signal to the guest operating system and cleanly shuts things down, and when it's done, you're free to close the VirtualBox Manager as well.

3. Please follow me on Skillshare!: The world of big data and machine learning is immense and constantly evolving. If you haven't already, be sure to hit the Follow button next to my name on the main page of this course. That way you'll be sure to get announcements of my future courses and news as this industry continues to change.

4. Intro to HTTP and RESTful API's: Before we can talk about Elasticsearch, we need to talk about REST and RESTful APIs. The reason is that Elasticsearch is built on top of a RESTful interface, which means that to communicate with Elasticsearch, you send it HTTP requests that adhere to a REST interface. So let's talk about what that means, starting with HTTP requests at a high level.
Whenever you request a web page from your browser, your browser sends an HTTP request to a web server somewhere, asking for the contents of the page you want to look at. Elasticsearch works the same way: instead of talking to a web server, you're talking to an Elasticsearch server, but it's the exact same protocol. An HTTP request contains more than you might think. One part is the method, which is basically the verb of the request: what you're asking the server to do. When getting a web page back from a web server, you'd send a GET request, saying that you want to get information back from the server without changing or adding any information on it. You might also have a POST verb, which means that you want to either insert or replace data stored on the server, or PUT, which means to always create new data on the server. You can even send a DELETE verb, which means to remove information from the server. You normally won't do that from a web browser, but from an Elasticsearch client it's a totally valid thing to do. A request also includes a protocol: specifically, which version of HTTP you're sending the request in, for example HTTP/1.1. You'll be sending the request to a specific host, of course; if you're requesting a web page from our website, that might be sundog-education.com. The URL is basically what resource you're requesting from that server, or what you want the server to do; for a web server, that might be the path to the web page you want on that host. There's also a request body you can send along. You don't normally see one with a web page request, but you can send extra data, in whatever structure you want, to the server within the body of the request as well.
Finally, there are headers associated with each request that contain metadata about the request itself: for example, information about the client, which for a web browser would be in the User-Agent header, or what format the body is in, which would be in the Content-Type header. Let's look at a concrete example, going back to a browser wanting to display a website. In that example, we're sending a GET verb to our web server and requesting the resource /index.html, meaning we want to get the home page. We're sending it using the HTTP/1.1 protocol, to a specific host: our website, sundog-education.com. There's no body being sent, because all the information needed to fulfill this request has already been specified. A whole slew of headers will be sent along as well, containing information about the browser itself, what types of content and languages it can accept back from the server, information about caching, cookies that might be associated with this site, and things like that. So a bunch of information about you is sent around the Internet whenever you request a web page. Fortunately, with Elasticsearch our use of headers is pretty minimal. Now that we understand HTTP requests, let's talk about RESTful APIs. The really pragmatic, practical definition of a RESTful API is simply that you're using HTTP requests to communicate with a web service of some sort. Because we're communicating with Elasticsearch using HTTP requests and responses, we're basically using a RESTful API. There's more to it than that, and we'll get to it, but at a very simple level that's all it means. It sounds fancy, but that's really it.
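To make the anatomy above concrete, here is a minimal Python sketch that assembles the raw bytes of an HTTP/1.1 request from its parts: method, resource path, protocol version, host, headers, and an optional body. The function name format_http_request is hypothetical, and real browsers send many more headers (User-Agent, cookies, and so on) than this sketch does.

```python
def format_http_request(method, path, host, headers=None, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: request line, headers, blank line, then the body."""
    lines = [f"{method} {path} {version}", f"Host: {host}"]  # HTTP/1.1 requires a Host header
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")  # tells the server where the body ends
    head = "\r\n".join(lines) + "\r\n\r\n"  # each header line ends in CRLF; an empty line ends the headers
    return head.encode("ascii") + body
```

format_http_request("GET", "/index.html", "sundog-education.com") produces the home-page request from the example, with no body. An Elasticsearch search differs only in its path, a Content-Type: application/json header, and a JSON body.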
For example, if I want to get information back from my Elasticsearch cluster, like search results when I'm conducting a search, I would send a GET verb along with that request, saying I want to get this information from Elasticsearch. If I'm going to insert information into it, I would send a PUT request instead, and the information I'm inserting would be within the request body. And if I want to remove information from my Elasticsearch index, I would send a DELETE request to get rid of it. Like I said, there's more to REST than that, so let's get into the more computer-science aspect of it. REST stands for Representational State Transfer, and it has six guiding constraints. Well, to be honest, not all of them are really constraints; some of them are a little bit fuzzy, and we'll talk about that. Obviously, it needs to be a client-server architecture. The concept of sending requests and responses from clients to servers doesn't make sense unless we're talking about a client-server architecture, and that is what Elasticsearch offers: we have an Elasticsearch server, or maybe even a whole cluster of servers, and several clients interacting with it. It must be stateless. That means every request and response must be self-contained; you can't assume that the client or the server has any memory of the sequence of events that came before. You have to make sure that all the information needed to fulfill a request is contained within the request itself, and you're not keeping track of state between different requests. Cacheability: this is more of a fuzzy one. It doesn't mean that your responses need to be cached on the client; it just means the system allows for that, so maybe your responses include information about whether or not they can be cached. Again, not really a requirement, but it's on the list of REST constraints.
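The verb-to-operation mapping described above can be sketched in Python as three small request builders. This assumes Elasticsearch 7's document endpoints (/<index>/_doc/<id>) and a local node on 127.0.0.1:9200. The index name movies and the function names are hypothetical. Each request carries everything the server needs (method, path, and body), which is the statelessness constraint in practice.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumption: the local node from the install lecture


def _build(method, path, body=None):
    """Create a self-contained request; JSON bodies get the Content-Type header."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def search(index, query):
    return _build("GET", f"/{index}/_search", {"query": query})  # read only, changes nothing


def insert(index, doc_id, doc):
    return _build("PUT", f"/{index}/_doc/{doc_id}", doc)  # the document goes in the request body


def remove(index, doc_id):
    return _build("DELETE", f"/{index}/_doc/{doc_id}")  # no body needed
```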
Layered system: again, not a requirement, but it means that when you talk to, for example, sundog-education.com, you're not necessarily talking to a specific individual server. The request might get routed behind the scenes to one of an entire fleet of servers, so you can't assume that your request is going to a specific physical host. This is another reason statelessness is important: one host might not know what's going on in another, since they might not be talking to each other at all. Another fuzzy constraint is code on demand, which just means you can send code across as a payload in your responses. For example, a server might send back JavaScript code as part of its response body that then tells the client how to process the data. We're not going to do that with Elasticsearch, obviously, but REST says you can if you want to. Finally, REST demands a uniform interface. What that means is a pretty long topic, but at a fundamental level, it means the data you're sending along has a predictable, structured form, and you can deal with changes to it in a structured way. At a high level, that's all it is. With that out of the way, why are we talking about REST at all here? The reason is that we're going to spend this whole course working with the HTTP requests and responses themselves. By dealing with the very low level of how Elasticsearch's RESTful API works, we can avoid getting mired in the details of how any specific language or system might interact with Elasticsearch. Pretty much any language out there, whether Java, JavaScript, Python, or whatever you want to use, has some way of sending HTTP requests, so it really doesn't matter what language you're using.
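Here is a toy Python model of why the layered-system and statelessness constraints go together. A hypothetical front layer hands each request to the next server in rotation (round-robin). Because each simulated server computes its reply from the request alone, it doesn't matter which one answers. None of these names come from Elasticsearch; they only illustrate the idea.

```python
import itertools


def make_server(name, documents):
    """A stateless server: its reply depends only on the request and the shared data."""
    def handle(request):
        hits = [doc for doc in documents if request["phrase"] in doc]
        return {"served_by": name, "hits": hits}
    return handle


class RoundRobinFront:
    """Front layer that forwards each request to the next server in turn."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def send(self, request):
        return next(self._servers)(request)
```

Sending the same request twice reaches two different servers but gets the same hits back. If a server instead depended on something a client sent in an earlier request, the second call could fail simply because it landed on a host that never saw the first.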
What matters more in understanding how to use Elasticsearch is how to construct these requests and how to interpret the responses that come back from it. The mechanics of how you send that request and get the response back are trivial, right? Any language can do that. If you're a Java developer, you can go look up how to do that. So we're not going to get mired in the details of how to write a Java client for Elasticsearch. Instead, what we're going to teach you in this course is how to construct HTTP requests and parse the responses you get back from Elasticsearch in a meaningful way, and by doing that you'll be able to transfer this knowledge to any language and any system you want very easily. Some languages may have a dedicated client library for Elasticsearch that provides a higher-level wrapper over the actual HTTP requests and responses, but it will generally be a pretty thin wrapper, so you still need to understand what's going on under the hood to use Elasticsearch successfully. A lot of people get confused about that in this course, but there's a very good reason why we're just focusing on the actual HTTP requests and responses and not the details of how to do it from a specific language. All of the Elasticsearch documentation is done in the same style, and the books you can find about Elasticsearch follow the same idea. There's a good reason for that. So the way we're going to interact with Elasticsearch in this course is just using the curl command on a command line. Again, instead of using any specific programming language or client library, we're just going to use curl, which is a Linux command for sending HTTP requests right from the command line. We're just going to bash out curl commands to send requests on the fly to our server, get the responses back, and see what comes back from them. The structure of a curl command looks like this: basically, it's curl -H, followed by any headers you need to send.
And for Elasticsearch, that will always be a Content-Type of application/json, meaning that whatever's in the body is going to be interpreted as JSON. It will always be that. In fact, we'll show you a little bit of a hack for making curl always send that header automatically, to save you some typing. That will be followed by the URL, which contains both the host that you're sending this request to (in this course, that will usually be localhost, 127.0.0.1) and any information that the server needs to actually fulfill that request: what index am I talking to, what data type, what sort of command am I asking it to do? And finally, we will pass -d and then the actual message body within quotes. That will be JSON-formatted data with additional information that the server needs to figure out what to give back to you or what to insert into Elasticsearch. Let's look at some concrete examples to make that more real. In the first one at the top here, we're basically querying the Shakespeare index for the phrase "to be or not to be." So let's take a closer look at that curl command and what's in it. Again, we're saying curl -H "Content-Type: application/json"; that sends an HTTP header saying that the data in the body is going to be in JSON format. -X GET means that we're using the GET method, or the GET verb, depending on your terminology, meaning that we just want to retrieve information back from Elasticsearch; we're not asking it to change anything. And the URL, as you can see, includes the host that we're talking to, in this case 127.0.0.1, which is the local loopback address for your localhost. Elasticsearch runs on port 9200 by default. That's followed by the index name, which is shakespeare, and then by _search, meaning that we want to process a search query as part of this request. The ?pretty is a query string parameter.
That means we want to get the results back in a nicely formatted, human-readable format, because we're going to be looking at them on the command line. And finally, we have the request body itself, which is specified after the -d, between single quotes. If you've never seen JSON before, this is what it looks like. It's just a structured data format where each level is contained within curly brackets, so it's always enclosed by curly brackets at the top level. Then we're saying we have a query level, and within those brackets we're saying we have a match_phrase command that matches the text_entry field against "to be or not to be." So that is how you would construct a real search query in Elasticsearch using nothing but an HTTP request. Another example: here we're going to be inserting data. In this one, we're using a PUT verb, again to 127.0.0.1 on port 9200. This time we're talking to an index called movies and a data type called movie, and we're using a unique identifier for this new entry, 109487. Under movie ID 109487, we're including the following information in the message body. The genre is actually a list of genres, and in JSON that will be a comma-delimited list enclosed in square brackets. So this particular movie is in both the IMAX and Sci-Fi categories, its title is Interstellar, and it came out in the year 2014. So that's what some real HTTP requests look like when you're dealing with Elasticsearch. Now that you know what to expect and how we're actually going to use Elasticsearch and communicate with it, we can talk more about how Elasticsearch works and what it's all about. We'll do that next. 5. Elasticsearch Basics: Logical Concepts: So before we start playing with our shiny new Elasticsearch server, let's go over some basics of Elasticsearch. First, we'll understand the concepts of how it works, what it's all about, and how it's architected.
And when we're done with that, we'll have a quick little quiz to reinforce what you learned. After that, we'll start messing around with it. So there are two main logical concepts behind Elasticsearch. The first is the document. If you're used to thinking of things in terms of databases, a document is a lot like a row in a database: it represents a given entity, something that you're searching for. And remember, in Elasticsearch it's not just about text; any structured data can work. Elasticsearch works on top of JSON-formatted data. If you're familiar with JSON, it's basically just a way of encoding structured data that may contain strings or numbers or dates or what have you, in a way that you can cleanly transmit across the web, and you'll see a ton of examples of this throughout the course, so it'll make more sense later on. Now, every document has a unique ID, and you can either explicitly assign a unique ID to it yourself or allow Elasticsearch to assign it for you. The second concept is the index. An index is the highest-level entity that you can query against in Elasticsearch, and it can contain a collection of documents. So again, bringing this back to the database analogy, you can think of an index as a database table and a document as a row in that table. The schema that defines the data types in your documents also belongs to the index, and you can only have one type of document within a single index in Elasticsearch. So if you're used to the world of databases, you'll find Elasticsearch has similar concepts: think of your cluster as a database, its indices as tables, and documents as rows in those tables. It's just different terminology. But as you'll soon see, even though the concepts are similar, how Elasticsearch works under the hood is very different from a traditional database. 6. Elasticsearch Overview: Let's start off with a sort of 30,000-foot view of the Elastic Stack, the components within it, and how they fit together.
So Elasticsearch is just one piece of this system. It started off as basically a scalable version of the Lucene open-source search framework, and it just added the ability to horizontally scale Lucene indices. We'll talk about shards in Elasticsearch, and each shard in Elasticsearch is just a single Lucene inverted index of documents, so every shard is an actual Lucene instance of its own. However, Elasticsearch has evolved to be much more than just Lucene spread out across a cluster. It can be used for much more than full-text search now: it can actually handle structured data and aggregate data very quickly. So it's not just for search; it can handle structured data of any type, and you'll see it's often used for things like aggregating logs. And what's really cool is that it's often a much faster solution than things like Hadoop, Spark, or Flink. They're actually building new things into Elasticsearch all the time, like graph visualization and machine learning, that make Elasticsearch a competitor for things like Hadoop, Spark, and Flink, only it can give you an answer in milliseconds instead of hours. So for the right sorts of use cases, Elasticsearch can be a very powerful tool, and not just for search. So let's zoom in and see what Elasticsearch is really about at a low level. It's really just about handling JSON requests. We're not talking about pretty UIs or graphical interfaces when we're talking about Elasticsearch itself. We're talking about a server that can process JSON requests and give you back JSON data, and it's up to you to actually do something useful with that. So, for example, we're using curl here to issue a REST request with a GET verb for a given index called tags, and we're just searching everything that's in it. And you can see the results come back in JSON format here, and it's up to you to parse all of this.
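Parsing that response is ordinary JSON handling in whatever language you use. The sketch below relies on the standard shape of an Elasticsearch 7 search response: matching documents sit under `hits.hits`, each with its original document in `_source`, and the total count sits under `hits.total.value`. The sample response body, including the `userId`, `title`, and `tag` field names and their values, is a hypothetical stand-in for the tags result shown in the lecture, not real output.

```python
import json

# Hypothetical response body, shaped like an Elasticsearch 7 search response.
# Field names and values under "_source" are illustrative only.
RAW_RESPONSE = """
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {
        "_index": "tags",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "userId": 42,
          "title": "Swimming to Cambodia",
          "tag": "Cambodia"
        }
      }
    ]
  }
}
"""


def extract_sources(raw: str) -> list:
    """Return the _source document of every hit in a search response."""
    response = json.loads(raw)
    return [hit["_source"] for hit in response["hits"]["hits"]]


response_total = json.loads(RAW_RESPONSE)["hits"]["total"]["value"]
docs = extract_sources(RAW_RESPONSE)
print(f"{response_total} hit(s)")
for doc in docs:
    print(doc["title"], "->", doc["tag"])
```

Everything beyond the response envelope (`hits`, `_source`, `_score`) depends on what you indexed, which is why the lecture leaves the parsing up to you.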
So, for example, we got one result here for the movie Swimming to Cambodia, with a given user ID and a tag of Cambodia. So if this is part of a tags index that we're searching, this is what a result might actually look like. Just to make it real, that's the sort of output you can expect from Elasticsearch itself. But there's more to it than just Elasticsearch. There's also Kibana, which sits on top of Elasticsearch, and that's what gives you a pretty web UI. So if you're not building your own application or web application on top of Elasticsearch, Kibana can be used just for searching and visualizing what's in your search index graphically, and it can do very complex aggregations of data. It can graph your data, you can create charts, and it's often used to do things like log analysis. So if you're familiar with things like Google Analytics, the combination of Elasticsearch and Kibana can be used as a way to roll your own Google Analytics at a very large scale. Let's zoom in and take a look at what it might look like. So here's an actual screenshot from Kibana looking at some real log data. You can see there are multiple dashboards built into Kibana that you can look at, and this lets you visualize things like where the hits on my website are coming from and how the error response codes are broken down. What's my distribution of URLs? Whatever you can dream up. So there are a lot of specialized dashboards for certain kinds of data, and it kind of brings home the point that Elasticsearch is not just for searching text anymore. You can actually use it for aggregating things like Apache access logs, which is what this view in Kibana does. But you can also use Kibana for pretty much anything else you want. Later on in this course, we'll use it to visualize the complete works of William Shakespeare, for example, and you'll see how it can also be used for text data as well.
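Breakdowns like the response-code panel are driven by Elasticsearch aggregation queries. As a rough sketch of the kind of request such a panel depends on (not necessarily the exact query Kibana generates), the Python below builds a search body with a `terms` aggregation; `size: 0` asks for aggregation buckets only, with no individual hits. The index name `access-logs` and the field name `response` are hypothetical. Sent to `/access-logs/_search`, a body like this returns one bucket per status code under `aggregations.response_codes.buckets`, each with a `key` and a `doc_count`.

```python
import json

# Hypothetical index and field names for web server access logs.
INDEX = "access-logs"
FIELD = "response"  # assumed to hold the HTTP status code of each request


def response_code_breakdown_body(field: str, top_n: int = 10) -> dict:
    """Search body that buckets documents by the distinct values of `field`."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "response_codes": {
                "terms": {"field": field, "size": top_n}  # top_n most common values
            }
        },
    }


body = response_code_breakdown_body(FIELD)
path = f"/{INDEX}/_search"
print("GET", path)
print(json.dumps(body, indent=2))
```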
It's a very flexible tool and a very powerful UI. We also have something called Logstash and the Beats framework, and these are ways of actually publishing data into Elasticsearch in real time, in a streaming format. So if you have, for example, a collection of web server logs coming in that you just want to feed into your search index over time automatically, Filebeat can just sit on your web servers, look for new log files, parse them, structure them in the way that Elasticsearch wants, and then feed them into your Elasticsearch cluster as they come in. Logstash does much the same thing. It can also be used to push data around between your servers and Elasticsearch, but often it's used as an intermediate step: you have a very lightweight Filebeat client that sits on your web servers, and Logstash accepts what it sends, collects it, and pools it for feeding into Elasticsearch over time. But it's not just made for log files, and it's not just made for Elasticsearch and web servers, either. These are all very general-purpose systems that allow you to tie different systems together and publish data to wherever it needs to go, which might be Elasticsearch or might be something else, but it's all still part of the Elastic Stack. They can also collect data from things like Amazon S3 or Kafka or pretty much anything else you can imagine, such as databases, and we'll look at all of those examples later in this course. Finally, another piece of the Elastic Stack is called X-Pack. This is actually a paid add-on offered by elastic.co, and it offers things like security, alerting, monitoring, and reporting. It also contains some of the more advanced features that are just starting to make it into Elasticsearch now, such as machine learning and graph exploration. So you can see that with X-Pack, Elasticsearch starts to become a real competitor for much more complex and heavyweight systems like Flink and Spark.
But that's another piece of the Elastic Stack when we talk about this larger ecosystem, and you can see here that there are free parts of X-Pack, like the monitoring framework that lets you quickly visualize what's going on with your cluster: what's my CPU utilization, my system load, how much memory do I have available, things like that. So when things start to go wrong with your cluster, this is a very useful tool to have for understanding the health of your cluster. So that's the Elastic Stack at a high level. Obviously, Elasticsearch can still be used for powering search on a website, like Wikipedia or something, but with these components it can be used for so much more. It's actually a larger framework for publishing data from any source you can imagine and visualizing it as well through things like Kibana, and it also has operational capabilities through X-Pack. So that is the Elastic Stack at a high level. Let's dive into Elasticsearch itself and learn more about how it works. 7. Term Frequency / Inverse Document Frequency (TF/IDF): Now, of course, indices aren't quite that simple. An index is actually what's called an inverted index, and this is basically the mechanism by which pretty much all search engines work. As an example, imagine I have a couple of documents in my index that contain text data. Let's say I have one document that contains "Space: the final frontier. These are the voyages," and maybe I have another document that says "He's bad, he's number one, he's the space cowboy with the laser gun!" And if you understand what both of those are references to, then you and I have a lot in common. Now, an inverted index wouldn't store those strings directly. Instead, it sort of flips them on their head. A search engine such as Elasticsearch actually splits each document up into its individual search terms. In this example, we'll just split it up by word, and we'll convert the words to lowercase just to normalize things.
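The splitting and lowercasing just described, together with the term-to-document mapping the lecture turns to next, can be sketched in a few lines of Python. This is a toy illustration of the idea, not how Lucene actually builds or stores its index: the tokenizer is a simple regular expression that keeps letters and apostrophes, and the document IDs 1 and 2 match the example (the second document quoted as "he's the space cowboy with the laser gun"). It also records each term's positions within a document, since a real inverted index stores those too.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "Space: the final frontier. These are the voyages",
    2: "He's bad, he's number one, he's the space cowboy with the laser gun!",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens (toy tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_inverted_index(docs: dict) -> dict:
    """Map each term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


inverted = build_inverted_index(DOCUMENTS)
for term in ("space", "the", "final"):
    print(term, "->", sorted(inverted[term]))
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is the whole point of flipping the data on its head.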
Then what it does is map each search term to the documents that the term occurs within. So in this example, the word "space" actually occurs in both documents, so the inverted index would indicate that the word "space" occurs in documents one and two. The word "the" also appears in both documents, so that will also map to documents one and two. And the word "final" only appears in the first document, so the inverted index would map the search term "final" to document one. Now, it's a little bit more complicated than that in practice: in reality, it stores not only which document a term is in, but also the term's position within that document. But at a high conceptual level, this is the basic idea. An inverted index is what you're actually getting with a search index: it maps the things you're searching for to the documents those things live within. And, of course, it's not even quite that simple. How do I actually deal with the concept of relevance? Take, for example, the word "the." The word "the" is going to be a very common word in every single document, so if I actually search for the word "the," how do I make sure that I only get back documents where "the" is a special word? Well, that's where TF-IDF comes in. That stands for term frequency times inverse document frequency. It's a very fancy-sounding term, but it's actually a very simple concept, so let's break it down. Term frequency is just how often a given search term appears within a given document. So if the word "space" occurs very frequently in a given document, it would have a high term frequency. The same applies if the word "the" appears frequently in the document: it would also have a high term frequency. Now, document frequency is just how often a term appears across all of the documents in your entire index.
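The two definitions above, and the division of one by the other that the lecture turns to next, can be sketched directly. This is the simple ratio described in this lecture, using raw counts with no logarithms or length normalization; it is not Elasticsearch's production scoring (its default similarity since version 5.0 is BM25, which refines the same idea). The three-document corpus is hypothetical, chosen so that "space" is rare across the index while "the" appears everywhere.

```python
import re

# Hypothetical mini-index: one article about space, two about other things.
CORPUS = {
    1: "Space is big and space is dark and the stars are far",
    2: "The cat sat on the mat",
    3: "The dog ate the bone",
}


def tokenize(text: str) -> list:
    """Lowercase and split into words (toy tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequency(term: str, doc_id: int) -> int:
    """How often `term` appears within one document."""
    return tokenize(CORPUS[doc_id]).count(term)


def document_frequency(term: str) -> int:
    """How many documents in the whole index contain `term`."""
    return sum(1 for text in CORPUS.values() if term in tokenize(text))


def relevance(term: str, doc_id: int) -> float:
    """TF divided by DF, i.e. TF times the inverse document frequency."""
    df = document_frequency(term)
    return term_frequency(term, doc_id) / df if df else 0.0


for term in ("space", "the"):
    print(term, term_frequency(term, 1), document_frequency(term),
          round(relevance(term, 1), 2))
```

In the space article, "space" scores 2.0 while "the" scores about 0.33, even though both occur in that document: the common word is discounted because it shows up across the whole index.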
So here's where things get interesting. The word "space" probably doesn't occur very often across the entire index, so it would have a low document frequency. However, the word "the" does appear in all documents pretty frequently, so it would have a very high document frequency. So if we divide term frequency by document frequency (which is mathematically the same as multiplying by the inverse document frequency), we get a measure of relevance. We see how special this term is to the document: it measures not only how often this term occurs within the document, but how that compares to how often the term occurs in documents across the entire index. So with that example, the word "space" in an article about space would rank very highly. However, the word "the" wouldn't necessarily rank very highly at all, since it's a common term found in every other document as well. And this is the basic idea of how search engines work. If you're searching for a given term, the engine will try to give you back results in order of their relevancy, and relevancy is loosely based, at least, on the concept of TF-IDF. It's not really that complicated. 8. Using Elasticsearch: So how do you actually use an index in Elasticsearch? Well, there are three ways we can talk about. One is the RESTful API. Now, if you're not familiar with the concept of REST queries, let me explain it at a very high level. It's just like how you request a web page from a web server with the web browser on your desktop. When you request a web page in your browser, like Chrome or whatever you use, your browser is sending an HTTP request to a web server somewhere, and every such request has a verb, like GET or PUT or POST, a URL, and optionally a body that specifies what it is you want to get back.
So, for example, if you're looking for a web page, you would send a request with a GET verb, and that GET would request the specific URL that you want to retrieve from that web server. Elasticsearch works exactly the same way, over the same HTTP protocol that web servers use, so this makes it very easy to talk to Elasticsearch from different systems. So, for example, if you were searching for something in Elasticsearch, you would issue a GET request through the REST API over HTTP, and the body of that GET request would contain the information about what it is you want to retrieve, in JSON format. We'll see examples of this later on. But the beautiful thing about this is that if you have a language or an API or a tool or an environment that can handle HTTP requests, just like talking to the web normally, then it can also handle Elasticsearch. You don't need anything beyond that. If you understand how to structure the JSON requests for Elasticsearch, then any language that can speak HTTP can talk to an Elasticsearch server, and most of this course is going to focus on doing it that way, just so you understand how things work at a lower level and what Elasticsearch is capable of under the hood. But you don't always have to do it the hard way. If you're accessing Elasticsearch from some application you're writing, like a web server or web application, there will often be a client API that provides a level of abstraction on top of those REST queries. So instead of trying to figure out how to construct the right JSON for the type of search you want, or for inserting the kind of data you want, you can use one of the many client APIs out there that make it easier for you. They just have specialized APIs for searching for things and putting things into the index without getting into the nitty-gritty of constructing the actual request itself.
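To make the "level of abstraction" point concrete, here is a hypothetical, deliberately tiny Python client. `TinyClient`, its `index_doc` and `match_phrase_search` methods, and the pluggable `send` function are all invented for illustration; they are not the API of the official `elasticsearch` Python package or of any other real client. Each method only assembles the same verb, path, and JSON body you would otherwise write by hand, reusing the Shakespeare search (assuming a `text_entry` field) and the Interstellar insert, the latter in the `_doc` form Elasticsearch 7 uses instead of a custom type. The default `send` is a stub that returns the request rather than contacting a server, so the sketch runs offline.

```python
import json
from typing import Callable, Optional

# (verb, path, body) -> result; a real client would perform HTTP here.
Sender = Callable[[str, str, Optional[dict]], dict]


def echo_sender(verb: str, path: str, body: Optional[dict]) -> dict:
    """Stub transport: return the request that *would* have been sent."""
    return {"verb": verb, "path": path, "body": body}


class TinyClient:
    """Hypothetical thin wrapper over Elasticsearch's REST API."""

    def __init__(self, send: Sender = echo_sender) -> None:
        self._send = send

    def index_doc(self, index: str, doc_id: str, doc: dict) -> dict:
        # Equivalent to: PUT /<index>/_doc/<doc_id> with the doc as JSON body
        return self._send("PUT", f"/{index}/_doc/{doc_id}", doc)

    def match_phrase_search(self, index: str, field: str, phrase: str) -> dict:
        # Equivalent to: GET /<index>/_search with a match_phrase query body
        body = {"query": {"match_phrase": {field: phrase}}}
        return self._send("GET", f"/{index}/_search", body)


client = TinyClient()
print(json.dumps(client.match_phrase_search(
    "shakespeare", "text_entry", "to be or not to be")))
print(json.dumps(client.index_doc(
    "movies", "109487",
    {"genre": ["IMAX", "Sci-Fi"], "title": "Interstellar", "year": 2014})))
```

Swapping `echo_sender` for a function that actually performs the HTTP call is all a real client adds at this level, which is why understanding the raw requests carries over to any client library.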
So whether you're using Python or Ruby or Perl or C++ or Java, there is an API out there that you can just use. Now, in this course, we're going to focus on using the RESTful APIs and not these higher-level clients. I don't want to single out one language as the only language we use in this course. If I were to go through this whole course using only the Java client, it would be useless to people coding in JavaScript or Python, for example. But all of the different clients in every language boil down to REST calls in the end. So if you understand the underlying HTTP requests that these clients generate, you can understand any of the client APIs, and you'll be able to move more easily from one language to another, too. So please don't get upset that I'm not going to teach you how to write Java or any other specific language to use Elasticsearch. The lower-level information I'm giving you will make it easy to use the Java client API or the API for any other language. Finally, there are even higher-level tools that can be used for analytics, and the one we'll look at in this course is called Kibana. It's part of the larger Elastic Stack, and it's a web-based graphical UI that allows you to interact with your indices and explore them without writing any code at all. So it's really more of a visual analysis tool that you can unleash upon pretty much anyone in your organization. So, in order from lower level to higher level: there are RESTful queries that you can issue from whatever language you want, there are client APIs that make things a little bit easier, or you can just use web-based UIs to get the information you need. So those are the basic concepts of how Elasticsearch is structured and how you interface with it. With that under our belt, we can talk more about how it works under the hood and how its architecture works. 9.
What's New in Elasticsearch 7: If you're already familiar with Elasticsearch and just looking to get up to speed on what's new in Elasticsearch 7, here's an overview of the main changes. Elasticsearch tends to roll out big new features even within minor releases, so this isn't necessarily everything that's new since Elasticsearch 6.0. But a lot of features introduced during the 6.x run have been declared production-ready with Elasticsearch 7. For a while now, Elasticsearch has been in a long process of deprecating the concept of types. It used to be that in addition to documents and indices, there was also a type that allowed you to associate different schemas with documents within the same index. Conceptually, they found this to be a bad idea, as it made people think types work the same as database tables, when in reality they behave differently. You'll find that some APIs that used to take a type name now use a generic type called _doc instead, and others just omit the type parameter entirely. Configuration files and plugins that used to require types to be specified no longer do. This is really the most pervasive change to Elasticsearch, and it's what required us to re-record this entire course when Elasticsearch 7 came out. The change I'm most excited about personally is the official release of SQL support in Elasticsearch. We've added a lecture and a hands-on activity for this later in the course so you can get familiar with it, but it really couldn't be much easier: you can query your Elasticsearch index using the same SQL syntax you probably already know. SQL seems to be the lingua franca that's tying together all sorts of big data technologies, and Elasticsearch is falling in line with that. There have been a lot of changes to the default configuration settings for Elasticsearch, especially as they relate to the number of default shards, which is now one instead of five.
And how replication works in a production setting. Though you really should be tuning thes