Elasticsearch 6 and Elastic Stack - In Depth and Hands On! | Frank Kane | Skillshare

Elasticsearch 6 and Elastic Stack - In Depth and Hands On!

Frank Kane, Founder of Sundog Education, ex-Amazon

64 Lessons (8h)
    • 1. Introduction, and Installing Elasticsearch

    • 2. Elasticsearch Overview

    • 3. Please Follow Me on Skillshare

    • 4. Intro to HTTP and RESTful API's

    • 5. Using Elasticsearch

    • 6. Elasticsearch Architecture

    • 7. Quiz: Elasticsearch Concepts and Architecture

    • 8. Quiz: Elasticsearch Concepts and Architecture

    • 9. Getting to Know the Movielens Data Set

    • 10. Create a Mapping for MovieLens

    • 11. Hacking CURL (Don't Skip This!)

    • 12. Import a Single Movie via JSON / REST

    • 13. Insert Many Movies at Once

    • 14. Updating Data in Elasticsearch

    • 15. Deleting Data in Elasticsearch

    • 16. [Exercise] Insert, Update, and Delete a Fictitious Movie

    • 17. Dealing with Concurrency

    • 18. Using Analyzers and Tokenizers

    • 19. Data Modeling with Elasticsearch

    • 20. Using Query-String Search

    • 21. Using JSON Search

    • 22. Full-Text vs. Phrase Search

    • 23. [Exercise] Search for New Star Wars Films Two Different Ways

    • 24. Pagination

    • 25. Sorting

    • 26. Using Filters

    • 27. [Exercise] Search for Science Fiction Movies Before 1960, Sorted by Title

    • 28. Fuzzy Queries

    • 29. Partial Matching

    • 30. N-Grams, and Search as you Type

    • 31. Importing Data from Scripts

    • 32. [Exercise] Import Movie Tags Into a New Index with a Python Script.

    • 33. Logstash Overview

    • 34. Installing Logstash

    • 35. Importing Apache Access Logs with Logstash

    • 36. Importing Data from MySQL using Logstash

    • 37. Importing Data from AWS S3 using Logstash

    • 38. Integrating Kafka with Elasticsearch

    • 39. Integrating Spark and Hadoop with Elasticsearch

    • 40. [Exercise] Import Movie Ratings from Spark to Elasticsearch

    • 41. Buckets and Metrics

    • 42. Histograms

    • 43. Aggregating Time Series Data

    • 44. [Exercise] When Did my Site Go Down?

    • 45. Nested Aggregations

    • 46. Installing Kibana

    • 47. Analyzing Shakespeare with Kibana

    • 48. [Exercise] Find the Shakespeare Plays with the Most Lines

    • 49. The ELK Stack and Elastic Stack

    • 50. Install, Configure, and Use Filebeat

    • 51. Analyzing Server Logs with Kibana

    • 52. [Exercise] Narrow Down the Source of 404 Errors

    • 53. How Many Shards Should I Use?

    • 54. Scaling with New Indices

    • 55. Choosing Your Hardware

    • 56. Heap Sizing

    • 57. Monitoring with X-Pack

    • 58. Practicing Failover

    • 59. Snapshots

    • 60. Rolling Restarts

    • 61. Using Amazon Elasticsearch Service

    • 62. Using Elastic Cloud

    • 63. I Made It! Now What?

    • 64. Let's Stay in Touch






About This Class

Elasticsearch is a powerful tool not only for powering search on big websites, but also for analyzing big data sets in a matter of milliseconds! It's an increasingly popular technology, and a valuable skill to have in today's job market. This comprehensive course covers it all, from installation to operations.

We'll cover setting up search indices on an Elasticsearch 6 cluster and querying that data in many different ways. Fuzzy searches, partial matches, search-as-you-type, pagination, sorting - you name it. And it's not just theory: every lesson has hands-on examples where you'll practice each skill using a virtual machine running Elasticsearch on your own PC.

Meet Your Teacher


Frank Kane

Founder of Sundog Education, ex-Amazon


Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.






1. Introduction, and Installing Elasticsearch: Hi, I'm Frank Kane from Sundog Education. I've used my decade of experience at amazon.com and imdb.com to teach 100,000 people around the world about big data and machine learning. Elasticsearch is a hot technology you need to know about in the field of big data. It's not just used for powering full-text search on big websites anymore; increasingly, it's being used as a real-time alternative to more complex systems like Hadoop and Spark. Elasticsearch can aggregate and graph structured data quickly and at massive scale. In this course, you'll gain hands-on experience with Elasticsearch all the way from installation to advanced usage. We'll create search indices and mappings, import data into Elasticsearch in several different ways, aggregate structured data, and use hosted Elasticsearch clusters from Amazon and Elastic Cloud. You'll also get your hands dirty with the entire Elastic Stack, including Elasticsearch, Logstash, X-Pack, Kibana, and the Beats framework. Together, these technologies form a complete system for collecting, aggregating, monitoring, and visualizing your big data. I designed this course for any technologist who wants to add Elasticsearch and the Elastic Stack to their tool chest for analyzing big data, and we all know these are highly valuable skills to have in today's job market. Let's dive right in. In the real world, you'll probably be using Elasticsearch on a cluster of Linux machines, so we'll be using Linux in this course, Ubuntu in particular. Now, if you don't have an Ubuntu system handy, that's totally OK. I'm going to walk you through setting up a virtual machine on your Windows or Mac PC that lets you run Ubuntu inside your existing operating system. It's actually really easy to do. Once we've got an Ubuntu machine up and running, we'll install Elasticsearch, and just for fun we'll create a search index of the complete works of William Shakespeare and mess around with it.
After that, we'll take a step back and talk about Elasticsearch and its architecture at a high level, so you have all the basics you need for later sections of this course. Roll up your sleeves and let's get to work. All right, let's do this. Let's go ahead and install Elasticsearch right on your own PC. Now, Elasticsearch is going to be running on an Ubuntu Linux system for this course, and if you don't already have an Ubuntu system sitting around, that's okay. What we're going to do is show you how to install VirtualBox on your Mac or Windows PC, and that will allow you to run Ubuntu right on your own desktop within a little virtual environment. Once we have Ubuntu installed inside VirtualBox, we'll install Elasticsearch on it, and after that we'll load the complete works of William Shakespeare into Elasticsearch and see if we can successfully search them. So that's a lot to do in this one lecture; let's dive right into it. Let's talk about system requirements really briefly. Pretty much any PC should be able to handle this; you don't need a ton of resources for Elasticsearch. If you do run into trouble, however, make sure that you have virtualization enabled in your BIOS settings on your PC, and specifically make sure that Hyper-V virtualization is off, if that is an option in your BIOS. You can just go through these steps and see if you run into trouble; these are basically troubleshooting steps. Also be aware that the antivirus program called Avast is known to conflict with VirtualBox, so if you're using Avast, you'll need to switch to a different one or turn it off while using this course. Now, if you head over to sundog-education.com/elasticsearch, you'll find step-by-step instructions for what we're about to do, as well as troubleshooting tips if you run into trouble, and you'll also find a link to the course slides there as well.
So be sure to head over there for reference materials and any troubleshooting steps you may need. With that, let's dive in and just get this done. Let's get you set up. So let's go ahead and download VirtualBox, which is what we're going to use to run your Ubuntu image on your desktop PC. Head over to virtualbox.org, and there should be a big friendly download button. Go ahead and select the operating system you're on; for me, that's Windows. And 118 megabytes later, that should come down, and it's just your standard Windows installer. Go ahead and click that, and you can accept the defaults. Nothing real special here, except any security warnings; and be aware that it will interrupt your network interfaces while it's installing. Let's go ahead and start it up now that it's done installing. And if you do run into any trouble with installing VirtualBox, head on over to sundog-education.com/elasticsearch, and there will be some troubleshooting tips for you there. So here we have it. The next thing we need to do is download an Ubuntu image so that we can actually install it in our virtual machine. So head on over to ubuntu.com, hover over to Downloads and Server, and get the latest ISO image for Ubuntu Server. Just hit the download button there, and down it comes; pretty big download, it's going to be about 800 or 900 megabytes or so. Now that our Ubuntu disk image is downloaded, I'm going to switch back to VirtualBox here, the Oracle VM VirtualBox Manager, click the New button, and give this thing a name. Let's call it, I don't know, ubuntu-elasticsearch, whatever you want. It's going to be a Linux system, and it's going to be an Ubuntu 64-bit system. Hit Next, and set the memory somewhere in the middle. I have a 16-gigabyte machine, so I'm going to go ahead and allocate half of that memory to this virtual machine for Ubuntu; if you have less than that, just pick around halfway.
But I wouldn't go below two gigabytes if I were you, so I'm sticking with eight gigabytes here. Go ahead and create a virtual hard disk, and accept the default for the format; dynamically allocated is fine. And let's give this 20 gigabytes of space; we do need a little bit of extra there to work with. And if you want to make sure that it's being stored on a disk that has space for it, you can click on that icon there and make sure it's stored on a drive that has sufficient free disk space. Hit Create, and now hit Start, and navigate to where you downloaded that ISO file for Ubuntu. For me, that's in my Downloads folder. Hit Start, and that should kick off the installer for the Ubuntu operating system itself. So now that I'm in this installer, I can use the Enter key to accept the defaults or the arrow keys to change to a different language if I want to. I'll hit Enter to install Ubuntu Server, and that will kick off the installer for Ubuntu itself. I'm going to go ahead and accept the defaults here: English, United States. That is, in fact, where I am; go ahead and change that if you need to. I'm not going to detect the keyboard layout either, and I'll stick with the default English layout. Next, we need to give our host a name; the default is fine, it doesn't really matter. And we need to type in the name for your user. For me, that's Frank Kane; for you, it's probably something else. Hit Tab to get to the Continue button and then hit Enter. Then you need a username; I'm going to use fkane for myself, but again, use whatever you want for your account. Hit Tab when you're done and hit Enter to continue, and enter a password that you'll remember. Again, Tab to the Continue button, and re-enter the password to make sure you didn't fat-finger it. We don't need to encrypt things, and it does have my time zone correct, so I'll accept that. Go ahead and accept the guided partitioning, accept all the defaults here, and we will Tab to say Yes to write those changes to disk.
Remember, we're in a sandbox here, so we're not really messing with our primary disk for Windows. Hit Tab, then Continue; Tab, then Yes. I'm not behind a proxy, so I'm going to hit Tab and say Continue. I'll go ahead and select no automatic updates; it's not that important in this case, and we're going to install our own software, so I'm going to hit Tab and select Continue here. Just stick with the standard system software. Go ahead and let it install the GRUB boot loader; hit Enter here. Don't worry, it's not really messing with the real master boot record on your disk; it has its own little sandbox environment. And Ubuntu has finished installing. Let's hit Continue and go ahead and let it start up, booting up for the first time. Here we go; just let it do its thing. And we have a login prompt, so let's go ahead and type in the account and password that we set up during installation, and we actually have an Ubuntu system up and running within our desktop. How cool is that? I think that's kind of awesome. Now, if you run into any trouble, you can go ahead and refer to our website, sundog-education.com/elasticsearch, for troubleshooting tips. All the latest tips and tricks will be there if you have any difficulties, but hopefully you got to this point without a problem. So the next thing we need to do is open up some network ports so we can communicate with our server from our desktop environment. To do that, go back to the VirtualBox Manager, select our image here, ubuntu-elasticsearch, and hit Settings. Select Network, then open up Advanced, then Port Forwarding, and hit the Add button. Here we're going to create a port for Elasticsearch itself, on 127.0.0.1 on port 9200 for both the host and the guest ports, just like that. Hit the Add button again, and we'll also add a port for Kibana, which we'll talk about later.
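If you prefer the command line, the same three port-forwarding rules can be added with VirtualBox's VBoxManage tool while the VM is powered off. This is a sketch assuming the VM is named ubuntu-elasticsearch as in the lecture; adjust the name to whatever you chose.

```shell
# Add NAT port-forwarding rules to the VM.
# Rule format: "name,protocol,host IP,host port,guest IP,guest port"
# (an empty guest IP means the guest's default NAT address).
VBoxManage modifyvm "ubuntu-elasticsearch" --natpf1 "elasticsearch,tcp,127.0.0.1,9200,,9200"
VBoxManage modifyvm "ubuntu-elasticsearch" --natpf1 "kibana,tcp,127.0.0.1,5601,,5601"
VBoxManage modifyvm "ubuntu-elasticsearch" --natpf1 "ssh,tcp,127.0.0.1,22,,22"
```

Either way, the GUI dialog and these commands produce the same forwarding table.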
It's the web UI for Elasticsearch, also on 127.0.0.1; the port for this is 5601, just like that. And finally, we'll open up a port for SSH, because we need to connect to this thing somehow. That's also going to be 127.0.0.1, and this time port 22. So everything should look like this at this point. Double-check, and if it looks good, hit OK, and OK again, and we're done with that for now. So, Elasticsearch 6 requires Java version 8, so the first thing we need to do is install the Java environment. Just type in the following: sudo apt-get install openjdk-8-jre-headless -y. You may have to re-authenticate; give that a minute or two to come down and install. And once that's done, we'll also type in sudo apt-get install openjdk-8-jdk-headless -y, and let that come down as well. All right, the Java Development Kit has been installed. Now we can install Elasticsearch itself. First, we need to update our repository list so that Ubuntu knows where to find it. Now, follow along carefully; there's a lot to type here, and all it takes is one little typo to mess things up. So be careful as you follow along. Start by typing wget -qO (that's a lowercase q and a capital letter O, not the number zero), then a space, a dash, and another space (these spaces are important), then https://artifacts.elastic.co/GPG-KEY-elasticsearch. That's just .co, not .com, with GPG-KEY in uppercase and elasticsearch in lowercase. Then a pipe, then sudo apt-key add -. You should see an OK prompt. Next up: sudo apt-get install apt-transport-https. That's sort of a safety thing to do; it might not actually do anything. Okay, that's good.
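Before we continue, here are the commands typed so far, gathered in one copy-pasteable place. A sketch assuming Ubuntu with apt and the Elasticsearch 6-era repositories (note that apt-key is deprecated on newer distributions):

```shell
# Install Java 8 (runtime and JDK), required by Elasticsearch 6.
sudo apt-get install openjdk-8-jre-headless -y
sudo apt-get install openjdk-8-jdk-headless -y

# Add Elastic's GPG signing key so apt will trust the packages.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

# Allow apt to fetch packages over HTTPS.
sudo apt-get install apt-transport-https
```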
Now we will add the repository definition with echo: echo, then a double quote, then deb https://artifacts.elastic.co/packages/6.x/apt stable main, then a closing double quote (it's 6.x because we're installing Elasticsearch 6 and the Elastic Stack 6). And then we're not done yet: type another space, a pipe, and sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list. All right, so far, so good. Now we can actually install Elasticsearch itself. To do that, we can say sudo apt-get update to update our repositories, then a double ampersand (&&), then sudo apt-get install elasticsearch. And this is where the magic happens; off it goes. Now we just need to update the default configuration for Elasticsearch a little bit. To do that, type in sudo vi /etc/elasticsearch/elasticsearch.yml, and scroll down to where it says network.host. Hit the i key; that will enter insert mode in this text editor. Use the cursor to get rid of the comment character with Backspace, and scroll over to the end of 192.168.0.1. What we're going to do is change that to 0.0.0.0, just like that, and that will open up Elasticsearch to other hosts, so we can actually access it from our Windows or Mac desktop. With that done, hit Escape to get out of insert mode, and then type in :wq; that writes the file and quits the editor. All right, so now we have Elasticsearch set up; we just need to run it, and set it up so that it runs automatically whenever we boot up our virtual image. To do that, we can say sudo /bin/systemctl (for system control) daemon-reload (make sure you spell that right). The next step is to enable Elasticsearch as a service: sudo /bin/systemctl enable elasticsearch.service. And finally, we can start it with sudo /bin/systemctl start elasticsearch.service. It'll take a few seconds for Elasticsearch to spin up, but then it's basically up and running. Now let's see if it's actually working. We can test it by typing curl 127.0.0.1:9200. That's the IP address of your local host, and 9200 is the port that Elasticsearch runs on, and we should see something like this. So hey, we actually got a response back from Elasticsearch, and at the end you should see a little thing that says tagline: "You Know, for Search". This indicates that Elasticsearch is properly installed and up and running on your virtual Ubuntu machine. So congratulations: you've set up an Ubuntu server on a virtual machine and set up Elasticsearch properly on it, and it's sitting there waiting for you to use it. Just to get a little bit of a payoff from all this effort, let's do something fun: let's install the complete works of William Shakespeare and index them in Elasticsearch. To do that, type the following: wget http://media.sundog-soft.com/es6/shakes-mapping.json (make sure you don't forget the dash there). This just retrieves a little file for the course that tells Elasticsearch how to store the Shakespeare data: what the various field types are, and how to store and index them on the back end. Now, to submit that to Elasticsearch, we can use the curl command, like so (curl is just a command that simulates an HTTP request, by the way). We have to say -H to send the appropriate header: -H, then a double quote, then Content-Type (pay attention to capitalization), a colon, and application/json. Then we can say -XPUT to put this mapping file into 127.0.0.1:9200/shakespeare, indicating that we're creating a new index called shakespeare, then --data-binary, a space, and @shakes-mapping.json. So this is basically submitting the contents of that JSON file into Elasticsearch, and we got back an acknowledgment. Let's go ahead and hit Enter a couple of times to get a clean prompt. Now we can retrieve the Shakespeare data itself with the following command: wget http://media.sundog-soft.com/es6/shakespeare_6.0.json. This is just a Shakespeare data set that comes from the Elastic website itself as a demo. You can take a quick peek at it with less shakespeare_6.0.json (I'm using the Tab key, by the way, to auto-complete that file name), and you can see there's a bunch of Shakespearean data in there, one entry for each line in every play, so lots of data. Kind of funny how it all came down so quickly; it's a good reminder of how cavalierly we throw around large amounts of data these days, huh? Next, we'll actually submit the complete works of William Shakespeare into our Elasticsearch index. To do that, we can just type in the following: curl -H, single quote, Content-Type: application/json, single quote, then -XPOST, single quote, localhost:9200/shakespeare/doc/_bulk (for the bulk API of Elasticsearch), then ?pretty (to make sure we get nicely formatted results back), single quote, and then --data-binary, space, @shakespeare_6.0.json. That will take a few minutes to insert; we are, after all, taking the complete works of William Shakespeare and indexing them into our search engine, into Elasticsearch.
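Pulling together everything from adding the repository through the bulk import, the whole sequence looks like this. It's a sketch assuming the repository, URLs, and file names used in this lecture, and it needs a running Ubuntu VM with the Java steps above already done:

```shell
# Register the Elastic 6.x apt repository and install Elasticsearch.
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" \
  | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get update && sudo apt-get install elasticsearch

# In /etc/elasticsearch/elasticsearch.yml, set:  network.host: 0.0.0.0
# so the server is reachable from outside the VM.

# Run Elasticsearch now and on every boot.
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
sudo /bin/systemctl start elasticsearch.service

# Sanity check: should return JSON ending with the tagline "You Know, for Search".
curl 127.0.0.1:9200

# Download the mapping and create the "shakespeare" index with it.
wget http://media.sundog-soft.com/es6/shakes-mapping.json
curl -H 'Content-Type: application/json' -XPUT 127.0.0.1:9200/shakespeare \
  --data-binary @shakes-mapping.json

# Download the data set and bulk-index it.
wget http://media.sundog-soft.com/es6/shakespeare_6.0.json
curl -H 'Content-Type: application/json' \
  -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' \
  --data-binary @shakespeare_6.0.json
```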
So let's just wait for that to finish, and we'll come back when it's done. All right, that took a little bit of time, but hey, it is the complete works of William Shakespeare, after all. Let's get a little bit of a payoff and actually query all that data now. To do that, we can just type in the following: curl -H, quote, Content-Type: application/json, end quote, then -XGET (because now we're getting information back from Elasticsearch), single quote, 127.0.0.1:9200/shakespeare (for the index name), /_search (because we're doing a search query), ?pretty (to get nicely formatted results), end quote, and then -d, single quote, meaning that we're going to send the following data as part of the request. This will be in JSON format, so we'll start with a curly bracket, and then we will say quote query quote, colon, open curly bracket, quote match_phrase quote, colon, open curly bracket, quote text_entry quote, colon, quote, to be or not to be. So let's find out what play that famous line actually came from. And we need to close off those curly brackets, one, two, three, and then a single quote to close off this command. And let's see what happens. It worked! So it turns out "to be or not to be, that is the question" is a line from the play Hamlet, apparently speech number 19, line number 3.1.64, if you care. But hey, how cool is that? We've installed Ubuntu, we've installed Elasticsearch on Ubuntu, we've indexed the complete works of William Shakespeare, and we've queried that index, all in one little lesson. So I hope you feel a sense of accomplishment after all of that. It all gets a lot easier from here. Now that we have everything set up and installed, all we have to do is play with it, and that's what we're going to do for the next several hours in this course. So congratulations for getting this far, and let's move on.
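The spelled-out query above, written as a single command you can paste (assuming the shakespeare index created in the previous steps and a server on 127.0.0.1:9200):

```shell
# Phrase-search the indexed plays for a famous line; ?pretty formats the JSON reply.
curl -H 'Content-Type: application/json' \
  -XGET '127.0.0.1:9200/shakespeare/_search?pretty' -d '
{
  "query": {
    "match_phrase": {
      "text_entry": "to be or not to be"
    }
  }
}'
```

The hits that come back include the play name, speaker, and line number for each matching line.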
2. Elasticsearch Overview: Let's start off with a sort of 30,000-foot view of the Elastic Stack, the components within it, and how they fit together. So Elasticsearch is just one piece of this system. It started off as basically a scalable version of the Lucene open-source search framework; it just added the ability to horizontally scale Lucene. We'll talk about shards in Elasticsearch: each shard in Elasticsearch is just a single Lucene inverted index of documents, so every shard is an actual Lucene instance of its own. However, Elasticsearch has evolved to be much more than just Lucene spread out across a cluster. It can be used for much more than full-text search now, and it can handle structured data and aggregate data very quickly. So it's not just for searching; it can handle structured data of any type, and you'll see it's often used for things like aggregating logs. And what's really cool is that it's often a much faster solution than things like Hadoop or Spark or Flink. They're actually building new things into Elasticsearch all the time, things like graph visualization and machine learning, that make Elasticsearch a competitor for Hadoop and Spark and Flink, only it can give you an answer in milliseconds instead of in hours. So for the right sorts of use cases, Elasticsearch can be a very powerful tool, and not just for search. So let's zoom in and see what Elasticsearch is really about at a low level. It's really just about handling JSON requests. We're not talking about pretty UIs or graphical interfaces when we're talking about Elasticsearch itself; we're talking about a server that can process JSON requests and give you back JSON data, and it's up to you to actually do something useful with that.
So, for example, we're using curl here to actually issue a REST request with a GET verb for a given index called tags, and we're just searching for everything that's in it. And you can see the results come back in JSON format here, and it's up to you to parse all this. So, for example, we did get one result here: the movie Swimming to Cambodia, with a given user ID and a tag of Cambodia. So if this is part of a tags index that we're searching, this is what a result might actually look like. Just to make it real, that's the sort of output you can expect from Elasticsearch itself. But there's more to it than just Elasticsearch. There's also Kibana, which sits on top of Elasticsearch, and that's what gives you a pretty web UI. So if you're not building your own application or web application on top of Elasticsearch, Kibana can be used just for searching and visualizing what's in your search index graphically. It can do very complex aggregations of data, it can graph your data, you can create charts, and it's often used to do things like log analysis. So if you're familiar with things like Google Analytics, the combination of Elasticsearch and Kibana can be used as sort of a way to roll your own Google Analytics at a very large scale. Let's zoom in and take a look at what it might look like. So here's an actual screenshot from Kibana looking at some real log data. You can see there are multiple dashboards you can look at that are built into Kibana, and this lets you visualize things like: where are the hits on my website coming from, what are the error response codes and how do they break down, what's my distribution of URLs, whatever you can dream up. So there are a lot of specialized dashboards for certain kinds of data, and it kind of brings home the point that Elasticsearch is not just for searching text anymore. You can actually use it for aggregating things like Apache access logs, which is what this view in Kibana does.
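As a concrete illustration of the request just described (the tags index name and local host are assumptions matching the example on the slide):

```shell
# Ask Elasticsearch for everything in the "tags" index; results come back as JSON.
# With no query body, _search defaults to matching all documents.
curl -XGET '127.0.0.1:9200/tags/_search?pretty'
```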
But you can also use Kibana for pretty much anything else you want to. Later on, this course will use it to visualize the complete works of William Shakespeare, for example, and you can see how it can be used for text data as well. It's a very flexible tool and a very powerful UI. We also have something called Logstash and the Beats framework, and these are ways of publishing data into Elasticsearch in real time, in a streaming format. So if you have, for example, a collection of web server logs coming in that you just want to feed into your search index over time, automatically, Filebeat can just sit on your web servers and look for new log files, parse them out, structure them in the way that Elasticsearch wants, and then feed them into your Elasticsearch cluster as they come in. Logstash does much the same thing. It can also be used to push data around between your servers and Elasticsearch, but often it's used as sort of an intermediate step: you have a very lightweight Filebeat client that would sit on your web servers, and Logstash would accept those logs, collect them, and pool them up for feeding into Elasticsearch over time. But it's not just made for log files, and it's not just made for Elasticsearch and web servers, either. These are all very general-purpose systems that allow you to tie different systems together and publish data to wherever it needs to go, which might be Elasticsearch or might be something else, but it's all part of the Elastic Stack. It can also collect data from things like Amazon S3 or Kafka or pretty much anything else you can imagine, databases too, and we'll look at all of those examples later in this course. Finally, another piece of the Elastic Stack is called X-Pack. This is actually a paid add-on offered by elastic.co, and it offers things like security and alerting and monitoring and reporting, features like that.
It also contains some of the more advanced features that are just starting to make it into Elasticsearch now, such as machine learning and graph exploration. So you can see that with X-Pack, Elasticsearch starts to become a real competitor for much more complex and heavyweight systems like Flink and Spark. But that's another piece of the Elastic Stack when we talk about this larger ecosystem, and you can see here that there are free parts of X-Pack, like the monitoring framework, that let you quickly visualize what's going on with your cluster: what's my CPU utilization, system load, how much memory do I have available, things like that. So when things start to go wrong with your cluster, this is a very useful tool to have for understanding the health of your cluster. So that's it at a high level: the Elastic Stack. Obviously, Elasticsearch can still be used for powering search on a website, you know, like Wikipedia or something, but with these components, it can be used for so much more. It's actually a larger framework for publishing data from any source you can imagine and visualizing it as well, through things like Kibana. And it also has operational capabilities through X-Pack. So that is the Elastic Stack at a high level. Let's dive in more to Elasticsearch itself and learn more about how it works. 3. Please Follow Me on Skillshare: The world of big data and machine learning is immense and constantly evolving. If you haven't already, be sure to hit the follow button next to my name on the main page of this course. That way you'll be sure to get announcements of my future courses and news as this industry continues to change. 4. Intro to HTTP and RESTful APIs: So before we can talk about Elasticsearch, we need to talk about REST and RESTful APIs. The reason is that Elasticsearch is built on top of a RESTful interface, and that means that to communicate with Elasticsearch,
That means you need to communicate with it through HTTP requests that adhere to a REST interface. So let's talk about what that means, starting with HTTP requests at a high level. Whenever you request a web page from your browser, your web browser is sending an HTTP request to a web server somewhere, requesting the contents of the web page you want to look at. Elasticsearch works the same way: instead of talking to a web server, you're talking to an Elasticsearch server, but it's the same exact protocol. Now, an HTTP request contains a bunch of different stuff, more than you might think. One is the method, and that's basically the verb of the request, what you're asking the server to do. In the case of actually getting a web page back from a web server, you'd be sending a GET request, saying "I want to get information back from the server; I'm not going to change anything or add any information on the server." You might also have a POST verb, which means you want to create new data on the server, or PUT, which means to insert or replace data at a specific location on the server. Or you can even send a DELETE verb, which means to remove information from the server. Normally you won't be doing that from a web browser, but from an Elasticsearch client it's a totally valid thing to do. The request also includes a protocol, meaning specifically what version of HTTP you are sending this request in; that might be HTTP/1.1, for example. You will be sending that request to a specific host, of course; if you're requesting a web page from our website, that might be sundog-education.com. And the URL is basically what resource you are requesting from that server, what you want the server to do. In the case of a web server, that might be the path to the web page that you want on that host. There's also a request body you can send along.
You don't normally see that with a web page request, but you can send extra data along to the server, in whatever structured format you want, within the body of the request as well. And finally, there are headers associated with each request that contain metadata about the request itself: for example, information about the client itself (that would be in the User-Agent header for a web browser), or what format the body is in (that would be in the Content-Type header), stuff like that. So let's look at a concrete example, getting back to the example of a browser wanting to display a website; this is what an HTTP request for that might contain. In that example we're sending a GET verb to our web server, and we're requesting the resource /index.html from the server, meaning we want the home page. We say that we're sending this in the HTTP/1.1 protocol, and we're sending it to a specific host: our website, sundog-education.com. In this example there is no body being sent across, because all the information we need to fulfill this request has already been specified, and there will be a whole slew of headers sent along as well. Those contain information about the browser itself, what types of information and languages it can accept back from the server, and information about caching and cookies that might be associated with this site, things like that. So there's a bunch of information about you being sent around the Internet whenever you request a web page, but fortunately, with Elasticsearch, our use of headers is pretty minimal. With that, let's talk about RESTful APIs, now that we understand HTTP requests. The really pragmatic, practical definition of a RESTful API is simply that you're using HTTP requests to communicate with a web service of some sort. So because we're communicating with Elasticsearch using HTTP requests and responses, that means we're basically using a RESTful API. Now there's more to it than that.
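The request anatomy just described (a verb, a resource, a protocol version, a host, headers, and an optional body) can be sketched in a few lines of Python. This is only a toy illustration that builds the raw request text; nothing is actually sent over the network.

```python
# Toy illustration of the anatomy of an HTTP request: verb, resource path,
# protocol version, Host header, extra headers, and an optional body.
# This only builds the raw request text; it doesn't send anything.

def build_http_request(method, path, host, headers=None, body=""):
    """Assemble the raw text of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

request = build_http_request("GET", "/index.html", "sundog-education.com",
                             headers={"Accept": "text/html"})
print(request)
```

Printing `request` shows exactly the kind of GET described above: the verb and resource on the first line, then the Host and other headers, then an empty body.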
We'll get to that, but at a very simple level, that's all it means. It sounds fancy, but that's really it. So, for example, if I want to get information back from my Elasticsearch cluster, like search results (I'm actually conducting a search), I would send a GET verb along with that request, saying I want to get this information from Elasticsearch. If I'm going to insert information into it, I would send a PUT request instead, and the information that I'm inserting would be within the request body. And if I want to actually remove information from my Elasticsearch index, I would send a DELETE request to get rid of it. But like I said, there's more to REST than that, so let's get into the more computer-science-y aspect of it. REST stands for Representational State Transfer, and it has six guiding constraints. Well, to be honest, these aren't all really constraints; some of them are a little bit fuzzy, and we'll talk about that. Obviously, it needs to be a client-server architecture we're dealing with; the concept of sending requests and responses from clients to servers doesn't really make sense unless we're talking about a client-server architecture, and that is what Elasticsearch offers. We have an Elasticsearch server, or maybe even a whole cluster of servers, and several clients that are interacting with that server. It must be stateless: every request and response must be self-contained. You can't assume that there's any memory on the client or the server of the sequence of events that have happened. So you have to make sure that all the information you need to fulfill a request is contained within the request itself, and you're not keeping track of state between different requests. Then there's cacheability. This is one of the fuzzier ones: it doesn't mean that your responses need to be cached on the client, it just means that the system allows for that.
So maybe your responses include information about whether or not that information can be cached. It's not really a requirement, but it's on this list of REST constraints. A layered system: again, not a requirement, but it just means that when you talk to, for example, sundog-education.com, that doesn't mean you're talking to a specific individual server. That request might get routed behind the scenes to one of an entire fleet of servers, so you can't assume that your request is going to a specific physical host. And this is why statelessness is important, because one host might not know what's going on on another; they might not be talking to each other at all. Another sort of fuzzy constraint is code on demand, and this just means that you have the capability of sending code across as a payload in your responses. For example, a server might send back JavaScript code as part of its response body that could then inform the client of how to actually process that data. We're not actually going to be doing that with Elasticsearch, obviously, but REST says you can do that if you want to. And finally, it demands a uniform interface. What that means is a pretty long topic, but at a fundamental level, it just means that the data you're sending along is of some structured, predictable nature, and you can deal with changes to it in a structured way. So at a high level, that's all it is. With that out of the way, why are we talking about REST at all here?
Well, the reason is that we're going to do this whole course just talking about the HTTP requests and responses themselves. By dealing with that very low level of how the RESTful API of Elasticsearch itself works, we can avoid getting mired in the details of how any specific language or system might interact with Elasticsearch. Pretty much any language out there (Java, JavaScript, Python, whatever you want to use) is going to have some way of sending HTTP requests. So it really doesn't matter what language you're using. What matters more in understanding how to use Elasticsearch is how to construct these requests and how to interpret the responses coming back from it. The mechanics of how you send the request and get the response back are trivial, right? Any language can do that; if you're a Java developer, you can go look up how. So we're not going to get mired in the details of how to write a Java client for Elasticsearch. Instead, what we're going to teach you in this course is how to construct HTTP requests and parse the responses you get back from Elasticsearch in a meaningful way, and by doing that, you'll be able to transfer this knowledge to any language and any system you want very easily. Some languages may have a dedicated client library for Elasticsearch that provides sort of a higher-level wrapper over the actual HTTP requests and responses, but it will generally be a pretty thin wrapper, so you still need to understand what's going on under the hood to use Elasticsearch successfully. A lot of people get confused on that point in this course, but there's a very good reason why we're just focusing on the actual HTTP requests and responses and not the details of how to do it from one specific language. All of Elasticsearch's documentation is done in the same style, and the books you can find about Elasticsearch take the same approach. There's a good reason for that.
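To make that language-agnostic point concrete, here is a sketch using nothing but Python's standard library to express the same kind of search request the course issues with curl. The index name (shakespeare), field name (text_entry), and localhost:9200 address follow the examples used in this course; this code only constructs the request, and nothing is sent over the network (sending it would require a running Elasticsearch server).

```python
import json
import urllib.request

# The JSON query body, built as a plain Python dict and serialized.
body = json.dumps(
    {"query": {"match_phrase": {"text_entry": "to be or not to be"}}}
)

# Construct (but do not send) the HTTP request. Any language with an
# HTTP library can express exactly this: a verb, a URL, a header, a body.
req = urllib.request.Request(
    "http://127.0.0.1:9200/shakespeare/_search?pretty",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="GET",
)

print(req.get_method())  # the verb
print(req.full_url)      # host, index, and _search endpoint
# urllib.request.urlopen(req) would actually send it to a live server.
```

The point is that once you know how to shape the URL and the JSON body, the surrounding plumbing is interchangeable across languages.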
So the way we're going to interact with Elasticsearch in this course is just using the curl command on a command line. Again, instead of using any specific programming language or client library, we're just going to use curl, which is a Linux command for sending HTTP requests right from the command line. We're just going to bash out curl commands to send requests on the fly to our server and see what responses come back. The structure of a curl command looks like this: basically, it's curl -H, followed by any headers you need to send. For Elasticsearch, that will always be a Content-Type of application/json, meaning that whatever's in the body is going to be interpreted as JSON. It will always be that, and in fact we will show you a little hack for making that header specified automatically for you in curl, to save you some typing. That will be followed by the URL, which contains both the host that you're sending the request to (in this course that will usually be the local host, 127.0.0.1) and any information the server will need to actually fulfill the request: what index am I talking to, what data type, what sort of command am I asking it to perform? And finally, we will pass -d and then the actual message body within quotes. That will be JSON-formatted data with the additional information the server needs to figure out what to give back to you or what to insert into Elasticsearch. Let's look at some concrete examples to make that more real. In the first one, at the top here, we're basically querying the shakespeare index for the phrase "to be or not to be." So let's take a closer look at that curl command and what's in it. We're saying curl -H 'Content-Type: application/json'; that's sending an HTTP header that says that the data in the body is going to be in JSON format.
-XGET means that we're using the GET method, or the GET verb, depending on your terminology, meaning that we just want to retrieve information back from Elasticsearch; we're not asking it to change anything. And the URL, as you can see, includes the host we're talking to, in this case 127.0.0.1, which is the loopback address for your local host. Elasticsearch runs on port 9200 by default. That's followed by the index name, which is shakespeare, and then by _search, meaning that we want to process a search query as part of this request. The ?pretty is a query-string parameter that means we want to get the results back in a nicely formatted, human-readable format, because we're going to be looking at them on the command line. And finally, we have the request body itself, specified after a -d and enclosed in single quotes. If you've never seen JSON before, this is what it looks like: it's just a structured data format where each level is contained within curly brackets, so it's always contained by curly brackets at the top level. Then we're saying we have a query level, and within those brackets we have a match_phrase command that matches the text entry "to be or not to be." So that is how you would construct a real search query in Elasticsearch using nothing but an HTTP request. Another example: here we're going to be inserting data. In this one, we're using a PUT verb, again to 127.0.0.1 on port 9200. This time we're talking to an index called movies and a data type called movie, and we're using a unique identifier for this new entry, 109487. Under movie ID 109487, we're including the following information in the message body: the genre, which is actually a list of genres, and in JSON that will be a comma-delimited list of values enclosed in square brackets.
So this particular movie is in both the IMAX and Sci-Fi categories, its title is Interstellar, and it came out in the year 2014. So that's what some real HTTP requests look like when you're dealing with Elasticsearch. Now that you know what to expect and how we're actually going to use Elasticsearch and communicate with it, we can talk more about how Elasticsearch works and what it's all about. We'll do that next.

5. Using Elasticsearch: So before we start playing with our shiny new Elasticsearch server, let's go over some basics of Elasticsearch first, so we understand the concepts of how it works, what it's all about, and how it's architected. When we're done with that, we'll have a quick little quiz to reinforce what you learned, and after that we'll start messing around with it. So there are three main logical concepts behind Elasticsearch. The first is the document. If you're used to thinking of things in terms of databases, a document is a lot like a row in a database. It represents a given entity, something that you're searching for, and remember, in Elasticsearch it's not just about text: any structured data can work. Now, Elasticsearch works on top of JSON-formatted data. If you're not familiar with JSON, it's basically just a way of encoding structured data that may contain strings or numbers or dates or what have you, in a way that you can actually transmit across the web cleanly, and you'll see a ton of examples of it throughout the course, so it'll make more sense later on. Now, every document can have a unique ID, and you can either explicitly assign a unique ID to it yourself or allow Elasticsearch to assign it for you. It also has a given data type that describes what sort of thing this document is. So, for example, you might have documents that represent encyclopedia articles, or documents that represent log entries from your web server, and that's where the concept of types comes in.
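Since every document is just JSON, here is a minimal Python sketch of building and serializing one, using the Interstellar movie from the curl example above. The exact field names (genre, title, year) are assumptions based on that example, not a definitive mapping.

```python
import json

# The movie document from the earlier PUT example, built as a Python dict.
# Field names here are assumed from the example in the lecture.
movie = {
    "genre": ["IMAX", "Sci-Fi"],  # a JSON array: square brackets, comma-delimited
    "title": "Interstellar",
    "year": 2014,
}

# Serialize it to the JSON text that would go in a request body.
body = json.dumps(movie)
print(body)
```

The serialized `body` is exactly the kind of text that gets passed to curl's -d flag when inserting a document.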
So you can have many documents that belong to a given type, and a type is basically a schema, or mapping, shared by a bunch of documents. For example, you might have a type that defines what an Apache access log entry looks like, and that might define a mapping that says an Apache access log contains things like a request URL, a status code, a request time, a referring URL, and things like that. Or you might have a type that represents an encyclopedia article, for example, containing things like the text of the article itself, the author of the article, the date the article was written, and whatever else, like the title of the article. So you can think of a type as a schema, and again, taking it back to a database analogy, it's a lot like a table, where you define the individual columns that are in a given row (or a document, in our terminology). Finally, there's the concept of an index, which is a collection of types, and it is basically an entity that you can search across. So if you need to search across multiple different types, you'd want to make sure that all those types are contained within the same search index. An index is sort of the highest-level entity that you can query against in Elasticsearch, and it can contain a collection of types, which in turn contain collections of documents. So again, bringing this back to the analogy of a database, you can think of an index as a database, a type as a table, and a document as a row; those are the rough analogies from the database world to the Elasticsearch world. Now, of course, it's not quite that simple. What is an index, really? An index is actually what's called an inverted index, and this is basically the mechanism by which pretty much all search engines work. The idea is this: suppose I have a couple of documents, and let's assume they just contain text data. Let's say we have one document that contains "Space, the final frontier..."
"...these are the voyages," and maybe another document that says, "He's bad, he's number one, he's a space cowboy with a laser gun." (If you understand what both of those are references to, then you and I have a lot in common.) Now, an inverted index wouldn't store those strings directly; instead, it flips things on their head. What a search engine actually does is split each document up into its individual search terms; in this example we'll just split on each word, and we'll lowercase them to normalize things. Then it maps each search term to the documents those search terms occur within. So in this example, the word "space" actually occurs in both documents, so my inverted index would indicate that the word "space" occurs in both documents one and two. The word "the" also appears in both documents, so that will also map to both documents one and two. And the word "final" only appears in the first document, so our inverted index would map the search term "final" to document one. Now, it's a little bit more complicated than that in practice. In reality, it actually stores not only which document a term is in, but also the position within that document, so we can do things like phrase search and so on. But at a high, conceptual level, this is the basic idea: an inverted index is what you're actually getting with a search index, and it maps the things you're searching for to the documents those things live within. And of course, it's not even quite that simple. How do we actually deal with the concept of relevance? Take, for example, the word "the." How do I deal with that? The word "the" is going to be a very common word in every single document, so how do I make sure that only documents where "the" is somehow a special word are the ones I get back if I actually search for that term? Well, that's where TF-IDF comes in; that stands for term frequency times inverse document frequency.
It's a very fancy-sounding term, but it's actually a very simple concept, so let's break it down. Term frequency is just how often a given search term appears within a given document. So if the word "space" occurs very frequently in a given document, it would have a high term frequency; if the word "the" appears frequently in a document, it would also have a very high term frequency. Now, document frequency is just how often a term appears across all of the documents in your entire index. Here's where things get interesting: the word "space" probably doesn't occur very often across the entire index, so it would have a low document frequency; however, the word "the" does appear in all documents very frequently, so it has a very high document frequency. So if we divide term frequency by document frequency (which, mathematically, is the same as multiplying by the inverse document frequency), we get a measure of relevance: we see how special this term is to this document. It measures not only how often the term occurs within this document, but how that compares to how often the term occurs in documents across the entire index. So with that example, the word "space" in an article about space would rank very highly, but the word "the" wouldn't necessarily rank very highly, since it's a common term found in every other document as well. And this is the basic idea of how search engines work: if you're searching for a given term, the engine will try to give you back results in order of their relevancy, where relevancy is loosely based, at least, on the concept of TF-IDF. Got it? It's really not that complicated. So how do you actually use an index in Elasticsearch? Well, there are three ways we can talk about. One is the RESTful API. If you're not familiar with the concept of REST queries, let me explain at a very high level: it's just like how you request a web page from a web server with the web browser on your desktop.
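The inverted index and TF-IDF ideas above can be sketched in a few lines of Python. This is just a toy illustration of the concepts; real engines like Lucene also track term positions and use more refined scoring formulas, so don't read this as Elasticsearch's actual implementation.

```python
import math
from collections import defaultdict

# The two example documents from the lecture.
docs = {
    1: "space the final frontier these are the voyages",
    2: "he's bad he's number one he's a space cowboy with a laser gun",
}

# Build the inverted index: each lowercased term maps to the set of
# document IDs it occurs in.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def tf_idf(term, doc_id):
    """Toy TF-IDF: how often the term appears in this document, scaled
    down by how many documents in the index contain the term."""
    words = docs[doc_id].lower().split()
    tf = words.count(term) / len(words)       # term frequency
    df = len(inverted[term])                  # document frequency
    idf = math.log(len(docs) / df) + 1        # inverse document frequency
    return tf * idf

print(sorted(inverted["space"]))  # "space" occurs in both documents
print(sorted(inverted["final"]))  # "final" occurs only in document 1
```

A lookup in `inverted` is the core of any search: given a term, you immediately know which documents to fetch, and `tf_idf` gives you a rough ordering for them.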
So when you're requesting a web page in your browser, from Chrome or whatever you use, what's happening is that your browser is sending a REST request to a web server somewhere, and every REST request has a verb, like GET or PUT or POST, and some sort of body that specifies what it is you want to get back. For example, if you're looking for a web page, you would send a REST query with a GET verb, and that GET would request the specific URL you want to retrieve from the web server. Now, Elasticsearch works exactly the same way, over the same HTTP protocol that web servers use, so this makes it very easy to talk to from different systems. For example, if you were searching for something in Elasticsearch, you would issue a GET request through a REST API over HTTP, and the body of that GET request would contain information, in JSON format, about what it is you want to retrieve. We'll see examples of this later on. The beautiful thing about this is that if you have a language or an API or a tool or an environment that can handle HTTP requests (just talking to the web normally), then it can handle Elasticsearch; you don't need anything beyond that. If you understand how to structure the JSON requests for Elasticsearch, then any language that can talk HTTP can talk to Elasticsearch, and most of this course is going to focus on doing it that way, just so you understand how things work at a lower level and what Elasticsearch is capable of under the hood. But you don't always have to do it the hard way. If you are accessing Elasticsearch from some application that you're writing, like a web server or web application or whatever it is, often there will be client APIs that provide a level of abstraction on top of those REST queries, so you don't have to figure out how to construct the right JSON format for the type of search you want, or for the kind of data you want to insert.
There are a lot of client APIs out there that make it easier for you, with specialized APIs for searching for things and putting things into the index, without getting into the nitty-gritty of constructing the actual requests yourself. So whether you're using Python or Ruby or Perl or C++ or Java, there are APIs out there that you can just use. Finally, there are even higher-level tools that can be used for analytics, and one that we'll look at in this course is called Kibana. It's part of the larger Elastic Stack, and it's a web-based graphical UI that allows you to interact with your indices and explore them without writing any code at all. So it's really more of a visual analysis tool that you can unleash on pretty much anyone in your organization. So, in order from low-level to higher-level APIs: there are RESTful queries that you can issue from whatever language you want, there are client APIs that make things a little bit easier, or you can use web-based UIs to get the information you need as well. So those are the basic concepts of how Elasticsearch is structured and how you interface with it. With that under our belt, we can talk more about how it works under the hood and how its architecture works.

6. Elasticsearch Architecture: Let's talk about Elasticsearch's architecture and how it actually scales itself out to run on an entire cluster of computers that you can scale up as needed. The main trick is that an index in Elasticsearch is split into what we call shards, and every shard is basically a self-contained instance of Lucene in and of itself. The idea is that if you have a cluster of computers, you can spread these shards out across multiple different machines; as you need more capacity, you can just throw more machines into your cluster and add more shards to that entire index, so that it can spread the load out more efficiently.
So the way it works is that once you actually talk to a given server in your cluster, Elasticsearch figures out which document you're actually interested in and can hash that to a particular shard ID. There's some mathematical function that can very quickly figure out which shard owns a given document, and it can then redirect you to the appropriate shard in your cluster very quickly. So that's the basic idea: we just distribute our index among many different shards, and different shards can live on different computers within your cluster. Now let's talk about the concept of primary and replica shards. This is how Elasticsearch maintains resiliency to failure: a big problem when you have a cluster of computers is that those computers can fail sometimes, and you need to deal with that. So let's look at this example. We have an index that has two primary shards and two replicas, and in this example we're going to have three nodes. A node is basically an installation of Elasticsearch; usually you'll see one node installed per physical server in your cluster. You can actually run more than one if you want to, but that would be a little bit unusual to do. The design is such that if any given node in your cluster goes down, you won't even see it as an end user; the cluster can handle that failure. So let's take a close look at what's going on in this example. I have two primary shards; those are basically the primary copies of my index data, and that's where write requests are going to be routed initially. That data will then be replicated to the replica shards, which can also handle read requests whenever we want. Let's take a look at how this is set up. Elasticsearch figures this all out for you automatically; that's part of what Elasticsearch gives you. So if I say I want an index with two primaries and two replicas, it's going to set things up like this, given three different nodes.
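The routing step described at the start of this section is essentially a hash of the document ID taken modulo the number of primary shards. Here is a toy sketch of that idea; Elasticsearch actually uses a murmur3 hash of the routing value, and MD5 is used here only because it's deterministic and in the standard library.

```python
import hashlib

def route_to_shard(doc_id, num_primary_shards):
    """Toy version of document routing: hash the document ID, then take
    it modulo the number of primary shards. (Elasticsearch really uses
    murmur3; MD5 is just a stand-in for a deterministic hash.)"""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_primary_shards

# The same document ID always routes to the same shard, so any node can
# compute where a document lives without asking anyone.
print(route_to_shard("109487", 2))
```

This scheme is also why the number of primary shards can't change after an index is created, as discussed later in this lesson: changing the divisor would send existing document IDs to different shards than the ones they were stored on.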
So let's look at an example here. Let's say that node 1 were to fail for some reason: it had a disk failure, the power supply burned out, who knows, it could be anything. In this case, we're going to lose primary shard 1 and replica shard 0. But it's not a big deal, because we have a replica of shard 1 sitting on node 2 and another replica sitting on node 3. So what would happen if node 1 suddenly went away? Elasticsearch would figure that out, and it would elect one of the replicas on node 2 or 3 to be the new primary. And since we have those replicas sitting there, it's fine: we can keep on accepting new data, and we can keep on servicing read requests, because we're now down to one primary and one replica, and that should be able to get us by until we can restore the capacity we lost on node 1. Similarly, let's say node 3 goes away. In that example, we lost our primary shard 0, but it's okay, because we had replicas sitting on node 1 and node 2, and Elasticsearch can just promote one of those replicas to be the new primary, and it can get by until we restore the capacity we lost. So you can see that using a scheme like this, we can have a very fault-tolerant system. In fact, we could lose multiple nodes: node 2 is just serving replicas at this point, so we could even tolerate node 1 and node 2 going away at the same time, in which case we'd be left with a primary on node 3 for both of the shards we care about. So it's pretty clever how that works. There are some things to note here: first of all, it's a good idea to have an odd number of nodes for the sort of resiliency we're talking about. But it's pretty cool, right? And the idea is that you would just round-robin your requests, as an application, among all the different nodes in your cluster.
That spreads out the load of the initial traffic. Let's talk a little bit more about what exactly happens when you write new data to, or read data from, your cluster. Let's say you're indexing a new document into Elasticsearch; that's going to be a write request. When you do that, whatever node you talk to will say, "Okay, here's where the primary shard lives for this document you're trying to index; I'm going to redirect you to where that primary shard lives." So you will go write that data, indexing it into the primary shard on whatever node it lives on, and then that will automatically get replicated to any replicas for that shard. Now, when you read, that's a little bit quicker: the request can be routed to the primary shard or to any replica of that shard. So that spreads out the load of reads even more efficiently; the more replicas you have, the more you're increasing the read capacity of the entire cluster. It's only the write capacity that's going to be bottlenecked by the number of primary shards you have. Now, the kind of sucky thing is that you cannot change the number of primary shards in your cluster later on; you need to define that right when you're creating your index, up front. Here, by the way, is what the syntax for that would look like through a REST request: we specify a PUT verb on our REST request with the index name, followed by a settings structure in JSON that defines the number of primary shards and the number of replicas. Now, this isn't as bad as it sounds, because a lot of applications of Elasticsearch are very read-heavy. If you're actually powering a search index on a big website like Wikipedia or something like that, you're going to get a lot more read requests from the world than index requests for new documents. So it's not quite as bad as it sounds in a lot of applications.
Oftentimes you can just add more replicas to your cluster later on to add more read capacity; it's adding more write capacity that gets a little bit hairy. Now, it's not the end of the world if you do need to add more write capacity: you can always re-index your data into a new index and copy it over if you need to. But you want to plan ahead and make sure you have enough primary shards up front to handle any growth you might reasonably expect in the near future. We'll talk about how to plan for that more toward the end of the course. By the way, just as a refresher, let's also talk about what actually goes on with this particular PUT request for defining the number of shards. In this example, we're saying we want three primary shards and one replica. How many shards do we actually end up with here? Well, the answer is actually six. We're saying we want three primary shards and one replica of each of those primary shards, so you can see how that adds up: three times one is three, plus the three original primaries, gives us six. If we had two replicas, we would end up with nine total shards: three primaries, and then a total of six replicas, to give us two replica shards for each primary shard. So that's how the math works out. It can be a little bit confusing sometimes, but that's the idea. Anyway, that's the general idea of how Elasticsearch scales and how its architecture works. The important concepts here are primary and replica shards, and how Elasticsearch will automatically distribute those shards across different nodes that live on different servers in your cluster, to provide resiliency against the failure of any given node. Pretty cool stuff.

7. Quiz: Elasticsearch Concepts and Architecture: All right, like they like to say in my kids' schools, it's time to show what you know. It's quiz time. Don't worry, it's not too hard; I just want to make sure you were awake during these past few lectures.
First question: the schema for your documents, that is, the definition of what sort of information is stored within your documents, is defined by what? The index, the type, or the document itself? Where is the information stored as to the actual schema of the information that a document contains? The answer is the type. The type is basically the equivalent of a table in a database; it defines the individual fields that a document contains and what data types they are. So, going back to the example of an Apache log entry, that type might define things like the URL that was requested, the status code, the request time, the referring URL, and things like that. Or, for storing something like Wikipedia entries, it might include things like the text of the article itself, the author of the article, the title of the article, and so on. That is all defined by the type of a document, and again, we define types in Elasticsearch by defining what's called a mapping when we're setting up our indexes. Question two: what purpose do inverted indices serve in a search engine? This isn't specific to Elasticsearch; it applies to search engines in general. Does an inverted index allow you to search phrases in reverse order? Does it quickly map search terms to the documents that they reside within? Or is it a load-balancing mechanism for spreading search requests across your entire cluster? What do you think the answer is? If you said the second one, you're right: they quickly map search terms to documents. Remember, an inverted index simply maps specific search query terms to the documents that they live in. As you index documents, an inverted index is actually created that splits those documents into search terms and acts as a very quick lookup of where to find those search terms in given documents. Next question: if I have an index configured for five primary shards and three replicas, how many shards would I have in total? It's a little bit tricky.
Think about that for a little bit. Is the answer 8, 15, or 20? I can think of ways of doing the math wrong that would get you any of those answers. Pause if you still need to think about it. No cheating. The correct answer is 20 shards. The way that works is that I have five primary shards, so I start with five shards, and then I want three replicas of each shard as well. Three times five is 15, so I end up with five primaries and 15 replica shards, for a total of 20 shards in this particular example. Remember how that works, because these can add up fast. And remember, a given node can actually contain many different shards; Elasticsearch will distribute shards among nodes in whatever way makes sense on your cluster automatically. Just because I have 20 shards does not mean I need 20 machines in my cluster; we can have many shards on a given node. Next question: Elasticsearch is built only for full-text search of documents, true or false? Well, you've got a 50/50 shot on this one, but if you've been paying attention at all, you know the answer is false. Elasticsearch can index any sort of structured data with any kind of mapping you can dream up, so it's not just for full-text search anymore. It's not just for searching encyclopedias and websites and blogs; it can also be used for searching, and even aggregating and visualizing, numerical data or time-based data or whatever you can dream up. Increasingly it's being used as a tool, for example, for aggregating web logs from web servers, building a system that can compete with Google Analytics and things like that. So Elasticsearch is not just for search anymore. How did you do? I hope you did pretty well there. If not, go back and review those first few lectures, because these are important concepts that will provide the underpinning for all the stuff that we're going to be doing going forward.
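Since the inverted-index question is the one that matters most conceptually, here's a toy Python sketch of the idea: as documents are indexed, each term is mapped to the set of documents it appears in. Real search engines do far more (analysis, relevance scoring, and so on); this just shows the core lookup.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Toy corpus; document IDs map to their text.
docs = {
    1: "To be or not to be",
    2: "Star Wars A New Hope",
    3: "A new beginning",
}
index = build_inverted_index(docs)
print(sorted(index["new"]))  # -> [2, 3]: the documents containing "new"
```

Searching then becomes a fast dictionary lookup on the term, rather than a scan of every document.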
But with all this conceptual stuff under our belt, it's time to roll up our sleeves in the next section and actually do some more hands-on stuff. So let's get busy. We've accomplished a lot together in a short period of time. We've installed an Ubuntu virtual machine and gotten Elasticsearch up and running on it, and we've covered the basics of what Elasticsearch is for and how it works. You've already learned some valuable information, so congratulations on reaching this point. But we've only scratched the surface of what Elasticsearch can do. In the remaining sections of this course, we're going to focus on hands-on activities to show you how to import, search, and analyze your big data with Elasticsearch in many different ways. Keep on going; I've got a lot left to show you. 8. Quiz: Elasticsearch Concepts and Architecture: This next section is about mapping and indexing your data. To make things interesting, we'll import a real-world data set of movies and movie ratings. I'll cover the basics of what we're doing in slides so you'll have them for reference, but you're going to follow along with me and actually import this data into Elasticsearch yourself. Then we'll go over how to insert, update, and delete individual documents in your index, and some of the complexities that can arise when doing this on a busy cluster. We'll also cover the different ways Elasticsearch can analyze and tokenize your text data, and talk about data modeling when you're dealing with relational, structured data in Elasticsearch. There's a lot to cover, so let's get started. First, let's go over how to actually connect to your cluster now that we have it set up and running in a virtual machine on your desktop. Now, if you're running on an Ubuntu desktop machine that you had lying around, obviously you already know how to connect to that: you just sign in and go to town. But earlier, back in lecture one, we set things up in a virtual machine, if you went that route.
We just signed right into the console on that machine once it started up, and that's not how you would do it in the real world. Typically, you would connect to your cluster, or to a machine on your cluster, using SSH from some remote location. That's how we're going to do it here as well within this course. So let me show you how to actually connect to your Elasticsearch server using SSH on your own PC. The first thing we need to do is log back into our virtual machine and install an OpenSSH server, which does not come pre-installed the way that we set things up. That way our machine can talk SSH. Then we need to install some sort of terminal application on our desktop that we can use to communicate with our virtual machine. If you're on macOS, you already have a Terminal application, and you can just use it: type in the ssh command with 127.0.0.1 to connect to the local machine that you're running there within your virtual image. But if you're on Windows, Windows does not come with a built-in terminal application, so we'll have to install something called PuTTY, P-U-T-T-Y, in order to do that. That has a nice little side benefit, too: it has better support for things like copying and pasting data from your Windows clipboard, as opposed to logging directly into the console of the virtual machine. Finally, I'll show you how to actually connect to your so-called cluster, which is really just a single Elasticsearch host running within a virtual machine on your desktop. We'll practice that, because we're going to do it every time we log in to do something in this course. So let's dive in and see how it all works. Let's get things set up so we can connect to our virtual Elasticsearch server. Again, if you're just running on an Ubuntu desktop system that you had sitting around, you don't need any of this; obviously, you can already just log into your server directly.
But assuming you're on Mac or PC, follow along here. Start by opening up the Oracle VM VirtualBox application that you downloaded and installed back in lecture one, select your Ubuntu Elasticsearch virtual machine, and hit Start to boot that puppy up. It should boot up pretty quickly; just sit back and let it do its thing until you get a login prompt. Cool, now we have a login prompt. Just type in the login user name and password that you set up back in lecture one, and we're in. Awesome. So now, to install OpenSSH, we just need to do the following: sudo apt-get install openssh-server, just like that, then hit Return. We'll need to type in our password again, and that should go and grab everything it needs; type Y and then Enter, and off it goes. All right, so now our little virtual Elasticsearch server is up and running and listening for SSH requests. If you remember, back in lecture one we opened up a port for SSH. So if I click on this and hit Settings, go to the network advanced settings, and then go to port forwarding, you'll see that we set up port forwarding for SSH on port 22. That allows us to actually connect into our virtual machine from outside of the virtual machine's environment. So that means we can run a terminal application right on our desktop and connect into this virtual server using SSH. Okay, so how do you do that? Well, on macOS that's pretty easy: you have a built-in Terminal application, and you can just bring that up, type in the ssh command, connect to 127.0.0.1, and log in, and you're good. But on Windows there's a little bit more to it, so we need to install an actual terminal application, because Windows does not give you one built in. The one that I like to use in this course is called PuTTY.
So just head on over to www.putty.org, and that will take you to the download link where you can download PuTTY. Go ahead and do that and download the version for your operating system, which is most likely the 64-bit Windows installer. Just download the standard Windows installer, double-click it to run it, and let it do its thing. I've already done that, and I've already put a shortcut to the PuTTY application in my Start menu down here, so let's go see how to actually use it. All I have to do is click on the PuTTY icon. To connect for the first time, you just type in 127.0.0.1 for the host name and leave the default port of 22. Then hit Open, and it should prompt you to log in. Now, I've already saved this to a profile so I don't have to keep typing it in over and over again; you can do the same thing by hitting the Save button. And I've already set one up here that is configured to use a larger font, so you can see it more easily in this course. So I'm going to go ahead and connect with that. You will initially see a security warning; it is okay. It's just saying, hey, I've never seen this host before, are you sure this is cool? Yes, I'm sure. And there you have it: we have a nice little PuTTY login prompt from an SSH terminal. I'll log in with my user name and type in my password; obviously, use your own user name and password. And we're in. Pretty cool, huh? So now we are logged into our virtual Elasticsearch server over SSH from our desktop. This is how you would work in the real world: you generally would not be logging in directly to the console of an actual server unless you were, like, inside the data center trying to troubleshoot some terrible problem. Typically, you'll use an SSH client like PuTTY or Terminal or what have you instead, and that's how we're going to be doing things in this course. When you're done, just type exit to get out of the SSH session and close PuTTY.
And then you can go to the VirtualBox window, go to Machine, and select ACPI Shutdown to shut that down cleanly. That's how you're going to shut things down nicely and cleanly when you're done with each session of learning. Okay? Got it? So that's the basics of logging into your Elasticsearch server and shutting things down when you're done. Make sure you understand all that, because you're going to be doing it a lot in this course. 9. Getting to Know the Movielens Data Set: Throughout this course, we're going to use the free MovieLens data set, so let's just download it, take a look at it, and figure out what it's all about. It's going to be an important thing that we work with throughout the course. MovieLens is just a free data set of movie ratings, and if you go to movielens.org, you'll see they have a website where you can rate a bunch of movies and get back movie recommendations. Over many, many years they have built up a very large database of user ratings, and of movies as well. So we can use this as a way to play around with not only textual data, like movie titles and things like that, but also structured data, like movie ratings and the properties of the movies themselves. We're going to be using this data a lot throughout the course, so let's go take a look at what it looks like and familiarize ourselves with it. Head on over to grouplens.org in your favorite browser, and let's take a look at this data and figure out what it's all about. If you click on the datasets tab, you should get to a page that looks kind of like this, and we're going to be working with the ml-latest-small data set here. It's a manageable number of ratings that can be processed comfortably and quickly on your own little virtual machine. We're not going to be dealing with big data at this point, because we don't really have a real cluster.
So we're going to restrict ourselves to 100,000 ratings. Let's download that and take a look at what's in there. Just open that up when it's done, open up the resulting ml-latest-small folder, and let's familiarize ourselves with what's in here. There is a README.txt file that describes the contents in detail; it's in Linux format, so make sure you open it in WordPad or something else that can handle that. But let's just dive into these files and see what's in there. You can see these are all CSV files; that stands for comma-separated values. We can open them up in Excel or whatever you want and take a look at what's inside. Let's start with ratings.csv. You can see here we have four columns of data in this CSV file: first the user ID, then the movie ID, the rating, and the timestamp. So this is basically saying, for example, that user ID 1 rated movie ID 31 at 2.5 stars, at whatever that number represents for a timestamp; it's a Unix format that represents seconds since the epoch of January 1st, 1970, or something like that. Not important for our purposes. So this is the ratings data itself. There are 100,000 rows in here, and again, the format is user ID, movie ID, rating, and timestamp, all separated by commas. How do you know what a movie ID is? Well, that's where the movies.csv file comes in. Let's open that up, and you can see that it maps movies to their titles, and also to a pipe-delimited list of genres. Let's widen this a little bit. For example, this data is telling you that movie ID 1 represents the movie Toy Story from 1995 (the release date is always in parentheses there), and it belongs to the Adventure, Animation, Children, Comedy, and Fantasy genres. This is fleshed out for every single movie in the database, and it turns out there are quite a few of them, almost 10,000 at this point. So that's some fun data to work with there.
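To get a feel for those formats, here's a short Python sketch that parses rows shaped like the ones just described; the sample rows are typed in literally here rather than read from the downloaded files, and the timestamp value is made up:

```python
import csv
import io
import re

# Rows shaped like ratings.csv: userId,movieId,rating,timestamp
# (the timestamp here is an arbitrary example value)
ratings_data = "1,31,2.5,1260759144\n"
for user_id, movie_id, rating, timestamp in csv.reader(io.StringIO(ratings_data)):
    print(user_id, movie_id, float(rating), int(timestamp))

# A row shaped like movies.csv: movieId,title,genres (genres are pipe-delimited)
movie_id, title, genres = ["1", "Toy Story (1995)",
                           "Adventure|Animation|Children|Comedy|Fantasy"]
genre_list = genres.split("|")

# The release year is embedded in the title in parentheses,
# so pulling it out takes a little pattern matching.
match = re.search(r"\((\d{4})\)\s*$", title)
year = int(match.group(1)) if match else None
print(title, year, genre_list)
```

That last bit, extracting the year from the title, is exactly the kind of wrinkle that makes ingesting this data a little tricky.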
And to make use of this, we're going to have to parse out that release date from the title, for example, so ingesting this data is going to be a little bit tricky, as we'll see. But there's some cool data to analyze here: we have movie titles that we can search, and we have genres that we can classify things with, so we're going to have some fun with this later on. We also have a links.csv file. We're not going to use this much in this course, but basically it's just a mapping of movie IDs to the codes that are used to represent them on IMDb and TMDb. I used to work at IMDb, so I'm kind of partial to that one. It's basically the IMDb movie ID for each MovieLens movie ID, so you can map the two together. Finally, there's a tags.csv file that we'll also play with a little bit in this course; these are user-generated tags on each movie. Some of them are a little bit snarky. User ID 15, whoever that is, tagged movie ID 339, which presumably stars Sandra Bullock, as "Sandra Boring Bullock". Poor Sandra Bullock; I actually like her as an actress, and I don't understand that at all. But these are just tags that people have assigned to individual movies, and that's also some fun information to mine for trends, so we'll play with it a little bit later on. All right, so that is the MovieLens data set in a nutshell. Basically, it's 100,000 movie ratings by real users out there, plus metadata about each movie. So now that you know what's in there and how it's structured, we can start playing with it. 10. Create a Mapping for MovieLens: Now that we've seen some data that we would like to import into Elasticsearch, let's talk about mappings, which tell Elasticsearch how to store that information. You can think about a mapping in Elasticsearch as a schema definition: it's telling Elasticsearch what format to store your data in, how to index it, and how to analyze it.
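As a preview of where this section is headed, the mapping we're about to create, telling Elasticsearch to treat the year field as a date, boils down to a JSON body you can sketch as a Python dictionary:

```python
import json

# Mapping body: for documents of type "movie", interpret "year" as a date.
mapping = {
    "mappings": {
        "movie": {
            "properties": {
                "year": {"type": "date"}
            }
        }
    }
}
print(json.dumps(mapping, indent=2))

# This JSON is what gets sent as the body of a PUT request to the movies
# index with curl in the upcoming lectures (which needs a running cluster).
```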
Elasticsearch usually has reasonable defaults and will infer the right thing to do from the nature of your data. More often than not, it can figure out whether you're trying to store strings or floating-point numbers or integers or whatnot, but sometimes you need to give it a little bit of a hint. So here's an example where we're going to import some MovieLens data, and we want the release date to be explicitly interpreted as a date-type field. Here's what the JSON request to do that looks like. Now, we're going to be using the curl command throughout this course. That's just a simple Linux command that allows you to send an HTTP request to the server that's running Elasticsearch; in our case, that's our local host running within our virtual machine. As for the syntax of the curl command: you need to specify the content headers using the -H option. So, as part of this HTTP request, we're going to send along a header that says Content-Type: application/json, which tells the server that we're sending JSON-formatted data. This is new in Elasticsearch 6; in Elasticsearch 5 and earlier you did not have to send that header explicitly, but in Elasticsearch 6 you do, and it's going to get a little bit tedious. In the next lecture, I'll show you a way around that. Next, we send in -XPUT, which specifies the verb for the HTTP request, the HTTP action: we're going to do a PUT command to put information into our index. The actual address that we're talking to is 127.0.0.1, which is the loopback address for the local host; colon 9200 indicates that we're talking to port 9200; and then we say slash movies, which is the name of the index that we're manipulating. After that, in curl, we say -d and then a single quote, which tells it to send along the following data as the request body, and those single quotes will enclose the JSON data.
That's the actual body of the message that we're sending to Elasticsearch. That JSON body contains a mappings section, which specifies, for the movie type, the properties of the following fields, specifically that the year should be of type date. That "year: type date" is what's telling Elasticsearch that I want it to explicitly interpret the year field, when it comes in, as a date type, and not just a string full of numbers and dashes. Mappings can do a lot more than that, though. There are several things you might use mappings for. One would be the types of fields, like we just saw. Some of the valid field types that you might assign to a field are text or keyword; those are subtly different. A text field is a string field that's actually analyzed and indexed for full-text search, while a keyword field is not broken up into its components. So you won't get back partial matches on a keyword field, but you will get back partial matches on a text field if you're doing a search for part of that string. There are also byte, short, integer, and long types, as well as float, double, boolean, and date: all the usual types for various kinds of numbers, and dates are treated specially too. You can also control whether or not a field is indexed, and that's either true or false in Elasticsearch 6. If it's true, that means the field is queryable and will appear within the inverted index for this index, so you can search for it. But if you just want to get it back as part of the data that comes along with other queries, you can set that to false instead and save some resources on your index. You can also control analyzer behavior through mappings. For fields that are indexed for full-text search, if it's a text field, you might want to specify exactly how that text is broken up for full-text search, and there are various specific rules for different languages.
For example, English might have unique rules for things like stemming, so words that end with "ing" or "ed" might really have the same base to them; plurals get dealt with in a certain way, and synonyms get dealt with in a certain way. You can also have standard filters, filters that just break things up based on whitespace, or whatever you want to do. The analyzer mapping is how you control that behavior. So let's talk more about analyzers. There are three things an analyzer can do. One is character filtering: an analyzer can remove HTML encoding, and do things like convert ampersands to the word "and". The idea here is that if you apply the same analyzer to both your search query and the information that's indexed, you can get rid of all these discrepancies. If I apply the same analyzer that maps ampersands to the word "and" to my search query and to the data that I'm indexing, then it doesn't matter whether I search for an ampersand or the word "and"; I'll still get results for both back from my inverted index. Then there are tokenizers, which actually split your strings up in a certain way. How do I know what makes up a word? How are search terms broken up? The choice of your tokenizer determines that. You can, for example, split strings up just based on whitespace, on punctuation, on anything that's not a letter; and there are also language-specific ways of doing tokenization as well. So that's how you specify how a string gets broken up into the search terms that get indexed. Finally, you can do token filtering. If you want your searches to be case-insensitive, which usually you do, you might want your token filter to lowercase everything. Again, if you apply the same analyzer and the same token filter to your search queries and to what's indexed, you can get rid of any case sensitivity, because they're both lowercasing both what you're searching for and what you stored for things to search for.
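Those three stages (character filtering, tokenizing, and token filtering) can be imitated with plain Python functions. This is only a toy sketch of the concept, not how Elasticsearch actually implements analyzers:

```python
import re

def char_filter(text):
    # Character filter: normalize "&" to the word "and"
    return text.replace("&", " and ")

def tokenize(text):
    # Tokenizer: split on anything that's not a letter or digit
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def token_filter(tokens):
    # Token filter: lowercase everything for case-insensitive matching
    return [t.lower() for t in tokens]

def analyze(text):
    return token_filter(tokenize(char_filter(text)))

# Applying the same analyzer to both indexed text and queries
# makes these two strings produce identical terms:
print(analyze("Fast & Furious"))    # -> ['fast', 'and', 'furious']
print(analyze("fast AND Furious"))  # -> ['fast', 'and', 'furious']
```

The key point the sketch illustrates: discrepancies disappear precisely because the same pipeline is applied on both the indexing side and the query side.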
Token filters can also handle stemming. For example, if you want the words box, boxes, boxing, and boxed to all match a search for the term box, you can use stemming to normalize all those different variants of the word to the same root stem. Synonyms are also a language-specific thing. If you wanted a search for the word "big" to also turn up documents that contain the word "large", your token filter could normalize both to a given synonym, in order to make sure those searches work as well. You can also specify stop words. If you don't want to waste space storing things like "the" and "and" and "a" and little words like that, which really don't convey a lot of information, you might want to apply a stop-word filter as well, so you don't end up indexing all of that stuff too. Now, there are reasons to actually not use stop words, particularly for phrase search. Think back to our first example in lecture one, when we searched for "to be or not to be": all of those might be stop words. They're all very common words, and if I weren't actually indexing them, then that search wouldn't have worked at all. So don't enable stop words lightly; sometimes they have side effects that you don't really want. For the analyzers themselves, there are several choices. The standard analyzer is the one you get by default: it splits everything on word boundaries, removes any punctuation, and lowercases everything. If you don't know what the language is, that's a good choice to stick with, and that's what we are going to stick with for our movie titles, because we do have some foreign films in there as well. So in our case, for the MovieLens data, standard is actually a good choice. There's also a simple analyzer that just splits on anything that's not a letter and lowercases everything, nothing more; and an even simpler whitespace analyzer that just splits on any whitespace. That one does not lowercase.
So any punctuation will be preserved, for example, which you may or may not want depending on the type of data you have; sometimes punctuation is important. There are also language-specific analyzers, so you can specify that your text is in English or any other common language that Elasticsearch supports, which is pretty much all of them. Those can do language-specific things like handling stop words and stemming and synonyms, which is pretty cool. So if you do know that you're dealing with full-text data that you want to do full-text searches on, and you know what language it's in, you can use the language analyzers for that. And sometimes you can even mix and match different languages in the same index; there are tricks you can do to run different analyzers on the same text and store the results as different fields. I could have, for example, an English version and a French version of the information in my index that I could search separately. With that, let's dive in and actually create a real mapping for some movie data, hands-on. So let's set up a mapping for the movies data on our Elasticsearch server, and I'm going to walk you through getting started here, just this once. Again, to get up and running, start by launching Oracle VM VirtualBox, assuming that you are working within a virtual machine and not on a real Ubuntu server; select your Ubuntu Elasticsearch virtual machine and start it, and give it about a minute to spin up as it boots. The nice thing is that it boots up pretty quickly, all in all, because we didn't install a bunch of extra software that we don't need. So just wait for that to finish. Once we have our login prompt, we're not going to log into the console, because again, that's not how you do it in the real world. Instead, we're going to open up our terminal application: on macOS, that would be Terminal; on Windows, it would be PuTTY.
Again, if you're connecting for the first time, you type in 127.0.0.1 for the host name here; or if you already saved the session, you can just load that up to bring it right back. 127.0.0.1, port 22, SSH selected, and hit Open. We can go ahead and maximize that, and log in using your own user name and password. All right, so first let's just make sure that Elasticsearch is running. We can type in curl 127.0.0.1:9200 and make sure we get something back. Sure enough, we do, so Elasticsearch is up and running and waiting for us to do something with it. Now that we're in, let's use the curl command to send a request to the Elasticsearch service to set up that explicit mapping for the year field in our MovieLens data. To do that, follow along with me. We're going to use the curl command, which sends an HTTP request to our server. We use -H to send the Content-Type header: so, -H, quote, Content-Type (pay attention to dashes and capitalization; it all matters), colon, application/json, end quote. Then we need to say -XPUT, because we're sending a PUT command to our server; that's the HTTP action. The server that we're going to talk to is 127.0.0.1, our local host IP address, colon 9200, because Elasticsearch operates on port 9200, then slash movies. That tells Elasticsearch that we're working with the movies index. Then we say -d and then a single quote, which tells curl that the following lines are going to be sent as the body of the HTTP request, and that body should be, as we indicated, JSON data. Because it's JSON-format data, we start with a curly bracket and hit Enter. Now, I'm going to make pretty tabs here to format our JSON data nicely. You don't have to do this; tabs are optional. But if you do want to type a tab character, there's a trick to it.
You need to press Control-V and then Tab; that's how you actually enter a tab here. So type quote, mappings, quote, colon, open curly bracket. Another tab, another tab, then quote, movie, quote, because we're dealing with the movie type, then colon, curly bracket. Tab, tab, tab, and then we're going to define properties within that type: properties, in quotes, colon, curly bracket. Tab, tab, tab, tab, and then the year field: quote, year, quote, colon, curly bracket, then quote, type, quote, colon, and date. All right, I'm kind of compressing all this onto one line because I'm getting tired of typing tabs, but it all works the same way. Finally, we need to close off all of those curly brackets. There's one, there's two, there's three, and the fourth and final one, and we'll close off this request body with another single quote. Double-check everything and make sure it looks right. So, to review: we are using curl to send an HTTP request, with a Content-Type header that indicates JSON data, performing a PUT action against our local host on port 9200, affecting the movies index; and we're defining an explicit mapping for the movie type that changes the year property to be interpreted as a date type. Let's go ahead and hit Enter and see if it works. All right, you should be seeing something that says acknowledged: true, shards_acknowledged: true, index: movies, so it looks like it worked. Now, if you want to retrieve that and make sure it actually took, we can use a GET action to get that mapping back and take a look at it. So let's say curl -H, quote, Content-Type: application/json, quote; typing that header in all the time gets tedious, so in the next lecture I'll show you a way around it. This time we're going to say -XGET, because we're doing a GET command to get information back from our server. 127.0.0.1 is our local host address, then port 9200, slash movies, slash _mapping, to retrieve the mapping associated with that index, specifically for the movie type.
So that's what we're saying with this syntax here. Let's go ahead and hit Enter and see what comes back. That's interesting: we got back a JSON blob, but it's not very nicely formatted, so it's kind of tough to read. Let me show you a little trick there. I'll hit the up arrow to repeat my previous command, but at the end I'm going to type in ?pretty, and that will cause the response to come back in a nicely formatted form. Let's take a look at what that looks like. That's a little easier to read, isn't it? Much closer to what we typed in, and you can see that our custom mapping did get stored in this index successfully. Cool. We have now told Elasticsearch that when it sees a year field in our movies index, under a movie type, it should be interpreted as a date and not as a string. All right, so next we need to put some data into this thing. But first, let me show you that hack for avoiding that -H option. If you're done for now, you can just type exit, and that will get you out of the SSH shell. And if you want to shut down, you can say Machine, ACPI Shutdown in VirtualBox, and at that point you can safely close the VirtualBox manager itself. Okay, let's move on. 11. Hacking CURL (Don't Skip This!): One of the things that changed in Elasticsearch 6 is that every request that goes to your Elasticsearch server needs to have a content header that specifies a JSON content type, and typing that in every time we issue a curl command to communicate with our server can get very tedious. So let me show you a little hack that lets you avoid doing that. To make life easier, we're going to create a bin directory under our home directory on our little virtual server, and we're going to create our own curl script that lives in there. This curl script is going to override the one built into the system.
So it's going to be a little bash script that just calls the real curl binary under /usr/bin/curl, but automatically passes in that -H Content-Type: application/json parameter for us, so we don't have to keep typing it over and over and over again. And that dollar-sign-at-sign at the end of the line there within our script says: go ahead and append all the other parameters that we had on the command line for that curl command. So by doing that, and then changing the permissions on this script to be executable, what will happen is that Ubuntu will use the version of the script that's under our home directory's bin folder before it uses the real curl command. And our script will automatically add that -H Content-Type: application/json to every curl command that we enter. So that just saves us a lot of typing going forward. Now, remember, without this hack in place, you will need to add -H "Content-Type: application/json" to every single curl command in this course. I'm going to assume that you've done the following steps to avoid doing that, so I'm not going to specify the -H parameter going forward in the subsequent lectures of this course; you either need to do this hack, or you need to actually remember to type that in every time. Just remember, you probably don't want to do this in the real world. In the real world, you're not going to be using curl for the most part anyway; you're going to be communicating with your Elasticsearch server using some sort of coding library, or some sort of larger framework. Generally speaking, you don't want to be sneaking around behind the back of your system and making commands silently do things that they weren't designed to do originally. That can become very confusing: if you were to try to issue a curl command that required a different content type header, you'd be left wondering why it's not doing what you're telling it to do, if you didn't remember that you had this thing in place.
So remember, this is a hack, but it is going to save you a lot of time in this course, and it's only applied within our little virtual machine that we're using for this course, so really there's no harm done. Let's work on this together and get it in place. So I've already spun up my VirtualBox with my Elasticsearch image running, and I'm going to open up my SSH client and log in. Now, when you first log in, you should automatically be in your home directory. If you type pwd, it should say /home/ followed by whatever your user name is, and that's where you want to be right now. Now we're going to create a bin directory underneath that home directory by typing the following: mkdir — that stands for make directory — space, bin. And now we can move into that bin directory by saying cd, for change directory, bin. And if we do a pwd now, you'll see that we're under /home/your-user-name/bin. Now, the neat thing about Ubuntu is that any scripts that are in your home directory's bin directory will take precedence over other scripts on your system. So this is how we're going to trick Ubuntu into using our version of curl instead of the default one. Now we need to create our curl script. We're going to use the built-in vi editor. It's not the prettiest thing in the world, but it gets the job done and it's built in. So let's type in vi curl, and that will create a new curl text file. The first thing we need to do is hit the I key. The I key puts us in insert mode, and now we can just start typing. So type in the following: hash sign — or pound sign, depending on where you come from — exclamation point, /bin/bash. That tells the OS that this is a bash script. Enter, and then we're going to say /usr/bin/curl, which will explicitly call the real version of curl that lives within our system, with -H quote Content-Type colon application/json quote.
And then we need to pass along any parameters that were sent into this script, and we can do that with a quote, dollar sign, at sign, quote, and hit Enter. And that's all the script does. Basically, it takes all of the parameters to the curl command and sticks them on the end of the real curl command in our system, but automatically sticks in that -H Content-Type: application/json header for us, so we never have to type it again. Now, to get out of this, you're going to hit the Escape key to get out of insert mode. Then we're going to type colon, w, q, and that will write and quit from the editor. Hit Enter, and we now have a curl script. Now, to make it an executable script, there's one more thing we need to do. That's to type in chmod — for change mod — a+x, which gives execute permissions to everybody, then curl, which is the file that we're trying to affect. All right, so now we should be pretty much done. Just to make sure that it's working properly, let's do cd space dot dot, and that should put us back in our home directory. You can type pwd to confirm that. And now if we type in which curl, it will tell us the path of the curl script that will actually be executed, and it should say the one that's in our home directory's bin folder, like that. So that means we have successfully overridden curl to do what we want, using our own modified script. And let's just make sure curl still works. For example, we can say curl 127.0.0.1:9200, and it did send us back the standard response that says, "You Know, for Search." But under the hood, it automatically added that -H Content-Type: application/json header for us. So again, without this hack, you will need to add -H "Content-Type: application/json" to every single curl request in the course. We're just doing this to save you some typing, because in the real world you don't really use curl that much — just for, like, one-off debugging. All right, that's in place.
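If you'd like to see the moving parts of this hack without the VM, here's a self-contained sketch. It builds the same kind of two-line wrapper, but points it at a stub instead of the real /usr/bin/curl, purely so you can watch the -H header get injected anywhere; in the VM you'd write the script against /usr/bin/curl exactly as described above.

```shell
# Build the wrapper in a temp dir and point it at a stub "real" curl,
# so the header injection is visible without touching /usr/bin/curl.
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"

# Stub standing in for /usr/bin/curl: it just echoes its arguments back.
cat > "$workdir/realcurl" <<'EOF'
#!/bin/bash
echo "$@"
EOF
chmod +x "$workdir/realcurl"

# The wrapper itself -- the same shape as the lecture's ~/bin/curl script,
# but calling our stub instead of /usr/bin/curl so it runs anywhere.
cat > "$workdir/bin/curl" <<EOF
#!/bin/bash
"$workdir/realcurl" -H "Content-Type: application/json" "\$@"
EOF
chmod +x "$workdir/bin/curl"

# The header appears even though we never typed it on the command line.
"$workdir/bin/curl" -XGET 127.0.0.1:9200
```

The key detail is the quoted `"$@"`, which forwards every argument you typed, intact, onto the end of the real command.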
We can move on and do something useful in Elasticsearch. 12. Import a Single Movie via JSON / REST: All right, so we have a mapping defined for the movie type within our new movies index. Let's go ahead and show you how to actually import a document into that index, and for the sake of illustration we'll import one of my favorite movies, Interstellar, and the information associated with it. So here's what the format looks like for that request. And again, I just want to back up a little bit here. We're using the curl command just for the sake of illustration; it's an easy thing to use while we're messing around and learning. But remember, in the real world you probably won't be using curl to do this sort of stuff. Usually you're going to be doing something like interfacing with the Elasticsearch cluster from some application that you're writing. So in the real world, you'll probably be issuing REST queries over HTTP from Java, or some web framework you're developing within, or some application that you're writing. This is just for the sake of illustration, but for the sake of illustration, let's use curl here. We're going to create a PUT request, and we're going to send that to 127.0.0.1:9200. In the real world, of course, you would be sending that to the host name of the actual endpoint for your real Elasticsearch cluster; for us, that just happens to be how we reach it on our desktop. That's followed by /movies, which is the name of the index, and /movie, which is the name of the type that we're inserting, so we know what mapping to use, followed by the unique ID that we're assigning to this movie. So it turns out that the movie ID for Interstellar in our data set is 109487, so we're creating a document with a unique ID of 109487 that is a movie type in the movies index, and it contains the following data in the request body. The genre is going to be a list of genres, and we specify a list with those square brackets, separated by commas.
So it belongs to both the IMAX and Sci-Fi genres, it has the title of Interstellar, and the year — which we defined to be in date format — is 2014. So let's go ahead and type that in and see what happens. I'm going to pick up the pace a little bit from here on out. I've already started up my VirtualBox for my Ubuntu Elasticsearch virtual machine, and I've already connected to it using PuTTY on my Windows system, to 127.0.0.1. If you need a reminder of how to do that, you can go back to the previous lecture and I'll walk you through those steps again. But here we are, logged into our virtual machine through SSH on PuTTY, so let's go ahead and do what we saw in that slide and insert the movie Interstellar into our shiny new movies index. So: curl -XPUT, then 127.0.0.1:9200 indicates the host name and port that I'm running Elasticsearch on, /movies indicates the index name, /movie indicates the type, which we mapped previously, and then the ID that we're going to use for this movie is 109487, followed by -d, single quote. So now we can start entering the body of this request. We'll start with the open curly bracket, then Ctrl-V, Tab. First we'll define the genre field, which will contain the list, in square brackets: IMAX and Sci-Fi, then a comma, because we're specifying more fields. The title, for example, will be Interstellar, and the year will be 2014. Close that off and put in our single quote to end it. And there we go, and you can see that we got back a successful response: into the index movies, of type movie, with the ID 109487, we did in fact get a created result with a successful flag — one successful request, zero failed requests, created: true. That means everything worked. Hit Enter again just to get a clear prompt there, and that's it. Let's go ahead and retrieve that.
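Typed out end to end, that insert — plus the retrieval we're about to do — looks roughly like this sketch. It assumes Elasticsearch is on 127.0.0.1:9200 and spells out the -H header for anyone who skipped the curl hack; the calls are guarded so nothing breaks without a live server.

```shell
ES=127.0.0.1:9200
# The Interstellar document, ID 109487, as built up in the lecture.
DOC='{"genre":["IMAX","Sci-Fi"],"title":"Interstellar","year":2014}'

if curl -s "$ES" >/dev/null 2>&1; then
  # PUT to /index/type/id indexes the document under that ID...
  curl -H "Content-Type: application/json" \
       -XPUT "$ES/movies/movie/109487?pretty" -d "$DOC"
  # ...and a parameterless _search pulls back everything in the index.
  curl -H "Content-Type: application/json" \
       -XGET "$ES/movies/movie/_search?pretty"
fi
```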
Shall we just prove to ourselves that it's really in our index? To do that, we can say curl -XGET — we're sending a GET verb this time, meaning that we want to retrieve something from Elasticsearch, not put something into it — then 127.0.0.1:9200/movies/movie/_search?pretty. Let me explain what this is doing. So /movies means that we're searching the movies index, /movie means we're searching the movie type, and _search means we want to do a search. There are no further parameters or request body here, meaning that we just want to get back everything — all the movies, every movie type that's in our movies index. And ?pretty just indicates that we want our results formatted in a nice way, with tabs and new lines and all that good stuff. And sure enough, there it is. There is our Interstellar movie from the year 2014, with both the IMAX and Sci-Fi genres attached to it, within the JSON-formatted response. So there you have it. We have successfully inserted our very first document into Elasticsearch, and a good choice it is; I think that's a fine movie, if you haven't seen it. All right, let's move on and do something a little bit more complicated: inserting many movies at once. We'll do that next. 13. Insert Many Movies at Once: So importing one movie is all well and good, and sometimes you'll need to do that. But more often you'll potentially have to import a bunch at once. Maybe you're importing some existing data set — like, I don't know, the entire MovieLens data set. So let's see how that works. How do I import many documents at once over a REST query in JSON? Well, this is what it looks like. So there is a bulk import format that Elasticsearch uses.
So if you just hit your Elasticsearch server's endpoint and say /_bulk, you can then give it a bunch of JSON information for individual documents that you want to insert all together at once. And this is what the format looks like. So you have a create field here that says what index and type and ID I want to create, followed by a separate JSON structure that indicates all the data that you're actually inserting for that document. So in this example, we're inserting records for Star Trek Beyond, Star Wars: Episode VII, Interstellar, The Dark Knight, and Plan 9 from Outer Space, all at once. And don't worry, I'm not going to make you type all this in; curl does have a format that lets you just specify a data file that has everything in the body already, so we will do that instead. But it is worth understanding the format here, so look closely. If you're going to write your own JSON file for importing, or maybe write a script to create a file like this, this is what it would look like. Now, the Elasticsearch documentation refers to this as kind of a funny format, and, I don't know, it looks normal enough to me. But, you know, if you were to design a JSON request that handled multiple documents at once, it probably wouldn't look like this. Really, the reason that it's split up like this into individual lines is because, if you remember right, in Elasticsearch every document gets hashed to a specific shard, and so we kind of need to deal with these documents one at a time. So what happens when you send this to a given server on your cluster? It will go through this one document at a time, and then send each document off to whatever shard is actually responsible for storing that particular document. So that's why we have this format, where things are broken up into their own pairs of lines for each document: so it can easily send them off to specific shards for further processing.
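To make the format concrete, here's a minimal sketch with just two of those five movies (the fields approximate what's in the course's movies.json; treat the exact genres as illustrative). Each document is an action line naming the index, type, and ID, followed by a source line with the fields — and the request body must end with a newline.

```shell
ES=127.0.0.1:9200

# Two documents in the bulk format: one action line, one source line,
# per document, each on its own line.
cat > /tmp/bulk_sample.json <<'EOF'
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year": 2016, "genre": ["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "122886" } }
{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year": 2015, "genre": ["Action", "Adventure", "Sci-Fi"] }
EOF

if curl -s "$ES" >/dev/null 2>&1; then
  # --data-binary preserves the newlines the bulk format depends on.
  curl -H "Content-Type: application/json" -XPUT "$ES/_bulk?pretty" \
       --data-binary @/tmp/bulk_sample.json
fi
```

Note that plain -d would strip the newlines, which is why --data-binary matters here.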
Let's go ahead and run this and see what happens. So again, I already have my virtual machine running and I've logged into it with SSH, so let's go ahead and insert that data. Instead of making you type in all this movie information, I've actually already put a JSON file up on the web for you. To retrieve it, you can just type in the following: wget http://media.sundog-soft.com — make sure you spell that right — /es/movies.json. Capitalization matters; every little punctuation character matters here, so make sure you get that right. If you do, you should see something like this, where it downloaded that file to your home directory. Let's go ahead and see what's in it. Type cat movies.json, and there are the contents of that file, and it is what we saw in the slide. So you can see that we have these pairs of lines, where first we have a create command that tells Elasticsearch what index and type and ID you want to index the following document as — and then it can use that information to actually hash the document to a specific shard and send the document information itself off to the appropriate shard. And the individual fields are defined in the following line. So for the first pair, for example, we're saying that we want to insert a movie into the movies index with ID 135569, and the following line specifies all the fields of that document: the ID, the title, the year, and the genres, which is again a list of genres. And it will do that for each subsequent movie. So we're going to insert five movies all at once. Let's see how that works. So, using curl, we would do it like this: curl -XPUT — again, the verb is PUT; we're putting data into the index — and then our host name 127.0.0.1 and the port 9200 is what we're using, then /_bulk. All right, and we'll say ?pretty, just so that we can see the response and make sense out of it.
And with the curl command — this is just specific to the curl command — we can say --data-binary, space, at sign, movies.json. And that means that for the body of this request, I want you to just import the file movies.json from the directory that I'm running within. So let's go ahead and do that. And it worked, cool. So if you scroll through here, you can see the response message that I got back; it gives you back all the items that it actually inserted. So let's take a look here. We had a successful creation of almost everything. We actually got an error back, though, on Interstellar, because we already inserted that one. Huh, pretty interesting. You see here that error says true — that means that there was a problem, so you do need to go through the response to see what's going on. And sure enough, it does not let you insert the same movie ID twice. Well, actually you can, and we're going to talk about that in our next lecture. But for now it did the right thing. It said: you can't insert Interstellar again, you've already got a document for Interstellar. So it did what we wanted it to do, really. And for the remaining movies, it did successfully insert them. We can go ahead and take a look at what's in there. Let's say curl -XGET 127.0.0.1:9200/movies/_search?pretty, and you can see that we got back all of our movies. We have Plan 9 from Outer Space, Star Trek Beyond, The Dark Knight, Star Wars: Episode VII, and Interstellar is still in there as well, from our previous exercise. So cool, it worked. Let's move on and learn how updating an existing record works. We've done inserts; next, let's try updates. 14. Updating Data in Elasticsearch: So we've covered inserting documents — or indexing new documents — using the JSON format and a REST API. And don't worry, we'll show you other ways of doing it later on in the course, where you don't need to actually use curl and write REST queries by hand.
But you remember that we tried to insert the movie ID for Interstellar a second time when we did that bulk import, and it gave us back an error. So you might ask yourself: how do I actually change a document, or update a document, after it's been indexed? And it turns out it is possible. So here's the thing: Elasticsearch documents are immutable. You can't actually change them once they've been written. But there's a workaround for this. Elasticsearch automatically maintains an _version field on every document that you put in. So what you can do in order to update a document is create a new copy of that document with an incremented _version number. All that matters is that the unique ID for the document and its version, together, are unique — so you can have multiple versions of a given document, and Elasticsearch will automatically use the most recent version that you've created. So when you do an update request in Elasticsearch, what really happens is that a new document — an entirely new document — gets created with an incremented version number, and then the old document gets marked for deletion. And when Elasticsearch does its little cleanup pass later on, whenever it feels like doing it, that old version will go away. Okay, so what does that really look like? Well, here's what an actual update request looks like from curl. You would do a POST, as opposed to a PUT or GET verb, to the host name and port, of course; then specify the index — in our case movies — the type, movie, the ID of the document, which in this case is 109487 for the movie Interstellar, and then _update, meaning that you want to take the existing record and only update one or more fields. So in this JSON request body we're saying, under the doc field, all we're going to do is update the title. And I'm not sure if this title is actually different.
It doesn't really matter, but what this command is saying is: I only want to change the title field of the existing document with document ID 109487. It will copy over everything else that existed in the previous version, and it will then create a new copy of this document, with an updated title, that has a new version number. So let's just dive in and see how this works firsthand. All right, let's play around with updating and see what's really going on under the hood here. I've already signed into my Elasticsearch server through SSH, so let's start by taking a look at what's already in there for the movie Interstellar. If you type in this command here — curl -XGET 127.0.0.1:9200/movies, which is the index, /movie/, and then the movie ID 109487, with ?pretty — that's going to tell Elasticsearch to retrieve the document with ID 109487, which I happen to know is Interstellar. So let's go ahead and hit that and see what we get back. Now, I want you to look more closely at the response this time. What we get back in JSON format for this query is that the document retrieved is in the movies index, it is of type movie, and it has the ID that we specified. Now take a look at that: _version is 1. So what we have here is version 1 of our Interstellar document, which says that the title is Interstellar, the year is 2014, and it's an IMAX and Sci-Fi movie. Let's say we want to update this, okay? Let's say that I want to change the title to something else. Let's go ahead and do that. To do that, we'll say curl -XPOST 127.0.0.1:9200/movies, then /movie, which is the type, and then our ID, which is 109487, then /_update, -d, single quote. We're going to give it a JSON body here: open curly bracket, doc, and we're going to change the title field to something else. I don't know — "Interstellar, woo!" — because I like that movie. Woo! That's my take on it: excitement.
We'll close off that curly bracket there and add a single quote, and there we have it. So take a close look here: the version number that we got back is now 2. Okay, so we've actually created a new copy of this document, with a new version number of 2 instead of 1. So when I go to retrieve this document now, just like I did before, my response will now say version 2, because version 1 has gone bye-bye — it's marked for deletion at this point. Now, what would happen if I tried to insert this movie ID again? Well, let's find out. curl -XPUT 127.0.0.1:9200/movies/movie/109487, make sure our response is pretty, open curly bracket — and I'm not going to bother with tabs for this particular one — genres: IMAX, Sci-Fi. So we just tried to reinsert this document from scratch. And that's interesting: it actually did not error out. Instead, it just created a new version for me automatically. So when I'm using the PUT interface for creating a new movie, if it does see a duplicate ID already there, it will automatically increment the version and update it instead. You can see that the result says "updated" instead of "created" — so it's telling me that even though you told me to create this thing, I just updated it for you. Hope you're good with that. And it did succeed in writing it. The created field, however, is false, because it did not actually create this document; all it did was update it. So that's how updates work in Elasticsearch. Pretty interesting stuff. Now, like I said, those old versions are still around, but they are marked for deletion, so it's not safe to try to actually retrieve them at this point. The next time Elasticsearch cleans things up, those will be gone, and only the most current version — version 3 — will be around. Cool. So there you have it. That's how updates work. 15. Deleting Data in Elasticsearch: All right, we've covered inserting documents. We've covered updating documents.
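Before we move on to deletes, the update mechanics from the last lecture condense into this sketch; the replacement title is just a placeholder, and the network calls are guarded so the script runs harmlessly without a live server.

```shell
ES=127.0.0.1:9200

# Partial update: only the fields under "doc" change. Elasticsearch writes
# an entirely new copy of the document with _version bumped, and marks the
# old copy for deletion.
UPDATE='{"doc":{"title":"Interstellar, woo!"}}'

if curl -s "$ES" >/dev/null 2>&1; then
  curl -H "Content-Type: application/json" \
       -XPOST "$ES/movies/movie/109487/_update?pretty" -d "$UPDATE"
  # Fetching it again shows the new title and the incremented _version.
  curl -H "Content-Type: application/json" \
       -XGET "$ES/movies/movie/109487?pretty"
fi
```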
So for completion, we need to talk about deleting documents, and it really could not be easier. It turns out there's a REST verb for DELETE; all we have to do is use it. So the command would look something like this, if you know the document ID of the document you want to delete: just DELETE, and then the host name that we're talking to for Elasticsearch, followed by the index name — movies in this example — followed by the type, which is movie in our example, and then the document ID that you want to delete. So let's say you wanted to delete document ID 58559; that's what the command would look like. It's just that simple. So, quickest lecture in the world. Let's go try it out in practice. All right, let's delete something. In the real world, in practice, you'll probably need to do this in a couple of steps. First you need an actual document ID to delete, so you might have to do a search first to get the document ID that you want to get rid of. Let's delete The Dark Knight, shall we? It's a great movie, but, you know, we gotta do what we gotta do. Let's start by figuring out what the document ID for The Dark Knight is. So let's do a quick search query here: curl -XGET 127.0.0.1:9200/movies — we're searching the movies index — /_search?q=dark. We'll talk about this search format later on in the course, but it's basically a quick one-liner way of saying I want to search for a specific term. Q stands for query, so q=dark means that I'm doing a query for the term "dark", and this is what we get back. So if we parse it out, we can see the ID sitting in there: 58559, for The Dark Knight. So now we have our movie ID, and we can just delete it. Let's go ahead and say curl -XDELETE 127.0.0.1:9200/movies/movie/58559, and you can see that all the information we needed to do this deletion request was there in our previous search response. Let's add a ?pretty so we can see the results in a nicer format. Sure enough, we got back a successful deletion. It says that we did in fact find that document, and the result is "deleted" — it did successfully delete The Dark Knight. So if we issue that search query again, it should be gone. And sure enough, it is. You can see that while our query itself was a valid query, it returned zero hits, so there are no results in our database at this point that contain the search term "dark". Now, remember again that when we mark something for deletion in Elasticsearch, it doesn't really get deleted immediately. There's kind of a separate sweep that happens behind the scenes that actually cleans things up periodically. But for all intents and purposes, The Dark Knight is gone for us now. And that's all there is to it. That's how deletion works in Elasticsearch — it's just that easy. 16. [Exercise] Insert, Update, and Delete a Fictitious Movie: All right, it's time for you to fly without a net. I have an exercise for you, and all you should have to do is go back and refer to the last few lectures that we did. All I want you to do is make up some fictitious movie — make up a movie about yourself, or about your dog, or your goldfish, or whatever you want. Give it a fake name, and come up with some genres for it and a release year, and go ahead and insert that as a new document into the movies index that we already created. After you're done, I want you to do a search and make sure that it's really in there, and then do a partial update: change something in that movie. Change the genres, or change the title a little bit, or change the release year — whatever you want, just change something. Do another GET and make sure that change actually took effect. And then finally, when you're done, delete that movie. Clean it up, get it out of there, and again make sure that it's really gone.
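If you want something to check your work against afterward, the whole exercise can be sketched as one sequence like this — one possible solution, where the title and ID are made up, and the calls are guarded so the script is harmless without a live server.

```shell
ES=127.0.0.1:9200
ID=200000   # any unused ID works; 200000 is safely outside the data set

CREATE='{"title":"My Made-Up Movie","genre":["Documentary"],"year":2017}'
UPDATE='{"doc":{"genre":["Documentary","Comedy"]}}'

if curl -s "$ES" >/dev/null 2>&1; then
  # Create, read, update, read, delete, read -- the full CRUD cycle.
  curl -H "Content-Type: application/json" -XPUT    "$ES/movies/movie/$ID?pretty" -d "$CREATE"
  curl -H "Content-Type: application/json" -XGET    "$ES/movies/movie/$ID?pretty"
  curl -H "Content-Type: application/json" -XPOST   "$ES/movies/movie/$ID/_update?pretty" -d "$UPDATE"
  curl -H "Content-Type: application/json" -XDELETE "$ES/movies/movie/$ID?pretty"
  curl -H "Content-Type: application/json" -XGET    "$ES/movies/movie/$ID?pretty"
fi
```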
So again, all you should have to do is go back to the previous lectures to figure out the format for doing this sort of thing; it should be pretty straightforward. And if you need to come up with some fake document ID, just pick some really big number, like 200000 — that will be fine. All right, so go hit pause and give that a shot. When we come back, I'll show you how I did it, so you can compare what you did to what I did. So let's go ahead and create, update, and delete a document, just like I challenged you to do. I hope you had some success there. That's often known as a CRUD operation — create, read, update, delete — kind of a fun little shorthand for that. So let's make up a movie and insert it. Let's see how I'd actually go about doing that. Let's say curl -XPUT, because I'm putting something into my index, then 127.0.0.1:9200/movies — going to the movies index — and the movie type, that's what we're using. And I'm going to make up an ID of 200000, and I will say I want the response in pretty format, and provide the following JSON body for the data that I'm inserting. Start with a curly bracket; for the title, we'll call this movie Franks Adventures with Elasticsearch. Note that I'm avoiding punctuation there — you need to specially escape unusual characters, and that's why I didn't use an apostrophe. Hopefully you didn't run into that issue yourself, but that's a different story. For genres, let's call it a documentary — remember, genres takes a list, so that needs to be in square brackets — and we'll give it a release year of 2017. All right, so it looks like it went in successfully; we have a creation result. Looks good to me. Let's do a GET just to make sure that it's still in there. Sure enough, it came back. Awesome. So now let's change something. Let's change the genres — let's make it not just a documentary but also a comedy. Hopefully it doesn't end up that way, but let's do it. curl -XPOST is how we do an update.
We hit 127.0.0.1:9200/movies/movie/200000/_update, meaning we want to do a partial update of this document, with the following request body: doc, genres, and we'll make that documentary and comedy this time. Close that off. Looks like it was successful. Let's do that GET again, just to see what's in there now. So I'm just using the up arrow to go back to my previous command here, my earlier search. And sure enough, my genres have changed to documentary and comedy, and you can see that my version number has been incremented to version 2. So it looks like our update worked. Awesome. Let's delete it, now that we're done. To do that, I can just say curl -XDELETE 127.0.0.1:9200/movies/movie/200000, because I already know the ID for that document. We'll make it a pretty response, and sure enough, it's gone. If I do that request again to GET it, it's not there anymore. So there you have it. We've done a complete CRUD cycle — create, update, and delete — every type of operation you can do at a very fundamental level in Elasticsearch. So that's the basics right there. Pretty cool stuff. I hope you had some success with that exercise; if not, please go back and try it again, now that you've seen how I've done it, for additional reference. With that, let's move on to some more advanced topics. 17. Dealing with Concurrency: So when you're dealing with distributed systems, sometimes you can run into really weird problems with concurrency. What happens when two different clients are trying to do the same thing at the same time? Who wins? These are concurrency issues, and let's talk about how to deal with them in Elasticsearch. So here's the problem. Let's say that we have two different clients — say they're running a web application on a big distributed website, for example, and they're maintaining view counts for given documents that are viewed; we'll call those documents pages on our website.
So let's imagine that two people are viewing the same page at the same time on our website, through two different web servers. So basically, at this point, I have two different clients for Elasticsearch, each retrieving a view count for a page from Elasticsearch. And since they're both asking at the same time, each is trying to get the current count for that page; let's say it comes back with the number 10 — ten people have looked at this page so far. Now both of these clients want to increment that at the same time, so they're each going to go ahead and increment the view count for that page, and figure out: I want to write a new update for this document that has a view count of 11. And they will both, in turn, write a new document update with a new view count of 11. But it should have been 12! So you see, there was this brief window of time between retrieving the current view count for the page and writing the new view count for the page, during which things went wrong due to this concurrency issue. And this is a very real problem: if you have a lot of people hitting your website, or hitting your Elasticsearch server or cluster at the same time, this sort of weirdness can happen. So what do we do about it? Well, there's a solution called optimistic concurrency control. Let's walk through how this would work. Remember that _version field that we talked about when we talked about updates? This is where it comes in handy.
So now, when these clients say they want to write a new value for that page count, they can specify that they're basing it on what they saw in version 9. So when you do an update, you can specify the version that you're updating. What would happen then, if two people try to update the same version, is that only one of them would succeed. Let's say the first one actually successfully wrote the count of 11 given version number 9. The other one would try to say, OK, I want to update this document explicitly at version number 9, and Elasticsearch could then tell: hey, my current version is actually 10, not 9; something's wrong here, you're basing this update on the wrong information. And at that point you can try again on that particular client: it would just go back and re-acquire the current view count for that page — start over, basically — and then get back version 10 of that document, which contains 11, and it could then increment that to 12 and write it again, hopefully successfully. Now, you don't have to do this by hand, necessarily. There is another parameter called retry_on_conflict, when you do an update, that will allow you to automatically retry if this happens. That's kind of a nifty feature. So that's optimistic concurrency control. You might want to stare at this slide for a little bit if it doesn't make sense. Again, the idea is that if you have many web servers or many clients that are trying to talk to Elasticsearch at once, trying to update the same document at the same time, you can use the version numbers in order to ensure that you're not stomping on each other; retry_on_conflict and using an explicit version number in your updates are ways to work around this issue. So let's just go and see how it works in practice. Let's practice using optimistic concurrency control in action. I've already spun up my virtual machine for Elasticsearch and logged into it through SSH using PuTTY on my Windows system.
So let's go ahead and retrieve the current document for Interstellar and see what the current version number is — imagine that we're trying to update Interstellar. To do that, I would do something like curl -XGET 127.0.0.1:9200/movies/movie/109487 — movies index, movie type, and the document ID is 109487 — and we'll pretty-print the results. OK, so you can see that the current version number I'm working with is 3. So what I can do is specify the version number that I'm modifying when I'm doing an update. Let's do an update: curl -XPUT 127.0.0.1:9200/movies/movie/109487 — and watch this — ?version=3. So that's telling it that I am explicitly updating version number 3, and if that's not the current version that it has as the latest one, give me an error back. OK, let's actually give it some data for the update: open curly brackets, genres IMAX and Sci-Fi, title — and we'll just update the title here to be something different from Interstellar — the release year is still 2014, and call it good. All right, so you can see that succeeded, and I have a new version number of 4 as a result of that. Now, let's say that I'm another client who was trying to do that same update at the same time. Let's try and do that same command again, where I'm saying, OK, I want to update version number 3 explicitly. So imagine this is coming from some other server somewhere — some other client who didn't get the memo that somebody else already updated this to version 4 — and you can see that I actually got a conflict error: version_conflict_engine_exception, reason: version conflict, current version [4] is different than the one provided [3].
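The version check that just rejected the second update can be modeled in a few lines of Python. This is a toy in-memory sketch, not a real Elasticsearch client — the `store` dict and `ConflictError` are made-up names for illustration — but the rule is the same one the cluster applies: a write only succeeds when the caller's version matches the stored one.

```python
# Toy model of Elasticsearch's version check: the write is rejected
# unless the caller's version matches the document's current version.
class ConflictError(Exception):
    pass

store = {"views": 10, "_version": 9}

def versioned_write(new_views, based_on_version):
    if store["_version"] != based_on_version:
        raise ConflictError(
            "current version [%d] is different than the one provided [%d]"
            % (store["_version"], based_on_version))
    store["views"] = new_views
    store["_version"] += 1

# Both clients read views=10 at version 9 and try to write 11:
versioned_write(11, based_on_version=9)      # first writer wins; version is now 10
try:
    versioned_write(11, based_on_version=9)  # second writer is rejected
    conflicted = False
except ConflictError:
    conflicted = True
```

The losing client is expected to re-read the document, see version 10 with a count of 11, and retry its increment from there.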
So you can see there that I got an error back, just like I should, because Elasticsearch has detected that I'm making an update request based on outdated information. So this is how optimistic concurrency control works: by taking note of the version number that you get back when retrieving a document, and then specifying that version number when you attempt to update that document. It's a very reliable way around this problem. Let me show you how retry_on_conflict works, to handle this automatically. For example, I could say curl -XPOST — let's do a partial update this time — 127.0.0.1:9200/movies/movie/109487/_update, and this time, ?retry_on_conflict=5, then -d, and this time we'll just update that title and nothing else, changing it back to what it was, like this. All right, so what's going on here is that I'm doing an update query. Under the hood, that's going to automatically retrieve the document that currently exists — the current version — and it's going to change it and submit a new one. So this is basically automatically doing what we did before: it's getting the current version, then it's going to change it — in this case, it's only going to change the title — and then try to put in a new copy of that document under the next version number. But because I'm saying retry_on_conflict, it will automatically implement optimistic concurrency control, and if in fact there is a conflict, it will just keep retrying: it will go back, get the current version again, and try again, until it does actually get a consistent response. So if we do have a concurrency issue, this retry_on_conflict tactic with a partial update will just automatically do the right thing. So that's very handy. Let's see if it actually works. Sure enough, it did; it successfully updated it. Let's just make sure that that's what's actually stored at this point.
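What retry_on_conflict automates can be sketched like this. Again, `store`, `read`, and `versioned_write` are toy in-memory stand-ins for illustration, not a real client; the point is the loop: on a version conflict, re-read the current document and re-apply the change, up to the retry budget.

```python
# Sketch of the retry_on_conflict loop: re-read and re-apply on conflict.
class ConflictError(Exception):
    pass

store = {"views": 10, "_version": 9}

def read():
    return dict(store)                        # fetch doc + current version

def versioned_write(new_views, based_on_version):
    if store["_version"] != based_on_version:
        raise ConflictError()
    store["views"] = new_views
    store["_version"] += 1

def increment_views(retry_on_conflict=5):
    for _ in range(retry_on_conflict + 1):
        current = read()                      # get the latest version
        try:
            versioned_write(current["views"] + 1, current["_version"])
            return True
        except ConflictError:
            continue                          # lost the race; try again
    return False                              # retry budget exhausted

ok = increment_views()
```

If the loop exhausts its budget and returns False, that's the point where you'd surface an error to the caller, as discussed below.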
curl -XGET 127.0.0.1:9200/movies/movie/109487?pretty — and sure enough, our title is back to what it was, Interstellar, and everything looks great. So that's optimistic concurrency control in action. You can either do it yourself — by getting a document, taking note of the version, and including it with your update — or you can use partial updates, which do it all for you, with the retry_on_conflict parameter set; the value of retry_on_conflict is how many times it will retry before finally giving up. So, you know, if it actually did five retries in that previous example and still had a problem, I might want to raise some sort of an issue to me or the user saying something really weird is going on here. Maybe there's some sort of attack against my website where everyone's trying to hit one document at the same time, or something — who knows? But that's it in action: optimistic concurrency control. You can see it's not really that hard, and it's a very useful tool for managing weird concurrency issues on your Elasticsearch cluster. 18. Using Analyzers and Tokenizers: We've talked a couple of times in this course about analyzers, so let's go into a little more depth about them and how they can help you control how your text fields are searched. So something new in Elasticsearch 6 is that you need to make a decision every time you have a field that contains textual information. You can define it as one of two types, and that determines whether your text fields are searched as an exact match, or whether they're analyzed, meaning they can be partial matches that are ranked by relevance. Now, if you do want a text field to be an exact match only, you use what's called a keyword type when you're defining the mapping for your index, and that will suppress analysis on that text field; it will only allow exact matches on a keyword field.
So by defining a keyword mapping on a text field, the only results you're going to get on a search of that field will be exact matches — case-sensitive and all — on what's in that field. Now, if you do want partial matches, then you need to choose the text type instead, and that will allow what we call analyzers to run on that field. You can specify which analyzers you want to run on that field if you are using the text type, and there are different analyzers for specific languages, for example. They can do things like make your search results case-insensitive; they can handle stemming — for example, dropping the -ing, -ed, or -s suffixes on your words and making those all equivalent from a search standpoint; they remove stop words like "a", "and", and "the"; they can even apply synonyms and have those treated equally in a search. Also, on an analyzed field, a search with multiple terms doesn't need to match all of them: if just one of those terms matches what's in a text field, it will come back as a potential result — maybe with a lower relevance score, however. So let's dive in and see how this all works. It's best illustrated with an example, so let's just mess around with analyzers and some searches to illustrate how full-text search works, and how you can modify things using your mappings and your analyzer settings to get the results that you really want. This is important to understand because, again, you need to set up these mappings before you index any data, so it's stuff you have to think about up front. Let's start by just doing a search for Star Trek on the data that we've already indexed for movies. So let's do something like this: curl -XGET 127.0.0.1:9200/movies/movie/_search — movies index, movie type, and _search means we want to do a search — with pretty results, followed by a JSON search request body: open curly bracket.
Let's do a query where we do a match on the title Star Trek, close off those curly brackets, and see what we get back. Let's scroll back up here. You can see that I got some results: we got back Star Wars as the top result — that's a little bit interesting, isn't it? — followed by Star Trek Beyond. So why did I get Star Wars for a search for Star Trek? In fact, Star Wars came back with a higher relevance than Star Trek. How is that even possible, when I was searching for Star Trek explicitly? Well, this is as good a time as any to talk about an issue where, if you have too many shards and not enough documents, things don't really work the way you would expect. The problem is that, in practice, the inverse document frequency is only computed per shard. So what probably happened here is that Star Wars is stored on one shard and Star Trek is stored on a different shard; they have different inverse document frequency scores for the term "star", which ended up giving Star Wars a higher boost than it would have gotten if everything was in one shard. Now, in practice this won't really be a big issue if you have a large enough corpus of documents in your Elasticsearch cluster; another potential workaround would be to set up only one shard on small data sets like this. But bottom line: don't get too hung up on this right now. The point I'm trying to make, though, is that even though I searched for Star Trek, I got back Star Trek and Star Wars, because both contain the term "star", so they're both considered legitimate responses to the search query. Maybe that's what you want; maybe it's not. But you have to remember that that's how analyzers work: they're going to try to give you back anything that even remotely matches what you're searching for, and they will try to give it back to you in relevance order. Sometimes they get it right, and as we've seen here, sometimes they don't.
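The per-shard IDF effect can be illustrated with a little arithmetic. This sketch uses a simplified idf = ln(N/df); Lucene's actual scoring formula is different, but the direction of the effect is the same, and the shard contents below are made up for illustration.

```python
import math

# Simplified inverse document frequency: rarer terms score higher.
def idf(total_docs, docs_containing_term):
    return math.log(total_docs / docs_containing_term)

# Suppose "star" appears in 1 of 3 docs on shard A, but 2 of 3 on shard B.
idf_shard_a = idf(3, 1)  # "star" looks rare on shard A -> bigger boost
idf_shard_b = idf(3, 2)  # "star" looks common on shard B -> smaller boost

# Scored globally (one shard, or a big enough corpus), there is a single
# consistent value, so neither document gets an artificial advantage:
idf_global = idf(6, 3)
```

Because each shard scores "star" against only its own documents, the document on shard A gets a bigger boost than it deserves — which is exactly how Star Wars could outrank Star Trek here.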
So if that's behavior you're willing to live with, that's cool. If not, then you need to think about how you structure your queries and how you analyze your data when you're indexing it. OK, let's look at another example: curl -XGET 127.0.0.1:9200/movies/movie, and we will search for the following — let's look for a genre of "sci" and see if that actually works. We're actually going to do a phrase match here, so you would think that only a search query of "sci-fi" would even work. But in fact, it gave me back Sci-Fi movies, even though I only said "sci" and I asked for a phrase search. Again, it's because the default analyzer is being used for the genre field. So even though I'm saying match_phrase, the field has been split up into individual search terms in the inverted index, so it doesn't matter. Even though I'm trying to search for the phrase "sci", "sci" itself is being treated as a phrase, and it's being used as a search term. So because the "phrase", quote unquote, "sci" exists within Sci-Fi, these are all coming back as valid responses — all the Sci-Fi movies. All right, so again, even if you're saying match_phrase, that's not the same thing as an exact match on the field. If what you really want is an exact match, you need to disable analyzers on that field. So let's do that. Let's start by blowing away this whole index and starting over. OK, so curl -XDELETE 127.0.0.1:9200/movies. Again, you cannot remap things after they've been indexed, so if you do need to change your mapping, this is what you have to do: unfortunately, you have to blow it away and start over. So now that we've blown away the movies index, let's define a new mapping for it. We can do that thusly: curl -XPUT 127.0.0.1:9200/movies -d, single quote, open curly bracket, defining mappings for this index — specifically for the movie type — with the following properties.
All right, so as before, we'll define the movie ID as a type of integer, and the year will remain of type date — and now things get a little bit different. We're going to take the genre field and define it as a type of keyword. By defining it as a keyword type, that means we must have an exact match on the genre field. So the word "sci" will no longer match against the genre Sci-Fi; only a search for Sci-Fi — with a capital S and a capital F — will actually return a positive hit for the Sci-Fi genre, now that it is a keyword type. In contrast, for the title, we still want relevancy scoring on title searches, so we're going to define that as a text field, meaning that it can do partial matches and return results in order of relevancy that might not be exact matches. And furthermore, we can define an analyzer that is language-specific. Let's say it's English that we're assuming our titles are in; in the real world you could have foreign films, and that might not always be the right thing to do, but just for illustrative purposes we'll do this. That means English-language-specific things — like how words are stemmed, or how synonyms are applied — can be applied to searches on the title field. Now we just need to close off all those brackets — 1, 2, 3, 4 — and close off the quote. Sure enough, it worked: it came back with a true acknowledgement, and everything looks good. So now we can re-import the movies data thusly: curl -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @movies.json. Cool. All right, so those are all re-indexed with the new mapping. At this point, we should have an English analyzer being applied to the titles, and no analyzer at all for the genre. So let's see how the behavior changes. Let's do that same query we did before for "sci": curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d, quote, and set up a query thusly: we're going to search for the genre
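For reference, the new mapping can be written out as a Python dict and serialized to the JSON body that curl sends. The field names here are assumed to match the course's MovieLens mapping; genre becomes an exact-match keyword, while title stays analyzed text with the English analyzer.

```python
import json

# The remapped movies index: keyword genre, English-analyzed title.
# You'd PUT this body to 127.0.0.1:9200/movies before re-importing data.
mapping = {
    "mappings": {
        "movie": {
            "properties": {
                "id": {"type": "integer"},
                "year": {"type": "date"},
                "genre": {"type": "keyword"},
                "title": {"type": "text", "analyzer": "english"},
            }
        }
    }
}

body = json.dumps(mapping)
```

Remember that this has to be in place before any documents are indexed — mappings can't be changed on an existing index.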
"sci", like we did before — and this time it doesn't return anything, because we did not analyze the genre field at all. Nothing but an exact match is going to work now, which is kind of what we wanted; the "sci" genre doesn't exist, so why should I get results for it? Let's try making that "sci-fi" instead — hit the up arrow and go back here. Hey, that didn't work either. Why is that? Well, remember, it has to be an exact match. Even lowercasing isn't getting applied, because there's no analyzer at all. What I need to search for is actually Sci-Fi, with the proper capitalization, and now I get some results back. So there you have the behavior of a non-analyzed field. Sometimes it's what you want; sometimes it isn't. The thing to keep in mind is that when you're dealing with full-text data, you don't always want to be doing full-text searches on it. Some strings just aren't meant to be treated as sentences or searchable fields, so you have to think about the nature of your data and how you're going to use it when you're setting up these mappings. Let's do another search for Star Wars, just for fun: curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d, quote, and we'll do a query with a match of title Star Wars. And you can see that, now that we have an English analyzer applied to the title field, I'm getting back anything that remotely matches Star Wars, because it is an analyzed field. That means I'm getting back Star Trek in addition to Star Wars in my final results here. So that is what I would expect. I hope that helps you understand how analyzers work. Just to recap: if you have a text field and you want it to be an exact match only, you should define it as a keyword mapping. Otherwise, define it as a text mapping, in which case it will do partial matches and return results to you based on relevance.
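The two behaviors we just saw can be captured in a toy simulation. A keyword field only matches the stored value byte-for-byte, while an analyzed text field is tokenized and lowercased first. Real analyzers also stem, drop stop words, and apply synonyms; this sketch only lowercases and splits on non-alphanumeric characters, roughly like the standard tokenizer.

```python
import re

def keyword_match(stored, query):
    return stored == query              # exact, case-sensitive

def analyze(text):
    # Crude stand-in for an analyzer: lowercase, split into terms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def analyzed_match(stored, query):
    return bool(analyze(stored) & analyze(query))

# Keyword-mapped genre, as in the second half of this lesson:
keyword_match("Sci-Fi", "sci")      # no: not an exact match
keyword_match("Sci-Fi", "sci-fi")   # no: even the case has to match
keyword_match("Sci-Fi", "Sci-Fi")   # yes

# Analyzed fields, as in the first half: "sci" matches Sci-Fi, and
# "star trek" matches "Star Wars" on the shared term "star".
analyzed_match("Sci-Fi", "sci")
analyzed_match("Star Wars", "star trek")
```

That overlap on a single shared term is also why any multi-term search on an analyzed field can return partial matches at a lower relevance score.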
You have to think about how these fields are going to be used, and what the appropriate behavior is, when you're setting up your mapping — prior to actually indexing your data. So I hope it all makes a little more sense as to how analyzers work now. 19. Data Modeling with Elasticsearch: Next, let's talk about data modeling in Elasticsearch. You may have heard that in a lot of distributed systems — like Cassandra, MongoDB, and such — the usual advice is to denormalize your data. So if you're coming from a relational database background, where different types of information are tied together through relations, that may seem counterintuitive. Now, with Elasticsearch, you can do it either way; it's up to you, whatever makes sense. Let me talk about some of the trade-offs of normalizing and denormalizing your data in Elasticsearch, and also how to represent parent-child relationships, which you can also do. So this is sort of the traditional relational-database way of modeling data: normalized data. Let's take the example of looking up a movie rating from the MovieLens data set. Now, you might remember that the ratings data itself only consists of a user ID, a movie ID, a rating, and a timestamp. So let's say I want to retrieve a rating for a given user through Elasticsearch, and I have a rating type. What I'm going to get back from that initial query for a given rating is the user ID, the movie ID, and the rating. But what I want to display to the user is not the movie ID but the title of that movie — because, you know, the number 6733 doesn't mean anything, but the title Interstellar might, right? (I just made up that number; don't check me on that.) So I would have to do a second request to look up the actual title for the given movie ID, using the movie type in Elasticsearch. So this is a normalized approach, where I'm storing ratings and movie names independently, because I don't want to make a copy of the movie name in every single rating entry.
There are a lot of ratings out there and not a lot of movies, so this is a much more compact way of storing information: I only need to store all the movie titles once, but it requires two hops to get all the information that I need. I need to issue two queries instead of one to get back the combination of the rating information and the movie title that the rating was for. Another benefit of this is that it makes it very easy to update or change movie titles — not that that would ever happen, of course. So you need to think about these considerations: is that really likely to be a problem? But it does mean that if I did want to change a movie title, I could do it in one entry — one single document in the movie type — and that would then be reflected through my entire system automatically. So that would be a valid reason to stick with the normalized approach, if you are willing to deal with those two hops instead of one. But you've got to think about the fact that it's going to potentially double the traffic on your cluster for all of these requests. And you really shouldn't get too hung up on the storage-space aspect of it. Yes, it's more efficient to only store those titles once, but remember, you're dealing with entire clusters of computers here; every machine in the cluster has some ridiculous amount of memory by historical standards, and storage is very cheap these days. In this day and age, you should not be optimizing to minimize storage usage when you're talking about clusters of machines. It's more about: do I want the ability to easily change data? Do I have the capacity to deal with the increased traffic that doing these two hops will take? And can I tolerate the additional latency required to do two transactions instead of one — transactions that depend on each other, mind you — to get back the information that I need?
Sometimes you care a lot about page response time, or the response time of your application, and you really would prefer to get that data back in one transaction instead of two. If that describes your situation, a denormalized approach would make more sense for you. In this example, I would actually import the title into every single rating record. So instead of just having a user ID, a movie ID, and a rating, I would also have the movie title in there as well. When I retrieve a given rating record, I get back the title as part of it. That means that, yes, I will need to copy the movie title into every single rating for that movie. That is a big waste of space — but who cares? Space is cheap now. And now, instead of doing two queries, I only have to do a single query to get all the information I need. The real downside of this is that if I needed to change a movie title for some reason, it would be very difficult to do: I'd have to go through and iterate through every single rating, looking for that title and changing it where necessary. It's not impossible — it can be done — so it really shouldn't be a showstopper. And when you're dealing with things that don't change, like the title of a movie, it's not really a big deal to begin with, right? You're talking about maybe cases where there was a typo or something in your original data. Another thing that we can represent in a structured manner is parent-child relationships, and we're going to look at a real example of this in a moment. So let's say I want to model the relationship between different kinds of entities; one example might be movie franchises and the movies that make up each franchise. For example, the Star Wars franchise consists of the movies:
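The normalized-versus-denormalized trade-off can be sketched in a few lines. Here plain dicts stand in for the two Elasticsearch types, and the values are made up for illustration; the point is just that normalized data needs a second lookup for the title, while denormalized data answers in one hop.

```python
# "Movie type": titles stored once, keyed by movie ID.
movies = {6733: {"title": "Interstellar"}}

# Normalized: a rating stores only the movie_id.
normalized_rating = {"user_id": 1, "movie_id": 6733, "rating": 5.0}

def title_for(rating):
    # The second hop: resolve the movie ID to a displayable title.
    return movies[rating["movie_id"]]["title"]

# Denormalized: the title is copied into every rating, so one hop suffices
# at the cost of duplicated storage and harder title updates.
denormalized_rating = {"user_id": 1, "movie_id": 6733, "rating": 5.0,
                       "title": "Interstellar"}

two_hop_title = title_for(normalized_rating)
one_hop_title = denormalized_rating["title"]
```

Both paths produce the same answer; what differs is the number of round trips to the cluster, and how painful a title change would be later.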
A New Hope, The Empire Strikes Back, Return of the Jedi, The Force Awakens, and whatever new movies come out to continue that saga — we'll just pretend those other three movies don't exist, and Rogue One is kind of its own side story, so that doesn't really count. Yeah, I'm that much of a geek. Anyway, you can model this sort of thing in Elasticsearch as well; it has built-in capabilities for modeling parent-child relationships. That's kind of a cool thing. So let's dive in and see an example of this in action, and we'll actually set it up. Let's make that last slide a reality and actually show you how to set up parent-child relationships in Elasticsearch 6 — it's done differently than in Elasticsearch 5, by the way. In Elasticsearch 6, the way it works is that we have a new type of field called a join field, and this field is used differently in parent and child records. In a parent, you define what kind of parent it is, and in a child, you define what child type it is and also who the parent is. Let's see it in action. Before we create our index for these movie series and the films that are in them, we first need to create a mapping for it. So let's see how that works. Let's say curl -XPUT 127.0.0.1:9200/series — "series" is the name of the index that we're going to create — then -d, quote, curly bracket. We'll create the following mappings: we're going to create a movie data type, and each movie will have the following properties. It will contain a field called film_to_franchise, and this film_to_franchise field will be of type join. This is the field that we're going to use to tie together parents and children, and specifically we will define the following relations: the parent type is franchise, and the child type is film. OK, so that's what that syntax means: we're defining a film_to_franchise field of type join.
And that is a join that defines the relationship between a parent of franchise and a child of film. Let's just close off those brackets — 1, 2, 3, 4 — and we're good. So now we can actually import some data. Let's get some data to import: wget http://media.sundog-soft.com/es6/series.json. Now let's take a look at what's in there: less series.json. There are a few things to talk about here. First of all, you'll notice this routing: 1 parameter on all of our creation events within our bulk-load API. That's because, in Elasticsearch 6, parents and their children all need to be indexed within the same shard. So we're forcing everything to be indexed into shard number 1 for processing here; that's what that's all about. This is a limitation of parent-child relationships in Elasticsearch 6, and in addition to the performance hit of actually maintaining these parent-child relationships, it's also something a little bit weird about them. So you only want to use these things if you really have to. You'll see that in the first entry we're defining here, we're setting the film_to_franchise field to the name franchise. So this is where we're defining the parent itself, Star Wars. This means we're defining a new franchise — which is our parent type — titled Star Wars. So this first line creates the Star Wars parent. Then we create all the individual films that are children of that parent. For example, here we're adding Episode IV, A New Hope, and the film_to_franchise join field there is set to a name of film, indicating that this is a film child, and its parent is ID number 1. So again, look back to the Star Wars parent that we defined: it had an ID of 1. As we define each child, we indicate in the film_to_franchise join field that this is a film — which is the child type that we defined in our mapping — and that its parent is parent ID 1.
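The join mapping and a pair of example documents can be written out as Python dicts, serialized to the JSON bodies curl sends. The parent and child titles come from this lesson; the exact document IDs used in series.json aren't shown here, so treat the parent ID "1" as the value assigned to the Star Wars parent.

```python
import json

# Mapping for the series index: film_to_franchise is a join field where
# franchise is the parent relation and film is the child relation.
mapping = {
    "mappings": {
        "movie": {
            "properties": {
                "film_to_franchise": {
                    "type": "join",
                    "relations": {"franchise": "film"},
                }
            }
        }
    }
}

# Parent document: the franchise itself (indexed with routing=1, since
# Elasticsearch 6 requires parents and children to share a shard).
parent_doc = {"title": "Star Wars",
              "film_to_franchise": {"name": "franchise"}}

# Child document: one film, pointing back at the parent's ID.
child_doc = {"title": "Star Wars: Episode IV - A New Hope",
             "film_to_franchise": {"name": "film", "parent": "1"}}

body = json.dumps(mapping)
```

Note how the same join field carries different payloads: just a name on the parent, but a name plus a parent ID on each child.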
So we do that for every individual film, all the way down to the last entry. Well, almost — I don't actually have The Last Jedi in here, because I made this data set before that film came out, but we'll just gloss over that, shall we? So now that we have this, we can actually index it. Let's say curl -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @series.json, and that successfully imported all of our parent-child relationships between the Star Wars movie franchise and the films within it. So let's do a little query and see if it actually worked, shall we? Now that everything's been imported, let's actually do a query. Let's start by getting all of the films that are part of the Star Wars series. Here's what a query like that would look like. We'll say curl -XGET 127.0.0.1:9200/series/movie/_search?pretty -d, quote. All right, so we're going to do a query, and here's how we search for anything with a given parent: we can say has_parent, with a parent_type of franchise. So we're searching for things that have a parent type of franchise, and we'll put a query within this: we'll say query, and we'll do a match query where we're looking for a title of Star Wars. All right, close all those off — 1, 2, 3. So this is saying that we're doing a query for documents that have a parent type of franchise, where that parent matches a title of Star Wars. Close off those brackets and see what happens. Hey, I think it worked. Let's scroll up. We have all of the Star Wars movies — minus The Last Jedi and whatever comes after it, just due to the age of the data set — but it seems to work. So that was how we got a successful query for all of the movies within that franchise, or all of the children within the Star Wars franchise parent.
We can go in the other direction as well, and find the parent for a given child. So let's find the series name for The Force Awakens, as another example. The syntax is similar: curl -XGET 127.0.0.1:9200/series/movie/_search?pretty -d, quote — series is our index, movie is our data type — and we'll issue the following query. This time we're going to say has_child, as opposed to has_parent, with a type of film, and our query will be a match of title The Force Awakens. Close off all those brackets — 1, 2, 3, 4, 5 — and let's take a look at what we're telling it to do here. We're issuing a query for any document that has a child of type film, where the child matches a title of The Force Awakens. So it's going to find any parents that have a child with a title of The Force Awakens, and we should get back Star Wars — as we did. So there you have it, going both ways: that's how you set up a parent-child relationship in Elasticsearch, using join fields to tie them together, and using the has_parent and has_child query types to actually query across parent-child relationships. So that's cool stuff. Again, there are trade-offs involved in doing this. There is a performance hit to maintaining those parent-child relationships, and queries that span the parent-child boundary can be expensive; furthermore, you need to make sure that you index them all into the same shard when you're importing parents and their children. So there are lots of restrictions and caveats associated with parent-child in Elasticsearch. But sometimes you've got to do it, and if you run into one of those times, that's how. Hope it's helpful. Congrats — we've covered the basics of using Elasticsearch: importing data through JSON and REST, and inserting, updating, and deleting documents. We've also covered a few more advanced subjects, like dealing with concurrency, analyzers, tokenizers, and data modeling. Keep going.
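For reference, the two query bodies from this lesson look like this as Python dicts — children by parent title first, then the parent of a given child:

```python
import json

# Children of the parent whose title matches Star Wars:
films_in_franchise = {
    "query": {
        "has_parent": {
            "parent_type": "franchise",
            "query": {"match": {"title": "Star Wars"}},
        }
    }
}

# Parent (franchise) of the child whose title matches The Force Awakens:
franchise_of_film = {
    "query": {
        "has_child": {
            "type": "film",
            "query": {"match": {"title": "The Force Awakens"}},
        }
    }
}

# Either dict, serialized, is the -d body of the corresponding curl call.
body = json.dumps(films_in_franchise)
```

Notice the asymmetry: has_parent names the parent relation with parent_type, while has_child names the child relation with type.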
You've learned a lot, but there is still more essential information you need in order to put Elasticsearch to work for you or for your organization. See you in the next section. 20. Using Query-String Search: In this next section we'll focus on search — I mean, that's kind of what Elasticsearch was made for. There are so many ways to use Elasticsearch to help your customers find the data they need quickly. We'll cover the different search interfaces you can use; how to sort, paginate, and filter your search results; how to conduct fuzzy queries and partial matches; and even how to implement those search-as-you-type features that you see on many big websites these days. There's a lot of interesting, useful, and valuable information in this section, and as before, it's all hands-on. Let's get started. Let's talk about query-string search, which is a handy little shortcut you can sometimes use for experimenting around with Elasticsearch. In the past, throughout this course, we've been using full-body JSON requests for structuring our search requests, and that's usually the way you want to do it — that's kind of the right way to do it. But if you're just messing around and experimenting, there is a shortcut. It sometimes goes by the name of "query lite", and it looks like this: you can actually issue a search request without any request body at all. You can squish it all into the URL, which can make life a little easier when you're just messing around with curl and stuff like that. So look at this top example: if you issued a GET request through HTTP with just the following URL, it would work. What it's saying is: I want to search the movies index, the movie type — _search means I want to do a search request — and the ?q= means I want to issue a search with the following shorthand query, where the title contains the term "star".
So that little URL, in and of itself: you could just type that into a web browser, pointed at the correct port that you're running Elasticsearch on, and it would bring you back a JSON structure of all the movies that contain the word star in the title. And it's not limited to simple little queries like that, either. It's actually very extensible and can do very complex things. So look at the second example here. That one is saying q= plus year greater than 2010, plus title trek, and this example would search for movies that both have a release year greater than the year 2010 and have trek in the title. So you can do those sorts of Boolean operations with that plus sign. And you can even do relational queries, where I mean relational in the sense of greater than or less than, not relational databases, so you can compare the year field to be greater than, less than, or equal to some value as well. And there are even more complicated things you can do. I mean, if you really take the time to understand query lite syntax, there's not a whole lot that you can't do with it. But it's not quite as great as it sounds. The problem is, when you're sending across URLs, you very often need to URL-encode them, and that can make things very confusing very quickly. So you couldn't actually reliably send across that string as-is as part of a REST request; you would have to URL-encode it in order to reliably transmit it over the Internet. And that means that any special characters need to be encoded into their hexadecimal equivalents, with something like %2B instead of a plus sign and %3E instead of a greater-than sign. So in reality, you'd end up having to type something like this, with all these percent codes in it. Now, in practice, depending on how you're sending these requests across, you might be able to get away with not encoding everything.
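To see what that encoding actually looks like, you can let a standard URL library do it for you. Here is a minimal Python sketch; nothing in it is Elasticsearch-specific, and the escape codes shown are simply what Python's urllib produces for the query string from the slide:

```python
from urllib.parse import quote

# The query lite expression from the slide:
q = "+year:>2010 +title:trek"

# quote() replaces each reserved character with its hexadecimal
# escape: '+' -> %2B, ':' -> %3A, '>' -> %3E, ' ' -> %20.
print(quote(q))
# %2Byear%3A%3E2010%20%2Btitle%3Atrek
```

This is exactly the kind of transformation you would otherwise have to do by hand before pasting the query into a URL.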
But for a query like this, there are things you will have to URL-encode, and that kind of starts to take away the readability, right? So the practical value of using this as a quick one-off shortcut gets diminished when you start talking about URL encoding and the need to deal with that. There are also some other reasons why you should not use this little shortcut in production, for sure. First of all, these query strings can get pretty cryptic and tough to debug. So while it is powerful, and you can cram just about any kind of query you want onto that URL parameter, it gets pretty wacky pretty quickly. You're always better off having a structured JSON request where you can see what's going on in a more structured manner, for lack of a better word. It can also be a security issue. If you're actually allowing end users to input these URLs somehow, well, you never want to give users the ability to just send arbitrary data to your server, right? That could be a dangerous thing. An end user could very easily craft a search query string on that URL that does some incredibly intensive operation that brings down your cluster. So you definitely want to make sure that you don't open this up to end users. And it's also fragile. Again, these parameters can get very cryptic very quickly, and one wrong character and you're hosed; it's tough to figure out what's going on. That's related to the first point of it just being cryptic and tough to debug. But if you're just doing quick one-off experimenting, or verifying that a search term exists, or something like that, for simple, quick one-off things, it can come in handy. So let's take a look at how it actually works. If you want to learn more, search for URI search; that's what it's formally known as these days in Elasticsearch.
So if you go to Elasticsearch's website at elastic.co and search for URI search, you'll see the full documentation on how to construct these things. I'm not going to go into a ton of depth on it in this course, because again, you really shouldn't be using it very often. But if you do want to learn more, the information you need is there, and you can see that in addition to the q parameter, there are many more things you can do as well. You can even specify the analyzer you want, and the query strings that you send across can be very complex; there's more documentation available there. But let's just do a couple of quick experiments and see how this actually works in practice.

All right, let's mess around with query lite search, or URI search as it's known these days, and do a quick example here just to show you how it can be handy. Let's do a little search query like we saw in the slides, just searching for things that have the word star in the title. You can do that with a one-liner like this: curl -XGET, and I'm going to use a double quote here to enclose everything, because I do have some special characters, and with curl that can get you pretty far sometimes without having to URL-encode things. Then 127.0.0.1:9200/movies/movie/_search, question mark, q=title:star, and an ampersand and pretty at the end, so I get nicely formatted JSON results. And you can see that actually worked. It gave me back results of all the movies that contain star in the title: Star Trek, Star Wars, stuff like that. Pretty cool. And we did that without any request body at all. So that's a way to quickly do a search query when you're just experimenting, doing a one-off thing, without having to mess with JSON requests at all.
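If you're scripting these one-off queries rather than typing them into curl, a URL library can build and escape the query string for you. A minimal Python sketch, assuming the same local cluster at 127.0.0.1:9200 used in the lecture:

```python
from urllib.parse import urlencode

# Hypothetical local endpoint from the lecture; adjust host and
# port for your own cluster.
base = "http://127.0.0.1:9200/movies/movie/_search"

# urlencode percent-escapes reserved characters for us, so the
# query survives transmission intact (':' becomes %3A here).
url = base + "?" + urlencode({"q": "title:star", "pretty": "true"})
print(url)
# http://127.0.0.1:9200/movies/movie/_search?q=title%3Astar&pretty=true
```

Fetching that URL with any HTTP client would return the same JSON results as the curl one-liner above.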
Let's try to take it one step further, though, and run into some of the limitations of this. Let's do that other query, where we were searching for movies released newer than 2010 that also contained trek in the title. That would be something like this: curl -XGET, quote, 127.0.0.1:9200/movies/movie/_search, question mark, q=, and we're going to say plus, to indicate a Boolean AND operation here, then year, colon, greater-than (that's the syntax for doing a greater-than query), 2010, and then plus for another AND as part of our Boolean operation, title, colon, trek, and pretty. What this should do is give me back all the results that have the word trek in the title and were released after the year 2010. Okay, and again, I'm using URI search to avoid the need for a JSON body. But wait, that didn't actually work. Why did I get back Interstellar? That doesn't contain the word trek at all. Star Wars too. Basically, this did not work, and it's not at all obvious why it didn't work. Well, it turns out it's because I didn't URL-escape it, so it actually couldn't make sense of anything past that first plus sign. I ended up doing a query of everything in the entire index, because everything after that first plus sign just got tossed out; plus has to be URL-encoded, it turns out. So you can see how this is a great example of the need to URL-encode everything that gets sent across. It's also a great example of how you can get very cryptic results that are difficult to debug. These are both good reasons not to use query lite search. Let's actually URL-encode this and get around that problem. So I'll say curl -XGET 127.0.0.1:9200/movies/movie/_search, question mark, q=, and this time I'm going to URL-encode it.
That plus sign is %2B, which is just the hexadecimal code for the ASCII value of plus. You can see how this gets to be a pain in the butt real quick: %3A, %3E, 2010, then plus, %2B, title, %3A, trek, and pretty. And that time I got the results that I wanted. So, you know, this is again just to drive home the point that URI search is not as great a thing as it seems. If you compare that search to this one, you can see it's much more difficult to construct, because you have to look up the URL escape codes for all these different characters. And even worse, it's not doing what you thought it would do: it sort of silently failed on that first query and gave you back results that were unexpected. Again, these are all reasons why you should usually stick with full JSON query structures within your search requests, and we'll talk about that in more depth in our next lecture.

21. Using JSON Search: So hopefully I scared you away from using URI search in that previous lecture, but you will run across it once in a while, so it's important you know what it is and how it works. Most of the time, though, you want to stick with request body search, where you're actually sending a JSON request as part of the request body to Elasticsearch, and we've seen a bunch of examples of this already in the course. So let's just talk about it in a little more depth and take a closer look. Here's a JSON equivalent of a request body search for that same thing we did in the previous lecture, just looking for movies that contain the search term star in the title. You can see here that the beginning of the URL looks the same: it's the actual host name and port, followed by the index, followed by the type, followed by _search, and I'm sticking pretty on there to get nicely formatted results.
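As a sketch, here's that request body built programmatically, using the same movies index query structure this lecture describes; the Python here just constructs and prints the JSON string you would send to the _search endpoint:

```python
import json

# The JSON equivalent of ?q=title:star from the lecture:
# a query block containing a match clause on the title field.
body = {
    "query": {
        "match": {"title": "star"}
    }
}

# This string would go in the request body of a GET to
# http://127.0.0.1:9200/movies/movie/_search?pretty
print(json.dumps(body, indent=2))
```

Compared to the URL-encoded version, every special character lives safely inside the JSON body, so no escaping gymnastics are needed.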
By the way, in production you wouldn't necessarily need the pretty argument. If you're just dealing with the response programmatically, from an app or something, leaving it off can eliminate some overhead that you might not need otherwise. So only use pretty when you're messing around, like we are. The request body is where the magic happens for the query itself. Here you can see that we have a more structured definition of the search query: we're defining a query block, and within that query block we have a match clause, and that match clause contains the actual match operation of matching the title field to the term star. Figuring out how to structure these requests can get a little bit confusing; often you need to refer back to some examples or the documentation to get it exactly right. But at least when you look at it, it makes sense. And we generally don't have to deal with weird URL encoding, because we don't have all those weird, cryptic characters that make up URI searches. By the way, you might not have seen this before, but GET requests can have a body. Usually, when you talk about a GET request in HTTP, you're talking about just retrieving a web page from a web server, but you can actually send a body along with a GET request as well. Sometimes that trips people up, but it is a legitimate thing to do. Anyway, within a query you can have two different things: there are filters, and then there are queries. Queries are usually used for returning data in terms of relevance. So when we're doing something like searching for the search term trek, you would use a query, because you want to get back results in order of relevance, that is, how relevant the term trek was to each given document. However, if you have a binary operation where the answer is basically yes or no, you want to use a filter instead.
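To make the query-versus-filter distinction concrete, here is a sketch of a request body that combines both, mirroring the bool syntax covered in this lecture; the title and year fields are from the course's MovieLens examples:

```python
import json

# A query scores relevance; a filter answers yes/no and can be
# cached by Elasticsearch. Here: the title must match "trek"
# (relevance-scored), and the year must be >= 2010 (a filter).
body = {
    "query": {
        "bool": {
            "must": {"term": {"title": "trek"}},
            "filter": {"range": {"year": {"gte": 2010}}},
        }
    }
}
print(json.dumps(body, indent=2))
```

Because the year condition is a filter rather than part of the scored query, repeated searches with that same condition can reuse Elasticsearch's cached results.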
So filters are much more efficient than queries, because not only are they faster, but the results can be cached by Elasticsearch, so that if you do another query using the same filter, it can get the results even faster. Let's look at an example of how that might look. Let's take that second example we did using a URI search in the previous lecture and see how you would do it using a request body search instead. Here we're again structuring a query, and within that query we have a Boolean query, a bool, which means that you can combine things together. The equivalent of an AND in a bool query is called must. So by saying must, term, title, trek, we're saying that this query must contain the term trek within the title to be a valid result. But we're further going to filter that result with a range query that constrains the year to be greater than or equal to 2010. Again, the syntax is sometimes something you just have to look up, but this is what it will look like for this particular example, and this syntax has changed from version to version, even; this is sort of the new way of doing it since Elasticsearch 5. But it is easy to read. If you look at it again, you can see we have a query that contains a Boolean expression where you must have trek in the title, and you must also have the filter pass the condition of the year being greater than or equal to 2010. Those sorts of range comparisons must themselves be within a range query. So that's what that would look like. Now, there are many different kinds of filters; range is just one of them. You can also do term filters. If you need to filter by some exact value of a term, you can do that with a term filter. It would look like term, year, 2014, for example, to filter down to only things that contain a year that equals 2014. You can also do terms, if you want to match anything in a list of values. So a terms filter could say, in this example,
Anything that has a genre of sci-fi or adventure; that's what that reads out to be. We already looked at a range filter in the previous slide: if you want to do a greater-than, less-than, greater-than-or-equal, or less-than-or-equal sort of operation, that would be a range filter. There's also an exists filter, where you can just test whether a given field exists at all within a given document. Again, the schema is only loosely enforced, so you can in fact have documents that are missing fields entirely, and if you want to test whether a field exists or not, the exists filter is what you want. The opposite of exists is missing: if you only want to find documents where a given field does not exist, you can use the missing filter. And finally, there is a bool filter as well. We've looked at a must example, which is basically the equivalent of AND. There is also a must_not, which is the equivalent of the Boolean operation NOT, and there's also a should, which is the Boolean equivalent of OR. So you can do complex Boolean filters, where you can have conditions combined with AND relationships, OR relationships, NOT relationships, whatever you can imagine, using a bool filter. There are different kinds of queries as well. Just like there are different types of filters, there are different types of queries. match_all is the default and just returns everything. So if you don't specify any qu