Applied Data Science - 3 : R Programming | Kumaran Ponnambalam | Skillshare

Applied Data Science - 3 : R Programming

Kumaran Ponnambalam, Dedicated to Data Science Education

14 Lessons (2h 47m)
    • 1. About Data Science Series

      8:12
    • 2. R Studio walkaround

      6:40
    • 3. R01 Language Basics

      12:04
    • 4. R02 Vectors and Lists

      8:51
    • 5. R03 Data Frames and Matrices

      14:41
    • 6. R04 Data and Input Output Operations

      10:30
    • 7. R05 Programming and Packages

      12:41
    • 8. R07 Statistics in R

      3:01
    • 9. R08 Graphics in R

      6:51
    • 10. R Examples 01

      16:18
    • 11. R Examples 02

      15:05
    • 12. R Examples 03

      17:17
    • 13. R Examples 04

      17:29
    • 14. R Examples 05

      17:22

About This Class

This class is part of the "Applied Data Science Series" on SkillShare presented by V2 Maestros. If you wish to go through the entire curriculum, please register for all the other courses and go through them in the sequence specified.

This course focuses on R Programming. It explains the various constructs of the language and provides examples of how to use them.

Please download and install R and R Studio

R installation : http://cran.r-project.org/bin/windows/base/

R studio installation : http://www.rstudio.com/.

The resource bundle for this course can be downloaded from https://www.dropbox.com/s/ayicd007a35j9j0/Resources.zip?dl=0 

Transcripts

1. About Data Science Series: Hey, welcome to the course Applied Data Science with R. This is your instructor, Kumaran Ponnambalam, from V2 Maestros. Let's go through and understand what this course is all about. The goal of the course is to train students to become full-fledged data science practitioners. So we are focusing on making people practitioners who can execute an end-to-end data science project, right from acquiring data, all the way to transforming it, loading it into a final data destination, performing analytics on it, and finally achieving some business results from this analysis. What you get by taking this course is that you understand the concepts of data science, you understand the various stages in the life cycle of a data science project, and you develop proficiency in R and use R in all stages of analytics, right from exploratory data analytics to predictive analytics to modeling to finally doing prediction using machine learning algorithms. You learn the various data engineering tools and techniques for acquiring, cleansing and transforming data. You acquire knowledge about the different machine learning techniques and, most importantly, learn how and when you can use them. You become a full-fledged data science practitioner who can immediately contribute to real-life data science projects, not to mention that you can take this knowledge to your interviews so that you can get a position in data science. Theory versus practice: we want to touch upon this particular topic of theory versus practice. Data science principles, tools and techniques emerge from different science and engineering disciplines.
You know, they come from computer science, computer engineering, information theory, probability and statistics, artificial intelligence and so on. The theoretical study of data science focuses on the scientific foundation and reasoning of the various machine learning algorithms. It focuses on trying to understand how these machine learning algorithms work in a deep sense, and on being able to develop your own algorithms and your own implementations of these algorithms to solve real-world problems. This one dwells on a lot of equations, formulas, derivations and reasoning. Whereas the practice, or the applied part, of data science focuses on applying the tools, principles and techniques in order to solve business problems. It focuses on trying to use existing techniques, tools and libraries, and on how you can take these and apply them to real-world problems and come out with business results. This one focuses on having an adequate understanding of the concepts, a knowledge of what tools and libraries are available, and how you can use these tools and libraries to solve real-world problems. So this course is focused on the practice of data science, and that's why it's called Applied Data Science. Inclination of the course: data science is a transdisciplinary subject, and it is a complex subject. It has mainly three technical areas to focus on: there is math and statistics, there is machine learning, and there is programming. This course is oriented towards programming, towards existing software professionals. It is heavily focused on programming and solution building. It has limited, as-required exposure to math and statistics, and it covers an overview of machine learning concepts that gives you an adequate understanding of how these machine learning algorithms work. But the focus is on using the existing tools to develop real-world solutions. In fact, 90 to 95% of the work that data scientists
do in the real world is the practice of data science, not really the theory of data science. This course strives to keep things simple and very easy to understand. So we have definitely made this very simple. We have stayed away from some of the complex concepts; we have either tried to tone down these complex concepts or just stayed away from them, so that it is easy to understand for people of all levels of knowledge in the data science field. So it is kind of a beginner's course, if I may say so. Course structure: it goes through the concepts of data science to begin with: what exactly is data science, and how does data science work? It looks at the life cycle of data science with its various life cycle stages. It then goes into some basics of statistics that are required for doing data science. It then goes into R programming, with a lot of examples of how you would use R programming for the various stages in a data science project. Then comes the data engineering part: what are the things you typically do in data engineering, and what are the best practices in data engineering? It covers those areas. Finally, there is the modeling and predictive analytics part, where we delve into the machine learning algorithms. We also look at end-to-end use cases for these machine learning algorithms, and there are some advanced topics that we touch upon. Finally, there is a resource bundle that comes as a part of this course. The resource bundle basically contains all the data sets, the data files, the sample code, the example code, those kinds of things that we actually teach as a part of this course. Everything covered in the examples is given in the resource bundle. So do download the resource bundle; it has all the data and all the code samples that you need to experiment with the same things yourself.
Guidelines for students: the first thing is to understand that data science is a complex subject. It needs significant effort to understand it. So make sure that if you're getting stuck, you review and re-review the videos and exercises. Do seek help from other books, online recommendations and support forums. If you have queries or concerns, do send us a private message or do post a discussion question, and we will be really happy to respond as soon as possible. We're constantly looking to improve our courses, so any kind of feedback that you have is welcome. Please do provide feedback through private messages or emails. At the end of the course, if you do like the course, do leave a review. Reviews are helpful for other new prospective students to take this course. And do expect maximum discounts on other future courses from V2 Maestros; we want to make that easy for our students. Relationship with the other V2 Maestros courses: our courses are focused on data science related topics, basically the technologies, processes, tools and techniques of data science, and we want to make our courses as self-sufficient as possible. What that means is, if you are an existing V2 Maestros student, you may see some content and examples repeated across courses. We want to make them self-sufficient. So rather than saying at some point in the course, "OK, go look at this topic in the other course, register for the other course and learn about this," we would rather focus on this course itself and keep things in the same course. Unless that other concept is a huge concept that deserves a separate course, we want to include it as a part of this course itself. So you might see some content that is repeated across courses. Finally, we hope this course helps you to advance your career. So best of luck, happy learning, and do keep in touch. Thank you.

2. R Studio walkaround: Hi.
Welcome to this presentation on how to use R Studio. So I have here the R Studio console opened for you. What you see the moment you get into R Studio is that there are four windows, each capable of doing different things. I have actually rearranged my windows differently, and you can also do that; I will show you how. What you have first is a window in which you can look at any kind of source code. So on the top left you have the window that gives you the source code. On the top right you have the console. This is the same console you would get if you just went into basic R. What I've also done is make the font size a little bigger so that you can see things more clearly in this presentation. At the bottom you see the environment. Right now the environment is empty, but as we keep going through the various examples, you will see that it starts showing all the variables that are in memory, and you can actually explore these variables to see what is in them. Then you have options for importing data sets and clearing the environment. You also have a history tab, where you can go and look at all the commands you executed earlier. So you can revisit all the commands executed earlier, take a look at them, and maybe copy them either into the R programming window or into the console and use them. At the bottom right you have the file explorer. You can navigate the file system: you can click on folders and navigate, you can create new folders and rename folders, and most importantly you can set a specific folder as your working directory, and things like that. Then, whenever you execute a command that produces plots, the plots will show up in this Plots window.
And from here, the plot can be exported as an image or a PDF. The Packages window gives you a list of all the packages that are currently installed in your R Studio, so you can look at them, and you can use the update command to update all your packages. There is the Help window, in which you can search for help on any command that you want. Let's say I want to search for a command called mean: I type it in here, and the help for mean shows up. In the help for mean you'll see that there is the command, then the definition of all the arguments of the command, and then some references. It also has some examples at the bottom. The Viewer is used for other purposes, if you're bringing up other things; charts you will typically get in Plots. Now let's look at the top menu. There are the regular file operations, like open file, recent files and so on. Then Edit, the typical editing you can do on the file, and there is Code help in there. There is View. The more important one you want to look at is the Session menu. When R Studio is running, it is typically running two processes on your Windows system: there is a front-end process and there is a back-end R session. So here you can play around with the session and do things like that. The workspace is nothing but all your in-memory objects; they form the workspace. So if you're working on something, you can always save the workspace; all the variables and their values get saved, and then you can retrieve the same set of values and load them back if you want to. Then there is the Tools menu, which has the Global Options in there. Here is where you go and configure a lot of things. In the Code editing section, you can have some options in terms of
how you want to edit the code. Appearance is where I have increased the font size, and there are different editor themes if you want to use them, like a black background theme or a white background theme. Pane Layout is where I went and changed my setup to show the console here. Instead of Source, I can make a pane show the Console or the Environment. So there are four panes in which I can go and lay out my things. The Packages tab is where it points to the CRAN mirror, so whenever you install a package, it automatically goes there and gets the package for you. Pretty simple and straightforward. Then, if you are using Git, you can actually point to a Git or SVN repository, so the code can automatically be checked in and checked out, and things like that. So on the left side is your code file. On the right side is your console. The environment is at the bottom, and then you also see the file browser. So whenever you want to execute a command, you can go to the console and type the command. Let's say something like getwd(), get working directory. Now here you see that it immediately responds and says this is my working directory. The command is given here, and then it comes up with the output. You can also execute commands from the source window, where you can put in a piece of code and then execute it by selecting that particular line. You can select a line of code like this, and then if you hit Run here, it gets executed in the console window. So you can also execute commands like this. And the moment I execute this command, you see that at the bottom it starts building up my variable list. So I'm creating a character variable with the value "programming", and it immediately shows up at the bottom, where I can go and inspect the value if I have to. So this is the view. Typically you use your source window, and then you can source the code.
Yes, you can source it, even source it with echo. You've got all kinds of options here. And there are other things like searching the code, and then a bunch of coding tools like code completion, things like that. So this is pretty much how you would use R Studio. In the examples that we are going to see, we are going to be using R Studio for executing our code examples and also our machine learning algorithms and so on. So do play around with R Studio a lot more and try to get familiarized with all the functions that it gives you. If you have been used to other programming IDEs, this is going to be a pretty simple thing for you to learn and adapt to.

3. R01 Language Basics: Hi. Welcome to this module on the R language. This is your instructor, Kumaran, here. Let us start with an understanding of what the R language is all about. The R language was originally created as a language for statisticians, for statisticians to do a number of statistical analyses on data. It has been there for quite a number of years, but of late the R language has expanded itself to the entire data science stack, and today it supports all kinds of functions for doing any kind of data science activity, starting right from the data acquisition part all the way to transformation, analysis, prediction and all kinds of stuff. So it is definitely a full-fledged language that you can use for data science. It has a very rich set of features and a very rich set of functions. In fact, it has so many functions that there are multiple implementations of everything you want to do with it. People keep adding to the repository of functions and libraries that are available, so there is a massive library set to use. The only deficiency with R is the amount of data it can handle. R always keeps all data in memory.
That means you are limited by the amount of memory you have on your system, so that is the only limitation of R. It's only when you are using really large data sets that cannot fit into R's memory that you have to take alternative approaches: you have to write your own code to put data on disk and keep getting it back, or use another language, or use big data technologies like MapReduce and Hadoop to get your things done. Otherwise, R is such a powerful language. It's very easy to use; by writing very few lines of code, you can do a lot of things in R. So it's really powerful at handling large data sets, table data sets and even unstructured data. Note to students: the first thing I want to say here is that we assume that you have programming experience in another programming language. In fact, we assume that you have built real-life applications using another programming language. We are not going to go into explaining the basics of programming, like what are while loops, what are for loops, what are arrays, what are lists. That's not what we're going to concentrate on. Rather, we're simply going to show how those things can be achieved in the R language: what is the syntax for a for loop, what is the syntax for a while loop, what kinds of things are available. If you want a really in-depth class on R, that is going to take around 40 hours as its own class. Rather, we are going to focus on the main things here, focused mostly on how to do machine learning in R, and leave it to you to go and seek other resources to learn everything else. In fact, it's not possible for you to learn all of them in one go; you're going to learn as you use it, as use cases keep coming up. But R has an extensive set of help. There is help in R Studio.
There is help from the R website, and there's also a lot of help available on the Internet. Let's go into the basics of the language. The first thing you want to know and always be conscious about is that R has the concept of a working directory. A working directory is always assumed by R. So any time you create a new file in R, it is going to create that file in the working directory, unless you give an absolute or relative path in your file name. The same applies when you're trying to read a file: it is going to read it by default from the working directory, unless you give a relative or absolute path in the file name when you're passing the command. There are two commands, getwd() (get working directory) and setwd() (set working directory), that are used to manage your current working directory. How can you get help in R? There are a number of open source resources available for help with the R language. One of the first sites you want to go to is www.r-project.org. All kinds of help are there: downloads, libraries, documentation, everything is available on this website. So this is the first place where you might want to go and look for help and manuals. Excellent manuals are available here. When you are using the R command line, you can use the question mark command to get help on any command that is there, including functions and libraries. Any time you get a library in R and start using it, the library comes with its own help, so you can use the question mark command to get help on the library too. There are also a number of R forums available, like Nabble, Stack Overflow, Cross Validated and the R-help mailing list. There's so much help available on the net for R, and a Google search is pretty powerful with respect to R. You can just put in anything you want, like "I want to know this" or "I want to know that", and you will definitely see some help coming out.
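As a minimal sketch of the working-directory and help commands mentioned above: the scratch directory (`tempdir()`) and the file name `example.txt` are assumptions chosen for illustration, not something from the lecture.

```r
# Remember the current working directory so it can be restored afterwards.
old_dir <- getwd()

# Switch to a scratch directory (tempdir() is used here purely for illustration).
setwd(tempdir())

# A bare file name is resolved against the working directory.
writeLines("hello", "example.txt")
file_found <- file.exists(file.path(getwd(), "example.txt"))
print(file_found)

# Clean up and restore the original working directory.
invisible(file.remove("example.txt"))
setwd(old_dir)

# Help lookups, meant for interactive use (commented out so the script runs unattended):
# ?mean                    # help page for the mean() function
# help("mean")             # the same lookup written as a function call
# help(package = "stats")  # index of help pages for a whole package
```

Restoring the old directory at the end is a habit worth keeping in scripts, since `setwd()` changes state for the rest of the session.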
And R is a pretty simple and easy-to-use language, so taking help from the Internet and using it is not going to be a big deal. It's pretty easy. So before we jump into the basics of the language: the way this class is going to be structured is that we have a separate presentation to describe the basics of the language, and then you have another presentation where we show you a demo of doing the same thing with R Studio. So it will be a two-part kind of thing, with separate lectures for each of them. Variables: creating variables in R is easy and straightforward. The assignment operator that is used in R is less-than hyphen: <- is the assignment operator, which you are going to be using again and again. You can also use equal to (=); it is also allowed, but <- is the preferred way of doing things. The variable types that are supported in R are character, numeric and Boolean. So there are three variable types. Unlike many other languages, it doesn't differentiate between subtypes; numeric doesn't go into integer, float, small and those kinds of things. It just stays with these basic types. R is case sensitive, and so are the variable names. So please nail this one into your head: R is case sensitive, whether it is variable names, data, function names or library names. Everything is case sensitive. So in case you're trying to run something and it says something is not found, you're pretty much making a mistake with the case. So always be sensitive to this: R is very case sensitive, and you will keep getting into trouble again and again if you forget this one single thing. There are no explicit variable type declarations in R. You are not going to declare an int or a float and things like that. Variables take the type of the assigned value.
So you just assign any value to a variable, and the variable takes the type of that assigned value. Like you see here, with <- 3 you're assigning a value of 3 to this variable, and the variable automatically takes the value 3. Now you can reassign some other value to this variable, and it will just be reassigned. So it just happens magically; you don't have to bother about type conversions in R. Strings: the next thing you want to bother about in R is strings. Strings are enclosed in single or double quotes, like in any other language. Straightforward. The concatenation function: how do you concatenate or join strings? It is the function called c; c stands for concatenate. So here is an example of how you can concatenate two strings, programming and languages: something like string <- c("programming", "languages"). You see that this is your assignment operator, this is the function, and you just pass the parameters in there, and this value, programming languages, will be stored in this variable. Special characters can be included with the backslash, similar to a lot of other programming languages. If you want carriage returns or line feeds, use a backslash and something, and you're going to get it pretty quickly. A number of string manipulation functions are available, like the standard ones: string split, left, right, mid, string search. Everything is available here. Again, you can seek the help for the syntax of these functions. Going on to the next one, date and time: how are date and time supported in R? Date is supported as a standard format in R. The date is internally stored as a POSIX date. You can convert a character string to a date with the function as.Date. as.Date can take any date format; you can also pass explicitly what format the date is coming in, and it's going to convert a string into a date for you.
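The points on variables, strings and dates can be sketched roughly as follows. All variable names and sample values are made up for illustration. One detail worth knowing: in base R, `c()` combines its arguments into a vector (here, a character vector of two elements), while `paste()` is the function that joins strings into one string. Likewise, a `Date` value is stored as the number of days since 1970-01-01.

```r
# Assignment with <- (preferred) and with = (also allowed).
x <- 3
y = "three"

# A variable takes the type of whatever value is assigned to it.
class_before <- class(x)   # "numeric"
x <- "now a string"
class_after <- class(x)    # "character"

# Names are case sensitive: total and Total are two different variables.
total <- 10
Total <- 20

# c() combines values into a vector of two strings.
words <- c("programming", "languages")

# paste() joins strings into a single string (separated by a space by default).
joined <- paste("programming", "languages")

# Backslash escapes: \n is a line feed.
cat("line one\nline two\n")

# A few base R string helpers.
parts   <- strsplit("a,b,c", ",")[[1]]    # split on commas
sub_str <- substr("programming", 1, 7)    # characters 1 to 7
found   <- grepl("gram", "programming")   # substring search

# Dates: the default format is year-month-day; other layouts need format=.
d  <- as.Date("2015-03-14")
d2 <- as.Date("14/03/2015", format = "%d/%m/%Y")
days_between <- as.numeric(d2 - as.Date("2015-03-01"))
print(days_between)
```

If a date string does not match the default layout, passing `format` explicitly avoids surprises.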
Again, a number of date functions are available for format conversion, date arithmetic, all kinds of stuff. So pretty much straightforward. Then comes the most important thing, which is called missing data. Suppose you have records in a data set, record after record, as rows, and sometimes values can be missing. You typically see that those missing values are handled in different ways in different systems; you might put a blank there, or a zero there. If you're using a database, you would have been using something called NULL, N-U-L-L. NULL is what is used in databases for missing data. It is the same kind of representation in R: missing data is represented in R by the symbol NA. So if you see that the value of a specific variable is NA, it means it is not available. Not available is an important part of R, and that is because a lot of R functions are sensitive to missing values. Suppose you have 10 values and you call the function mean to find the mean of these numbers. If one of the values is NA, the function will give you an error. So you have to use an additional parameter in the function called na.rm, NA remove. If you set na.rm equal to TRUE, then that function is going to ignore the NA values and do its operation on the rest of the values. So you should be very sensitive to this one: if any NA is present, your functions can start erroring. They basically say, we cannot process because we found NAs, unless you use na.rm equal to TRUE. This is because you have to make a decision as to how you want to handle missing data. Missing data is a very important thing when it comes to machine learning, which we will see in the later lectures. You can use the is.na function to test if a variable has a missing value.
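A small sketch of the missing-data behaviour, using made-up values. Note that in base R, `mean()` on a vector containing `NA` does not stop with an error; it returns `NA`, which then propagates silently through later calculations. That is the practical reason to be careful here.

```r
# A numeric vector with one missing value (illustrative data).
values <- c(4, 8, NA, 10)

# Without na.rm the result is itself NA.
plain_mean <- mean(values)

# With na.rm = TRUE the NA is dropped and the mean is taken over 4, 8 and 10.
clean_mean <- mean(values, na.rm = TRUE)

# is.na() tests each element and returns a logical vector.
missing_flags <- is.na(values)

# Comparing with == does not work for this purpose: the result is NA, not TRUE/FALSE.
wrong_test <- values == NA

# Count the missing values.
n_missing <- sum(is.na(values))

print(plain_mean)
print(clean_mean)
print(missing_flags)
```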
So this is how you test if a variable has the value NA. It's not like asking whether the variable is equal to NA; you test by using is.na, you pass the variable name, and it is going to return TRUE or FALSE. So this is how you look for missing data. Thank you.

4. R02 Vectors and Lists: Hi. The next important thing you want to learn in R is what are called vectors. The vector is the basic data structure in which you are going to be handling data most of the time in R. In R, you're hardly going to be handling individual data values. Rather, you are always handling a set of data values, and those are always handled through vectors. So what are vectors? Vectors are equivalent to one-dimensional arrays. If you have been using any other programming language, you would have seen something called arrays; a vector is equivalent to a one-dimensional array, and most data handling in R happens through vectors. So what is an array? It is a sequence or list of data elements of the same data type. A given vector can only hold data of a given data type, so it's either a vector of characters, a vector of numbers or a vector of Boolean values. It's not going to mix one thing with another. It is the equivalent of arrays. Of course, vectors are not limited to these three types. In fact, you can have vectors of other vectors, vectors of lists, vectors of data frames. So a vector can basically hold any other data type, simple or complex, but all the elements within a vector have to be of the same type. That is the only limitation that you have. Vectors can be created manually using the c command. In fact, when you import data from other systems or a file, you'd typically see that R also creates individual vectors automatically. So how do you create a vector? Let's say, in this case, you say myvector <- c(3, 5, 10). So this creates a vector of elements
3, 5 and 10. The first element in the vector is 3, the second element is 5, and the third element is 10. Next, you create a character vector with two elements in it, apples and oranges. So it is a vector of size two: apples is the first member and oranges is the second member. You can also create a vector of Booleans, and that is FALSE, TRUE, FALSE. So you can create three types of vectors. These are using the simple data types; you can also create a vector of vectors, so you can put one data type into another and into another. That's pretty easy for you to do. Vector indexing: given that a vector is a list, how do you access individual members of the list? You access them by using index numbers. Similar to how arrays are accessed using indexes, vector members can also be accessed using indexes. Indexing in R vectors starts with one, not zero. It starts with one. A subset of the members: if you want a contiguous subset of members of the vector, you can use the colon symbol to access them. So if you want to access members 3 to 6 of a vector, you can say 3:6, and that is going to return all four members, 3, 4, 5 and 6, to you. So when you give a vector name and, in square brackets, say 3:6, it is going to return all the members from 3 to 6; basically, it returns another vector for you. That is how vector indexing works. The vector index can also be negative. When you give a negative number, rather than returning that given member's value, it is going to ignore that particular member and return the rest of the members. Suppose you have a vector of six members and you say, give me that vector at minus three. It is going to return all the other five members other than the third member in the list.
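The vector creation and indexing rules described above can be tried out with a sketch like this one. The spelling `myvector` and the six-element vector `six` are assumptions for illustration. One caveat on "vectors of vectors": nesting `c()` calls in base R produces a single flat vector, as the last lines show.

```r
# Three basic vector types, created with c().
myvector <- c(3, 5, 10)               # numeric
fruits   <- c("apples", "oranges")    # character
flags    <- c(FALSE, TRUE, FALSE)     # logical (Boolean)

# A six-element vector for the indexing examples (illustrative values).
six <- c(10, 20, 30, 40, 50, 60)

first_member <- six[1]     # indexing starts at 1, not 0
middle       <- six[3:6]   # contiguous members 3 to 6
without_3rd  <- six[-3]    # every member except the third

# Nested c() calls are flattened into one vector of four numbers.
flat <- c(c(1, 2), c(3, 4))

print(first_member)
print(middle)
print(without_3rd)
print(length(flat))
```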
So negative index numbers can also be used in vector indexing. If you give an index value that is out of range, say the vector only has five members and you give a value of eight as the index number, it is going to return NA. This is pretty important, because you might be using these tests, checking for NA, when you're doing these kinds of index searches. Vector operations: one of the most powerful things about R is how you can do a lot of complex operations on vectors in a very simple form. Arithmetic operations can be performed on vectors similar to how you do them for regular variables. Just as you add two variables pretty easily, you can also add vectors pretty easily. So how does that work? When two vectors are added, the equivalent members of the vectors are added together, and the result is also a vector. So in this case, let's say we take two vectors. Vector one has three elements, 1, 2 and 4. Vector two has three elements, 4, 5 and 6. When you say vector one plus vector two, you're adding two vectors together. What happens is that the first element of vector one gets added to the first element of vector two, the second element of vector one gets added to the second element of vector two, and so on. You just add equivalent members straight away, so you will get a result of 5, 7 and 10. What happens here is that 1 gets added to 4 to give 5, 2 gets added to 5 to give 7, and 4 gets added to 6 to give 10. So you can easily add and do this kind of operation; it is pretty simple and straightforward. You can also do something like conditional analysis on a vector. So what happens when you do a conditional analysis on a vector? Suppose I have a vector like this again, a vector of 3, 5 and 10, and I say myvector > 6. When I give this command, myvector greater than 6, what it is going to do is evaluate this expression, greater than 6, on each member of the vector.
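Here is a minimal sketch of the out-of-range lookup, element-wise addition and element-wise comparison just described. The variable names (`five`, `vector1`, `vector2`, `myvector`) are assumed spellings; the numbers are the ones used in the lecture.

```r
# Out-of-range index: asking for member 8 of a 5-member vector gives NA.
five <- c(1, 2, 3, 4, 5)
out_of_range <- five[8]

# Element-wise addition: members in matching positions are added.
vector1 <- c(1, 2, 4)
vector2 <- c(4, 5, 6)
vec_sum <- vector1 + vector2     # 1+4, 2+5, 4+6

# Element-wise comparison: the condition is tested on every member.
myvector <- c(3, 5, 10)
above_six <- myvector > 6        # one TRUE/FALSE per member

# The logical result can be used as an index to keep only matching members.
kept <- myvector[above_six]

print(out_of_range)
print(vec_sum)
print(above_six)
print(kept)
```

Using the logical vector as an index, as in the last step, is a common way to filter data in R.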
So it's going to take 3 > 6, 5 > 6 and 10 > 6, and for each of the evaluations it is going to return a TRUE or a FALSE. So remember this: when you do these arithmetic operations, the returned value is also another vector. In this case, the 5, 7, 10 that is returned in the first example will be a vector, so you can store it as a vector and handle it as a vector. The FALSE, FALSE, TRUE will also be a vector, and you can do similar operations on this one too. You can also have named vector members, the equivalent of associative arrays. Rather than using an index like 1, 2 or 3 to access members of a vector, you can use names. Index names help you track vector members a lot better. For example, say we have vegetables as c(5, 7, 3); maybe this is holding the count of vegetables. You can then use a command called names(vegetables). With names you're giving a name to each of the members, say carrot, beans and broccoli. What you're giving here is a named index. You're saying the index name is carrot for 5. You can either use 1 to access the first member, or you can use carrot to access the first member. Then, if you say vegetables["beans"], beans being the second member, it is going to give you 7. So this is a more convenient way of accessing members of a vector. The next thing you want to know about in R is what are called lists. Lists are very similar to vectors, except that a list can hold elements of different data types. A vector can only hold data of one data type, whereas a list can hold data of any number of different data types. It is the equivalent of structures in other languages: you have a structure, or in object-oriented languages this is typically an object. So, for example, in this case, you can create a list called employee, a list that has an employee ID, an employee name and a boolean variable like FALSE.
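The vector and list behaviour described above can be sketched in a few lines of R. This is only an illustration: the numbers come from the lecture, while the variable names (vector1, my_vector, emp_id and so on) and the employee values are assumptions made up for the sketch.

```r
# Element-wise arithmetic: equivalent members are added together.
vector1 <- c(1, 2, 4)
vector2 <- c(4, 5, 6)
vec_sum <- vector1 + vector2          # 5 7 10

# A comparison is evaluated on every member and returns a logical vector.
my_vector <- c(3, 5, 10)
is_big <- my_vector > 6               # FALSE FALSE TRUE

# Indexing past the end returns NA; a negative index drops that position.
out_of_range <- my_vector[8]          # NA
without_first <- my_vector[-1]        # 5 10

# Named members: access by position or by name.
vegetables <- c(5, 7, 3)
names(vegetables) <- c("carrot", "beans", "broccoli")
beans_count <- vegetables["beans"]    # 7, still carrying the name "beans"

# A list can mix data types (field names and values are hypothetical).
employee <- list(emp_id = 1001, emp_name = "Alice", is_manager = FALSE)

print(vec_sum)
print(is_big)
print(employee$emp_name)
```

Because the comparison result is itself a vector, it can be used directly as an index, for example my_vector[is_big] keeps only the members greater than six.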
So in a list like this you have three different data types that you put into one variable. Lists are typically not as popularly used as vectors, because with a vector you're typically trying to deal with the same data type, but lists are used to hold mixed data types, even a list of vectors, a list of elements and stuff like that. So it is another data type that is available for you as a part of R programming.

5. R03 Data Frames and Matrices: Hi. In this section, we are going to be talking about multidimensional data and how it is handled in R. The main data structure that is used to store and manipulate multidimensional data is the data frame. Similar to how you have tables in a relational database, or even a worksheet in an Excel spreadsheet, data frames are used to store rows and columns. It is the most used data structure in R, because typically data is manipulated as rows and columns, as tables, and this is the way you represent them in R. So it represents a table with rows and columns. Every column holds data of the same type, so it becomes a vector. Every column in a data frame is a vector, and different columns can have different data types, which makes every row in a data frame a list. So every column is a vector; every row is a list. Every column has a column name and a column index, and every row has a row ID and a row index. Operations can be performed on entire rows and entire columns of the table, or you can access individual cells in the table and then work on them. So there is a lot of flexibility in what you can do with a data frame. Data frames are typically the inputs taken by a number of functions, including a number of machine learning implementations too. Consider this example.
You should already be familiar with how a table looks. This is an employee table with employee IDs being stored as the first column, the employee name being the second column, an is-manager flag being the third column and so on. How do you create data frames? There are a number of ways in which R allows you to create data frames. You can create data frames by first creating the data as vectors and then combining them all together to make a data frame. You can create individual lists or individual vectors, and then you can combine them to create a data frame. You can combine other data frames to create a new data frame; that is also possible. When you read data from a file like a CSV file, it is read into a variable as a data frame. When you read data from a database, you can also load that into a data frame, and the same applies to taking data from a data frame and writing it into a database or a file. So there are a number of ways R allows you to create these data frames. Data frames can be analysed using a number of commands. To start with, there are nrow and ncol, which give you the number of rows and the number of columns in the data frame. There is a function called summary. The summary command analyses a data frame, does analysis on the individual columns in the data frame, and gives you data about them. If the column is a numeric column, it gives you the quartiles for the column. If the column is a boolean or category column, it gives you the distribution for the column. So it is a very powerful tool, and we will be using the summary function in a lot of our use cases and examples. The head function gives you the top six rows of the data frame, so you can actually take a peek into the data in the data frame.
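As a rough sketch of the steps just described, the following builds a data frame from individual vectors and then inspects it with nrow, ncol, summary and head. The column names and values are invented for illustration and are not taken from the course's employee table.

```r
# Build each column as a vector (hypothetical employee data).
emp_id     <- c(1, 2, 3, 4, 5, 6, 7)
emp_name   <- c("Ann", "Bob", "Cara", "Dev", "Eli", "Fay", "Gus")
is_manager <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
salary     <- c(5200, 3100, 2900, 6100, 2500, 3300, 2800)

# Combine the vectors into a data frame; each vector becomes a column.
emp_df <- data.frame(emp_id, emp_name, is_manager, salary)

n_rows <- nrow(emp_df)      # number of rows: 7
n_cols <- ncol(emp_df)      # number of columns: 4

print(summary(emp_df))      # per-column summary (quartiles for numeric columns)

top_rows <- head(emp_df)    # first six rows by default
print(top_rows)
```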
With head you can, of course, choose any number: head with 7 will give you the top seven rows, and so on. The str command stands for structure. When you run this command, it gives you the structure of the data frame: for every column in the data frame it is going to tell you, okay, this column is numeric, this column is character, this column is boolean, and it gives you some example values. So there are a number of these commands by which you can take a peek into the data and look at a summary of the data in the data frame for analysis purposes. You will be using these commands a lot in your analysis. How are data frames indexed? Similar to how we saw vectors and lists being indexed, data frames are also indexed. Every row in a data frame has a unique ID, so you can access a row with the row ID, and rows and columns can of course be accessed using index numbers. Index numbers again run 1, 2 and so on; they start from one and keep running. Columns have their own index numbers and rows have their own index numbers, and the indexes are typically accessed similar to multidimensional arrays in other programming languages. For example, in this one you see example_df[1, 2], which means it is going to return the first row, second column: the cell value of the first row and the second column is returned when you say example_df[1, 2]. You can also access an entire column as in the second example, where you say example_df$example_column. The dollar symbol is the notation for specifying an entire column in a data frame. When you do that, it returns the entire column as a vector. You can also do the same kind of thing by using the colon operator. When you say 1:3, 4:5, it is going to return the first three rows.
That is rows 1, 2 and 3, and for these rows it is going to return columns four and five. As you go through this lecture, you should also, in parallel, go through the examples lecture. In the examples lecture you have actual examples being executed and you can see the results too. Individual rows of a data frame are lists; individual columns of a data frame are vectors. Data manipulation: a data frame can be manipulated in a number of ways. Data frames can be combined with the rbind command or the cbind command. rbind stands for row bind. If you have two data frames and you combine them using the rbind command, the rows in the second data frame are attached under the rows in the first data frame. If data frame one has five rows and data frame two has six rows, the resulting data frame will have 11 rows. But to use the rbind command, both data frames should have the same number of columns. So it is like attaching them one below the other. cbind, on the other hand, stands for column bind, in which case you're trying to combine data frames side by side. In this case, if data frame one has four columns and data frame two has six columns, you put them side by side and the resulting data frame will have 10 columns: the same number of rows, but 10 columns. The requirement in this case is that both data frames have the same number of rows. So rbind and cbind are pretty powerful functions when you want to combine data either row by row or column by column. Columns can be added by using associative-array-like name references. For example, if you want to add a new column to an existing data frame, you can just assign a new vector, with a command like this one: example_df$example_col, which takes as input c(1, 2, 3). Now, what is this going to do?
It is going to add a new column, example_col, to the existing data frame and assign it the value c(1, 2, 3), one value for every row that already exists. Again, the expectation is that the new column has the same number of rows as the existing rows in the data frame. R has very powerful functions here, and you should read more about how these data frames work. There are other options: rbind and cbind can also take a vector and add it either as a row or as a column. So there are a lot of combinations that are possible with these commands. A lot of functions exist for these data frames. Most operations on vectors can be done on individual rows or columns in the data frame. If all the columns in the data frame are of the same data type, a row obviously becomes a vector too. So you can do a lot of operations on an entire row or an entire column of the data frame. There is another function called merge, which works more like a foreign key operation, or a join operation that you do in an SQL statement. Suppose you have two data frames. One data frame has employee data, and it has a column called department ID. And there is another data frame which has department data, like department ID and department name. Now, for every employee you want to attach the department name. You can use the merge function, so you can merge based on a specific column. When you do that, it is like a join in SQL: it automatically looks for the corresponding row in the second data frame, that is, it matches department ID to department ID, pulls the rest of the columns and populates them in the same data frame. It is a pretty powerful function, which can be used for doing joins. Of course, there are a number of built-in data frame examples.
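The indexing and combining operations covered above can be sketched together as follows. The data frames here are small made-up examples (emp_df, more_emps, extra and dept_df are hypothetical names), so treat this as an illustration of the calls rather than the lecture's exact code.

```r
emp_df <- data.frame(
  emp_id   = c(1, 2, 3),
  emp_name = c("Ann", "Bob", "Cara"),
  dept_id  = c(10, 20, 10)
)

# Indexing: [row, column], the $ operator, and ranges with the colon operator.
cell     <- emp_df[1, 2]        # first row, second column
name_col <- emp_df$emp_name     # the entire column as a vector
block    <- emp_df[1:2, 2:3]    # rows 1 to 2, columns 2 to 3
str(emp_df)                     # structure: type and sample values per column

# rbind: stack rows (both data frames need the same columns).
more_emps <- data.frame(emp_id = c(4, 5),
                        emp_name = c("Dev", "Eli"),
                        dept_id = c(20, 30))
all_emps <- rbind(emp_df, more_emps)     # 3 + 2 = 5 rows

# cbind: place side by side (both need the same number of rows).
extra <- data.frame(salary = c(5200, 3100, 2900))
wide  <- cbind(emp_df, extra)            # 3 + 1 = 4 columns

# Add a new column by assigning a vector with one value per row.
emp_df$example_col <- c(1, 2, 3)

# merge: join on the column name the two data frames share (dept_id).
dept_df <- data.frame(dept_id = c(10, 20), dept_name = c("Sales", "IT"))
joined  <- merge(emp_df, dept_df)
print(joined)
```

By default merge behaves like an inner join, so employees whose department ID has no match in the second data frame would be dropped from the result.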
R itself comes bundled with some example data frames, and we will see how to use these examples as we go. Matrices: similar to data frames, we also have matrices. The difference between data frames and matrices is this: matrices are two-dimensional arrays similar to data frames, but all the values in a matrix must be of the same data type, typically numeric. A matrix has rows and columns. Matrices can be converted into data frames and data frames can be converted into matrices, so you can do these conversions in both directions. The only limitation is that a matrix holds a single data type. A number of operations on data frames apply to matrices too. If they don't apply, you can always convert to a data frame and do the operations, because all matrices can become data frames; that is automatically supported. Data frames can cleanly become matrices only if their columns are all of the same data type. This is an example of how you would create a matrix. You call matrix with c(...), a combination of a number of values, and then you say nrow equal to 3 and ncol equal to 2. It is going to treat this list of values as having three rows and two columns, as specified by nrow and ncol, and it is going to automatically create a matrix that looks like this. It reads the data column by column. You can see how the values 2, 4, 3, 1, 4, 7 get laid out like this: first column, then second column. And then you can see the indexes of these rows and columns shown like this. So matrices can be created this way too. You can also read data from a file into a data frame and then convert it into a matrix. You can do a lot of these powerful operations between data frames and matrices.
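A minimal sketch of the matrix creation and conversion just described is shown below. The six values are illustrative only (the exact numbers on the lecture slide are not fully recoverable from the recording), and the variable names are assumptions.

```r
# matrix() fills the values column by column by default.
m <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
print(m)
#      [,1] [,2]
# [1,]    2    1
# [2,]    4    5
# [3,]    3    7

first_col <- m[, 1]          # first column as a vector: 2 4 3
cell      <- m[2, 2]         # row 2, column 2: 5

# Convert to a data frame and back again.
m_df <- as.data.frame(m)     # columns are named V1 and V2
back <- as.matrix(m_df)
```

Passing byrow = TRUE to matrix() would fill the values row by row instead.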
Factors are used for categorical data, so factor is the data type that is used to store categorical variables. When you want R to recognize some data as categorical data, you have to convert it into a data type called factor. It is pretty simple: there is a function called as.factor that converts any data type into a factor data type. In fact, when you read a data file, R can store string columns as factors automatically (this was the default before R 4.0 and is controlled by the stringsAsFactors option). Say it sees a column which stores gender, where irrespective of the number of rows in the data set there are only two unique values, male or female; that column can be stored as a factor. Any vector can be converted to and from factors. as.factor converts a vector into a factor, and a factor can be converted to its equivalent numeric or character representation too. Factors permit special grouping: they allow you to do what is called grouping, similar to how you do GROUP BY in an SQL statement. With factors you can group by certain things and come up with summary statistics. If you have a factor column that stores male and female, you can compute summaries for these factors: you can do a group-by operation to find how many males and how many females there are, or you can take the average of another column for males alone, the average of another column for females alone, and so on. The factor is a pretty powerful thing that you will want to use. And this is the most important thing: a number of statistical and machine learning functions require factor data types, so they want certain things to be factors to help them make better decisions. In fact, this matters whenever you do classification in machine learning exercises.
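To make the factor conversion and grouping concrete, here is a small sketch. The data is invented, and table and tapply are one base-R way to do the group-by style summaries described above; the lecture does not say which functions it uses for this.

```r
gender <- c("male", "female", "female", "male", "female")
salary <- c(3000, 4000, 5000, 3500, 4500)

# Convert a character vector into a factor.
gender_f <- as.factor(gender)
lvls  <- levels(gender_f)          # "female" "male" (sorted alphabetically)

# Convert back to the underlying integer codes or to character strings.
codes <- as.integer(gender_f)      # 2 1 1 2 1
chars <- as.character(gender_f)

# Group-by style summaries.
counts     <- table(gender_f)                  # how many rows per group
avg_salary <- tapply(salary, gender_f, mean)   # mean salary per group

print(counts)
print(avg_salary)
```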
In classification, the target value is expected to be a factor. You will see that in the examples when we look at actual machine learning classes.

6. R04 Data and Input Output Operations: Hi. In this section, we are going to be talking about the data transformation capabilities and input/output capabilities that exist in R. For data transformation, the first thing you always want to look at is sorting. You can sort any vector, in ascending or descending order, using the sort function. A data frame, on the other hand, can be sorted using the order function. With the order function, you can sort a data frame using multiple columns. You can specify which columns need to be used, and in what order, to sort an entire data frame. It is almost like sorting the output of a table on multiple columns with an SQL statement or in an Excel spreadsheet. So sorting in R is pretty powerful. The next thing you want to look at is merging: how can you merge two data frames based on a common column? Two data frames can be merged on a common column using the merge command. In fact, if the column names are the same in both data frames, you don't even have to specify which column you want to use for the join. This is how a join would look. You have a data frame containing employee data with employee ID, name, is-manager, salary, and then the department ID of the employee. Then you have another data frame with just department data, like department ID and department name, and you want to join them so that for every employee you know the department name. This is a standard example you would see in SQL, and we're using the same example here. The moment you say merge on data frame one and data frame two, it is going to figure out: okay, there is a common column for department ID in both of them.
So I'm going to join based on that. Even if the column names are not the same, you can specify which column in the first data frame and which column in the second data frame to use to join them. So it joins and gives you a data frame like this. Department ID appears only once in the resulting data frame; it actually moves to the front, and then you see the rest of the columns from both data frames listed. That is how a merge operation works in R. The next example is what we call binning. When you have continuous data, you can convert the continuous data into categorical data by what is called binning. In binning, you take continuous data and create, possibly, a new column that gives you something like a range. If you have something like age as a continuous value, you can create a new column called age range and populate it with something like 1 to 20, 20 to 40, 40 to 60. You can use the cut command in R to do binning. In this example, you have an existing employee data frame with a set of columns, and you are using cut on the employee salary column. As you can see below, the employee salary column holds continuous values, which can range anywhere from one to infinity. In the cut call you give the boundaries for the cut, here c(1, 2000, 5000). When you give three values, you're cutting the whole thing into two parts: a range of 1 to 2000 and then 2000 to 5000. When you run this command, you create a new column called salary range, and this is how the result will look. You see that for every salary it gives you the corresponding salary range. Of course, instead of saying 2000 it is giving the same value in exponential format.
So 1242.11 falls into 1 to 2000, and 2009.0 falls into 2000 to 5000. You're converting your continuous data into a predefined set of bins, and this is called binning. Binning is a very powerful thing to do because, as we discussed earlier, a lot of machine learning algorithms want factors or categorical variables for their operations. So when you have a continuous variable, you want to convert it into ranges before you start using it for any kind of predictive analytics. This is how binning works: continuous data gets converted into categorical data, where you specify what ranges you want, like 1 to 4000 and 4000 to 10,000, however you want it split. You can even split it into ranges like 1 to 50, 50 to 100 and 100 to 200, however you want it to be, and then you can create a new column, called salary range here, to hold the data. Input/output operations: of course, input/output operations are pretty critical for any programming language, and R also supports a number of them. The first thing is the console operations that are possible in R. There is a function called scan. The scan function can be used to read input from the console, and it can also read input from files. You can scan something from the console and assign it to a variable. The print function is used to print output to the console; just as other programming languages have a print function, such as System.out.println or printf, this function prints data to the console. There is a cat function that also writes to the screen. There are some details there that you can look up in the help to see how the cat function works. The list.files function, like the dir function, gives a directory listing, typically of the current working directory. You can also pass the name of a directory to get its list of files.
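Stepping back to the data transformation functions covered earlier in this section, the sketch below shows sort, order, merge with differently named key columns, and cut in one place. The data frame contents and names (emp_df, dept_df, sal_range) are hypothetical; only the bin boundaries c(1, 2000, 5000) and the two sample salaries come from the lecture.

```r
# sort() works on vectors, ascending by default.
scores <- c(42, 7, 19)
asc  <- sort(scores)                       # 7 19 42
desc <- sort(scores, decreasing = TRUE)    # 42 19 7

emp_df <- data.frame(
  emp_name = c("Ann", "Bob", "Cara", "Dev"),
  dept_id  = c(20, 10, 20, 10),
  salary   = c(1242.11, 2009, 4500, 1800)
)

# order() returns row positions; here: by dept_id, then by salary descending.
sorted_df <- emp_df[order(emp_df$dept_id, -emp_df$salary), ]

# merge() when the key columns have different names in the two data frames.
dept_df <- data.frame(id = c(10, 20), dept_name = c("Sales", "IT"))
joined  <- merge(emp_df, dept_df, by.x = "dept_id", by.y = "id")

# Binning with cut(): three boundaries give two bins.
emp_df$sal_range <- cut(emp_df$salary, breaks = c(1, 2000, 5000))
print(emp_df)
```

cut also accepts a labels argument, which is one way to replace the default range labels with more readable names.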
The list of files returned by list.files comes out as a vector, so you can get that vector into a variable and then do some vector operations on it. File operations: what kind of file operations are possible? The read.table function reads a file into a data frame. read.table is a powerful function: any kind of data file, whether comma-separated like CSV or tab-separated, can be read through the read.table function into a data frame. When you call read.table, you can specify what the separator character is and whether there is a header; all these things can be specified in read.table to read a file into a data frame. read.csv is a specialized version of the read.table function. It is basically read.table preset for a particular format: in the case of read.csv the separator is a comma, for a comma-separated values file. So this is also used to read a comma-separated file into a data frame variable. write.table and write.csv are the corresponding write operations. Again, you can control all the different formatting things, like the newline, the header row and so on, when you write a file. So read and write operations are done through these commands. Database operations: of course, similar to files, you can also read data from various RDBMSs, from database tables, into R. You can write back to tables also. You can do update operations, delete operations, insert operations, all kinds of operations on databases. There are multiple libraries or packages available in R for working with databases. There are packages that work on general databases using ODBC. There are packages that work with specific databases, like Oracle and MySQL. So there is a lot you can do with respect to RDBMSs in R. A note on libraries and packages:
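The file functions covered above (write.csv, read.csv, write.table, read.table and list.files) can be tried out safely with temporary files, as in this sketch. The data and the use of tempfile are assumptions made so the example does not touch any real files.

```r
emp_df <- data.frame(emp_id = c(1, 2, 3),
                     emp_name = c("Ann", "Bob", "Cara"))

# Write and read a comma-separated file.
csv_path <- tempfile(fileext = ".csv")     # throwaway file for the demo
write.csv(emp_df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Write and read a tab-separated file, stating separator and header explicitly.
tsv_path <- tempfile(fileext = ".tsv")
write.table(emp_df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
from_tsv <- read.table(tsv_path, header = TRUE, sep = "\t")

# list.files() returns the directory listing as a character vector.
files <- list.files(tempdir())

# cat() and print() both write to the console.
cat("Read", nrow(from_csv), "rows from the CSV file\n")
print(from_tsv)
```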
Library is a general term; in the case of R it is called a package, and packages are available for any kind of database. Data can also be accessed from Hadoop, from a Hadoop file system, using the R Hadoop packages, which let you read and write files to and from a Hadoop file system. So if you want to store big data files in Hadoop, you can then pull the data into R. In fact, MapReduce operations are supported too, if you want to work with them, and reading and writing files from Hadoop is also possible. You may want to download files or data from the web. That is also possible using the download.file command. With download.file, you just provide a URL and the command goes and downloads the data for you, and it can download any kind of format. It can be a table, it can be a CSV file. Even web pages can be scraped: using getURL you can get the entire HTML page into a variable, and then you can scrape the HTML page to get a specific list of data if you want. So in R you can get data from basically any kind of input and write back to any kind of output. You can even get data from Twitter, Salesforce, Facebook or LinkedIn. All of them have their own packages. They all have their own semantics and plumbing for how you connect to them, how you authenticate with them and get the data. But they're pretty simple again, and you can get data from any of them and put data back on any of them. R is very powerful in terms of the input/output operations that you can do. Thank you.

7. R05 Programming and Packages: Now we will see what kind of programming capabilities exist in R.
The first thing we are looking at is what kind of operators exist in R. As in any other programming language, there are arithmetic operators like plus, minus, multiplication and division. The double percent sign, %%, is the modulo operator in R. Then you have comparison operators, which are used for comparing variables: less than, less than or equal to, greater than, greater than or equal to, and so on. And then there are OR and AND operators also, for when you want to combine conditions, such as a less than five and b less than 10. For those kinds of things there are the OR and AND operators, which are the pipe and ampersand symbols. Decimal places are adjusted based on the values: if you are adding two integer values, it gives you an integer value; if you add an integer and a real number, it becomes a real number. Decimal places are adjusted automatically, so you don't have to worry about a lot of casting in R. Next, look at the control structures that are available for programming. There is an if-else control: you can say if, a condition, and an expression, or you can write an if condition expression else expression statement. Of course, you can do a nested if-else also. There is a for statement that you can use, so you can run a given variable through the values 1 to 100, and for every value you can execute an expression, very similar to other programming languages. There is a while statement: while, condition, expression, pretty straightforward. And there's a switch statement available, if you want to switch on a specific variable and execute a different piece of code for every value of the variable. There is also an inline if-else, a one-line function called ifelse.
The first parameter of ifelse is the test condition: you're going to test something like whether a is greater than five. The yes part is the value to return if the condition passes, and the no part is the value to return if the test fails. So an inline if-else is also available. R is a very concise programming language where you write only a few lines of code to execute a lot of stuff. So these are pretty powerful. You can also create user-defined functions in R: you can create your own functions, similar to other programming languages where you create functions or procedures. When you build a function, you first load the definition of the function into memory. The definition of a function is loaded into an R variable, and the variable then becomes the function name. We will see here how functions take parameters as input. A function can return an output value, and it can take any kind of parameter: simple things like strings and so on, or complex things like vectors and data frames, and the same goes for the return value. It can also access global variables if it has to. This is an example of how you would create a function. The function definition always starts with the word function, which is a predefined word, and in the parentheses you give the input parameters being passed. Inside the curly braces is the actual body of the function; here the body is doing x plus y. The x plus y output, since it is not assigned to any variable, is simply returned as the output of the function. The function definition itself is stored in a variable called computeSum.
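The operators, control structures and function definition described above can be sketched as follows. The specific values and variable names are illustrative assumptions, and the function name computeSum is a best reading of the name spoken in the lecture.

```r
a <- 7
b <- 3

remainder <- a %% b                 # modulo: 1
both   <- (a > 5) & (b < 10)        # AND: TRUE
either <- (a < 5) | (b < 10)        # OR: TRUE

# if / else
if (a > 5) {
  size <- "big"
} else {
  size <- "small"
}

# for loop over the values 1 to 100
total <- 0
for (i in 1:100) {
  total <- total + i
}

# while loop
n <- 0
while (n < 3) {
  n <- n + 1
}

# switch: pick a branch by name, with a default as the last unnamed value
label <- switch("b", a = "first", b = "second", "other")

# ifelse: vectorised inline if-else (test, yes, no)
flags <- ifelse(c(3, 5, 10) > 5, "yes", "no")

# User-defined function: the last evaluated expression is the return value.
computeSum <- function(x, y) {
  x + y
}
result <- computeSum(3, 5)
print(result)
```

Because ifelse works on whole vectors, it often replaces a loop that would otherwise test each element one at a time.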
And you see the assignment operator being used; computeSum then becomes the function. Any time you want to call the function, you call it like this: you just call computeSum, which is the function name, which is again nothing but the variable in which the function is stored, and you pass the parameters 3 and 5. It is going to return the sum as output. So this is a pretty straightforward implementation of a function. You can build functions as complex as you want, depending on what you want to do with the function. Next comes packages. Packages are nothing but predefined libraries of pre-existing functions in R. Capabilities are typically packaged as various packages or libraries. Other programming languages might use various terms like packages or libraries; they have different names for them. But packages are the lifeline of R. That is because most of the functionality, the power of R, comes from the huge set of packages that R has. Somebody can build a package of string functions. Somebody can build a package of machine learning algorithms. Somebody might build a package for working with Twitter from R. A number of packages exist, and they are typically implemented in native R, and sometimes some packages are implemented even in C and Fortran. There is a specific syntax that you have to follow to implement packages. Anyone can write packages and upload them to the repository, and these packages are really excellent and extensive. So any time you want to do something, rather than trying to implement it yourself, see if an existing package already does what you want to do.
There is an online website called CRAN, which stands for Comprehensive R Archive Network, where all these packages are uploaded. Once you build a package, you just upload it into this CRAN repository, and then it is available for anyone to go and use. The good thing about the CRAN repository is that when somebody builds a package, they also have to provide a user guide or a reference manual. So it always comes with some kind of reference. When you download a CRAN package into your installation, it also downloads the online help. So once you download a package, you can find help for any of the functions in the package using the question mark command too. That is again pretty powerful: anyone can build, and others can use. How do I install packages? Once you are in the R console, installing packages is a pretty straightforward thing. You first need to find the name of the package you want to install, which you will typically figure out from some help or by doing a Google search. And then you run the install.packages command. You need to run the install.packages command only once. When you run this command it is going to download that package from the CRAN website. Typically, when you install R, it is automatically pointed to the CRAN repository, so it will find out by itself that this package exists, and then it will download it for you. So the installation is pretty simple and straightforward. When it is installing, it also downloads dependent packages. If one package needs the help of other packages, all the dependent packages also get installed, so you don't have to bother about any of them. Once installed, the packages are available for use in that R installation. You only have to install once, and it is available for you every time.
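The install-once step described above, together with the load step the lecture turns to next, might look like the sketch below. To keep it runnable without a network connection, it uses "stats", a package that ships with R, as a stand-in for a CRAN package name; with a real CRAN package the install.packages branch would actually download it.

```r
pkg <- "stats"   # stand-in name; replace with the CRAN package you need

# Install only if the package is missing. For a CRAN package this downloads
# it along with its dependencies, and is needed once per R installation.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package into memory for the current session.
library(pkg, character.only = TRUE)

# Check that the package is now attached to the session.
loaded <- paste0("package:", pkg) %in% search()
print(loaded)

# In an interactive console, the question mark opens a function's help page:
# ?median
```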
But to use a package, you first need to load the package into memory, and you do that with the library command. You have to load it every time you execute a program, but within the program you have to load it only once. Once it is loaded into memory, the package becomes available, and then you can execute the commands of the package. Pretty straightforward. When you use RStudio, RStudio maintains the packages, and you can also use RStudio to upgrade these packages to their latest versions. It has ways of checking with the CRAN repository to find whether there are new packages or upgraded versions of these packages available, and with one click you can download the latest packages and install them too. So that is pretty straightforward functionality. Next comes the apply family of functions available in R. Apply is a very, very powerful thing that R gives you. In fact, this is one thing that makes programming in R very easy. Why it makes programming easy is that when you are doing data manipulation in R, you are always handling rows and columns. When you handle row and column operations in other programming languages, suppose you have a data set that has 1000 rows, you typically have to write a for loop that walks through them row by row, and on every loop iteration you do something for that particular row. You need a lot of operations where you have to execute something row by row or column by column. Rather than having to write a for loop every time, you use the apply function, which is a one-line function that does the looping for you. So it gives you a shortcut for doing the entire operation in just one function call.
With apply you can work by row, and you can also work by column. So you can say: for this given data frame, for every row in the data frame, do this; or for every column in the data frame, do this. What you apply can either be a standard function or a user-defined function. There are also some variants of the apply function, such as lapply and mapply. You can go through the help to figure out what these variants are, but they pretty much do the same thing. Here is an example. First you create a matrix from a list of 20 values, a matrix of 10 rows and two columns, in the first line of commands, using matrix with c(1:20). The moment you say c(1:20), it gives you a vector of values from 1 to 20. This is a shortcut for creating a vector: it gives you 1, 2, 3, 4, 5 and so on up to 20. You then use this vector to create a matrix of ten rows and two columns. Then, when you call apply, the first parameter is this matrix, and the second parameter, which you see here is 1, means: for every row, apply the function called mean. It goes into this matrix and, for every row in the matrix (in this case there are 10 rows), finds the mean of all the values within the row. So this apply call executes the mean function for every row and comes out with 10 different outputs: a vector of 10, with each value being the mean of one row in the matrix. The second apply command again works on the same matrix, but it passes 2, and 2 means for every column. There are two columns in the matrix, so apply goes column by column and finds the mean of each.
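The matrix example described above can be sketched in R roughly as follows. The variable names (m, row_means, col_means, loop_means) are chosen here for illustration and are not taken from the course file; the for loop is included only to show what apply saves you from writing.

```r
# Build a 10 x 2 matrix from the values 1 to 20 (R fills it column by column)
m <- matrix(c(1:20), nrow = 10, ncol = 2)

# Second argument 1: apply mean() to every row, giving a vector of 10 values
row_means <- apply(m, 1, mean)

# Second argument 2: apply mean() to every column, giving a vector of 2 values
col_means <- apply(m, 2, mean)

# The same row-wise result written as an explicit for loop, for comparison
loop_means <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_means[i] <- mean(m[i, ])
}

print(row_means)
print(col_means)
```

Any function that accepts a vector, including one you define yourself, could be passed in place of mean.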
Since there are two columns, it comes out with an output that is a vector of two. So the first call works row by row and the second works column by column, and for each of them it gives you a mean. Mean in this case is the function you are calling; rather than mean, you can call any kind of function. It can even be a user-defined function that you write and use yourself. So that is again a very powerful thing that you can do with R, the apply family of functions. 8. R07 Statistics in R: In this section, we are going to be looking at statistical functions in R. R was originally built for statisticians by statisticians, so there is no dearth of statistical functions and packages available in R. You can do all kinds of statistical work you want, like descriptive statistics, correlations, t-tests, regressions, ANOVA and its variants, and power analysis. All kinds of complex things that are possible in the statistics world can be done with R. This is its home ground, so anything is possible in R. We won't be exercising a lot of that as a part of this course; we will do only as much as is required for our predictive analytics exercises, so that should be all we need. For descriptive statistics in R, these are the commands you would use as a part of our exercises: mean, standard deviation, variance, min, max, median, range and quantile. All of them take a vector as input (sometimes they take a data frame as input), perform the operation on it, and return a value. Summary is another very powerful function that gives you a full description of a given data frame. And there is a package called psych. The psych package gives you a lot of additional statistical functions and advanced visualizations, which again you can use as a part of your analysis.
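As a minimal sketch of the descriptive statistics commands listed above, the following uses only base R functions. The sample values and the names x and df are made up for illustration.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_mean   <- mean(x)      # arithmetic mean
x_sd     <- sd(x)        # sample standard deviation
x_var    <- var(x)       # sample variance
x_min    <- min(x)
x_max    <- max(x)
x_median <- median(x)
x_range  <- range(x)     # smallest and largest value as a vector of two
x_quant  <- quantile(x)  # 0%, 25%, 50%, 75% and 100% points

# summary() describes every column of a data frame in one call
df <- data.frame(value = x)
print(summary(df))
```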
We will be using the psych package in the use cases that we look at. Correlation between two columns in a data frame can be computed using the cor function. As we discussed in the statistics lessons, correlation is a very important part of any kind of data analysis in data science, so you can use the cor function to find the correlation between any two columns or any two vectors. Different types of correlation methods are supported. We only looked at Pearson's correlation coefficient, but there are other algorithms for finding correlation, with implementations like Spearman and Kendall, and the cor function supports any of them, so you can specify which algorithm you want to use to find the correlation between two data sets. The psych package once again has a number of other functions. If you are wondering why the package is called psych: it was originally developed for statistical analysis in the psychology world. That is why it is called the psych package. But it has some really powerful functions for statistical analysis, and everybody has started using this library with a lot more interest. You will see the implementation and use of these statistical functions when we do our use cases in a later section. 9. R08 Graphics in R: In this section, we are going to be seeing the use of graphics in R. We have heard a number of times that a picture is indeed worth 1000 words. There is no other place where that is more important than in data science. Whenever we are analyzing a large data set, the best way for you to picture the whole thing and look at patterns is to use pictures and charts. Otherwise, just looking at raw data is going to be simply mind-boggling. For large amounts of data, graphical representations are the best way to look at the data and spot trends.
Graphics play a huge role in all stages of data science. When you are doing data cleansing, graphics help you identify outliers in the data. They help you spot good predictors when you are doing exploratory data analytics. And finally, they help you explain your results to the project owners. So graphics are something that you will be using in all stages of your data science project, and it is good to know all the capabilities that R gives you when it comes to graphical analysis. There are different chart types that you will already be used to, like a histogram, a pie chart or a bar chart, and R gives you the capability to do all of them. It supports a number of different chart types, and for each chart it gives you a lot of minute-level control over things like the colors, the shape, the size and whatnot. So those are really powerful graphical capabilities. One chart that you might not have been used to so far, and that is pretty important when it comes to statistical analysis, is the box and whisker plot. So let us spend some time understanding what a box and whisker plot is about. A box and whisker plot is used to show in a picture the four quartiles we talked about earlier in the course. The four different quartiles are shown in a box and whisker plot like the one shown here. As you can see, on the Y axis are the values. Suppose you have a list of values and you are trying to find the four quartiles for the values; this Y axis shows you the actual values. In the box and whisker plot, the bottom straight line shows you the minimum value in the data set. Then you have a box in the middle, and the lower boundary of the box represents the first quartile. There is a line in the middle, which represents the second quartile.
That is the median. The top of the box represents the third quartile, and the top line represents the max value. So this is your first quarter, this is your second quarter, this is your third quarter, and this is your fourth quarter. Now, box and whisker plots make a small adaptation to what a quartile is. The plot finds the interquartile range, or IQR. The interquartile range is the distance between the first and the third quartile, which is nothing but the height of the box. Within the interquartile range lies 50% of your data. The plot then limits the maximum and minimum value lines to be within 1.5 times the interquartile range. Suppose the interquartile range is about four. It checks whether the maximum value lies within 1.5 times that four, which is six, measured from the box. If the actual maximum value in the data set is within that distance of six, it draws the line at the actual maximum value. If the actual maximum value in the data set is beyond that distance of six, it limits the max line to just 1.5 times the interquartile range, and all the values in the data set that lie beyond this line are shown as dots. These dots become outliers. Basically, it is saying that everything beyond 1.5 times the interquartile range is an outlier, both on the top side and on the bottom side. So if the data set is narrow and everything is within 1.5 times the interquartile range, the max line is drawn at the actual maximum value; suppose all the values are within 3.5, the max line will be drawn somewhere here. But in this case the values go beyond 1.5 times, so the plot limits the line here and puts dots for the outliers.
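The 1.5 times interquartile range rule described above can be worked through by hand. This is a sketch on made-up data, with illustrative names such as upper_fence; it uses quantile() for the quartiles, whereas R's own boxplot() computes the box edges with a slightly different rule (hinges), so its numbers can differ a little.

```r
# Hypothetical sample: nine ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))  # first quartile, bottom of the box
q3  <- unname(quantile(x, 0.75))  # third quartile, top of the box
iqr <- q3 - q1                    # interquartile range, height of the box

# Whisker lines may not extend further than 1.5 * IQR beyond the box
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# Values beyond the fences would be drawn as dots (outliers)
outliers <- x[x > upper_fence | x < lower_fence]

# The whisker lines stop at the most extreme values inside the fences
upper_whisker <- max(x[x <= upper_fence])
lower_whisker <- min(x[x >= lower_fence])

print(outliers)
```

Here the interquartile range is 4.5, so the upper limit sits 6.75 above the third quartile and only the value 30 falls outside it.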
So the moment you look at this chart, you can immediately spot what the outliers are and how many outliers exist in your data. Here there are three outliers present, the three dots, and there are more outliers here. There need not always be outliers; if the value ranges are narrow, there will be none. So this is an excellent tool for quickly looking at how your data is spread and whether there are any outliers in the data, and of course it shows how skewed the values are in your data set. That is the purpose of your quartiles. There are three systems, libraries or packages, available in R for doing your graphics. The first one is called the base plotting system. The base plotting system comes as a standard part of R, and it has really powerful graphical functionality. Then there is the lattice system, which is another library. And the third one is called the grammar of graphics system. In our class, we are going to be looking at both the base plotting system and the grammar of graphics system. The grammar of graphics system is built upon, as it says, the grammar of graphics, where you build a graph step by step; we will see how you do that. It has powerful graphical capabilities, and it produces more professional-looking graphs than the base plotting system. Of course, all of them have an excellent set of functionality for chart types and control, and you can use the help files to look at all the charting options available. It is again an extensive set of options for graphics that is available in R, pretty powerful, and you will see in the examples what kind of things are actually available. 10. R Examples 01: Hi. Let us now start using R and go through some of the examples in R.
What you have here is a file called R Programming Examples, in which I have a whole list of examples that I am going to walk you through one by one. This file is also available to you in the resource package as a part of the class, so you can download this file, execute it and play around with these commands. I highly encourage you to play around and try various variations. So let us start off with the first thing. As I have talked about, the first operation you always want to do is to set your working directory. That is the set working directory command, setwd. You just select this line and say Run, and it sets your working directory. You can also set the working directory by selecting a file and then going here and choosing Set Working Directory, and that will also work fine. Here I am just selecting the same directory; you can do that, choose Set Working Directory, and it executes the same command and you get the same working directory set up. Now let us start going through and understanding the various constructs of the programming language. To start with, what types of variables are supported in R? The first type of variable supported is, as we know, the character data type. On the left is the name of the variable, the less-than sign followed by a hyphen (<-) is the assignment operator, and "programming" is the value of the variable. You select this line and run it, and the variable gets set; you can see the value of that particular variable below. You can print a variable by just typing the name of the variable, and it prints it out. Now you can assign the character variable another value, a different value, and it takes that value immediately, and you see in the environment pane that the value shows up as well. So this is how a basic variable works in R.
The next thing is a numeric variable. You just give it a number that is not enclosed in quotes; the moment it is not enclosed in quotes, it is going to be a number. You run it, and the number gets assigned to that variable. You can again print that variable by selecting the name of the variable and running it. You can also assign values to this number variable using scientific notation, like 2.3e-4. Here we run this one, and if you print this variable, you will see that the value shows up as what you assigned. So this is also a way by which you can set values. And as you can see, as you keep setting values, it just keeps overwriting the existing value in this particular variable. Boolean variables take the values TRUE or FALSE. R is case sensitive, so TRUE and FALSE are both fully capitalized. Similarly, variable names are also case sensitive. Here I set the Boolean variable to TRUE, and then I print it out, and you see on the right side that the variable takes the value TRUE. Now let us start looking at how we can convert data types. If I have to convert a number variable into a character variable, I call the function as.character. You simply call this function, and it outputs the value straight away to the console. When I convert a number to a character, you see that it is now enclosed in double quotes. Earlier, when I was printing the number variable, it was not enclosed in double quotes; now you see that it is. Being enclosed in double quotes means it is a string; otherwise, when it is not enclosed, it is a number. Similarly, I can convert a Boolean into a number. When I convert a Boolean value of TRUE to a number, it comes up with a value of one.
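A short sketch of the variable types and conversions walked through so far. The variable names (char_var, num_var, bool_var) and the value 25 are assumptions for illustration; the course file may use different ones.

```r
# Character variable: the value is enclosed in quotes
char_var <- "programming"

# Numeric variable: no quotes, so R treats it as a number
num_var <- 25

# Number to character: the result prints in double quotes
num_as_text <- as.character(num_var)

# Scientific notation overwrites the previous value of num_var
num_var <- 2.3e-4

# Boolean variable: TRUE and FALSE are written fully capitalized
bool_var <- TRUE

# Boolean to number: TRUE becomes 1
bool_as_number <- as.numeric(bool_var)

print(class(char_var))
print(num_as_text)
print(bool_as_number)
```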
When you convert them, it converts to the equivalent value in that particular data type. When you have an invalid conversion, like in this case, where I am trying to convert a character to a numeric, it is going to come back with a value of NA. We have seen this: the value NA, not available, is similar to how you would handle nulls in databases. And as you see, there is immediately a warning message, and it says NAs introduced by coercion. When you see this message, it means that some of the conversions you are doing have resulted in NAs, and you have to handle them explicitly in your code for it to work. There are other commands too. You have the ls command, which lists all the variables in your workspace. Run ls and you see that it lists all the variables here: the Boolean, the character and the number variable. Whenever you print something, you can either print to the console or assign it to a variable. For example, I can take the ls result and assign it to a variable, and then it gets assigned and you see it show up here. I can always go back to previous commands by using the up arrow and down arrow to scan through my older commands, and then I can access them. I can also go to the History tab, which shows the commands I have executed. From there I can load the saved command list, and I can take these commands and copy them, or, as you see here, use To Console to send a command to the console and execute it there too. So you have all these options to store and manipulate commands in R. That completes the examples of working with variables. Now let us see how arithmetic works. Start with basic arithmetic: let's say I type three plus five.
We just write three plus five and run it, and it gives the output eight straight away. Again, you can either give a command like this, where the output goes to the console, or you can assign the output to a variable. I use the up arrow to bring the command back and assign three plus five to a temp variable, and when you go back to the environment, you see the temp variable taking a value of eight. So how does arithmetic on variables work? Here I create variable one with the value of five and variable two with the value of three, and then I say variable three is variable one plus variable two. Then variable three shows up at the bottom, and you will see all these values, variables one, two and three, showing up here. So this is how you do arithmetic in R. Moving on to strings: you can combine strings using the c command. In this case, I have a c command with "I am saying" and "hello". If I just select the c command alone and execute it, the output comes to the console. I can also take this whole thing and assign it to a string variable, and then that variable is going to hold the values "I am saying" and "hello". You can also use the paste command. I am using paste to paste two strings together, so I run it here, and it returns an output that is the concatenation of the strings. But if you look at what paste does, when pasting two strings it puts a space in between them. If you do not want the space, use the paste0 command. What the paste0 command does is the same thing without a space between the two strings. You can also use the cat command to print a string. Here I am using a special character, backslash t, and you see that a tab character also shows up in the output. So this is basic string manipulation. Moving on to date time.
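Before moving on, here is a minimal sketch of the string functions just described. The variable names are illustrative only. Note that c() keeps the two strings as separate elements of a character vector, while paste and paste0 join them into a single string.

```r
# c() combines the strings into a character vector with two elements
words <- c("I am saying", "hello")

# paste() joins strings and puts a space between them by default
with_space <- paste("I am saying", "hello")

# paste0() joins strings with nothing in between
no_space <- paste0("I am saying", "hello")

print(words)
print(with_space)
print(no_space)

# cat() writes the text itself; \t becomes a tab and \n ends the line
cat("I am saying\thello\n")
```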
You can get access to the system time with the command called Sys.time. Run it here and you see the value showing up as the current time on my system. Of course, you can always use question mark Sys.time to see the help for this one. The help shows up here, the current date and time page from the R documentation. You can also search the R documentation here for the same thing. It gives you a nice autofill, so you can look at all the commands that start with Sys. I only type Sys dot, and you see all these commands coming up here, like Sys.getenv and Sys.getlocale. When you call unclass on Sys.time, it gives you the current epoch time, which is the numeric representation used to store the time internally. It is also called the UNIX timestamp. So that is also available in R. You can convert a string into a date by using as.Date: you give it a date string and you see that it gets converted. You can give a date string in any format, as long as you also pass a format like this, percent m, percent d, percent Y; you call as.Date with it, run it, and that also works fine. So these are some of the date manipulation functions that are available as a part of R. Moving on to vectors. We saw that vectors are one of the most important things that you need to bother about in R. How do you create a vector? I am creating a vector with c of 1, 2, 3 and assigning it to my vector like this, so my vector now holds 1, 2, 3. When I print my vector, I just select it here and run it, and it says 1 2 3. So this is how vectors show up. You can also run the command class on my vector; it tells you what kind of content is there within the vector. In this case, it says numeric.
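The date and time functions described earlier in this passage can be tried with a sketch like the one below. The date strings and the slash separator in the format are assumptions for illustration, since the transcript only names the %m, %d and %Y parts.

```r
# Current date and time as a date-time object
now <- Sys.time()
print(class(now))

# unclass() exposes the underlying number: seconds since 1970-01-01 UTC
epoch_seconds <- unclass(now)

# as.Date() understands the year-month-day layout without extra arguments
d1 <- as.Date("2015-03-25")

# For any other layout, describe it with a format string
d2 <- as.Date("03/25/2015", format = "%m/%d/%Y")

print(d1)
print(d2)
```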
You can also create vectors like this: I assign to my vector c of 1 colon 100. When you use 1 colon 100, it expands into the entire range from 1 to 100. So when you create my vector with 1 to 100 and print it, you immediately see that it has created a new vector of size 100. You see all this content here. Now let me explain what you keep seeing in the square brackets on the left side. It basically tells you the index of the first entry on that line. If you scroll back, you keep seeing it printed on every line. Here it says 1, so the first element of the vector is printed here. Then it says 15, which means the 15th element of the vector starts printing here: it printed 14 elements on the first line, and then it comes back and starts printing from the 15th element. So that is what it is showing you as the content keeps going. You can look at members of a vector like this: my vector, bracket 4. It is going to print the fourth element of the vector. You can also say my vector 5 colon 8, and it is going to print elements 5 to 8. In this case, the member values are also 5 to 8, so you see the same values. Minus three means print everything except the third element. So in this case it is printing 99 of them, and you see that the third one has gone missing. When you see a minus in the vector index, it means it is going to print everything other than that one. Here also you could potentially use ranges. You can use conditions on a vector. When I use a condition on a vector, that condition is applied to each and every member of the vector, and the resulting output, TRUE or FALSE for each member, which is also a vector, is printed out. So let us see what happens when I do this. You will see that up to 48 it keeps printing FALSE, and then it starts printing TRUE.
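A small sketch of the vector indexing forms covered above. The name my_vector is illustrative, and the condition greater than 48 is an assumption inferred from the transcript, which says the output is FALSE up to 48.

```r
# A vector holding the whole range 1 to 100
my_vector <- c(1:100)

fourth       <- my_vector[4]    # a single element
five_to_8    <- my_vector[5:8]  # a range of elements
all_but_3rd  <- my_vector[-3]   # minus drops that position, keeps the rest

# A condition is checked against every element and returns TRUE or FALSE
is_large <- my_vector > 48

print(fourth)
print(five_to_8)
print(length(all_but_3rd))
print(head(is_large))
```

The logical vector can itself be used as an index, so my_vector[is_large] would return only the elements that passed the condition.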
This whole output is again a vector of Boolean values, so you can save it and use it for whatever you want to do with it. Let us then look at the length of a vector. Length gives you the number of elements that are there in the vector. You can run that and see that the length of my vector is the number of elements that exist in the vector. There are a number of functions that you can apply at a vector level. Here you say sum of my vector. The sum function applies to the entire vector, which means it sums all the elements within the vector and gives you the total. So when I say sum of my vector, it sums all the 100 elements in the vector and gives you the total output, which is 5050. These kinds of functions apply to the entire vector: sum, mean, standard deviation and similar functions pretty much work on the entire vector. Moving on to some vector arithmetic. Here I have vector one with just the elements 3, 4 and 5, and this creates the vector. Then I create another vector, vector two, which is 10, 11 and 12. When I say vector one plus vector two, it is going to add them element by element. The first element of vector one gets added to the first element of vector two. In this case, three gets added to 10, four gets added to 11, and five gets added to 12. That is how the elements get added to each other when I say vector one plus vector two. Now let us see what happens when vector three has only one element and I say vector one plus vector three. In this case, it is going to rotate around: it takes the one element of vector three and adds it to every element of vector one. Vector one has three elements, 3, 4 and 5, but vector three has just one element, with the value one. So that is what happens when I say vector one plus vector three.
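The vector-level functions and vector arithmetic described above, as a minimal sketch. The names vector1, vector2 and vector3 follow the spoken description, but their exact spelling in the course file is an assumption.

```r
my_vector <- c(1:100)

n_elements <- length(my_vector)  # number of elements in the vector
total      <- sum(my_vector)     # sum over the whole vector

vector1 <- c(3, 4, 5)
vector2 <- c(10, 11, 12)
vector3 <- c(1)

# Same length: elements are added position by position
pairwise <- vector1 + vector2

# Different length: the single element of vector3 is reused for every element
recycled <- vector1 + vector3

print(total)
print(pairwise)
print(recycled)
```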
Adding vector three adds its single element, one, to every element here: 3, 4 and 5 give me back 4, 5 and 6. That is how vector arithmetic works. The next thing you are going to see is named vector indexes. Here I create a vector of vegetables with just 5, 7 and 3, and then I assign the names of vegetables as carrots, beans and broccoli. So I am basically replacing the index: instead of saying 1, 2, 3, I am going to replace the index with carrots, beans and broccoli. That is what the names of vegetables call does here. Then, when I print vegetables, you will see that carrots, beans and broccoli show up as headings, and the values show up below them. I can always access them by name, like vegetables of beans, and you will see that a named index can be used rather than a number index. It is a lot easier within a program to do this kind of data access than to use numeric indexes alone. Thank you. 11. R Examples 02: Hi. Let us now move on to looking at data frames, which is the most important data structure within R. How do I create a data frame? I can create it from a set of vectors. I am first going to create a vector of employee IDs. Then I am going to create a vector of employee names. Then I am going to create a vector for is manager, whether this employee is a manager or not, with a bunch of TRUE or FALSE values. Then I am going to create a fourth vector of salary values. So here is my list of vectors. I can then combine all these vectors into a data frame using this command, data.frame, where I give the list of all the vectors that will become columns. When I say stringsAsFactors equals FALSE, it means that whenever there are strings in the data set, they will be kept as strings in the final data frame and not as factors. So here I just select this one and execute the command.
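Building the employee data frame from vectors, as just described, might look like the sketch below. The employee names, manager flags and salaries are invented for illustration, and the variable names (empId, empName, isManager, salary, empDf) are a best guess at those used in the course file.

```r
# One vector per future column (all values are hypothetical)
empId     <- c(1, 2, 3)
empName   <- c("Alice", "Bob", "Carol")
isManager <- c(FALSE, TRUE, FALSE)
salary    <- c(1500, 3000, 2500)

# Combine the vectors into a data frame; each vector becomes a column.
# stringsAsFactors = FALSE keeps empName as plain character strings.
empDf <- data.frame(empId, empName, isManager, salary,
                    stringsAsFactors = FALSE)

print(empDf)
```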
Here you have a data set, so let us now start looking into various things about the data set. I call it empDf. I select it and run it, and it prints the data set out for me. What you see here is like a table: every column has a nice column heading, every row has an ID that goes from 1 to 3, and then you have the actual cell values. There is a bunch of functions that you can use to analyze the data set. Let us go through them one by one. The first one is class of empDf. It tells you what type of data structure this is; it says data.frame. The nrow command gives you the number of rows in the data set; in this case, it comes back with three. The ncol command gives you the number of columns in the data set; ncol of empDf is equal to four. Then comes the function called str, which stands for the structure of empDf. When you run it, it tells you what this data set contains. This is like looking at the metadata of the data set. It says this empDf is a data frame of three observations and four variables; every row is called an observation, and every column is called a variable. Then it shows what the column headings are and what type of data each holds: employee ID is a number, employee name is character, is manager is logical and salary is a number. It also gives you some sample data for each one. In this case there are only three rows, so it gives you all three values; otherwise it typically gives you the first nine or ten values to give you an indication of what kind of content is sitting inside each of these columns. Summary is the next command, and it is a very important command. When you run summary, you can see that it gives you an immediate summary of everything. For everything that is numeric, it gives me the quantiles for that particular column.
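The inspection commands covered above can be run against a small stand-in data frame. The data values and column names below are hypothetical, chosen to match the shape described in the lesson (three rows, four columns).

```r
# Stand-in employee data frame (hypothetical values)
empDf <- data.frame(empId = c(1, 2, 3),
                    empName = c("Alice", "Bob", "Carol"),
                    isManager = c(FALSE, TRUE, FALSE),
                    salary = c(1500, 3000, 2500),
                    stringsAsFactors = FALSE)

df_class <- class(empDf)  # type of data structure
n_rows   <- nrow(empDf)   # number of observations (rows)
n_cols   <- ncol(empDf)   # number of variables (columns)

str(empDf)                # column names, types and sample values
print(summary(empDf))     # per-column summary, quartiles for numeric columns
```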
Employee ID is numeric, so it starts by giving me the min, first quartile, median, mean, third quartile and max. Since the values are 1, 2 and 3, it is pretty straightforward. Employee name, since it is a string, does not get any statistics. For is manager, it gives you how many FALSE values there are, how many TRUE values there are and how many NAs there are in that particular column. When it comes to salary, it does the same thing again: it gives me the minimum, first quartile, median, mean, third quartile and maximum for salary. So the summary command gives you a nice summary of what is sitting there in the data set. It is very helpful when you are trying to look at a large data set and see what kind of values it has. Most importantly, whenever there are any NAs, summary shows them to you. If there are other strange characters, like question marks, they will also show up when you do a summary. So it is a nice way of understanding what is there in a given data frame. Names of empDf, when you run it, gives you the actual column names. So you can get the column name list, and you can read it into another vector and possibly use it for a different kind of analysis if you have to. Now let us look at indexing on a given data frame. When I say empDf[1, 3], it means I am trying to access the first row and the third column. The third column is the is manager column, and when you execute this, it gives me a value of FALSE. So FALSE is the value in that particular cell. In the next one, I am trying to access a range: rows one and two, and columns one to three. So I write empDf with rows 1:2 and columns 1:3.
It gives me a nice subset, a smaller table with only those rows and those columns. You can access an entire column like this, which is empdf$salary. In this case I'm accessing the entire column called salary, and it returns a vector; this is the vector output. Remember, rather than printing to the console, you can also assign any of these to a variable and then use that variable for further analysis if you want. You can also access a given row of a given column in this format: empdf, the name column, and then the second value in that column, which is nothing but the second row. So you can get that specific value too. You have a number of ways by which you can access the content within a given data frame. You can also apply summary functions on an entire column. empdf$salary is nothing but a vector of all the salary values, so, similar to what you can do with vectors, you can do sum of that and it gives me the sum of all the salaries. In the same way you can do all kinds of things that you would do on a vector. Here you see empdf$salary greater than 2000. When you execute this command, it comes back with a list of TRUEs and FALSEs. It gives you another vector that says whether this condition, salary greater than 2000, passed on each of the values within salary. You can then use this condition to filter data within the data set. What we are doing here is saying empdf, and then, instead of giving a row index, you give a condition. That means it only gives you the rows that pass this condition. It checks the condition, empdf$salary greater than 2000, on each of the rows, and only the rows that pass the test come out.
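As a rough sketch of the indexing and filtering just described, assuming the same made-up employee data frame (its column names and values are hypothetical):

```r
# Hypothetical employee data frame; names and values are assumptions.
empdf <- data.frame(
  empid     = c(1, 2, 3),
  name      = c("John", "Mike", "Randy"),
  ismanager = c(FALSE, TRUE, FALSE),
  salary    = c(1200, 3500, 2200),
  stringsAsFactors = FALSE
)

first_flag   <- empdf[1, 3]        # row 1, column 3 (the is-manager flag)
subset_df    <- empdf[1:2, 1:3]    # rows 1 to 2, columns 1 to 3
salaries     <- empdf$salary       # a whole column, returned as a vector
second_name  <- empdf$name[2]      # second value of the name column
total_salary <- sum(empdf$salary)  # vector functions work on a column

# A condition gives a logical vector, one entry per row ...
high_paid <- empdf$salary > 2000
# ... and that vector can stand in for the row index to filter rows.
high_rows <- empdf[high_paid, ]
print(high_rows)
```

Putting the condition in the row position and leaving the column position empty keeps all columns for the matching rows.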
You can do the same kind of test on the columns if you want to, but in this case we are just printing the second column as is. So you are printing the names of all those employees whose salary is greater than 2000, and here you see them coming out. This is a very powerful feature when accessing a data frame: you can apply these conditions to filter data out of the data frame and then use it for further processing. You can then go and create a new column. I'm creating a new column within the employee data frame called department ID, and I'm assigning it the values 1, 1 and 2. I just straight away say empdf dollar department ID takes the value c of 1, 1, 2. Run it, and it creates the column. Then I print the data frame, and now you see the department ID column with the values that you passed in. That is how you can simply add new columns to a given data frame. There are a number of built-in data frames in R, and one of the most popular is the iris data frame. We will be using the iris data frame a lot in future examples, so let us see what it looks like. You just call it as iris, and it comes back and prints all the contents. This is a large data frame of 150 rows. It is about flowers, and we will explore this data set later in the class. The next thing I'm going to do is more operations on data frames, so I'm going to create another data frame. This is another way of creating a data frame. I create a data frame df1 with columns x and y, where x is a column whose values are 1 to 3, written 1 colon 3, and y is a column with the values a, b and c. So this is another way to create a data frame, df1. Now I'm going to create another data frame called df2.
It is pretty similar, except that x is 7 to 9 and y has a different set of letters. Then a third data frame, df3, again with similar types of values; in fact the same types of values. I'm creating these three data frames because I'm going to use them for the example. Let us look at how df1 looks, how df2 looks, and then df3. All of them have two columns and three rows. You see x and y, x and y, and here it is x2 and y2, so the column names are a little different. Now I'm going to do an rbind of df1 and df2. Both df1 and df2 have two columns and three rows. I'm going to bind them row by row, which means df2 is going to get appended below df1. Let us see how that works. When you run rbind of df1, df2, you see that the first three rows belong to df1 and the next three rows belong to df2. So df2 got appended below df1 in the rbind. This gives you a new data frame, which you can assign to another variable and then use for whatever operations you want. Now you have another command called cbind, which is a column bind, and I'm doing it on df1 and df3. When I do cbind, df3 is going to get appended to the right of df1. Let us see how that goes. You see that now it is getting appended as new columns: x and y belong to df1, x2 and y2 belong to df3. So when I do a column bind, it gets appended to the side and more columns get added to the resulting data frame. Again, this data frame can be assigned to another variable and you can do any kind of manipulation you want on it. Let's go on and look at what matrices are.
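The row and column binding described above can be sketched like this. The letters in df2 and all values in df3 are assumptions made for the example; only the shapes and the x2/y2 column names follow the walkthrough.

```r
# Three small data frames with three rows and two columns each.
# The exact values of df2$y and df3 are assumed for illustration.
df1 <- data.frame(x  = 1:3, y  = c("a", "b", "c"), stringsAsFactors = FALSE)
df2 <- data.frame(x  = 7:9, y  = c("f", "g", "h"), stringsAsFactors = FALSE)
df3 <- data.frame(x2 = 4:6, y2 = c("p", "q", "r"), stringsAsFactors = FALSE)

# rbind stacks df2 below df1; the column names have to match.
stacked <- rbind(df1, df2)

# cbind puts df3 to the right of df1; the row counts have to match.
side_by_side <- cbind(df1, df3)

print(stacked)
print(side_by_side)
```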
Matrices are pretty similar to data frames. It is just that they are not used to the extent data frames are, because every element of a matrix has to be of the same type, typically numeric. Here I'm going to create a demo matrix. There are six elements, and I'm going to convert them into a matrix with three rows and two columns; nrow and ncol say how many rows and how many columns I want to create. So I create this demo matrix and print it, and you see it nicely laid out with three rows and two columns. There is a command called t, which stands for transpose. Transpose converts the rows into columns and the columns into rows. If you are used to mathematics, you will have seen transposing a matrix, and you see here that the rows get converted to columns and the columns get converted to rows. What you saw as a column, 2, 4 and 6, becomes a row. So the columns become rows and the rows become columns; it is the transpose from mathematics that you are seeing here. The next thing you see is factors. You can convert strings to a factor by using as.factor, and when you print a factor you see that it gets printed with levels. The levels are the unique values that it has. Suppose you create a factor for gender, which is male and female; it basically has two levels. So you can create factors like that. For example, you can convert the employee name from empdf into a factor, and you see that it comes up with three levels, John, Mike and the third employee, which are the three unique values. You can also do a factor of the is-manager column. When you convert is-manager, it has two levels, FALSE and TRUE.
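A small sketch of the matrix, transpose and factor steps above. The matrix contents and the gender values are illustrative assumptions, not the values used in the video.

```r
# Six values arranged as 3 rows by 2 columns (filled column by column).
demo_matrix <- matrix(1:6, nrow = 3, ncol = 2)
print(demo_matrix)

# t() swaps rows and columns, giving a 2 by 3 matrix.
transposed <- t(demo_matrix)
print(transposed)

# as.factor() turns strings into a factor; its levels are the unique values.
gender <- as.factor(c("male", "female", "male"))
print(gender)
levels(gender)    # the distinct values, sorted alphabetically
nlevels(gender)   # how many distinct values there are
```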
So you see what the levels are and what the actual values are: the values are FALSE, TRUE and FALSE, and the levels are FALSE and TRUE. The important thing about factors is that now you can start using summary functions like table. You can do a table on as.factor of is-manager, and it gives you a table of how many FALSEs you have and how many TRUEs you have. It is like a group-by output: you group the data by FALSE and TRUE and you count how many there are in each group. The table command itself only counts; if you need another mathematical function per group, like mean or standard deviation, that is a group-by operation, which we will see with aggregate. Also look at how the table command is described in the help: it does cross tabulation and table creation, and there are a number of things it can do for you. So you can dig in and read more on how these factors work. Thank you. 12. R Examples 03: Hi. Let's move on to the next set of examples. The first thing we want to see is how to sort a vector. Here you create a vector like this, with the values 6, 3, 4, 11, 2, 9, 5, and then you call the command sort and pass the vector as input. And here you have the sorted vector. The output is also a vector, so you can take it and do further vector manipulations with it. Now let us see how to sort a data frame. You already know the employee data frame, which has these five columns. You can use the order command, and I want to order by salary. So I do order of empdf$salary, and it gives me the row IDs in sorted order. You see the row IDs come out as first, third and second. Then I pass this order of row IDs back as an index into the employee data frame, and that prints the data frame.
The employee data frame comes out in order of salary. So it is a two-step process. First I do order of salary; it doesn't return the values, it returns the row IDs in sorted order. Then I pass these IDs back to the empdf data frame, and that prints the actual data frame. Let us now do a merge operation. For the merge, I'm going to create a data frame for the department data. I have two departments, 1 and 2, and the names are Sales and Operations. I run this and it creates the department data frame. Then I do a merge of the employee data frame and the department data frame. Given that both of them have the same column name, department ID, it automatically recognizes that and gives me a joined data frame. When I do the merge, you see that department ID moves to the first column, then you have the columns from the employee data frame, and then the columns from the department data frame. Department ID only comes up once. merge also has a number of other options; you can access the help on merge to see them. You have x and y, and then the by argument, which defaults to the intersect of the names of x and y and tells merge which column names to use for the join. Next is binning. Binning is how you convert a continuous variable into a categorical variable. In this case I'm going to use the cut command on the employee salary. The employee salary has three values, as you can see here. I'm going to use the cut command to create salary ranges of 1 to 2000 and 2000 to 5000. You give the ranges here as 1, 2000, 5000.
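The sort, order and merge steps above can be sketched as follows. The employee and department column names (deptid, deptname) and the salary values are assumptions for the example; the two-step ordering and the merge on a shared column name are what the walkthrough describes.

```r
# Sorting a vector returns the values themselves.
v <- c(6, 3, 4, 11, 2, 9, 5)
sorted_v <- sort(v)

# Hypothetical employee data frame, including a department ID column.
empdf <- data.frame(
  empid     = c(1, 2, 3),
  name      = c("John", "Mike", "Randy"),
  ismanager = c(FALSE, TRUE, FALSE),
  salary    = c(1200, 3500, 2200),
  deptid    = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Step 1: order() returns row positions in ascending salary order.
row_order <- order(empdf$salary)
# Step 2: use those positions as the row index.
by_salary <- empdf[row_order, ]

# Hypothetical department data frame sharing the deptid column name.
deptdf <- data.frame(
  deptid   = c(1, 2),
  deptname = c("Sales", "Operations"),
  stringsAsFactors = FALSE
)

# With no 'by' given, merge() joins on the column names both frames share.
merged <- merge(empdf, deptdf)
print(merged)
```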
Those numbers are the boundaries of the ranges, and the boundaries are then used to create the ranges 1 to 2000 and 2000 to 5000. When you execute the cut command, it gives me, for every row, what the salary range is, and this command then puts that salary range into a new column in the employee data frame. Let us go ahead and execute this, and then look at the employee data frame. Now you have the salary range introduced here, and the salary ranges are 1 to 2000 and 2000 to 5000. So every record is now categorized into a salary range. You can then use the salary range for further things: you can do tables on it, and you can use it for certain machine learning algorithms too. So cut is a very powerful and very useful method in predictive analytics. Then you have a command called aggregate, which is like a group-by operation. In this case I'm going to aggregate the employee salary based on the salary range. So salary range is the column I'm using in the group by, and salary is the column I'm going to sum. I pass the column to be summed, and the function I want to use, which here is sum. The function can be a built-in function or a user-defined function. And the by argument is nothing but the group by: group by salary range and produce the sum of salaries. Let's see how this works. Here you have the groups, 1 to 2000 and 2000 to 5000, and here is x, which is nothing but the sum of the salaries. Instead of doing it by salary range, you can also do it by other things. Say I do it by is-manager: then I get the sum of the salary by whether the person is a manager or not.
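Here is a minimal sketch of binning with cut and grouping with aggregate, as described above. The salary values and the name of the new column (range) are assumptions; the break points 1, 2000 and 5000 come from the walkthrough.

```r
# Hypothetical employee data frame; names and values are assumptions.
empdf <- data.frame(
  empid     = c(1, 2, 3),
  name      = c("John", "Mike", "Randy"),
  ismanager = c(FALSE, TRUE, FALSE),
  salary    = c(1200, 3500, 2200),
  stringsAsFactors = FALSE
)

# cut() bins a continuous column into a factor, one level per interval
# between consecutive break points.
empdf$range <- cut(empdf$salary, breaks = c(1, 2000, 5000))
print(empdf)

# aggregate() is the group-by: sum the salaries within each salary range.
by_range <- aggregate(empdf$salary, by = list(empdf$range), FUN = sum)
print(by_range)

# The same idea, grouped by the is-manager flag instead.
by_manager <- aggregate(empdf$salary, by = list(empdf$ismanager), FUN = sum)
print(by_manager)
```

The grouped sums come back in a column named x, with the group labels in Group.1.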
So this is how you do a group-by-like operation in R. Moving on to input and output operations, let us look at a basic operation like the scan command. I do a scan and assign the result to readdata. Now it starts waiting for input from me, and I can give it a list of values. So here I enter 3, 6 and 4, and when I enter a blank line, it means end of input. So 3, 6, 4 has now been taken and assigned to readdata. When I print readdata, I see that it is a vector with the values 3, 6 and 4. You can also use the print command to print readdata, so you can print a vector. The print command can then be used with the c function, the concatenation function, to print some strings as well. So here I print the concatenation of "we read" and readdata, and this prints "we read", 3, 6 and 4. So you can combine some strings and print them out too. The built-in function dir is nothing but the directory function. It gives you the list of files under the current working directory. You can also do list.files, which gives pretty similar output. As you can see, it gives you a vector, so you can get the list of files, load it into a vector, and then walk through the vector, access it file by file and do some operations on the files. Now I'm going to read a file called employee.csv. For that, let me first make sure my working directory is set correctly. Actually, the working directory is already set. So here I'm going to read the file called employee.csv. This is how I read a CSV file into a data frame. I'm replacing the content of empdf by saying read.csv of employee.csv. Let us look at the content of employee.csv.
It is pretty much the same content that we put in earlier, nothing different. Then you can do all these cut operations and so on, and then print the employee data frame. You can then write the output, whatever you have computed, into another CSV file. This writes the content of the data frame empdf into that file. So these are the read and write operations on CSVs that you can do: data frames can be read from and written to CSV files. The commands, as you can see, are pretty simple and straightforward, and you can do these operations quickly. There is nothing like file handles and streams, open stream and close stream, and all that kind of stuff here; it is simple and straightforward. Going on to control structures. Here we are just going to see an example of a control structure, and for that I'm going to use the iris data set. Let's look at the structure of the iris data set. The iris data set contains five columns. It is about a type of flower, and there are three types of flowers, called setosa, versicolor and virginica. What it has is samples taken from the three flower types: there are 150 samples of actual flowers, and for each flower you measure the sepal length, sepal width, petal length and petal width. This data set contains that data, and we are going to use it for many different examples. So let's start by using it for some control structures. In this control structure, I'm going to do a for loop, and in this for loop I'm going to be looping.
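The CSV read and write operations described above might look like the sketch below. To keep it self-contained, it writes a made-up data frame to a temporary file first and then reads it back, which is the reverse of the order in the video; the file name and the data are assumptions.

```r
# Hypothetical employee data frame; names and values are assumptions.
empdf <- data.frame(
  empid     = c(1, 2, 3),
  name      = c("John", "Mike", "Randy"),
  ismanager = c(FALSE, TRUE, FALSE),
  salary    = c(1200, 3500, 2200),
  stringsAsFactors = FALSE
)

# Write the data frame to a CSV file in a temporary directory
# (the file name is made up for this example).
csv_path <- file.path(tempdir(), "employee_demo.csv")
write.csv(empdf, csv_path, row.names = FALSE)

# list.files() returns the file names in a directory as a character vector.
files_here <- list.files(tempdir())

# Read the CSV back into a new data frame.
loaded <- read.csv(csv_path, stringsAsFactors = FALSE)
print(loaded)
```

Passing row.names = FALSE keeps the row IDs out of the file, so the data frame read back has the same columns as the one written.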
I'm creating a new looping variable called i, and I'm going to loop from 1 to nrow of iris. nrow of iris is the number of rows in iris, and there are 150 rows in this one, so it becomes 1 to 150. This is how you take the number of rows in a data frame and then walk through every row in the data frame. Inside the loop I'm using an if statement. In the if statement I'm accessing the iris data set, and I'm accessing the i-th row, where i is the looping index. So I'm accessing the Species column in the i-th row, and "equal to" is written with a double equals sign. If the value of the Species column in that row is setosa, I print "this is setosa"; else I print "this is not setosa". So this is the whole for loop, with all the curly braces to contain the for loop, then the if statement and everything. You can now go and execute the whole for loop in one shot: select the whole thing and execute it in one go, and you see that it prints an output for every row. This particular data set is ordered by the type of species, so for the first 50 rows it prints "this is setosa" and for the next 100 it prints "this is not setosa". So this is a simple example of using a for loop and an if construct. This is how you typically walk through each row in a data frame and do some manipulation inside the loop. Next, functions. It is pretty easy to write functions in R. The function definition starts with the function keyword, x comma y are the inputs to the function, and within the curly braces is the body of the function. Within the body I print "received", x and y.
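The loop described above can be sketched like this, using the built-in iris data frame. One deliberate difference, flagged as my own choice: instead of printing 150 lines, the sketch collects the messages in a vector and prints a count of each.

```r
# One message per row of iris, filled in by the loop.
labels <- character(nrow(iris))

for (i in 1:nrow(iris)) {
  # Compare the Species value in the i-th row with a double equals sign.
  if (iris[i, "Species"] == "setosa") {
    labels[i] <- "This is setosa"
  } else {
    labels[i] <- "This is not setosa"
  }
}

# iris is ordered by species, so the first 50 rows are setosa.
print(table(labels))
```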
The output of the function is x plus y, which is what gets returned outside the function. When you create a function, you assign the body of the function to a variable using the assignment operator, so computeSum becomes the name of the function. First I select this and run it, and this creates a function called computeSum. Any time I want to call the function, I just call it as computeSum of 4 comma 6. I select this and run it, and you see the print statement inside the function printing "received 4 6", and the output returned is x plus y, which is 10. Since you are not assigning the output to any variable, it prints the output to the console. If you had assigned it to a variable, it would have been assigned without being printed. The print statement inside the function would still have happened. This is a simple example of how you can create user-defined functions in R. Going on to packages. Packages, as we have been saying, are the most important thing in R. The packages are available for you on the website called CRAN; cran.r-project.org is the website where you have all the packages for R. Here you can download R itself, and you can also download all the packages. When you click on the Packages link, it lists all the packages that are there. Here is the table of available packages; click on that and you see a lot of packages. There are so many packages available. You don't have to go to this website and download anything, though; you can install your packages from within the R shell. There is a package called RCurl, which is used to run curl commands within R. The way to install packages is to call install.packages with RCurl. Let's see what happens when I run install.packages of RCurl.
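A minimal sketch of the user-defined function described above. The exact spelling of the function name and the wording of the printed message are assumptions based on the spoken description.

```r
# The function body is assigned to a name with the assignment operator.
computeSum <- function(x, y) {
  # Show the inputs that were received.
  print(paste("Received", x, y))
  # The value of the last expression is what the function returns.
  x + y
}

# Called without assignment, the returned value is printed to the console.
computeSum(4, 6)

# Assigned to a variable, the value is stored instead of auto-printed;
# the print() inside the function still runs.
result <- computeSum(4, 6)
```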
Here you have to make sure of two things: the name is case sensitive, and it is in double quotes. Once you run install.packages, it starts working: it downloads the content and then saves it into a local directory. Now this package is downloaded, and once it is downloaded, you can go to the packages list and see all the packages that are there. When you want to use a package, you use library of RCurl. So with install.packages you give the name within double quotes, but when you use the package, it is library of RCurl without the double quotes, and that loads the package. Here it says "loading required package: bitops". So RCurl has a dependent package, bitops, which it automatically loads for your use. It not only loads the given package, it also loads all the documentation for the commands within that package, so you can access the commands within the package and use them as you want. The next example we are going to look at is the apply function. I'm going to create a matrix of the values 1 to 20, with 10 rows and 2 columns. You just say matrix of 1 to 20, 10 comma 2, and this creates a matrix for you. What does the matrix look like? It has 10 rows and 2 columns. Now I can use the apply function on this matrix. For the second argument, 1 means for every row and 2 means for every column. So for every row, find the mean: I run apply with mean, and it prints the mean for every row. There are 10 rows, so there are 10 outputs here. Now I can find the mean for every column by using the value 2 in the second parameter. I select this and do a run, and that gives me the mean for every column.
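The apply example above can be sketched as follows; the variable names are my own.

```r
# Values 1 to 20 arranged as 10 rows by 2 columns (filled column by column).
m <- matrix(1:20, nrow = 10, ncol = 2)
print(m)

# Second argument 1: apply the function to every row.
row_means <- apply(m, 1, mean)
print(row_means)   # 10 results, one per row

# Second argument 2: apply the function to every column.
col_means <- apply(m, 2, mean)
print(col_means)   # 2 results, one per column
```

Any function that accepts a vector can take the place of mean here, including a user-defined one.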
So this is pretty simple and straightforward stuff. apply is a very powerful thing. In this case I'm passing the function mean, but you can pass any built-in function or any user-defined function that takes a vector, one row or one column of the matrix, as input. apply itself takes a matrix as input and returns a vector as output. As long as your function conforms to that pattern, you can pass in any function here to get the job done. Thank you. 13. R Examples 04: Hi. In this section let us start looking at some statistical functions in R, and also graphics in R. To do the statistics, we are first going to build a data frame. I'm creating a data frame called irisdf by taking the built-in data frame called iris and storing it inside irisdf. Let us look at the summary of this iris data frame. Let me just move this here. You see that it has five columns: sepal length, sepal width, petal length, petal width and species. For all four numeric columns, you have the quantiles presented here. Sepal length has values between 4.3 and 7.9, with a median of 5.8, so it is kind of equally distributed. Sepal width is between 2 and 4.4. Petal length is between 1 and 6.9. Species is the factor variable, and it has three values: setosa, versicolor and virginica. Setosa occurs in 50 records, versicolor occurs in 50 and virginica also occurs in 50. How do we do statistical functions? You can use a function called mean, and I'm going to do a mean on all the values of Sepal.Length. So I refer to it as irisdf$Sepal.Length and call the function mean. Execute, and this is the mean of the values. Then I do a range. range is nothing but the min and max values in that particular data set.
So it is between 0.1 and 2.5 for Petal.Width. Then comes the most important function, cor, the correlation function, which finds Pearson's correlation coefficient. I'm trying to find the correlation coefficient between all four variables, the first four variables: sepal length, sepal width, petal length and petal width. For that I pass the data frame, with all the rows and the first four columns, to the correlation function, and it gives me an output like this, a matrix output. Every variable that we passed has a correlation coefficient against every other variable. What you see inside this matrix is the correlation coefficient between the row heading and the column heading. Each variable against itself is always 1, because a variable is 100% correlated with itself. Then you look at something like Sepal.Width and Sepal.Length. The correlation coefficient is about -0.12, which is a very low value. The correlation is not that high, and it is also negative. That means that when the width goes up, the length typically goes down, though not to a great extent, because the value is so small. But when you look at something like Petal.Length and Sepal.Length, the correlation coefficient is 0.87, which is a really high value. That means Petal.Length and Sepal.Length have a high correlation with each other. So this is how you look at the correlation coefficients and see how these values are correlated with each other. Now, one of the deficiencies of the built-in cor function is that it cannot find correlations for factor variables; it can only find correlations for numeric variables. But there is a library called psych that can show correlations for other kinds of variables as well. So first I load the library.
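The statistical functions described above can be sketched on the built-in iris data. The name irisdf follows the walkthrough; the other variable names are my own.

```r
# Work on a copy of the built-in iris data frame.
irisdf <- iris

summary(irisdf)   # quantiles for numeric columns, counts per species

# Mean of one column, and the min/max pair of another.
mean_sepal_length <- mean(irisdf$Sepal.Length)
petal_width_range <- range(irisdf$Petal.Width)

# Pearson correlation between every pair of the four numeric columns
# (all rows, first four columns).
cor_matrix <- cor(irisdf[, 1:4])
print(round(cor_matrix, 2))

# Individual coefficients can be picked out by row and column name.
cor_matrix["Sepal.Width", "Sepal.Length"]    # weak and negative
cor_matrix["Petal.Length", "Sepal.Length"]   # strong and positive
```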
In fact, I have already installed this library using install.packages. You have to first install all the libraries I'm calling here; I may not be doing that on screen for every library because I have already installed them. So I load the library psych. Before I call pairs.panels, I'm going to enlarge the plot area a little so I get some space. This is a very powerful function, and we will explore its output. I run pairs.panels of irisdf, and it gives me a very nice picture. What does this picture contain? On the diagonal is each of the variables in the data frame, each of the columns. For each of the columns, the first thing you see is a nice histogram that tells you how the values of that column are distributed. Sepal.Length looks like a normal distribution. Sepal.Width is also like a normal distribution. Petal.Length looks like a bimodal distribution: there is a high set of occurrences at some low values, then there is a gap, and then there are occurrences at some high values. We have seen what these distributions are in our statistics course, and here you see what they look like. So the pairs.panels command gives you a lot of information, and one piece of it is how these variables are distributed. Petal.Width, again, is bimodal. And given that Species is a factor, you have only three bars, one for each type, and you see how many values there are for each. Then comes what is in the cross matrix for each pair. Below the diagonal you have a plot. If you look at Sepal.Length and Sepal.Width, here is the plot of the values.
This particular plot has sepal width on one axis and sepal length on the other, and it plots a point for each combination, so you see how the points are spread. Above the diagonal you have Pearson's correlation coefficient. So this is the correlation value and this is the plot, and you can immediately see that there is a low correlation coefficient and that the values are widely spread around. Now suppose you look at something like Petal.Length and Sepal.Length. The cross matrix shows a correlation coefficient of 0.87, and the chart looks almost like a straight line. The values are almost falling on a straight line, which shows that they are highly correlated with each other, and that is confirmed by the high value of the correlation coefficient. So just by looking here, I can easily say which variables have an impact on the outcome. Suppose I'm trying to predict species, and I want to see which are my best predictors. The type of species is highly dependent upon, as you can see, the petal width, which has a high impact at 0.96, and the petal length, which has a high impact at 0.95, whereas the sepal width has nothing like that kind of impact, at about 0.43, and the sepal length has a medium to high impact at 0.78. So you see that petal width and petal length have very high predictive power for the type of species. Now look at the graphs. For petal width and species, you see how the values are very much segregated from each other. Here are the values for one type of species, here are the values for the second type, and here are the values for the third type. But if you look at the same species in combination with a less correlated variable, for example sepal width, there is overlap on these lines.
The lines here are separated very much from each other, whereas in this case you see there is a lot of overlap between the lines, so it does not have that kind of high predictive power. So pairs.panels is a very powerful tool that you can use to analyze data. Now let us start doing more stuff. For example, I'm going to build a linear model for linear regression. The first thing I do is convert the species into a numeric representation: as.numeric of irisdf$Species, and I run it. Then I can do a linear model, which is a command that comes up with a formula. We will have more details about how you build linear regression models when you get to predictive analytics. Here I'm just giving a summary, showing you the kind of statistical things that you can do with R. We will explore more of this command when we go into the predictive analytics class. Now let us move on to doing some plots. I'm first going to start by doing something in the base plotting system. To begin with, what you use is the par command. The par command sets up something like a canvas, and on the canvas you can say how many different plots you are going to draw. In this case you set mfrow equal to c of 1 comma 1, which means this canvas has one row and one column only, which means I'm only going to draw one plot at a time. We will see more examples of doing multiple plots on the same canvas later, but for now I'm going to do one at a time. The first thing I'm going to do is a strip chart, a strip chart for the values in Sepal.Width. A strip chart is not the most useful chart; I'm just showing it to you because something like this exists. It shows you the distribution of the values. The next thing I'm going to do is a histogram.
A histogram of Sepal.Width is going to show me how the values are distributed. You can see here that on the x axis are the values and on the y axis are the frequencies. What do I mean by that? I see that 2.5 has a frequency of about 15. In the data set of 150 rows, 15 of them have a value of about 2.5 for Sepal.Width. Let me repeat: in the data set of 150, 15 of them, which is the frequency, have values around 2.5. That is a frequency histogram. It just shows you how the values are distributed. It tells you most of the values for Sepal.Width are around three, and it looks like a very normal-looking curve here. Now I can add some more decoration to this particular histogram command. I can add a color called cyan, I can have a main heading called Sepal Width, I can add an x label, and I can add a y label. So you can add some more annotation to your chart. Let's execute it. Now it looks a lot better. You have color here, you have a heading for the chart, you have an x axis title and a y axis title, and so on. The next one is a box plot, a pretty important plot that you are going to be seeing again and again. So let's do a box plot on Sepal.Width. As we discussed in the earlier class, the box plot is going to give you basically the five quartile values: the min value, the first quartile, the median, the third quartile, and the max value, and it also shows some outliers. So you see that there are some outliers both at the bottom and at the top for Sepal.Width. But most of the values, as you can see, fall into this particular box. Explore more: try the same commands on other values also, so that you learn more about how to use these box plots. Now I can do multiple box plots also.
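As a rough sketch of the decorated histogram and the single box plot walked through above, assuming the built-in `iris` data set stands in for the course's copy of the data (the label strings are illustrative, not taken from the video):

```r
# Assumption: the built-in iris data set matches the data used in the lesson.
data(iris)

# Decorated histogram: fill colour, main heading, and axis labels.
h <- hist(iris$Sepal.Width,
          col  = "cyan",
          main = "Sepal Width",
          xlab = "Width",
          ylab = "Frequency")

# h$counts holds the frequency of each bin; together the bins cover every row.
sum(h$counts)

# Box plot of the same column. b$stats holds the five values that are drawn:
# lower whisker, first quartile, median, third quartile, upper whisker.
b <- boxplot(iris$Sepal.Width, main = "Sepal Width")
b$stats

# Points beyond the whiskers are drawn individually as outliers.
b$out
```

Keeping the return values of `hist` and `boxplot` is optional, but it gives a way to read the plotted numbers instead of estimating them from the picture.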
So what you see in the next command is that I'm going to do a box plot of Sepal.Width, and then I use a tilde. The tilde means "by", so by the iris species. So I'm going to do Sepal.Width by Species. What that means is I'm going to draw a box plot of the values of Sepal.Width for each type of species. So let us execute this and see what it comes up with. Here you see three different box plots. When I say by Species, for every distinct value within Species I am going to get a different box plot, so I can compare side by side how these values are looking. You can easily see that setosa's range of values is much higher compared to versicolor. Setosa lies between about 3 and 4 and the median is around here, whereas versicolor ranges somewhere between 2 and 3.5 and the median is around here. So you see there is a distinction between these two categories of flowers in the range of values for Sepal.Width. For this kind of analysis, Species has to be a factor. This is one reason you want to convert numerics and characters to factors: this kind of grouped analysis is meant to be done on factors. And you can immediately see some trends here, that there is a distinction for setosa in particular in terms of the range of values that Sepal.Width takes for these different species of flowers. The next command I'm going to be doing is an XY plot, so I'm going to be plotting Sepal.Length against Sepal.Width. Let's plot it and see what it comes up with. This is a nice plot. Sepal.Length is on the x axis, Sepal.Width is on the y axis, and the data is kind of widely distributed. Now I'm going to add another dimension to the plot by coloring these points based on the species. So, instead of passing a standard color value,
I can say color by iris$Species, which means each distinct species value will take a separate color. So let us execute this and see what happens. You immediately see that there are three distinct colors showing up in the plot for the three different values of the Species column. The ones you see in black are setosa, the ones in red are versicolor, and the ones in green are virginica. It doesn't show a legend telling you which color is what, but you can add that pretty easily by using some annotations on the same plot. The next plot you are going to be doing here is a line plot. It is the same plot on Sepal.Length, but this time I set type equal to "l", which means a line plot. See how the line plot looks. Then I can do another line plot, but I'm going to be taking a sorted list of Sepal.Length, and now the line looks a lot smoother. You can do a bunch of things here by playing around with the data. I'm then going to show you how to do a bar plot. Before that, I'm going to do some aggregation, so I'm going to be calling the aggregate function on iris. I'm going to take the aggregate of the entire data set by species, and I'm going to find the means. So let us run this command: aggregate the entire data set by species, with mean. I'm going to find the mean of each of the four different measures. And, OK, this is now giving me a warning because the argument is not numeric or logical. So what I have to pass here is only columns 1 to 4. Let's look at how this one works. This is done, and let us look at the iris aggregate. What you see is that for each type I'm getting summary data: for Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, the overall mean for each of them is summarized and aggregated here in this data frame.
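The coloring, line plot, and aggregation steps described above might look roughly like this in base R. This is a sketch that assumes the built-in `iris` data set; the variable name `iris_agg` and the legend placement are illustrative choices, not taken from the video.

```r
# Assumption: the built-in iris data set matches the data used in the lesson.
data(iris)

# Scatter plot coloured by species: the factor's integer codes (1, 2, 3)
# select the first three colours of the current palette.
plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species)

# Optional annotation so the reader can tell which colour is which species.
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 1)

# Line plot of the sorted values, which gives a much smoother line.
plot(sort(iris$Sepal.Length), type = "l")

# Averaging all five columns would also try to average Species and warn that
# the argument is not numeric or logical, so only columns 1 to 4 are passed.
iris_agg <- aggregate(iris[, 1:4],
                      by  = list(iris$Species),
                      FUN = mean)

# The grouping column is named Group.1 by default.
print(iris_agg)
```

The default `Group.1` name comes from passing an unnamed list to `by`; passing `list(Species = iris$Species)` would name the column `Species` instead.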
Once you have this data frame, I can then do a bar plot. Here I do a bar plot of Sepal.Length, and names.arg is what I'm passing as the names for each of the bars. The names.arg is nothing but Group.1, and Group.1 here holds the actual species value. You can use the same headings, or you can give new headings using names.arg. And then there is the legend text: you can add the legend text for this bar plot, again from the aggregate's Group.1, and I'm also going to color it by Group.1. So let us see how it looks. Here I see three different bars, each showing the mean for that type of flower. And I have also added a legend here, in terms of legend text, telling me which color belongs to which type of flower. So this is another way by which you can do some nice-looking bar charts. Thank you. 14. R Examples 05: Let us now move on to explore the ggplot2 library, which is another plotting system that is available inside R. For this exploration, we are going to be using the mtcars data set. It stands for the Motor Trend cars data set. So we are first going to copy the mtcars data set into memory, and then do a summary of it to see what kind of data is there in mtcars. It has information about different cars, like the mpg, the cylinders, the displacement, the horsepower, whether it is automatic or manual, and stuff like that. Look at some rows in the data set and you will see rows for different car makes and models, and then the data for each of these makes and models. To invoke ggplot, you load this library: library(ggplot2). This one loads up a bunch of other libraries too. Let us start with some basics of how plotting works in ggplot. In the ggplot plotting system, you have a separation between the data that is used to plot.
In other words, there is what is called the aesthetics, and then there is the actual type of plot you are going to make, and you are basically creating this plot step by step. The first thing you do here is call ggplot. In this particular line you say I'm going to be using the mtcars data set, and the aesthetics are that my x axis is miles per gallon and my y axis is weight. This is the first thing, where I'm setting up the aesthetics, and then to that I add geom_line, which means the geometry is a line, and that makes it a line plot. Instead of geom_line, you can use something like geom_point or geom_boxplot, which in turn would do a different kind of plot for me. So do go ahead and explore all these combinations in ggplot. You can also create this plot and assign it to a variable, and then use that variable to keep adding more things afterwards. So let us start by plotting this one. This is a line plot, a basic line plot that comes out here. The plotting looks better now. How do we do a histogram? I start out by saying these are my aesthetics: I'm just going to be using x equal to cylinder. And what kind of graph am I going to plot? I'm going to plot something called a histogram instead of a line. Then I'm saying what the width of my bin is and what the color of my plot is. And this time I'm adding the black-and-white theme. The black-and-white theme means that instead of the gray background that you were getting in the earlier chart, I'm going to be using a black-and-white theme, so you get a plain white background for the chart. So you see, that's a three-step plot. First is the aesthetics, second is the type of plot you are going to be doing, and third is more decoration and coloring. So this is my ggplot, this is my histogram plot for the cylinder.
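The three-step pattern described above (aesthetics, then geometry, then decoration) can be sketched as follows. This assumes the ggplot2 package is installed and uses the built-in `mtcars` data; the bin width and fill colour are illustrative values, since the ones used in the video are not stated.

```r
library(ggplot2)

# Step 1: data and aesthetics. Step 2: the geometry, here a line.
p_line <- ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_line()

# Swapping the geometry changes the type of plot drawn from the same aesthetics.
p_point <- ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point()

# Histogram of the cylinder column. binwidth and fill are assumed values.
# Step 3: theme_bw() replaces the default gray background with a white one.
p_hist <- ggplot(mtcars, aes(x = cyl)) +
  geom_histogram(binwidth = 1, fill = "blue") +
  theme_bw()

# A stored plot is only drawn when it is printed.
print(p_line)
print(p_hist)
```

Because each plot is an ordinary object, more layers or labels can be added to `p_hist` later with `+` before printing it again.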
So there are four cylinders, six cylinders, and eight cylinders, and then a count of the number of cars that fall into each of these categories. Then I do a density plot, a density plot for the same mtcars with the aesthetics x equal to cylinder. Let's see how a density plot looks. A density plot is a smooth curved line. It kind of takes this histogram and connects the histogram bars with a smooth line. Finally, we come down to our favorite, the box plot. In the box plot, again I'm going to be using mtcars, and for the aesthetics I'm going to be using x as the number of cylinders and y equal to mpg. What I'm doing here is, by cylinder type, four cylinder, six cylinder and eight cylinder, I'm going to plot the mpg, and I'm also going to be coloring by the different types. It is going to be a box plot. And the labs function is used to give labels like my title, the x title, and the y title. This looks like a more appealing and nice plot. You have titles in here, and it is using different colors for different factors. So this is four cylinder, six cylinder, eight cylinder, and for each of them this is a box plot of mpg. Obviously, the four cylinder has better mileage than the six cylinder, and then the eight cylinder. On the right side, you also have a legend that shows what color is used for each of the types of cylinders. So this is how you do a box plot. We did a similar plot in the base plotting system, and this is how you do a similar box plot in ggplot. Then you can do a scatter plot in ggplot, and again you see a number of things being set up here. You see that the data is mtcars, and in the aesthetics the x axis is mpg and the y axis is weight, so you are plotting mpg against weight. And then you add in the third dimension, which is the color.
The color of each point is based on the type of cylinder, so every type of cylinder has a different color. And then you are adding a fourth dimension, which is the gear, whether it is manual or automatic, so the type of gear determines the shape of the dot. So it is a scatter plot, x versus y, but each point is going to be colored by the cylinder type, the shape of each point is going to be by the type of gear, and you are adding labels for the names. Then for each of the points that you have, I'm saying geom_point, so it's going to be a point plot. The size of the point is going to be six, and then I can add text, and the text is going to be black. What I'm using for the text is that for every point I'm also adding a label, and that label is nothing but the name of the car. So I'm putting so much information in here: there is the x axis, there is the y axis, there is color, there is shape, and there are labels. So there are five different pieces of information that I'm putting in this one plot, and this is a powerful plotting system. So you see mpg against weight is plotted out here, and then the gear type. Each gear type has a different... sorry, it's the number of gears in the car. Each number of gears in the car has a different shape. And then the cylinders also have different colors. So there are many different pieces of information: it is a four-dimensional or even a five-dimensional plot that you bring up here. The labels represent each of the types of cars. So this is how powerful ggplot is. You can squeeze so many dimensions into a single plot here. Do play around a lot with this to understand more about how ggplot works. The next thing is how you do a pie chart. It's a pie chart of cylinder, basically the number of cylinders in a car and how many cars actually have that number of cylinders.
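A minimal sketch of the five-dimensional scatter plot described above, assuming ggplot2 and the built-in `mtcars` data. The label strings and the text size are illustrative, since the exact values used in the video are not stated.

```r
library(ggplot2)

p_scatter <- ggplot(mtcars,
                    aes(x = mpg, y = wt,               # dimensions 1 and 2
                        color = factor(cyl),           # dimension 3: cylinders
                        shape = factor(gear),          # dimension 4: number of gears
                        label = rownames(mtcars))) +   # dimension 5: car name
  geom_point(size = 6) +
  # Fixed black text overrides the colour aesthetic for the labels only.
  geom_text(color = "black", size = 3) +
  labs(title = "MPG versus weight",
       x = "Miles per gallon", y = "Weight",
       color = "Cylinders", shape = "Gears")

print(p_scatter)
```

Wrapping `cyl` and `gear` in `factor()` matters here: shape cannot be mapped to a continuous variable, and a numeric colour mapping would produce a gradient rather than distinct colours.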
This is the command for doing a pie chart, and you will see a nice pie chart coming up here. Pretty straightforward. The next thing you want to do here is what is called faceting. In the case of faceting, you are trying to create multiple charts. You are trying to divide the data into many sub data sets, and for each sub data set you are trying to create a plot. So let us see how this faceting works. You use the same data, and then again the aesthetics, with x and y being set, and the color is based on whether it is automatic or manual, and it is a point chart. But what you are adding to it is a facet grid of cylinder by gear. What that means is, by cylinder and by gear, for every combination of distinct values of cylinder and gear, I repeat, every combination of distinct values of cylinder and gear, a separate plot is created. So let us execute the plot to try to understand more of how this one works. Here you see that there are nine different plots created. We did one plot by cylinder by gear. On the x axis, you have the number of gears: three gears, four gears and five gears. On the y axis, you have the number of cylinders: four, six and eight. For each combination, a separate plot is created. So this particular plot is for five-gear, four-cylinder cars, and this plot is for four-gear, six-cylinder cars. It takes that subset of data and plots it. And then, of course, you can color it also. So it's like having one, two, three, four, five different dimensions being plotted in the same plot. That's how powerful this one can get. You can try to look at the data in various fashions here. So do go ahead and try a lot of these combinations in ggplot and see what all you can do with it. Moving on to the next example, I'm trying to create what is called a heat map.
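The faceting idea described above can be sketched like this, assuming ggplot2 and the built-in `mtcars` data. The choice of `mpg` and `wt` for the x and y axes is an assumption, since the transcript only says that x and y are set.

```r
library(ggplot2)

p_facet <- ggplot(mtcars, aes(x = mpg, y = wt, color = factor(am))) +
  geom_point() +
  # Rows of the grid are cylinder values, columns are gear values.
  facet_grid(cyl ~ gear) +
  labs(color = "Transmission (am)")

print(p_facet)
```

The grid has three rows and three columns, so nine panels are laid out. In the standard `mtcars` data there are no eight-cylinder cars with four gears, so that one panel is drawn empty.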
How do we create heat maps? I'm going to use the libraries graphics and grDevices, and then I'm going to create a heat map on mtcars. I'm just going to call heatmap and see how it comes up. You see, this is a heat map. The different rows are shown here on the y axis, and the different columns are shown here on the x axis, and for each of the columns, depending on the values in the column, there is a heat map. So every column here is a heat map in itself. The higher the value, the more red that particular box is, and the lower the value, the more yellow the box is. So that's a heat map being created here for every row and for every column. This is basically the entire table in itself, and based on the values it is giving you a heat map. For every column there is a heat map range that is created, and it is colored based on whether, within that particular column, the values are high or low. So this is a phenomenal thing for showing a lot of things compressed into one single heat map. You can also do time series plotting. Let's say I have a time series CSV file. I'm just going to load it up here. OK, I'm not able to load it because my working directory is not set. So let's go do that and come back here, and now do the read of the time series CSV. Now it loads, and let's look at how the time series looks. It is basically a date and a value: for every date, there is a measurement. I'm just going to convert the string representation of the date into an actual date variable by running this command. Now it is converted to an actual date variable. And then I can plot the time series in ggplot using this command, and let us see how this one looks. The dates are shown on the x axis and the values on the y axis, and this is the line plot. Now you can also do a box plot by different months.
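As a rough base R sketch of the heat map and the date conversion described above: the `mtcars` part uses the built-in data, while `ts_data` is a small made-up stand-in for the course's time series file, and the column names and date format are assumptions.

```r
# heatmap() needs a numeric matrix. scale = "column" colours each column on
# its own range, matching the per-column behaviour described in the lesson.
hm <- heatmap(as.matrix(mtcars), scale = "column")

# Hypothetical stand-in for the time series file: a date string and a value.
ts_data <- data.frame(
  date  = c("2015-01-01", "2015-01-02", "2015-02-01"),
  value = c(10.5, 11.2, 9.8),
  stringsAsFactors = FALSE
)

# Convert the string representation into an actual Date variable.
# The "%Y-%m-%d" format is an assumption about how the dates are written.
ts_data$date <- as.Date(ts_data$date, format = "%Y-%m-%d")
class(ts_data$date)

# The month pulled out of the date can serve as the grouping variable
# for a box plot by month.
ts_data$month <- format(ts_data$date, "%m")
```

If reading a real file fails because the working directory is not set, `setwd()` with the folder that holds the file, or a full path passed to `read.csv()`, resolves it.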
So this is a box plot by months, where every month here is a box plot. The month values are shown on the x axis and the values are shown on the y axis, and you see some formatting being done on the date in the aesthetics. ggplot can plot the time series this way and show some results. Then we come to another powerful function, which is how to plot stuff on a geographical map. Isn't that pretty cool? There is a nice library called ggmap that is used for plotting things on a world map. So first I load the library, and it is loaded now. Now I want to just plot India, a map of India. Very simple. I just call a command called qmap, and I'm just going to pass a string here, just "India". If you go to Google Maps and start typing a string, Google Maps figures out the place by itself, and exactly the same thing is happening here. I'm just giving the word India, and it figures out where India is by itself. And then I give it a zoom level, which is the zoom level for the map, the same way you zoom in and out of Google Maps, and the legend is at the bottom. I can store it in a variable, so here I'm going to store it in an India map variable, and then I'm going to show the India map. You can see that once I execute qmap, it is actually going to the Google APIs and downloading the map, the same map that you would get if you went into Google Maps, typed the string India and set the zoom level to five. So it downloads the map, and now I can plot it pretty easily just by calling this India map variable, and then it is going to come up and show me the India map. Pretty cool stuff. It is a very simple command. You wouldn't even expect this command to be this simple. Now I'm going to be doing some more stuff here, which is plotting some places on a map, and that's what I'm going to be doing here.
And for that, I'm going to be using POI data, which is a point of interest data set. I'm going to be loading that point of interest data set and just doing a summary of it. What the point of interest data set has is basically the name of a point of interest, like a school or a lake or a stream or a church, and for each point of interest it has the latitude and longitude. These are all points of interest in California in the US, with various individual items and the latitude and longitude of each of the items. I'm going to be taking these and plotting them on a full-scale map. The first thing I'm going to be doing is creating a California map by using this qmap, just passing in California, and the map type I want is the satellite map type. So I'm just creating that, and again it goes to Google Maps and fetches it. Then I can plot it here, and you see that now I'm getting a satellite map of California. Next I go and get another type of map for California. Now the map type is called toner-lite, which is something like a black-and-white print version, and it makes it easy for you to plot on. It is the same zoom level. Do play around with different zoom levels and see what you get out of it. And now I'm going to be plotting the actual points. So I'm just calling the California map, plus I'm going to be doing geom_point. I'm going to be plotting the longitude and latitude, and I'm going to be coloring them by the type of point of interest. The shape of the circles and the size of the circles are given here. And what data am I going to be using? I'm going to be using the same POI data set, but only the points of interest that are an airport, dam, reservoir or town. I'm just going to be picking these four different types of data, and this is why I'm using a filter here on the point of interest. The %in% operator is an operator that means "in this list".
Only use this subset of the data, and then plot the longitude and latitude on this map and color them by the point of interest. Then I can select all of this and run it. And here you have all these points plotted, and you can see the California map with all the dots on it, and the legend at the bottom tells you the color and the type of point of interest that is being plotted here. Just see how cool this one looks. It is a pretty easy thing for you to plot points on a map if you know the longitude and latitude of a given place. So these are some of the examples of R programming. I hope this is all pretty interesting to you. I do recommend that you go ahead and play around with these commands. Try a few things so that your understanding of these commands improves. So I recommend doing a lot of self exercises on how you use these commands. Thank you.
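For practice with the filtering step from the map example, here is a small base R sketch of the %in% operator. The `poi` data frame, its column names, and its coordinates are made up for illustration and are not the course's point of interest file.

```r
# Hypothetical stand-in for the point of interest data. All values are made up.
poi <- data.frame(
  name      = c("Place A", "Place B", "Place C", "Place D", "Place E"),
  type      = c("Airport", "School", "Dam", "Town", "Church"),
  latitude  = c(34.1, 36.5, 38.2, 37.0, 35.4),
  longitude = c(-118.2, -119.8, -121.5, -120.3, -117.9),
  stringsAsFactors = FALSE
)

# The four types of point of interest picked in the lesson.
wanted <- c("Airport", "Dam", "Reservoir", "Town")

# %in% returns TRUE for each row whose type appears in the wanted list.
keep <- poi$type %in% wanted

# Keep only the matching rows. This subset is what would be handed to the
# point layer that draws longitude and latitude on the map.
poi_subset <- poi[keep, ]
print(poi_subset)
```

Unlike `==`, `%in%` compares each value against the whole list at once, so no chain of "or" conditions is needed.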