Transcripts
1. Introduction: Welcome to the class R Programming for Data Science and Machine Learning. If you look at the job market in 2020, you will find that most jobs in machine learning and data science ask for Python or R. These two are the most popular programming languages for the most popular technologies of 2020: data science and machine learning. In this class, we will learn R programming from the basics to advanced topics, so that you can use these R concepts to take your data science or machine learning models further. We will start from R installation and take you to more advanced topics such as vector manipulation, sorting, decision making in R, and data manipulation and data analysis in R. You will also be able to create your own data visualizations and graphs using R. So if you are interested in making a career in data science and machine learning and you want to learn R programming, this is the right class for you. If you are interested and excited, enroll now, and see you inside the class.
2. Why Learn R: Hello and welcome. In this lecture we are going to see why we should learn the R programming language. To answer this question, we will go through two things: first, what R is, and second, the reasons behind learning it. So let's get started with what R is. R is one of the most popular languages in the world of data science, data analytics, and statistics. It is heavily used to analyze data that is both structured and unstructured. Nowadays we are getting huge amounts of data, called big data, which is mostly unstructured, and if you want to analyze that data, you can do it easily with R. R is a programming language and software environment for statistical analysis, graphical representation, and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and it is currently developed by the R Development Core Team. The name R comes from the first letter of its inventors' first names, Ross and Robert. Now, the reasons to learn R. First, R is open source and freely available: you can plug and play, and if you want to contribute to R, you can do that as well. It is released under the GNU General Public License, so there is no need to pay anything, and that is one of the best reasons to use R. Second, it is cross-platform compatible: whether you run your program on Windows, Linux, or macOS, it runs seamlessly and gives you the same result on any of these platforms. Third, R is highly flexible and still evolving, and it currently has more than 2 million users. Fourth, R is widely used across industries and domains.
Name any industry and you will find uses of R. In the financial domain, it is used to detect fraudulent transactions. In the telecom domain, it is used for subscriber profiling. In biology, computational biologists use it to perform genome analysis, and many other domains use it too. Fifth, it has a huge community: as I told you, about 2 million users and a large developer community. R has more than 10,000 packages and lots of built-in functions that cater to diverse needs. Whether you want to perform a simple sum, find the mean of some numbers, or produce graphical representations of your data, you can do it easily with R packages and built-in functions. R packages are great for data manipulation, data visualization, machine learning, data science, statistical modelling, imputation, and a whole lot more. Sixth, R is great for visualization: packages like ggplot2 produce excellent visualizations, so you can visualize data easily with R. Many major companies, like Facebook and Google, use R for various needs. Finally, R is a core language for statistical analysis and data science. It is widely used in data science, machine learning, data analysis, and data mining. So if you want to go into the field of data science and machine learning, start with R programming, because once you know the basics of R, you can easily move on to data science and machine learning concepts and implement those algorithms in R.
3. R Installation: Hello and welcome. In this lecture we are going to see the R installation process, and by the end of this lecture we will be able to run R programs inside the R console. So let's get started. To install R on our machine, we go to the website cran.r-project.org, which is the official website of R, the Comprehensive R Archive Network (CRAN). If you come here a few months from now, the page may look slightly different, but you will most likely still see the Comprehensive R Archive Network here. Under "Download and Install R" you will see options for the different operating systems that R can be downloaded for. I am on a Windows machine, so I'll go with Download R for Windows; if you are on Linux or macOS, go to the corresponding operating system. Here you select "install R for the first time", and then we go with "Download R 4.0.2 for Windows". The file R-4.0.2-win.exe starts downloading; it's an 83.6 MB file. My internet is a bit slow today, so it is taking some time; on a high-speed connection it should take only a few minutes. Just wait for it to download, and once it is done, we will start the installation process. Now the .exe file has been downloaded, so we double-click on it, and a prompt comes up to install R. Click Yes to allow admin access for the installation. Once you do that, the installer appears, and here we select the language: choose your language and click OK.
Now we need to accept the terms and conditions, the GNU General Public License agreement. Read it and click Next. Now you can select a directory. I am keeping the default directory on the C: drive, and I click Next. It says the folder already exists, because R was installed there before, so I choose to install anyway and click Yes. You can leave the components as they are and click Next, and Next again. Here, if you want, you can choose not to create a Start Menu folder; I want one created, so I click Next. Then select the option to create a desktop shortcut, so that you can launch R easily whenever you want to write a program, and click Next. Now R is being installed on our system. It may take a few minutes, usually two to three minutes at most. Now it says the R for Windows 4.0.2 setup is finished, so just click Finish, and R is installed on your machine. To verify that R is installed, click the Windows Start button and scroll down to check whether it is there. See, here is the R folder. When you click on the R folder, you will see two options: R i386 4.0.2 and R x64 4.0.2. If your Windows is 64-bit, as on most Windows 7 and Windows 10 machines, go with the 64-bit (x64) version; otherwise go with the 32-bit (i386) version. I'm on a 64-bit machine, so I click on x64, and R launches. This is the R GUI, and this is the R console, where we can start writing our program. Suppose I type 4 + 5; it immediately gives me 9. Now a simple Hello World program in R: we write print, and inside it, in single quotes, 'Hello World', and it prints Hello World. So this is the simple Hello World program in R.
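The two console commands from this demo can also be saved and run as a tiny script. A minimal sketch; the comments show what the console prints:

```r
# Arithmetic is evaluated as soon as it runs
print(4 + 5)           # [1] 9

# The classic first program: print a string to the console.
# Single and double quotes both create a string in R.
print('Hello World')   # [1] "Hello World"
```

Note that R echoes strings back with double quotes even when you typed single quotes.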
So this is how R programs run in the R console. But for this course we are not going to use the R console or the R GUI. We are going to use RStudio, which is an IDE for R and the better, more comfortable option for R programming. In the next lecture, we will download and install RStudio. See you in the next lecture.
4. Installing and Exploring RStudio: In the previous lecture, we downloaded and installed R on our machine, and we saw how to work with the R console. But we are not going to proceed with it; instead, we are going to download RStudio and use it throughout this course. RStudio is an Integrated Development Environment (IDE) for R programming. With it we can work more easily and keep things organized; IDEs help a lot in programming, because we can write code with ease and see the results easily. So first things first, we need to download RStudio. To do that, we go to rstudio.com, the official website of RStudio. On the RStudio website you'll see menus like Products, Solutions, Customers, and Resources, and at the top you can see the Download button; we'll click on that shortly. Before that, let's see which products RStudio offers. There is RStudio, the premium IDE for R; there are other RStudio products; and there are R packages as well. We are going to use the RStudio IDE, and later we will use R packages for various purposes when we explore data science, machine learning, and so on. So let's click on RStudio. RStudio is an integrated development environment for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management. We are going to use the latest release of RStudio, 1.3. There are two versions of RStudio available: RStudio Desktop and RStudio Server.
We are not going to use RStudio Server; we are going to use RStudio Desktop, so we click on that. On the RStudio Desktop page you'll again see two options: the Open Source Edition, which is free, and RStudio Desktop Pro, which is for commercial use. RStudio Desktop Pro costs $995 per year, and we are not going to use it. For this course we will use the fully open-source version of RStudio, which is free under the AGPL v3 license. So I click on Download RStudio Desktop, which takes us to the RStudio download page. Here we choose the free version and click Download. That takes us to the installers for the different operating systems. I'll download RStudio for Windows; if you are on another operating system, such as macOS or a Linux distribution, select the file for your system and download it. The RStudio .exe file starts downloading, so I'll wait for it. Now the file has been downloaded, so I double-click on it. The RStudio setup wizard starts, and we just keep clicking Next; there is nothing extra to do, and it will be done in a couple of minutes. The RStudio setup has finished, so just click Finish. When RStudio starts, it looks like this. Here is the area where you write your script, and this is the console, where we see the results of those scripts. In the top-right corner you can see the Environment and History tabs.
History shows whatever commands and scripts we have run. Then there are the Connections and Tutorial tabs; if you want to learn about a particular package or topic, you can go to Tutorial and learn about it. Down here is the Files tab, which shows the files in the working directory. Then there is Plots: when we run something that draws graphs, all those plots are shown here. Then there is Packages: all installed packages are listed here. You can select a package to load it, and if you want to remove a package, you can remove it from here. If you want to learn more about a package, click its link. Here you can see each package's name, a short description, and its version, along with an option to remove the package. If you want to install a new package, click Install, type the package name, and it will be downloaded. Then there is the Help tab, where you can learn about R and RStudio, open the manuals, and get help on any topic. From the Files tab you can also browse folders and select the working directory that you want RStudio to use. Suppose I select the folder I created for this course as the working directory: I select it, click the More option, and click Set As Working Directory. See, RStudio runs setwd() in the console; setwd is the command that sets the working directory, and the folder I selected has now become the working directory.
So if you don't want to do it from the Files pane, you can type this command yourself to set the working directory. In the Files pane you can also create a new folder, delete, or rename files; all those options are available here. In the script editor we can write code, for example 4 + 5. To run it, select the line and click Run; the statement runs and you see the result in the console. Similarly, if you want to print something, type print("Hello RStudio") and click Run, and you'll see Hello RStudio printed in the console. In the History tab you will see all the commands that have been executed. To save this file, click Save and give it a name; it will be saved in your current working directory as a .R file. To create a new file, click here, and you can create an R script, an R Notebook, or an R Markdown file; we will learn what an R Markdown file is later. Here you can also create a new project, in a new or existing directory, with version control. So these are the various options available. If you click File again, you'll see New File, with R Script, R Markdown, and so on. Then there are menus like Code, Plots, Build, Debug, Profile, Tools, and Help; we'll explore these options when and where needed. For now: this is where we write scripts, this is where we see results, here we see the current working directory and plots, here packages can be installed and removed, and here we see the history. So this is all about RStudio.
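The working-directory commands mentioned above can also be typed directly in a script. A minimal sketch: R's temporary folder, tempdir(), stands in for the course folder (an assumption for illustration), and the original directory is restored at the end so nothing is left changed.

```r
# getwd() returns the current working directory as a string
old_dir <- getwd()
print(old_dir)

# setwd() changes the working directory; tempdir() stands in here
# for whatever course folder you picked in the Files pane
setwd(tempdir())
print(getwd())

# The same kind of line you would run with the Run button
print("Hello RStudio")   # [1] "Hello RStudio"

# Restore the original working directory
setwd(old_dir)
```

In RStudio, pressing Ctrl+Enter runs the current line or selection, just like the Run button.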
Now we are ready to start programming in R with the RStudio IDE. From the next lecture onwards, we are going to learn R programming. See you in the next lecture.
5. First R Program and Operators in R: Hello and welcome back. In this lecture we are going to write our first R script file, our first R program, and look at some of R's syntax. First things first, we need to create a file. I'll close the file I created in the previous lecture. Then we click here and select R Script; alternatively, you can press Ctrl+Shift+N to create an R script. Now I'll save it by clicking here (or pressing Ctrl+S) and give it the name FirstRProgram.R. The .R extension is the file extension for R programs. I click Save, and our first R script file is ready. Now I'll start with some variables. Suppose x is a variable and I want to assign it the value 8. In R we assign a value using the less-than sign followed by a hyphen, so we write x <- 8. In R, everything is an object, so we can call x a variable or an object; here x is an object and we are assigning it the value 8. How can we check that it now holds 8? When you click Run, you'll see in the Global Environment pane that x has the value 8. We can also use the print command: put x inside print, and when we run it we get 8. Now I'll write a program to add two numbers. I assign 8 to x and 9 to y, then use another variable (object), z, and assign x + y to it. Then I print z. So z holds x + y, that is, 8 + 9 = 17.
And indeed, the value we get is 17. Similarly, we can use strings as well. I'll create a variable called myString, assign a string value to it, and print it. To run the script one line at a time, we select a line and click Run. If we want to execute the whole script at once, we click Source, and everything runs: we get 17 for the first print, and then the string value for the second. Let me show one more thing. Suppose further down I assign x <- 10 and then print x again. If I run only the print line, I still get 8, because the assignment x <- 10 has not been executed yet. To execute it, I run that line, and if I then run the print statement again, I get 10. If I source the whole file, what do I get? First I get x + y = 17, because up to that point x is 8. But further down in the program x is reassigned to 10, and that's why the later print shows 10. So myString is a string variable, and x, y, and z are numbers. Now suppose I don't want to use x <- 10. I can comment it out with a hash sign (#), and then that statement is not executed. If I run the whole script now, x stays 8 and I no longer get 10. If I remove the comment again, I get 10 again. So whenever you want to comment something out, put # in front of it, and that statement will not run.
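Putting the pieces of this lecture together, here is a sketch of what FirstRProgram.R might contain; the string value "Hello R" and the commented-out line are hypothetical placeholders. Running the file with Source executes it top to bottom, so each print sees the value x holds at that point.

```r
# Assign with <- ; everything in R is an object
x <- 8
y <- 9

# Store the sum in a third object and print it
z <- x + y
print(z)           # [1] 17 (x is still 8 here)

# A string object (placeholder text)
myString <- "Hello R"
print(myString)    # [1] "Hello R"

# Reassign x; statements after this line see the new value
x <- 10
print(x)           # [1] 10

# x <- 20          # commented out with #, so this line never runs
print(x)           # still [1] 10
```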
Similarly, we can use all the arithmetic operators on variables and numbers. Suppose I multiply 4 by 5 (4 * 5); we get 20. If I divide 25 by 5, I get 5. If I divide 25 by 2, I get 12.5. We can also use exponentiation: 3 ^ 2 gives 9. What do we get for 25 ^ 25? Let's see: it is a very large value, so R shows it in scientific notation. 25 ^ 2 gives 625. 6 ^ 1 gives 6, and 6 ^ 2 gives 36. So all these mathematical operators and calculations work in R as usual. We can perform subtraction as well: 78 - 6 gives 72. So we have seen addition, subtraction, multiplication, division, and exponentiation. There is also a special operator, the modulus operator, written as the percent sign twice (%%). 45 %% 5 gives 0, and 45 %% 4 gives 1. And 25 %% 3 gives 1. The modulus operator returns the remainder of a division: 25 divided by 3 is 8, since 3 * 8 = 24, and the remainder, 1, is the result of the modulus. So these are the operators we have seen: addition, subtraction, multiplication, division, exponentiation, and modulus.
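The operators from this lecture in one runnable sketch, each line printing the result described above:

```r
print(4 * 5)     # multiplication: 20
print(25 / 5)    # division: 5
print(25 / 2)    # division keeps the fractional part: 12.5
print(3 ^ 2)     # exponentiation: 9
print(25 ^ 2)    # 625
print(25 ^ 25)   # very large, so R prints 8.881784e+34
print(6 ^ 1)     # 6
print(78 - 6)    # subtraction: 72
print(45 %% 5)   # modulus (remainder): 0
print(45 %% 4)   # 1
print(25 %% 3)   # 1, because 25 = 3 * 8 + 1
```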
6. Data Types in R: Hello and welcome back. In this lecture we are going to learn about data types in R. In other programming languages like Java, C, or C++, we declare a variable together with its data type: for an integer we declare int x, for a float, float y, for a character, char c, and for a string, String s. But in R, we don't declare variables with a data type. Suppose I write x <- 10, assigning 10 to x. The object x then takes the data type of the value 10, so x becomes numeric. If I assign a string to x, it becomes a character variable. So we don't decide the type up front: whatever value we assign to a variable or object, the object takes on that value's type. In other words, R is dynamically typed; the type is decided by the kind of value that goes into the object. Let me create another file where we'll look at data types, and clear the console as well. There are basically six types of R objects: vectors, lists, matrices, arrays, factors, and data frames. I'll explain these object types one by one. The first is the vector. The vector is the simplest object in R, and its elements can have one of six basic data types. Let's see them, starting with a logical value: I assign TRUE to x.
Now, if I want to know what type x is, I print class(x). When I run it, I get "logical", so x has the logical data type. Similarly, we have numeric. Suppose I assign 90 to y, copy the print line, and change it to y. If I run the whole script with Source, the class of y is shown as numeric. Notice that we are not declaring the data type of y: the type is decided by the value we assign. Here we assigned a logical value, so it is logical; there we assigned a number, so it is numeric. If I assign 90.9 instead, the class is still numeric. The next type is integer. Suppose I use a variable t and assign it 34L. When I print class(t), I get "integer", so this is the integer data type. Similarly, we have complex: suppose I declare a complex number, 3 + 5i; its class is "complex". Next, if I assign a string to a variable c, its class is "character". The last type is raw: if I convert a character string to raw bytes with the charToRaw() function and print its class, I get "raw". So these are the six basic data types of vectors: logical, numeric, integer, complex, character, and raw.
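A sketch of the six basic vector types demonstrated above. The variable names and sample string values are illustrative assumptions, not the exact ones typed in the lecture. The last four lines show dynamic typing: reassigning a variable changes its class.

```r
lgl <- TRUE                   # logical
num <- 90.9                   # numeric (double precision)
int <- 34L                    # integer: the L suffix requests an integer
cpx <- 3 + 5i                 # complex
chr <- "R programming"        # character
raw_val <- charToRaw("Java")  # raw: the bytes of the string

print(class(lgl))      # [1] "logical"
print(class(num))      # [1] "numeric"
print(class(int))      # [1] "integer"
print(class(cpx))      # [1] "complex"
print(class(chr))      # [1] "character"
print(class(raw_val))  # [1] "raw"

# No declarations: the class follows whatever value is assigned
x <- 10
print(class(x))        # [1] "numeric"
x <- "ten"
print(class(x))        # [1] "character"
```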
Let me save this file as DataTypes. In the next lecture, we will see how to create vectors.
7. Creating Vectors in R: Hello and welcome back. In the previous lecture we saw the different data types in R. In this lecture, we are going to learn how to create a vector with more than one element. I'll create a new file with Ctrl+Shift+N, and in it we will write the R script for creating a vector with multiple elements. Suppose I want to create a vector of countries and assign multiple elements to it. In R, to create a vector with multiple elements, we use the c() function, which combines its arguments into one vector. We write c with opening and closing parentheses, and inside them we write our elements separated by commas, for example "Australia", "Britain", and "South Africa". Now I'll put the vector inside print, which prints the values stored in it. Let me save this file as FirstVector.R. All these files will be in the current working directory that we have set, and I'll attach them to the course, so you can download them and work with them. Now I click Source; let me clear the console and run it again. See, now we get the values of the vector: Australia, Britain, South Africa. So this is a vector with multiple elements inside it. In the previous lecture we saw how to create a vector with a single element.
So we can create a vector, that is, an object, with a single element, and we can use the c() function to create a vector with multiple elements. Now, if I print the class() of this vector, what will the output be? Can you guess? Let's see: "character". These elements are character strings, so the vector's class is character. Now suppose I change the elements to numbers, such as 100 and 3779, and source the whole file again: the class of the vector has changed to numeric. So the data type of a vector is decided by the data you put in it, not declared up front as in C or Java. That is the difference between C or Java and R: the type follows whatever values you use.
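A minimal sketch of the c() examples above. The variable names countries and amounts are assumptions, since the lecture's exact names are not clear from the recording.

```r
# c() combines its arguments into a single vector
countries <- c("Australia", "Britain", "South Africa")
print(countries)
print(class(countries))   # [1] "character"
print(length(countries))  # [1] 3

# The same function with numbers gives a numeric vector
amounts <- c(100, 3779)
print(class(amounts))     # [1] "numeric"
```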
8. Sequence in R: In this lecture, we are going to learn about sequences and the seq() function. But before that, let me cover one important thing I forgot to include in the previous lecture. Suppose I create a vector using the c() function. Earlier I created a vector with only numeric values, so its class is numeric, because all the elements are numeric. But what if I give one character element, then a numeric, then a logical, and then an integer? If I run this and print the vector, you see "hello", "67", "TRUE", and so on: everything is in quotation marks. Why? If we create a vector with c() and one of the elements is a character, all the other elements, whether numeric, integer, or logical, are converted to strings. If I print the class of this vector, it comes out as character. So remember: if all the elements are numeric, the class is numeric; if all are integers, the class is integer; but if any element is a character, every element is converted to character and the class of the vector is character. Now, with that clarified, I'll create a new script file for sequences. In R we can create a sequence of numbers, for example the numbers 1 to 10, in two ways.
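The coercion rule above can be checked directly; a small sketch. The second vector is an extra illustration not shown in the lecture: when no element is a character, R still converts everything to the most general type present, so a logical mixed with numbers becomes numeric (TRUE becomes 1).

```r
# One character element forces every element to character
mixed <- c("hello", 67, TRUE, 5L)
print(mixed)           # every element is now a quoted string
print(class(mixed))    # [1] "character"

# Without a character element, numbers and logicals become numeric
num_mix <- c(2.5, TRUE, 5L)
print(num_mix)         # [1] 2.5 1.0 5.0
print(class(num_mix))  # [1] "numeric"
```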
We can use the colon operator or the seq() function. First, I'll create a sequence with the colon operator and assign it to a variable. Suppose I create a variable c and want to assign it the values 1 to 100. How can I do that? I write 1:100, and when I run it, it generates the sequence from 1 to 100. See, 100 numbers have been generated. Let me move the console to the right, so it's easier to follow: code on the left, output on the right. So to create any sequence, we give the first number, where we want to start, and then the ending number. If I end at 10 instead and run it, it creates a sequence from 1 to 10. Now suppose I want the even numbers from 2 to 20. How can we do that? If I multiply the sequence 1:10 by 2 and run it, we get 2, 4, 6, 8, up to 20: each element of the sequence is multiplied by 2. Similarly, suppose I want a sequence from 2.5 to 4. If I print it, I get 2.5 and 3.5, because the next value would be 4.5, which is past 4. And if I change the end to 40, I get 2.5, 3.5, up to 39.5. In the same way, suppose I write 2:20 - 1 and print it. Can you guess the answer? Let's see: we get 1 to 19. Why? Because 1 is subtracted from every element of 2:20, so 2 - 1 = 1 and 20 - 1 = 19.
From one is to 190 minus 1120 minus 119. So the both will give us the same result, right? Okay? The next thing is, we can do it in a different way as well. Cleared out boot. I can use sorry. I can use a variable a, I can assign a value eight. And then what I can do, I can, I want to generate numbers from one to eight. So I can, if I run this to what I will get, I will get a sequence of numbers from one to eight, right? So this way also we can do, we can assign now value. It puts a variable or object e, and then we can put one is to it, instead of punished to it, we can put one is two a. And this will also give us the same two delta t 128 numbers has been generated. I can put one minus a as well and see what will the result 0 to seven because it will one minus1, 08 minus 17 to 0 to seven sequence will be generated. If I put a minus one into bracket. And if I try to run this, now I'll get one to seven because this will be executed first, okay? So it will be seven and the one will be one because we are not subtracting this one from here because it is in the bracket. In an, in art, the bracket will be given the higher preference. So this will execute first. So one is to seven. It will generate numbers from one to seven. Apart from this, we can use sequence to create a sequence of numbers. So suppose I'll give, if I write as d q and I'll give one comma, one comma five. What result I'll get? I'll get the sequence of numbers from one to five, right? Similarly, if I give a sequence of suppose nine, and what result I'll get, I'll get a sequence of numbers from one to nine. Here, I am giving one-to-five means from one to five. And yet if I'm not given the first digit, what it will take, it will generate from one to nine by default, okay? That, that is the default nature of sequence. Okay? Next thing is we can use this function in a different way as well. I can use from a, suppose 82 equal to 32. 
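A short R sketch of the colon operator and its precedence, as walked through above. The variable names are illustrative; the key point is that `:` binds tighter than `*` and `-`, while parentheses are evaluated first.

```r
# The colon operator builds sequences with a step of 1
one_to_ten <- 1:10
evens      <- 2 * 1:10   # ':' is applied first, then each element is doubled
halves     <- 2.5:4      # 2.5 3.5 (4.5 would pass the end point)
shifted    <- 2:20 - 1   # 1 ... 19, the same values as 1:19

a <- 8
to_a        <- 1:a        # 1 ... 8
minus_after <- 1:a - 1    # 0 ... 7: the whole sequence is shifted down by one
in_parens   <- 1:(a - 1)  # 1 ... 7: the parentheses are evaluated first

print(evens)
print(minus_after)
print(in_parens)
```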
If I run it, it generates the numbers from eight to 32. We can also give a third number, which is known as the step; in R this argument is called by. For example, seq(2, 12, by = 2). Let me run this first and then I'll explain. We get 2, 4, 6, 8, 10, 12. Instead of increasing by one, there is a gap of two between consecutive numbers, because we have given the step value as two: two plus two is four, four plus two is six, six plus two is eight, and so on. That is what the by argument does. Let's see another example. Suppose I want a sequence from ten to 25 that increases by 0.75. Let me run this. The sequence starts from ten, because we have given ten, and goes up to 25, adding 0.75 each time: 10, 10.75, 11.5, 12.25, and so on. If I change the step to three, we get 10, 13, 16, 19, 22, 25. If I change it to five, we get 10, 15, 20, 25. So this way we can generate a sequence with a given step. There is another argument called length.out, which you can also write as length. Suppose I want numbers from 25 to 50 and I give length = 6. It gives us 25, 30, 35, 40, 45 and 50.
So it generates a sequence of six equally spaced numbers between 25 and 50. If I change the length to ten, it generates ten equally spaced numbers between 25 and 50, like 25, 27.78, 30.56, and so on. If I give it 100, it generates 100 numbers. If I give one, we get only one number, 25. If I give two, we get two numbers, 25 and 50. If I give three, the range is divided into equal parts and we get 25, 37.5 and 50. So it always generates that many equally spaced numbers between the start and the end.
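The seq() calls from this lecture, gathered into one runnable R sketch. The lecture calls the third argument the "step"; the actual argument names in R are `by` and `length.out` (the shorter `length =` used in the lecture also works through R's partial argument matching). The variable names s1 to s8 are illustrative.

```r
s1 <- seq(1, 5)                    # 1 2 3 4 5
s2 <- seq(9)                       # 1 ... 9: 'from' defaults to 1
s3 <- seq(from = 8, to = 32)       # 8 ... 32
s4 <- seq(2, 12, by = 2)           # 2 4 6 8 10 12
s5 <- seq(10, 25, by = 0.75)       # 10.00 10.75 11.50 12.25 ... 25.00
s6 <- seq(10, 25, by = 3)          # 10 13 16 19 22 25
s7 <- seq(25, 50, length.out = 6)  # 25 30 35 40 45 50
s8 <- seq(25, 50, length.out = 3)  # 25.0 37.5 50.0

print(s5)
print(s7)
```

Use `by` when you know the gap between values, and `length.out` when you know how many values you want.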
9. Replicate Function: In this lecture, we are going to learn about the replicate function, rep(). We use rep() when we want to repeat something. Let me show you with an example. Suppose I want to repeat the number five. I use the rep() function, give five as the value I want to repeat, and give the times argument as eight. The rep() function then repeats five eight times, so you see five printed eight times. Similarly, we can use character values: if I give "ds" and specify times = 3, it prints "ds" three times. So remember this: whatever value we give, it is repeated the given number of times. In the same way, we can pass an object to the rep() function. Suppose I create a variable r with the sequence from three to six, 3:6. If I print r, we get 3, 4, 5, 6. Now I pass r to rep() and give times = 2. This repeats the whole sequence twice, so we get 3, 4, 5, 6 and then 3, 4, 5, 6 again. If I make it three, the sequence is repeated three times: 3, 4, 5, 6, again 3, 4, 5, 6, and again 3, 4, 5, 6. So with times, the whole sequence is repeated that many times. There is another argument of rep() called each. Let's use the same object r and apply each to it.
Earlier I used times = 3; now I use each = 2 instead. Let me run this and see what it does. Each element from three to six is repeated twice before R moves on to the next element: three twice, then four twice, then five twice, then six twice, so we get 3, 3, 4, 4, 5, 5, 6, 6. With times, the whole sequence was repeated: 3, 4, 5, 6, then 3, 4, 5, 6 again. With each, every element is repeated and then R goes to the next element. If I set each = 3, each element is repeated three times: 3, 3, 3, then 4, 4, 4, then 5, 5, 5, then 6, 6, 6. So this is the difference between times and each in the rep() function: times repeats the entire sequence that many times, while each repeats every element that many times before moving to the next one.
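A minimal R sketch contrasting `times` and `each` in rep(), using the same values as the lecture. The variable names on the left are illustrative.

```r
five_times_eight <- rep(5, times = 8)     # 5 5 5 5 5 5 5 5
ds_three         <- rep("ds", times = 3)  # "ds" "ds" "ds"

r <- 3:6
whole_twice <- rep(r, times = 2)  # 3 4 5 6 3 4 5 6  (whole vector repeated)
each_twice  <- rep(r, each = 2)   # 3 3 4 4 5 5 6 6  (each element repeated)
each_thrice <- rep(r, each = 3)   # 3 3 3 4 4 4 5 5 5 6 6 6

print(whole_twice)
print(each_twice)
```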
10. Accessing Vector Elements: Hello and welcome back. In this lecture, we are going to learn about accessing vector elements. We have seen how to create a vector in R; now we'll see how to access the elements of a vector. For that, I create a new file and give it a name ending in .R. Now I'll create a vector named month containing the months January, February, March, and so on. I use the c() function and store the values "Jan", "Feb", "Mar", up to "Dec" in the vector object month. Now that we have created the vector, how do we access its elements? Let me clear the console so that we can see clearly. If I run this and print month, we get January to December. Now what if I want to access April, July or September? For that, I can create another vector, month2. I write month followed by square brackets, and inside the square brackets I use the c() function with the indexes I want. January is index 1, so April is 4, July is 7 and September is 9: I write c(4, 7, 9) and then print month2. If I run these lines, we get April, July and September, because April is at index 4, July at index 7 and September at index 9. So this is how we access particular elements of a vector by their index. If I change the first two indexes to 3 and 5 and run the statements again, we get March and May instead of April and July. Like this, we can access the vector elements.
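A small R sketch of positive indexing with the month vector from the lecture. As a side note, base R also provides the same twelve abbreviations as the built-in constant month.abb; the explicit vector is kept here to match the lecture.

```r
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

month2 <- month[c(4, 7, 9)]  # positions 4, 7 and 9
print(month2)                # "Apr" "Jul" "Sep"

first <- month[1]            # a single index works too; R counts from 1
print(first)                 # "Jan"
```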
We can also access vector elements using logical indexing. Let me show you how. I create another object, month3. I write month with square brackets, and inside the c() function I give TRUE, then FALSE, then FALSE, then TRUE. Then I print month3. Let me run this and see the result. We get "Jan" first, because the first value is TRUE. The next two values are FALSE, so February and March are not accessed and are not printed. The fourth value is TRUE, so we get April. For the remaining elements we have not given any values, so R recycles the pattern TRUE, FALSE, FALSE, TRUE across the rest of the vector: that is why we also get May, August, September and December. Next, we can use negative indexing. I create another object, month4, and assign it month with square brackets, using c(-2, -5). If I print month4 and run this, we get January, March, April, June and the rest, but February and May are not printed. A negative index means "leave this element out": -2 removes the second element, February, and -5 removes the fifth element, May. Everything else is printed. If I also add -12, December is not printed either.
So now December is missing as well. If you want to access all the elements except a few, you can put a minus sign in front of their indexes, and those elements will be left out.
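A minimal R sketch of logical and negative indexing on the same month vector. It shows the recycling of the four-value logical pattern across all twelve positions; the object names month3 to month5 are illustrative.

```r
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Logical index: the 4-value pattern is recycled across all 12 positions
month3 <- month[c(TRUE, FALSE, FALSE, TRUE)]
print(month3)  # "Jan" "Apr" "May" "Aug" "Sep" "Dec"

# Negative index: drop the listed positions and keep everything else
month4 <- month[c(-2, -5)]       # every month except "Feb" and "May"
month5 <- month[c(-2, -5, -12)]  # also drops "Dec"
print(month4)
```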
11. Vector Manipulation in R: Hello and welcome back. In this lecture we are going to learn about vector manipulation. We'll see how to perform arithmetic operations on vectors: how to add two vectors, how to subtract them, how to multiply them and how to divide them. So let's get started. I'll create an R script file and name it manipulation. Let me clear the console so that we can see the output correctly. First of all, I'll perform arithmetic operations with two vectors. I'll create a vector f1 with some values: 12, 78, 90 and 5. Then I'll copy and paste that line and create another vector, f2, with the values 2, 8, 30 and 25. So now we have two vectors, f1 and f2. First, I want to perform addition. I'll create another object, a, and assign it f1 + f2. So here I'm adding the two vectors f1 and f2 and assigning the result to a. Then I print a. Let me run this. We get 14, 86, 120 and 30, because the corresponding elements of the two vectors are added: 12 plus 2 is 14, 78 plus 8 is 86, 90 plus 30 is 120 and 5 plus 25 is 30. So this is how we perform addition. Next, subtraction: I copy the line, assign f1 - f2 to s and print s. When I run it, I get 10, 70, 60 and -20: 12 minus 2 is 10, 78 minus 8 is 70, 90 minus 30 is 60 and 5 minus 25 is -20. In the same way, for multiplication I create m, assign f1 * f2 and print m.
If I run this, we get 24, 624, 2700 and 125: 12 times 2 is 24, 78 times 8 is 624, 90 times 30 is 2700 and 5 times 25 is 125. Similarly, we can perform division. I create a vector d, assign f1 / f2 and print d. If I run this, we get 6, 9.75, 3 and 0.2: 12 divided by 2 is 6, 78 divided by 8 is 9.75, 90 divided by 30 is 3 and 5 divided by 25 is 0.2. Let me add comments here for multiplication and division. So this way we can perform addition, subtraction, multiplication and division on vectors, element by element, and this is called vector manipulation.
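The four element-wise operations from this lecture as one R sketch. The vector values are reconstructed from the worked results quoted in the lecture (12 + 2 = 14, 78 * 8 = 624, and so on).

```r
f1 <- c(12, 78, 90, 5)
f2 <- c(2, 8, 30, 25)

a <- f1 + f2  # 14  86  120  30
s <- f1 - f2  # 10  70   60 -20
m <- f1 * f2  # 24 624 2700 125
d <- f1 / f2  # 6.00 9.75 3.00 0.20

print(a)
print(s)
print(m)
print(d)
```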
12. Vector Elements Recycling: Hello and welcome back. In this lecture, I'm going to tell you something that I did not cover in the previous lecture. It's a question you might have asked if this were a physical, offline class, and it may already have come to your mind. Suppose I have two vectors, f1 and f2, where f1 has four elements and f2 has only two. What happens if I add these two vectors with f1 + f2? One has four elements and the other has two, so how does R handle it? In R, if we add two vectors of unequal length, the elements of the shorter vector are recycled to match the length of the longer vector. So in this case, internally f2 becomes 2, 8, 2, 8: it is repeated until it has as many elements as f1, and then the arithmetic operation is performed. Let me run this so you get a better idea. Let me clear this and run it. The result is 14, 86, 92 and 13: 12 plus 2 is 14, 78 plus 8 is 86, then 90 plus 2 is 92, because the 2 is repeated, and 5 plus 8 is 13. Now suppose I add one more element, 80, to f1 and run it again. This time R shows a warning: "longer object length is not a multiple of shorter object length". Now f1 has five elements and f2 has two.
Five is not a multiple of two: if you divide five by two, you get a remainder of one, so the shorter vector cannot be repeated a whole number of times. R still recycles f2 and returns a result, but it warns you, because this is usually a mistake. Now if I add one more element, say 7, so that f1 has six elements, and run it again, it works without any warning, because six is a multiple of two. So the number of elements in the longer vector should be a multiple of the number of elements in the shorter vector. When it is, you can perform addition, subtraction, multiplication or division and the elements are recycled cleanly; when it is not, as with five and two, R still recycles but gives you this warning. I hope it is clear for you.
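A minimal R sketch of the recycling rule, including the mismatched case. The tryCatch() call captures the warning text so you can see it; suppressWarnings() shows that R still returns a recycled result. The vector values follow the lecture; the object names are illustrative.

```r
f1 <- c(12, 78, 90, 5)
f2 <- c(2, 8)

four_plus_two <- f1 + f2  # f2 is used as 2, 8, 2, 8
print(four_plus_two)      # 14 86 92 13

# Five vs. two elements: R still recycles, but it raises a warning
five <- c(12, 78, 90, 5, 80)
msg <- tryCatch(five + f2, warning = function(w) conditionMessage(w))
print(msg)  # "longer object length is not a multiple of shorter object length"
five_plus_two <- suppressWarnings(five + f2)  # 14 86 92 13 82

# Six is a multiple of two, so there is no warning
six_plus_two <- c(12, 78, 90, 5, 80, 7) + f2  # 14 86 92 13 82 15
```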
13. Sorting Vector Elements: Hello and welcome back. In this lecture, we are going to learn about sorting vectors, that is, how to sort the elements of a vector in R. For this we have a function called sort(). First, I'll create a vector, vec, with some random numbers. Suppose I want to sort the elements of this vector. I create another object, sorted_vec, and call the sort() function, passing the vector to it. sort() sorts the vector, and then I print the sorted result next to the original vector. Let me clear the terminal and run this. We get the values in increasing order: by default, sort() sorts in ascending order, so the smallest value comes first and the values keep increasing. So this way we can sort a vector. What if I want to sort it in decreasing order? I can use the same sort() function and add the decreasing argument. It is a logical argument, and its default is FALSE, which is why by default the vector is sorted in increasing, ascending order. If I set decreasing = TRUE, the vector is sorted in decreasing order. Let's run this and see: now 98 comes first, followed by the rest of the values in decreasing order.
Next, we are going to see sorting of character vectors. Suppose I have a vector with some random words, such as colors and countries, so it holds character values. I want to sort it, so I pass this vector to the sort() function, store the result and print it. If I run these lines, the words are printed in alphabetical order: the word starting with B comes first, then C, then F, then the words starting with R. If I want them in reverse order, I set decreasing = TRUE, and now the words starting with R come first, then F, then C and then B. So this way we can sort both character and numeric vectors.
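A minimal R sketch of sort() on numeric and character vectors. The numbers and words are hypothetical stand-ins for the "random" values used in the lecture. Character ordering depends on the locale in general, but for these words (distinct first letters, and "Ra" vs "Re") the order is the same in common locales.

```r
vec <- c(34, 9, 98, 27, 4, 46, 36)            # hypothetical values
sorted_up   <- sort(vec)                      # 4 9 27 34 36 46 98
sorted_down <- sort(vec, decreasing = TRUE)   # 98 46 36 34 27 9 4

words <- c("Red", "Blue", "Form", "Country", "Random")  # hypothetical words
words_up   <- sort(words)                     # "Blue" "Country" "Form" "Random" "Red"
words_down <- sort(words, decreasing = TRUE)  # "Red" "Random" "Form" "Country" "Blue"

print(sorted_up)
print(words_up)
```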
14. Decision Making in R: Hello and welcome back. In this lecture we are going to learn about decision-making. In R, as in other programming languages, we have the if statement, if...else and the switch statement, and with them we can perform decision-making. An if statement means that if a condition is true, the following statement is executed. With if...else, if the condition is true the if block is executed, and if it is not fulfilled, the else block is executed instead. A switch statement allows a variable to be tested for equality against a list of values. So we'll see if, then if...else, and then switch. Let me create an R script file, name it decision-making.R, save it and clear the console. First, we will see the if statement. An if statement consists of a Boolean expression followed by one or more statements; if the Boolean expression is true, those statements are executed. Suppose I create an object a and assign it the value 67 (you can use any value). Then I write an if statement: if a is less than 70, print "a is less than 70". If I run this, the result is "a is less than 70". Now suppose I change the value to 75 and run it again. This time I don't get any output, because a is 75, which is greater than 70, so the condition is not fulfilled, R does not enter the if block and nothing is printed. This is the if statement. Similarly, I can write if (is.integer(a)) and print "a is an integer". Let me run this.
Nothing is printed, because a is numeric here, not an integer, and we are testing whether a is an integer. If I make a an integer by writing a <- 75L and run it again, a is now an integer, so the message is printed. So if the condition is TRUE, the statement that follows is executed. Similarly, we can add an else part: if a is an integer, print "a is an integer"; otherwise print something else, such as "a is numeric". If I change the condition to is.numeric(a) and run it, the condition is TRUE, so the if part prints "a is numeric" and R does not go to the else part at all. Now let's look at the switch statement. I use switch(), and as the first argument I give the number 4, followed by a list of values: 1, 2, 3, 4 and 5. Let's see what value we get. We get 4, because I passed 4: switch() goes to the fourth value in the list and returns it. Suppose I replace the fourth value with something else, "Hello". Now "Hello" is printed. If I pass 2, it returns the second value, 2. For 5, it returns 5. So when the first argument is a number, switch() returns the value at that position in the list of values 1, 2, 3, 4, 5. Whatever index you give, the value at that index is returned. Suppose I give 3: the value at index three is returned.
So for 3 we get 3, and if I give 4, "Hello" is printed. So this way we can use a switch statement in R.
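A minimal R sketch of the if, if...else and switch() examples above. The helper name describe_a() is hypothetical: it just wraps the lecture's if...else so that its result can be reused. Note that 75 is stored as a double, while 75L is an integer.

```r
a <- 67
if (a < 70) {
  print("a is less than 70")  # printed, because 67 < 70
}

# describe_a() is a hypothetical helper wrapping the lecture's if...else
describe_a <- function(a) {
  if (is.integer(a)) {
    "a is an integer"
  } else {
    "a is not an integer"
  }
}
msg_double <- describe_a(75)   # 75 is stored as a double (numeric)
msg_int    <- describe_a(75L)  # the L suffix makes it an integer
print(msg_double)
print(msg_int)

# With a numeric first argument, switch() returns the value at that position
pick4 <- switch(4, 1, 2, 3, "Hello", 5)  # "Hello"
pick2 <- switch(2, 1, 2, 3, "Hello", 5)  # 2
print(pick4)
```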
15. Loop Control using repeat and while loop: Hello and welcome back. In this lecture we are going to learn about loop statements in R. What is a loop? A loop is a control structure that lets us execute a statement or a group of statements multiple times. When we put statements inside a loop, they are executed again and again, and the loop keeps repeating them until its stopping condition is reached. In R there are three types of loop statements: the repeat loop, the while loop, and the very popular for loop. I'll start with the repeat loop. Suppose we have a vector with a few country names, such as India and US (I'll save the file later). I also create another variable named count and set it to 4. Then I write a repeat loop, because I want to print the vector a particular number of times. Inside the loop I print the vector. A repeat loop has no condition of its own, so I stop it with break once count reaches ten. If I run it as it is, it keeps printing forever, because count stays at four and never reaches ten. So I add count <- count + 1, which increases count by one on each iteration. Let me stop the running loop and run the code again. Now it prints the vector six times. Why? The first time, count is four, so it prints and count becomes five. Then it prints again and count becomes six, and so on through seven, eight and nine.
When count becomes ten, the break statement is executed and R comes out of the loop. So the vector is printed once for each count value 4, 5, 6, 7, 8 and 9. This is how we use the repeat loop. In the same way, we can use the while loop. Suppose I write while (count < 8), with count starting again at four: inside the loop we print vec and increase count by one. If we run this, the vector is printed four times, for count values four, five, six and seven. As soon as count becomes eight, the condition is false and R comes out of the while loop. So this way we can use the while loop.
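A minimal R sketch of the repeat and while loops described above. The country values are illustrative, and the counters repeat_runs and while_runs are added only to record how many times each loop body ran.

```r
vec <- c("India", "US")  # illustrative values

# repeat has no condition of its own; break is the only way out
count <- 4
repeat_runs <- 0
repeat {
  print(vec)
  repeat_runs <- repeat_runs + 1
  count <- count + 1
  if (count >= 10) break  # stop once count reaches 10
}
# the body ran for count = 4, 5, 6, 7, 8, 9

# while checks its condition before every iteration
count <- 4
while_runs <- 0
while (count < 8) {
  print(vec)
  while_runs <- while_runs + 1
  count <- count + 1
}
# the body ran for count = 4, 5, 6, 7
```

Putting the break check at the end of a repeat body means the body always runs at least once, whereas a while loop may not run at all if its condition starts out FALSE.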
16. For loop and next statement: In this lecture, we are going to learn about the for loop. Suppose I create a vector d containing 2:20. If I print d, it gives me the numbers from 2 to 20. Now I'll write a for loop using a variable i: for each value i in d, I want to print i. I also wrote i <- i + 1 inside the loop, but it is not needed: the for loop itself takes the next element of d on each iteration, and changing i inside the body does not change which value comes next. Let me run this and see what we get. We get 2, 3, 4 up to 20. On the first iteration i is 2, so it prints 2; on the next one i is 3, so it prints 3, and so on until 20, and then R comes out of the loop. So this is how we use the for loop. Now let me add a condition. Inside the loop I write: if i is equal to 15, then next; after that, I print i. Let's see what happens when we run this. For each value of i, the condition is checked before printing. We get 2, 3, 4, up to 13 and 14, printed correctly, and then when i is equal to 15, next is executed. next means skip the rest of this iteration, so print(i) is not executed and 15 is not printed. Then R goes on to 16 and prints 16, 17, 18, 19 and 20. So next skips that one iteration. If I change the condition to i == 17, the iteration for 17 is skipped instead.
We get 16, then 17 is not printed, and then 18. So if you want to skip an iteration, you can use next. I hope you got the idea of how to use next. See you in the next lecture.
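A minimal R sketch of the for loop and next. It omits the i <- i + 1 line, since the for loop already advances i through the elements of d; the printed vector is added only to record which values were printed.

```r
d <- 2:20

# The loop variable takes each element of d in turn; no manual i + 1 needed
for (i in d) {
  print(i)
}

# next skips the rest of the current iteration
printed <- integer(0)
for (i in d) {
  if (i == 15) {
    next  # 15 is skipped; the loop continues with 16
  }
  print(i)
  printed <- c(printed, i)
}
```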
17. Functions in R: Hello and welcome back. In this lecture we are going to learn about functions in R. In R there are two types of functions: built-in functions and user-defined functions. First, let me tell you what a function is. A function is a group of statements that together perform a calculation or some task, and we define it with the function keyword. Let me show you the basic structure of a function. A function takes arguments, and it can take any number of them: argument one, argument two, and so on. Inside the function we can do anything with these arguments. For example, I can add them and print the sum, argument one plus argument two. So this is a function that computes the sum of its two arguments. To use it, I give the function a name by assigning it to sum_fun, and then I can call it by passing the two arguments. For example, I write sum_fun(3, 6). Let me first run the definition, and then the call.
Okay, there was a mistake here; let me fix it and run it again. When I call the function with the two arguments 3 and 6, I get 9, because three plus six is nine. So this is a user-defined function. We have already seen built-in functions as well. For example, seq(1, 8) gives the sequence of numbers from one to eight. seq() is a built-in function in R: we don't need to write a program to produce the numbers from one to eight, because R already defines that when we call seq() with the arguments 1 and 8, it returns the numbers from one to eight. That's why these are known as built-in functions. In the same way, we have seen sum(): sum(1:8) gives the sum of the numbers from one to eight, which is 36. Similarly, we have the mean() function. If I type mean(c(2, 9)) and hit Enter, I get 5.5. So mean(), sum() and seq() are all built-in functions, while sum_fun is a user-defined one. Now let's see the different ways we can call the sum_fun function. So far we have called it by providing the arguments by position; for example, sum_fun(4, 9) gives 13.
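A minimal R sketch of the user-defined sum_fun and the built-in functions mentioned above. Because print() returns its argument invisibly, sum_fun's result can also be stored. The mean(2, 9) line is an added illustration of a common pitfall: mean() averages a single vector, so the second number is taken as its trim argument rather than as another value.

```r
# User-defined function; print() also returns its argument (invisibly)
sum_fun <- function(arg1, arg2) {
  print(arg1 + arg2)
}
total <- sum_fun(3, 6)  # prints 9

# Built-in functions used in the lecture
s   <- seq(1, 8)      # 1 2 3 4 5 6 7 8
tot <- sum(1:8)       # 36
avg <- mean(c(2, 9))  # 5.5: mean() expects a single vector

# Pitfall: mean(2, 9) does not average 2 and 9; the 9 is taken as the
# 'trim' argument, so the result is just 2
pitfall <- mean(2, 9)
```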
That call passes the arguments by position; we can also pass them by argument name, like argument one equals 4 and argument two equals 9. So this is another way of calling a function. Now I am going to write a function that finds the squares of a series of numbers. I'll call it the square function, and I'll pass it one number, a. Inside, I'll use a for loop: for i in a:10. So whatever number we pass, the loop runs from that number up to 10. In each step I square the value of i, assign it to b, and print b. Now I'll call this square function with the number 4. It will go through the numbers from 4 to 10, square 4, 5, 6, 7, 8, 9, and 10, and print each square. Let me run this. We get 16 first, because the first number is 4 and 4 squared is 16. The next number is 5, so it prints 25, then 36, 49, 64, 81, and finally 10 squared, which is 100. So this way we can create a simple function that finds the squares of a series of numbers, here starting from 4 and going up to 10. Next, I'm going to write a function to find the sum of even numbers. Suppose we start from the numbers 1 to 10 and want the sum of the even numbers we build from them.
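A minimal sketch of calling by position versus by name, and of the square function, assuming the names sumfun and squarefun (the lecture's own file may name them differently). Naming the arguments also lets you pass them in any order.

```r
sumfun <- function(arg1, arg2) {
  arg1 + arg2
}

# Matching by position and by name gives the same result
by_position <- sumfun(4, 9)
by_name <- sumfun(arg1 = 4, arg2 = 9)
by_name_reordered <- sumfun(arg2 = 9, arg1 = 4)  # order does not matter with names
print(c(by_position, by_name, by_name_reordered))  # [1] 13 13 13

# Prints the square of every integer from a up to 10
squarefun <- function(a) {
  for (i in a:10) {
    b <- i^2
    print(b)
  }
}

squarefun(4)  # prints 16, 25, 36, 49, 64, 81, 100, one value per line
```

One design caveat: if a is larger than 10, a:10 counts downwards instead of producing nothing, so a real version might check a before looping.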
So I'll write a function called sum of even. First I need to generate the even numbers. How do we generate even numbers? I multiply each number by two, so multiplying the numbers 1 to 10 by two gives the even numbers from 2 to 20. Before summing them, let me print the even numbers themselves, and then print their sum. Then I come out of the function and call it. Let me run this (I had to fix a small typo with a lowercase letter first). Now we get 2, 4, 6, 8, and so on up to 20, because I am multiplying 1 to 10 by two. If I used 1 to 5 instead, as in the earlier version, I would get only 2, 4, 6, 8, 10. The function then adds all these numbers and gives us the result: the sum is 110. So this way we get the sum of the even numbers from 2 to 20. Next, I'm going to call a function without arguments. This is very simple. I create a function called hello with the function keyword. Notice that we already called the sum of even function without passing any argument; we'll do the same thing here. Inside the function I simply use print to say "Hello, how are you?". Then I come out of the function and call hello. Before calling it, I need to run the definition. Let me run this, and we get "Hello, how are you?". So this way we can call a function without arguments.
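A minimal sketch of the two argument-free functions described above; sum_of_even and hello are illustrative names. Returning the total invisibly is an added choice (not from the lecture) so the result can also be stored.

```r
# Builds the even numbers 2, 4, ..., 20 by doubling 1:10, prints them, then prints their sum
sum_of_even <- function() {
  even <- (1:10) * 2
  print(even)
  total <- sum(even)
  print(total)
  invisible(total)  # also return the sum so it can be reused
}

# A function that takes no arguments at all
hello <- function() {
  print("Hello, how are you?")
}

sum_of_even()  # prints 2 4 6 ... 20, then 110
hello()        # [1] "Hello, how are you?"
```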
18. Matrices in R: Hello and welcome back. In this lecture we are going to learn about matrices in R. The matrix is a very important concept in R, and we need to understand it well. When you go beyond this class and use R to analyze data, whether for data manipulation, data visualization, data mining, data analysis, data science projects, or machine learning, matrices will come up in many places. In R everything is an object, so matrices are R objects too: objects in which elements of the same atomic type are arranged in a two-dimensional rectangular layout. To create a matrix we use the matrix function, which is built into R. The syntax is matrix(data, nrow, ncol, byrow, dimnames). Here data is the input vector whose values become the elements of the matrix, nrow is the number of rows to be created, ncol is the number of columns to be created, byrow is a logical value that says how the elements are filled, and dimnames gives names to the rows and columns. If byrow is TRUE, the input vector elements are arranged by row. So if you want to fill a matrix row by row, you have to set byrow = TRUE.
Otherwise, by default, the elements are arranged column-wise. For example, if we give the elements 1 to 10, they go 1, 2, 3 down the first column; if we give byrow = TRUE, they go across the rows instead. And dimnames holds the names assigned to the rows and the columns. These are the fundamentals you should know. Now I'm going to create some matrices. I have already created a file, matrices.R, where I have written a few programs to create matrices, so that we don't waste time writing the code again and again. I'll explain what each step does. First, we create a matrix whose elements are arranged sequentially by column, which is the default. I'm calling it M1, so M1 will be an R object that holds this matrix. I use the matrix function and pass the data 12:35, which creates the numbers 12, 13, 14, up to 35. Then I give only the number of rows, nrow = 6. That's it: I'm not giving any column count or byrow argument, just the 24 numbers and the number of rows. So this creates a matrix with six rows and fills it with the data from 12 to 35. Let me run it. When I print the M1 matrix, we have six rows, 1 to 6.
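A minimal sketch of the first matrix: with 24 values and nrow = 6, R works out that four columns are needed and fills them column by column, which is the default described above.

```r
# 24 values (12 to 35) spread over 6 rows; R infers ncol = 4 and fills column by column
M1 <- matrix(12:35, nrow = 6)
print(M1)
#      [,1] [,2] [,3] [,4]
# [1,]   12   18   24   30
# [2,]   13   19   25   31
# [3,]   14   20   26   32
# [4,]   15   21   27   33
# [5,]   16   22   28   34
# [6,]   17   23   29   35
```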
In these six rows, the data is filled column by column. Column 1 gets 12, 13, 14, 15, 16, 17; then R moves on to column 2 with 18 to 23, then column 3 with 24 to 29, and finally column 4 with 30 to 35. So the data is filled column-wise, first column 1, then column 2, then column 3, because I have not specified byrow. Next, if I set byrow = FALSE, it does the same thing: if I run this, I get the same matrix again. But if I run the same code with byrow = TRUE, the matrix is created by filling the rows first. The first row is 12, 13, 14, 15, the second row is 16, 17, 18, 19, the third row is 20, 21, 22, 23, and so on. So the rows are filled one after another, whereas before it was first column, second column, third column. To fill the data by row, you have to give byrow = TRUE. The next thing is giving names to the columns and rows. For that I use the c function. Since we have six rows, I create one vector of row names, row1 to row6, and another vector of column names, col1 to col4; you can give whatever names you want. Then, while creating the matrix, I first pass the data inside the matrix function, then nrow = 6, then byrow, which you can set to TRUE or FALSE as you like.
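A minimal sketch comparing the three fill options described above: byrow = FALSE reproduces the default exactly, while byrow = TRUE fills each row before moving to the next.

```r
M_default <- matrix(12:35, nrow = 6)
M_col <- matrix(12:35, nrow = 6, byrow = FALSE)  # same as the default
M_row <- matrix(12:35, nrow = 6, byrow = TRUE)   # fills the rows first

print(identical(M_default, M_col))  # [1] TRUE
print(M_row[1, ])  # [1] 12 13 14 15
print(M_row[3, ])  # [1] 20 21 22 23
```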
The next argument is dimnames. Here I create a list and pass the row names and the column names inside it; the names are taken from this list and given to the matrix. Let me run these lines. Sorry, I need to run the whole thing. Now we get a matrix with the column names col1, col2, col3, col4 and the row names row1 to row6. So first we create the vectors of row names and column names, and then we provide them to the dimnames argument through a list. Next is accessing elements of a matrix. Suppose I want the element in the third row and first column of the M5 matrix. I write M5 followed by square brackets containing 3, 1. The first index is for the row and the second one is for the column, so this gives the first element of the third row. The third row, column one, holds 20, and if I run this, we get 20. Similarly, suppose I want the value in the fifth row, second column: 29 should be printed, and we get 29. Now suppose I want the whole sixth row. I give M5 with row 6 and leave the column index empty. If you do this, you get the values of row six:
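A minimal sketch of naming and indexing, assuming M5 is the 12:35 matrix filled with byrow = TRUE (which matches the values 20, 29, and 32 to 35 quoted in the lecture). The vector names row_names and col_names are illustrative.

```r
row_names <- c("row1", "row2", "row3", "row4", "row5", "row6")
col_names <- c("col1", "col2", "col3", "col4")

# Filled by row; dimnames takes a list of (row names, column names)
M5 <- matrix(12:35, nrow = 6, byrow = TRUE,
             dimnames = list(row_names, col_names))

print(M5[3, 1])            # third row, first column: 20
print(M5[5, 2])            # fifth row, second column: 29
print(M5[6, ])             # the whole sixth row: 32 33 34 35, labelled col1 to col4
print(M5["row3", "col1"])  # the same element as M5[3, 1], selected by name
```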
32, 33, 34, 35. So this is row six. Similarly, if I want to access the second column, I leave the row index empty and give only the column value, 2. If I run this, I get column 2: 13, 17, 21, 25, 29, 33. So this way we can fetch a column. Next are matrix addition and subtraction; in fact we can do addition, subtraction, multiplication, and division with matrices. Suppose I have the m1 matrix; let me print it. I also have another matrix, m2. They are two different matrices, but their values are the same. To add m1 and m2, I simply write m1 + m2 and assign the result to an object. If I run this and print the sum, I get 24, 26, 28, and so on: 12 plus 12 is 24, 13 plus 13 is 26, 14 plus 14 is 28. Similarly, m1 - m2 gives all zeros, because both matrices hold the same values. For multiplication, m1 * m2 gives 12 times 12, which is 144, then 13 times 13, which is 169, and so on; each element is multiplied by the matching element. In the same way, m1 / m2 divides each element by the matching element, so we get 1 everywhere. So this way we can add, subtract, multiply, and divide matrices, and access their elements.
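A minimal sketch of column access and the element-by-element arithmetic described above. The last lines are an added aside, not from the lecture: in R, * multiplies matching elements, while the linear-algebra matrix product is the %*% operator, which needs the left matrix's column count to equal the right matrix's row count.

```r
m1 <- matrix(12:35, nrow = 6)
m2 <- matrix(12:35, nrow = 6)  # a separate matrix holding the same values

M5 <- matrix(12:35, nrow = 6, byrow = TRUE)
col2 <- M5[, 2]  # empty row index selects every row: 13 17 21 25 29 33

total      <- m1 + m2  # 24, 26, 28, ... (element by element)
difference <- m1 - m2  # all zeros
product    <- m1 * m2  # 144, 169, 196, ... (element by element)
quotient   <- m1 / m2  # all ones

# Aside: the matrix product is %*%, not *; a 6 x 4 matrix times its 4 x 6 transpose
mat_product <- m1 %*% t(m2)
print(dim(mat_product))  # [1] 6 6
```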
19. Factors in R: Hello and welcome. In this lecture we are going to learn about factors in R. What is a factor? As we have seen so far, everything in R is an object, and factors are data objects that are used to categorize data. A factor first categorizes the data, then creates levels for those categories, and stores the data under those levels. Factors are used to represent categorical data. They can store both strings and integers, and R treats a factor as an integer vector that has levels. For example, suppose we have population data that records each person as male or female. It is better to store that as a factor: the factor creates the categories male and female and stores the data under them, and then you can easily find how many males and how many females there are. Let me open the file I have created for factors, factor in R.R. The first step in creating a factor in R is creating a vector. Here I'm creating a vector called profession, with entries such as doctor, engineer, carpenter, doctor, mechanic, pilot, doctor, carpenter, engineer, and so on. So this vector holds the professions of several people. If I run these two lines, R creates the vector.
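A minimal sketch of the male/female example, using a small hypothetical gender vector. It shows the three things described above: the categories become levels, each level can be counted, and the factor is stored internally as integer codes pointing at those levels.

```r
# Hypothetical population data, used only to illustrate the male/female example
gender <- c("male", "female", "female", "male", "female")
gender_factor <- factor(gender)

print(levels(gender_factor))      # "female" "male" (levels are sorted alphabetically)
print(table(gender_factor))       # how many people fall into each category
print(as.integer(gender_factor))  # integer codes behind the labels: 2 1 1 2 1
```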
It contains doctor, engineer, carpenter, doctor, and so on; you can see doctor is repeated, so there are several doctors. Now suppose I want to check whether this vector is a factor. I can use a function called is.factor: pass any vector or R object to is.factor and it tells you whether it is a factor or not. Let's run this. It returns FALSE, which means this is a vector, not a factor. How do we convert this vector into a factor? That's the next step: apply the factor function. I create another object, factor_profession, and pass the profession vector to the factor function, so that the vector is converted into a factor. Now if I run this and check it with is.factor, it returns TRUE. When I print factor_profession, I get the same data as before, doctor, engineer, carpenter, doctor, and so on, but I also get another line of output: the levels. The levels are carpenter, doctor, driver, engineer, mechanic, pilot, and teacher. If I pass this factor to the table function, it gives us those same levels, each with the number of times it appears. And if I use the summary function, we also see how many carpenters there are in this data, how many doctors, and so on.
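A minimal sketch of the conversion step, using a shortened, hypothetical version of the profession vector. It checks is.factor() before and after calling factor() and lists the levels, which R sorts alphabetically by default.

```r
# Shortened, hypothetical version of the lecture's profession vector
profession <- c("doctor", "engineer", "carpenter", "doctor", "mechanic", "pilot")
print(is.factor(profession))         # [1] FALSE: still a plain character vector

factor_profession <- factor(profession)
print(is.factor(factor_profession))  # [1] TRUE
print(levels(factor_profession))     # carpenter, doctor, engineer, mechanic, pilot
```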
There are two carpenters, three doctors, one driver, two engineers, two mechanics, two pilots, and two teachers. So this way we can convert a vector into a factor. The factor has categorized the data into the categories carpenter, doctor, and so on, and it gives us the numbers: this many carpenters, this many doctors, this many engineers in our data. So factors are useful when analyzing categorical data, and we will see later how to do categorical analysis using factors in R. For now, it is enough to understand how to create a factor from a vector. We apply the factor function to the vector, and the vector is converted into a factor; the condition is that the vector, like our profession vector, should contain categorical data. We can use the is.factor function to check whether a vector is a factor or not. We can use table to see the levels of the factor with their counts, and the summary function to see how many doctors, how many engineers, and so on there are.
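A minimal sketch of counting categories with table() and summary(). The 14-element vector below is hypothetical: only its per-profession counts are taken from the lecture (2 carpenters, 3 doctors, 1 driver, 2 engineers, 2 mechanics, 2 pilots, 2 teachers); the order of entries is made up.

```r
# Hypothetical 14-person vector whose counts match the lecture's summary
profession <- c("doctor", "engineer", "carpenter", "doctor", "mechanic",
                "pilot", "doctor", "carpenter", "engineer", "driver",
                "teacher", "mechanic", "pilot", "teacher")
factor_profession <- factor(profession)

counts <- table(factor_profession)  # each level with its count
print(counts)
print(summary(factor_profession))   # the same counts as a named integer vector
```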
20. Data Frames in R: Hello and welcome back. In this lecture we are going to learn about data frames in R programming and how we can use them. First, let me tell you what a data frame is. A data frame is a table, or a two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column. It is a special case of a list in which every component has the same length: each component forms a column, and the contents of the components form the rows. You will understand it better when we do the hands-on part. A data frame in R has the following features. First, row names must be unique; you cannot have the same row name twice in a data frame. Second, column names should be non-empty. Third, the data stored in a data frame can be of numeric, factor, or character type. Fourth, each column contains the same number of data items. So, once again, a data frame is a table or two-dimensional array-like structure in R in which each column contains the values of one variable and each row contains one set of values from each column.
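A minimal sketch, using a hypothetical two-column data frame, that checks the properties listed above: a data frame is a list with one component per column, all columns have the same length, and R generates unique row names automatically when none are given.

```r
# Hypothetical toy data frame: two variables (columns), three observations (rows)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

print(is.list(df))         # [1] TRUE: a data frame is a list of columns
print(length(df))          # [1] 2: one list component per column
print(sapply(df, length))  # every column has the same length, 3
print(rownames(df))        # [1] "1" "2" "3": unique row names generated automatically
```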
Let's get started with the practical part. I have already written the program in the file DataFrame.R. Here I'm creating a data frame. How do we create a data frame? We use the data.frame function, and student is the object to which I assign the data frame. Inside data.frame, first I create a serial number column with the values 1 to 5. Then I use the c function to create the age vector: 21, 15, and so on. Then I create name, which has five entries, including John and Tom. Then I close the parentheses, and this is how we create a data frame. Let me run this line to create the data frame, and then print student. Now student is a data frame with three columns, serial number, age, and name, because those are the three vectors we gave. The serial number column contains the values 1 to 5, the age column contains the ages, and the name column contains the names. So each column holds the values of one variable, which is exactly what we learned in the theory part, and the values of these variables become the entries of this table-like structure. This is what is known as a data frame in R. We can see the structure of a data frame using the str function.
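A minimal sketch of the student data frame. The column names sr_no, age, and name are assumptions, and apart from 21, 15, John, and Tom the values are hypothetical, since the lecture's file is not fully recoverable from the audio.

```r
# Column names and most values are illustrative; the lecture's file may use different ones
student <- data.frame(
  sr_no = 1:5,
  age   = c(21, 15, 10, 35, 45),
  name  = c("Anna", "Ravi", "Chris", "John", "Tom")
)
print(student)  # 5 rows, with columns sr_no, age, name
```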
We call str and pass the data frame, and it gives us the structure of the data frame. It shows that student is a data frame with 5 observations, or rows, of 3 variables. The three variables are serial number, age, and name; column names are known as variables. The serial number is of integer type, age is numeric, and name is of character type. Each variable, or column, has the same number of items: serial number goes from 1 to 5, age has five entries, and name also has five entries. Suppose I change the serial number to go up to 6. What will happen? Let me run this. We get an error in data.frame, because serial number now has six values while age and name have only five. The error says: arguments imply differing number of rows: 6, 5. One variable has six items and the others have only five, and that's why it shows the error. Each column should have five items, so I put 5 back, and when we run it again it runs successfully without any error. So with str we can get the structure of a data frame. The next thing we are going to learn is that data frame components can be accessed like a list or like a matrix. First let's look at list-style access. To access a data frame like a list, we can use any of three operators: the dollar operator, double brackets, or single brackets. Suppose I want to access the name column from the student data frame.
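A minimal sketch of str() and of the unequal-length error, assuming R 4.0 or later (where character columns stay character instead of becoming factors). tryCatch() is used here only so the example keeps running after the error and can show its message; the column names and most values are hypothetical.

```r
student <- data.frame(
  sr_no = 1:5,
  age   = c(21, 15, 10, 35, 45),
  name  = c("Anna", "Ravi", "Chris", "John", "Tom")
)
str(student)  # 5 obs. of 3 variables: sr_no int, age num, name chr

# Columns of unequal length are rejected; catch the error to inspect its message
msg <- tryCatch(
  data.frame(sr_no = 1:6, age = c(21, 15, 10, 35, 45)),
  error = function(e) conditionMessage(e)
)
print(msg)  # arguments imply differing number of rows: 6, 5
```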
I can do that by writing student with double brackets and passing the variable, or column, name inside. If I run this, I get all the names from the data frame. I can do the same thing with the dollar operator: the data frame name, student, then the dollar sign, and then the column name. If I run this, I get the same output; student$name gives you all the students' names. We can also use single brackets and pass the name inside. All three give you the names; the difference is that single brackets return them as a one-column data frame, while the other two return a plain vector. Here we used the name, and name is the third column in this data frame, so we can pass the column number instead, as student[3]. That gives us the third column, name. If I give 2, it gives us age. So this way we can pass a column number and fetch the data from the data frame. Next is modifying data frame elements. I give the data frame name, student, then 1 for the first row, and then age, because I want to change the age in the first row to 91. Let me run this and print the data frame. Now the age in the first row has been changed to 91; earlier it was 21. So this way we can modify data frame elements. We can also add rows to a data frame. There are five rows here, and suppose I want to add one more. I can use the rbind function, which adds a row to a data frame. Inside rbind, I pass the data frame name and then a list.
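A minimal sketch of the three access styles and of modifying one element, reusing the hypothetical student data frame from the earlier examples. It makes the return types explicit: $ and [[ ]] give a vector, while [ ] gives a one-column data frame.

```r
student <- data.frame(
  sr_no = 1:5,
  age   = c(21, 15, 10, 35, 45),
  name  = c("Anna", "Ravi", "Chris", "John", "Tom")
)

names_dollar <- student$name       # character vector
names_double <- student[["name"]]  # the same character vector
names_single <- student["name"]    # a one-column data frame
by_number    <- student[3]         # third column by position (also a data frame)

# Change one element: row 1 of the age column
student[1, "age"] <- 91
print(student[1, ])
```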
Inside the list I pass the serial number 6, then the age 20, and then a name. There are three columns, so we have to give three values: serial number, age, and name. If I run this, one more row is added, and the row with serial number 6 and age 20 appears in the student data frame. Similarly, we can add a column to the data frame with the cbind function. cbind means column bind and rbind means row bind, so we use cbind to add a column. Again we pass the data frame name, and then the new column; suppose I want to add a column called country. I give the column name, country, and then use the c function to pass the countries. Because the result of rbind was not assigned back to student, the data frame still has only five rows, so the sixth row does not appear here and I pass one country for each of the five rows. So this way we can add another column, like country, to our data frame. We can also add a column with list-like assignment: student$country. This adds another column named country to the student data frame, with the entries India, New Zealand, US, Japan, and China. If I run these two statements, the country column has been added with these country names. So this way we can add a column with list-like assignment. Next, we can delete a component from a data frame. We can delete an entire column like this: the data frame name, student, then the dollar sign and the column name, name, and assign NULL.
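A minimal sketch of adding rows and columns, reusing the hypothetical student data frame and assuming R 4.0 or later. The sixth student's name, "Sam", is hypothetical. rbind() matches an unnamed list to the columns by position, and the vector given to cbind() needs one value per existing row.

```r
student <- data.frame(
  sr_no = 1:5,
  age   = c(21, 15, 10, 35, 45),
  name  = c("Anna", "Ravi", "Chris", "John", "Tom")
)

# rbind(): the list's values are matched to the columns by position
student6 <- rbind(student, list(6, 20, "Sam"))

countries <- c("India", "New Zealand", "US", "Japan", "China")

# cbind(): the new vector needs one value per existing row (5 here)
with_country <- cbind(student, country = countries)

# List-like assignment adds the same column to student itself
student$country <- countries
print(student)
```

Note that rbind() and cbind() return new data frames; student changes only where the result is assigned back, as with the $ assignment.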
If I assign NULL, the entire column is deleted. Let me run this. Now only serial number, age, and country remain; the name column has been deleted because we set it to NULL. So this way we can delete an entire column. In the same way we can delete an entire row. To delete a row, we index student with minus 2 as the row index, student[-2, ]. Let me run this. Row two, with serial number 2 and age 15, is gone from the result. So if you want to delete a particular row, you give the negative of its position; minus 2 means the second row is dropped from the student data frame. So this is how we delete an entire row from a data frame. To sum up: we create a data frame using data.frame, providing the columns, or variables, and the values for those variables. Every variable must have the same number of items, here five each for serial number, age, and name; only then will R create the data frame. We get the structure of a data frame using str. We fetch a particular column by its name, using brackets or the dollar symbol, or by its column number. We can modify data frame elements, and we can add rows and columns with rbind and cbind. I hope you now have a better understanding of data frames in R and how to work with them. See you in the next lecture.
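A minimal sketch of deleting a column and a row, reusing the hypothetical student data frame. Assigning NULL changes student in place, whereas student[-2, ] returns a new data frame and leaves student unchanged unless the result is assigned back.

```r
student <- data.frame(
  sr_no = 1:5,
  age   = c(21, 15, 10, 35, 45),
  name  = c("Anna", "Ravi", "Chris", "John", "Tom")
)

student$name <- NULL           # deletes the whole name column
without_row2 <- student[-2, ]  # drops the second row; student itself keeps 5 rows
print(without_row2)
```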
21. Combining Data Frames: Hello and welcome back. In this lecture we are going to learn how to combine vectors into data frames. Suppose we have three or four vectors and want to create a data frame from them: how can we do that? After that, we will combine data frames with each other. Let's get started. To combine vectors, first we need to create them. Here I'm creating four vectors: names, city, zipcode, and salary. I'll combine these four to create one data frame. The first vector is names; I use the c function and give the names of four people, including Henry and Monty. The second vector is city, with their respective cities: Bangalore, London, New York, and Mumbai. The third vector holds the zip codes for these cities, and the fourth vector, salary, stores each person's salary. We can create these vectors the way we learned in the lecture on vectors. Let me run this to create the four vectors. Now I want to combine these four vectors into one data frame. I give it the name emp.details, and I use the cbind function to combine the vectors.
Because names, city, zipcode and salary will become the columns, I naturally use cbind(), which combines columns. So I call cbind() with the names vector, the city vector, the zipcode vector and the salary vector. Let me run this and print emp.details. Now we can see the columns names, city, zipcode and salary, and each row holds one person's name, city, zip code and salary, with Bangalore in the first row, London in the second, and so on. So from these four vectors we have built one table-like structure. (Strictly speaking, cbind() on plain vectors returns a matrix, here a character matrix because the vectors mix text and numbers; passing the same vectors to data.frame() gives a true data frame.) If I print just the names vector, I get only the list of names, and similarly for city and salary; after combining them we have a table. We can also use the cat() function to print a headline, such as "First data frame from four vectors", before printing emp.details. The next task is to combine two data frames into one. We already have emp.details, created from the four vectors. Now I'll create another data frame, emp.details.2, with the data.frame() function, and this time I'll give the names, city, zipcode and salary vectors directly inside it.
So this is another way to build a data frame. In the first step we created the names, city, zipcode and salary vectors separately and then used cbind() to combine them. Now we are not using cbind(); instead we create the vectors directly inside the data.frame() function: a names vector, a city vector, a zipcode vector and a salary vector, plus stringsAsFactors = FALSE so that text columns stay as character strings. If we run this, it creates the data frame emp.details.2. If I print it, we get the second data frame with the columns names, city, zipcode and salary, and three entries with their respective cities, zip codes and salaries. Now we have two data frames, emp.details and emp.details.2. What I want to do is combine the rows from both data frames into a third one, all.employee.details, which will contain all the employee details from both. For this I use rbind(), because we want to combine rows: the first data frame has four rows and the second has three, and I want to stack them into another data frame. Inside rbind() we provide the first data frame and then the second; both have the same column names, which rbind() needs in order to match the columns. Then we can use cat() to print the headline "Combined employee details" and print the result. So let me run this.
See, now we have the all.employee.details data frame containing all seven rows: the first four come from the first data frame and the last three from the second. This is how we can combine two data frames. So what have we learned? We have learned how to create a data frame from vectors, and how to combine two data frames into one. See you in the next lecture.
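A minimal sketch of the steps in this lecture, using made-up names, zip codes and salaries (only the four cities come from the lecture). It builds the first data frame with `data.frame()` rather than `cbind()`, because `cbind()` on plain vectors yields a matrix; the test below checks exactly that.

```r
# Four vectors of equal length (hypothetical sample values)
names   <- c("Asha", "Rocky", "Henry", "Monty")
city    <- c("Bangalore", "London", "New York", "Mumbai")
zipcode <- c(560001, 12345, 10001, 400001)
salary  <- c(50000, 60000, 70000, 80000)

# data.frame() keeps each column's own type; the variable names become column names
emp.details <- data.frame(names, city, zipcode, salary, stringsAsFactors = FALSE)
cat("First data frame from four vectors\n")
print(emp.details)

# Second data frame, with the vectors created directly inside data.frame()
emp.details.2 <- data.frame(
  names = c("Ravi", "Priya", "Sam"),
  city = c("Delhi", "Paris", "Chennai"),
  zipcode = c(110001, 75001, 600001),
  salary = c(55000, 65000, 75000),
  stringsAsFactors = FALSE
)

# rbind() stacks the rows; the column names of both data frames must match
all.employee.details <- rbind(emp.details, emp.details.2)
cat("Combined employee details\n")
print(all.employee.details)
```

Because `zipcode` and `salary` stay numeric in a data frame, you can still compute on them, for example `mean(all.employee.details$salary)`, which would not work directly on the character matrix that `cbind()` produces.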
22. Recursion in R: Hello and welcome back. In this lecture we are going to learn about recursion in R programming. What is recursion, and what is a recursive function? A recursive function is a function that calls itself, typically many times. If you want to perform the same operation again and again, you can use recursion. You have probably seen classic examples such as the sum of natural numbers or the Fibonacci numbers. In a problem like finding the sum of the natural numbers up to n, n can be any number, even in the thousands, and to solve this kind of problem we can use recursion, which is a very important concept in programming. R supports recursive functions too: a recursive function calls itself again and again to do its work and finally gives us the result. To understand how recursion works in R, we will write a simple program that finds the sum of the natural numbers up to n. We give it a number, and it finds the sum of the natural numbers from one up to that number. Suppose I give 85. Then I want 1 + 2 + 3 + 4 + 5 + ... + 85. So I'm writing a simple function named sum_n, meaning the sum of the first n natural numbers, and it takes one argument, n, the natural number up to which we want to calculate the sum. Inside it I do a simple check: if n is less than or equal to one, return n. Why do I check this condition?
Because if n is one, we need to return one: the natural numbers start with one. Some people also consider the natural numbers to start from zero, so in the case of zero or one we simply return the number itself: if it is 0 we return 0, and if it is 1 we return 1, with no need for another recursive call. This condition is the base case that stops the recursion. Then, in the else part, I return n + sum_n(n - 1). Suppose the number is two. Then the function returns 2 + sum_n(1). sum_n(1) hits the base case and returns 1, so the function returns 2 + 1 = 3. If we run this with 2, we get 3, because the sum of the natural numbers up to two is three. Now suppose I give three. Three does not satisfy n <= 1, so we go into the else part and return 3 + sum_n(2): the function calls itself. In that call n is two, so again it goes to the else part and returns 2 + sum_n(1), and sum_n(1) returns 1. So the total is 3 + 2 + 1, which is six.
So it returns six; see, the output is six. Let me trace it once more. In the first call, the function returns 3 + sum_n(3 - 1), that is 3 + sum_n(2). That call to itself is the recursion. In the next step, sum_n(2) goes to the else part and returns 2 + sum_n(2 - 1), that is 2 + sum_n(1), so we have 3 + 2 + sum_n(1). In the last step, sum_n(1) returns n, which is one, so we get 3 + 2 + 1, and the final output is six (3 + 2 is 5, plus 1 is 6). This is recursion. Now suppose I give a big number, 785. The function returns 785 + sum_n(784), which returns 784 + sum_n(783), and so on; it keeps calling itself until n reaches one. If we run this, we get 308,505, a little over three lakh. So this is how we can use recursion in R to find the sum of natural numbers.
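The recursive sum described above can be sketched in R as follows. The function name `sum_n` follows the narration; the exact code shown in the lecture may differ slightly.

```r
# Sum of the natural numbers 1..n, computed recursively
sum_n <- function(n) {
  if (n <= 1) {
    return(n)                  # base case: the sum up to 1 is 1 (and up to 0 is 0)
  } else {
    return(n + sum_n(n - 1))   # recursive case: n plus the sum up to n - 1
  }
}

print(sum_n(3))    # 6
print(sum_n(85))   # 3655
print(sum_n(785))  # 308505
```

A handy way to check the recursion is the closed form n(n + 1)/2: for n = 85 it gives 85 × 86 / 2 = 3655. Each recursive call adds a level to R's call stack, so for very large n the closed form or `sum(1:n)` is the safer choice.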
23. Finding Factorial of a number using recursion in R: In this lecture we will do another hands-on exercise with recursion: we will find the factorial of a number using recursion. You have probably studied factorials in high-school maths, but let me recap. The factorial of a number is the product of all the integers from one up to that number. So the factorial of two is the product of the numbers from one to two, that is 1 × 2. The factorial of three is 1 × 2 × 3. Similarly, the factorial of seven, written 7!, is the product of the numbers from one to seven: 1 × 2 × 3 × 4 × 5 × 6 × 7, which is 5,040. So for the factorial of a number, we start from one and multiply every number up to that number. If I put eight here, the factorial is the product of the numbers from one to eight. Let's find it: let me run this program and print the result. See, the factorial of eight is 40,320. Now let me explain the function we have written. I'm writing a function called recur_factorial that takes the number n as input. First I check whether n is less than or equal to one, and if so I simply return one: the factorial of one is one, and the factorial of zero is also one. If the number is greater than one, we go into the else part.
In the else part, I return n * recur_factorial(n - 1): the function calls itself inside its own body, and that is recursion. Suppose we want the factorial of eight. The function returns 8 * recur_factorial(7). Then recur_factorial(7) returns 7 * recur_factorial(6), and so on; it keeps calling itself until n comes down to one. recur_factorial(1) returns one, so we get the product 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1. That is how the factorial function works. Let me change the input to five: the factorial of five is 120. So this is how the factorial function works in R using recursion. The key idea is that inside recur_factorial I am calling recur_factorial itself, and when a function calls itself, that is called recursion. We have now seen two examples of recursion.
In the previous lecture we used recursion to find the sum of natural numbers, and here we have used it to find the factorial of a number.
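The factorial function explained above can be sketched like this. The name `recur_factorial` is reconstructed from the narration and is an assumption about the lecture's code.

```r
# n! computed recursively: n! = n * (n - 1)!, with 0! = 1! = 1
recur_factorial <- function(n) {
  if (n <= 1) {
    return(1)                           # base case: 0! and 1! are both 1
  } else {
    return(n * recur_factorial(n - 1))  # recursive case
  }
}

print(recur_factorial(5))  # 120
print(recur_factorial(8))  # 40320
```

Base R already has `factorial()`, which is the practical choice outside of an exercise; the recursive version is useful for seeing how each call waits on the result of the next one.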
24. Program to check Prime Numbers: Hello and welcome back. In this lecture we are going to learn how to check whether a number is prime or not. These are problems you may face in company coding interviews, where the interviewer asks you to write a program to find the sum of n numbers, check whether a number is prime, or find even numbers. Such problems come up quite often in company interviews and programming tests, so it is good to know how to implement them in R as well, especially if you are an aspiring data scientist, machine learning engineer or AI engineer. So in this series of lectures we are exploring how to write such programs in R, and in this lecture specifically, how to find prime numbers. This program checks whether a number is prime. We take the number as user input: we ask the user to enter a number, take that input, and check whether it is prime. For that we will use if and a for loop, so if you know how to use if, else and for loops, you will be able to follow this program. First, what is a prime number? A prime number is a positive integer greater than one that has no factors other than one and itself. What does that mean? Take the number four. We can write four as 2 × 2, so it has the factor two besides one and four, and it is not a prime number. Take six: we can write it as 2 × 3, so it has the factors two and three. Now take five.
We cannot write five as a product of smaller integer factors. We could write 2 × 2.5, but factors have to be integers, so that does not count. Numbers like five have only the factors one and themselves: five can only be written as 1 × 5. Similarly, seven can only be written as 1 × 7; unlike six, which we wrote as 2 × 3, we cannot find any number other than one and seven that divides seven. That is why prime numbers are the positive integers greater than one that are divisible only by one and themselves, that is, that have no factors other than one and the number itself. The prime numbers are 2, 3, 5, 7, 11, 13, 17 and so on. For example, 17 cannot be divided by any number except one and itself, and the same holds for 13 and 11. Now that we understand prime numbers, let's solve the problem. First, how do we take input from the user in R? We can use the readline() function, which has a prompt argument. Whatever you write as the prompt is displayed on the console, telling the user what to do; here we write "Please enter a number". readline() returns what the user types as text, so we convert it to an integer and store the number entered by the user in the variable n.
as.integer() converts whatever we get from readline() through this prompt into an integer. If I run this line, it asks me to enter a number. Let me clear the console. Next we set a variable called flag to 0; we'll see in a moment why we use this flag. As I said, prime numbers are always greater than one: 2, 3, 5, 7, 11, 13, 17 and so on. So the first thing we check is whether the number is greater than one, and only then do we go inside. If the number is not greater than one, it is definitely not a prime number, so the flag stays 0, and for flag 0 we print "not a prime number". So if you enter a number like -2, -3 or -5, it ends up in the else part and you get the message that the number is not prime. What if the number is greater than one, say 2, 3, 5, 6, 7 or 8? Then we check for factors. Before checking, we set the flag to one; flag one means the number is prime. So initially the flag is 0, and as soon as we enter the if statement (because n is greater than one) we set it to 1. Then we create a for loop, for (i in 2:(n - 1)), because we test divisors starting from two up to n - 1. For example, if we enter five, i takes the values 2, 3 and 4.
Inside the loop we check whether the number can be divided by 2, 3 or 4, using the condition n %% i == 0, where %% is the modulo operator that gives the remainder. If we enter five, the loop checks whether five is divisible by two, then by three, then by four. If n is divisible by any i, we set the flag to 0 and break out of the for loop. There is also one special case: if n equals two, we set the flag to one. Now suppose we enter six. Six is divisible by two, so 6 %% 2 == 0; we set the flag to 0 and break out of the loop. With flag 0 we go to the else part and print that six is not a prime number. Suppose we enter five. Five is not divisible by two, three or four, so the loop finishes without changing the flag, and the flag stays one: five is a prime number. Similarly, suppose we enter eight. The loop would run from 2 to 7, but eight is divisible by two right away, so the flag becomes 0, we break out of the loop, and we print that eight is not a prime number. Suppose we enter 11: none of the numbers from 2 to 10 divides 11, so the flag stays one. Suppose we enter 16: 16 is divisible by two, so the flag becomes 0 and we leave the loop; 16 is not a prime number. Suppose we enter 17: 17 is not divisible by any number from 2 to 16, so the loop finishes and the flag stays one.
Flag equal to one means it is a prime number. If the number you enter is two, it is directly a prime number, and one, which is not greater than one, goes to the else part. So let me run the whole source file. It did not work the first time, so let me run it again. Now the console asks us to enter a number. Suppose I enter one and press Enter. One is not a prime number. Why? The program checks whether n is greater than one, and it is not, so we never enter the if statement and the flag stays 0. Then it checks whether the flag equals one; it does not, so it goes to the else part and prints that one is not a prime number. Let me run it again and enter two. Two is a prime number. Why? n is greater than one, so the flag is set to one. Then the loop runs over 2:(n - 1), which for n = 2 is 2:1, so i takes the value 2, and 2 %% 2 is 0; that sets the flag to 0 and breaks out of the loop. This is exactly why the program has the separate check that sets the flag back to one when n equals two. So the flag ends up as one and two is reported as a prime number. If I run it again and enter three, it says three is a prime number. Why? We enter the if statement, set the flag to one, and go into the for loop, where i takes only the value 2. Three is not divisible by two, so the loop ends and the flag stays one: three is a prime number.
Similarly, suppose I enter 17. 17 is a prime number. Why? 17 is greater than one, so the flag is set to one and we go into the for loop with i from 2 to 16. 17 is not divisible by any of these numbers, so the loop finishes, the flag remains one, and the program prints that 17 is a prime number. This is how we can write a simple program to check for prime numbers in R.
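The same trial division can be sketched as a small R function. Instead of the lecture's `flag` variable, this version uses a hypothetical helper `is_prime()` that returns TRUE or FALSE, so it can be reused and tested without typing input. The `interactive()` guard keeps `readline()` from running in a script, where it would simply return an empty string.

```r
# Trial division: n is prime if no i in 2..(n - 1) divides it
is_prime <- function(n) {
  if (n <= 1) return(FALSE)         # primes are greater than one
  if (n == 2) return(TRUE)          # 2:(n - 1) would be 2:1, so treat 2 separately
  for (i in 2:(n - 1)) {
    if (n %% i == 0) return(FALSE)  # found a factor other than 1 and n
  }
  TRUE                              # no factor found: n is prime
}

# Interactive use, mirroring the lecture's prompt
if (interactive()) {
  n <- as.integer(readline(prompt = "Please enter a number: "))
  if (is_prime(n)) {
    print(paste(n, "is a prime number"))
  } else {
    print(paste(n, "is not a prime number"))
  }
}

print(Filter(is_prime, 1:20))
```

Returning as soon as a factor is found plays the same role as setting the flag to 0 and calling `break`. Checking divisors only up to `sqrt(n)` would be faster, but the loop above stays close to the lecture's version.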
25. Program to check EVEN or ODD: In this lecture, we are going to write an R program to find whether an entered number is odd or even. We take the input from the user by asking them to enter a number, and based on the user's input we report whether it is odd or even. So what are odd and even numbers? A number that is divisible by two without a remainder is called an even number. Suppose we have a number x: if we divide x by two and the remainder is zero, x is even. If dividing by two leaves a remainder, x is odd. So, simply, if a number divided by two leaves a remainder of 0 it is even, and if it leaves some other remainder it is odd. Let's look at the program. I take the input from the user as an integer: I use the readline() function with the prompt "Please enter a number" and convert the result with as.integer(). Then I check whether n divided by two leaves a remainder of 0, that is, n %% 2 == 0. If so, we say n is an even number; if we get a remainder other than 0, it is an odd number. Simple. Examples of even numbers are 2, 4, 6, 8, 10 and 12, and examples of odd numbers are 3, 5, 7, 9, 11 and 13, and both lists go on.
So a number that leaves a remainder of 0 when divided by two is even, and a number that leaves any other remainder is odd. Let's run the program and see the output. Let me clear the console first. Suppose I enter 45. The output says 45 is an odd number. Why? If you divide 45 by two, 2 × 22 = 44, so the remainder is 1, which is not 0; the program goes to the else part and prints that 45 is an odd number. If I run it again and enter 12, it says 12 is an even number. If I run it again and enter 2, it says two is an even number. And if I run it again and enter 5, it says five is an odd number. So this is how we can identify whether a number is odd or even with simple logic: if n %% 2 is 0, the number is even; otherwise it is odd. I hope these simple programs help you understand how programming works and build your logic. They are also popular interview questions, especially if you are a fresher or a new graduate looking for a job; these questions are pretty common in campus placements.
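A minimal sketch of the even/odd check, wrapped in a hypothetical helper `even_or_odd()` so it can be called without user input; the `interactive()` block mirrors the lecture's `readline()` prompt.

```r
# Returns a message saying whether n is even or odd, based on the remainder n %% 2
even_or_odd <- function(n) {
  if (n %% 2 == 0) {
    paste(n, "is an even number")
  } else {
    paste(n, "is an odd number")
  }
}

if (interactive()) {
  n <- as.integer(readline(prompt = "Please enter a number: "))
  print(even_or_odd(n))
}

print(even_or_odd(45))
print(even_or_odd(12))
```

In R, `%%` returns a result with the sign of the divisor, so `-3 %% 2` is 1 and negative odd numbers are classified correctly too.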
26. Program to check Positive Negative or ZERO: In this lecture, we are going to write a program in R to check whether a number is negative, positive or zero. It is a simple test. We do the same thing as in the previous lecture, with one difference: here the number can be a decimal (double) number, so we take the input as a double, using the readline() function with the same prompt, "Please enter a number", and as.double(). Then we check whether the entered number is greater than 0. If it is, we print that n is a positive number. Otherwise we go to the else part, where we use a nested if-else: if the number is equal to 0, we print that the number is 0; otherwise the number is neither greater than 0 nor equal to 0, so it must be less than 0, and we go to the inner else part and print that n is a negative number. A simple check. Let's run this. Let me clear the console first. Suppose I enter -5: -5 is a negative number. Suppose I run it again and enter 45: 45 is a positive number. If I run it again and enter 0, it says the number is 0. And suppose I run it again and enter a double number like -78.5.
-78.5 is a negative number. In the same way, if I enter -8, it is reported as negative. So this is a simple program to find whether a number is positive, negative or zero.
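A minimal sketch of the nested if-else described above, using a hypothetical helper `check_sign()`; the input is read as a double so that values like -78.5 work.

```r
# Classifies n as positive, zero or negative with a nested if-else
check_sign <- function(n) {
  if (n > 0) {
    paste(n, "is a positive number")
  } else {
    if (n == 0) {
      paste(n, "is zero")
    } else {
      paste(n, "is a negative number")  # not > 0 and not 0, so n < 0
    }
  }
}

if (interactive()) {
  n <- as.double(readline(prompt = "Please enter a number: "))
  print(check_sign(n))
}

print(check_sign(-78.5))
```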
27. Program to Check Leap Year or NOT: Hello and welcome back. In this lecture, we are going to write a program to find whether a year is a leap year or not. What is a leap year? In a leap year we get an extra day: it has 366 days instead of 365. So how do we decide whether a year is a leap year? There is a simple rule. If the year is divisible by four, that is, dividing it by four gives a remainder of 0 (we check this with the modulus operator, %%), it could be a leap year. But that is not the whole story. We also have to check whether the year is divisible by 100. If it is not divisible by 100, it is a leap year. If it is divisible by 100, we have to check one more condition: whether the year is divisible by 400. If it is also divisible by 400, it is surely a leap year; if not, it is not a leap year. So the first condition is year %% 4 == 0. If that holds, we check year %% 100 == 0, and if that also holds, we check year %% 400 == 0. If all three remainders are 0, it is a leap year; if the year is divisible by 100 but not by 400, it is not. Look also at the else branches: if the year is divisible by 4 but not by 100, the program goes to the else part of the 100 check and reports a leap year. And if the year is not divisible by four, it is certainly not a leap year.
So let's run this program. Suppose I enter 2020. It says 2020 is a leap year. Why? 2020 is divisible by four, so we check whether it is divisible by 100. It is not: 2020 %% 100 gives 20, not 0. So the program goes to the else part of that check and prints that 2020 is a leap year. Let me run it again and enter 2500. Is 2500 a leap year? 2500 is divisible by four, so we go inside and check whether it is divisible by 100. It is, so we go into the next if and check whether it is divisible by 400. 2500 divided by 400 leaves a remainder of 100, which is not 0, so the program goes to the else part and prints that 2500 is not a leap year. Let's enter it and see the result: 2500 is not a leap year. Trying another year that is not a leap year gives the same kind of result. And a year that is divisible by four but not by 100 is a leap year: after the check by four, the program checks whether it is divisible by 100, the remainder is not 0, so there is no need to check 400, and it goes straight to the else part and reports a leap year. This is how we can write a program to find whether a particular year is a leap year or not.
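The nested checks above can be sketched as a small R function. The helper name `is_leap_year()` is hypothetical, and the years 1900 and 2000 are added here only as extra checks of the century rule.

```r
# Leap year rule: divisible by 4, except century years, which must also be divisible by 400
is_leap_year <- function(year) {
  if (year %% 4 == 0) {
    if (year %% 100 == 0) {
      year %% 400 == 0   # century year: leap only if divisible by 400
    } else {
      TRUE               # divisible by 4 but not by 100
    }
  } else {
    FALSE                # not divisible by 4
  }
}

for (year in c(2020, 2500, 1900, 2000)) {
  if (is_leap_year(year)) {
    print(paste(year, "is a leap year"))
  } else {
    print(paste(year, "is not a leap year"))
  }
}
```

The nested form follows the lecture's step-by-step reasoning; the same rule can also be written as one vectorised expression, `(year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0`.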
28. Program for Multiplication Table: Hello and welcome back. In this lecture we are going to write a simple program to print a multiplication table. For example, the multiplication table of 2 is 2 × 1 = 2, 2 × 2 = 4, 2 × 3 = 6, 2 × 4 = 8, and so on up to 2 × 10 = 20. Okay? That simple multiplication table is what we are going to print. I have already written the program to save our time, as usual. First, we take the user input as an integer, with the prompt "Please enter a number". As soon as the user enters a number, we store it in n. Then we run a for loop for i in 1 to 10, because we want the multiplication table up to ten only, right? Inside the loop we print the multiplication table of n: we simply multiply n by i. So suppose we enter 3. The first time through the loop, i is 1, so it prints 3 × 1 = 3; the next time i is 2, so it prints 3 × 2 = 6, and so on. Okay, so let's run this, and suppose I enter 3 here. See the output: 3 × 1 = 3, then 3 × 2 = 6, and so on up to 3 × 10 = 30. If you want more rows, you can increase this number. Suppose I change the loop limit to 15, run it again and enter 3: see, now the table goes up to 15. So this way you can generate the multiplication table of any number, for any number of rows: up to 10, 15, 100, whatever you want. Let me run it again. Let me clear the console and enter 25. Now we get the multiplication table of 25: 25 × 1 = 25, 25 × 2 = 50, 25 × 3 = 75, and so on. So this way we can generate a multiplication table in R.
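The loop described above can be sketched like this. It is a minimal version, assuming a hypothetical helper multiplication_table() with a hypothetical upto parameter for the loop limit, and a fixed n instead of keyboard input; the readline() line in the comment only suggests how the prompt might be written and is not taken from the lecture.

```r
# Builds the lines of a multiplication table with a for loop, as in the lecture.
# multiplication_table() and its `upto` parameter are hypothetical names; an
# interactive version might read n with something like:
#   n <- as.integer(readline(prompt = "Please enter a number: "))
multiplication_table <- function(n, upto = 10) {
  lines <- character(upto)
  for (i in 1:upto) {
    lines[i] <- paste(n, "x", i, "=", n * i)
  }
  lines
}

n <- 3
for (line in multiplication_table(n)) {
  print(line)
}
```

Passing upto = 15 extends the table to 15 rows, which matches changing the loop limit in the lecture.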
29. Sample Data from a Population: Hello and welcome back. In this lecture we are going to learn about sampling from a population dataset in R programming. This is very important, because when we have a machine learning, data science, or deep learning problem, we have a huge set of data, right? And we want to get some insight from that data, or we want a sample of it. Suppose we have the population of a city and we want to analyze how many people have diabetes. We take a sample of the data, analyze it, and based on that we can create a model and then apply that model to the larger population, right? In general terms, taking a small sample from a large dataset is called sampling from a population. So sampling data is very important. To follow this, you should know about R functions and how to create vectors, and we already know those things. So let's see a simple example of how we can do sampling. To support sampling, R has an inbuilt function called sample, spelled s-a-m-p-l-e. If we call sample with a single integer as input, say sample(20), it gives us the numbers from 1 to 20 in random order. Let me clear the console so that we can see the output clearly, and run sample(20). See, it returns the numbers from 1 to 20, and they are not arranged in increasing or decreasing order. All the numbers from 1 to 20 are there, just shuffled. So this is how we can create a sample of the numbers from 1 to 20. That's a simple example.
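The sample(20) call described above can be sketched as follows. The sort() call is an addition, used here only to confirm that the shuffled result contains each number from 1 to 20 exactly once.

```r
# sample(n) with a single positive integer n returns a random permutation of 1:n
perm <- sample(20)
print(perm)        # the numbers 1 to 20 in random order
print(sort(perm))  # sorted back into 1, 2, ..., 20
```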
Next, I'll create a vector of the numbers from 1 to 15. Suppose this is our dataset: x holds the numbers from 1 to 15. This x is known as the population, because it is the entire dataset that we have. From this population x, I want to sample only five elements. Suppose these are the marks of 15 people, and I want the marks of only five of them. How can we do that? We can use the sample function: we pass the population x, then a comma, then 5, which is how many elements we want to sample from the larger dataset. So sample(x, 5) gives us five random numbers from this population, and that is our sample. Let me run sample(x, 5). See, now we are getting 4, 8, 14, 10 and 12 from the population dataset, which is 1 to 15. So these are random numbers from that population, and this is our sample dataset. If I run it again, it gives us five different numbers. See, now the sample has changed, because it randomly picks five numbers from the population x each time. Okay, so this is how the sample function works. By default, sample draws without replacement (replace = FALSE), so no number can appear twice in one sample. Next, we can pass replace = TRUE. What happens if we call sample with x as the population and simply set replace = TRUE? Let us see.
See, now the numbers are not all distinct, and not every number from 1 to 15 is there. Some numbers are repeated: in this run 14 appears three times and 12 appears twice, while numbers such as 1, 6 and 8 are missing, even though our original dataset has all the numbers from 1 to 15. So what does replace = TRUE do? Each time sample picks a number, it puts that number back into the population before the next pick, so the same number can be picked again. Since we did not give a size, it draws as many numbers as there are in x, which is 15, and because some numbers are picked more than once, others are left out. Okay? Next, suppose we have the event of tossing a coin. There are two outcomes, heads or tails, so our dataset is a vector of two values, "H" and "T". We want a sample of size 15, with replace = TRUE; replacement is required here, because we want more draws than there are values to pick from. Let me run this and see what it does. It creates a sample of H and T, repeating heads and tails multiple times until the sample size of 15 is reached. This is quite random, right?
There is no fixed sequence: if you run it again, you may get another sequence of H and T, and it will be different each time. So this way we can create a sample of head and tail events. We can also change the size here. If I give 5, see, we get only T in this sample: since we asked for only five, only tails came up and there is no head. If I make it 6, now we're getting tails and a head. So it is quite random. So this way we can use the sample function in R to take a sample from a large population, as we did here: we took the population x, containing the numbers 1 to 15, and sampled only five elements from it to create our sample of five elements. So this is how we can use the sample function to get a sample from a population dataset, okay?
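The three kinds of sampling in this lecture can be sketched together as below. The set.seed() call is an addition so that this sketch gives the same draws on every run; the lecture does not use it, which is why its results change each time. The variable names are my own.

```r
# set.seed() is an addition for reproducible draws; the lecture does not use it.
set.seed(1)

x <- 1:15  # the population

five <- sample(x, 5)                        # 5 distinct elements (replace = FALSE is the default)
with_repl <- sample(x, replace = TRUE)      # 15 draws with replacement: repeats and gaps are possible
coins <- sample(c("H", "T"), 15, replace = TRUE)  # 15 coin tosses

print(five)
print(with_repl)
print(table(with_repl))  # how often each drawn value appears
print(coins)
```

Without replace = TRUE the coin example would stop with an error, because sample() cannot draw 15 values from only two without replacement.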
30. Analysing Data in R from CSV file: Hello and welcome back. In this lecture we are going to learn a very important concept, and that is data analysis in R. First, we will read a CSV file. CSV stands for comma-separated values, and you can think of it like an Excel sheet. Our file contains some employee details. We'll read that CSV file with R, and after we read it, we will analyze the data inside the employee details CSV file and try to gain some insights from it. Okay, so let me show you the CSV file first. This is the CSV file that I have created, employee details dot csv, and it is a comma-separated file. The first line has employee ID, employee name, salary, date of joining and department. These are the five columns; they will be the columns of the table, or the Excel sheet, you can say. Then comes the first row of comma-separated values: 1 is the employee ID, then the employee name, Priyanka, then the salary, the date of joining, and the department, CSE or something. Okay, so this is the data we have kept inside this CSV file. Now I'll fetch these columns and row values through R programming, and then we'll analyze the data. So the first step is reading a CSV file through R. Let me go to the code. The first thing we need to do is set our working directory. To set the working directory, go to the Files section here and click on these three dots; it lets you browse the files on your computer. I'll go to the R 2020 folder, because I want this folder as my working directory, so I'll select it. Now we are inside this R 2020 directory.
Now I want to make this folder the working directory. I can go to the More menu here and choose Set As Working Directory. So I'll click on Set As Working Directory. Okay, this way we can make this R 2020 directory our working directory. Alternatively, you can simply use the command setwd. setwd is a function that sets the working directory: wd means working directory, and set means setting it. Inside setwd you provide the path of your directory, and when you execute it, that path becomes your working directory. Okay? So now we have set the working directory. I have kept the employee details dot csv file here, so that it will be easy for us to read it. We need not pass the complete path, since we are inside the working directory and our employee details CSV file is in the same directory. If it is not there, we have to give the complete path of the file. Okay, so the first step is reading the CSV file through R. First I'll create a variable, an R object, for our employee data: emp.data. This is the variable where I want to store whatever I read from the CSV file. To read a CSV file we need a function called read.csv. Inside read.csv we provide the CSV file name, which for us is employee details dot csv. When we execute this, it reads the data from the employee CSV file and stores it inside emp.data. So let me run this first. See, now the Environment pane shows emp.data with 8 observations of 5 variables. It means there are eight rows and five columns in the CSV file.
We can see the employee IDs from 1 to 8, the employee names, salary, date of joining, and so on; you can see the details here. So now we have read the CSV file and stored the data inside emp.data. Now I can print emp.data and see what data it contains. Let me run this. See, now we are getting a table-like structure, right? Employee ID, employee name, salary, date of joining and department are the column names, and the data values sit in rows under the corresponding columns. There are eight rows and five columns; five variables means five columns. So can you relate this to an R data type? This is called a data frame, right? A data frame has a table-like structure, so whatever we read from a CSV file is stored as a data frame in R. We can check whether this is a data frame or not with the is.data.frame function, passing the emp.data object. Let me run this. See, it gives TRUE, which means emp.data is a data frame. Whatever we read through read.csv, we get a data frame. Now that we have our table-like data frame, I can find the number of columns using the ncol function: I pass the data frame name, and it gives the number of columns in the data frame. This CSV file has five columns. Similarly, we can use nrow, which gives the number of rows. When we run it, we get eight rows in this data frame. See how easily we have read the CSV file, created a data frame in R, and found its number of columns and rows with ncol and nrow.
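The reading steps above can be sketched as below. The lecture's CSV file is not reproduced in the transcript, so this sketch first writes a small made-up file to a temporary path; the column names emp_id, emp_name, salary, start_date and dept, and all of the rows, are assumptions for illustration only.

```r
# Hypothetical stand-in for the lecture's employee CSV file (made-up rows and
# column names), written to a temporary file so read.csv() has something to read.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "emp_id,emp_name,salary,start_date,dept",
  "1,Asha,45000,1998-03-15,IT",
  "2,Ravi,88000,2003-07-01,Finance",
  "3,Meena,52000,2010-11-20,HR",
  "4,Karan,91000,1995-05-10,Finance"
), csv_path)

# In the lecture the file sits in the working directory (set with setwd()),
# so only the file name is passed; here the full temporary path is passed instead.
emp.data <- read.csv(csv_path)

print(emp.data)
print(is.data.frame(emp.data))  # TRUE: read.csv() returns a data frame
print(ncol(emp.data))           # number of columns (5 in this sketch)
print(nrow(emp.data))           # number of rows (4 in this sketch)
```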
Now we are going to get some more interesting information with R. First, what is the maximum salary of an employee? I can use the max function and pass emp.data$salary, which means I am passing the salary column to the max function. emp.data dollar salary fetches this column, and max finds the maximum of these salaries. So it gives us the result 95,200. Let me print this max salary: the maximum salary of an employee is 95,200. Okay, so this way we can find the maximum salary. We can also find the average salary using the mean function: we pass the salary column to mean, and it gives us the average salary of the employees. See, the average salary of the employees is 53,924. If we take the sum of all the employees' salaries and divide it by eight, because there are eight employees, we get the same result. So this way we can find the average salary of the employees. In the same way, we can find the details of the employee with the maximum salary. We found that 95,200 is the maximum salary, and one employee is getting it; now we can find that employee's details. We use the subset function, passing the employee data and then another argument, the condition salary equal to max salary. So it gives us the details of the employee who is getting the maximum salary. Let me run this. See, this is the employee getting the maximum salary, and these are his details. So this way we can use subset with the max salary condition to find the employee who is getting the highest salary.
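The salary queries above can be sketched on a small made-up data frame standing in for emp.data. The columns emp_id, emp_name, salary, start_date and dept and all of the rows are invented, so the numbers differ from the lecture's 95,200 and 53,924.

```r
# Made-up stand-in for emp.data (hypothetical columns and rows).
emp.data <- data.frame(
  emp_id = 1:4,
  emp_name = c("Asha", "Ravi", "Meena", "Karan"),
  salary = c(45000, 88000, 52000, 91000),
  start_date = c("1998-03-15", "2003-07-01", "2010-11-20", "1995-05-10"),
  dept = c("IT", "Finance", "HR", "Finance")
)

max_salary <- max(emp.data$salary)    # highest value in the salary column
avg_salary <- mean(emp.data$salary)   # same as sum(emp.data$salary) / nrow(emp.data)

# Inside subset(), column names such as salary can be used directly
top_earner <- subset(emp.data, salary == max(salary))

print(max_salary)
print(avg_salary)
print(top_earner)
```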
In the same way, we can get all the employees working in the Finance department and getting a salary of more than 85,000. See, this is the department, Finance. There are two employees, and both are getting a salary of more than 85,000. So what condition are we giving? We call the subset function, provide the employee data, and give a condition: department equal to Finance and salary above 85,000. So it gives us the details of all the employees whose department is Finance and whose salary is more than 85,000. Let me run this. See, we are getting two employees. These are the two employees who belong to the Finance department and whose salary is more than 85,000. Next, I want to find the employees who joined after 2000. We have a date of joining column here too, right? So I provide the employee data to subset, and as the condition I convert the date of joining with the as.Date function and check that it is greater than as.Date of 2000-01-01, the 1st of January 2000. Okay. So we get the details of all the employees whose date of joining is after this date. Let me run this. Alright, let me run it again. See, now we are getting these five employees, Priyanka and the others. They joined the company in 2000 or later, and we are getting their details. So this way we can analyze a CSV file and the data inside it, just like an Excel sheet.
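The two filters above could look like this, again on made-up rows with assumed column names (emp_id, emp_name, salary, start_date, dept). Note that the date comparison works only after the text dates are converted with as.Date(); comparing the raw strings would not be a date comparison.

```r
# Made-up stand-in for emp.data (hypothetical columns and rows).
emp.data <- data.frame(
  emp_id = 1:4,
  emp_name = c("Asha", "Ravi", "Meena", "Karan"),
  salary = c(45000, 88000, 52000, 91000),
  start_date = c("1998-03-15", "2003-07-01", "2010-11-20", "1995-05-10"),
  dept = c("IT", "Finance", "HR", "Finance")
)

# Both conditions must hold: combine them with &
finance_high <- subset(emp.data, dept == "Finance" & salary > 85000)

# Convert the text dates to Date objects before comparing them
joined_after_2000 <- subset(emp.data, as.Date(start_date) > as.Date("2000-01-01"))

print(finance_high)
print(joined_after_2000)
```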
So this is pretty cool analysis we can perform with simple functions, right? We have read our CSV file and performed all this analysis. Now I want to write a CSV file: I want to generate some data and publish it into a CSV file. How can I do that? With write.csv. read.csv reads a CSV file, and write.csv writes one. The data we found, the employees who joined after 2000, is what I want to write into a CSV file. So I pass this object, joined after 2000, to write.csv, and here I give the CSV file name: employee joined after 2000 dot csv. All this data will be written into this CSV file, and a new CSV file will be created. Okay, let me check whether this file already exists. See, it's already there, so I'll delete it, then go back and run this. Now let me go to the folder. See, now we have the new file created. Let me show you the data: we have five rows, Priyanka and the others, with their dates of joining. All these employees joined after 2000, so it is the same data we saw before, now put inside this new CSV file. So see how easy it is to get the data and write the result into a CSV file through R. And now I can use read.csv to read this CSV file again and print it. Let me run this. See, we are getting the same data. So this way we can write a CSV file and read a CSV file.
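A sketch of the write-then-read round trip, using made-up rows with assumed column names and a hypothetical output file name in a temporary folder. The row.names = FALSE argument is an addition: without it, write.csv() also writes the row names as an extra first column, which comes back as a column called X when the file is read again.

```r
# Made-up stand-in for emp.data (hypothetical columns and rows).
emp.data <- data.frame(
  emp_id = 1:4,
  emp_name = c("Asha", "Ravi", "Meena", "Karan"),
  salary = c(45000, 88000, 52000, 91000),
  start_date = c("1998-03-15", "2003-07-01", "2010-11-20", "1995-05-10"),
  dept = c("IT", "Finance", "HR", "Finance")
)
joined_after_2000 <- subset(emp.data, as.Date(start_date) > as.Date("2000-01-01"))

# Hypothetical file name; the lecture writes into the working directory instead.
out_path <- file.path(tempdir(), "employees_joined_after_2000.csv")

# row.names = FALSE keeps write.csv() from adding a column of row names
write.csv(joined_after_2000, out_path, row.names = FALSE)

# Read the new file back to check its contents
new.data <- read.csv(out_path)
print(new.data)
```

write.csv() overwrites an existing file of the same name, so deleting the old file first, as in the lecture, is not strictly necessary.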
And we can perform all these analyses: finding the maximum salary, finding the employee who gets the highest salary, finding the employees who joined after a particular date, finding the average salary of the employees, and many other things, whatever your business requires. If you want to reach a particular conclusion by analyzing data, you can do it with R functions. So I hope you got to know the strength of R programming, how to analyze data in R, and how to read and write a CSV file. See you in the next lecture.
31. Creating Pie chart in R: Hello and welcome back. In this lecture, we are going to learn about pie charts. Let me first clear the console and the objects, and close this file. I have created a separate file for the pie chart, where I have written the code to create a pie chart from data in R. Suppose we have some data and we want to plot it as a pie chart to analyze it. We can do that in R very easily, and R is popular for this feature as well, because we can visualize data using various charts. One of the key charts is the pie chart, and that is what we are going to learn in this lecture. First things first, we need the data for the chart. So I'm creating a vector and giving it values like 30, 78, 23 and 39; you can give whatever you want. Then I'm creating the labels A, B, C and D for these values, so A is 30, B is 78, C is 23 and D is 39. Okay? Now I can give a file name for the chart I'm creating, like abc dot png or pie chart dot png; you can give anything. I use the png function and give it the file name, so whatever chart I create will be saved as a PNG image file. This line is optional: use it only if you want a PNG image file. Either way, you then use the pie function and pass the data, and then the labels. So a pie chart will be created from this data, and each slice will be labeled A, B, C or D. And then dev.off closes the PNG device, which finishes writing the file to disk. Okay, so let me run this code.
Okay, so we are not seeing the chart here. That's because png sends the plot to the file instead of the Plots pane. So let me run just the pie part again. See, now we have the chart with the labels A, B, C and D. B is the biggest slice, since its value is 78, then D, then A, and C is the smallest. So with this simple data, we have created a graphical representation. Now let me go to the R 2020 folder. See, a pie chart dot png file has been created with the same graphical representation and the same labels: B's value is 78, A's value is 30, C's is 23 and D's is 39. So we have created a graphical representation of our data, and we can present our data like this. Let me go back to the code. This is the png function, which creates a PNG file from our graph. So this is our graph, and this is the PNG file for it. If you want to send the chart to someone, you can create a PNG file and send it over email. So this way we can create a pie chart. Next, I want to create another pie chart, this time with employees and their salaries. I'm creating a salary vector with some salary values, and a vector with the names of the employees. So these are the two vectors, salary and names. I'm also creating a PNG file, salary dot png, for the graph, and calling pie with the salary as the data and the names as the labels. So the next pie chart will be created from the salaries, and the names will be the labels, just as A, B, C and D were in the first chart.
So the labels will be the employee names. Okay, let me run this. Okay, let me run it again. See, now we have a pie chart with the employee names. This is the use of plotting data graphically: we can immediately see which employees have a lower salary, which have a fairly good salary, and which are around the average. So with this graphic we can easily analyze the data without looking at it in depth; we can just look at it and come to a conclusion about who has the lowest salary, who comes next, and so on, up to Faruq, who is getting the highest salary. So we can do this analysis just by looking at the pie chart. And see, a salary file has been created here too. You can send it over email and so on. Okay? So this is the use of a pie chart. With the simple pie function we can create a pie chart: we have to provide the data and the labels, and here the names are the labels. We will see more, like how to analyze data sets, in the next lecture. So I'll see you in the next one.
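The first chart from this lecture can be sketched as below, using the lecture's values and labels. The variable names values, labels and png_path are my own, and the file goes to a temporary folder rather than the working directory.

```r
# Values and labels from the first chart in the lecture; variable names are my own.
values <- c(30, 78, 23, 39)
labels <- c("A", "B", "C", "D")

# The lecture saves into the working directory; a temporary path is used here instead.
png_path <- file.path(tempdir(), "pie_chart.png")

png(filename = png_path)   # open a PNG device: plots now go to this file, not the Plots pane
pie(values, labels)        # one slice per value, labelled A to D
invisible(dev.off())       # close the device, which finishes writing the file

# Calling pie(values, labels) again without png() draws the chart on screen.
```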
32. Analyzing Data sets using R functions: In this lecture, we are going to analyze a dataset. First things first: if you are planning to work as a data scientist, a machine learning engineer, or in data visualization or data analysis, you have to know what data analysis is and how you can analyze data. Data analysis is the most important part of any data science, machine learning or data analysis project. What I'm going to do is use an inbuilt dataset that comes with R. When you download R, the dataset comes with it, so you need not download it separately. We'll use that, and I'll show you how we can use inbuilt functions to analyze the data and get information about it. Okay? So what is a dataset? A dataset is basically a collection of data. The most common datasets we have seen are tables in databases, right? In databases like MySQL or MongoDB, the data is basically a collection. MongoDB keeps collections of key-value documents, while MySQL, an RDBMS (relational database management system), keeps the data in rows and columns. So the most common form of data collection is a table, okay? We also keep data in XML and JSON formats, but the most common form is the table, the kind of table you have seen everywhere. So we're going to use mtcars. mtcars is an inbuilt dataset in R, and we are going to analyze it.
mtcars is the Motor Trend Car Road Tests dataset that is built into R, and its data was extracted from the 1974 Motor Trend US magazine. First, we want to load this inbuilt dataset. We can simply write the dataset name, mtcars, and when we run it we get the dataset. So this is the dataset that we have. Okay? As you can see, there are many columns. These are the rows, and each row is a different car name. For each car there are several variables like mpg, cyl (cylinders), disp (displacement), hp, wt (weight), and so on. So the dataset has 11 columns and 32 rows, which means it contains details of 32 cars, with 11 columns for 11 different variables of each car. Okay? So this is how simple it is: type the dataset name, and you get all the rows and columns of the dataset. Next, suppose we want more information about the dataset: how it is arranged and where the data comes from. We can simply put a question mark in front of the dataset name. When we run ?mtcars, we get the information about the dataset, which comes from the R documentation. It says Motor Trend Car Road Tests, and it gives a full description of the data and how it was formulated: the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles, 1973 to 1974 models. Okay.
It then gives the format: mpg is miles per (US) gallon, cyl is the number of cylinders, disp is displacement, hp is gross horsepower, drat is the rear axle ratio, wt is weight, qsec is the quarter-mile time, vs is the engine shape (0 = V-shaped, 1 = straight), am is the transmission (0 = automatic, 1 = manual), gear is the number of forward gears, and carb is the number of carburetors. Okay, so this is the information we get about this data source. Coming back to the analysis part: you just put a question mark in front of the dataset name, and you get all the information about the inbuilt dataset. Now we want to get the dimensions and the variable names; we saw the rows and columns, and these are the variable names. So how can we get them? First, we assign the dataset to a variable. I'm creating a variable, dataset_cars, and assigning mtcars to it, so all the values from mtcars are in dataset_cars, and we can use it further in our program. If I use dim and pass the dataset variable, dataset_cars, I get the dimensions of the dataset. And if I use names and pass dataset_cars, I get the variable names in the dataset. So let me run this. See, dim gives 32 rows and 11 columns, which we can verify from here. And names gives the variable names: mpg, cyl, disp, hp, drat, wt, and so on.
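The first steps on mtcars can be sketched as follows. mtcars ships with base R in the datasets package, so nothing needs to be installed. head() is an addition that prints only the first six rows instead of the whole dataset.

```r
# mtcars is built into R (datasets package); ?mtcars opens its help page.
dataset_cars <- mtcars

print(head(dataset_cars))   # first 6 rows; typing mtcars alone prints all 32
print(dim(dataset_cars))    # rows and columns: 32 11
print(names(dataset_cars))  # the 11 variable names, starting with "mpg" "cyl" "disp"
```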
So 32 by 11 means 32 rows and 11 columns; those are the dimensions of the dataset, and names gives us its variable names. Next, I want to extract the row names, which here are the car names. I can use row.names and pass the dataset variable, so row.names(dataset_cars) fetches all the car names. These are the 32 cars used in the dataset. Then, if I want the values of one variable, like mpg, I can use the dollar sign: the dataset name, then a dollar sign, then mpg, and we get the values of the mpg variable. So this is how we get the values of a particular variable. If I use am here instead, I get 0s and 1s, because that variable records the transmission: 0 for automatic, 1 for manual. Next, the mpg values are coming out unordered. If I want to sort them, I can use the sort function: sort(dataset_cars$mpg) sorts the values of the mpg variable, and now they come out in increasing order. So this is how we sort the values of a variable. Next, I want to analyze the dataset, so I can use summary and pass the dataset variable, and I get a summary of the data. See, it's a nice summary: for each variable, like mpg, it shows the minimum, the first quartile, the median, the mean, the third quartile, and the maximum value.
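Here is a short sketch of the row-name, dollar-sign and sort steps, again on the built-in mtcars. The table() call and the decreasing = TRUE variant are small extras, not from the lecture, that show the 0/1 coding of am and the opposite sort order.

```r
dataset_cars <- mtcars

car_names <- row.names(dataset_cars)   # the car names that label each row
print(head(car_names))

print(dataset_cars$mpg)                # one variable, extracted with $

print(table(dataset_cars$am))          # counts of am: 0 = automatic, 1 = manual

mpg_sorted <- sort(dataset_cars$mpg)   # increasing order by default
print(mpg_sorted)
print(sort(dataset_cars$mpg, decreasing = TRUE))  # largest value first
```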
So we get these six values for each variable: the minimum, the first quartile, the median, the mean, the third quartile, and the maximum. This gives you a summary of the data, and we will learn these in the coming lectures: how to get the mean, what the median is, and what the first and third quartiles are. So this is how we can get information about a dataset and analyze it in R. We can type the dataset name to see the dataset. We can put a question mark in front of a built-in dataset's name to get information about it. We can use the dim function to get the dimensions of the dataset. We can use the names function to get the names of the variables. We can use row.names to get the row names, which here are the car names. We can use the dollar sign to get the values of one variable in the dataset. We can use the sort function to sort a variable's values. And we can use summary to get a summary of the data. So this is how we can analyze a dataset in R.
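To connect summary() with the six statistics described above, this sketch computes them one at a time for mpg. It relies on the fact that summary() of a numeric vector reports min, quartiles, mean and max; the individual calls are shown only for comparison.

```r
dataset_cars <- mtcars

print(summary(dataset_cars))          # six statistics for every column

mpg_summary <- summary(dataset_cars$mpg)
print(names(mpg_summary))             # the six statistic names

# The same numbers computed one at a time:
print(min(dataset_cars$mpg))
print(quantile(dataset_cars$mpg, c(0.25, 0.5, 0.75)))  # 1st quartile, median, 3rd quartile
print(mean(dataset_cars$mpg))
print(max(dataset_cars$mpg))
```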
33. Analyzing Employee Data: Hello and welcome back. In this lecture we are going to analyze the employee details CSV file that we have seen in the previous lectures. We'll read the data from the CSV file and create a pie chart, and we'll see what we can analyze graphically with it, that is, how we can analyze the employee details from pie charts. To read a CSV file, we use read.csv and pass it the CSV file name, and I'm going to store the result in emp.data. We have already seen this. So I run it, and we get the data into the emp.data object. This will be a data frame, as you'll remember, so if I print emp.data, it shows a table-like structure. So this is the data we're getting from the CSV file. Now I want to draw a pie chart for these employees, and to do that I can use the pie function. I want to create a pie chart based on salary, so I'll pass emp.data$salary, which gets the salary column of this data frame. For the labels, I'll use the employee names, emp.data$emp_name. So in this pie chart, the slices represent the salaries of the employees, and the labels are the employee names. Let me run this. See, we get a pie chart where each slice represents a salary and is labeled with the employee's name. With this pie chart, we can clearly see who is drawing the most salary and who is drawing the least. Sudeep's salary is the maximum, 95,200, which we can see here, and the smallest slice belongs to the employee with the lowest salary.
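The lecture's CSV file is not reproduced here, so this minimal sketch first writes a small hypothetical stand-in to a temporary folder (the names, the salaries, and the emp_id/emp_name/salary column names are made up for illustration), then reads it with read.csv and draws the basic pie chart with pie.

```r
# Hypothetical stand-in for the lecture's employee details CSV:
# the column names and values are made up for illustration.
csv_path <- file.path(tempdir(), "employee_details.csv")
write.csv(
  data.frame(
    emp_id   = 1:5,
    emp_name = c("Asha", "Ben", "Chen", "Dana", "Eli"),
    salary   = c(60000, 75000, 90000, 50000, 70000)
  ),
  csv_path,
  row.names = FALSE
)

# read.csv returns a data frame, one row per employee.
emp.data <- read.csv(csv_path)
print(emp.data)

# Slice sizes come from the salary column, slice labels from emp_name.
pie(emp.data$salary, labels = emp.data$emp_name)
```

With your own file, only the read.csv path changes; the pie call stays the same.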
So this way we get a pie chart based on salary, with the employee names as labels. Next, I want to plot the chart with a title and a rainbow color palette, to make it more colorful. Here I'll use pie again, passing the salary first and then the employee names: the first is the data and the second is the labels. Then main = "Employee Salary", which will be the heading of this chart. For the colors I'll use col = rainbow(...), and to rainbow I'll give the length of the salary column, length(emp.data$salary), so we get one color per employee. Let me run this. See, now we get a much more colorful pie chart, where the heading is Employee Salary, the labels are the names, and the slices represent the employee salaries. So this way we can make a colorful pie chart, which looks much better than the earlier one. Next, I want to change the labels. Until now I've been using the employee names, which are already in the data frame, as labels. Now I want to create my own labels: the percentage of salary. So I'm creating an object, piepercent, using the round function: 100 times each employee's salary, divided by the sum of the salaries of all employees. So basically I'm finding each employee's salary as a percentage of the total salary, where the total is the sum of all the salaries. For example, we can find what Susanna's salary is as a percentage of the total. So we get these piepercent values. Then I'm going to plot the percentage salary pie chart. For that I'll use the pie function again, and the data will be the same salary column.
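Putting together the colored chart, the piepercent calculation, and the percentage-labelled chart with a legend, a sketch might look like this. It reuses hypothetical employee data (made-up names and salaries). Rounding to one decimal, the "%" suffix added with paste0, and the title strings are assumptions for illustration.

```r
# Hypothetical employee data (made-up names and salaries).
emp.data <- data.frame(
  emp_name = c("Asha", "Ben", "Chen", "Dana", "Eli"),
  salary   = c(60000, 75000, 90000, 50000, 70000)
)
slice_colors <- rainbow(length(emp.data$salary))   # one color per slice

# 1) Names as labels, a title (main) and the rainbow palette.
pie(emp.data$salary, emp.data$emp_name,
    main = "Employee Salary", col = slice_colors)

# 2) Each employee's share of the total salary, in percent, rounded to 1 decimal.
piepercent <- round(100 * emp.data$salary / sum(emp.data$salary), 1)

# 3) Percentages as labels, plus a legend mapping colors back to names.
pie(emp.data$salary, labels = paste0(piepercent, "%"),
    main = "Percentage Salary Pie Chart", col = slice_colors)
legend("topright", legend = emp.data$emp_name,
       cex = 0.8,            # shrink the legend text to 80% of normal size
       fill = slice_colors)  # colored boxes matching the slices
```

Computing the colors once in slice_colors keeps the pie slices and the legend boxes guaranteed to match.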
For the labels, instead of the names, I'm providing piepercent, so the slices will show percentages. For the main heading I'm giving "Percentage Salary Pie Chart", and for the colors I'm keeping the same rainbow scheme, again passing length(emp.data$salary). Another thing I'm adding is a legend, which is printed on the chart to show which color belongs to which employee. It will be placed in the top right corner, and for its entries I'm giving emp.data$emp_name, the employee names, with fill set to the same rainbow colors so each name gets a matching colored box. Then I'm giving cex, which scales the size of the legend text; you'll understand it better once the graph comes up. So let me run this as well. See, now instead of names we get each employee's percentage of the total salary of all employees. The green one is Sudeep, who gets 22.1% of the total salary, and the next one gets 19.8%. This box is called the legend, and from its colors we can tell that green belongs to Sudeep and red belongs to Priyanka. So from this graph you can see what percentage of the total salary Priyanka gets, and in the same way, this one belongs to Michael, whose share is 20.6%. The smallest slice belongs to the employee with the lowest salary. So this way we can create our own labels and add a legend for them, and this is more specific: it shows how much each person gets as a percentage of the total salary. Next, we can draw a 3D pie chart, and for that we need the plotrix library. If it is not already installed in your RStudio, you can install it from the menus.
Go to the Packages tab and click Install, then type the name of the library you want to download and click Install, and the library will be installed on your machine, inside RStudio, so you can use it. So we need this plotrix library to create a 3D pie chart, and inside it we have the pie3D function. We pass it the data, which is the employee salary; the labels, which are the employee names; explode, set to 0.2, which pulls the slices apart; and main, the title of the chart. Let me run this. See, now we're getting a 3D pie chart. This is Susanna's salary, this slice is Michael's, and we can see which employees have a very small share. So this is a 3D representation of the employee salaries, and it looks much better than the 2D pie chart. So this way we can create a pie chart and analyze the data with it. You can also try this with your own data: create your own vectors and draw a pie chart, or create your own CSV file, read it, and do all of these things. I want you to create your own project like this and post your graphs and pie charts in the project section of this class, and we will provide our feedback, so it will be shared between all the students and me. So try to create your own data like this and visualize it with a pie chart or a 3D pie chart, along with things like finding the mean or average salary, and post it in the project section.
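A minimal sketch of the 3D version, assuming plotrix has already been installed as described (install.packages("plotrix")). The employee data is hypothetical, and the title string is an illustrative choice. labels, explode and main are arguments of plotrix's pie3D.

```r
# Assumes the plotrix package is installed: install.packages("plotrix")
library(plotrix)

# Hypothetical employee data (made-up names and salaries).
emp.data <- data.frame(
  emp_name = c("Asha", "Ben", "Chen", "Dana", "Eli"),
  salary   = c(60000, 75000, 90000, 50000, 70000)
)

pie3D(emp.data$salary,
      labels  = emp.data$emp_name,   # text placed next to each slice
      explode = 0.2,                 # gap that pulls the slices apart
      main    = "Employee Salary in 3D")
```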
34. Reading excel file in R: Hello and welcome back. In this lecture we are going to learn how to read an Excel file in R. It is pretty simple: we need to install one package, load that package, read the Excel file with a simple one-line command, and then print the data. Let's see how to do that. I have created the same employee details as an Excel file. It has the same data as our CSV file; I simply created the Excel file from that CSV file. Since I don't have Microsoft Excel here, I opened the employee details CSV file in Google Sheets, where it opens as a spreadsheet. So now we have the column names, employee ID, employee name, salary, date of joining and department, and all the details in spreadsheet format. Now I'll download this file in Microsoft Excel (.xlsx) format. It has now been downloaded, and I've put the file here in our working directory. So now we have the employee details .xlsx file, a Microsoft Excel file, in our working directory. The first thing we need to do is install a package, and the package name is xlsx. To install a package in R, we use install.packages and provide the name of the package we want to install, so when you run install.packages("xlsx"), it installs the xlsx package. Alternatively, we can go to the Packages tab, click Install, type xlsx, and click Install. Either way, the xlsx package will be installed.
This package is required because we need to read an Excel file through R, and the functions for that are built into the xlsx package. So just click Run, and it will be installed. I'll cancel it here because I have already installed this package, but you just run this and it will be installed; it may take a minute or so depending on your network. Once the package is installed, remove this line from the R script, because the package doesn't need to be reinstalled every time the script runs. Next, we want to read our .xlsx file, and there is a function in this library for that. First we need to load the package, so we use library and give the package name, library(xlsx). Then I use the function read.xlsx, which reads a Microsoft Excel file. As the first argument we give the file name, the employee details .xlsx file, and then we give the sheet index, sheetIndex = 1. Whatever we read through read.xlsx, we store in an object, emp_data, so it holds all the data we read from this Excel file. Then we simply print it. So let me run this. See, we get the result: employee ID, employee name, salary, date of joining and department. So this is pretty simple: we can read the whole Excel sheet. First install the package, then load the library, then read the Excel file by simply providing the Excel file name.
We store the data in an object, and then we simply print that object to see what it contains: all the details from the Excel sheet. I hope you now know how to read an Excel file. We read it with the read.xlsx function, providing the Excel file name, and we get all the data inside the Excel file. The package we need to install is xlsx, and we need to load it with library(xlsx) before we use read.xlsx. So this is how we can read a Microsoft Excel file through R programming. See you in the next lecture.
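A minimal sketch of the lecture's read.xlsx step. It assumes the xlsx package is installed; xlsx relies on Java through rJava, so a Java runtime is needed as well. So that the example runs without the lecture's file, it first writes a small hypothetical workbook with the same package's write.xlsx. The names and values in that workbook are made up.

```r
# Assumes the xlsx package (and a Java runtime for rJava) is installed.
library(xlsx)

# Hypothetical stand-in for employee_details.xlsx, written to a temporary folder
# so the example runs anywhere; names and values are made up.
xlsx_path <- file.path(tempdir(), "employee_details.xlsx")
write.xlsx(
  data.frame(emp_id = 1:3,
             emp_name = c("Asha", "Ben", "Chen"),
             salary = c(60000, 75000, 90000)),
  xlsx_path,
  sheetName = "Sheet1",
  row.names = FALSE
)

# Read the first worksheet into a data frame, as in the lecture.
emp_data <- read.xlsx(xlsx_path, sheetIndex = 1)
print(emp_data)
```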
35. Reading xml file in R: Hello and welcome back. In this lecture we are going to learn how to read an XML file in R: we will write code that reads an XML file. First things first, let me tell you what an XML file is. XML stands for eXtensible Markup Language, and it is similar to HTML, the HyperText Markup Language used for writing web pages. An XML file holds data: just as we store data in an Excel sheet, in a text file, or in a database, we can store data in XML files. An XML file is formatted like an HTML document. An HTML document uses markup, and XML also uses markup to hold the data inside the file. But XML uses custom tags. In HTML everything is predefined, and you have to use the tags specific to HTML, but in XML we can write our own tags, whatever we want. That's why it's called extensible: it's a markup language where we use custom tags to define objects and the data within each object. So we can define a custom object and put whatever data we want under that tag. XML files can be thought of as a text-based database: just as in MySQL we keep data in tables of rows and columns and decide which columns we want, an XML file stores structured data, but as text. Now, I don't want to create an XML file from scratch.
I want to use the same details we had in our CSV file, employee details .csv, from which we also created the Excel file. I want the same data in XML format. For that, we could create an XML file manually, or we can use the conversiontools.io website, where we can browse to an Excel file and convert it to an XML file. Let me show you. When you come to conversiontools.io and choose the Excel to XML converter, you browse and select your employee details Excel file. So I'm using the employee details .xlsx file to create an XML file with the same data. You just select the file and run the conversion, and after a few seconds your XML file will be ready. I have already downloaded this XML file here, employee details .xml, and we will read this file. It had the same details as the Excel file, with employee IDs up to eight, but I've modified it and kept only four records. Just as we installed the xlsx package for reading Excel files, to read an XML file we need to install the XML package. For that we use install.packages and provide the package name, XML. Alternatively, we can go to Install Packages, type XML as the package name, and click Install. I'm not going to reinstall it, because I have already installed it, but if you haven't, just run this line.
Or you can go to the Packages tab, type XML, and click Install; it will take a few seconds to install the package. Once the package is installed, we remove this line from the script, since the package is already installed. Next, we need to load the XML package, so we use library and provide the package name, library(XML). All the functions required to read an XML file are in this library, so they become available to us. We also load the methods library, library(methods), which we use along with the XML package. Next, we use the function xmlParse, which reads and parses an XML file. We give xmlParse the file name, the employee details .xml file, and whatever it reads from the file we store in the emp_details object, because in R we store everything in objects, or variables, you can say. So the emp_details object holds whatever xmlParse reads from the XML file. Now we print that object. Let me run up to here. See, it prints the XML file: each record with employee ID 1, Priyanka, the salary, date of joining and department. These are the user-defined tags in our XML file, and we get all the employee details, so this is how we can read an XML file. Besides parsing the file with xmlParse, we can get the root node of the XML document with the xmlRoot function.
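A minimal sketch of the parse-and-root steps, assuming the XML package is installed. Because the lecture's file is not reproduced here, it writes a hypothetical four-record stand-in first. The tag names (RECORDS, RECORD, emp_id, emp_name, salary) and values are made up.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)
library(methods)  # loaded in the lecture alongside XML

# Hypothetical four-record stand-in for the lecture's employee details XML file;
# the tag names and values are made up.
xml_path <- file.path(tempdir(), "employee_details.xml")
writeLines(c(
  "<RECORDS>",
  "  <RECORD><emp_id>1</emp_id><emp_name>Asha</emp_name><salary>60000</salary></RECORD>",
  "  <RECORD><emp_id>2</emp_id><emp_name>Ben</emp_name><salary>75000</salary></RECORD>",
  "  <RECORD><emp_id>3</emp_id><emp_name>Chen</emp_name><salary>90000</salary></RECORD>",
  "  <RECORD><emp_id>4</emp_id><emp_name>Dana</emp_name><salary>50000</salary></RECORD>",
  "</RECORDS>"
), xml_path)

# Parse the whole file into an XML document object and print it.
emp_details <- xmlParse(file = xml_path)
print(emp_details)

# The root node is the top-level <RECORDS> element that wraps every record.
rootnode <- xmlRoot(emp_details)
print(xmlName(rootnode))
```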
We pass the emp_details object to xmlRoot, store the root node, and then print it. Let me run these two lines. If I index the root node with 1, it gives me the first node, the record with employee ID 1. If I index it with 1 and then 2, it gives me the second element of the first record, the employee name, Priyanka. If I use 2 and then 2, I get the name from the second record, the second employee's name. So we can access each node like that; for example, 3 and then 3 gives the salary of the third employee. In the same way, we can find how many nodes there are in our XML file by using xmlSize and passing the root node object we got from xmlRoot. When we run this, we get the number of nodes. Let me print it. See, it's 4, because I kept only four employee records in this XML file, so there are four nodes: employees 1, 2, 3 and 4. So xmlSize gives us the number of nodes in the XML file. Now the main thing: once we have read this XML data into R, we need to convert it to a data frame, because in R it is much easier to read and manipulate data when it is in data frame format. The XML package has a function called xmlToDataFrame. Whatever we have in this employee details .xml file, we can convert directly to a data frame with xmlToDataFrame, store it in an object, and print it, and it comes out like a data frame. So let me run this. See, now all the details are in tabular form, in a data frame in R: employee ID, employee name, salary, date of joining and department.
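This sketch shows node indexing, xmlSize and xmlToDataFrame on the same kind of hypothetical four-record file (tag names and values made up). It uses [[ ]] indexing with xmlValue to pull out a node's text; the element positions assume each record lists emp_id, emp_name, salary in that order.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# Hypothetical four-record file (tag names and values are made up).
xml_path <- file.path(tempdir(), "employee_details.xml")
writeLines(c(
  "<RECORDS>",
  "  <RECORD><emp_id>1</emp_id><emp_name>Asha</emp_name><salary>60000</salary></RECORD>",
  "  <RECORD><emp_id>2</emp_id><emp_name>Ben</emp_name><salary>75000</salary></RECORD>",
  "  <RECORD><emp_id>3</emp_id><emp_name>Chen</emp_name><salary>90000</salary></RECORD>",
  "  <RECORD><emp_id>4</emp_id><emp_name>Dana</emp_name><salary>50000</salary></RECORD>",
  "</RECORDS>"
), xml_path)

rootnode <- xmlRoot(xmlParse(file = xml_path))

print(rootnode[[1]])                 # the first <RECORD> node
print(rootnode[[1]][[2]])            # 2nd child of record 1: its <emp_name>
print(xmlValue(rootnode[[2]][[2]]))  # text of record 2's name
print(xmlValue(rootnode[[3]][[3]]))  # text of record 3's salary
print(xmlSize(rootnode))             # number of records under the root

# Each <RECORD> becomes a row; each child tag becomes a column.
emp_data_frame <- xmlToDataFrame(xml_path)
print(emp_data_frame)
```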
So with this single line of code, we can convert an XML file to a data frame in R. This is pretty handy when we do exploratory data analysis in machine learning and data science: we can easily convert an XML file to a data frame and then go on to analyze the data, create graphs and plots, and run all kinds of analytics on that data frame. It's pretty easy to do all of this in R. I hope you now understand how to read an XML file, how to find out how many nodes it has with xmlSize, and how to convert an XML file to a data frame using xmlToDataFrame. So that's it for this lecture. See you in the next one.
36. Reading JSON file in R: Hello and welcome back. In this lecture we are going to learn how to read a JSON file in R. We will learn what a JSON file is, then create a JSON file, and finally read that JSON file in R. So let's get started. First, we should know what JSON is. JSON stands for JavaScript Object Notation, and it is an open standard file format and data interchange format. That means we can use it both to store data and to exchange data. It uses human-readable text to store and transmit data objects consisting of attribute-value pairs, that is, key-value pairs. If you know MongoDB or any NoSQL database, you must know about document databases, where we store data in the form of key-value pairs: there is a key, and for that key there is a corresponding value. In the same way, JSON uses attribute-value pairs: there is an attribute, and for that attribute there is a value. It also supports arrays, so a value can hold several values. I hope you now know what JSON is in theory. Now let me show you how to create a JSON file. A JSON file is very simple: we put the data inside curly braces, and inside the curly braces we use key-value pairs to store the data. I'll store the same employee data. We have seen how we stored it in a CSV file, then in an Excel file, and then in an XML file.
Now we'll see how to store the same employee details in a JSON file. As I said, it uses key-value pairs. Here, for the employee ID, I store all the employee IDs in an array, 1 to 8, since we have eight employees. So the attribute is the employee ID, and its value is the array of IDs from 1 to 8. The colon separates the key and the value: the left side of the colon is the attribute, the employee ID, and the right side is its corresponding value. In the same way, the employee name is an attribute holding all the employee names, where Priyanka corresponds to employee ID 1, and so on. Then we store the salary array, then the date of joining array, and then the department array. So in this format we can put data in a JSON file, and I have saved it as employee details .json. Now that we know how to store data in a JSON file, next we will read this JSON file's data in R. I have already written the code, so let me open it. For this we need to install the rjson package. You can install it by running this line of code, or you can go to Install Packages, type rjson, and click Install, and it will be installed. Then we load it with library(rjson), and we provide the JSON file name, the employee details .json file. Here we are going to use a function from the rjson library called fromJSON.
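A minimal sketch of the JSON layout described above (one array per column) and of reading it with fromJSON, assuming the rjson package is installed. The key names and values in the stand-in file are made up. fromJSON's file argument reads from a file instead of a JSON string.

```r
# Assumes the rjson package is installed: install.packages("rjson")
library(rjson)

# Hypothetical stand-in for the lecture's employee details JSON file:
# one array per column (key names and values are made up).
json_path <- file.path(tempdir(), "employee_details.json")
writeLines(c(
  "{",
  '  "emp_id":   [1, 2, 3],',
  '  "emp_name": ["Asha", "Ben", "Chen"],',
  '  "salary":   [60000, 75000, 90000]',
  "}"
), json_path)

# fromJSON(file = ...) reads the file and returns an R list, one element per key.
emp_details <- fromJSON(file = json_path)
print(emp_details)
str(emp_details)
```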
It reads the objects from the employee details .json file; we pass it file = and the file name. This fromJSON function reads the data from the JSON file, whatever it creates we store in the object emp_details, and then we simply print it. Let me show you. Let me run this, and see: it reads the data, the employee IDs, the employee names, then the salaries, the dates of joining, and the departments. So with a one-liner, a single function, fromJSON, just by providing the file name, we read all the JSON file's data. Now we can see the data from the JSON file printed. As we know, in R it is easy, and recommended, to have the data in data frame format. Since the JSON data is loaded in the emp_details object, we can convert it into a data frame with as.data.frame. When we pass this object, which holds all the data from the JSON file, to as.data.frame, it converts the data from its current format into a data frame in R. We store the data frame in emp_data_frame, and when we print it, we get the data in data frame format. Let me run this. See, now we're getting the data as a data frame in R. So it is pretty simple: we can read the JSON file with the fromJSON function.
Then we simply pass this object, which holds all the details from the employee details .json file, to as.data.frame to convert it into a data frame. fromJSON returns an R list holding the data from the JSON file, and as.data.frame turns that list into a data frame in R. We can print it and see how neatly it is transformed into a data frame. So this way we can read a JSON file and convert its data into a data frame in R programming, and then use this data frame to analyze the data, do further processing, create plots, or whatever else you want to do. So this is how we can read a JSON file in R. See you in the next lecture.
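The conversion step can be sketched as follows, assuming rjson is installed and using the same kind of hypothetical column-array JSON file (key names and values made up). as.data.frame works here because every array has the same length, so each list element becomes one column.

```r
# Assumes the rjson package is installed: install.packages("rjson")
library(rjson)

# Hypothetical column-array JSON file (key names and values are made up).
json_path <- file.path(tempdir(), "employee_details.json")
writeLines(c(
  "{",
  '  "emp_id":   [1, 2, 3],',
  '  "emp_name": ["Asha", "Ben", "Chen"],',
  '  "salary":   [60000, 75000, 90000]',
  "}"
), json_path)

emp_details <- fromJSON(file = json_path)   # an R list of equal-length vectors

# Each list element becomes a column; all arrays must have the same length.
emp_data_frame <- as.data.frame(emp_details)
print(emp_data_frame)
```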
37. Creating Bar plot: Hello and welcome back. In this lecture we are going to learn about another chart we can create in R: the bar plot, also called a bar chart or bar graph. A bar plot is one of the most common types of graphical visualization, and you must have seen it in your office or in your projects. Whenever we try to visualize something, we often use a bar chart. When you look at this chart, you'll realize you have used this kind of chart many times. Microsoft Excel offers the same kind of bar chart, and even when we draw something with pen and paper to visualize it, we most probably draw a bar plot. A bar plot shows the relationship between a numeric variable and a categorical variable. What does that mean? It means that A, B, C, D on this x-axis will be something categorical, like the days of the week (Sunday, Monday, Tuesday) or the months (January, February, March, April). The y-axis holds the numeric values for these categories, like the revenue of an organization or the salaries of employees. So if A, B, C are employees, the bars show their respective salaries. The y-axis is the numerical part, and the x-axis is the categorical part. Each entity of the categorical variable is represented as a bar, which is why it's called a bar chart: the numeric value is shown as a bar, and the size of the bar represents the numeric value.
So the size of each bar represents a numeric value; if A, B, C are employees, the bars show their respective salaries, right? In R we can draw a bar plot simply by calling the barplot function. Inside the function we provide some parameters along with our data, and it gives us the bar plot. So R uses the barplot function to create bar charts, and the bars can be either vertical or horizontal. The basic syntax is barplot(H, xlab, ylab, main, names.arg, col). H is a vector or matrix containing the numeric values used in the bar chart; for example, the employees' salaries will be the data. xlab is the label of the x-axis and ylab is the label of the y-axis, so A, B, C, D will be along the x-axis and the values along the y-axis. main is the title of the bar chart, so here we could give a title like "Employee Salary". names.arg is a vector of names appearing under each bar, like the names of the employees: this bar is A, this bar is B, and so on. So H and names.arg should have the same number of elements, right? And col is used to give colors to the bars; if you want to make your chart colorful, you use the col argument, as we have done here with different colors. So here is a simple example. I'm creating a vector which contains these numbers.
So this vector contains these numbers, and I want to create a chart for them. So I simply pass this vector to the barplot function, and it will create the bar chart for us. Next, png(file = ...) lets you create an image file of the chart and save it on your system. Inside the png function we give file equal to whatever file name you want, and the chart drawn after that is saved in this image file. Then we use the barplot function to draw the bar chart, and finally dev.off() closes the file so that it is written to our local machine. Okay, so let's go to RStudio and run this code. Here I'm creating the vector for the data input, and based on this data the bar chart will be created. Now I'm creating a file to write the bar chart to, so I'm giving png(file = ...). You can give any name you want, abc.png, abcd.png, whatever. I'll call it the abcd bar chart. Then we use barplot to create the bar chart, and then we save it with dev.off(). Now I'll run the whole thing. The plot does not show up in the Plots pane here, because png() sends it to the file instead, but you can see that our bar chart file has been created here. The vector has eight entries, and here we have eight bars, with the tallest bar for the largest value. So this is a simple bar chart computed from this data, right? Next, we can go a little further and create slightly more interesting graphs.
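Here is a minimal sketch of the png(), barplot(), dev.off() sequence described above. The values in H are hypothetical, and tempfile() stands in for a fixed name like the one used in the lecture, so the example does not write into your working directory.

```r
# Hypothetical values; any numeric vector works
H <- c(7, 12, 28, 3, 41, 15, 9, 22)

# Open a PNG device; everything drawn next goes into this file, not the Plots pane
out_file <- tempfile(fileext = ".png")
png(filename = out_file)

# Draw one bar per value; barplot() invisibly returns the bar midpoints
mids <- barplot(H)

# Close the device so the PNG is actually written to disk
dev.off()

file.exists(out_file)
```

To keep the image, replace out_file with a file name of your own ending in .png.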
So what I have done here is create a data vector B, which contains the number of babies born in each month: the first value is for January, the next for February, then March, and so on up to December. So this is twelve months of data for the babies born in each month, and these are the numeric values. Then I'm creating another vector M, which contains the month names: for each value, the corresponding month name, from January up to December. So that part is clear: B holds the numeric values and M holds the name for each bar. Then I'm creating a babies-born PNG file by passing that file name to the png function. Then I simply plot the bar chart based on this data. First I pass the values, B, which is the data I want to chart. Then for names.arg, which names each bar, I pass the M vector, which contains January, February, and so on. So the x-axis will show January, February, and so on, and the y-axis will be the number of babies born in each month. Then for xlab I'm giving "Month", and for ylab I'm giving "Babies born". For col I'm giving green, and main, the heading of the chart, is "Babies born chart". And for the border of each bar I'm giving yellow. Then I'm saving the file. So let me run this. It ran correctly, so let us see: now we have another file, the babies-born PNG. Here we have January, February, March, April, May, June, July, August and so on, and we can read off roughly how many babies were born in each month, starting with January.
For February it is somewhere around 4,000, and so on. By looking at this bar chart we can tell that November is the month with the most births, and July comes second. Now suppose I want to change the bar color to red. If I run this again, you can see the bars are now red, right? You can also see a thin yellow border around the bars. Let me change the border to green so that we can see it clearly, and run it again. Open the file, and now the border is green. So this way we can create bar plots or bar charts using R. To recap: for the names we pass names.arg = M, and B is the main input vector. Then xlab is "Month", ylab is "Babies born", col is red, main, the chart heading, is "Babies born chart", and border is green. Then dev.off() saves the file to our local machine. So I hope it is clear how we can create a simple bar chart and how we can create this kind of bar chart. You have seen how we created this one and then changed its color to red, and we created the simple bar chart as well. See you in the next lecture.
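A sketch of the babies-born chart with the final settings from the lecture (red bars, green borders). The twelve counts are placeholders, not the lecture's numbers, and month.abb (R's built-in vector of abbreviated month names) is used instead of typing the names by hand; both are assumptions of this sketch.

```r
# Placeholder monthly birth counts (hypothetical, not the lecture's data)
B <- c(560, 4100, 7800, 6200, 5900, 6800, 9000, 7200, 6400, 7100, 9800, 6000)
M <- month.abb  # built-in: "Jan", "Feb", ..., "Dec"

out_file <- tempfile(fileext = ".png")
png(filename = out_file, width = 900, height = 500)

barplot(B,
        names.arg = M,              # one label under each bar
        xlab = "Month",
        ylab = "Babies born",
        col = "red",                # fill colour of the bars
        main = "Babies born chart",
        border = "green")           # outline colour of each bar

dev.off()

# Month with the most births in this placeholder data
M[which.max(B)]
```

The wider image (width = 900) is only there so that all twelve month labels fit under the bars.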
38. Stacked Bar Chart in R: Hello and welcome back. So in this lecture we are going to visualize a very interesting bar plot, one that gives us a good overview of what is going on in an organization, such as quarterly revenue region by region, and see how we can plot it using barplot. To do that, I have created a matrix and kept all the revenue values inside it. Each column is a quarter (quarter one, quarter two, quarter three and quarter four) and each row is a region, so for each of the four quarters these are the revenues. I will pass this matrix as the input data to barplot. So I'm passing the matrix here, with the title "Total revenue". For names.arg I'm passing the quarter vector, which contains the values Q1, Q2, Q3 and Q4. Then xlab will be "Quarter" and ylab will be "Revenue". For col I'll pass another vector, colors, so that the bars will be colorful; once it is drawn I'll explain. So I've created a vector of four colors: blue, pink, yellow and green. Then I have created a vector of regions: East, West, South and North. The matrix has four rows and four columns, and I'm filling the values by row. And see here, I'm also adding a legend in the top-left corner. For the legend I'm giving the regions, and filling it with the same colors vector.
So let me run this first. This is the matrix, and this is the quarterly revenue chart we get: this bar is quarter one, this is quarter two, then quarter three and quarter four. Each bar is split into blue, pink, yellow and green segments, one per region, and the legend shows which color stands for which region. So by looking at this bar plot we can easily read the quarterly results for each region: for the South, this is the revenue in quarter one; in the same way, for the North this is the revenue, for the East this is the revenue, and likewise for quarters two, three and four. So it is pretty easy to visualize things using barplot, right? How did we do that? First we created a colors vector, then a quarter vector, then the regions vector, which we used in the legend, and then the four-by-four matrix. Then I'm creating a quarterly revenue PNG file, passing the quarter vector as names.arg and filling the colors by passing the colors vector. So this way we can create a great visualization using R's barplot. You can also play around with your own data and try to create some beautiful, colorful bar plots. See you in the next lecture.
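A sketch of the quarterly stacked chart, assuming each row of the matrix is a region and each column a quarter, which is the layout barplot() stacks by default. The revenue numbers are hypothetical, and ylim is an addition here to leave room for the top-left legend.

```r
colors  <- c("blue", "pink", "yellow", "green")
quarter <- c("Q1", "Q2", "Q3", "Q4")
regions <- c("East", "West", "South", "North")

# Hypothetical revenue: rows are regions, columns are quarters, filled by row
values <- matrix(c(2,  9, 3, 11,
                   9,  4, 8,  7,
                   3, 12, 5,  9,
                   6,  8, 7, 10),
                 nrow = 4, ncol = 4, byrow = TRUE,
                 dimnames = list(regions, quarter))

out_file <- tempfile(fileext = ".png")
png(filename = out_file)

# With a matrix, barplot() stacks the rows inside each column's bar;
# col supplies one colour per row, i.e. per region
mids <- barplot(values,
                main = "Total revenue",
                names.arg = quarter,
                xlab = "Quarter",
                ylab = "Revenue",
                col = colors,
                ylim = c(0, 50))

# The legend maps each region to the colour used for its row
legend("topleft", legend = regions, fill = colors, cex = 0.8)

dev.off()

colSums(values)  # total height of each quarter's bar
```

Passing beside = TRUE to barplot() would draw the regions side by side within each quarter instead of stacking them.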
39. Boxplot in R: Hello and welcome back. So in this lecture we are going to learn about the box plot. A box plot is a method for graphically depicting groups of numerical data through their quartiles; I'll tell you what these quartiles are. So it is basically a graphical picture of groups of numerical data, like we do with a bar plot. But here we represent a group of data as a box: in a bar chart we have a bar for each value, and here we have a box for each group of data. Box plots show how the data in a dataset is spread out, and we'll see that when we draw one. To do that, a box plot uses three quartiles to divide the data into four parts. So what are these quartiles? As you can see in this picture, there are three quartiles: quartile one, quartile two and quartile three. And this is the inter-quartile range. The plot represents the minimum, the maximum, the median, the first quartile and the third quartile. So this is the box of the data. The lines coming out of the box are called whiskers: the end of this one is the minimum and the end of this one is the maximum, and any points lying beyond the whiskers are outliers. The blue line inside the box is the median of the whole dataset. The box itself holds the middle half of the data, the values closest to the median, and this range from here to here is the inter-quartile range. This lower edge of the box is known as Q1.
Q1 is the first quartile and Q3 is the third quartile. The first quartile is the 25th percentile and the third quartile is the 75th percentile. In R, we use the boxplot function to draw box plots, and we provide arguments like x, data, notch, varwidth, names and main. Let me tell you what these are. x is a vector or a formula. With a formula we give the relationship on which the box plot is going to be drawn, and data is the data frame from which the variables in that formula are taken. So x is the formula or relationship we are going to plot, and data is the actual data it comes from. notch is a logical value; set it to TRUE to draw a notch. I'll explain what a notch is in a moment; first let me go through the other arguments. varwidth is a logical value; set it to TRUE to draw the width of each box in proportion to its sample size (strictly, R makes the widths proportional to the square roots of the number of observations in each group). If it is FALSE, all boxes have the same width. Next, names is a group of labels which will be printed under each box plot. This is one box, and if you want to give it a name you can do that through the names argument. And main, obviously, is the title of the graph. Now, what is a notch? The notches on the sides of the box plot can be interpreted as a confidence interval around the median value. The notch extends to the median plus or minus 1.58 times IQR divided by the square root of n, where IQR is the inter-quartile range. We have seen what the inter-quartile range is, right?
So the inter-quartile range is the range between the 25th and the 75th percentile, and n is the number of data points in the dataset. You can see here, in this box plot, the outliers beyond the whiskers, the median value, Q1, which is the 25th percentile, and Q3, which is the 75th percentile. And this part, from here to here, where the sides of the box are pinched in around the median, is known as the notch. The notch gives roughly a 95% confidence interval for the median. So when you put notch = TRUE, you can see this pinched shape. If notch is FALSE, the sides of the box are straight and the notch is not there. So here notch is FALSE, and you see a plain box plot; if you put notch = TRUE, you get this notched shape. The notch tells us how precisely the median is known: if the notches of two boxes do not overlap, that is strong evidence that their medians differ. So now we have a basic understanding of what a box plot is, what quartiles are, what outliers are, what the minimum and maximum are, what the inter-quartile range is, and what Q1 (the 25th percentile), Q3 (the 75th percentile) and the median are.
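The quantities above can be computed directly. This sketch uses the built-in mtcars$mpg column (the dataset used in the next lectures). boxplot.stats() is the base R helper that boxplot() relies on, and the notch line reproduces its documented median +/- 1.58 * IQR / sqrt(n) calculation. Note that boxplot.stats() uses Tukey's hinges, which can differ slightly from the Q1 and Q3 that quantile() reports.

```r
x <- mtcars$mpg  # built-in dataset

# Quartiles: Q1 (25th percentile), median, Q3 (75th percentile)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(x)  # Q3 - Q1 using quantile()'s default definition

# The numbers boxplot() actually draws:
# stats = lower whisker, lower hinge, median, upper hinge, upper whisker
s <- boxplot.stats(x)

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
hinge_spread <- s$stats[4] - s$stats[2]
notch <- s$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(s$n)

print(q)
print(s$stats)
print(s$out)   # points beyond the whiskers (outliers), possibly none
print(notch)   # matches s$conf
```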
So in the next lecture we will see how to draw a box plot from data that we already have in R itself. We are going to use mtcars, a real dataset that comes with the R distribution, and we'll use it to draw box plots of mpg grouped by the number of cylinders. So I'll see you in the next lecture.
40. Boxplot using mtcars dataset: Hello and welcome back. So in this lecture we are going to draw a box plot using the mtcars dataset, which is already available in our R distribution. We don't need to download it; it is already built into the environment, so we can use it directly and represent the mtcars data as a box plot. So let's see how we can do that. First, let me show you what is in mtcars. Let me print the mtcars data and see what gets printed. This is the dataset, and it contains cars like the Mazda RX4, Mazda RX4 Wag, Datsun 710, Hornet 4 Drive, Valiant, Duster 360, and so on. For each car there are details like mpg, the mileage in miles per gallon; cyl, the number of cylinders in the engine (4, 6 or 8); disp, the displacement; hp, the horsepower; drat, the rear axle ratio; wt, the weight of the car; and so on. All those parameters are given in the mtcars data. I'm not going to use the entire dataset; I'll take only miles per gallon and the number of cylinders. So I select mpg and cyl into a data input, print its head, and we can see what comes out of these two lines. Now we are getting the cars with their miles per gallon and number of cylinders, just these two columns. So now I'm going to use this data input, which contains mpg and the number of cylinders, and draw a box plot. The first thing I'll do is create a PNG file to store the graph, giving file equal to some name.
I'll call the file the mtcars boxplot PNG. Then I'll draw the box plot with boxplot, giving the formula mpg ~ cyl, that is, miles per gallon by number of cylinders. So I'm going to create a box plot of mpg against the number of cylinders. Then I give the data; the data input comes from mtcars, so I'll use data = mtcars, right? What do we need to give next? We give xlab, and we'll write "Number of Cylinders". And ylab will be "Miles Per Gallon". The next thing we give is main, the title, which will be the car mileage data. Then we save the file with dev.off(). So what we are doing here is creating a box plot of mpg by number of cylinders, using the mtcars dataset; the x-axis will be the number of cylinders, the y-axis will be mpg, and the title of the graph will be the car mileage data. Let me run this. This is the box plot that we have drawn. You can see the title of the chart, the car mileage data; the number of cylinders, 4, 6 and 8, along the x-axis; mpg along the y-axis; and these are the box plots. So this is how we can make box plots. If you want to go over it one more time, what we have done is simple. We are using the mtcars data, which is already available in our R distribution, so we don't need to create it or download it; it is already built into the environment. We take its two columns, mpg and cyl, from this mtcars dataset, which holds all this information about different cars.
Then I'm using boxplot to draw the box plot of mpg against the number of cylinders for each car, using data = mtcars. For the x-axis label I'm using the number of cylinders and for the y-axis label mpg, then I give the title of the graph and write the file out. The plot does not show up in RStudio's Plots pane here, since png() sends it to the file, so I have written it to the mtcars boxplot PNG file, and we can see the file here. This is the box plot, and this line in each box is the median, right? These are the minimum and the maximum, and any points beyond them are outliers. Now look at the median value for each number of cylinders. For four-cylinder engines, the median mileage is approximately 26 mpg. For six cylinders, it comes to around 20 mpg. And for eight cylinders, the mileage is around 15 miles per gallon, right? So this is how we can draw box plots from the mtcars data. See you in the next lecture.
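Putting the steps of this lecture together, a sketch might look like the following. The title string approximates the one used in the lecture, and tempfile() replaces the lecture's fixed file name. The last lines compute the group medians that the thick line in each box shows.

```r
# Keep only the two columns used in the lecture
input <- mtcars[, c("mpg", "cyl")]
print(head(input))

out_file <- tempfile(fileext = ".png")
png(filename = out_file)

# Formula interface: mpg on the y-axis, one box per value of cyl (4, 6, 8)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon",
        main = "Car Mileage Data")

dev.off()

# Median mpg per cylinder count: 26.0 for 4, 19.7 for 6, 15.2 for 8
med <- tapply(mtcars$mpg, mtcars$cyl, median)
print(med)
```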
41. Boxplot with notch: Hello and welcome back. So in the previous lecture we saw how to draw a box plot using the mtcars dataset, based on the number of cylinders and the miles per gallon, these two features of the dataset. We drew the box plot and saw how it looks: the car mileage data, with mpg on the y-axis running through 15, 20 and 25, and 4, 6 and 8 cylinders on the x-axis. This black line is the median of each group: four-cylinder, six-cylinder and eight-cylinder cars. Now we can draw the same box plot with a notch. We have already seen what a notch is; now we'll see how the box plot changes if we set notch = TRUE. The notch shows us how the medians of the different groups compare with each other, right? So let's do that. I will also add some colors to the box plots so that they look good, and we will also name the groups on the x-axis. First, I will change the title to say box plot with notch, and then I'll simply set notch = TRUE. Now let me run this. Our earlier graph looked like this, and now, with notch = TRUE, it has changed to this. This is the median of each group (four-cylinder, six-cylinder and eight-cylinder), and the notches around each median show how the medians differ from each other. Now let me add some colors: I'm giving three colors, one for each group. Then I'll add some names, so that we can tell the different boxes apart.
Now we have a different color for each group's box, whereas earlier it was like this, without colors and without a notch. Now let me give some names here. The mileage is high for four-cylinder cars, medium for six-cylinder cars and low for eight-cylinder cars, so we'll give the names High, Medium and Low. That way, when we look at the box plot, we will understand that this is the high-mileage group, this is medium and this is low. So I set names to High, Medium and Low, and now we have high, medium and low. And if you want to be more specific, you can put something like this. So this way you can customize the plot as well. I hope you understood how to plot box plots. Try it yourself, and see you in the next lecture.
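A sketch of the notched, coloured and labelled version. The three colours are example choices (the lecture's exact colours are not clear from the recording), and varwidth = TRUE is added to show the argument described earlier; drop it to keep equal box widths. With this data R may print a warning that some notches went outside the hinges; that is a warning, not an error, and the plot is still drawn.

```r
out_file <- tempfile(fileext = ".png")
png(filename = out_file)

# notch = TRUE draws the median notches; varwidth = TRUE scales box widths
# by the square root of each group's size; names relabels the three groups
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of Cylinders",
             ylab = "Miles Per Gallon",
             main = "Car Mileage Data (with notch)",
             notch = TRUE,
             varwidth = TRUE,
             col = c("red", "green", "blue"),    # example colours, one per box
             names = c("High", "Medium", "Low"))

dev.off()

# boxplot() also returns its statistics; conf holds the notch limits per group
print(b$n)      # number of cars in each group
print(b$conf)
```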
42. Histogram and distribution of Histogram: Hello and welcome back. So in this lecture we are going to learn about histograms. We will see what a histogram is, the types of histogram distributions, and how we can plot a histogram from our data in practice. Here we'll go through the theory. A histogram is an approximate representation of the distribution of numerical data. It is a graph that looks like the bar chart we have seen, but it represents the distribution of numerical data. So if you have numerical data, you can draw a histogram of it, and that numerical data is, most of the time, continuous in nature. The histogram was first introduced by Karl Pearson. Another definition: a histogram is a graphical display of data using bars of different heights. So, like the bars in a bar chart, a histogram displays data using bars of different heights. It is similar to a bar chart, but a histogram groups numbers into ranges. A bar chart does not group the data into ranges like 10 to 20 or 20 to 30, but a histogram groups the data into ranges and then draws a bar for each range. Grouping numbers into ranges is what gives you a histogram. I hope the picture is getting clearer, and we'll also look at images of how a bar chart differs from a histogram. So it is good to remember: histogram = bar chart + grouping of numerical data into ranges.
For example, if we have data from 10 to 100, a histogram will group it into ranges, 10 to 20, 20 to 30 and so on, and create a bar for each range. The height of each bar shows how many values fall into that range. So it basically tells you how many numbers lie between 10 and 20, and so on. Suppose you are looking at the salaries of people: it will tell you how many people earn between 10 and 20, how many between 20 and 30, and so on. Creating a histogram provides a visual representation of the data distribution. Histograms can display a large amount of data and the frequency of the data values, because they group the data into ranges and tell you how many values fall into each. The median and the distribution of the data can also be estimated from a histogram. In addition, it can show any outliers or gaps in the data. Suppose we have data from 10 to 100, but between 40 and 50 we don't have any values. Then the histogram shows no bar for 40 to 50, and from the graphical representation you can see that there are no employees in the 40 to 50 range. So it shows you a gap in the data. It also shows outliers. Suppose all your data lies between 10 and 100, and then there is another bar far away, from 170 to 180. That 170 to 180 range is an outlier, and we can easily identify it by looking at the histogram.
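To see the grouping into ranges, the gap and the outlier without drawing anything, hist() can be called with plot = FALSE, which returns the break points and the count in each range. The values below are hypothetical, chosen to lie between 10 and 100 with nothing between 40 and 50 and one far-away value at 175.

```r
# Hypothetical values between 10 and 100, plus one far-away value
x <- c(12, 15, 18, 23, 27, 31, 35, 38, 55, 58, 62, 67, 71, 74, 88, 95, 175)

# plot = FALSE returns the binning without drawing; ranges are (a, b]
h <- hist(x, breaks = seq(10, 180, by = 10), plot = FALSE)

# One row per range: lower edge, upper edge, number of values inside
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
# The 40-50 row has count 0 (the gap); the lone count in 170-180 is the outlier
```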
So histograms are a great way to show the shape of continuous data, such as height and weight; the histogram is best suited for that kind of breakdown. Here I'm showing you how to tell a bar chart and a histogram apart. In the histogram there is no gap; the bars run right next to each other across the range. So here you can see the difference between a histogram and a bar chart: the histogram is continuous, with no gaps between the bars, right? And in the bar chart, with categories like January, February, March, there is a gap between the bars. So that is the main pictorial difference: a histogram has no gaps between the bars, while a bar chart has gaps between the bars. Now come the distributions of histograms, that is, the different shapes a histogram can take. The first is the normal distribution, as you can see here. In a normal distribution, points on one side of the average are as likely to occur as on the other side of the average. So the data on the left side and on the right side are almost equal, right? That is the normal distribution. Next, in a bimodal distribution, there are two peaks. See, there is one peak and there is another peak, so this is bimodal. In a bimodal distribution, the data should be separated and analyzed as separate normal distributions. So this is one normal distribution and this is another one, and when two normal distributions come together, they create a bimodal distribution. See, this is the normal distribution and this is the bimodal distribution.
The third type of histogram distribution is the right-skewed distribution, which is also called a positively skewed distribution. Why positively skewed? Because the long tail of values stretches out to the right, toward larger positive values. In a right-skewed distribution, a large number of data values occur on the left side, with fewer data values on the right side. See here: on the left side there are more data values, and toward the right the numbers decrease. A right-skewed distribution usually occurs when the data has a range boundary on the left-hand side of the histogram, for example a boundary of 0. The next one is the left-skewed distribution, which is also called negatively skewed. Why negatively? Because the tail stretches out on the negative side, to the left. In a left-skewed distribution, a large number of data values occur on the right side, with fewer data values on the left side. See here, the numbers increase as we move from left to right: the left side has fewer values and the right side has more. A left-skewed distribution usually occurs when the data has a range boundary on the right-hand side of the histogram, for example a ceiling such as 100. So these are the four types of histogram distribution: normal, bimodal, right-skewed and left-skewed. Now, R uses the hist function to create histograms, and it takes a few parameters or arguments to draw a histogram. x is a vector of values for which the histogram is drawn.
So x is the vector for which we want to draw the histogram. main is the title of the histogram, and xlab is the label for the x-axis (the y-axis is labelled "Frequency" by default). xlim and ylim are the ranges of the x and y values to display. breaks is one of: a vector giving the break points between histogram cells, a function to compute the vector of break points, or a single number giving the number of cells for the histogram (R treats that number only as a suggestion). We'll look at breaks in detail later. Then col is the fill colour of the bars, and border is the colour of the bar borders. In the next lecture we'll see an example of how to use the hist function to draw a histogram.
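Here is a sketch of one `hist` call that uses every argument listed above, so you can see where each one goes. The heights are hypothetical simulated data, and the break points are given as an explicit vector (the first of the three forms `breaks` accepts), so R uses them exactly.

```r
# Hypothetical heights in cm
set.seed(7)
heights <- round(rnorm(100, mean = 175, sd = 7))
heights <- heights[heights > 150 & heights <= 200]  # keep values inside the chosen breaks

out_file <- file.path(tempdir(), "heights_hist.png")
png(filename = out_file)
h <- hist(heights,
          main   = "Distribution of heights",   # title of the histogram
          xlab   = "Height (cm)",               # x-axis label
          xlim   = c(150, 200),                 # range of x values shown
          ylim   = c(0, 50),                    # range of frequencies shown
          breaks = seq(150, 200, by = 5),       # explicit break points between cells
          col    = "lightblue",                 # fill colour of the bars
          border = "darkblue")                  # colour of the bar borders
dev.off()

h$breaks  # the break points used
h$counts  # number of values in each cell
```

`hist` also returns these numbers invisibly, which is handy for checking what the picture shows.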
43. Drawing Histogram using hist function: Hello and welcome back. In this lecture we are going to write our first program for a histogram. We'll create a vector that contains our data, and then plot that data as a histogram. Let's get started. I have already created a script file for the histogram and written the code in it, so that we can save time on typing. Let me show you the code. First, I'm creating the data for the graph: a vector, x, containing values such as seventeen thousand and fifty thousand. Suppose these are the salaries of a few employees; that is what I'm storing in x, so x is our data. The salaries fall into different ranges, such as 20 to 40 thousand and 50 to 60 thousand. This is going to be a very simple histogram example. Now that we have the data, I want to plot a histogram. First, I'll create the image file for the histogram using the png function, with the file name histogram.png. Then I'll use the hist function that I showed you in the previous lecture and pass it x, the data vector for which we are plotting the histogram. I'm setting xlab to "Salary", col to "green" and border to "yellow". I'm not passing too many parameters here.
Just xlab as "Salary", the colour green, and the border yellow. Then I save this graph image file to our system with dev.off(). Before we run anything, let me set the working directory: I open the folder in the Files pane, click More, and choose Set As Working Directory. I hope the steps are pretty simple. I create a vector x with the data. I use the png function to create an image file for our graph so that we can visualize it, and dev.off() stores it in our local folder. In between, I use the hist function to create the histogram, passing x, with xlab "Salary", colour green and border yellow. Now I save and run the whole source file: I click Source, and it's done. Next, I go to the directory to find our file; let me check the file name. Let me delete the older files I created, change the colours to red and yellow, and run the program again. Let's open the file. The x-axis is Salary and the y-axis is Frequency, and the ranges are 10 to 20 thousand, 20 to 30 thousand, 30 to 40 thousand, 40 to 50 thousand, and 50 to 60 thousand. Now you can see how the histogram is drawn. How many employees are drawing a salary between 10 and 20 thousand? One, two, three. Let's verify this with the data.
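The steps above (vector, `png`, `hist`, `dev.off`) can be sketched as follows. The exact salaries used in the lecture are not recoverable from the transcript, so this vector is hypothetical, chosen to reproduce the bin counts the lecture reports. The file goes to `tempdir()` instead of the working directory set in RStudio.

```r
# Hypothetical salaries (not the lecture's exact values)
x <- c(15000, 17000, 20000, 22000, 25000, 32000,
       35000, 42000, 52000, 55000, 58000, 60000)

# Open a PNG device; tempdir() stands in for the working directory
out_file <- file.path(tempdir(), "histogram.png")
png(filename = out_file)

# Draw the histogram with an x-axis label, fill colour and border colour
hist(x, xlab = "Salary", col = "green", border = "yellow")

# Close the device so the image is actually written to disk
dev.off()
```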
Between 10 and 20 thousand we have twenty thousand, seventeen thousand and fifteen thousand. These are the three employees drawing a salary between 10 and 20 thousand, and the histogram shows 3 for that range. So the histogram divides the data into ranges and groups it: three employees in 10 to 20 thousand, then two employees in 20 to 30 thousand. Let's verify that as well: we have twenty-five thousand and twenty-two thousand, so that's right too. Then 30 to 40 thousand: again two. Next is 40 to 50 thousand, where there is only one employee, since this salary is just over 40 thousand. The last range is 50 to 60 thousand, where we have four employees; let's count: one, two, three, four. So this way hist groups the data and tells us the frequency. Just by looking at the chart, you can tell that four employees earn 50 to 60 thousand, three earn 10 to 20 thousand, two earn 20 to 30 thousand, two earn 30 to 40 thousand, and only one earns 40 to 50 thousand. The histogram defines ranges for the data and then tells you the frequency of occurrence of the data in each range: in 10 to 20 thousand there are three values, so three employees draw a salary in that range. That is the significance of a histogram. Next, I have taken different data here: 1, 1, 4, 5, 6, 7, 8, 9, 10.
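Instead of counting bars by eye, you can ask R for the same grouping. This sketch uses `hist(..., plot = FALSE)` and, as a cross-check, `cut()` with `table()`. The salary vector is hypothetical (the lecture's exact values are not in the transcript), and the break points 10,000 to 60,000 are set explicitly. Both methods use right-closed ranges, so a salary of exactly 20,000 falls in the 10 to 20 thousand range.

```r
# Hypothetical salaries (not the lecture's exact values)
x <- c(15000, 17000, 20000, 22000, 25000, 32000,
       35000, 42000, 52000, 55000, 58000, 60000)
bins <- seq(10000, 60000, by = 10000)

# hist() computes the groups without drawing anything
h <- hist(x, breaks = bins, plot = FALSE)
data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts)
h$counts  # 3 2 2 1 4 for this hypothetical vector

# The same grouping with cut() + table()
counts_cut <- table(cut(x, breaks = bins, include.lowest = TRUE, dig.lab = 6))
counts_cut
```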
I do the same thing, but with a different file name, histogram graph1.png, and I pass this new x to hist. Only the data values are different, and it's very simple data: 1, 1, 4, 5, 6, 7, 8, 9, 10. Let's run this and see what we get. Let me find histogram graph1. Now the data has been divided into the ranges 0 to 2, 2 to 4, 4 to 6, 6 to 8, and 8 to 10. How many values are between 0 and 2? Two: these two 1s. Between 2 and 4 there is only the 4. Between 4 and 6 there are 5 and 6, then 7 and 8 between 6 and 8, and 9 and 10 between 8 and 10. So the data is divided into these ranges, and we get the frequency of each: twice between 0 and 2, once between 2 and 4, twice between 4 and 6, and so on. You can also change the colour here, for example to black, and the bars will be drawn in black. That's how we can draw a histogram. In the next lecture, we will see some other examples of histograms.
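A sketch of this second example, with the data read from the transcript as `c(1, 1, 4, 5, 6, 7, 8, 9, 10)` (consistent with the counts described above). No `breaks` argument is passed, so R chooses the ranges itself; the object returned by `hist` lets you confirm them. The output file name is a stand-in.

```r
x <- c(1, 1, 4, 5, 6, 7, 8, 9, 10)

out_file <- file.path(tempdir(), "histogram_graph1.png")
png(filename = out_file)
h <- hist(x, xlab = "Value", col = "black", border = "yellow")  # default breaks
dev.off()

h$breaks  # 0 2 4 6 8 10
h$counts  # 2 1 2 2 2
```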
44. Using breaks xlim ylim in histogram: Hello and welcome back. In the previous lecture we saw how to draw a histogram, with two examples using two different sets of data: first a salary vector, and then some very simple data. Now we are going to learn how to use breaks, xlim and ylim, passing these three parameters to the hist function. I have already written the code. First, I use the same vector as in the previous lecture, with the salaries of a few employees ranging from 15 thousand to 60 thousand. Then I create the image file where this histogram will be stored, using the png function with a file name for the histogram with breaks; that is the file on which our histogram will be drawn. Then, with the hist function, I create the histogram: I pass the x vector as the data, and set xlab to "Weight", col to "blue" and border to "green". Next I use the xlim argument to set the limits of the x-axis to 0 to 40 thousand. Our data contains values from 15 to 60 thousand, so it goes beyond 40 thousand, but here I'm limiting the x-axis to 0 to 40 thousand. ylim is 0 to 10: the y-axis is the frequency of occurrence of the data, that is, how many employees are drawing a salary in each range, such as 0 to 20 thousand or 20 to 30 thousand, and I'm limiting it to 0 to 10. Then I use another parameter here, breaks equal to 2. Actually, let me first make it 1.
So I'm setting breaks equal to 1, and we'll see what impact that has on our histogram. The generated histogram is saved to our image file, and to write it to our local system I use dev.off(). So this is simple, one-line code where we pass a few parameters: xlim limits the values shown on the x-axis, ylim limits the y-axis, and breaks you will understand better when you see the output. Let me run the whole source: click Source. Now let me open the output file. Since I've given breaks equal to 1, we see one big bar, and it shows 8 employees drawing a salary up to 40 thousand. Let me verify: counting the salaries below 40 thousand, we have only seven. The reason is that R treats breaks as a suggestion and picks round break points, here 0, 50 thousand and 100 thousand, so this one bar really covers 0 to 50 thousand and also includes the salary just above 40 thousand; xlim only cuts off what we can see at 40 thousand. Now let me change breaks to 2 and open the file again. Now the salaries have been divided into 0 to 20 thousand, 20 to 40 thousand, and 40 to 60 thousand, but because xlim stops at 40 thousand, we only see the first two parts. Let me show you the difference: if I set the xlim upper limit to 60 thousand, the full 0 to 60 thousand range will be shown. Let me open the file again.
Now 0 to 60 thousand is shown, divided into three parts. We don't have data beyond 60 thousand, so nothing appears beyond it. So that is what breaks does. If I set the xlim upper limit to 50 thousand, what will happen? Let me click Source and open the file again. Now we see 0 to 20 and 20 to 40, and the 40 to 60 bar is cut off at 50. If I make breaks 3, what will happen? Again we get 0 to 20, 20 to 40 and 40 to 60 thousand: the number you pass to breaks is only a suggestion, and R picks "pretty" round break points close to it. Let me keep experimenting. If I make breaks 4, we now get 10 to 20, 20 to 30, 30 to 40, 40 to 50, and 50 to 60 thousand as separate bars. If I set the xlim upper limit to 30 thousand, we only see 10 to 20 and 20 to 30, because everything beyond 30 thousand is cut off. Let me set xlim back to 0 to 60 thousand and open the output file: 10 to 20, 20 to 30, 30 to 40, 40 to 50, and 50 to 60. So that is what we do with breaks. Now, what is ylim doing? It restricts the y-axis, the frequency, to 0 to 10. If I change that to 0 to 6, run it, and open the file, we now see a frequency axis from 0 to 6. This way we can restrict the x-axis and the y-axis with xlim and ylim. Suppose I make it 4: what will happen?
It's good to experiment with the code and look at the output; when you see the output, you understand what a parameter is actually for. With ylim at 4, the frequency axis goes up to 4. If I make it 2, the frequency axis only goes up to 2, and taller bars are cut off. This is how you should play around with code in any programming language, be it R, Python or anything else. If you want to learn, start playing with the code and with the parameters. When you change a parameter, you see its impact, and that way you learn better and understand the implications of that parameter in that function. For the hist function, we now know what x, col, border and xlim are: we have seen how the xlim parameter affects the histogram, how ylim affects it, and how breaks affects it. When you experiment with the code and the data, you get better insight into and understanding of the concepts. I hope you now understand what xlim, ylim and breaks are and how they affect the histogram. See you in the next lecture.
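The experiments in this lecture can be summarized in a short sketch. It assumes a hypothetical salary vector spanning 15,000 to 60,000 (the lecture's exact values are not in the transcript). It prints the break points R actually chooses for `breaks = 1` to `4`, showing that the number is rounded to "pretty" values, and then checks that `xlim` and `ylim` only change what is drawn, not how the data is grouped.

```r
# Hypothetical salaries spanning 15,000 to 60,000
x <- c(15000, 17000, 20000, 22000, 25000, 32000,
       35000, 42000, 52000, 55000, 58000, 60000)

# breaks as a single number is only a suggestion
for (b in 1:4) {
  hb <- hist(x, breaks = b, plot = FALSE)
  cat("breaks =", b, "-> break points:", hb$breaks, "\n")
}

# xlim/ylim clip the drawing; the grouping is unchanged
out_file <- file.path(tempdir(), "histo_with_breaks.png")
png(filename = out_file)
h_plot <- hist(x, xlab = "Salary", col = "blue", border = "green",
               xlim = c(0, 40000), ylim = c(0, 10), breaks = 2)
dev.off()

h_plain <- hist(x, breaks = 2, plot = FALSE)
identical(h_plot$counts, h_plain$counts)  # TRUE: same bins with or without xlim
```

For this vector, `breaks = 2` and `breaks = 3` produce the same three bins (0 to 20, 20 to 40, 40 to 60 thousand), which matches what the lecture observed.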
45. Basic line chart for time series with ggplot2: Hello and welcome back. In the previous lecture we saw how to draw a histogram. Now we are moving to a very interesting graph, the basic line chart. This is important because it will matter when you learn more about time series problems in your data science journey. In this lecture we are going to see how to draw a simple line chart for a time series using ggplot2, which is a package in R. First, we need to understand what a time series is, so let me take you through the basic definition from Wikipedia. A time series is a series of data points indexed (or listed, or graphed) in time order. That means the data points should be based on time: each value belongs to a particular point in time, and when we plot such a series, we get a time series plot. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. What does that mean? Suppose that on a timeline we take successive, equally spaced points every three hours: 0 to 3 hours, then 3 to 6, then 6 to 9, then 9 to 12, so a 3-hour interval.
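The "equally spaced points" idea can be sketched in base R: `seq()` on a date-time builds time stamps every three hours, and each time stamp gets one observation. The readings below are hypothetical values, and UTC is assumed so that the spacing is not affected by daylight-saving changes.

```r
# Five successive, equally spaced time points: every 3 hours
start <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC")
times <- seq(start, by = "3 hours", length.out = 5)

# One hypothetical observation per time point
readings <- c(12.1, 12.8, 13.5, 13.0, 12.4)
series <- data.frame(time = times, reading = readings)

series
diff(series$time)  # the gap between consecutive points is always 3 hours
```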
So the data points would be at 0, 3, 6, 9, 12 hours and so on. This is just an example, not an exact definition, but you can see the idea: along a timeline things happen every hour or every minute, and when we record those data points, this is the data for the first hour, this is the data for the second hour, this is the data for the third hour. When we record data at successive, equally spaced points in time, we get a time series. Thus a time series is a sequence of discrete-time data. It's not continuous; it's discrete in nature, because we take equally spaced time points. That's the definition; we'll understand more when we plot it. Before we plot, we should install the ggplot2 library: go to Packages and install ggplot2. For this lecture we need two libraries, ggplot2 and dplyr. Then we create some dummy data: a data frame with a day column created with as.Date, which takes a date in a format like 2014-01-01 (1 January 2014). From that date we take 365 days, one for each day from 1 January to 31 December 2014. For each date we take a value using the runif function. What does runif do? It generates random deviates from the uniform distribution: runif(n) generates n random numbers.
So for value we generate 365 random numbers, one for each day: the first randomly generated number is assigned to day one, the second to day two, and so on. But we don't only use the random numbers from runif: we also generate a sequence of integers starting from a negative number, square each term, divide by 10,000, and add the result to what runif gives us. The day and value columns are stored in a data frame, so the data frame contains two columns, and we'll see what values we get. Then I print the data to show you what it contains. Next, I use ggplot to plot this data frame: inside aes, I map the x-axis to day and the y-axis to value. Then I add geom_line(), set xlab, and print the plot. What does geom_line do? As its help page says, geom_line connects the points in order of the variable on the x-axis; the stairstep plot described on the same help page comes from geom_step instead. So geom_line joins the points and creates the line. We'll also use a few date format symbols: %d is the day as a number (01 to 31), %a is the abbreviated weekday (Mon), and %A is the unabbreviated weekday (Monday).
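Here is a sketch of the dummy data frame described above. The transcript does not preserve the exact bounds of the sequence, so `seq(-140, 224)` is an assumption: it has exactly 365 terms, one per day, so it lines up with `runif(365)` without recycling. Squaring the sequence and dividing by 10,000 adds a gentle curve to the random noise. `set.seed` is added only to make the run reproducible.

```r
set.seed(123)  # reproducible random numbers

data <- data.frame(
  day   = as.Date("2014-01-01") + 0:364,               # 1 Jan to 31 Dec 2014
  value = runif(365) + seq(-140, 224)^2 / 10000        # assumed sequence bounds
)

head(data)  # first days of January 2014
tail(data)  # last days of December 2014
```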
%m is the month as a number (01 to 12), %b is the abbreviated month (Jan), and %B is the unabbreviated month (January). %y is the two-digit year and %Y is the four-digit year. These are the basic format symbols we are going to use. Let's go to RStudio and run this code. We load the ggplot2 and dplyr libraries, then create the dummy data frame with as.Date for the days and random values from runif plus the sequence. I also print what runif returns, so I can show you what we get from it. Let me run this. For runif we get values like 0.3444, 365 of them. In the data frame we get each day, from 1 January 2014 to 31 December 2014, together with its value. Now I pass the data frame to ggplot with x = day and y = value, and use geom_line to plot the data. I'm also using scale_x_date, where I set the date labels with the format symbols. Let me run the whole code: click Source and look at the plot. Now we have a time series, with x-axis labels such as January 2014, April 2014, July 2014, October 2014 and January 2015. If I change the format, we get abbreviated month names such as Jan and Jul instead.
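The same format symbols work outside ggplot2, in base R's `format()`, which makes them easy to try one at a time. The date below is an arbitrary example. Weekday and month names depend on your system's locale, so no output is shown for those lines.

```r
d <- as.Date("2014-07-04")

format(d, "%d")        # "04"   day of the month
format(d, "%m")        # "07"   month as a number
format(d, "%y")        # "14"   two-digit year
format(d, "%Y")        # "2014" four-digit year
format(d, "%a / %A")   # abbreviated / full weekday name (locale-dependent)
format(d, "%b / %B")   # abbreviated / full month name (locale-dependent)
format(d, "%W")        # week of the year, Monday as first day of the week
```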
If I use %Y, %b and %d, we get the full year followed by the month and day, for example 2014 Jan 01. With %W we get the week of the year, so the labels show week numbers such as 13, 26 and 39. And with the month and year symbols, we get labels such as January 2014, April 2014 and so on. So on the x-axis we see the month, and on the y-axis we see the value at that point in time. That's how we can draw a simple time series. We can also modify this plot a little. Here I'm using the hrbrthemes library; if you don't have it, go to Packages, click Install, install it, and then you can use it. I'm using the same data frame we just created, and everything stays the same, except that in geom_line I set color = "green" (the line was black before). Then I add theme_ipsum() for the theme, and in theme() I set axis.text.x = element_text(angle = 45, hjust = 1). The angle of 45 rotates the x-axis labels by 45 degrees. Let me run the whole code. Now the labels are drawn at 45 degrees. Let me try a few other angles, such as 160, 90 and 10 degrees, and see how the labels change.
Each time I plot, the angle of the month labels changes. If I make it 360 degrees, the labels are back to horizontal. When you play with the data, you keep understanding what difference each change makes. So this way we can change the orientation of the x-axis labels. If I use 90 degrees, the labels are vertical. I hope you're now clear about what difference this angle makes: with element_text we can change the orientation of axis text such as the month labels.
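Putting the pieces of this lecture together, here is a sketch of the styled line chart. It assumes ggplot2 is installed. To avoid a second dependency it uses ggplot2's built-in `theme_minimal()` in place of `hrbrthemes::theme_ipsum()` from the lecture, and the sequence bounds in the dummy data are the same assumption as before. The plot is printed into a PNG in a temporary folder.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(
  day   = as.Date("2014-01-01") + 0:364,
  value = runif(365) + seq(-140, 224)^2 / 10000        # assumed sequence bounds
)

p <- ggplot(data, aes(x = day, y = value)) +
  geom_line(color = "green") +                          # line joins points in x order
  xlab("") +
  scale_x_date(date_labels = "%b %Y") +                 # e.g. "Jan 2014"
  theme_minimal() +                                     # stand-in for theme_ipsum()
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # rotate x labels 45 degrees

out_file <- file.path(tempdir(), "time_series.png")
png(filename = out_file, width = 800, height = 500)
print(p)
dev.off()
```

Changing `angle` (for example to 90) and re-running is the same experiment the lecture does interactively.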
46. Scatter Plot and plot matrices in R: Hello and welcome back. In this lecture we are going to learn about scatter plots. What is a scatter plot? A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display values for, typically, two variables of a set of data. It's simple: take an x-y plane and put (x, y) points on it. We don't draw lines or anything else, only the points. Suppose we plot mileage against the number of cylinders, with the number of cylinders on the x-axis and the mileage on the y-axis. A car with four cylinders and a mileage of 15 becomes the point (4, 15) on the Cartesian plane. If the points are coded, for example by colour, one additional variable can be displayed. Here I'm going to use the mtcars dataset, a built-in dataset available in R that we have already used in earlier examples. mtcars contains data about cars, such as the number of cylinders, mileage and weight. From this dataset I pick the number of cylinders and the mileage of each car, the cyl and mpg columns, and store them in an object called input. If I run this, I get the details for cars such as the Mazda and the Datsun: for example, the Mazda has 6 cylinders and a mileage of 21.
Similarly, the Lotus Europa has 4 cylinders, the Ford Pantera L has 8 cylinders and a mileage of 15.8, and the Volvo 142E has 4 cylinders and a mileage of 21.4. This way we get the number of cylinders and the mileage for many cars. Now I'll draw a scatter plot of the number of cylinders against mileage. I use the png function with the file name mileage_plot.png, and then the plot function to draw the scatter plot. I pass x = input$cyl, the number of cylinders, so the x-axis shows the cylinders, and y = input$mpg, miles per gallon, so the y-axis shows the mileage. For xlab I give "Number of cylinders" and for ylab "Mileage". For xlim I give 4 to 8, because the number of cylinders varies from 4 to 8, and for ylim 10 to 35. The graph title is "Number of cylinders vs mileage". Then dev.off() saves the file. Now let's run this: click Source. It ran successfully. Let's look at the output file: mileage_plot.png, the name we gave. Here is the plot.
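The steps just described can be sketched as follows, using the built-in `mtcars` dataset. The axis limits and labels follow the lecture; the output path in `tempdir()` is a stand-in for the working directory.

```r
# Keep only the number of cylinders and the mileage (miles per gallon)
input <- mtcars[, c("cyl", "mpg")]
head(input)

out_file <- file.path(tempdir(), "mileage_plot.png")
png(filename = out_file)
plot(x = input$cyl, y = input$mpg,
     xlab = "Number of cylinders",
     ylab = "Mileage (mpg)",
     xlim = c(4, 8),      # mtcars cars have 4, 6 or 8 cylinders
     ylim = c(10, 35),    # covers the full mpg range in mtcars
     main = "Number of cylinders vs mileage")
dev.off()
```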
The chart title is "Number of cylinders vs mileage". The x-axis shows the number of cylinders, 4 to 8, and the y-axis shows the mileage: 10, 15, 20, 25, 30, 35. Each point is one car. Take this point: it has 4 cylinders, because its x value is 4, and its y value is around 21 or 22. So at 4 cylinders we have mileages of about 21, 23, 24, and so on up to about 34: these are the mileages of the four-cylinder cars, each shown as an (x, y) point on this two-dimensional Cartesian plane. In the same way, at x = 6 we have the six-cylinder cars, with mileages of roughly 18 to 21. So six-cylinder cars give less mileage than four-cylinder cars. And the eight-cylinder cars have the lowest mileage of all. So mileage goes down as the number of cylinders goes from 4 to 6 to 8. From this scatter plot we can conclude that the number of cylinders affects a car's mileage: the fewer cylinders a car has, the higher its mileage, and the more cylinders, the lower its mileage. In other words, mileage is inversely related to the number of cylinders.
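To back up the visual conclusion with numbers, you can compute the average mileage for each cylinder count with `tapply` and the correlation with `cor`. This sketch uses only the built-in `mtcars` data. A negative correlation supports "more cylinders, lower mileage", though it describes a trend in this dataset rather than a strict proportional rule.

```r
# Average mileage for each number of cylinders
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
round(mpg_by_cyl, 2)   # 4: 26.66, 6: 19.74, 8: 15.10

# Correlation between cylinders and mileage
cor(mtcars$cyl, mtcars$mpg)   # about -0.85
```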
So with this type of scatter plot, we can easily reach a conclusion just by looking at the graph. It's one of the simplest graphs, and it's easy to analyze and get details from the chart. Next, I have a file for an mtcars scatter plot matrix. Here I'm going to use the pairs function in R, which draws a matrix of scatter plots. How do we do that? We pass data = mtcars and pick four of its columns, wt, mpg, disp and cyl: weight, miles per gallon, displacement and number of cylinders. Each of these variables is plotted against each of the other three: mileage versus weight, mileage versus displacement, mileage versus cylinders, and so on. The title I'm giving is "mtcars scatterplot matrix". First, let me comment out the main line and start with something simple: mileage versus number of cylinders, the plot we drew above without pairs. So I'll call pairs with only mpg and cyl and see what the output looks like. Let's run this.
Here is the output file, so let me open it. This time the result is shown in matrix form. It shows mpg, miles per gallon, and the number of cylinders: mpg here, and the number of cylinders here. Look at the number of cylinders. All the 4-, 6- and 8-cylinder cars are clustered together, each with their mileages: the four-cylinder mileages here, then the six-cylinder mileages, then the eight-cylinder mileages. One panel shows mileage against the number of cylinders, like our earlier chart. The other panel shows the same data with the axes swapped, with mileage on the x-axis and cylinders on the y-axis. So the number of cylinders and the mileage are shown separately, in matrix form, right? It's the same information in two different representations. Okay? Now let me go back to the code, comment out this line and uncomment the main line, so that all four variables are used. Let me run this, and now the graph appears. Let me open it. Now you see the four variables, weight, mpg, displacement and cylinders, named along the diagonal. Twelve plots have been drawn, 1, 2, 3, up to 12, arranged in a four-by-four grid of 16 cells with the variable names on the diagonal. So with four variables we get a four-by-four matrix. This way we can use pairs() to create a scatter plot matrix for a dataset. Okay? See you inside the next lecture.
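Both charts from this lecture can be reproduced with a short sketch on R's built-in mtcars data. The titles, axis labels and the temporary PDF output are our own choices. The lecture saves its charts to an image file, and png() works the same way where your R build supports it. The tapply() line is an extra numeric check of what the first chart shows: average mileage drops as the number of cylinders goes up.

```r
# Built-in dataset mtcars: mpg = miles per gallon, cyl = cylinders,
# wt = weight (1000 lbs), disp = displacement (cubic inches)
data(mtcars)

# Send both charts to a temporary PDF file (our choice of output device)
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file)

# Scatter plot: number of cylinders on the x-axis, mileage on the y-axis
plot(x = mtcars$cyl, y = mtcars$mpg,
     xlab = "Number of cylinders", ylab = "Miles per gallon",
     main = "Cylinders vs Mileage")

# Scatter plot matrix: each of the four variables plotted against the others
pairs(~ wt + mpg + disp + cyl, data = mtcars,
      main = "mtcars Scatterplot Matrix")

invisible(dev.off())   # close the PDF device so the file is written

# Numeric check of the first chart: average mileage per cylinder count
avg_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(avg_mpg, 2))   # 4: 26.66, 6: 19.74, 8: 15.10
```

With four variables, pairs() draws a 4 by 4 grid. The variable names sit on the diagonal, and the 12 off-diagonal cells are scatter plots, with each pair of variables appearing twice, once with the axes swapped.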
47. Finding mean in R: Hello and welcome back. In the next few lectures we are going to learn about statistics in R. We will look at the built-in functions R provides for statistical analysis, which are very useful in machine learning, artificial intelligence, deep learning and related fields. Statistics is the main tool for getting information out of data, and statistical analysis is fundamental to machine learning algorithms, so we should know how it is done. So in the coming lectures we'll learn how to do statistical analysis in R programming. Let's get started. R has many built-in functions for statistical analysis, and they are easy to use. You just call the function by name, pass your data vector along with some arguments, and the work is done. So it's pretty easy to do this kind of analysis in R. We'll see what the mean is and how to calculate it, then what the median is, and then the mode. So we are going to learn about the mean, median and mode. Okay, so let's get started. First, what is the mean? The mean is the sum of all the values divided by the number of values, so it is the average. You know how to calculate an average. Suppose you have the numbers 1, 2, 3, 4, 5 and you want their average. You compute 1 + 2 + 3 + 4 + 5 = 15 and divide by the number of values, which is five, so the average is 3. So the mean is the sum of the values in the data divided by the number of values. Okay, so let me show you how we can do that in R.
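The arithmetic above can be checked directly in R. This small sketch computes the mean of 1, 2, 3, 4, 5 by hand with sum() and length(), and compares it with the built-in mean() function.

```r
# The five numbers from the example
x <- c(1, 2, 3, 4, 5)

# Mean by hand: the sum of the values divided by the number of values
manual_mean <- sum(x) / length(x)
print(manual_mean)   # 3

# The same result from R's built-in mean() function
print(mean(x))       # 3
```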
To calculate the mean, R has a function called mean(). Let me first show you the syntax, and then we'll write the code. The mean() function takes x, the data vector, then an argument called trim, which defaults to 0, and then na.rm, which defaults to FALSE. We'll see what trim and na.rm do when we get to the practicals. So the basic form is mean(x, trim = 0, na.rm = FALSE). What is x? x is the input vector, which contains the data. trim is used to drop some values from both ends of the data. It is a fraction between 0 and 0.5. With trim = 0, no values are dropped. With a larger value, R sorts the data and drops that fraction of the observations, rounded down, from each end before calculating the mean. We'll see it in action and understand it better. And na.rm removes the missing values. So x is the input data vector, trim drops values from both ends, and na.rm removes missing values. Suppose you have a dataset in which some values are missing (NA). If you want to leave those values out, you can use na.rm = TRUE. With na.rm = FALSE, the missing values are not removed. With na.rm = TRUE, all the missing values are removed before the calculation, okay? You can set it to TRUE or FALSE based on your requirement. So let me comment these notes out. Now we'll create a simple vector to work with, the input vector. Okay?
So I'm going to create an input vector. I could call it data or a1, but I'll simply call it a, and I'll give it some arbitrary values, including a few negative numbers. So we have an input vector containing these values. Okay? Now suppose I want to find the mean. I'll create a variable called mean_a, meaning the mean of a, then call the mean() function and pass a to it. The mean() function takes this vector as input and calculates the mean of its values, and the result goes into mean_a. Okay? Let me run this. We get a mean value of 33.78571. This is the average, or mean, of these values. So this is how we find the mean of an input vector: the mean of this data is 33.78. Okay? Next, we are going to learn how to use na.rm. First, let me move the console pane to the right so that we can see the results right here. So the mean is here, okay? Now we are going to remove missing values. Suppose I have the same vector with some missing values, okay? These are the missing values: for these positions we don't have data, so they are NA, which means not available. Okay? So how do we handle them? If I calculate mean(a) now, let's see what result we get. We get NA, because the vector contains NA values. Okay?
So now if I write mean(a, na.rm = TRUE), what happens? Let's see. Sorry, I typed that wrong, so let me fix it and run it again. Now we are getting a value. Okay? If I remove na.rm = TRUE and run it again, we get NA again. So if you want the mean of this vector without the NA values, you have to use na.rm = TRUE. Okay. Next I'm going to use the trim option. I have created a vector c with the simple values 1, 2, 3, 4, 5, 6, 7. If I run this and find its mean, we get 4. Why 4? Because 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28, and 28 divided by 7 is 4. That's just a simple average. Okay? Now I'll shuffle the values, putting 7 here, 1 here, 2 here and 3 here, so they are in random places and the vector is no longer sorted, right? If I find the mean of this, I again get 4. Same values, same mean, right? If I use trim = 0, what happens? Let's see: we get 4 again. Now if I use trim = 0.1, R first sorts the vector in ascending order and then drops 10% of the values from each end. With only seven values, 10% of 7 is 0.7, which rounds down to 0. So nothing is dropped, and we still get 4. To make the next steps easier to follow, let me restore the original, sorted vector: I'll comment this out and put back 1, 2, 3, 4, 5, 6, 7. With trim = 0.2, R drops 20% of 7 = 1.4 values, rounded down to one value, from each end. So it removes 1 and 7 and keeps 2, 3, 4, 5, 6. Their sum is 20, and 20 divided by 5 is 4, so we get 4 again. What happens if I put trim = 0.3 and run this? 30% of 7 is 2.1, which rounds down to 2, so R removes two values from each end: 1 and 2 from the start, 6 and 7 from the end. We are left with 3, 4, 5, whose mean is again 4. This vector is symmetric around 4, so trimming the same number of values from both ends always leaves a mean of 4. Now let me add some more values, 9 and 10, and run it again. Now the mean is 5. Why? The vector now has nine values, 1, 2, 3, 4, 5, 6, 7, 9, 10, and 30% of 9 is 2.7, which rounds down to 2. So R removes 1 and 2 from the beginning and 9 and 10 from the end, leaving 3, 4, 5, 6, 7. Their sum is 25, and 25 divided by 5 is 5. So this is how trim works. R sorts the vector, removes the given fraction of the values (rounded down) from the beginning and from the end, and calculates the mean of what remains. If you want to drop extreme values from both ends before finding the average, you can use trim. Note that trim is a fraction, not a count: 0.1 means 10% of the values from each end, 0.2 means 20%, and 0.3 means 30%. And if you want to remove the missing values, you can use na.rm = TRUE. It removes the NA values from the dataset and calculates the mean of the remaining values. Okay, so this is how na.rm and trim work.
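Here is a minimal sketch of na.rm and trim. The vectors a and w are made-up values for illustration, and v is the nine-value vector from the end of the demo. trimmed_mean() is a hypothetical helper, not part of base R. It is written only to show the rule that mean() applies: floor(length(x) * trim) values are removed from each end.

```r
# Vector with missing values; these numbers are illustrative, not the lecture's
a <- c(7, 8, 9, NA, 12, 34, NA, 45)
print(mean(a))                  # NA, because a contains missing values
print(mean(a, na.rm = TRUE))    # 19.16667, i.e. (7 + 8 + 9 + 12 + 34 + 45) / 6

# trim is a fraction: floor(9 * 0.3) = 2 values are dropped from each end
v <- c(1, 2, 3, 4, 5, 6, 7, 9, 10)
print(mean(v, trim = 0.3))      # mean of 3, 4, 5, 6, 7 = 5

# Hypothetical helper spelling out the same rule (valid for 0 <= trim < 0.5)
trimmed_mean <- function(x, trim) {
  x <- sort(x)                       # trimming works on the sorted values
  k <- floor(length(x) * trim)       # number of values removed from EACH end
  mean(x[(k + 1):(length(x) - k)])   # average what is left in the middle
}

# Trimming helps when the data contain an outlier
w <- c(2, 3, 4, 5, 100)
print(mean(w))                  # 22.8, pulled up by the outlier 100
print(mean(w, trim = 0.2))      # 4: drops 2 and 100, averages 3, 4, 5
```

Because the count is rounded down, a small trim on a short vector (for example trim = 0.1 on seven values) may remove nothing at all.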
See you inside the next lecture, where we will learn how to find the median in R.
48. Finding median and mode in R: Hello and welcome back. In this lecture we are going to see how to calculate the median of data, okay? First things first, we should know what the median is. The median is the middle-most value in a data series. Suppose we have this data series and we want to find its middle value. Okay, so the median is the value that comes in the middle. It's not like the mean, which is the average value of the data series. The median is the value that falls in the middle when the data is sorted, or laid out along an axis, and that is the value we are going to find. So the median is the middle-most value in a data series, okay? To find it, we use the median() function in R. This is the dataset I have created, and this is the median() function. What arguments does it take? It takes the input vector, and then na.rm, which you can set to TRUE or FALSE as you need. Okay? If you want to remove the NA values, set na.rm = TRUE. We saw how to use na.rm in the previous lecture about the mean, and it works the same way here. na.rm = FALSE means the missing values are not removed, and na.rm = TRUE means they are removed. Okay? I assign the result of median() to a variable called median_a. Let me run this. We get the median, the middle value of this data series, which is 9, right?
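A small sketch of how median() behaves. The vectors are our own example values, not the lecture's dataset. It covers an odd number of values, an even number of values (where the median is the average of the two middle values), and missing values with na.rm.

```r
# Odd number of values: the median is the middle value after sorting
odd_values <- c(9, 3, 45, 1, 12)
print(sort(odd_values))      # 1 3 9 12 45
print(median(odd_values))    # 9

# Even number of values: the median is the average of the two middle values
even_values <- c(9, 3, 45, 1, 12, 20)
print(median(even_values))   # sorted: 1 3 9 12 20 45, so (9 + 12) / 2 = 10.5

# Missing values: median() returns NA unless na.rm = TRUE
with_na <- c(9, 3, NA, 45, 1)
print(median(with_na))                # NA
print(median(with_na, na.rm = TRUE))  # sorted: 1 3 9 45, so (3 + 9) / 2 = 6
```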
Now let me add some more arbitrary values to the data. If I find the median now, we get 33. Similarly, if I change the values again and run it, we get 44, okay. So median() finds the middle-most value of the dataset. This is how we find the median of a data series, or dataset. Next, we are going to learn how to find the mode. What is the mode? The mode is the value with the highest number of occurrences in a dataset. Let me copy this and create a dataset for the mode problem. In this dataset, 45 occurs many times. I'll add a few more 45s so that it occurs four times: 1, 2, 3, 4. So finding the mode means finding the value with the maximum number of occurrences, right? Unlike the mean and median, the mode can be found for character data as well as numeric data. First we'll see how to find the mode of this numeric dataset, and then we'll try it on a character dataset. There is no built-in function to find the mode in R. So we'll write our own custom function to find the value with the maximum number of occurrences in a dataset. I'll create a function and name it return_mode. It takes the data vector as its argument, and I'll pass our vector a to it. Okay?
Inside the function, I create a variable unique_a that holds the unique values in a. For this I use the unique() function and pass the data to it, which gives me the unique values of the dataset. Okay? Let me simply print this. I'll comment out the rest, store the result in mode_a, print it and run this again. See what I'm getting now: the unique values of this dataset, right? Now I need to write some logic to count how many times each unique value occurs, for example how many times 45 occurs in this series. Okay, so for this I take unique_a and index it with which.max(). Inside which.max() I use tabulate(), and inside tabulate() I use match(), passing it a and unique_a. Okay. Now if I return this value and run it, I get the mode. See, I'm getting 45 as the result. So this is how we write a user-defined function to find the mode. We create our own function, return_mode, which takes the input data vector as its argument. The dataset goes into the function, and inside it I create another variable, unique_a, using the built-in R function unique(), which gives the unique values of a. Then, on unique_a, I use which.max() to pick the value that occurs the maximum number of times.
Here match() maps each value of the original dataset a to its position among the unique values. tabulate() counts how often each position occurs, and which.max() finds the position with the highest count, so we get the number that occurs most often. Then I create mode_a and simply call this function, and it gives us the most frequently occurring number in this dataset, which is 45. Now if I make 76 the most frequent value and run it again, what happens? Let's see: now we get 76 as the most frequently occurring number. We can do the same thing with text data. Suppose I create a dataset called character_a with some string values, such as the country names India, USA, Britain, South Africa and Australia. I'll repeat Australia three times and Britain twice. Okay. So now Britain occurs twice and Australia three times in the dataset. Okay. To find the mode of character_a, I call return_mode(), pass this character dataset to it, store the result and print it. I should get Australia as the result, so let me run this. Okay, I made a silly mistake: I forgot to put c() around the values, which is why I'm getting an error. Sorry for that. Let me fix it and run it again. Yes, now we get Australia as the result, because Australia occurs three times and Britain twice. Let me try something else: I'll make both Australia and Britain occur two times, and see what we get. We get Britain, because when there is a tie, the value that appears first in the data wins, and Britain appears before Australia.
Now suppose I put India in twice as well. Let's see what we get: now we get India. And if I move Britain ahead of it and run this, we get Britain. So whichever tied value appears first in the data is returned. India and Australia also occur twice, but Britain comes first, so Britain is returned. Okay. And if I add Australia once more and run this, we get Australia, since it now occurs most often. So this is how we find the mode of our data, the most frequently occurring value in a dataset, whether it is character data or numeric data. Okay, so we can write our own function, return_mode, which uses unique() followed by which.max(), tabulate() and match() to find the value with the maximum number of occurrences. We covered the mean in the previous lecture and the median and mode in this lecture. So now we know the basics of the mean, median and mode in R programming. R does not have a built-in function to find the mode, but we created our own user-defined function to find the mode of a dataset. I hope you understood how to do that. See you inside the next lecture.
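The mode function built in this lecture can be written out as below, named return_mode as in the lecture. This is a sketch of the technique described (unique(), match(), tabulate(), which.max()), and the sample vectors are our own. Base R does have a function called mode(), but it reports an object's storage mode (for example "numeric"), not the most frequent value. That is why a user-defined function is needed.

```r
# User-defined mode: the value that occurs most often in x
return_mode <- function(x) {
  unique_x <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, unique_x))  # how many times each distinct value occurs
  unique_x[which.max(counts)]             # which.max() returns the FIRST maximum on ties
}

numbers <- c(2, 45, 7, 45, 12, 45, 76, 45)
print(return_mode(numbers))     # 45 (occurs 4 times)

countries <- c("India", "USA", "Britain", "South Africa", "Australia",
               "Australia", "Britain", "Australia")
print(return_mode(countries))   # "Australia" (3 times, Britain only 2)

# Tie: Britain and Australia both occur twice, and Britain appears first
tied <- c("India", "Britain", "Australia", "Britain", "Australia")
print(return_mode(tied))        # "Britain"
```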
49. What is Linear Regression: Hello and welcome back. In the previous lectures we saw how to do statistical analysis in R for machine learning and data science. These topics are very broad, and we need this statistics to go further in machine learning, artificial intelligence or deep learning, whatever you want to learn next. In this course we are learning data science and machine learning through R programming. We have learned the mean, median and mode. R has built-in functions for the mean and median, but no built-in function for the mode, so we wrote a user-defined, custom function to calculate it. That was the previous lecture, so go and watch it if you have not seen it. Now we're going to learn a very important concept called linear regression. Linear regression is widely used in machine learning and artificial intelligence, so to go further you need to know what linear regression is and how to use it to make predictions. Linear regression is a machine learning model that predicts values based on data. Suppose we have height and weight data for a group of people. Based on a person's height we might want to estimate their weight, or based on their weight, their height. Whenever we want to establish the relationship between two variables like height and weight and then use it to predict, we can use linear regression. Suppose this is the sample data we have collected through our experiments: the height and weight of a number of people.
Based on this dataset, we want to train a machine learning model, in this case a linear regression model. Then I'll give it the height of a new person, one that is not in this column, and predict that person's expected weight. So I train my model with the data, and whenever I give it a new person's height, the system predicts that person's expected weight from the calculation it has learned. Okay, so that is what we are going to do with linear regression. So what is linear regression? Linear regression is a statistical method, used in finance, investing and other disciplines, that attempts to determine the strength and character of the relationship between one dependent variable, usually denoted by Y, and a series of other variables known as independent variables. What does this mean? It is a method for finding the relationship between variables, where one is dependent and the others are independent. Which is which? Suppose I want to find a person's weight based on their height. Here the height, which we give as input, is the independent variable. The weight is the dependent variable, because we predict it from the height. Okay?
So now you know what the dependent and independent variables are, right? The dependent variable is denoted by Y, and the independent variables are the series of other variables. There is only one dependent variable. Suppose you have a company and you want to predict its revenue. The revenue is the single variable we predict, and it depends on several other things. How is your company performing? What is the profit, and what is the loss? How many clients did you gain, how many did you lose, and how much salary do you pay your employees? All of these, the salary component, profit, loss and market conditions, are independent variables that determine your revenue. So revenue here is the dependent variable, and all the other things that affect your revenue are independent variables. There can be many independent variables, but only one dependent variable, which is the one we are going to predict. Okay? Regression analysis is a set of statistical processes for estimating the relationship between a dependent variable and one or more independent variables. The dependent variable, the one we are going to predict, is often called the outcome. The independent variables, like employee salary, profit, loss and market conditions, are often called predictors, covariates or features. Based on these features we predict the value of something, like revenue or a person's weight, which is why they are called predictors. The most common form of regression analysis is linear regression, okay? There are several forms of regression, such as simple linear regression and multiple linear regression. Okay.
In linear regression, the researcher finds the line, or a more complex linear combination, that most closely fits the data according to a specific mathematical criterion. Okay, so what does that mean? See, these are the data points, with weight on the x-axis and height on the y-axis. Okay? Based on the height, we are going to predict the weight of a person. Each point is one person. For example, this person has a height of 177 and a weight of around 64 kg. So these are the data points we have. With linear regression we try to find a line that represents this data. Once we have that line, suppose we are given a height of 162. We go across from 162 on the y-axis until we meet the line, and then read the corresponding weight off the x-axis. So each point on the line gives us an x and y combination, a height and weight pair. We try to find the line that correctly represents the data according to a specific mathematical criterion. Okay? In linear regression, both variables, dependent and independent, appear with an exponent (power) of one, because we use the simple straight-line equation y = mx + c. Here m is the slope and c is a constant, the intercept. The same line is often written as y = ax + b, where a is the slope and b is the intercept. With this equation we can draw any straight line, right?
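Here is how the coefficients of the line are found. This sketch uses the ordinary least-squares formulas for y = a x + b: the slope is a = cov(x, y) / var(x), and the intercept is b = mean(y) - a * mean(x). The five height and weight pairs are made up for illustration. The point is that lm() returns the same two numbers.

```r
# Hypothetical sample (made-up numbers): height in cm is x, weight in kg is y
x <- c(150, 160, 170, 180, 190)
y <- c(50, 56, 61, 68, 75)

# Least-squares estimates for the line y = a * x + b
a <- cov(x, y) / var(x)          # slope: 620 / 1000 = 0.62
b <- mean(y) - a * mean(x)       # intercept: 62 - 0.62 * 170 = -43.4
cat("slope a =", a, "intercept b =", b, "\n")

# lm() fits the same line; coef() returns c(intercept, slope)
fit <- lm(y ~ x)
print(coef(fit))

# A point on the fitted line: the predicted weight for a height of 175
print(a * 175 + b)               # 65.1
```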
So based on these x and y values, we try to find the line that best represents the height and weight data, okay? Since the power of both x and y is one, the equation describes a straight line. If the power were not one, we would get a curve rather than a straight line, for example a parabola. So in this equation, y is the response variable, because y is the value we predict from x. x is the predictor, or independent variable, and a and b are called the coefficients. When we perform linear regression, we try to find the values of a and b, the coefficients of the regression. With these two values we can compute y for any x. That gives us points on the line, and joining those points gives us a straight line, the regression line. This line gives us the predicted values for the data. Okay? So where do we use linear regression? Regression analysis is primarily used for two conceptually distinct purposes. First, it is widely used for prediction and forecasting. Suppose we have height and weight data, and we want to predict a person's weight from their height. We can also use it for forecasting, for example estimating tomorrow's temperature or rainfall from the conditions we have observed. So in forecasting and prediction situations, we use linear regression in machine learning.
Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. When you have an independent and a dependent variable, you can use linear regression to find how they are related. So how do we build a linear regression model? The simple hands-on example we are going to do is predicting a person's weight from their height. If we know a person's height, we can predict their weight with linear regression analysis, okay? To do this, we need the relationship between height and weight, and we get it from the height and weight data we have, which we'll use in our example. First, we collect the data for the variables whose relationship we want to establish. In real projects, once we have the data, we perform some exploratory data analysis to clean it. If there are missing values, we remove them or fill them in. We do that kind of thing in real life, and we'll see those steps in a later part of the course. But here we have a very clean dataset, so we don't need to. So we'll use the lm() function to establish the relationship between height and weight, that is, between y and x. We'll create a linear regression model with lm(), a built-in function in R that does all the mathematical calculation in the background and gives us the relationship between x and y.
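The steps listed in this lecture can be sketched as follows: fit the relationship with lm(), inspect it with summary(), predict new values with predict(), and draw the regression line. The height and weight values are hypothetical, not the lecture's dataset. abline() is our choice for drawing the fitted line. The chart goes to a temporary PDF so the sketch runs without a graphics window.

```r
# Hypothetical sample data (not the lecture's dataset)
people <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),  # cm
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)             # kg
)

# Fit weight (dependent) as a linear function of height (independent)
relation <- lm(weight ~ height, data = people)

# summary() reports the coefficients, their standard errors and R-squared
print(summary(relation))

# Predict the weight of new people from their heights
new_people <- data.frame(height = c(140, 170))
predicted <- predict(relation, newdata = new_people)
print(predicted)

# Plot the data points and the fitted regression line
pdf(file = tempfile(fileext = ".pdf"))
plot(people$height, people$weight,
     xlab = "Height (cm)", ylab = "Weight (kg)",
     main = "Height and weight regression")
abline(relation)   # draws the line intercept + slope * height
invisible(dev.off())
```

A least-squares line with an intercept always passes through the point (mean height, mean weight). So predicting at the average height returns the average weight, which makes a quick sanity check on a fitted model.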
Here y and x are weight and height, and lm() gives us the relationship between them. When we call summary() on that relationship, we can see the coefficients a and b and how the variables are related. Based on that, we can draw the regression line. We can also predict a person's weight using another function, predict(). It takes the linear regression model we established with lm() in the previous step, along with new x values, and uses that model to predict the weight from the height. We'll also plot the data and the regression line. We'll see how to do all of this in the next lecture. So I hope you understand what linear regression is. It works something like this: suppose we have these points, and for new data we want to predict the weight from the height. Suppose I want to predict the weight for a new height. What will be the weight I wa