Data Science & Machine Learning Bootcamp: Class 2 of 10 - Numpy and Pandas for Data Analysis | Dr. Junaid Qazi, PhD | Skillshare

Data Science & Machine Learning Bootcamp: Class 2 of 10 - Numpy and Pandas for Data Analysis

Dr. Junaid Qazi, PhD, Data Scientist

Lessons in This Class

22 Lessons (5h 8m)
    • 1. Welcome - Class 2

    • 2. S3: What is Numpy? A brief introduction and installation instructions.

    • 3. S3: NumPy Essentials - NumPy arrays, built-in methods, array methods and attributes.

    • 4. S3: NumPy Essentials - Indexing, slicing, broadcasting & boolean masking

    • 5. S3: NumPy Essentials - Arithmetic Operations & Universal Functions

    • 6. S3: NumPy Essentials Exercises Overview

    • 7. S3: NumPy Essentials Exercises Solutions

    • 8. S4: What is pandas? A brief introduction and installation instructions.

    • 9. S4: Pandas Introduction.

    • 10. S4: Pandas Essentials - Pandas Data Structures - Series

    • 11. S4: Pandas Essentials - Pandas Data Structures - DataFrame

    • 12. S4: Pandas Essentials - Hierarchical Indexing

    • 13. S4: Pandas Essentials - Handling Missing Data

    • 14. S4: Pandas Essentials - Data Wrangling - Combining, merging, joining

    • 15. S4: Pandas Essentials - Groupby

    • 16. S4: Pandas Essentials - Useful Methods and Operations

    • 17. S4: Pandas Essentials - Project 1 (Overview) Customer Purchases Data

    • 18. S4: Pandas Essentials - Project 1 (Solutions) Customer Purchases Data

    • 19. S4: Pandas Essentials - Project 2 (Overview) Chicago Payroll Data

    • 20. S4: Pandas Essentials - Project 2 (Solutions Part 1) Chicago Payroll Data

    • 21. S4: Pandas Essentials - Project 2 (Solutions Part 2) Chicago Payroll Data

    • 22. See you in the next class

About This Class

This is Class 2 of the Data Science and Machine Learning Bootcamp.

Reminder: Please use the suggested versions of the libraries during the course/class. It is recommended to use the .yml file to create the environment. You can watch lectures 6 and 7 from Class 1 to set up the environment for the course.

We are working with these versions of numpy and pandas:

numpy=1.11.3 and pandas=0.20.0

We learned Python Essentials in Class 1. Let's move on and explore NumPy and pandas for data analysis now!

If you have already downloaded the Course_Material, you don't need to download it again. Otherwise, you can download the Course_Material from the panel on your right.

Meet Your Teacher

Dr. Junaid Qazi, PhD

Data Scientist


Dr. Qazi has a solid knowledge of mathematics and statistics, which are key to data science and machine learning. He holds an MS in Computer Science and a PhD. As a mentor and research scientist with over 17 years of professional experience, Dr. Qazi has developed a skill set in data cleaning/mining, data analysis and data modelling, project management, teaching and training, and career advising while working with academic and industrial giants. Dr. Qazi has also served in academia for several years at the rank of lecturer and assistant professor. During his career, he won several funding awards for his research ideas and published high-quality articles in well-reputed international journals in collaboration with leading scientists from University of British Columbia, Canada; Uni...

1. Welcome - Class 2:
2. S3: What is Numpy? A brief introduction and installation instructions.: So welcome back, guys. Excellent work. So far you have finished the section on Python essentials, the most important and key concepts to do data science. Let's move on and learn NumPy for data analysis in Python. NumPy is actually short for Numerical Python, and it is one of the most powerful and fundamental packages for numerical computing in Python. It is a linear algebra library which is powerful and incredibly fast, and it also provides tools to integrate C, C++ and Fortran code. It's important to note that almost all the libraries in the PyData ecosystem depend on NumPy as one of their most important and main building blocks. So learning NumPy is very important if we want to do numerical computing in Python. It's very important to install NumPy now because we're going to work with NumPy. If you have Python installed using the Anaconda distribution, go to the terminal and type conda install numpy. If you don't have the Anaconda distribution installed, you can use the pip command, pip install numpy. In this course, we will be dealing with NumPy arrays. These are actually matrices and vectors. Just recall that a matrix is simply a rectangular array of numbers, whereas a vector is a row or column of a matrix, so a matrix can still have only one row or one column. As discussed at the beginning of the course, if you have a question, a good way is to check the FAQ section. If you don't find an answer to your question in the FAQ section, go ahead and check the question-and-answer forum using appropriate keywords. If you still don't find an answer to your question, a good idea is to post a new question with context. We're always there to help you and will reply as soon as possible.
Remember, posting questions in the question-and-answer forum will also engage others, and you will create connections that could be helpful for everyone to grow as a data scientist. Once again, I want to thank you for enrolling in my course. See you in the next lecture. Good luck.
3. S3: NumPy Essentials - NumPy arrays, built-in methods, array methods and attributes.: Hi guys. Welcome to the NumPy Essentials lecture, part one. As a fundamental package for scientific computing, NumPy provides the foundations of mathematical, scientific, engineering and data science programming within the Python ecosystem. NumPy's main object is the homogeneous multidimensional array. In this lecture, we will go through a range of important concepts and built-in functions that we will be using frequently in the coming sections. As usual, this notebook is a reference to the video lecture. You can always explore this notebook if you need help. I hope that you have already installed NumPy. Let's move on and create a new notebook to explore more about NumPy. Go to File, New Notebook, Python 3, and here we are in the new notebook. As we have already installed NumPy, we have to import this library if we want to use it. To import, we simply write the command import numpy as np, and this is the usual way of importing NumPy, as np. Let's run this cell. NumPy has many built-in functions and lots of capabilities. In this lecture, we will focus on some of the most important and key concepts that we will be using in this course. Let's start with NumPy arrays. NumPy arrays will be the main concept that we will be using in this course. NumPy arrays essentially come in two flavors: vectors and matrices. Vectors are strictly one-dimensional arrays, whereas matrices are two-dimensional. A matrix can still have only one row or one column. We can create a NumPy array from Python data structures, like tuples and lists. Let's create a Python list.
First, my_list equal to, say, -1, 0 and 1. Let's run this cell and check how my_list looks. Let's check its type as well. So let's run this cell. We have my_list, -1, 0 and 1, and its type is Python list. To create a NumPy array from a Python data structure, we use NumPy's array function. The array function can be accessed by typing np.array. Because we now have numpy as np, we can say np.array; we don't need to say numpy.array. We need to pass our Python data structure, which is my_list, as a parameter to the array function. Let's try this: my_array equal to np.array, and let's pass in my_list. Let's check how my_array looks, and also check its data type as well. Let's run this. So my_array is -1, 0, 1, the same as the list here, and now my_array is a NumPy ndarray object. So we have created a NumPy array using a Python data structure, the list. Let's create a two-dimensional NumPy array. We need to pass a list of lists to generate this, like you can see: my_matrix equal to 1, 2, 3; 4, 5, 6; 7, 8, 9. So we have my_matrix as a list of lists in Python. Let's see how my_matrix looks, and we can check its type as well. Let's run this cell. So my_matrix is a list of lists, and its type is list. Let's pass my_matrix to the np.array function and generate a NumPy matrix. Let's call it matrix_one, our first matrix: np.array, passing in my_matrix. Let's run this cell and now check how the matrix looks, and we can check its type as well. So here it is. We have a two-dimensional matrix and its type is NumPy array. So far, we have used lists to create NumPy arrays. We can use a tuple instead. So let's create a tuple, my_tuple, equal to, say, -1, 0 and 1, the same values. We can pass my_tuple to np.array again; the t stands for tuple: my_array_t equal to np.array of my_tuple.
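
The array-creation steps described so far can be sketched as follows. This is a minimal illustration rather than the lecture's actual notebook; the variable names (my_list, my_array, my_matrix, matrix_one, my_tuple, my_array_t) are reconstructed from the spoken transcript and may differ from the original.

```python
import numpy as np

# A one-dimensional array from a Python list
my_list = [-1, 0, 1]
my_array = np.array(my_list)
print(type(my_list))     # <class 'list'>
print(type(my_array))    # <class 'numpy.ndarray'>

# A two-dimensional array from a list of lists
my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix_one = np.array(my_matrix)
print(matrix_one.shape)  # (3, 3)

# A tuple works the same way as a list
my_tuple = (-1, 0, 1)
my_array_t = np.array(my_tuple)
print(my_array_t)        # [-1  0  1]
```
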
We pass the tuple in, run this cell, and see how my_array_t looks, and we can check its type as well. So we have created a NumPy array using a tuple, and here it is: -1, 0 and 1, and its type is NumPy array. We can also create NumPy arrays using NumPy's built-in functions. Most of the time, we use the built-in functions to create arrays because they're much simpler and faster. Let's discuss the first function that we usually use to create a NumPy array, arange. It is very similar to Python's range function. Its syntax is similar: we give a start point, a stop point, and then a step. It returns evenly spaced values within a given interval. We call this function through np, as np.arange. As usual, we can check its documentation using Shift+Tab. Here it is: start point, stop point and step. If you want, you can read this documentation along with some examples. So let's pass in 0 as the start point, say 11 as the stop or end point, and we want step 2, and let's run this cell. Here it is. We have a NumPy array 0, 2, 4, 6, 8 and 10, with step size 2, from 0 to 11. We can give this function a data type as well. Let's copy this one, paste it here, and tell the function that dtype is equal to float. What we want is an array of floating-point numbers, so let's run this. Instead of 0, 2, 4 we have 0., 2., 4., because we have mentioned that we want the data type as float. The next very important built-in function is linspace, l-i-n-s-p-a-c-e. We use this with np dot as well. linspace returns evenly spaced numbers over a specified interval. We can check its docstring with Shift+Tab. Here it is: we have a start point, a stop point, some other parameters and the data type as well. Let's pass in 1 as the start point, 15 as the stop point, and 15 as the number of points. So what we want is to start from 1 and end at 15, with 15 equally spaced points between 1 and 15. Let's run this code here.
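
A short sketch of the arange and linspace calls just described, using the same arguments as the lecture. The variable names are illustrative assumptions, not taken from the course notebook.

```python
import numpy as np

# arange(start, stop, step): the stop value 11 is excluded
evens = np.arange(0, 11, 2)
print(evens)                # [ 0  2  4  6  8 10]

# The same values as floating-point numbers
evens_float = np.arange(0, 11, 2, dtype=float)
print(evens_float.dtype)    # float64

# linspace(start, stop, num): the third argument is the number of points,
# and the stop value 15 is included
points = np.linspace(1, 15, 15)
print(points.size)          # 15
print(points[0], points[-1])
```
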
So here we have 1, 2, 3, 4, 5 up to 15. We have 15 equally spaced points in our output. Let's look at another very important parameter in linspace. Shift+Tab: retstep, and the default value is False. If retstep is True, linspace also returns the step size. Let's pass in True: copy this from above, paste it here, add the other parameter, retstep, and run this code. So now we have 1.0 in addition to the array. This 1.0 is because of this parameter, which is telling us that the step size in this array is 1. Let's consider another example and, instead of 15, pass in 30 here and run this. Now we have a different step size. Let's pass in retstep equal to True again, and here it is. We have the step size here, 0.4827 and so on, instead of 1. The step size is different because we want 30 points in this case. One thing not to confuse between arange and linspace: arange takes the third argument as the step size. Let's check this: start, stop and step. Let's check the documentation of linspace. linspace, on the other hand, takes the third argument as the number of points we want; the default is 50. So let's move on to the next very important built-in function, zeros. We need to call it with np again. zeros creates an array with all zeros. If we say np.zeros and pass in 3 as an argument, it creates an array of three zeros. If we pass in 10, it creates an array with 10 zeros, and if we pass in a tuple, say (4, 6), it creates a two-dimensional matrix with four rows and six columns. So now we can create a two-dimensional as well as a one-dimensional array using zeros. Similar to zeros, there is another built-in function, np.ones. If we pass in 4, it returns four ones, and if we pass in the tuple (4, 6), as we passed to zeros, it returns a two-dimensional array of ones. So it's very similar to zeros. ones and zeros are very similar.
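
The retstep, zeros and ones behaviour described above can be checked with a small sketch like this one. The variable names are illustrative assumptions.

```python
import numpy as np

# With retstep=True, linspace returns a (samples, step) pair
values_15, step_15 = np.linspace(1, 15, 15, retstep=True)
print(step_15)              # 1.0

# 30 points over the same interval give a smaller step: (15 - 1) / 29
values_30, step_30 = np.linspace(1, 15, 30, retstep=True)
print(round(step_30, 4))    # 0.4828

# zeros and ones take an integer (1-D) or a tuple (rows, columns)
z1 = np.zeros(3)            # three zeros
z2 = np.zeros((4, 6))       # 4 rows, 6 columns of zeros
o2 = np.ones((4, 6))        # 4 rows, 6 columns of ones
print(z1.shape, z2.shape, o2.shape)
```
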
The only difference is that zeros gives us all zero values and ones gives us all one values. So let's move on to another one, eye. We need np.eye, and it creates an identity matrix. An identity matrix must be a square matrix. This is useful in several ways in linear algebra problems, and I hope many of you are familiar with the identity matrix. eye returns a two-dimensional array with ones along the diagonal and zeros at all other places; in an identity matrix, we have ones along the diagonal. Let's check its documentation. It needs N as a parameter, and N is the number of rows in the output. Let's pass in 4 here. So we have created four rows, and since an identity matrix must be a square matrix, it has four rows and four columns, with ones along the diagonal and zeros at all other places. Moving forward to random number creation in NumPy: we can also create arrays of random numbers using NumPy's built-in functions in the random module, which we can access with np.random. Let's start with the first function in this random module, rand, r-a-n-d. rand creates an array of the given shape and populates it with random samples from a uniform distribution over zero and one. Instead of np dot, we need to use np.random, because rand is a part of random. So np.random.rand, and let's pass in 3 here. What are we getting? A three-element, one-dimensional array, because we passed in 3, and all these numbers are between zero and one. Let's copy this one, paste it here, and check its docstring. If we scroll down, we see another example here, np.random.rand(3, 2). So let's pass in the same 3, 2. Now what we're asking for is a random-number matrix with three rows and two columns. Let's run this. So we have one, two, three rows, and we have two columns here.
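
A minimal sketch of eye and rand as described above. Because rand draws random samples, the printed values differ on every run; only the shapes and the value range are predictable. The variable names are illustrative assumptions.

```python
import numpy as np

# 4 x 4 identity matrix: ones on the diagonal, zeros elsewhere
identity = np.eye(4)
print(identity)

# Uniform random samples in the half-open interval [0, 1)
uniform_1d = np.random.rand(3)      # 1-D array with 3 elements
uniform_2d = np.random.rand(3, 2)   # 3 rows, 2 columns
print(uniform_1d.shape)             # (3,)
print(uniform_2d.shape)             # (3, 2)
```
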
There is another very useful and similar function in NumPy, randn. randn returns a sample from the standard normal, or Gaussian, distribution. If we want to use randn, let's copy this from here; we have to access it with np.random.randn. Let's pass in 2 here, and now it's generating two numbers. So let's pass in 4, 4. Now we want a matrix with four rows and four columns, so it's generating a matrix with four rows and four columns. As we said, it is a sample from the standard normal, or Gaussian, distribution. Let's talk about another useful function in the same category, randint. We need np.random.randint. Let's pass in 1 and 100 and run this code here. What is this doing? randint takes a low and a high value and generates a single random integer between 1, inclusive, and 100, exclusive. If we give only these two parameters, it generates only one random number. Let's look at the docstring. For low, we have 1 here; for high, we have 100. It returns random integers from low to high, exclusive. The third parameter is size. Here it is: size. If there's no size, the default is None, in which case a single value is returned. We're not passing any size, so it returns a single value, and since this is random number generation, it generates a different output each time: 50 here, 73 here, and so on. Let's pass in the size. We want 4. Now it's generating an array of four random numbers, because we're asking it to generate four random numbers. If we pass in 10, it generates an array of 10 random numbers. What if we want a matrix in the output? We can do this as well. Say we want a 2-by-3 matrix. Now it's generating a matrix, a two-dimensional array with two rows and three columns.
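
The randn and randint calls above can be sketched as follows. The outputs are random, so no specific values are shown; the variable names are illustrative assumptions.

```python
import numpy as np

# Samples from the standard normal (Gaussian) distribution
normal_1d = np.random.randn(2)       # 2 values
normal_2d = np.random.randn(4, 4)    # 4 x 4 matrix

# randint(low, high): low is inclusive, high is exclusive
single = np.random.randint(1, 100)           # one integer, since size is None
four = np.random.randint(1, 100, 4)          # 1-D array of 4 integers
matrix = np.random.randint(1, 100, (2, 3))   # 2 rows, 3 columns

print(normal_2d.shape)   # (4, 4)
print(four.shape)        # (4,)
print(matrix.shape)      # (2, 3)
```
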
We can pass in 4 by 4, and it generates a four-by-four matrix of random numbers. If we run it again and again, the numbers change because we're generating random numbers. These are a few very important functions that we have explored. Let's move on to array methods and attributes. Some very important methods and attributes we need to know here are reshape, max, min, argmax and argmin. We're going to explore all these methods now. Let's generate an array using arange, np.arange, passing in 16, and generate another array with randint, np.random.randint. We can press Tab here to auto-complete. There are lots of options; type "ra" and now we get fewer options. Pass in 0, 100 and how many values we want, say 10. Let's see how the output looks for array_arange and array_randint. We have two arrays; we have used arange and randint. So we have 0, 1, up to 15 and some random numbers. Let's use a built-in method, reshape. reshape returns an array containing the same data with a new shape. Say we have an array here and we want to reshape it to a four-by-four matrix. What we can do is take this array and call the method on it, and it gets reshaped. So array_arange.reshape, and we tell the method what shape we want. We want 4 by 4, and now it's reshaping this one-dimensional array into a two-dimensional array with a four-by-four shape. The next ones are max and min. What if we want to know the maximum number in this array and the minimum number? It's very simple. We just call max on the array; let's take array_randint, and we call min on the same array. Let's run this cell: 87 and 13. If we look at this array, 87 is the maximum and 13 is the minimum. The next important ones that we're going to use in our course are argmax and argmin.
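
A small sketch of reshape, max and min. The lecture's second array is random, so a fixed stand-in is used here instead: array_randint below is a hypothetical array, chosen only so that its maximum is 87 and its minimum is 13 as in the lecture.

```python
import numpy as np

array_arange = np.arange(16)           # 0, 1, ..., 15

# Hypothetical stand-in for np.random.randint(0, 100, 10),
# fixed so that the results are reproducible
array_randint = np.array([45, 62, 30, 71, 54, 28, 66, 87, 13, 39])

# reshape returns the same data in a new shape; 4 * 4 must equal 16
matrix = array_arange.reshape(4, 4)
print(matrix.shape)           # (4, 4)

print(array_randint.max())    # 87
print(array_randint.min())    # 13
```
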
argmax and argmin are used to find the index locations of the maximum and minimum values in our array. Say we want to know the index location of 87. We will use argmax on array_randint. So let's copy array_randint, call the method argmax, and run this. We have 87 at the seventh location in this array: 0, 1, 2, 3, 4, 5, 6, 7. So this is the seventh location. argmax returns the location of the maximum number in an array and, in a similar way, argmin returns the location of the minimum number. So we call argmin. If we look at this, 0, 1, 2, 3, 4, 5, 6, 7, 8: 13 is at location eight in the array. Let's move on to some other really important concepts, like if we want to know the size of the array, the shape of the array, or what type of data it has. Let's use the same array: copy it, paste it, and run the cell again. So now we have the arange array from 0 to 15. What if we want to know the shape of this array? We simply call shape, and here it is. It's a 16-element, one-dimensional array. What if we want to know the size of this array? We call another attribute, size. Its size is 16; it has 16 elements. What if we want to know its data type? In the same way, we call dtype, and its data type is int. Let's reshape this array to 4 by 4 using reshape. Now we have a two-dimensional matrix. Again, check its shape: it is four by four. So we're rearranging this array to four by four and then checking again; it returns the shape of this new array. We can also reshape to 16 by 1, and now we have a 16-by-1 array. We can do 1 by 16 as well. There are more options here, and each gives a different shape. So we're done with part one of NumPy essentials. We have learned some very useful functions: how to create arrays, how to use different attributes on NumPy arrays, how to generate random numbers and several other very important functions. So let's move on to the next section.
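
The argmax/argmin calls and the shape, size and dtype attributes can be sketched like this. As before, array_randint is a hypothetical fixed array standing in for the lecture's random one, arranged so that 87 sits at index 7 and 13 at index 8.

```python
import numpy as np

# Hypothetical stand-in for the random array used in the lecture
array_randint = np.array([45, 62, 30, 71, 54, 28, 66, 87, 13, 39])
print(array_randint.argmax())    # 7, the index of the maximum (87)
print(array_randint.argmin())    # 8, the index of the minimum (13)

array_arange = np.arange(16)
print(array_arange.shape)        # (16,)
print(array_arange.size)         # 16
print(array_arange.dtype)        # an integer type; the exact width depends on the platform

# Different shapes of the same 16 elements
print(array_arange.reshape(4, 4).shape)    # (4, 4)
print(array_arange.reshape(16, 1).shape)   # (16, 1)
print(array_arange.reshape(1, 16).shape)   # (1, 16)
```
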
In that section we will learn some other important NumPy concepts. See you in the next lecture. Good luck.
4. S3: NumPy Essentials - Indexing, slicing, broadcasting & boolean masking: Hi, guys. Welcome to the NumPy Essentials lecture, part two. In this lecture, we're going to talk about a few other very important concepts such as indexing, slicing, broadcasting and Boolean masking. We will learn how to reference elements within an array and how to assign values to the elements within an array. So we will learn how to select an element or a group of elements from a NumPy array. We will talk about broadcasting, a very new concept at this stage, and about Boolean masks, or masking operations. We will do some masking in this section as well. Once again, this notebook can be considered a reference to the video lecture, and you can come back any time if you want help. We can either create a new notebook or move on with the previous notebook, which we already have. In this notebook, we don't need to import NumPy because we have already imported numpy as np. If we create a new notebook, we need to import NumPy again. Let's create a simple one-dimensional NumPy array. We can use arange or we can create it using a list. So the array, we call it array_1d, is equal to np.array, and let's pass in a list: -10, -2, 0, 2, 17, 106, some random numbers, and 200. Let's run this cell and see how the array looks, array_1d. Here it is. We have the array: -10, -2 and so on. This is what we have created. Now, in the simplest case, selecting one or more elements of a NumPy array looks very similar to Python lists. Let's copy this one, and if we want the element at index zero, we have -10. If we want a slice or a range, let's copy this here and say we want 0 to 3. It's the same as a list. Let's print this array along with this one just to compare. So this is the same as a list: from 0 to 3.
But the third index is not included: zero, one, two. So from 0 to 3 we get -10, -2 and 0. We can use negative indexing as well. Let's copy this code, paste it here and enter, say, -2 or -1. So here it is: 106 from this array, which is at index -2. And in a similar way to what we did with Python lists, we can say colon two; let's see what it gives us. Anything from index zero up to two, but two is not included: zero, one. Index two is not included. In the same way, if you remember, we can say two colon, and we don't pass anything after it. In this case, we get everything from index two onward in the array: zero, one, two; this is two, and everything after that. Let's try one more thing here. Let's pass in index 100. Do you think we have any index in the array which is 100? No, we don't. Let's see what it gives. OK, so if the index is not present in the array, we get an IndexError. Let's create a two-dimensional array with 24 elements using arange, and convert that into a 2D matrix. We can do this: array_2d equal to np.arange(24). Let's print this to see what it creates. We have a one-dimensional array, but what we want is a two-dimensional array. We can use reshape here: reshape(6, 4). And here it is. We have a two-dimensional array, 0, 1, 2 up to 23, 24 elements. To access any element, the general format is array_2d, then we pass in the row, and then we pass in the column. So if we want 1, the row is zero and the column is one. Instead of passing the row and column in two square brackets, we can use another notation, a comma-separated row and column, as well. Let's try to access some elements in this array_2d. If we pass in 2, 2, we get 10. Let's see where 10 is: 0, 1, 2; 0, 1, 2. This is 10 here. We can also write array_2d and pass in just 1, and it returns the whole row, a complete row: 4, 5, 6, 7 at index one.
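
The indexing and slicing steps above can be sketched as follows. The values of array_1d are reconstructed from the spoken transcript, so treat the exact list as an assumption.

```python
import numpy as np

array_1d = np.array([-10, -2, 0, 2, 17, 106, 200])

print(array_1d[0])      # -10
print(array_1d[0:3])    # indices 0, 1, 2 (index 3 is excluded)
print(array_1d[-2])     # 106, second from the end
print(array_1d[:2])     # everything before index 2
print(array_1d[2:])     # index 2 and everything after it

# An index that does not exist raises IndexError
try:
    array_1d[100]
except IndexError as err:
    print("IndexError:", err)

# Two-dimensional indexing on a 6 x 4 array holding 0..23
array_2d = np.arange(24).reshape(6, 4)
print(array_2d[0][1])   # 1, using two pairs of brackets
print(array_2d[0, 1])   # 1, using the comma-separated notation
print(array_2d[2, 2])   # 10
print(array_2d[1])      # the complete row at index 1
```
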
Another way is to say row equal to 2 and col equal to 3, and then we can say array_2d and pass in row, col: simply assigning values to variables and passing those variables, and we get the element at 2, 3. So row two, 0, 1, 2, and column three, 0, 1, 2, 3: this is the element here. We can also slice array_2d: say we pass in 2 to 4 for rows and 2 to 4 for columns. Now we're getting the elements from 2 to 4 in rows and 2 to 4 in columns, so the output is a two-dimensional matrix which is a part of this matrix. Moving forward, let's talk about a very interesting and very elegant concept in NumPy, broadcasting. NumPy arrays are different from normal Python lists because of their ability to broadcast. We will only cover the basics of broadcasting in this lecture. You can explore more by opening the link provided in the reference notebook, which leads to the extensive documentation on broadcasting. If you want to know more about broadcasting, you can go through the examples in NumPy's official documentation. Let's start with a very simple example. Let's create an array, array_1d, equal to np.arange(0, 10), and let's see how this array looks. Here we have a one-dimensional array from 0 to 9. Let's take a slice, array_1d from 0 to 5, and broadcast a value, say 500, and run this. Let's see how this array looks now. Here it is. The first five values are now 500. We only broadcast this value onto the selected slice, and all the values in it are changed. Let's try to learn with another example, using a two-dimensional array: array_2d equal to np.ones, a four-by-four matrix. Let's see how this array looks. Here we have a four-by-four matrix. Let's broadcast some value to, say, the first row only. We can do this: array_2d at index zero, which is the first row, equal to, say, 300. Run this and see what has changed. Here it is.
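
A sketch of the two-dimensional slicing and of broadcasting a scalar onto a slice, as described above. The name ones_2d is an assumption used here to keep the two 2-D arrays apart; the lecture appears to reuse array_2d for both.

```python
import numpy as np

array_2d = np.arange(24).reshape(6, 4)

# Indexing with variables
row, col = 2, 3
print(array_2d[row, col])     # 11

# Rows 2..3 and columns 2..3 (the stop index 4 is excluded)
print(array_2d[2:4, 2:4])     # a 2 x 2 sub-matrix

# Broadcasting a single value onto a slice
array_1d = np.arange(0, 10)
array_1d[0:5] = 500
print(array_1d)               # the first five values are now 500

# Broadcasting a single value onto a whole row
ones_2d = np.ones((4, 4))
ones_2d[0] = 300
print(ones_2d)                # the first row is 300, the rest stay 1
```
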
So with only a single statement, the whole row is changed. Let's create another array, arr, equal to np.arange(0, 4). Let's see how it looks. We have 0, 1, 2, 3, so it has four columns and one row. Let's add this to array_2d and see what we get. This one-dimensional array is broadcast row by row, from top to bottom. Let's try to understand this broadcasting concept with a visual example. Let's go back to the reference notebook. Here we have a very good example. If we look at the first row, we have 0, 1, 2, three elements in a one-dimensional array. If we want to add a single element, 5, this is broadcasting. The 5 will be broadcast over this one-dimensional array, so we will have 5 plus 0, 5 plus 1 and so on: 5, 6, 7. In the second one, we have a two-dimensional matrix, three by three in this case, with all ones, and we want to broadcast a one-dimensional array with 0, 1, 2. This will be broadcast across the rows, so all of these rows will be affected: 0 plus 1, 1 plus 1, 2 plus 1, giving 1, 2, 3; 1, 2, 3; 1, 2, 3. In the third one, we have a column 0, 1, 2, and we want to broadcast an array 0, 1, 2 onto this column. This is quite interesting: we broadcast a column with a single row. What are we getting? The column is stretched to 0, 0, 0; 1, 1, 1; 2, 2, 2. So the whole first row will be considered zero, and the values are added according to the given operation, with 0, 1, 2 in this case, giving 0, 1, 2. The second element in the column is one, so everything across that row will be considered one. The single row, 0, 1, 2, will be considered in the second place as well, and we will get 1, 2, 3. The same goes for the third row: the column gives 2, 2, 2, the row is again considered 0, 1, 2, and we're getting 2, 3, 4. So let's move back to the notebook and learn with another example. Say we want to broadcast 300 onto this whole two-dimensional array: array_2d plus 300. Let's see how it looks. We're getting 300 plus 300, so 600, 600, 600, 600 in the first row, and 1 plus 300, so 301, 301, 301, 301 in the rows that were all ones.
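
The three visual broadcasting cases and the notebook example can be reproduced with a short sketch. The variable names are illustrative assumptions; the shapes follow the description above.

```python
import numpy as np

# Case 1: a scalar is stretched over a 1-D array
case_1 = np.arange(3) + 5
print(case_1)    # [5 6 7]

# Case 2: a 1-D array is stretched across every row of a 3 x 3 matrix
case_2 = np.ones((3, 3)) + np.arange(3)
print(case_2)    # every row is 1, 2, 3

# Case 3: a column (3 x 1) and a row (3,) are both stretched to 3 x 3
column = np.arange(3).reshape(3, 1)
case_3 = column + np.arange(3)
print(case_3)    # rows: 0 1 2 / 1 2 3 / 2 3 4

# Notebook example: first row set to 300, then 300 added everywhere
array_2d = np.ones((4, 4))
array_2d[0] = 300
result = array_2d + 300
print(result)    # first row 600, remaining rows 301
```
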
This 300 is broadcast over the two-dimensional matrix. Let's try another example. We have arr, which is 0, 1, 2, 3. We create another one, arr_2, equal to np.arange(0, 4), and now we want it only as a column. So we pass in everything in the first place, which is the row, and np.newaxis in the second, so that every element gets a new axis. Let's see how arr_2 looks. So we have 0, 1, 2, 3 as a row and 0, 1, 2, 3 as a column. Let's broadcast arr_2: what we're going to do is arr_2 plus arr. Let's run this and see how it looks. So this is what it is. What are we getting? 0, 1, 2, 3: for the first row, the whole row will be considered zero, and this single row will be broadcast onto it. In the second case, zero plus one, one plus one, two plus one, three plus one, so the elements that don't exist will all be considered one. The same again for the third row: zero plus 2, one plus 2, two plus 2, three plus 2. And for the fourth row: zero plus three, one plus three, two plus three, three plus three, which is 6. So this was all about broadcasting for the moment. I hope you got an idea of how the broadcasting concept works in NumPy arrays. Let's move back to our working notebook and discuss another important concept, fancy indexing. So what is fancy indexing? Fancy indexing allows us to select entire rows or columns out of order. To do this, let's create a matrix. I'll use the code from the reference notebook to create the matrix: copy this code, paste it here and run it. This code generates a matrix with rows zero, one, two, three and four, containing all zeros, all ones, all twos, all threes and all fours. So I can have any row. If I pass in a list of 1, 2, 3, I get rows 1, 2, 3 in the output. I can just pass in a list and grab these rows. I don't need to keep the order here, so I can grab rows in any order as well. If I want 3, 0, 1, it returns rows three, zero and one, and so on. Let's create another matrix and use fancy indexing on that matrix.
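
A sketch of the np.newaxis example and of fancy indexing on rows. The reference notebook's code for the row-valued matrix is not shown in the transcript, so the np.repeat construction and the column count of 4 below are assumptions; any matrix whose row i is filled with the value i would do.

```python
import numpy as np

arr = np.arange(0, 4)                    # shape (4,): a single row
arr_2 = np.arange(0, 4)[:, np.newaxis]   # shape (4, 1): a column
total = arr_2 + arr                      # both are stretched to (4, 4)
print(total)    # entry [i, j] equals i + j

# Hypothetical construction: row i holds the value i in every column
matrix = np.repeat(np.arange(5), 4).reshape(5, 4)

# Fancy indexing: pass a list of row indices, in any order
print(matrix[[1, 2, 3]])    # rows 1, 2 and 3
print(matrix[[3, 0, 1]])    # rows 3, 0 and 1, in that order
```
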
I will again copy the code from the reference notebook. So this is very simple. I'm creating a two-dimensional matrix with np.arange(24), giving it a shape of 6 by 4, and then printing it. So I have 0, 4, 8, 12, 16, 20 in the first column, and it has six rows and four columns. So let's grab rows from this matrix. Let's say I want to grab rows 2 and 4. Passing in the list 2 and 4, I'm grabbing the rows at index 2 and index 4, counting 0, 1, 2, 3 and 4. I can grab them in any order as well, like passing in 5 and 0. So I have 0, 1, 2, 3, 4, 5, the last row, and then 0, the first one. I can grab the columns as well. In this case, I pass in everything with a colon in the first place, which means all rows, and then pass in the columns. Say I want 2 and 3. Here it is, I have the columns at index 2 and 3. I can pass them in any order as well, like 3 and 0, and I have those columns here, counting 0, 1, 2, 3: column 3 and then column 0. So this was all about fancy indexing, where we don't need to worry about the order and we can grab complete rows or complete columns in any order. So let's move on to the next topic, which is boolean masking. A boolean mask is very useful and handy when it comes to counting, modifying, extracting or manipulating values in an array based on some condition or criterion. For example, we want to count all the values greater than a certain value, or we set a threshold and want to get rid of the outliers in our data. In NumPy, boolean masking is often the most efficient way to accomplish such tasks. Let's start with a simple example and create another array, array_1. Once again, we can use np.arange, and let's say we want 1 to 10. It is always a good idea to see how the array looks. So we have an array from 1 up to 10. We can apply conditions such as greater than, less than, equal to, etcetera. Let's create a boolean array for some condition.
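The fancy indexing steps on the 6 by 4 matrix can be sketched like this. The variable names are assumptions; the index lists follow the ones mentioned in the lecture.

```python
import numpy as np

# Six rows and four columns holding 0..23.
arr_2d = np.arange(24).reshape(6, 4)

# Rows picked by a list of indices, in or out of order.
rows_2_4 = arr_2d[[2, 4]]
rows_5_0 = arr_2d[[5, 0]]

# Columns: a colon keeps all rows, the list picks the columns.
cols_2_3 = arr_2d[:, [2, 3]]
cols_3_0 = arr_2d[:, [3, 0]]

print(rows_2_4)
print(rows_5_0)
print(cols_2_3)
print(cols_3_0)
```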
Say array_1 is greater than 3. Let's call it bool_mask, equal to our condition, array_1 greater than 3. So we have created a boolean mask. Let's see how this boolean mask looks. Here it is. It has Falses and Trues. If we compare the array with this boolean mask, the condition is array_1 greater than 3. Is 1 greater than 3? No. 2? No. 3? No. And at 4 the condition is satisfied, so for all the values from 4 up to 10 we get True. So bool_mask is an array of Falses and Trues. Now, say in this array we want to mask the even numbers. We can use the modulus operator. So let's create a mask here again, say mod_2_mask, equal to array_1 modulo 2, equal to 0. We run this and then print the mask. So we have created a mask for the elements in array_1 where the remainder is equal to zero. We can use these masking arrays to do masking operations. In a masking operation, we simply index with the boolean array, like bool_mask. All the values at positions where the mask array is True are returned in our output. Let's try to do this using mod_2_mask: we want the even values from array_1, so we pass in this mask, and let's print array_1 as well. So here it is. We have filtered array_1 using this masking array, and the output holds only the even values. So let's take the two-dimensional case and create a 2D array using np.arange(24), then set its shape to 6 by 4, and create a mask where arr_2d modulo 2 is equal to 0. This is the array we have created, and we want all the even values in our output, so this is our condition. Let's run this code. And now let's print arr_2d.
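The one-dimensional masking steps above can be sketched as follows. The names array_1, bool_mask and mod_2_mask are reconstructed from the garbled transcript and should be read as assumptions.

```python
import numpy as np

array_1 = np.arange(1, 11)          # 1 up to 10 (11 is excluded)

# A comparison produces an array of True/False values.
bool_mask = array_1 > 3

# The modulus operator finds the even numbers.
mod_2_mask = array_1 % 2 == 0

# Indexing with a mask keeps only the positions that are True.
greater_than_3 = array_1[bool_mask]
evens = array_1[mod_2_mask]

# True counts as 1, so summing a mask counts the matches.
how_many = bool_mask.sum()

print(bool_mask)
print(greater_than_3)
print(evens)
print(how_many)
```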
And let's print this mask as well. Here it is. For all even values it is True: for 0 it's True, for 1 it's False, for 2, 6, 10 it's True, and so on. Let's filter the values now using the mask we have just created. What do we need to do? We take this array, pass in this mask and run the cell, and we get 0, 2, 4, 6, 8, 10, 12 and so on up to 20 and 22. So it is actually the first and third columns where the condition is satisfied, and we are filtering out all the values which satisfy the condition in our mask. So this was all about masking. Excellent job, guys. We have gone through several very important NumPy concepts in this lecture, and we are going to use all of them in the upcoming lectures. Please go through them once again, follow the lectures and try to revise all these concepts. In the next lecture, we're going to explore more about NumPy. See you in the next lecture. Good luck. 5. S3: NumPy Essentials - Arithmetic Operations & Universal Functions: Hi guys. Welcome to the NumPy essentials lecture, part three. This is the last part of the NumPy section. Let's talk about NumPy operations in this section, such as arithmetic operations and universal functions. Let's move on to the Jupyter notebook. We can continue with the previous one, where we stopped, and in that case we don't need to import NumPy again. So let's create an array again, arr, using np.arange from 0 to 5. Let's see how this array looks, as always: 0, 1, 2, 3 and 4. What if we want to add arr to arr? We can do that: 0 plus 0, 1 plus 1, 2 plus 2 is 4, 3 plus 3, and so on. We can do other arithmetic operations as well, and it is as simple as in plain Python. We're adding them like adding numbers, like adding variables. We can multiply them and subtract them. We can do division, and here we get a warning for zero divided by zero, which is replaced with nan in our result. We can do one over arr. When we divide 1 by 0, the result is infinity, and this again raises a warning.
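The element-wise arithmetic above, including the two division cases that trigger warnings, can be sketched as below. Using np.errstate to silence the warnings is an addition for this sketch; in the lecture the warnings are simply shown.

```python
import numpy as np

arr = np.arange(0, 5)               # [0 1 2 3 4]

added = arr + arr                   # [0 2 4 6 8]
multiplied = arr * arr              # [0 1 4 9 16]
subtracted = arr - arr              # [0 0 0 0 0]

# Division by zero does not stop the program. NumPy warns and fills in
# nan for 0/0 and inf for 1/0. errstate hides the warnings here.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr               # first element is nan
    inverse = 1 / arr               # first element is inf

print(added, multiplied, subtracted)
print(ratio)
print(inverse)
```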
So moving forward, we can raise the whole array to a power, or we can do multiplication, like multiplying the array by some scalar, or multiplying the array by itself, which is again multiplication. We can compute powers, we can subtract, divide, multiply and do all these kinds of arithmetic operations on NumPy arrays. So let's talk about the universal functions now. NumPy has a range of built-in universal functions. These are essentially just mathematical operations, and we can use them to perform the specific task associated with each function. You can explore more about the universal functions in NumPy's official documentation. Let's go to the reference notebook and open the provided link. If we open this link, it says that a universal function, or ufunc for short, is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting and several other standard features. We can explore more about these functions. If you want to see the math operations, let's click this link. Here we have add, subtract, and if we move down we have exponential, log, log2, square root, square, and if we move down even further we can find the trigonometric functions like sin, cos and tan. If you want to know more about a specific function, you can click its link and it will lead you to the documentation for that function, with some examples. Let's go back to the Jupyter notebook and use some of these functions on our array to see how they perform. So coming back, let's try to apply some universal functions to our array, arr. Let's print arr first. We want to get the square root of this array. To call this function, as usual, we use np.sqrt. You can check the docstring if you want and read it whenever you like. Let's pass in arr, and our output is the square root of every element in the array.
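A few of the universal functions named above (add, exp, square, sqrt) can be tried on the same array. This is a small sketch of element-by-element behaviour, with variable names chosen for illustration.

```python
import numpy as np

arr = np.arange(0, 5)               # [0 1 2 3 4]

# Each ufunc is applied to every element and returns a new array.
roots = np.sqrt(arr)                # square root of each element
squares = np.square(arr)            # [0 1 4 9 16]
sums = np.add(arr, arr)             # same result as arr + arr
exps = np.exp(arr)                  # e raised to each element

print(roots)
print(squares)
print(sums)
print(exps)
```

Operators such as + and ** on arrays call the matching ufuncs behind the scenes, which is why both spellings give the same result.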
We can get the maximum value with np.max, or we can get the minimum value with np.min. So 4 is the maximum value and 0 is the minimum value. We can do trigonometric operations as well, like np.sin. Passing in arr, we get the sine values of all the elements in the array. We can calculate the exponential with np.exp, passing in the array, and here it is. We can take the log as well: np.log, passing in arr. Here are the log values, once again with a warning, this time for the log of zero, which is minus infinity. There's another very useful function, np.deg2rad. If we pass in arr, all these values are converted from degrees to radians. The function assumes that the values are in degrees and converts them into radians. There's another one, opposite to this one, np.rad2deg, so let's pass the result of the previous function into it. In principle, we should get our array back. So here it is: rad2deg converts these values back to degrees. So this was all about universal functions and mathematical operations in NumPy. I hope you enjoyed this lecture. This was extensive learning: in the last three lectures we have gone through lots of new concepts in NumPy. In the next lecture, we're going to have a quick overview of the NumPy exercises. After that, we will solve those exercises in the solution lecture. However, I recommend that you go through the exercises yourself before you move on to the solution lecture. See you in the next lecture. Good luck. 6. S3: NumPy Essentials Exercises Overview: Hi, guys. Welcome to the NumPy essentials practice exercises. It's time to test your knowledge. In these exercises, simple tasks are given just to see how much you have learned, like what is the major difference between a vector and a matrix, how to import the NumPy library, converting a Python list to a NumPy array, generating an array, and something like an array of five zeros.
So you're going to use the knowledge you have gained in the last three lectures. Go through all these tasks and try to get the output which is given. Only type your code where it says "please code here". If you run the cell right above the output, you will lose that output, and the new output you generate will replace it. So moving forward, here it asks you to generate the following matrix. Sometimes the output is given and you're asked to recreate that output. Moving forward, there are some random number exercises and some on applying methods to an array, and then a couple of tasks where we take a slice of a matrix, grabbing a row, a column or a part of the matrix in the output. Then it asks for the row sums and column sums, that is, to calculate the sum of all the rows and columns in a 2D array, and then to create a boolean mask and apply that mask to get this output. So these are simple tasks. I hope you will be able to solve all of them by yourself. However, if you find any difficulty, just go through the solutions in the next lecture, where we're going to complete all these tasks. See you in the next lecture. Good luck. 7. S3: NumPy Essentials Exercises Solutions: Hi guys. Welcome to the NumPy essentials practice exercises solution notebook. So let's move on and try to answer all the questions and solve the given tasks. The first one asks what the major difference between a vector and a matrix is. Please consult the lectures. A vector is actually a single row or a single column, whereas a matrix has multiple rows and multiple columns. How to import the NumPy library? This is a very simple task: import numpy as np. This is the traditional and common way of importing NumPy. The next one asks us to convert a given Python list.
So the Python list is given, and we convert it into a NumPy array and check its data type. This is the result we want in the output, and we're going to code here. Let's create an array, arr. As we used to do, we call the array function from NumPy, np.array, passing in the list. Let's run this cell first. It also asks us to print and check the type. So what we're doing is passing the list to np.array to convert the Python list into a NumPy array, and then we're printing the array and printing its type. Let's run this code. Yes, this is what it was asking for, and we get the same output. The next question asks us to generate the array 0, 1, 2, 3, 4, 5 using NumPy's built-in function arange. This is what we did several times in our previous lectures, so it's again very simple: np.arange. It asks for 0 to 5, so we pass 0 and 6, because 6 is excluded and we want values up to 5. Let's run this one. Here it is, we have the same output again. The next question asks us to generate an array of five zeros. zeros is a built-in function in NumPy, so let's call that one, np.zeros. How many do we need? Five, so let's pass in 5. Here it is, we have an array with five zeros. The next question asks us to generate the following matrix. The output is given: we have to generate a matrix with three rows and four columns, all zeros. So we're going to use zeros again, but instead of passing 5, we pass the shape we want, 3 and 4, for three rows and four columns. Let's run this cell. Here it is, we have the same output again. The next question asks us to generate five ones, 1. 1. 1. 1. 1., using a NumPy built-in function. If you remember, we learned about zeros and ones, so we can use np.ones here. How many elements are we looking for? 1, 2, 3, 4, 5, five. Notice that there is no "array" in the expected output, so we need to pass the result to print so that we don't see the Out label.
With print, the Out label is gone here and we don't see the word array either. print is the official way of producing output in Python. Here it is, we got the output it was asking for. The next question asks us to generate an array of five tens, so we need an array in which each element is 10. This is again a very simple task and can be implemented using ones. We can copy np.ones from above, which generates an array of five ones, and now we multiply it by 10. Here it is, we have an array with five tens in our output. The next question asks us to use arange to generate an array of even numbers between 50 and 100, and the hint is that 50 and 100 are not included. So let's use np.arange. If we pass in 50 and 100, we know that 100 is already excluded, but 50 is included. It asks for even numbers with 50 excluded, so we start from 52. We know there's a third argument that we can pass: since it asks for even numbers, we need a step, which is 2. Let's run this code. Here it is, we have 52, 54, 56, up to 98. The next question asks us to generate an array of 10 linearly spaced points between 0 and 1, and to output the step size as well. If you remember, we used the linspace function to create linearly spaced points. Let's try that function here: np.linspace, between 0 and 1, and the third argument is how many points we want. We want 10 points. Let's run this code. We have an array of 10 linearly spaced points between 0 and 1. One thing is missing here: we need the step size as well. If you go to the docstring, we see that we can get the step size with retstep equal to True. Let's copy this, pass in retstep equal to True and run it. Here it is, we have the step size as well. The next question asks us to perform the following tasks.
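The array-creation answers discussed so far can be collected in one sketch. The input list my_list is hypothetical, since the transcript does not say which list the exercise provides; the other calls follow the steps described above.

```python
import numpy as np

my_list = [1, 2, 3]                 # hypothetical input list
arr = np.array(my_list)             # Python list -> NumPy array
print(arr, type(arr))

zero_to_five = np.arange(0, 6)      # the stop value 6 is excluded
five_zeros = np.zeros(5)
zeros_3x4 = np.zeros((3, 4))        # shape is passed as a tuple
five_ones = np.ones(5)
five_tens = np.ones(5) * 10

# Even numbers strictly between 50 and 100: start at 52, step by 2.
evens = np.arange(52, 100, 2)

# retstep=True returns the points together with the step size.
points, step = np.linspace(0, 1, 10, retstep=True)

print(zero_to_five, five_zeros, five_ones, five_tens)
print(zeros_3x4)
print(evens)
print(points, step)
```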
The first task in this question is to generate a vector array of 25 numbers using arange, and after that to write code that converts the vector into a two-dimensional matrix using reshape. It also asks whether we can use shape instead of reshape. Let's start with arr equal to np.arange. It asks for 25 numbers, so we pass in 25 and print the array. Here it is, we have 0, 1, 2, 3, 4 and so on. The second task is to write code to convert the vector array into a 2D matrix using reshape. reshape converts the array into a 2D matrix, and we know we can pass in the shape here. If we look at the docstring, this is the shape parameter, so we can pass in the shape. And what is our required shape? 5 by 5. So we pass 5, 5, and here it is, we got the same output. The next task is whether we can use shape instead of reshape as well. Yes, we can use shape instead of reshape. Let's try it: arr.shape equal to 5, 5. This is how we convert a one-dimensional array to a two-dimensional array. So now, here it is, we have converted the one-dimensional array into a two-dimensional matrix using shape instead of reshape. The next question asks us to generate the following matrix, so we want this matrix in the output. What we can do is simply generate a matrix from 1 to 25 and divide it by 10 to get the output shown here. Let's try this: np.arange, passing in 1 and 26, because we want 1 up to 25 and 26 is excluded, as we already know. Let's print this one to see how it looks, and reshape it to 5 by 5. Now it looks like we're approaching the solution. If we divide this one by 10, here it is, we got the same matrix in the output. The next question asks us to write code to generate the output below, using linspace and print. So we have to generate this matrix here. If you look at the output, we have 0, 1, 2, 3, up to 24. Let's use np.linspace with 0 and 24, and we have 25 elements in this matrix. Let's run this, and we have 0 to 24.
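The difference between reshape and assigning to shape can be sketched as below. The variable names are assumptions. reshape hands back a reshaped array and leaves the original's shape alone, while assigning to shape changes the array in place.

```python
import numpy as np

# reshape returns a 5x5 array; vector itself keeps its shape (25,).
vector = np.arange(25)
matrix = vector.reshape(5, 5)

# Assigning to the shape attribute reshapes the array in place.
in_place = np.arange(25)
in_place.shape = (5, 5)

# Values 1..25 arranged 5x5 and divided by 10, as in the exercise.
scaled = np.arange(1, 26).reshape(5, 5) / 10

print(vector.shape, matrix.shape, in_place.shape)
print(scaled)
```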
So we have 0, 1, 2 up to 24, in total 25 elements. Let's reshape this to 5 by 5. So we have the matrix here now. If we look at the expected output, it uses a print statement and prints the step size at the end. So let's put this into a print statement. We got the matrix here, and now we have to print the step size, and we have to write this string as well, "the step size is". We know we can get the step size from linspace, and if we pass in index 1, we get the step size. So this is what the expected output is looking for. The next one asks what the main difference between linspace and arange is. If you remember, arange takes its third argument as a step size, whereas linspace takes its third argument as the number of points we want in the output. Like here, we wanted 25 points between 0 and 24. So this is the major difference between arange and linspace. The next question asks how to generate a single random number using a NumPy built-in function. This is again very simple: np.random.rand, and let's pass in 1. If we pass in 1, it generates a single random number, and your answer may be different because this is random number generation. The next one asks us to write code to generate a 7 by 5 matrix of 35 random numbers. So now we have to generate a matrix with 35 random numbers. Once again we use np.random.rand. We can do it in two ways. We can generate 35 numbers and then reshape to 7 by 5. This is what the output looks like, and it uses print again. We can do it another way as well, with np.random.rand, passing in the arguments 7 and 5, so we can directly generate a matrix with a 7 by 5 shape. Let's print this one here. So we got the matrix with 35 random numbers. The next question asks us to generate the following matrix using NumPy's built-in method for the identity matrix, so we call np.eye and pass in 5.
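The linspace and random-number answers above might be sketched as follows. The exact wording of the printed string is an assumption. Random output differs on every run, so only the shapes are meaningful.

```python
import numpy as np

# With retstep=True, linspace returns a tuple: (points, step size).
result = np.linspace(0, 24, 25, retstep=True)
matrix = result[0].reshape(5, 5)
step = result[1]                    # index 1 holds the step size
print(matrix)
print("The step size is", step)

# arange takes a step as its third argument, linspace a point count.
by_step = np.arange(0, 25, 1)
by_count = np.linspace(0, 24, 25)

# Uniform random numbers in [0, 1).
single = np.random.rand(1)
reshaped = np.random.rand(35).reshape(7, 5)
direct = np.random.rand(7, 5)       # same shape, generated directly
print(single)
print(direct)
```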
So we got an identity matrix. We can multiply this by 5, and this is our output. The next question asks us to generate the following matrix, arr_2d, and replicate the provided outputs. So now there are a couple of questions, 18 a, b, c and so on. Let's generate the matrix first. It asks for arr_2d equal to np.arange from 0 to 29, so we need to pass in 30 here, and then we can reshape it to six rows and five columns. Let's print this matrix here. So this is the matrix we want. Next, we need to get a slice from this matrix: 17, 18, 19, then 22, 23, 24, then 27, 28, 29. So it is this part here. We can grab it from arr_2d. Counting the rows 0, 1, 2, 3, we pass in 3 and everything afterwards, and counting the columns 0, 1, 2, it's 2 and everything afterwards. So we're grabbing this section here. Here it is. The next one asks us to grab 29. 29 is this corner element here. If we look at the matrix, counting columns 0, 1, 2, 3, 4 and rows 0, 1, 2, 3, 4, 5, it is row 5 and column 4, so we need to pass in 5 and 4. Here it is. Now it asks for this output here, so what do we need to grab? We need to grab 1, 6, 11. So we can pass in everything up to 3 for the rows, and 1 to 2 for the columns, counting 0, 1, 2. If we run this code, yes, we get 1, 6, 11. The next one asks us to grab 10, 11, 12, 13, 14. That is the third row of the matrix, so we can grab the row at index 2 from arr_2d, and for the columns we need everything. Here it is. The next one asks for the two rows at index 2 and index 3. Let's copy the previous one here. Now it asks for index 2 and 3, so we need from 2 up to 4, and 4 is not included, so it covers 2 and 3, with everything for the columns. Here it is. The next one asks whether we can calculate the sum of all the numbers in arr_2d. Yes, we can calculate the sum of all the numbers. We need to call .sum here.
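The slicing answers on the 6 by 5 matrix can be sketched like this. The result names are assumptions; the slices follow the steps described above.

```python
import numpy as np

arr_2d = np.arange(30).reshape(6, 5)    # six rows, five columns, 0..29

# Rows from index 3 onward, columns from index 2 onward.
lower_right = arr_2d[3:, 2:]

# A single element: row 5, column 4.
corner = arr_2d[5, 4]

# Rows 0..2 of column 1, kept as a column with a 1:2 slice.
one_six_eleven = arr_2d[:3, 1:2]

# The row at index 2, then the rows at index 2 and 3 (4 is excluded).
third_row = arr_2d[2, :]
two_rows = arr_2d[2:4, :]

# Sum of every element in the matrix.
total = arr_2d.sum()

print(lower_right)
print(corner)
print(one_six_eleven)
print(third_row)
print(two_rows)
print(total)
```

Passing axis=1 or axis=0 to sum would give per-row or per-column totals instead of the grand total.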
So if we call .sum here, it returns the sum of all the numbers in the matrix. The next one asks us to calculate the sum of all the rows and columns in arr_2d. So now we need the row sums and the column sums. Notice that we have to use the print statement here again. So we print the row sum, which is arr_2d, on which we call sum, and now we have to pass an axis. The first one is the row sum, so we pass axis 1. Let's copy this one and paste it here, and this one is axis 0, so this one is the column sum. Here it is, we have the row sums 10, 35, 60, 85 and so on, and the column sums. The next one asks us to calculate the standard deviation of the values in arr_2d. We can call the standard deviation method on the matrix: arr_2d.std. It's again simple. Let's run it. Here it is. The last one asks us to create a boolean mask and list the numbers in arr_2d that are not divisible by 3. First we have to create a mask. And what will our mask be? Any number that is divisible by 3 leaves a remainder of zero when we divide it by 3. So the values we want to keep are those whose remainder is not equal to zero. As usual, we take arr_2d, use the modulus operator with 3 to get the remainder, and check that it is not equal to zero. Let's print this bool mask just to see how it looks. Here it is. To filter the elements, we pass this boolean mask to arr_2d. Let's run it. There it is: 1, 2, 4, 5, 7, 8, 10, 11 and so on. And this is what we need in the output. So guys, this was all about the solution part of NumPy. I hope you got a very good understanding of NumPy and its basic concepts. Once again, NumPy is very important for moving on in this course. If you think you need more practice, go ahead and revise this section. See you in the next lecture, where we will learn another very useful Python library for data science. Good luck. 8. S4: What is pandas? A brief introduction and installation instructions.: Excellent work, guys.
Congratulations on finishing the NumPy section. Now let's move on to the next very important Python library, pandas. pandas is an open source library that provides easy-to-use data analysis tools for the Python programming language. It is built on NumPy, and that is a reason to learn NumPy first. pandas provides fast analysis, data cleaning and data preparation. It excels in performance and productivity for its users. pandas has built-in data visualization features as well, and we will learn these features in the upcoming lectures. pandas can work with data from a wide variety of sources, and we will see how data from different sources can be simply imported into pandas. Now you need to install this library. If you have the Anaconda distribution installed, go to the terminal and type conda install pandas. If you don't have the Anaconda distribution installed, you can use the pip command, pip install pandas. So see you in the next lecture with pandas installed. Good luck. 9. S4: Pandas Introduction.: Hi guys. Welcome to the pandas essentials. pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Today, pandas is actively supported by a community of like-minded individuals around the world who contribute their valuable time and energy to help make open source pandas possible. Python for Data Analysis is a great read by Wes McKinney, who is the creator of the pandas library. In this section of the course, we will learn to use pandas for data analysis. If you have never used pandas, you can think of pandas as an extremely powerful version of Excel with a lot more features. We will cover the following key concepts in this section: Series, DataFrames, indexing and selection, hierarchical indexing, data cleaning, preparation and handling of missing data. We will learn about merging, joining, combining or concatenation.
We will talk about data aggregation and groupby. Along with these, we will talk about several other useful methods and operations and much more. And at the end, there are two full data analysis exercises to practise the skills. So the first thing is to get this library working. I hope you have already installed this powerful library. If not, the best way to install pandas for this course is using the Anaconda distribution, so use conda install pandas. And if you don't have the Anaconda distribution, you can use pip install pandas. So see you in the next lecture, where we will talk about Series. Good luck. 10. S4: Pandas Essentials - Pandas Data Structures - Series: Hi guys. Welcome to the pandas data structures lecture. Series and DataFrame are the two workhorse data structures in pandas. Let's talk about Series first. A Series is a one-dimensional, array-like object which contains values and an array of labels associated with those values. A Series can be indexed using labels. A Series is similar to a NumPy array. Actually, it is built on top of the NumPy array object. A Series can hold any arbitrary Python objects. Let's get hands-on and learn the concept of Series with examples. As usual, this reference notebook is provided in the course material, and you can always come back to explore more in it. To start with, let's create a new notebook and start working in that one. So first things first, we need to import NumPy and pandas. We know we import numpy as np, and let's import pandas as pd. np and pd are aliases for NumPy and pandas. So let's run this cell. We can create a Series using a list, a NumPy array or a dictionary. Let's create these objects and convert them into pandas Series. The first thing we will try is a Series using a list. So let's create two lists: my_labels equal to, say, x, y and z, and my_data equal to, say, 100, 200 and 300. Let's talk about my_data first and convert this my_data into a pandas Series.
We can access the pandas Series as pd.Series. You can press Tab to auto-complete, and if you want, you can press Shift-Tab and explore more in the docstring. Let's talk about data for the moment. So we pass in data equal to my_data, and let's run this cell. So we got our Series. This column here, 0, 1, 2, is the automatically generated index for the elements in the Series, with the data 100, 200, 300. We can specify these index values and call the data points using them. So let's pass my_labels to the Series as the index now: pd.Series, where we can press Tab again to auto-complete, with data equal to my_data and index equal to my_labels. Let's run this one now. So now we have x, y and z as our index and 100, 200, 300 as our data. We can write this code with just my_data and my_labels as well. The reason is that if we explore with Shift-Tab, we see that the first parameter is data and the second one is index, so this is already in order. So instead of writing data equal to my_data and index equal to my_labels, the Series will take the first one as data and the second one as index. Next, let's try to create a Series using a NumPy array. First we have to create an array, my_array. We know we can create a NumPy array with np.array, and we know we can convert the list, so let's pass in my_data here, run this and print it. So we have a NumPy array with 100, 200, 300. We can convert this with pd.Series, passing in this array instead of the list. We don't need to write data equal to, but we can still write it. So data is my_array, and this is now a pandas Series created using a NumPy array. Similarly, we can pass in the index as well, which is index equal to my_labels. So we get the Series from the NumPy array. Dictionaries are very useful for creating pandas Series. Let's create the dictionary first and then convert it into a pandas Series. We call it my_dict, and its first entry is x with 100.
The next entries are y with 200 and z with 300, the same data that we used before. Let's run this and see how the dictionary looks, as usual. So we have a dictionary: x, y and z are the keys, and 100, 200, 300 are the values. So let's pass this dictionary to pd.Series and run this cell. Now we have a Series from our dictionary. Notice the difference here. If we pass a dictionary to Series, pandas takes the keys as the index, or labels, and it takes the values as the data. This is quite convenient when creating a Series, and I really like to create pandas Series using dictionaries. A Series can hold a wide variety of object types. Let's see this with some examples. Instead of passing my_data as the data, we can call pd.Series and say that our data is my_labels, which is actually a list of strings, so a Series can hold strings as well. Just as an example, let's pass built-in functions to a Series: pd.Series with a list of some functions, min, max, sum, which is another built-in function, and print. Let's run this here. So we have a pandas Series that holds the built-in functions minimum, maximum, sum and print. This is just an example, and you may not see this in real-world use. Now, once we have a Series, the most important thing is how to grab data from it. Indexes are the key thing to understand in a Series. pandas uses these indexes, which are names or numbers, like x, y, z in this case or 0, 1, 2 in this case, for fast information retrieval. The index works like a hash table or a dictionary. To understand the concept, let's create three Series. Let's create the dictionaries first. Say in the first one we have Toronto 500, Calgary 200, Vancouver 300 and Montreal, say, 700. For the second dictionary, let's copy this one and delete Toronto, so we have Calgary 200, Vancouver 300 and Montreal 700. Then let's create another dictionary, and along with these three, add another city, say Jasper, which is a very beautiful place to visit if you ever want to drive to the Rockies in Alberta.
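The ways of building a Series covered so far (from a list, a NumPy array and a dictionary, with and without labels) can be sketched together. The names my_labels, my_data and my_dict follow the lecture as far as the transcript allows; the result names are assumptions.

```python
import numpy as np
import pandas as pd

my_labels = ["x", "y", "z"]
my_data = [100, 200, 300]

# From a list: pandas generates the index 0, 1, 2 automatically.
from_list = pd.Series(data=my_data)

# With explicit labels, by keyword or by position (data, then index).
labelled = pd.Series(data=my_data, index=my_labels)
positional = pd.Series(my_data, my_labels)

# From a NumPy array.
my_array = np.array(my_data)
from_array = pd.Series(data=my_array, index=my_labels)

# From a dictionary: keys become the index, values become the data.
my_dict = {"x": 100, "y": 200, "z": 300}
from_dict = pd.Series(my_dict)

# A Series can also hold strings or arbitrary objects such as functions.
strings = pd.Series(my_labels)
functions = pd.Series([min, max, sum, print])

print(from_list)
print(labelled)
print(from_dict)
print(functions)
```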
So let's run this cell. We have three dictionaries: dict_1, dict_2 and dict_3. Okay, so let's create the Series now: ser1 equal to pd.Series, and what we need to do is pass in the dictionary here. Let's copy this and paste it here two times, and change these to 2 and 3, so this one is ser2 and this one ser3. So what we did is create three dictionaries, and then we created three Series by passing these three dictionaries to pd.Series. Let's run the cell. It is a good thing to check how these Series look, so we print ser1, ser2 and ser3. So these are our three Series, where the keys are now the index and the numbers are now the data. Grabbing information from a Series is very similar to a dictionary. So what if we want to grab Calgary from ser1? We pass in Calgary, and we get the data for Calgary. Similarly, instead of Calgary, we can pass in Toronto here and we get 500 for Toronto. Notice one thing here: we are passing a string, which is Calgary here and Toronto here, because our index contains these strings, the names of the cities. If the index is a number, then we have to pass a number instead of a string. Just a quick note here: when we pass only a dictionary, the index in the resulting Series has the dictionary's keys in sorted order in the pandas version used here (recent pandas versions keep the dictionary's insertion order instead). Let's have a look. In dict_1, the first one is Toronto, the second is Calgary, the third is Vancouver and the fourth is Montreal. But when we printed the Series, we got Calgary first, Montreal second, then Toronto and then Vancouver. So the order is sorted in the Series: C, M, T and V. If we don't want this order, we can override it by passing the dictionary keys as the index in the order we want them to appear in the resulting Series. So moving forward, let's talk about some basic operations on Series. The basic operations on Series are usually based on the index. For example, say we want to add ser1 and ser2. Let's try to add them first.
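Grabbing values by label and controlling the index order can be sketched as follows. The names dict_1 and ser1 are reconstructed from the transcript; the explicit index list is an illustration of the override described above.

```python
import pandas as pd

dict_1 = {"Toronto": 500, "Calgary": 200, "Vancouver": 300, "Montreal": 700}
ser1 = pd.Series(dict_1)

# Look up values by label, much like a dictionary.
calgary = ser1["Calgary"]
toronto = ser1["Toronto"]

# Passing the keys as index fixes the order of the resulting Series,
# whatever order the pandas version in use would pick by default.
ordered = pd.Series(
    dict_1, index=["Calgary", "Montreal", "Toronto", "Vancouver"]
)

print(calgary, toronto)
print(ordered)
```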
So we have ser1 and ser2. What if we do ser1 plus ser2? So this is what we have. So what's happening here? It tries to match up the operation based on the index. For Calgary, Montreal and Vancouver, it adds the values: Calgary is 200 plus 200, 400; Montreal is 700 plus 700, 1400; Vancouver is 600, which is 300 plus 300. So it's adding the values here. However, for Toronto it can't find a match, and the output is NaN. So let's say this is our ser4. Now let's try to add again: ser5 equal to ser4 plus ser3, which is our third one, and let's print ser5. So here we have Calgary, Montreal and Vancouver. It didn't find a match for Jasper and Toronto, and it is returning NaN. So once again, the values found in the Series were added for their appropriate index. On the other hand, if there is no match, the value appears as NaN, not a number, which pandas uses to mark a missing or NA value. So this is missing data. So moving forward, let's talk about some useful methods and attributes here: isnull and notnull. isnull returns True if it doesn't find a value, whereas notnull returns True if it finds a value. Let's try to use these functions on, say, ser4: ser4.isnull. So we have 400, False, this is not null; 1400, False, this is not null; then True, because where it finds NaN it returns True. So let's try notnull on ser4. This one is the opposite of isnull; we have True and False here. axes and values are two more useful attributes that we can call on a Series. axes returns a list of the row axis labels, or index, whereas values returns the list of values, which is our data. So let's call ser4.axes; we can press Shift+Tab to check it. For ser4 we have Calgary, Montreal, Toronto and Vancouver. And now if we call values here, we get the data: 400, 1400, NaN, which is the missing data, and 600. So let me introduce two very important and very useful functions.
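The index-aligned addition and the missing-value checks described above can be sketched like this. As before, Jasper's value of 1000 is an assumption, since the lecture does not state it.

```python
import pandas as pd

ser1 = pd.Series({'Toronto': 500, 'Calgary': 200, 'Vancouver': 300, 'Montreal': 700})
ser2 = pd.Series({'Calgary': 200, 'Vancouver': 300, 'Montreal': 700})
# Jasper's value is an assumption made for illustration.
ser3 = pd.Series({'Calgary': 200, 'Vancouver': 300, 'Montreal': 700, 'Jasper': 1000})

# Values are added label by label; labels present on one side only give NaN.
ser4 = ser1 + ser2
ser5 = ser4 + ser3

missing = ser4.isnull()    # True where the value is NaN
present = ser4.notnull()   # the opposite of isnull()

labels = ser4.axes         # list holding the row index
data = ser4.values         # underlying NumPy array

print(ser4)
print(ser5)
print(missing)
```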
These two are head and tail. Keep them in mind: we're going to use head and tail quite often. head and tail are used to view a small sample of a Series or DataFrame. Let's try ser4.head. If you press Shift+Tab, there it is: it has a default number of 5, but we can pass in any number; say we pass in 2 here. So this is a sample of our data in ser4. head and tail are also used for DataFrames, and we will talk about DataFrames in our next lecture. We can call tail too, and it shows the last two values, Toronto and Vancouver, whereas head with 2 gave Calgary and Montreal. size is another very useful attribute. Let's use it on ser4, and it returns the number of elements in our data: we have 1, 2, 3, 4 elements, so it returns the number of elements in our data. empty is another useful attribute here. Let's run this one. empty returns True if the Series is empty, so we're getting False because our Series is not empty. So as a quick overview, in this lecture we learned about creating Series from lists, NumPy arrays and dictionaries, and dictionaries are quite an easy way to create Series in pandas. A Series can hold all kinds of data: built-in functions, strings, numbers. Then we learned about some operations on Series, and we learned about isnull and notnull, to know if the data is missing or not. We learned about axes and values, and two very important methods, head and tail, which we're going to use quite a lot in the coming lectures. So see you in the next lecture, where we will learn about DataFrames to expand on our Series concepts. Good luck.

11. S4: Pandas Essentials - Pandas Data Structures - DataFrame: Hi guys. Welcome back to the pandas data structures. In the previous lecture, we learned about Series. Let's talk about DataFrames now, which are the second workhorse of pandas. A very simple way to think about a DataFrame is as a bunch of Series together, such that they share the same index.
A DataFrame is a rectangular table of data that contains an ordered collection of columns, each of which can be a different value type: numeric, string, Boolean, etc. A DataFrame has both a row and a column index, like a table. It can be thought of as a dictionary of Series, all sharing the same index. Let's continue with some examples and move back to the Jupyter notebook where we were working. We don't need to import pandas and NumPy because we already have them imported. If you are creating a new notebook, you need to import NumPy as np and pandas as pd. So moving forward, let's create two sets of labels, or indexes, one for rows and one for columns. We can call the rows r1 to r10 and the columns c1 to c10. So index is equal to the string r1 r2 r3 r4 r5 r6 r7 r8 r9 r10, and we can use split here, and cols is equal to c1 c2 c3 c4 c5 c6 c7 c8 c9 c10, and split. split will take the string, split the values at white space, and return the list, c1, c2, up to c10. Now we need data as well. So arr_2d: we can use NumPy's arange, I hope you still remember that, from 0 to 100, and we want a two-dimensional array, so let's reshape this to 10 by 10. Let's run this cell. So we have index, cols and arr_2d as the data. Let's check them; it's always good to check. This is our index, r1 to r10; this is our cols; and this is our array. Let's use index, cols and arr_2d to create a DataFrame now. Pandas provides pd.DataFrame. If we check its docstring, we pass in the data, we pass in the index, we pass in the columns and the data type. If you want to read more about DataFrame, you can scroll down and read this docstring; this is always useful. So let's pass these in.
Our data is arr_2d, our index is index, and our columns are cols. Let's say this is equal to df, and then we run this cell and see how this df looks. Here it is. This is our first DataFrame. c1, c2, c3, up to c10 are our columns, and r1 to r10 are our rows. There is arr_2d, 0, 1, 2, 3, up to 99; these are the values for the respective column and row of our DataFrame. Each column is actually a pandas Series, sharing a common index, which is the row labels here. So once again, the most important thing is grabbing data from the DataFrame. Let's learn how to grab the data that we need. This is the most important thing we want to learn, because we're going to work with DataFrames and we need to know how to grab data from them. So what if we want to grab a single column, say c1? We pass c1 to the DataFrame, and here it is. This output looks like a Series. The Series has the same index as our DataFrame, r1 to r10, and our Series is column one, which is the DataFrame's column c1. Let's check its type. We know we can check the type using type and passing this one in. So here it is: this c1 is a pandas.core.series.Series type. So what if we want to grab two columns instead of one? It's again very simple: we simply pass in a list of the columns that we want in the output. Here it is. So now we have two columns, c1 and c2, and both share the same index. Similarly, we can grab three or four, or, say, instead of c1 we can grab c5 and c3. So c5 and c3: the order doesn't matter here. Whatever order we pass in is what we get in the output. So moving forward, instead of passing in square brackets, we can say df.c1 as well, and it returns the same values that we get from the square brackets. However, this is not commonly used. It's always good to pass the columns in square brackets, and if we want more than one, pass in a list.
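A minimal sketch of the DataFrame construction and column selection described above. The names index, cols, arr_2d and df follow the lecture as far as the captions allow; c1 and two_cols are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()   # row labels
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()    # column labels
arr_2d = np.arange(0, 100).reshape(10, 10)         # values 0..99 as 10x10

df = pd.DataFrame(data=arr_2d, index=index, columns=cols)

c1 = df['c1']                # a single column comes back as a Series
two_cols = df[['c5', 'c3']]  # a list of columns comes back as a DataFrame

print(type(c1))
print(two_cols.head(2))
```

Attribute access such as df.c1 returns the same column, but bracket notation also works for column names that are not valid Python identifiers.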
But it's good to know that we can access a column with the dot. So moving forward, let's print the DataFrame again. We can add a new column to our DataFrame: df, and we want a new column called new. We can do some addition here: df c1 plus df c2. So what we want is to add a new column which is actually the sum of column one and column two, and then see how our DataFrame looks. So here it is. We have a new column, which is the sum of c1 and c2. So what if we want to delete a column? We can call drop on our DataFrame, like df.drop; let's try this one, passing in new here. This is not going to work, and I will tell you why. It's telling us the label new is not contained in the axis. If we check the docstring here, the default value of axis is 0, and 0 refers to the rows. We want to delete a column, so we need to pass in the axis; because it's taking 0 here, we need to tell it that it's not 0, it's 1. So let's try 1. It's working now, so we don't have the column new here. Let's check whether we have deleted this column new: let's print df again. Oh, it's back. So we have not deleted this column new. It is still a part of the DataFrame df, but here it's not showing up because we are dropping it only temporarily. To delete this column, we need to tell the DataFrame to please delete it. Actually, pandas is very generous: it doesn't want us to lose information by mistake. To delete this permanently, we need to tell pandas to delete it: df.drop, we want to delete new, with axis 1 to delete the column, and we need to pass in another argument, inplace True. If we check the docstring, by default inplace is False. When inplace is True, it deletes the column permanently. So let's check our DataFrame now. So now we have deleted this column new.
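The add-then-drop sequence above can be sketched as follows. The names dropped_copy and still_there are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(np.arange(0, 100).reshape(10, 10), index=index, columns=cols)

# Add a new column as the element-wise sum of two existing columns.
df['new'] = df['c1'] + df['c2']

# axis=1 targets columns; without inplace=True a new DataFrame is returned
# and df itself is left unchanged.
dropped_copy = df.drop('new', axis=1)
still_there = 'new' in df.columns

# inplace=True modifies df itself.
df.drop('new', axis=1, inplace=True)

print(still_there, 'new' in df.columns)
```

Reassigning the result, as in df = df.drop('new', axis=1), is an equivalent way to make the change stick.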
So what if we want to work with the rows? We know how to retrieve data from columns; now we want to retrieve rows by their name or position. For rows, we have the loc and iloc functions. loc accesses a group of rows and columns by labels. Let's say we have df.loc and we pass in r1: we want this first row. So our output is c1, c2, all the columns, and the respective data. On the other hand, iloc is integer-location based: this one is 0, 1, 2, 3, 4, 5, up to the last index. So what if we want to get data based on the index position? Then iloc: let's pass in 0 for the same data as r1. So now we're getting the same data, based on the index location. We can pass in 5, and we have the fifth index, counting 0, 1, 2, 3, 4 and 5: 50, 51, 52 and so on. So what if we want to grab more than one row? It's the same: we pass a list to df.loc, r1 and r2, and run this one. We have r1 and r2 in our output. So what if we want to grab a single element, say 0, or 1, or maybe 11? We need to pass in the location of that single element. For 0, the row is r1 and the column is c1. We can use df.loc and pass in the location: r1, and then the column c1. So let's run this cell, and here we have it: we passed in the location for 0 and we're getting 0 in the output. We can also grab a subset of this DataFrame; say we want to grab 0, 1 and then 10, 11. This is a subset of the DataFrame. In this case we want r1 and r2, and columns c1 and c2, for 0, 1, 10, 11. So to grab a subset of the DataFrame, we need to pass in the list of rows that we want and the list of columns that we want.
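A short sketch of the label-based and position-based row access described above. All variable names other than df are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(np.arange(0, 100).reshape(10, 10), index=index, columns=cols)

row_by_label = df.loc['r1']       # label-based row access
row_by_position = df.iloc[0]      # position-based access to the same row
sixth_row = df.iloc[5]            # position 5 is row r6: 50, 51, 52, ...

two_rows = df.loc[['r1', 'r2']]   # a list of labels returns several rows
single = df.loc['r1', 'c1']       # one element: row label, column label
subset = df.loc[['r1', 'r2'], ['c1', 'c2']]  # rows list, columns list

print(single)
print(subset)
```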
Let's grab these elements, 0, 1, 10 and 11: df.loc, and then we pass in the list r1, r2, and for columns c1, c2. Let's run this. So here we have a subset of our DataFrame. We can do conditional selection as well. Let's try a condition, say df is greater than 5, on the whole DataFrame df, and see how the output looks. So here we have it: we're getting False for all the elements where the condition is not satisfied, whereas where the condition is satisfied we're getting True. So this is similar to NumPy's Boolean mask. Let's create a Boolean mask for our DataFrame here. Call it bool_mask, and say we want df modulo 3 equal to 0. Let's run this cell. So now we have bool_mask. Let's see how bool_mask looks. So here we have it: wherever the condition is satisfied we're getting True, and wherever the condition is not satisfied we're getting False. Let's use this Boolean mask on our DataFrame now: df, bool_mask. And if we run this, we're getting 0, NaN and 3. So wherever the condition is satisfied, we're getting the number, and wherever the condition is not satisfied, we're getting NaN. One thing I want to mention here is that it's not common to use such operations on the entire DataFrame. We usually use them on selected columns or rows. For example, suppose we don't want a row with NaN values. What should we do then? Let's have a look with another example. Our DataFrame is df. Let's apply a condition on column c1: df c1 is our column, and we apply the condition greater than 11. So let's run this cell. We're getting the output: r1 and r2 are False and all the others are True. We don't want r1 and r2, as they return NaN or null values. Let's filter the rows based on a condition on column values. In this case, this is going to be our Boolean mask.
Let's call it bm, set it equal to this condition, and run this cell. So now we have bm as a Boolean mask for this condition. We need to pass this Boolean mask to our DataFrame: df, bm. And if we run this, we are not getting r1 and r2, because they gave null values, and we are filtering them out using this Boolean mask. We can write this in a single line as well: df, simply passing in this code here, and if we run this, we get the same output. So this DataFrame has the condition applied, which is c1 greater than 11. We can select any column from this DataFrame. Let's create a variable result and assign this DataFrame with the condition to result. So let's run this cell again. Now we have a variable result, which contains the DataFrame with the applied condition. Let's see how result looks. Here it is. So this is a DataFrame with no r1 and r2. We can grab any column from it, say c1. Here we have r3, r4, r5 and all those values. We can do all these operations in a single line as well. You will get familiar with it, and when you get enough practice, you will be able to do all these operations in a single line. Let's try to do this in a single line. So we have a Boolean mask; we are filtering out c1 less than 11, and this is our Boolean mask. What do we do? We pass this mask to df, and then what we do next is select a column, so this is c1. I hope you got the idea: the first thing is creating a Boolean mask, then passing the Boolean mask to the DataFrame to filter out what we want, and after that selecting column c1. Let's run this cell. So we're getting the same output with a single line of code. So what if we want to grab two columns? We simply pass in a second column, c2, in a list. So now we have a list of two columns that we want in our output.
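The conditional selection steps above can be sketched as follows. bool_mask, bm and result follow the lecture's names as best the captions allow; masked, c1_only and two_cols are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(np.arange(0, 100).reshape(10, 10), index=index, columns=cols)

# A mask over the whole DataFrame keeps matching values and puts NaN elsewhere.
bool_mask = df % 3 == 0
masked = df[bool_mask]

# A mask built from one column filters whole rows instead.
bm = df['c1'] > 11
result = df[bm]

# The same filter in a single line, followed by column selection.
c1_only = df[df['c1'] > 11]['c1']
two_cols = df[df['c1'] > 11][['c1', 'c2']]

print(masked.head(2))
print(result.index.tolist())
```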
So here we have two columns. We can get another one, say a third one, and run this cell. So we're getting three columns in the output. So far, we did these operations on the columns. We can do these operations on the rows as well, and in that case we need to use loc. Let's copy this one; this is our conditional DataFrame. Then call loc on this conditional DataFrame and pass in, say, r3 and r4, as a list of the rows that we want in the output. Let's run this. So we got r3 and r4 in the output. Another thing you may want to do is search for a row that has some value, say 70. Let's return a row from our DataFrame that has the value 70 in column c1. In that case, our mask will be c1 equal to 70, and we pass this to df and run this cell. So we got the output r8, which contains 70 in column c1. We can combine two conditions as well. Let's try c1 for values, say, greater than 60, and c2 for values greater than 80. So let's write down the conditions first: df c1 greater than 60, and df c2 greater than 80. If you remember, it's always a good idea to separate them using parentheses. Now, this is our first condition on c1, and this is our second condition on c2, with the and operator, because we want both conditions to be satisfied. So this is our mask. What we need to do is pass this mask to df. This is a mask with two conditions, one on c1 and one on c2, and we're passing this mask to df. Let's run this and see what we get in the output. Here it is. Based on this condition we're getting two rows, r9 and r10, where c1 is greater than 60 and c2 is greater than 80. Let's repeat this cell with the keyword and instead, and run it. This gives us an error saying the truth value of a Series is ambiguous. This error means that the keyword and only works for a single Boolean at a time.
It handles a single True or False at a time, so the code above using and gets confused by a Series of True and False values, which is what we have here. So instead of and, we have to use this ampersand symbol. In the case of the or operator, we need to use the corresponding symbol for or, which is this vertical bar. Here it is. So moving forward, let's have a quick look at a couple of very useful methods. Let's talk about reset_index and set_index. Let's output our DataFrame first. So this is our DataFrame, and we want to reset the index of this DataFrame to its default numerical index, which is 0, 1, 2, 3. To do this, we need to call the df.reset_index method on our DataFrame. If we look at the docstring, we have inplace False, which is the default value. So let's copy this one, paste it here and set it to True. Let's run this cell and see how the DataFrame looks now. So here we have it. Now we have the index of our DataFrame as a numerical, default index, which is 0, 1, up to 9. Another very common thing that we face most of the time: we have data and we see there is a column that can be used as an index, and it is very useful if we set that column as the index. Let's create a column first: newind equal to a b c d e f g h i j, the first ten letters of the alphabet. We can use split to create a list, and let's print newind. So this is our list. Let's add this newind as a new column to our DataFrame, and we know how to add a new column to our DataFrame: df newind equal to newind. Let's output our DataFrame as well. So let's run this cell. We have newind as a new column in our DataFrame. We think this newind is a very good index for our DataFrame, and we want to set it as the index of our DataFrame. Here, we can call set_index on our DataFrame, df.set_index.
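A sketch of combining two conditions and then resetting the index, as described above. The names both, either and ambiguous_error are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(np.arange(0, 100).reshape(10, 10), index=index, columns=cols)

# Element-wise AND / OR need & and |, with each condition in parentheses.
both = df[(df['c1'] > 60) & (df['c2'] > 80)]
either = df[(df['c1'] > 60) | (df['c2'] > 80)]

# The keyword `and` asks for a single truth value of a whole Series and fails.
try:
    df[(df['c1'] > 60) and (df['c2'] > 80)]
    ambiguous_error = None
except ValueError as err:
    ambiguous_error = str(err)

# reset_index moves the row labels into a column named 'index'
# and installs the default 0..n-1 index.
df.reset_index(inplace=True)

print(both.index.tolist(), either.index.tolist())
print(ambiguous_error)
print(df.head(2))
```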
Remember, we can always use Tab to autocomplete set_index, and if we press Shift+Tab, we have keys, and inplace False. We want to set inplace True, because we want a permanent change, and keys is the column label or list of column labels. So we have the column label, which is newind, and inplace equal to True. Let's run this cell and see how the DataFrame looks. Here it is: we got newind as the new index of our DataFrame. Another set of methods is head and tail. head returns the first n rows of our DataFrame. So let's call head on our DataFrame: df.head. If we press Shift+Tab, the default value is 5, but we can pass in any value; say we want to see 2. So we get the first two rows of our DataFrame. If we don't pass anything, the default is 5 and we get five rows in the output. Let's call tail on our DataFrame, passing in 2 here as well. So it returns the last two rows in our DataFrame. info is another very useful method, which provides a concise summary of the DataFrame. Let's call info on our DataFrame, df.info, and run this cell. What it tells us: our DataFrame has 10 entries, a to j, and these are the columns, with 10 non-null values in each column, and their types. So it's a concise summary of our DataFrame. At the end of this lecture, I want to introduce another very useful method, describe. Let's call describe on our DataFrame and see what it returns. So here it is: describe generates descriptive statistics that summarize the central tendency, dispersion and shape of the dataset's distribution, excluding NaN values. See, here we have count, 10 each, mean, std, which is the standard deviation, min, max and so on. So, guys, this was all about DataFrames at the moment. I hope you got a very good understanding of pandas DataFrames.
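The methods from this part of the lecture can be sketched together as follows. The column name newind is my reading of the captions, and first_two, last_two and summary are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
cols = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(np.arange(0, 100).reshape(10, 10), index=index, columns=cols)
df.reset_index(inplace=True)

# Add a column and promote it to the index.
df['newind'] = 'a b c d e f g h i j'.split()
df.set_index('newind', inplace=True)

first_two = df.head(2)    # first n rows (default n=5)
last_two = df.tail(2)     # last n rows (default n=5)

df.info()                 # prints entries, columns, non-null counts, dtypes
summary = df.describe()   # count, mean, std, min, quartiles, max per numeric column

print(first_two)
print(summary.loc[['count', 'mean'], ['c1', 'c2']])
```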
I recommend revising this lecture, because we are going to use DataFrames very often in our coming lectures, especially in machine learning. In the next lecture, we're going to talk about hierarchical indexing. This is another very important topic. Let's move on to the next lecture. Good luck.

12. S4: Pandas Essentials - Hierarchical Indexing: Hi, guys. Welcome to the hierarchical indexing lecture. Hierarchical indexing is another really important feature of pandas. It makes it possible to have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way to work with higher-dimensional data in a lower-dimensional form. Let's start with a simple example of a Series. Let's jump over to a new notebook and start by importing NumPy and pandas: import numpy as np, import pandas as pd. Let's create a list of lists for the index: index equal to, say, a list here with a, a and a, then b, b and b, then c and one more c, and let's add d and another d. And the second list is 1, 2, 3, then 1, 2, 3 again, then 1, 2, 1, 2. So what we want is a, b, c, d as the index for level one, and these numbers as the index for level two. So let's run this thing here and see how the index looks. Here we have the index. So let's create a Series ser using pd.Series. We can use NumPy's capability of generating random numbers, np.random; remember, you have to call randn from random in NumPy. So np.random.randn, and then pass in 10, for the 10 random numbers we want. So this is our data for the Series, and now we have to set the index, and our index is index. So once again, let's have a quick look: np.random.randn is generating 10 random numbers as the data for our pandas Series, and index, which is our list of lists, is going to be the index for our Series. Let's run this cell and see how ser looks. So here it is. This is our output.
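A minimal sketch of the two-level Series built above. The values come from np.random.randn, so they differ from run to run and from the numbers read out in the lecture.

```python
import numpy as np
import pandas as pd

# Two parallel lists: the outer (level-one) and inner (level-two) labels.
index = [['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
         [1, 2, 3, 1, 2, 3, 1, 2, 1, 2]]

# Passing a list of lists as the index creates a two-level MultiIndex.
ser = pd.Series(np.random.randn(10), index=index)

print(ser)
print(ser.index.nlevels)
```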
So we see our Series ser has two index levels: a, b, c and d, and 1, 2, 3, 1, 2, 3, 1, 2, 1, 2. And these are our random numbers generated by randn. With a hierarchically indexed object, so-called partial indexing is possible. This enables the concise selection of subsets of the data. If we select a from our Series, we're actually selecting this part. If we select b, we're selecting this part, because this 1, 2, 3 is the sub-index for b. For c, we're selecting this part. Let's try ser a. So here we have it: we have this part with only one selection, which is a. We can further select a single value by simply passing, say, 2 here. So what we have in the output is 1.627. What we're doing is moving from level one to level two and grabbing an element from the Series. So this was hierarchical indexing with Series. Let's try another example with DataFrames. With a DataFrame, either axis can have a hierarchical index. Let's create a DataFrame here: df is our DataFrame, pd.DataFrame, and let's again use NumPy's arange, np.arange, and we want 12 numbers. Let's reshape this to, say, a 4 by 3 matrix, and set the index equal to, say, a, a, b and b, and the second list equal to 1, 2, 1, 2, and then set the columns equal to, say, ab, on and bc. Let's move this back onto the same line. What we're doing: we're creating a DataFrame using NumPy's arange method, generating 12 numbers, reshaping them into a 4 by 3 matrix, and passing a list of lists as the index and a list of strings, ab, on and bc, as the columns of our DataFrame. Let's run this cell and see how the DataFrame looks. So here it is. We have a DataFrame now with level-one index a, b and level-two index 1, 2, 1, 2. We have multiple index levels on the rows, whereas the columns are ab, on and bc. So now the question is how to index this DataFrame. On the column axis, we just use normal bracket notation with the DataFrame.
For example, df, passing in ab, and run this cell. We're getting a, b and the level-two index 1, 2, 1, 2, and the column ab: 0, 3, 6, 9. In a similar way, if we want bc, we're getting 2, 5, 8, 11 in the output. But if we want the rows, on the row axis, we use loc. If we want rows: df.loc, and then in brackets we pass in, say, a here or b here. Let's pass in b here and run this. So we have this portion, this subset of the DataFrame for b: 1 and 2, with 6, 7, 8 and 9, 10, 11. So what if we want to grab a single value? The idea is to go from outside to inside. For example, if we want to get 5 here, the first step is to grab a and this whole set. So instead of b, let's pass in a here and run this cell. So we have 0, 1, 2 and 3, 4, 5, and now we want 5. So on this one, we have to call loc again, and we want to go to 2 here, and after 2 we want bc, because the location of 5 is 2, bc. So we pass in 2, and then the column bc. Let's run this cell. So we're getting 5 in the output. Just a quick overview of this line: the idea is to go from outside to inside. First we're moving from the outside and grabbing a portion, or subset, of our DataFrame using a, which is the level-one index. After that, when we have the subset of our DataFrame for a, we use loc on that subset to locate the value 5. So moving forward, the hierarchical levels can have names, as strings or any Python objects. If so, these will show up in the console output, and we can use index.names on our DataFrame to get the list of those level names. So let's call index.names on our DataFrame df. We don't have any names for our levels in the DataFrame yet. Let's give names to these index levels; let's name them, say, L1 and L2. We can use df.index.names and set it equal to a list of L1 and L2. So we are naming the levels L1 and L2. Let's run this code.
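The outside-to-inside selection described above can be sketched as follows. The column labels ab, on and bc and the level names L1 and L2 are my reading of the captions; col_ab, rows_b and five are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=['ab', 'on', 'bc'])

col_ab = df['ab']        # columns: plain bracket notation
rows_b = df.loc['b']     # rows: loc with a level-one label

# Outside to inside: first the level-one label, then level two and the column.
five = df.loc['a'].loc[2, 'bc']

# Name the index levels (exact spelling of the names is assumed).
df.index.names = ['L1', 'L2']

print(rows_b)
print(five)
print(df)
```

A single call such as df.loc[('a', 2), 'bc'] reaches the same value; the chained form mirrors the lecture's step-by-step explanation.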
Let's run df.index.names again. So now we have the names for our index levels, level one and level two. Let's see how the DataFrame looks now. Here it is. We have given names to these index levels: L1 is the level-one index and L2 is the level-two index. We can select any names according to our convenience and based on the data we are working with. So moving forward, it's very important to introduce another very useful function, xs. xs has the ability to go inside a multilevel index. It returns a cross-section, which is rows or columns, from the Series or DataFrame. Let's call xs on our DataFrame df: df.xs, and pass in, say, a here. So what we're getting is a subset of our DataFrame for the index where level one is a. This one is a very easy example. Let's talk about another one, which is a little more complicated: we want to grab all the data in the DataFrame where index level two is 1. In this case, we want to grab this row and this row, where level two is 1. It is tricky with the loc method, but xs will do the magic here. We tell xs what we want in the output. So here we want level two where the value is 1 in the output. Let's call xs on the DataFrame and look at the docstring. Here we have key, which is some label contained in the index, and we want 1 here, and we want to mention the level here as well. So let's pass in key equal to 1 and level equal to L2, and run this cell. So here we have it. xs does the magic in this case. So this was all about hierarchical indexing. I hope you got a very good understanding of multilevel indexing and how to grab data from a multilevel index. Once again, the idea is to move from outside to inside. So see you in the next lecture, where we will talk about handling missing data. Good luck.

13. S4: Pandas Essentials - Handling Missing Data: Hi, guys. Welcome back to the pandas essentials.
Now we're going to talk about missing data. Missing data is very common in many data analysis applications. Pandas has great capabilities for dealing with missing data. Let's learn some convenient methods to deal with missing data in pandas. So let's move on to the notebook; we're going to continue working in the same notebook where we left off in the previous lecture, so we don't need to import pandas and NumPy again. Let's start by creating a DataFrame with some missing data. I'm going to copy the code from the reference notebook, and let's run this. So what we're doing: we're creating a dictionary with the keys A, B, C and D. A has the values 1, 2, np.nan (here we are using NumPy's NaN), then 4 and again NumPy's NaN. B has all NaN values, C doesn't have any missing data, and in D we have another missing value: 16, np.nan. We're passing this dictionary to pandas DataFrame and creating a DataFrame df from the dictionary. So what has happened? The DataFrame will take A, B, C, D as columns and these lists as the values of the DataFrame, and it will generate the index 0 to 4, which is the numerical index. Let's see how the DataFrame looks. Here we have the DataFrame. We have A, B, C, D as columns; 0, 1, 2, 3, 4 is our numerical index; A has two missing values, B has all missing, C has no missing values, and D has one missing value. isnull and notnull are very useful methods that we can call on our DataFrame to see if there is any missing data in the DataFrame; they return True and False. Let's call isnull on our DataFrame df: df.isnull. We can check the docstring here, and what we see is that it returns a Boolean same-sized object indicating if the values are null. Let's run this cell. Here we have False, this is not null; this is not null too; and this is True. So wherever it finds a NaN, it returns True.
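A sketch of a DataFrame with missing data along the lines described above. Column A, the all-NaN column B and the start of D (16, NaN) follow the lecture; the values in C and the rest of D are not stated, so they are assumptions, as is the name data_dict.

```python
import numpy as np
import pandas as pd

data_dict = {
    'A': [1, 2, np.nan, 4, np.nan],
    'B': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'C': [11, 12, 13, 14, 15],       # values assumed; lecture only says "no missing data"
    'D': [16, np.nan, 18, 19, 20],   # only "16, NaN" is stated; the rest is assumed
}
df = pd.DataFrame(data_dict)

null_mask = df.isnull()   # same shape as df, True where a value is NaN

print(df)
print(null_mask)
print(null_mask.sum())    # number of missing values per column
```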
So let's call notnull on our DataFrame. Wherever it finds a non-null value, it returns True, and wherever it finds a null, it returns False. Let's output our DataFrame again. Here we have our DataFrame with A, B, C and D in the columns. So moving forward, let's call a few other functions on a certain column of our DataFrame and see how they work. Let's say we take df, column A, and call sum, and see what we're getting. So df A dot sum is equal to 7. While taking the sum, NaN is treated as 0: 4 plus 2 plus 1 is 7. What if we call mean here? Let's see how it works. NaN is ignored for mean as well: 7 divided by 3 is 2.33. So it is not considering the NaN values in the mean either; it's ignoring the NaNs and taking 1 plus 2 plus 4 divided by 3 values, which is 2.33. A few more functions are very helpful: dropna and fillna. This is all about cleaning the data: dropna and fillna are a set of very useful functions for cleaning missing data. We can fill the missing data using fillna, and we can drop the missing data using dropna. Let's try dropna on our DataFrame: dropna, press Shift+Tab, and see the parameters that we can pass. axis is 0; we know axes, for rows and columns. how is any or all, and any is the default: if there is any value which is NA, it will drop the row. thresh is None by default. We can give a value here; thresh takes an integer and requires that many non-NA values. We can say 3, 2 or 1. If we pass in 3, all the rows that meet the threshold of 3 non-NA values will be in the output. And inplace is False or True, once again for a permanent change or not. Let's run this with the default parameters. So here we have it: we have nothing in the output, only the column names. So if you look at the DataFrame, we have a NaN in each row.
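The NaN-skipping aggregates and the default dropna behaviour described above can be sketched as follows. The values in C and most of D are assumptions, since the lecture does not state them; total_a, mean_a and dropped are names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': [np.nan] * 5,
    'C': [11, 12, 13, 14, 15],       # assumed values
    'D': [16, np.nan, 18, 19, 20],   # only "16, NaN" is stated in the lecture
})

present = df.notnull()     # True where a value exists

total_a = df['A'].sum()    # NaN is skipped: 1 + 2 + 4
mean_a = df['A'].mean()    # NaN is skipped: 7 / 3

# Default dropna: axis=0, how='any' drops every row containing a NaN.
# Column B is NaN in every row, so no rows survive.
dropped = df.dropna()

print(total_a, round(mean_a, 2))
print(dropped)
```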
This is why we're not getting anything in the output: every row contains at least one NaN. Let's press Shift+Tab, copy thresh from there, pass in thresh equal to 3 and run this again. Now we have those rows in the output that have at least three non-NA values. The default axis is 0, which means we're performing this operation on rows. We can perform the dropna operation on columns as well. In that case, we have to pass in the axis here, so axis equal to 1. Let's delete this thresh and run this cell. We're getting the column that has no NaN value, and this is column C. We can pass in the thresh parameter here as well: thresh is 3, axis is 1, and let's run this cell. We're getting all those columns that satisfy the condition thresh equal to 3.

Now fillna. We can use fillna to fill the values in the data frame wherever we have NaN. Let's try fillna on our data frame: df.fillna. If we press Shift+Tab, we can explore the doc string, and we have value, which is None by default, so we can pass in the value that we want in place of NaN. We can use method, and there are a couple of methods such as backfill, forward fill and so on. And if we want a permanent change, we can use inplace equal to True.

Let's pass in value for the moment and see how it works. For value, we can say 'FILL', and run this cell. Here we have it: wherever we had NaN, we're getting FILL. But keep in mind that this change is not permanent. If we print df, we get the same data frame again, because inplace is False. If we want a permanent change, we need to pass in inplace equal to True.

Instead of value equal to 'FILL', we can pass in, say, the mean value of a column. For example, we take df and fill with the mean of column A: wherever there is a NaN, it will fill in the mean of column A. Let's run this cell. That covers every place where there was a NaN.
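A minimal sketch of the dropna and fillna calls covered so far. The C and D values are assumed, as the reference notebook's numbers are not fully spoken, and the fill value 0 is used here instead of the string 'FILL' simply to keep the columns numeric.

```python
import numpy as np
import pandas as pd

# A and B follow the lecture; the C and D values are assumed for illustration
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, np.nan],
    "B": [np.nan] * 5,
    "C": [11, 12, 13, 14, 15],
    "D": [16, np.nan, 18, 19, 20],
})

no_rows = df.dropna()                      # every row has a NaN in B, so nothing is left
rows_thresh = df.dropna(thresh=3)          # keep rows with at least 3 non-NaN values
cols_clean = df.dropna(axis=1)             # keep only columns without any NaN
cols_thresh = df.dropna(axis=1, thresh=3)  # keep columns with at least 3 non-NaN values

filled_zero = df.fillna(value=0)           # returns a new frame; df itself is unchanged
a_mean_filled = df["A"].fillna(value=df["A"].mean())

print(rows_thresh)
print(cols_thresh)
print(a_mean_filled)
```

None of these calls modifies df, because inplace is left at its default of False.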
It fills in the value according to the given instructions. So let's try method here: df.fillna, and instead of value, we're going to use method and say we want ffill, forward fill. Let's run this and see what happens. Let's print the data frame again for comparison. Here we have the original data frame, and this is the forward filling. What is ffill doing in A? 1, 2, then the 2 goes into the NaN, so after 2 it is 2 again. This is forward filling: it carries the previous value into the next NaN. The last NaN in A is filled too, now holding the previous value. For B there is no value at all, so it stays NaN. In C all values are already there, so nothing happens, and for D it does the same thing.

We can use backfill here as well. Let's pass in bfill here and rerun this cell. What's happening now? It's backfilling. The last value stays NaN, because it is at the end and there is nothing after it to fill from, and the other NaN is filled from the value that follows it. So the value at index 2 is filled with 4, which is according to backfill.

We can pass in our own value as well, say value equal to 0. Wherever there is a NaN, it's going to fill in 0. Here it is: wherever it finds a NaN, it fills that value with 0.

So this was all about handling missing data at the moment. You now have an idea of how to handle missing data. The common way is to use the mean, the sum or similar methods, as well as backfill and forward fill, and it all depends on what type of data you're working with. In the next lecture, we're going to tackle data wrangling. We will talk about joining, concatenation, et cetera. See you in the next lecture. I hope you're enjoying this course. It can be really overwhelming now that we're covering lots of things in each lecture, and I hope you're following each and every step.
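As a recap of the forward fill and backfill steps from this lecture, here is a minimal sketch on column A. The lecture passes method to fillna; newer pandas releases deprecate that argument in favor of the ffill and bfill methods, so the sketch uses those instead.

```python
import numpy as np
import pandas as pd

# Column A from the lecture
col_a = pd.Series([1, 2, np.nan, 4, np.nan])

forward = col_a.ffill()    # carry the previous value forward into each NaN
backward = col_a.bfill()   # pull the next value back; the trailing NaN has nothing to use

print(forward.tolist())
print(backward.tolist())
```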
Keep in mind: stay committed and try to understand as much as possible. If you need to, before moving forward, just rewatch the lecture so that you have a better understanding, as all the lectures are connected to the next lectures. See you in the next lecture. Good luck.

14. S4: Pandas Essentials - Data Wrangling - Combining, merging, joining: Hi, guys. Welcome back to the Pandas Essentials. In this lecture, we're going to talk about combining and merging data sets. Data contained in pandas objects can be combined together in a number of ways. Merge and concatenation are two very common ways. Merge connects rows in data frames based on one or more keys. This will be familiar to SQL or other relational database users, as it implements database join operations, whereas concat, or concatenate, stacks together objects along an axis. If you don't know SQL, don't worry: the concepts of merging are presented with very simple examples so that you can follow the steps. Our focus here is not to learn SQL. We only want to go through the widely used and few very important inner and outer joining operations for data wrangling. If you have questions, please ask, and we're more than happy to help. An important thing you should know is that a merging operation may give NaN in the output, and those NaN values need to be treated according to the circumstances or requirements during the data analysis.

Let's move on and discuss these methods with examples. Merge or join operations combine data sets by linking rows using one or more keys. These operations are central to relational databases. Let's move on to the Jupyter notebook and start working with the data. This notebook is the same, and we don't need to import pandas and NumPy because we have already imported them at the very beginning of this notebook. To move forward, we need data to work with.
Let's create two dictionaries and then convert those dictionaries to data frames using pd.DataFrame. So dictionary d1 is equal to: we have key, and our data is a, b, c, d and e. Then we have another key, A1, and we're passing range(5). We can use the range function to create a list of numbers for this key A1. And another key, say B1, where we again pass range, and now we start at 5 and go to 10. So this is our first dictionary. Let's create another dictionary, d2. We have key again, and we're passing the data a, b and c. We have another key, say A2, and here we can again use range, passing in 3. The other key is B2, and once again we can use range and pass in 3 to 6 here. Let's run this cell and see what d1 and d2 look like. So these are our dictionaries.

Let's create data frames from these dictionaries. df1 is our first data frame: pd.DataFrame (we can use Tab to autocomplete), passing in d1. For df2, pd.DataFrame again, using Tab to autocomplete, and d2. Let's run these cells and see what df1 and df2 look like. This is our data frame df1 with A1, B1 and key columns, and we have data frame df2 with A2, B2 and key columns.

Before we move on, let's explore the merge method first. Let's add a few lines, type pd.merge and press Shift+Tab to see its doc string. There are several parameters that we can pass to the merge method, and the most important ones are how and on. These are our focus in this lecture, so let's talk a little more about the how parameter. how can be inner, outer, left or right. inner uses the intersection of keys from both data frames, similar to a SQL inner join, whereas outer uses the union of keys from both data frames, which is similar to a SQL full outer join. Next is how equal to left.
left uses only the keys from the left frame, similar to a SQL left outer join. On the other hand, right uses only the keys from the right frame, which is similar to a SQL right outer join. on is a label or list, which is the field name to join on, and it must be found in both data frames.

Let's try to merge our data frames using the inner operation. We will pass how equal to inner. Although it is the default operation, we will still pass it in because we're learning here. So let's merge our data frames: pd.merge, and our data frames are df1 and df2. For how, we want to merge on inner, and now we need to tell merge which key we want to merge on, and that key must be present in both data frames. The key column is the one that is present in both data frames, so we want to merge on key. Let's run this cell here. Here we have the output. d and e didn't appear in the merge output. inner is the intersection of the key columns only. This is why, since we don't have d and e in df2, we don't get them in the output data frame.

Let's try this one with outer. Copy this one, and instead of inner, pass in outer, and let's run this cell here. Now we have some NaN values as well. outer is a union operation. We're getting NaN values in the A2 and B2 columns, for the keys that don't exist in df2.

Let's try another one, left, here. Let's run this cell and see what it looks like. Once again we get NaN values for keys d and e in A2 and B2, as they don't exist in df2. how equal to left only uses the key column of the left data frame, which is df1. This is similar to a left outer join in SQL, and we're getting NaN for A2 and B2 in our result data frame because those values do not exist.

Let's try right here and compare how it differs from left. Look, this is different from the left.
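The four merges on the single key column can be sketched as follows. The key values a to e and a to c are inferred from the lecture's remark that d and e are missing from df2.

```python
import pandas as pd

# Key values inferred from the lecture: df2 has no rows for "d" and "e"
df1 = pd.DataFrame({"key": ["a", "b", "c", "d", "e"],
                    "A1": range(5),
                    "B1": range(5, 10)})
df2 = pd.DataFrame({"key": ["a", "b", "c"],
                    "A2": range(3),
                    "B2": range(3, 6)})

inner = pd.merge(df1, df2, how="inner", on="key")  # keys present in both frames
outer = pd.merge(df1, df2, how="outer", on="key")  # union of keys, NaN where missing
left = pd.merge(df1, df2, how="left", on="key")    # all keys from df1
right = pd.merge(df1, df2, how="right", on="key")  # all keys from df2

for name, result in [("inner", inner), ("outer", outer),
                     ("left", left), ("right", right)]:
    print(name)
    print(result)
```

With this data, outer and left happen to contain the same rows, because every key in df2 also exists in df1.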
If you compare, with left we are using the key column from df1, whereas with right we're using the key column from df2. df2 doesn't have d and e, so we're not getting d and e in our output data frame.

Let's try another merging example with two keys, say key1 and key2 columns. This is a little more complicated, and we have to create other data frames, say left and right. I will go to the reference notebook and copy the code from there. So here we have the code. Let's copy this, and I will explain what it is. Let's paste it here and run this one. So what are we doing? We're creating dictionaries and passing them to pd.DataFrame to create the left and right data frames. In both of these dictionaries we have key1 and key2 columns, with lists of strings for key1 and key2, and A and B, and C and D, as the other columns in our data frames. Let's see what left and right look like. So here we have the data frame left, with A, B, key1 and key2, and right, with C, D, key1 and key2. So the key1 and key2 columns are present in both data frames, left and right.

Let's add a few rows to make some space here, and let's merge: pd.merge. What do we want to do? We want to merge left and right, our data frames. For how, we want to merge on inner, which is actually the default. And now on: because we want to merge on key1 and key2, we need to pass in a list here, so key1, key2. Let's run this and see what it looks like. As we know, inner is the intersection only: only the key pairs present in both data frames will appear in the result. Looking at the key pairs, a b is not present in both, but a a and b a are present in both. So the only pairs present in both are a a and b a, and this is what we're getting in the output here. Let's try an outer join now, passing in outer here, and we know outer is a union operation.
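A minimal sketch of merging on two key columns. The lecture copies left and right from a reference notebook without reading out their contents, so the data below is hypothetical; it is chosen so that the only key pairs shared by both frames are (a, a) and (b, a), as in the lecture's inner result.

```python
import pandas as pd

# Hypothetical data: only the pairs (a, a) and (b, a) exist in both frames
left = pd.DataFrame({"key1": ["a", "a", "b", "c"],
                     "key2": ["a", "b", "a", "b"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["a", "b", "b", "c"],
                      "key2": ["a", "a", "a", "a"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# Passing a list to `on` matches rows on the (key1, key2) pair
inner = pd.merge(left, right, how="inner", on=["key1", "key2"])
outer = pd.merge(left, right, how="outer", on=["key1", "key2"])

print(inner)
print(outer)
```

Because (b, a) occurs twice in right, the matching left row appears twice in the inner result.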
So all key pairs present in the two data frames will appear in the result. We have the key pairs from both, and wherever it doesn't find a value, it's putting a NaN there. So this is a union operation.

Let's try left here. For a left join, only the key pairs in left will be used. So let's try this one here, and we have the key pairs of the left data frame. Any key pair which is not in the left data frame does not appear in the result data frame. Let's try the same thing for right now. For a right join, only the key pairs in right will be used. Let's run this now, and we're getting the key pairs which are present in right. a a is present in both, and b a is present in both, while key pairs that exist only in left do not appear. I hope you got a very good understanding of merging operations.

Moving forward, let's talk about concatenation. Concatenation is interchangeably referred to as binding or stacking as well. This operation basically glues together data frames. It's important to remember that dimensions should match along the axis we are concatenating on. We can use pd.concat and pass in a list of data frames to concatenate together. So let's create two data frames, df1 and df2, to learn more about concatenation. df1 is equal to pd.DataFrame, and let's pass in a dictionary: key A with a list of strings A0, A1, A2 and A3; another key, B, with a list as well, B0, B1, B2 and B3; the third one is C as a key with the list C0, C1, C2, C3; and another one, say D as a key, with the list D0, D1, D2 and D3. Let's pass in index here as well, and say our index is a list, 0, 1, 2, 3, just because we have A0, B0, C0 and D0, then 1, 2 and 3. So we have index 0, 1, 2, 3. Let's run this cell and see what the data frame df1 looks like.
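A minimal sketch of pd.concat with the df1 described above. The second frame, df2, is assumed here to be built the same way with labels 4 to 7, so that the two frames share no index labels.

```python
import pandas as pd

# df1 as described above; df2 is assumed to follow the same pattern with labels 4..7
df1 = pd.DataFrame({col: [f"{col}{i}" for i in range(4)] for col in "ABCD"},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({col: [f"{col}{i}" for i in range(4, 8)] for col in "ABCD"},
                   index=[4, 5, 6, 7])

stacked_rows = pd.concat([df1, df2])          # default axis=0: stack along the rows
stacked_cols = pd.concat([df1, df2], axis=1)  # axis=1: place side by side along the columns

print(stacked_rows)
print(stacked_cols)
```

Along axis 1 the rows are aligned on the index labels, so labels that exist in only one frame produce NaN in the other frame's columns.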
So here is our data frame df1. Let's create another data frame: copy this one, paste it here, change it to df2, and change these to 4, 5, 6 and 7. So A4, A5, A6 and A7, the same for B and C, and for D as well: D4, D5, D6 and D7. Let's set this index to 4, 5, 6 and 7. Let's run this cell and see what the data frame df2 looks like. So this is our data frame df1, with index 0 to 3, and for our data frame df2 we have index 4 to 7.

Let's concatenate these two data frames: pd.concat, and pass in the list of data frames, df1 and df2. If we check the doc string, we see the default axis is 0, which is rows. So let's run this cell. Here we have the two data frames stacked together, with index 0, 1, 2, 3 and so on. This is our df1 and this one was df2, and both are concatenated along axis 0. This is our resulting data frame.

We can concatenate data frames along the columns as well. In that case, we have to pass in axis equal to 1. Let's copy this one, paste it here, and pass in axis equal to 1. Let's run this. Here we have our two data frames concatenated along axis equal to 1, along the columns. In this case, we're getting lots of NaN values, because index labels 0, 1, 2, 3 don't exist in our data frame df2, whereas index labels 4, 5, 6, 7 don't exist in our data frame df1. So in this concatenation we're getting NaN values.

This was all about merging and concatenation at the moment. I hope you got a very good understanding and a better idea of how the merging and concatenation operations work. If you don't have knowledge of SQL, you may need to revise this section. Just go through the lecture again and repeat all these exercises. See you in the next lecture, where we will talk about the groupby operation. Good luck.

15. S4: Pandas Essentials - Groupby: Hi guys. Welcome back. Next, let's learn another key and very useful concept in pandas.
That concept is groupby. Groupby is one of the most important key functionalities in pandas. It allows us to group the data together, call aggregate functions and combine the results in three steps: split, apply and combine. Before we move on to the hands-on part, let's try to understand how split, apply and combine work. We will use very simple data with different colors in our example. Here we are given data which has a key column and a data column. The keys are A, B, C and D. The colors are chosen to make things clearer and easier to understand.

In the first step, which is split, the data contained in a pandas object, such as a Series or a DataFrame, is split into groups based on one or more keys that we provide. In our example, we have keys A, B, C and D. The data is split based on the keys and is represented with different colors. The splitting is performed along a particular axis of an object. For example, a data frame can be grouped on its rows, with axis equal to 0, or on its columns, with axis equal to 1. In this example, the splitting is along the rows. In the next step, which is apply, once the splitting is done, a function such as sum, mean or standard deviation is applied to all groups independently, which produces a new value for each group. In the final step, which is combine, the results of the functions applied in the apply step are combined into a resultant object. So all three steps, split, apply and combine, are applied within the groupby function, and we get the resultant object at the end.

Let's move on to the Jupyter notebook and learn with examples now. I hope you got a good understanding of the groupby steps: split, apply and combine. Let's try to understand this with some examples. First things first: we need to import NumPy and pandas. Let's run this cell and import NumPy and pandas as np and pd. Let's create a dictionary and convert it into a pandas DataFrame. I will go to the reference notebook, copy the code from there, and paste it here. Our dictionary is data, which has store, customer and sales as keys. There is a list of stores: Walmart, Walmart, Costco, Costco, Target and Target. The customer names are unique: Jim, Jeremy, Mark, Denise, Ray and Sam. And sales holds numerical values, 150, 200 and so on, some random values. We're passing this dictionary to pd.DataFrame and converting it into a pandas data frame, and after that we're showing this data frame in the output. Let's run this cell. So here we have our data frame.

Let's group the data in our data frame based on stores, using the groupby method. What do we need to do? We need to grab the data frame df, access the groupby method using the dot operator, and pass in the column that we want to group our data on. So here it is. Let's run this. This code has generated a DataFrameGroupBy object somewhere in memory, at this location. Let's assign this object to some variable, by_store, and rerun this. Now we have the DataFrameGroupBy object stored in by_store. We can apply aggregate functions on by_store now, such as mean, max, standard deviation, sum and so on. Let's call mean on by_store and run this cell. Now pandas will apply mean on the numerical column, which is sales. It ignores the non-numerical columns automatically, and the same is true if we apply standard deviation, maximum or other aggregate functions. Another thing you can notice is that the result is a data frame with store as index and sales as column.

We can write this code in one line as well: df, then call groupby on the store column, then dot mean. Let's call sum here instead of mean and run this code. So this is the sum. And if we want to grab, say, Target, we can call .loc and pass in Target, and this will return only Target in the output. So let's run this cell.
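A minimal sketch of the groupby steps above. The column names, customer names and sales figures are assumptions: the lecture copies them from a reference notebook, and the numbers here are only chosen so that Target totals 550, the figure quoted in the lecture. Recent pandas versions no longer skip non-numeric columns silently in mean, so the sketch passes numeric_only=True explicitly.

```python
import pandas as pd

# Illustrative data; names and figures are assumptions (Target sums to 550)
data = {
    "Store": ["Walmart", "Walmart", "Costco", "Costco", "Target", "Target"],
    "Customer": ["Jim", "Jeremy", "Mark", "Denise", "Ray", "Sam"],
    "Sales": [150, 200, 550, 90, 430, 120],
}
df = pd.DataFrame(data)

by_store = df.groupby("Store")                   # split: one group per store
mean_sales = by_store.mean(numeric_only=True)    # apply + combine: mean of Sales per store
total_sales = by_store.sum(numeric_only=True)    # same pattern with sum
target_total = total_sales.loc["Target"]         # pick a single store from the result

print(mean_sales)
print(total_sales)
print(target_total)
```

The result of each aggregation is a data frame indexed by Store, which is why .loc can select a store by name.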
So we have Target, and its sales figure is 550, which is the total sales of Target. We can perform a whole lot of aggregation operations on the by_store object. Let's try some of them: by_store, say minimum, min. Let's try max here to see what the maximum values are. So these are the maximum values. And let's try standard deviation as well, so std. Here we have the standard deviation. We can count the instances in each column as well. In that case, we have to call count. Let's run this. Notice that for Costco we have two customers and two sales, for Target we have two customers and two sales, and the same for Walmart.

Another very useful function that we learned before is describe. Let's call describe on by_store. This is the describe output, which gives a bunch of useful information. Just to present this data in another way, let's call transpose after describe and run this. Now we have the same data in a different orientation: the stores Costco, Target and Walmart, with count, mean and so on up to max. We can pass a store name to select the information for that store, together with transpose, as well. Let's copy this one, paste it here, and pass in Costco. The only thing we need to do is pass Costco in square brackets and run this cell. Here we have the information for the selected store, which is Costco.

So this was all about the groupby function. I hope you got a very good understanding of groupby. As a quick overview: we created a dictionary and then a data frame, called groupby on our data frame and stored the object in by_store, and then called mean and some other functions on the by_store object. Then we called describe and used transpose so that the data appears in a different orientation.
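The aggregation calls recapped above can be sketched as follows, again with assumed names and figures. The sketch selects the Sales column before aggregating, which sidesteps the non-numeric Customer column in recent pandas versions.

```python
import pandas as pd

# Illustrative data; names and figures are assumptions
df = pd.DataFrame({
    "Store": ["Walmart", "Walmart", "Costco", "Costco", "Target", "Target"],
    "Customer": ["Jim", "Jeremy", "Mark", "Denise", "Ray", "Sam"],
    "Sales": [150, 200, 550, 90, 430, 120],
})

sales_by_store = df.groupby("Store")["Sales"]

lowest = sales_by_store.min()
highest = sales_by_store.max()
spread = sales_by_store.std()
counts = df.groupby("Store").count()        # non-null entries per column and store

summary = sales_by_store.describe()         # count, mean, std, min, quartiles, max
summary_t = summary.transpose()             # stores become the columns
costco = summary_t["Costco"]                # statistics for one selected store

print(summary_t)
print(costco)
```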
Then we learned that we can grab information for a selected store as well. See you in the next lecture, where we will learn some useful functions that we will be using in this course for data analysis, data visualization and machine learning. Good luck.

16. S4: Pandas Essentials - Useful Methods and Operations: So welcome back, guys. I hope you're enjoying this pandas essentials section. We're learning lots of new concepts in this section. This is going to be the last lecture in our pandas essentials. After this, we will move on and do some practice exercises. In this lecture, we're going to explore some very useful methods and operations. There are lots of options available in pandas to explore and get the basic statistics on our data. We have already covered some of them, such as head, isnull, dropna, fillna and so on. In this lecture, we will explore some more general-purpose operations and revise what we have learned in the previous lectures.

Let's create a data frame to get hands-on experience with these operations. I will repeat some values and also generate NaN in our data frame. Just to move on, let's copy this code and run it so that we can save some typing. Let's paste it here and run this one. What have we done? We have imported NumPy and pandas. We have created a data dictionary here, with column one holding some numbers, column two again holding numerical values, and column three holding alpha, bravo, charlie and some NaN. Then we have created a data frame from this dictionary and passed the index 1, 2, 3, 4 and 5. And this is our data frame.

Let's start with what we know. We have learned about info. info provides a concise summary of a data frame. We will use this function very often in the course. Let's call info on our data frame, df. Here we have column one, column two and column three, the number of non-null values, and the type of the values. It tells us there are five entries, from 1 to 5, and gives a concise summary of the data. The second one we learned was head.
So let's call head. We know that if we don't pass in any number here, the default value is 5. Let's pass in, say, 2 here. So head is displaying 1 and 2, the first two rows of our data.

Let's try isnull. We already know what isnull does: isnull returns a boolean same-sized object indicating if the values are null. Let's run this and see. So False, False, False, and wherever it finds a NaN, it returns True.

The next one I would like to use here is dropna, so df.dropna. If you remember, we can pass in axis, and the default is 0. If we don't pass any axis, it will consider 0 as the default axis, for rows; for columns we have to pass in axis equal to 1. Let's pass in axis equal to 0 in one call and axis equal to 1 in another, and put them in print statements so that we can get both outputs at the same time. Let's run this cell. This output is from the first statement, and this output is from the second statement. When axis is 0, it's dropping the rows that contain NaN, and when axis is 1, it's dropping the column which contains NaN or null values.

For the next one, let's try fillna: df.fillna. We know we can pass in value here, or we can use method. Let's pass in some value here, say XYZ, and run this cell. Wherever it found a null value, it's filling it with the given value, XYZ. We can do forward filling, backfilling and so on; there are many other options available in fillna. Let's try ffill, or forward fill. Instead of passing value here, we need to pass in method, which is ffill, for forward filling. Let's run this code. These two were NaN or null values, and the forward filling is filling these null values with charlie, which is the previous value.

The next very useful method, which we will use in the coming lectures during data analysis, visualization, machine learning and exploratory data analysis, is unique.
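A minimal sketch of the revision steps so far, on a data frame rebuilt from the values mentioned in this lecture. The column names col_1, col_2 and col_3 are assumed spellings. Forward fill is done with the ffill method here, since newer pandas releases deprecate the method argument of fillna.

```python
import numpy as np
import pandas as pd

# Values as described in the lecture; the column names are assumed spellings
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5],
                   "col_2": [111, 222, 333, 111, 555],
                   "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan]},
                  index=[1, 2, 3, 4, 5])

df.info()                        # concise summary: entries, non-null counts, dtypes
first_two = df.head(2)           # first two rows
rows_kept = df.dropna(axis=0)    # drop rows that contain NaN
cols_kept = df.dropna(axis=1)    # drop columns that contain NaN
filled = df.fillna("XYZ")        # replace NaN with a given value
forward = df.ffill()             # carry the previous value forward

print(rows_kept)
print(cols_kept)
print(filled)
print(forward)
```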
So if we want to know how many unique values there are in column one, column two or column three, let's try unique on all three columns, one by one. What do we need to do? We need to grab the column, column one, and call unique. Let's run this, and we have 1, 2, 3, 4 and 5. We have five unique values in column one. Let's call this on column two: copy, paste and run this one with column two. We have 111, 222, 333 and 555: four unique values in column two. Notice that 111 appeared two times in column two, but in unique it only appeared once, because we want to know the unique values. Let's apply the same on column three, which is a string column, so it works on strings as well. Let's run this code, and we have alpha, bravo, charlie and NaN. Once again, notice that NaN is there twice in our original data frame. NaN appears two times in column three, however it only appears once in the unique output.

Rather than displaying the values, we can display the count, meaning how many unique values there are. Rather than calling unique on each column, let's try nunique. unique and nunique: we only need to add n at the beginning of unique. If we run this, we're getting 5, so we have five unique values in our column one. We can call this on another column, column two dot nunique, and see how many unique values there are in column two. If we run this, we get 4. In column two we have 1, 2, 3 and 4 unique values, and nunique is returning the count.

Another very useful function that we will use quite often is value_counts. What if we want a table with all the values along with the number of times they appeared in our data? value_counts does the work for us. Notice that NaN counts as a missing value and does not show up in the output. Let's apply value_counts on column one using the dot operator, value_counts, and run this cell. Here we have 5 once, 4 once, 3 once, 2 once and 1 once.
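The three calls above can be sketched side by side. The column names col_1, col_2 and col_3 are assumed spellings; the values follow the lecture.

```python
import numpy as np
import pandas as pd

# Values as described in the lecture; the column names are assumed spellings
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5],
                   "col_2": [111, 222, 333, 111, 555],
                   "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan]},
                  index=[1, 2, 3, 4, 5])

unique_2 = df["col_2"].unique()        # distinct values; 111 is listed once
unique_3 = df["col_3"].unique()        # NaN is listed once as well
n_unique_1 = df["col_1"].nunique()     # number of distinct values
n_unique_2 = df["col_2"].nunique()
counts_2 = df["col_2"].value_counts()  # how often each value occurs
counts_3 = df["col_3"].value_counts()  # NaN is left out by default

print(unique_2, unique_3)
print(n_unique_1, n_unique_2)
print(counts_2)
print(counts_3)
```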
If we look at this one, 5 appeared once, 4 appeared once, 3 appeared once. So it is telling us that these values appeared only once in our column one. Let's try the same value_counts operation on our second column: copy and paste, change this to column two, and run this cell here. Now we can see that 111 actually appeared two times in our data frame, and value_counts is telling us that 111 exists two times in column two, whereas 222 appeared once, 333 appeared once and 555 appeared once. So once again, I want to mention that unique, nunique and value_counts are three very useful and frequently used methods which are associated with finding unique values in the data.

The next one I want to introduce is sort_values. Let's call df.sort_values. If we press Shift+Tab, we have by, axis, ascending equal to True, which is the default, and inplace equal to False, which is the default value. For a permanent change, we have to pass in True. Moving forward, let's pass in by and work with all default values other than by. We want to sort our data by column two. If we run this cell, here we have the index in the order 1, 4, 2, 3, 5. So according to column two: 111, 111, then 222, 333, 555. Once again, if we explore the doc string, ascending is True. This is the default, so the results are in ascending order.

Moving forward, let's talk about data selection. We have learned to grab data in our previous lectures. We know we can grab a column with its name, do conditional selection and much more. We can use loc and iloc to find rows as well. Let's revise conditional selection here. Take df, and let's apply a condition on column one, and say greater than 2 is our condition. It's returning False where the condition is not satisfied. Wherever the condition is satisfied, it is returning True. So this is actually a boolean mask. We have already used such a mask to filter the data.
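A minimal sketch of sort_values and of the single-condition mask described above, with the column names col_1, col_2 and col_3 as assumed spellings.

```python
import numpy as np
import pandas as pd

# Values as described in the lecture; the column names are assumed spellings
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5],
                   "col_2": [111, 222, 333, 111, 555],
                   "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan]},
                  index=[1, 2, 3, 4, 5])

sorted_df = df.sort_values(by="col_2")  # ascending=True and inplace=False by default
mask = df["col_1"] > 2                  # boolean mask: one True/False per row

print(sorted_df)
print(mask)
```

The original df keeps its row order, since inplace is left at False.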
Let's try to create a boolean mask with two conditions, on column one and column two. On column one, we want this condition, and let's put it in parentheses. The second condition we want is on column two of df, and we want the values which are equal to 111. So these are our two conditions. Let's assign this to a variable, bool_ser, and run this. If we print bool_ser, we have False, False, False, True, False, based on these two conditions. We can apply this bool_ser to filter the data from our data frame. If you remember, we need to pass this mask in square brackets. Let's run this. Here we have 4 and 111: it's filtering the values according to the conditions provided and returning only the row where the mask is True, which is index 4.

Moving forward, let's learn another useful method, apply. Indeed, apply is one of the most powerful pandas features. What can we do with apply? We can broadcast a customized function on our data. Let's see how we can broadcast our customized function on our data. We have already learned apply in the previous lectures, so this is a revision, and I hope you remember the previous lecture as well. Let's create a function: def square, passing in a parameter, value, and the function returns the value squared. Let's run this cell. So we have a function, square.

Let's broadcast our customized function square using the apply method to calculate the square of any column in our data frame. Say we want to calculate the square of column one in our data frame. What do we need to do? We need to grab column one from our data frame and, using the dot operator, call apply. Then we need to pass in a function, and our function is square. Copy this and paste it here. So what are we doing? We're grabbing column one from our data frame and using the apply method to apply our customized function square to that column in our data frame. Let's run this.
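A minimal sketch of the two-condition mask and of apply with a custom function. The column names are assumed spellings, and the body of square (value ** 2) is an assumption based on the function's name.

```python
import numpy as np
import pandas as pd

# Values as described in the lecture; the column names are assumed spellings
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5],
                   "col_2": [111, 222, 333, 111, 555],
                   "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan]},
                  index=[1, 2, 3, 4, 5])

# Each condition goes in its own parentheses; & combines them element-wise
bool_ser = (df["col_1"] > 2) & (df["col_2"] == 111)
filtered = df[bool_ser]                 # keep only rows where the mask is True


def square(value):
    """Return the square of a single value (assumed body for the lecture's function)."""
    return value ** 2


squared = df["col_1"].apply(square)     # call square on every element of the column

print(bool_ser)
print(filtered)
print(squared)
```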
So here we have the result of apply on column one. The same operation can be conveniently carried out using a lambda expression. Rather than defining a function and then applying it using the apply method, we can conveniently carry out this operation using a lambda expression. Let's try a lambda expression instead. Copy this one, paste it here, and instead of square, we want to apply a lambda expression. What is the syntax for a lambda expression? It starts with the keyword lambda, l-a-m-b-d-a, then value, and then what we want to do with value, which is the value squared. So instead of this function, we're only using a lambda expression and passing it to the apply method. Let's run this. Here we have the same results.

We know we can use built-in functions with apply as well. Say we want to find the length of the strings in a column. We have this third column where we have strings, so let's find out the length of the strings in column three. Take df, grab column three, select 0 to 3 for the three values that are not null, and apply len, which is a built-in function. Let's run this code here. In the first row we have 5, which is the string length, in the second 5, and so on for the string lengths. So we're avoiding the NaN or null values of column three here. Let's try to delete this slice and apply. Here we have a TypeError: object of type float has no len. This is why we selected from 0 to 3. It means NaN has a float type, and that doesn't have a length. Let's check the type of NaN with np.nan. So this is float type. Just to avoid this TypeError, we pass in 0 to 3 here, and here we have the same output again.

A couple more things to know: we can get the index of our data frame using df.index. Our index is 1, 2, 3, 4 and 5.
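A minimal sketch of the lambda version and of apply with the built-in len. The column names are assumed spellings, and iloc is used here as one way to select the first three rows by position; the lecture does not spell out which selector it uses.

```python
import numpy as np
import pandas as pd

# Values as described in the lecture; the column names are assumed spellings
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5],
                   "col_2": [111, 222, 333, 111, 555],
                   "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan]},
                  index=[1, 2, 3, 4, 5])

# Same result as a named square function, written inline
squared = df["col_1"].apply(lambda value: value ** 2)

# Only the first three entries of col_3 are strings, so restrict apply to them
lengths = df["col_3"].iloc[0:3].apply(len)

# On the whole column, len typically fails on the missing entries
try:
    df["col_3"].apply(len)
except TypeError as err:
    print("TypeError:", err)

nan_type = type(np.nan)   # NaN is a float, which has no length

print(squared.tolist(), lengths.tolist(), nan_type)
print(df.index)
```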
We can get the columns with df.columns, and we have col1, col2, and col3. If you want to drop any column, we need to call df.drop. If you remember, we need to pass in the column name and then tell drop the axis as well. If we run this thing without the axis, it's going to raise an error, because it does not find col1 along axis zero: label col1 not contained in axis. We need to tell it that the axis is one, and if we run this code again, it's dropping column one and returning the remaining two columns of our data frame. If we output the data frame again, column one is still there. To get a permanent change, we need to pass in inplace equal to True. So we run this cell, and now if we output our data frame, we have column two and column three, because the change is permanent with inplace equal to True. So we have deleted column one using inplace equal to True. Let's get that column back into our data frame. We can simply rerun the cell where we created the data frame. Let's go back; this is where we created the data frame. Let's rerun this cell. Now we have our data frame in its original state, with col1, col2, and col3. Actually, I need all three columns in the next example, so let's move on. Moving forward, another very useful method is pivot_table. What pivot_table does is create a spreadsheet-style pivot table as a data frame. If you want, type df.pivot_table and read its documentation as well: create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects, which are hierarchical indexes, on the index and columns of the resulting data frame. We have some parameters that we can pass in, and there is a very common example given in this docstring.
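Going back to the drop steps for a moment, here is a small sketch of the three cases: no axis, axis equal to one, and inplace equal to True. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical three-column data frame (names and values are assumptions).
df = pd.DataFrame(
    {
        "col_1": [1, 2, 3],
        "col_2": [444, 555, 666],
        "col_3": ["alpha", "bravo", "charlie"],
    }
)

# Without axis=1, drop looks for a row label "col_1" along axis 0 and fails.
try:
    df.drop("col_1")
    drop_without_axis_failed = False
except KeyError:
    drop_without_axis_failed = True

# With axis=1 the column is dropped in the returned copy only.
without_col_1 = df.drop("col_1", axis=1)
original_still_has_col_1 = "col_1" in df.columns

# inplace=True modifies df itself and returns None.
df.drop("col_1", axis=1, inplace=True)
```

An alternative to inplace is to reassign the result, for example df = df.drop("col_1", axis=1), which leaves it explicit that df now refers to the reduced frame.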
If you want to know more, you can read this docstring. Let's apply pivot_table on our data frame. So df.pivot_table, and what do we want? We want to pass in values, say our values are col2, we want col1 as the index, and then we want columns as col3. What it will do: it will take the values from column two, set the index from column one, and set the columns from column three. So alpha, bravo, charlie will be our column names, the values will come from column two, and the index will be our column one. Let's run this cell and see how the output looks. Here we have values from column two, and wherever it didn't find a value it is putting NaN. The index comes from column one, and column three gives us alpha, bravo, charlie. So NaN appears for missing data, and a NaN in column three will not be used for the column names in the pivot table. This is why it is skipping index four and five, because for index four and five we have NaN in column three. Let's look at another example. For this example, I will copy the code from the reference book again. Here we have data: A is foo, foo, foo, bar, bar, bar; B is one, one, two, two, one, one; then C and D; and we are creating a data frame foobar from this dictionary. Let's run this thing and print foobar to see how this data frame looks. So we have a data frame foobar, and A, B, C, D are our columns. A is foo, foo, foo, bar, bar, bar, B is one, one, two, two, one, one, C is x, y, x, y, x, y, and D is the numerical column. Let's create a pivot table for foobar. So foobar.pivot_table, and we want values; let's say the numerical values, so we want values as D. We want a multilevel index, so let's pass in index A and B: index equal to, and you need to pass in a list here, A and B. Then we want our columns as C, so let's pass in columns equal to C. So what do we want to do? We want the D column to appear as values in our pivot table. We want a multilevel index from A and B, the A column and the B column.
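Both pivot_table calls described above can be sketched like this. The data values are assumptions (the numbers in D in particular are made up for illustration); only the call pattern, values, index, and columns, follows the lecture.

```python
import numpy as np
import pandas as pd

# First example: hypothetical frame with NaN in col_3 for the last two rows.
df = pd.DataFrame(
    {
        "col_1": [1, 2, 3, 4, 5],
        "col_2": [444, 555, 666, 111, -222],
        "col_3": ["alpha", "bravo", "charlie", np.nan, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# Values from col_2, row index from col_1, column names from col_3.
# Rows whose col_3 is NaN cannot become a column name, so they are skipped.
first = df.pivot_table(values="col_2", index="col_1", columns="col_3")

# Second example: illustrative foo/bar data (the D values are assumptions).
foobar = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "bar"],
        "B": ["one", "one", "two", "two", "one", "one"],
        "C": ["x", "y", "x", "y", "x", "y"],
        "D": [1, 3, 2, 5, 4, 1],
    }
)

# A list for index gives a two-level (hierarchical) row index: A outer, B inner.
second = foobar.pivot_table(values="D", index=["A", "B"], columns="C")
```

By default pivot_table aggregates with the mean, so a combination that occurs more than once would be averaged; combinations that never occur show up as NaN.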
A is the level-one index, B is the level-two index, and we want columns from C. Let's run this cell and see how the data looks after the pivot table. So here we have A as bar and foo, B as one, two, one, two, and it is getting the values from D, and wherever it didn't get a value it is putting NaN. The columns are x and y. These are our columns. So this was all about useful methods and operations in pandas. We are done with pandas essentials. I know it was extensive learning for you guys, but I hope you have enjoyed this journey with me. We have learned lots of key concepts in pandas essentials, and we're going to apply these concepts in the upcoming lectures. See you in the next lecture. Good luck.

17. S4: Pandas Essentials - Project 1 (Overview) Customer Purchases Data: Hi, guys. Welcome to the e-commerce purchases exercise overview. After a crash course on pandas for data analysis, it's time to do some practice. Because of privacy issues, I have created a fake data set here with 30,000 entries. The situation is that customers are providing some personal information while purchasing stuff online or in store. For some reason, your client wants to know the answers to some of his questions from the data set. Let's try to help him in this situation. Feel free to consult the solutions if needed; however, it's very important to try yourself first. The tasks given in the exercises can be solved in different ways. Try your best answer and compare it with the solutions. So let's move on and have a quick overview of the questions your client is asking. These first two cells are just for Python fun. They simply ask which directory you are in, look through the files in your working directory, and display their names. You can run them and see how this works. The first question asks: please load the data set into a variable, cust; the data file is the cust purchase fake data CSV. So this file is given, and you have to load it into cust using pandas.
The second question: it's a good idea to see what the data looks like, so display the first five rows of your data set. In the third question, he's asking how many entries your data has, and can you tell the number of columns in your data set? The output is given; you have to get the same output. The next one is: what are the maximum and minimum ages of your customers? Can you find the mean age of your customers? The answer is again given; you have to calculate the max, min, and mean and display them in the output. Question five is: what are the three most common customer names? It should be very simple to get these three values in the output using value_counts. The next one, question six: two customers have the same phone number. Can you find those customers? Actually, you have to find this first line only; I just displayed two. The second part of the same question is: now that we know the phone number, let's find out the other stuff. So here it is displaying all the information related to the customers who have the same phone number. In question seven, he's asking how many customers have the profession structural engineer. You have to explore the profession of the customers to answer this question. Question eight is how many male customers are structural engineers. Here you have to apply two conditions: one is structural engineer and the second is male. Both of these conditions must be satisfied to get your answer. Question nine: find the female structural engineers from the province Alberta. So now female, structural engineer, and the province Alberta: these three conditions should be satisfied. Moving forward, question 10: what is the max, min, and average spending? You have to look at all the spending of your customers: the minimum spending, which is zero, the maximum spending, and the average spending. It should be easy to use the max, min, and mean functions here.
Question 11 is: who did not spend anything? The company wants to send a deal to encourage the customer to buy stuff. Once again, you have to consider the spending column and see if there's anyone who has not spent anything, so that the company can then plan the marketing strategy for such customers. Question 12: as a loyalty reward, the company wants to send a thanks coupon to those who spent 100 Canadian dollars or more. Please find those customers. Once again, the condition is that you have to find the customers who have spent more than or equal to 100 Canadian dollars. It should again be very simple. Try your best to solve this as well. Question 13: how many emails are associated with this credit card number? The credit card number is given, and you have to find out how many emails are associated with it. The two emails here are associated with this credit card number. Question 14: we need to send new cards to the customers well before the expiry date. How many cards are expiring in 2019? You need to find the credit cards that are expiring in 2019, so try to solve this and get this number. If you don't get this number, you can always consult the solution. Question 15: how many customers use Visa as their credit card provider? You have to find the customers who are using a Visa credit card, so you have to put the condition on the type of credit card. Question 16: can you find the customer who spent 100 Canadian dollars using Visa? This is the output; you need to get this output and find the customer who is spending 100 Canadian dollars using Visa. The conditions are spending of 100 Canadian dollars and card type Visa, and both conditions must be satisfied. Question 17: what are the two most common professions? You need to find the most common professions; the first one is preschool teacher and the second one is distribution manager. Question 18: can you tell the top five most popular email providers?
So you have to find out which email providers people are using most frequently. The most commonly used one is gmail.com, then me.com, then outlook, then live, and hotmail.com. You have to find these and get this in the output. Question 19: is there any customer who is using an email with am.edu? You have to apply the condition and find out if anyone is using am.edu. And yes, one customer is using am.edu. The hint is to use a lambda expression in apply and split the email address at the at symbol. I hope it will be easy for you as well; if you remember, we have done such a task in the previous lectures. Question 20, which is the last one: on which day of the week does the store get more customers? He's actually asking for only the one day on which the store is getting the most customers, but we just displayed five for comparison. So you have to find the day on which the store is getting the most customers. Good luck, guys. Try your best to answer these questions, and see you in the next lecture, where we will solve these tasks.

18. S4: Pandas Essentials - Project 1 (Solutions) Customer Purchases Data: So welcome back, guys, to the e-commerce purchases exercise solution lecture. I'm sure that you have already answered all the tasks. That said, let's go through the tasks in this lecture and solve them one by one. As I already said, you can run the first two cells and see what the output looks like. First things first, we have to import pandas, so let's run this cell. The first question is: please load the data set into the variable cust; the data file is the cust purchase fake data CSV. So the code here is cust equal to; we know we can read this file with pd dot, and we can use Tab to autocomplete read_csv. The file name starts with cust; if you want, you can press Tab and see which file you are going to load. So it's the cust purchase fake data set, and we're going to load this file. Let's run this cell. So the file is loaded.
Now the next question: it's a good idea to see what the data looks like, so display the first five rows of your data set. If you remember, we learned about a few functions, and one of them was head. To display five rows, we can use the head function on our data set. Our data set is cust, so call the function head on the data set. If you press Shift+Tab, you will see the default n equal to five, so we don't need to pass in any value for n. Let's run this cell. Here we have the output, which is the same as he's asking for. Moving forward: how many entries does your data have? Can you tell the number of columns in your data? Once again, if you remember, we learned about the info function as well. In this case, we can simply call cust.info, and if we run this cell, we have the same output. As for how many entries, it has 30,000 entries. Can you tell the number of columns in your data? These are the columns, which is a total of 20 columns. Moving forward to question four: what are the max and min ages, so the maximum and minimum age of your customers? Can you find the mean age of your customers? If you go back and see the data set, we have the prefix, first, last, email, gender, and age columns. So we have to get the age column and use max, min, and mean on it. One more important note: this output is using print statements, so we have to use print statements here. So print, and let's copy this one so that we don't need to retype it. Here it is. And what is he asking? cust is our data set; we need to grab the age column and call max. Let's copy this one and paste it here, and we can paste one more time. So this is the max age, this is the min age for the customers, and then the average age for the customers. Let's copy and paste it here, and this is the average, which is the mean.
So what we're doing: we're grabbing the age column from our cust data set and calling max, min, and mean. We need to put parentheses here as well; here it is. Let's run this cell. Here we have the max age for the customers, 65, the minimum, 18, and the average, the same numbers as given. This label should say minimum, and this one should be minimum as well; that was a typo. The next question is asking: what are the three most common customer names? So we're going to display the most common names. Let's grab the name column from cust. If we go back and check, we have prefix, first, and last. He's asking only for the name, so let's grab the first name. Here we have first, and then we can call value_counts. Then he's asking for the first three, so dot head. The default is five, so we need to pass in three here. Let's run this cell. Here we have the three most common names in our data set. So once again, we're grabbing the first name column from the data set, calling value_counts, and displaying the first three, because we know value_counts sorts in descending order, from higher to lower. Moving forward: two customers have the same phone number. Can you find those customers? If we look at the data frame again, we have a phone column here. This is the phone column, so we need to grab it. Let's grab the phone column from cust. This is similar code to this one, so let's copy it, and instead of first, we're grabbing the phone column. So we're calling value_counts on our customers' phone column, and then let's display two, because he is displaying two, and run this. Here we see that this phone number is repeated in the data set. Now he's saying: we know the phone number, so let's find out the other stuff. Let's find out about the customers who have the same phone number. This is again a condition we need to apply on the phone column.
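The age summary and the most-common-names steps above can be sketched on a tiny made-up frame. The column names first and age follow the columns named in the lecture; the rows are assumptions, since the real data set is not reproduced here.

```python
import pandas as pd

# Tiny hypothetical stand-in for the customer data set (rows are assumptions).
cust = pd.DataFrame(
    {
        "first": ["Ali", "Sara", "Ali", "Omar", "Sara", "Ali"],
        "age": [18, 30, 65, 42, 25, 30],
    }
)

# max, min and mean on a single column.
max_age = cust["age"].max()
min_age = cust["age"].min()
mean_age = cust["age"].mean()
print("Max. age:", max_age)
print("Min. age:", min_age)
print("Average age:", mean_age)

# value_counts sorts from most to least frequent, so head(n) gives the top n.
top_names = cust["first"].value_counts().head(2)
```

The same value_counts followed by head pattern is what surfaces a repeated phone number: any value with a count above one appears at the top.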
So cust, phone, and what is the phone number? This is the phone number; pass it in as a string, and then pass the condition to the data frame. So we are actually filtering based on this condition. This is kind of our mask, and we're passing it to the data frame. Let's run this. Here we have Mrs. Lily and Mrs. Peter, and they have the same phone number. Moving forward: how many customers have the profession structural engineer? Now we have to grab the customers' profession column. So let's grab the profession: cust is our data set, then the profession column, and which profession do we want? Structural engineer. Let's copy this one and paste it here. This is our condition, and based on this condition, let's pass it into cust. If we run this cell, we're going to get all these customers who are structural engineers, but we don't need this whole information. We only need to know how many. To get how many, we need to call count on this one. So here we have 87, and incidentally this is the output he is asking for. In principle, he's asking for only one number, so after count you can select any column, say last or first, because they are all 87; you can grab any column, prefix, first, last, email, gender, and it is 87. So 87 customers are structural engineers in the data set. How many male customers are structural engineers? Now it's the same thing, but he's asking to grab only males. Let's copy and paste here. Now this is our first condition, and our second condition is that they are male. So we need to grab the gender column, and gender is male. We can call only count here because we need the same output. So what we're doing: we're grabbing the customers' profession column, taking profession as structural engineer and then gender as male, and we are applying and, so both of these conditions must be satisfied.
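The filter-then-count pattern above can be sketched as follows. The column names profession and gender and the label strings are assumptions about how the data set spells them; the rows are made up.

```python
import pandas as pd

# Hypothetical rows; the real data set has 30,000 entries.
cust = pd.DataFrame(
    {
        "first": ["A", "B", "C", "D", "E"],
        "profession": [
            "Structural Engineer",
            "Structural Engineer",
            "Teacher",
            "Structural Engineer",
            "Structural Engineer",
        ],
        "gender": ["Male", "Female", "Male", "Female", "Male"],
    }
)

# One condition: a boolean mask passed to the data frame in square brackets.
is_engineer = cust["profession"] == "Structural Engineer"
engineers = cust[is_engineer]

# count() on the filtered frame returns one number per column;
# selecting a single column gives a single number.
n_engineers = engineers["profession"].count()

# Two conditions: wrap each in parentheses and combine with &.
is_male = cust["gender"] == "Male"
n_male_engineers = cust[(is_engineer) & (is_male)]["profession"].count()
```

Note that Python's and keyword does not work between two Series; the element-wise & operator is what combines the masks.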
And then we're calling count afterwards. Let's run this cell. Here we have the same output. Moving forward: find the female structural engineers from the province Alberta. Now the same condition is for female, with one additional condition, the province. So here we have to add another and, which, instead of gender, is on the province. Let's copy this province column and paste it here. And what is the province? AB. So what we're doing: we're applying three conditions, and they all must be satisfied. The profession should be structural engineer, the gender should be female, and the province should be Alberta. He's not asking for the count; he's asking for the whole information, so let's output the data. Let's run this. So these are the customers who are female, with their ages, whose profession is structural engineer, and who belong to Alberta. Moving forward: what is the max, min, and average spending? Now we have to grab the price column. If we look at this data frame, we have this price in Canadian dollars, which is actually the spending. So let's grab this spending column, cust price CAD, and call max. Let's run and see how it works. Here we have 100. We have to get these three in the output, and this is print again. So print, and copy this one here. Max; let's copy this line of code and paste it here twice. This is min and this is average, so call min here and call mean here. So what we're doing: we're grabbing the price column from our data set and calling max, min, and mean. Let's run this cell. Here we have the same output. Going forward: who did not spend anything? The company wants to send a deal to encourage the customer to buy stuff. Let's try to find the person who has not spent anything, which means his spending is zero. So we have to grab the spending column again. Let's copy this one here. So grab the spending column, which is actually the price in Canadian dollars. And what is the condition?
He has not spent anything, so the spending is equal to zero. This is our masking condition, our filtering condition. Let's pass it to the cust data set. So once again, this is our condition, because we want the person who has not spent anything, and we're passing this masking condition to cust. Let's run this cell. Here we have the customers who have not spent anything. Next: as a loyalty reward, the company wants to send a thanks coupon to those customers whose spending is 100 or more. Once again, the column is the same. We have to grab the spending column, and instead of equal to zero, we need to apply the condition where the spending is greater than or equal to a hundred. So what we're doing: we're passing the condition as a masking condition where the price, the spending, is greater than or equal to 100. Let's run this cell. So we're getting the same output here. Moving forward: how many emails are associated with this credit card number? This is the credit card number, and we need to find how many emails are associated with this credit card. If we look at the data frame, we have cc underscore no, so this is the credit card number. Let's grab this column now, because we need to find the credit card number. So cust, cc underscore no, and the credit card number is equal to; let's copy this number. This is the number, so we don't need to pass it in quotation marks, because it is a number. Let's pass this to cust. So this is our condition, where the credit card number is this one, and we're passing this condition, or mask, to the data frame cust. Let's run this. Here we have the people who have the same credit card number. He's asking for the emails only, so we need to grab the emails. The email is this column here. If we run this cell, here we have the output. Question 14: we need to send new cards to the customers well before the expiry date. How many cards are expiring in 2019?
We need to apply a lambda function here, and once again, let's split it into steps. First, if you look at the data frame, we have the expiry column. If you look at the credit card expiry, it is 04/2018, 03/2024, 07/2019. If we count the index locations, 0, 1, 2, 3, 4, 5, and 6, the last two are at 5 and 6. So we have to grab these last two digits from cc underscore exp. It's a little tricky. So cust, cc exp; this is our column, and we need to call apply. We know the lambda syntax: lambda, pass in x, and our x is a value from this column, which has this type of format, like 03/2024. What do we want? We want to grab the last two digits, which is from index 5 onward, and we only want 19. Here it is. Let's go over it again: we're grabbing the credit card expiry column from our data set and applying a lambda expression, which takes the credit card expiry and checks whether the value from index 5 onward, because we know these two digits run from 5 onward, is equal to 19. Let's run this thing, and after that we'll see what to do. Here we're getting our mask: False, False, False, True, True, True. So this one is 19, expiring in 2019. We don't need these Falses and Trues, so let's pass this condition to sum. And here we have it: 2,684 cards are expiring in 2019. Moving forward: how many people use Visa as their credit card provider? Once again, let's check the head of our data frame, cust dot head, to find the credit card provider. We have cc number, cc expiry, and cc type, with Visa, Switch, Maestro, and all these types of credit cards. Let's call unique on the credit card type and see. Here we have a list of all the cards that the data set has: Visa, Switch, Mastercard, American Express, and so on. We only want Visa.
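The expiry-year count from earlier in this passage can be sketched like this. The column name cc_exp and the MM/YYYY string format are assumptions read off the spoken description (seven characters, with the last two digits at positions 5 and 6); the rows are made up.

```python
import pandas as pd

# Hypothetical expiry strings in an assumed MM/YYYY format.
cust = pd.DataFrame(
    {"cc_exp": ["04/2018", "03/2024", "07/2019", "11/2019", "01/2020"]}
)

# For each string, take the characters from position 5 onward (the last two
# digits of the year) and compare them with "19". The result is a boolean Series.
expires_2019 = cust["cc_exp"].apply(lambda x: x[5:] == "19")

# True counts as 1 and False as 0, so sum() gives the number of matches.
n_expiring_2019 = expires_2019.sum()
```

A design note: slicing by position only works if every value has the same layout; x[-2:] would be slightly more tolerant because it counts from the end of the string.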
So the condition is cc type equal to Visa, because we only want the customers who are using Visa. Let's call sum on this thing, because we want the total number of customers who are using Visa as their credit card provider. Let's run this, and here we have it: 1,721 customers are using Visa as their credit card provider. Going forward: can you find the customer who spent 100 Canadian dollars using Visa? Here we have to consider two conditions. One is Visa and the second is 100 Canadian dollars, so one is on the card type column and the second is on the price column. Let's grab both columns: cust cc type equal to Visa, and cust price CAD equal to 100, as he's asking for. So these are the two conditions. Let's put them in parentheses. Both of these conditions must be satisfied, and let's pass them into cust. Just an overview again: our first condition is that the card type is Visa, our second condition is that the customer's spending is 100 Canadian dollars, and both must be satisfied. So this is our mask, and we're passing the mask to cust. Let's run this. Here we have row 76, with last name Braun, age 31, card type Visa, and spending of 100. So Visa and spending is 100. The next question, 17, is asking: what are the two most common professions? I hope you are able to figure it out simply using value_counts and displaying the top two. You need to grab the column, which is profession. Here we have the profession column, and then call value_counts. If we run this, we have preschool teacher, distribution manager, and a list of all the professions. We only want two, so let's call head and pass in two. Here we have preschool teacher and distribution manager. Question 18: can you tell the top five most popular email providers, for example gmail.com or yahoo.com? Once again, we have to use a lambda expression.
If you remember, we can use split, because an email is some name at gmail.com, so we can split at the at symbol and then find the top five email providers. What we need to do is grab the customers' email column and then call apply with a lambda expression, passing the email to the variable, say x, here, and then call split on x, where we want to split at the at symbol. If you remember, with something like name at gmail.com, when we applied dot split, we got the name and gmail.com in the output. So this was our output, just to recall. After splitting at the at symbol, the list will be name and gmail.com. The name is at index zero and the provider is at index one. We want index one, because we are grabbing gmail.com or yahoo.com; we're trying to find the most popular email providers. So we're splitting the email at the at symbol and taking, at index one, the email provider. Let's add index one here as well. Let's delete this, because it was just for understanding. So what we did so far: we grabbed the customers' email column, applied the lambda expression, split at the at symbol, and grabbed the email provider. We want the first five, so here we have to use value_counts. Let's run this cell. We have gmail.com, me.com, outlook, live, hotmail, yahoo, and so on. We only want the top five, so let's call head. We don't need to pass in anything, because the default is five. So here we have the top five. The next one is: is there any customer who is using an email with am.edu? This is a similar question, actually. We have to grab the email, split at the at symbol, grab the index-one location, and compare whether anyone is at am.edu. It should be very simple. Let's copy this one here and edit it.
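The split-at-the-at-symbol steps above can be sketched as follows. The email addresses are made up and the column name email is an assumption; am.edu is used as the domain to look for, as in the question.

```python
import pandas as pd

# Hypothetical email addresses (all values are assumptions).
cust = pd.DataFrame(
    {
        "email": [
            "a@gmail.com",
            "b@yahoo.com",
            "c@gmail.com",
            "d@am.edu",
            "e@hotmail.com",
            "f@gmail.com",
            "g@yahoo.com",
        ]
    }
)

# "name@provider".split("@") gives ["name", "provider"]; index 1 is the provider.
providers = cust["email"].apply(lambda x: x.split("@")[1])

# Most frequent providers first; head(2) keeps the top two of this small sample.
top_providers = providers.value_counts().head(2)

# The same provider Series compared with a domain becomes a mask for the frame.
am_edu_customers = cust[providers == "am.edu"]
```

pandas also offers vectorized string methods, for example cust["email"].str.split("@").str[1], which give the same provider Series without an explicit lambda.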
We're grabbing the email column, applying a lambda expression, splitting at the at symbol, grabbing index one, and then comparing whether anyone has am.edu. So this is our mask; let's pass it into cust. Let's run this cell. Here we have one customer. Let's check the email address: this email is at am.edu, which is what we were supposed to find. The next question is: on which day of the week does the store get more customers? It should be pretty simple by now, because you learned how to use value_counts. You simply need to apply value_counts on the weekday column, because we want to know the busiest weekday. So let's grab cust weekday. This is the column we need to grab, and then call value_counts. Let's run this cell. So we have Saturday, Wednesday, Thursday, Friday, and so on, so Saturday is the busiest day. He's asking for five, so let's call head to get the same output. So here we have the same output now. This was all about the exercise solutions. I hope you have a very good understanding of pandas so far. See you in the next lecture. Good luck.

19. S4: Pandas Essentials - Project 2 (Overview) Chicago Payroll Data: Hi, guys. This is going to be your second project, a pandas exercise. You have made great progress, so I should say excellent job. This data set is a real data set, which is the payroll data of the city of Chicago. The data set is available on the Kaggle website and can be downloaded using the provided link. If you click the data set link here, you can download it from the Kaggle website. However, a copy of this data set is already provided in the course material. Just a quick overview before moving forward. Just for you guys, here are a few similar data sets if you want to do more practice, like the city payroll information for all Los Angeles employees since 2013. This one is for San Francisco, and this one is for New York City.
So you can explore these data sets as well, which are quite similar to the Chicago payroll data set. Just an overview: he's asking to import pandas, then read the data set, and here he is asking for the head of the data set. Then questions are given to get an overview of the data set, like the info command here and finding the null values. You have to go through the data set and answer all these questions. I hope this is going to be quite easy for you guys now. For example, here he's asking which department has the maximum number of employees. You have to find the department with the maximum number of employees and get the same output here. Again, how many employees are on salary and how many are hourly in the police department? Once again, you have to find the employees who are getting a salary and those who are paid hourly. What are the mean, max, and minimum salaries? You need to call the mean, max, and min functions. This is a hint, where you can explore Stack Overflow to get help if you need it. In question eleven, he's asking: find the employee who has the maximum salary, and do not use the maximum number you got above. He's also asking you to try using idxmax as well. If you want, you can look up idxmax and explore what it is for. So try solving without idxmax first, and then you can use idxmax as well. In question 12: find the employee who has the minimum salary. Here he's explicitly asking you to use idxmin. Moving forward once again: what are the mean, max, and min hourly rates? This is a similar question to the one we had here. Then moving forward, you need to display this output, and the answer is given. Sorry, I need to remember to delete this one. And then how many employees are getting the max hourly rate, who is getting the maximum hourly rate, and so on. I hope these are going to be very simple tasks for you after this much practice.
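For the idxmax and idxmin hints mentioned above, here is a minimal sketch on made-up payroll rows. The column names Name and Annual Salary are assumptions, and the salaries are stored as plain floats here, which may differ from how the real file stores them.

```python
import numpy as np
import pandas as pd

# Hypothetical payroll rows; NaN stands for an hourly employee with no salary.
pay = pd.DataFrame(
    {
        "Name": ["Adams", "Baker", "Clark", "Davis"],
        "Annual Salary": [90000.0, np.nan, 120000.0, 45000.0],
    }
)

# Without idxmax: compare the column with its own maximum to build a mask.
top_by_mask = pay[pay["Annual Salary"] == pay["Annual Salary"].max()]

# With idxmax / idxmin: get the index label of the extreme value, then use loc.
top_label = pay["Annual Salary"].idxmax()
bottom_label = pay["Annual Salary"].idxmin()
top_employee = pay.loc[top_label]
bottom_employee = pay.loc[bottom_label]
```

One difference worth noting: the mask version returns every row that ties for the maximum, while idxmax returns only the first matching label.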
So the last one: how many people have the word officer in their job title? This is pretty tricky, so I have already provided the answer to this one. You can define the function, find string, to find the string officer in the job title, and then use a lambda expression in apply to get the number of people who have officer in their job title. I hope it should be very simple now, or at least you can try to solve all these problems. If you feel like you need some help, you can always explore the solution notebook. In any case, we're going to solve these tasks in the next lectures. See you in the next lecture. Good luck.

20. S4: Pandas Essentials - Project 2 (Solutions Part 1) Chicago Payroll Data: So welcome back, guys, to the Chicago payroll data exercise solution lecture. I hope you have already addressed all the questions and tasks according to the instructions given in the notebook. Let's move on and go through the possible solutions in this lecture. The first thing is import pandas as pd, which should be very simple. Import pandas as pd; let's run this code, and here it is. Question two is: read the city of Chicago payroll data CSV. Let's copy this name and read it into pay, equal to pd dot read. We have a CSV file, so let's pass in the name. So pay equal to pd.read_csv, and this is the file name. Let's run this cell. Show the first five records: once again, this should be very simple. You need to call head, and the default for head is five. Here we have the first five records, with name, job titles, department, full or part-time, salary or hourly, typical hours, annual salary, and hourly rate. Moving forward: get an overview of the data set. I hope you still remember that you can use info, so pay.info. It should be pretty simple as well. Here we have 32,658 entries, a total of eight columns, and 32,658 non-null values in some columns. Here there are only 7,883 non-null values.
So there are some null values in typical hours, annual salary, and hourly rate. Going forward: how many NaN or None values do you have in each column of your data? This should be simple. isnull is useful here, so let's call isnull on our data frame and run this cell. Here we have False, False, False, True, True, True, and so on. Each column tells us which entries are False and which are True, and we're getting a mix of False and True in the last three columns. But we want to know how many. This is a full data frame, which is long, like dot dot dot, continuing up to the end. So let's call sum on this and run it, so that we can get the total number. Here we have zero null values for name, job titles, and so on through salary or hourly. In the last three columns we have 24,775, 7,883, and 24,775 null values. Moving forward, question five asks you to output the statistics for your data set. If you remember, we have a function, describe. Let's call pay.describe() and run this cell. We have only one column in the output. Let's check its documentation with Shift+Tab, and we see include is None, which is the default value, and None means the result will include all numeric columns. We want to include all columns, both numeric and non-numeric. To do this, we have to pass include equal to 'all'. So let's pass in include='all' and run this cell. Here we have the same output as in the notebook, and we get all the columns in the output now. So let's move on to the next one. Question six is: what are the maximum, minimum, and average typical hours? We need the typical hours column. Let's grab pay, the column is typical hours, and call max. Let's run this cell. So we have 40 here. This is one answer, but the question asks for all three, so pass it into a print statement, copy this one, and copy it again for the other two values.
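A minimal sketch of counting nulls and widening `describe`, assuming a tiny made-up data frame with one numeric column and two text columns (the column names follow the lecture, the values do not come from the real data):

```python
import numpy as np
import pandas as pd

# Made-up sample: one numeric column and two text columns, with some missing values.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Typical Hours": [np.nan, np.nan, 40.0, 20.0],
    "Annual Salary": ["$101442.00", "$94122.00", None, None],
})

# isnull() gives a True/False frame; sum() counts the True values per column.
null_counts = pay.isnull().sum()
print(null_counts)

numeric_only = pay.describe()             # default: numeric columns only
all_columns = pay.describe(include="all")  # numeric and non-numeric columns
print(numeric_only)
print(all_columns)
```

Because True counts as 1 when summed, chaining `sum` after `isnull` turns the long True/False frame into one null count per column.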
These are going to be the minimum and the average, so call min here and mean here. Let's run this cell. So we have maximum typical hours of 40, a minimum of 10, and a mean of 34.67, so we got the same answer. Let's talk about this hint: pay, typical hours, dot dropna, and then dot mean. We did not use dropna in our code here. If you remember, by default pandas drops all NaN values, all null values, for this calculation. So even if you don't drop the null values yourself, pandas still doesn't consider them while calculating the mean. You can try this code as well, and you will see you get the same answer. Let's put it in the print, and copy and paste it here. As you can see, you're getting the same answer here as well. The next question asks how many employees are on salary and how many are working on an hourly basis. In this case, the hint is to group the data on the salary or hourly column. We know we can call groupby on the salary or hourly column. If we run this code here, we get a DataFrameGroupBy object stored somewhere in memory, so we can call an aggregate function on it. We need to call count, because we want "how many". Let's run this. We have hourly and salary, and the same output here for name, job titles, department, and full or part time. So we have 7,883 persons on an hourly basis and 24,775 on salary. The next question is which department has the maximum number of employees. This should be pretty simple now. We need to grab the department column, and then we can simply call value_counts and run this code. So we have all the departments, but we only want the highest. Let's call head, because this is what we want in the output. So we have the same number. Moving forward: how many employees are on salary and how many are hourly in the police department? This is a little tricky. What is it asking for? First grab the department column, get the persons who are in police, then group by salary or hourly, and then determine the count. Let's try to grab this thing.
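The three ideas above (the mean skipping nulls, counting per group, and finding the largest department) can be sketched like this. The data frame is a made-up sample; only the column names are taken from the lecture.

```python
import numpy as np
import pandas as pd

# Made-up sample data; salaried rows have no typical hours, as in the lecture.
pay = pd.DataFrame({
    "Department": ["POLICE", "POLICE", "FIRE", "POLICE", "FIRE"],
    "Salary or Hourly": ["Salary", "Salary", "Salary", "Hourly", "Hourly"],
    "Typical Hours": [np.nan, np.nan, np.nan, 20.0, 40.0],
})

hours = pay["Typical Hours"]
mean_default = hours.mean()          # NaN values are skipped by default
mean_dropna = hours.dropna().mean()  # dropping them first gives the same number

# count() after groupby gives the non-null count of every column per group.
per_type = pay.groupby("Salary or Hourly").count()

# value_counts() sorts from most to least frequent, so head(1) is the largest.
largest = pay["Department"].value_counts().head(1)

print(mean_default, mean_dropna)
print(per_type)
print(largest)
```

Note that `count` reports non-null values, so a column with missing entries (here, typical hours for salaried rows) shows a smaller number than the others.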
What do we need to do? From pay, grab the department column. Then what is the condition? Police, and POLICE is in all capitals. Let's put it in the brackets and pass it into our data frame pay. So what did we do? We grabbed the department column and set the condition where the department is police. Then we call groupby on the salary or hourly column. If we run this, we again get the groupby object. We want "how many" again, so let's call count here and run this cell. So we have hourly and salary for all these columns in the output. Let's grab department here again and run this cell. So here we have the same output. What did we do? We grabbed the department column, got the persons who are in the police department, grouped by salary or hourly, then called count on the grouped data, and then grabbed department again just to get the same output. Question 10: what are the mean, max, and minimum salaries? The hint is to use .str and replace, create a new column, salary, and separate the number from the dollar sign as float. If you follow the link, you can explore more on pandas.Series.str.split, and a practical example is given at the Stack Overflow link, so if you want, you can explore more there. Let's try to create a new column here. What is the column? pay, salary. And what do we want to grab? The column is annual salary. If we look here, we have annual salary, and it has a dollar sign and a number. So we're grabbing this column and creating a new column from annual salary, and our new column is salary. So here we are, and then we call .str. What do we want? We want to replace the dollar sign. So let's replace the dollar sign and then convert using astype float.
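The police department question chains a boolean filter with a groupby. A minimal sketch, assuming a made-up sample with the lecture's column names:

```python
import pandas as pd

# Made-up sample data for illustration.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Department": ["POLICE", "POLICE", "FIRE", "POLICE"],
    "Salary or Hourly": ["Salary", "Hourly", "Salary", "Salary"],
})

# Step 1: boolean mask, True where the department is POLICE (all capitals).
is_police = pay["Department"] == "POLICE"

# Step 2: keep only those rows, group them, and count per group.
# Step 3: select one column so the result is a single count per group.
police_counts = pay[is_police].groupby("Salary or Hourly").count()["Department"]
print(police_counts)
```

Selecting `Department` at the end is only for display: every column holds the same count here, so one of them is enough.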
So what are we doing? We're grabbing the annual salary column, replacing the dollar sign, converting the values to float, and then passing the result to the new column, which is salary. So we're creating a new column whose numbers are floats. It's always good to check the data frame. Here we have the new column, salary, and these are the salaries. Now we can call max, min, and mean on the salary column. The question asks for the minimum first: pay, salary, dot min. Let's run this. So we got the minimum salary. We can copy this one and paste it two more times. This is going to be max, and this is going to be mean. Let's run this. So we have the minimum, maximum, and mean salaries here. Here the notebook asks again for the head of the data, displaying two rows only. So here we have the same output. The next question is: find the employee who has the maximum salary. Do not use the max number you got above. It also asks you to try idxmax. Let's try without idxmax first, and then we will explore how to use idxmax. So what is our condition? pay, salary. We want a generalized condition: pay salary equal to the max. So we want the rows where pay salary equals pay salary dot max. In other words, we want to grab the salary where it is the maximum in the salary column. This is our condition. Let's pass it into our data frame pay and run this. So here we have the same answer. Now the question asks to use idxmax, so let's see what idxmax is. idxmax returns the index of the first occurrence of the maximum over the requested axis. The axis is zero by default, which is the index, and one is the columns. We want to grab the index. Let's try using idxmax, and let's comment out the first version with a hash. We want to grab the row by its index, so we use loc, then take pay on the salary column, and call idxmax.
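A minimal sketch of the salary clean-up and the generalized "equal to the max" condition. The sample values are made up. One assumption beyond the lecture: `regex=False` is passed explicitly, because in some pandas versions a bare "$" may be read as a regular expression rather than a literal dollar sign.

```python
import numpy as np
import pandas as pd

# Made-up sample; hourly employees have no annual salary.
pay = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Annual Salary": ["$90000.00", "$120000.50", np.nan],
})

# Strip the literal "$" and convert the remaining text to float.
pay["salary"] = (
    pay["Annual Salary"].str.replace("$", "", regex=False).astype(float)
)

print(pay["salary"].min(), pay["salary"].max(), pay["salary"].mean())

# Generalized condition: compare against the computed max, not a typed-in number.
top_paid = pay[pay["salary"] == pay["salary"].max()]
print(top_paid)
```

Missing salaries stay missing after the conversion, and min, max, and mean skip them by default.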
We're calling loc on our data frame pay, grabbing the salary column, and calling idxmax on it. Let's run this cell. So let's move on to the next one: find the employee who has the minimum salary, and now use idxmin. Similar to idxmax, we have the idxmin option in pandas as well. It returns the index of the first occurrence of the minimum over the requested axis. So once again, let's call idxmin in the same place. What can we do? It is similar to the previous code, but instead of max, now we need to call min. So we're grabbing the salary column, calling idxmin on it to get the first occurrence of the minimum, and then grabbing that location. So here we have the name, the job title, the department, which is the Mayor's Office, and the minimum of the salaries. This is getting long, so let's split this lecture here. See you in the next part of the same lecture. Good luck.

21. S4: Pandas Essentials - Project 2 (Solutions Part 2) Chicago Payroll Data: Hi guys. Welcome back to the pandas data analysis exercise solutions. So, on to the next one, question 13: what are the mean, max, and minimum hourly rates? Use .str and replace. So once again, the question asks you to use .str and replace, replace the dollar sign, and create a new column, h_rate. Let's go back and copy the code where we did this. Here it is. Now we need to grab hourly rate instead of annual salary, and our column is going to be the new one, pay h_rate. So what are we doing? We're creating a new column by grabbing hourly rate, replacing the dollar sign, and converting the result to float. Let's run this. Here we have it. Now the question asks for max, min, and mean. So let's grab this column and call min here. Here we have the min. We can call print on the min, then copy this one, and this is going to be max, so we call max, and this one is mean. Let's run this cell. So we have the minimum, maximum, and mean values here. Next, the question asks you to check the head of your data frame.
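The idxmax and idxmin lookups can be sketched as below, assuming a made-up sample with an already cleaned `salary` column:

```python
import numpy as np
import pandas as pd

# Made-up sample with an already cleaned numeric salary column.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "salary": [90000.0, 120000.5, np.nan, 1000.0],
})

# idxmax/idxmin return the index label of the first max/min; NaN is skipped.
max_label = pay["salary"].idxmax()
min_label = pay["salary"].idxmin()

# loc looks the full row up by that label.
print(pay.loc[max_label])
print(pay.loc[min_label])
```

Unlike the boolean-mask version, this returns a single row even when several employees share the same maximum or minimum, because only the first occurrence is reported.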
So here we have rows 0, 1, 2, 3, 4, with departments such as police, fire, and law, and we have a new column for the hourly rate as well. Moving forward. This one we already did, actually: roughly 2.6, 96, and 32. So, question 14: how many employees are getting the maximum hourly rate? Now we need to find the number of employees who are getting the maximum hourly rate. We have created a new column, h_rate, so we have to grab the h_rate column and find the maximum value in that column. What is our condition? pay h_rate is equal to... we need to generalize this condition, so we have to grab the maximum rate from that column. So let's call .max. We can't use 96.0 directly, because what if at some stage it changes from 96.0? So we have to generalize this condition. This is why we're using pay, grabbing the hourly rate column we created, and getting its maximum value. Whatever the value changes to in the future, this is going to work in any situation, whether the maximum value is 96 or anything beyond 96. So this is our condition. Let's pass this condition into our data frame pay. Because the question is "how many", we need to call count on this one. Let's run this cell. So here we have name, job titles, department, and so on, and we got the same results. That is how many employees are getting the maximum hourly rate, which is what the question asks for. Moving forward: who is getting the maximum hourly rate? Let's find out. Once again, we have to grab the h_rate column. What is our condition? Where pay h_rate is the max. We want to grab the person who is getting the maximum hourly rate, so the guy who has the maximum value in the h_rate column. So this is our condition.
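A minimal sketch of the generalized condition for the maximum hourly rate, assuming a made-up sample with the `h_rate` column already created:

```python
import numpy as np
import pandas as pd

# Made-up sample; two employees share the highest rate.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "h_rate": [96.0, 35.6, 96.0, np.nan],
})

# Compare against the computed maximum instead of a hard-coded 96.0,
# so the filter keeps working if the data changes.
at_max = pay[pay["h_rate"] == pay["h_rate"].max()]

print(at_max.count())  # "how many": non-null count per column
print(at_max)          # "who": the matching rows themselves
```

The same filtered frame answers both questions: calling `count` on it gives the number of employees, and displaying it shows who they are.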
Let's pass it into our data frame pay and run this. So here we have the name, the department is health, full or part time is F, which is full time, typical hours are 35, and the hourly rate is 96.0. Moving forward: how many employees are earning less than the average hourly rate? This is a little tricky. You have to find the mean first and then do the comparison using the mean. So what is the question asking for? We need to grab the h_rate column, and let's put the condition later on: pay h_rate dot mean. We're grabbing h_rate and the mean of the hourly rate, and the condition is that the rate is less than the mean, right? So the condition is that, in the hourly rate column, the hourly rate is less than the mean of that column. Let's pass this condition into our data frame pay. Once again, the question is "how many", so we need to call count here. If we run it, we get the count for every column. The question asks for the number only, so we can grab h_rate, and here we have 2,791. Question 17 asks: how many employees are paid hourly and have a full-time job? Let's check the head of our data frame first. If we look at hourly rate, it's either NaN or some value. Any entry that is null means that person is not getting an hourly rate. So the first condition is that the value should not be null, and the second is that the full or part time status should be equal to F, full time. Let's apply both conditions. Grab pay, and then we can grab either column, h_rate or hourly rate. Let's take the original one, hourly rate. What is our condition? It is not null. For any value which is not null, it's understood that that person is getting an hourly rate. If it is null, the person is getting a salary, as we can see here as well. So this is our first condition, and our second condition is that pay full or part time is equal to F, so the person is full time.
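The "less than the average" comparison can be sketched as follows, assuming a made-up sample with the `h_rate` column already created:

```python
import numpy as np
import pandas as pd

# Made-up sample; the mean of the non-null rates is 30.0.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "h_rate": [10.0, 20.0, 60.0, np.nan, 30.0],
})

# Compute the mean first, then compare every row against it.
below_avg = pay[pay["h_rate"] < pay["h_rate"].mean()]

# count() gives one number per column; pick h_rate to get a single number.
n_below = below_avg.count()["h_rate"]
print(n_below)
```

Rows with a missing rate never satisfy the comparison, so salaried employees are left out of the count automatically.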
Let's put these two conditions into parentheses, and now these are two conditions. Let's pass these two conditions into our data frame pay. This is our masking condition, so we're filtering based on it. The question asks "how many", so we need to call count. Let's run this cell. So here we have 5,906, and we got the same number. Now we can display only one column. Say we grab full or part time again here, because we are working with this column, and let's run this. The line is getting long, so we can split it here. So we're calling count and displaying only one column, full or part time. Question 18 asks: find the full-time employees who are working at an hourly rate of 10. Now we are given a number for the hourly rate, so we need to find all those employees who are working with h_rate equal to 10. Let's pass this in here. Once again, we need to grab full or part time, because we need to find the full-time employees, so full or part time equal to F is our first condition. Let's put it in parentheses. It's always good to separate conditions with parentheses. Our second condition is on the rate. We can grab this column or we can grab that column. If we're grabbing h_rate, then we need to pass 10.0. If we're grabbing hourly rate, then we have to add the dollar sign as well. So let's grab hourly rate, and keep in mind that this one is a string and that one is a number. So we can say pay hourly rate, and what is our condition? Equal to dollar sign 10.0, and this is our second condition. Let's pass these masks to pay and see how it works. Let's run this code here. So here we have... it should be 10.00, because there are two zeros after the dot. So here we have the matching rows. If you want, you can use the h_rate column instead with 10.0, and it's going to give the same result. So you can grab hourly rate, the original column, or the one you have already created.
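A minimal sketch of combining two conditions with `&`, and of the string-versus-number comparison discussed above. The sample data is made up, and `regex=False` is an assumption added so that "$" is treated as a literal character.

```python
import numpy as np
import pandas as pd

# Made-up sample; "Hourly Rate" is text with a dollar sign, as in the lecture.
pay = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Full or Part-Time": ["F", "P", "F", "F"],
    "Hourly Rate": ["$10.00", "$10.00", np.nan, "$35.60"],
})
pay["h_rate"] = pay["Hourly Rate"].str.replace("$", "", regex=False).astype(float)

# Each condition sits in its own parentheses; & keeps rows where both are True.
hourly_full_time = pay[
    (pay["Hourly Rate"].notnull()) & (pay["Full or Part-Time"] == "F")
]
n_hourly_full_time = hourly_full_time.count()["Full or Part-Time"]

# The text column needs the exact string; the numeric column needs a number.
by_string = pay[(pay["Full or Part-Time"] == "F") & (pay["Hourly Rate"] == "$10.00")]
by_number = pay[(pay["Full or Part-Time"] == "F") & (pay["h_rate"] == 10.0)]

print(n_hourly_full_time)
print(by_string)
```

The parentheses matter: `&` binds more tightly than `==` in Python, so leaving them out changes how the expression is evaluated.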
The original column are the one you have already created. So in the next question, how many unique job titles are there in the data? This should be pretty simple, actually. So you remember you have, ah, unique and and unique. So we need to grab job, died adults and call. That's called unique first. So we have 12345 and somehow we can call. And just to get that number like Daughter Dart, it's long list. So we have 1000 and 95 unique job titles. So the 20 it is, what was the average salary off the employees in each department? This should be pretty simple. We need to group the data first, based on the department by and department. If there are, we are getting the group by data frame group by object. So let's call mean on this thing. So here we have. But we only want salary. Call him So let's grabbed it, Celery. So here we have 7891278912 and 666 So demean salary of each department. So moving forward. So the next question is what is the job title off a girl? Poland. Be so please load there to special between a gr and Bullen. So let's try to grab this one. Okay, We need to grab name. This is the name off the person and what is his name? This is his name. And he's asking We have two species here. Let's add two spaces. So this is our condition to filter the data. Let's pass in this too. The did a frame. Okay. And let's from this course. So we have We only need to get the job title. So let's grab the job title only passing the shop title here. So here we have 2 30 Chief in junior. The next one is what are the top most common job titles. Once again, value count is going to walk here. So if you remember, I mentioned in the previous like just that we're going to use had value contra these common functions quite often you need all these kind of function, so explore more about the data. So the most common job title. So we need job title. This is Let's copy this from here. Job data if you call Bill, you count and let's run this here. So here this off. 
It lists all the job titles, but we only need five. Let's call head on this one. So here we have police officer, firefighter-EMT, sergeant, pool motor truck driver, and another police officer title. The next question is: how many people have the word "officer" in their job title? The answer is already given, because this is a little tricky. We have to write a function, find_string, and pass in the title. Then, in that title, we search for "officer". So if the string "officer" is in title.lower()... Here, with .lower(), we're dealing with upper and lower case, so we're converting everything to lower case. If that is the case, return True. Otherwise, return False. We call this function, find_string, in a lambda expression. So what are we doing? We're grabbing job titles, using apply with a lambda expression, and calling find_string. We're passing x, the job title, to the function find_string, which returns True or False. Then we take the sum over all the True values, so we get the number of persons who have "officer" in their job title. So this was all about the Chicago payroll data exercises. With this exercise, we're done with the pandas data analysis section. This was an extensive section, and we got a lot to learn in it. We're going to use all the knowledge and skills we have gained in this section in the upcoming data visualization and machine learning sections. Revise them once again, go through all the exercises, and try to follow the concepts. If you need help, you can always ask in the group, or you can always write to me, and I would be more than happy to answer all your questions. See you in the next lecture, where we move forward to data visualization. Good luck.

22. See you in the next class: