Data Science Essentials with Pandas and Python | Andrei Dumitrescu | Skillshare

Data Science Essentials with Pandas and Python

Andrei Dumitrescu, DevOps Engineer and Professional Trainer

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

12 Lessons (1h 47m)
    • 1. Welcome

      1:36
    • 2. Intro to Jupyter Notebooks. Installing Jupyter Notebook

      5:36
    • 3. Jupyter Notebook Usage

      8:51
    • 4. Intro to Pandas. Installing Pandas

      3:06
    • 5. Pandas Series

      8:23
    • 6. Pandas DataFrames I. Working with Columns

      10:57
    • 7. Pandas DataFrames I. Working with Rows

      8:27
    • 8. Pandas DataFrames II. Filtering Data

      12:58
    • 9. Reading and Analyzing CSV Files with Pandas

      21:28
    • 10. Reading Excel Files. Groupby and Other Useful Operations

      11:13
    • 11. Reading and Analyzing HTML Pages with Pandas

      8:16
    • 12. Working with Missing Data

      6:07

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

130

Students

--

Projects

About This Class

Data Science Essentials with Pandas and Python introduces you to the popular Pandas library for Python.

In this course you'll learn to analyze data quickly and easily with Python's powerful Pandas library!

The goal of this course is to bring your Data Handling and Analyzing skills to the next level to build your career in Data Science, Finance or Business Analytics.

Since this is intermediate Python, you are required to have already mastered the basics of Python before enrolling in this class. My advice is to first check my other Python classes published here on Skillshare; they will help you build a strong foundation in the Python programming language.

In this course we'll get the skills to get ahead!

 Major topics of this course:

  • Installing and Using Jupyter Notebook.
  • Installing Pandas in Python 3
  • Pandas Series
  • Pandas DataFrames I. Working with Columns
  • Pandas DataFrames I. Working with Rows
  • Pandas DataFrames II. Filtering Data
  • Reading and Analyzing CSV Files with Pandas
  • Reading Excel Files. Groupby and Other Useful Operations
  • Reading and Analyzing HTML Pages with Pandas
  • Working with Missing Data

and more!

Datasets used in this course.

Meet Your Teacher

Andrei Dumitrescu

DevOps Engineer and Professional Trainer

Teacher

I've been a Network and Software Engineer for over 15 years, the typical profile of a DevOps Engineer.

I co-founded Crystal Mind Academy, a Cisco Academy and professional training center in Romania that focuses on teaching cutting-edge technologies to students.

I have contributed to education in the areas of programming, information security and operating systems. During the last 12 years, more than 20,000 students have participated in in-person or online training programs at Crystal Mind Academy.

I have developed documentation, labs and case studies for many training programs such as Cisco CCNA, CCNA Security, CCNP, Linux Administration, Information Security, Python Programming, Network Automation with Python or Blockchain Programming (Ethereu...

Class Ratings

Expectations Met?
  • Exceeded!
    0%
  • Yes
    0%
  • Somewhat
    0%
  • Not really
    0%
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.

Transcripts

1. Welcome: Hello and welcome to this class on Data Science Essentials with Pandas and Python. My name is Andrei Dumitrescu and I'll be your instructor for this class, as well as for lots of other classes here on Skillshare. Data Science Essentials with Pandas and Python introduces you to the popular Pandas library for Python. In this course, you'll learn to analyze data quickly and easily. The goal of this course is to bring your data handling and analyzing skills to the next level to build your career in data science, finance or business analytics. In this course, you'll get the skills to get ahead. Since this is intermediate Python, you are required to have already mastered the basics of Python before enrolling in this class. My advice is to first check my other classes on Python published here on Skillshare. They will help you build a strong foundation in the Python programming language. This hands-on course goes straight to the point without any distraction and focuses solely on how to effectively analyze data in Pandas. If you want to waste no more time with incomplete scripts or tutorials, copy-paste solutions or confusing source code, then this class is for you. See you in the class.

2. Intro to Jupyter Notebooks. Installing Jupyter Notebook: In this lecture we'll take a look at a very popular Python development environment for data science, which is Jupyter Notebook. Project Jupyter was born out of the IPython project, and its name, Jupyter, is an indirect acronym of the three core languages it was designed for: Julia, Python and R. It is also inspired by the planet Jupiter. The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. It is in fact the development environment of choice for data scientists and analysts. But what is, in fact, Jupyter Notebook? It's a web application in which you create and share documents that contain live Python code, equations, visualizations, as well as text.
Jupyter Notebook is one of the ideal tools to help you gain the data science skills you need. In this section we'll use Jupyter Notebook instead of PyCharm. Of course, you can continue using PyCharm or any other IDE. Let's start by installing Jupyter Notebook on Windows 10. The installation is very simple, and for that we're going to use pip. So let's open cmd.exe and run pip install ipython jupyter. Of course, for this we need a working Internet connection. Pip is downloading and installing IPython and Jupyter Notebook. OK, Jupyter Notebook has been installed. Now let's test it. I'm going to create a new directory on my desktop, where I'll store all notebooks, to keep it simple for you. These notebooks are files created by the Jupyter Notebook application and contain both Python code and other rich text elements like paragraphs, figures, tables, equations and so on. I'll move to my desktop directory, cd Desktop, and here I use mkdir to create my notebooks directory. All notebooks will be stored in this directory. Now let's start the application. The Jupyter Notebook application is a client-server app that runs on localhost and allows editing and running notebook documents via a web browser. In order to start the app, I'm going to type jupyter notebook. The default web browser was opened and we see our Jupyter Notebook application. Here I'll create a new notebook by clicking on New and Python 3. We can give this notebook a name by clicking on Untitled at the top and typing the new name of the notebook. Let's write my test, and we can see here my new notebook, my test with the .ipynb extension. This is the extension of a notebook file. Now let's see how to use Jupyter Notebook. We'll write our Python code inside cells and then we'll run those cells. Here I write print, let's say 'Hello Jupyter', and then x equals 10. Now I can execute the code inside this cell by clicking on Run, or by pressing Ctrl+Enter.
Another way to execute the code inside the cell is to press Shift+Enter. In this case, the code is executed and a new cell is inserted below. When you are done with your notebook, you can save it by pressing Ctrl+S or by going to File and then clicking on Save and Checkpoint. There are a lot of shortcuts, and you can see them all by typing Ctrl+Shift+P. These are all the shortcuts. We can also see all these shortcuts by going to the Help menu and then Keyboard Shortcuts. OK, that's all about the basics of Jupyter Notebook. In the next lecture, I'll show you other useful features of Jupyter Notebook so that you can be really efficient when writing Python applications in Jupyter Notebook. Thank you.

3. Jupyter Notebook Usage: In this lecture, I'll show you how to become an effective user of Jupyter Notebook. The first step is to start Jupyter Notebook by typing jupyter notebook in a terminal. Take care: the directory where you start Jupyter Notebook is important. All notebook files will be saved in that directory, and you will be able to load existing notebooks only from that directory and its subdirectories. I'm going to move to my desktop directory, cd Desktop with an uppercase D, then cd into my notebooks directory, and here I type jupyter notebook. The jupyter notebook command is going to open up the application in a new browser tab. Now let's create a new notebook and rename it to Python lists: New, Python 3 notebook. I click here on Untitled and I rename the notebook to Python lists. This is just an example, and we see that a new file appeared in the directory where we started Jupyter Notebook. Jupyter Notebook files have the .ipynb extension. This comes from its old name, which is IPython Notebook. I want to see the toolbar, so View and Toggle Toolbar. This is the toolbar. There are two types of cells: code cells, where we write valid Python code, and Markdown cells.
Text can be added to Jupyter notebooks by using Markdown cells. You can change the cell type to Markdown using the Cell menu, the toolbar or the keyboard shortcut M. Markdown is, in fact, a popular markup language that is a superset of HTML. This cell is of type Markdown. You can use simple features to format your comments. Let me show you some basic text formatting you can do in a Markdown cell. To create a heading, add one to six hash symbols before your heading text. The number of hashes you use will determine the size of your heading. This is an H1 header. You save the cell's content by executing the cell; press Ctrl+Enter to execute the cell. If you want to add another cell, press Alt+Enter. This is a code cell. Let's make it a Markdown cell. I'm going to create another header line, this time using three hashes, let's say list methods. You can make the text bold by using two stars before and after the text you want to be bold. Let's try an example. I want to make the word Python bold: star star, and two stars after the word Python. Now Python is bold. I can use a single star to make the text italic; for example, I want the word list to be italic. You can indicate a command within a sentence with single backticks. Let's try an example: I'll put the word append between single backticks. This is Python code or a special method, and it looks different. OK, I think these are the most frequently used Markdown formatting options. Let's write and execute some basic Python code in a cell. Let's say l1 equals a list of numbers: 1, 2, 3. You can print out a value, a result or a string using the print function, or you can just write the operation or the name of the variable. For example, I can write here print(l1) and execute the code by pressing Shift+Enter, and I see the result here. Or I can simply write only l1, or 1 + 1.
If you don't use the print function, it shows the result below in an output cell, as you see here; this is an output cell. You can run the cell either by pressing Ctrl+Enter or by pressing Shift+Enter. In the latter case, a new cell is inserted below for you to write other Python code. If you create a variable in a cell and run that cell, the variable also exists in the cells below it in the same notebook. For example, my l1 variable also exists in this cell. Take care: when you first open the notebook, you must run the cell where the variable is created before using it. I am going to save the notebook by going to the File menu and clicking on Save and Checkpoint. Back in the Jupyter Notebook directory, I'm going to shut down the notebook and start it again. This is the notebook. If I execute this cell, I'll get an error. That's because l1 is not defined. So if I want to execute this cell, I must first execute the first cell of the notebook, where the variable l1 was defined. I have executed the cell, and now I can also extract the element of the list at index 1. If you want to insert a new cell before another cell, you press A. For example, I'm going to insert a new cell before this cell, so I press A. A new cell has been inserted above. Here I can write Python code. If you want to insert a new cell after an existing cell, you press B. For example, I click on the cell, then I press B, and a new cell has been inserted after my cell. Here I can write Python code or I can set the cell to be a Markdown cell. These shortcuts are available in what is called command mode. To enter command mode, you press Escape. If the execution of a cell takes a longer time, you'll see an asterisk here. Let's try an example. I'm going to import the time module and then write time.sleep(4).
In fact, my script waits for four seconds and then prints a message. I'm going to execute the cell. The cell is executing and we see an asterisk, a star between square brackets. You delete a cell by pressing D twice. For example, I'm selecting this cell and now D and D, and the cell has been removed. Another useful shortcut is Shift+Tab. If you press Shift+Tab, it will show you the help of a function. This is the help of the append method. There is also a nice Help menu where you can find the shortcuts or other useful information. For example, if you're a beginner, you can start with the User Interface Tour; there is a tour for you. You can save the notebook by pressing Ctrl+S or by going to File and then clicking on Save and Checkpoint.

4. Intro to Pandas. Installing Pandas: In this section we'll take a look at data analysis with Pandas. This is a very big topic, depending on how far in depth we go. Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It depends on other libraries, like NumPy, and has optional dependencies like Matplotlib for plotting. Pandas stands for Python Data Analysis Library. Initially, the name was derived from the term panel data, an econometrics term for multidimensional structured data sets. Pandas is quite a game changer when it comes to analyzing data with Python, and it is one of the most preferred and widely used tools for data scientists to do data manipulation and analysis, as well as data cleaning and preparation. An easy way to think of Pandas is to look at it as Python's version of Microsoft Excel. This was a very short introduction to Pandas. Now let's go to coding.
The first thing to do is to install the pandas module. To do that, simply open a command-line interface like cmd.exe on Windows and write pip install pandas. I have already installed pandas on my computer. However, you should go ahead and install it. Of course, make sure you are connected to the Internet before doing this. The examples in this section will be done in Jupyter Notebook, the web-based IDE of choice when it comes to data science or data analysis. Of course, you can use any other IDE. If you are going to use PyCharm, take care to install the pandas module also in the virtual environment that PyCharm uses by default. To do that, go to File, Settings, Project Interpreter, click on the plus sign, search for pandas and install the module. We can test that it has been successfully installed by trying to import the module. There is no error when running the script, so we assume that pandas has been successfully installed. Now that pandas has been installed, we can start analyzing data. See you in the next lecture, where I'll start talking about Pandas Series and Pandas DataFrames.

5. Pandas Series: In this lecture we'll take a look at Pandas Series. First, let's start Jupyter Notebook in my notebooks directory and create a new notebook. I am moving to the notebooks directory I created in a previous lecture, and here I start Jupyter Notebook by typing jupyter notebook. I'm going to create a new notebook by clicking on New, Python 3, and I'll name this notebook Pandas Series. The notebook file appeared in my notebooks directory. What is a pandas Series? A Series is a one-dimensional labeled array that stores data. It has one dimension, so it stores one column of information. We can think of a Series as a basic pandas data type. It can store any type of data, like integers, Booleans, floats or strings, but ideally it stores only one type of data, so consistency matters. Now let's create a pandas Series.
We call the Series function, which is, in fact, the constructor: it creates a brand-new object of type Series. Let's import the pandas module as pd, a shorter name: import pandas as pd. I am creating a list called data using the list constructor and the range function, let's say from 2 to 11 in steps of 2. OK, this is my list. Now, from this list, let's create a Series called s1: s1 equals pd.Series(), and I pass in the list as argument, so data. Let's print out s1. The dtype indicates the type of data that makes up the values of this Series. In this case, the values are integers. Let's see what happens if the values are strings. I'll comment out this line and write data equals a list of the letters a, b, c, d, and I execute the cell again. We can see that the dtype is object; this is how pandas refers to strings. There is also an additional component, the index. A pandas Series is indexed like a Python list. The first value of the Series gets index 0, the second value gets index 1, and so on. One difference, and an advantage over a Python list, is that these indexes don't have to be numeric. They can be any data type, and by default they are numeric values starting from 0. Now let's modify the indexes. I'm going to create another list called labels, equal to a list of the letters a, b, c, d and e, and this is the list. I'm going to create another Series called s2 in the following manner: s2 equals pd.Series(), and now I am using two arguments. The first is data equals numbers; I'm going to call the list numbers, not data, so I write numbers here and execute the cell again. The second argument of the Series method is index equals labels, and I execute the cell. We now have a label-indexed Series: instead of numeric values starting from 0, the Series is indexed by strings.
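The Series construction described above can be sketched roughly as follows. This is a minimal sketch, not the instructor's exact notebook: the variable names follow the transcript, the five labels are an assumption chosen so that they match the five numbers, and the dtype reported for strings may differ between pandas versions.

```python
import pandas as pd

# Numbers from 2 up to (but not including) 11, in steps of 2
numbers = list(range(2, 11, 2))          # [2, 4, 6, 8, 10]

# A Series built from a list gets a default integer index 0, 1, 2, ...
s1 = pd.Series(numbers)
print(s1)
print(s1.dtype)                          # an integer dtype

# A Series of strings; the course shows dtype "object",
# newer pandas versions may report a dedicated string dtype instead
letters = pd.Series(list("abcd"))
print(letters.dtype)

# Custom labels replace the default numeric index
labels = list("abcde")                   # assumed: one label per number
s2 = pd.Series(data=numbers, index=labels)
print(s2)
```

The number of labels has to equal the number of values; otherwise the constructor raises an error.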
It's not necessary to use keyword arguments, that is, to specify the name of each argument. We can just pass in the data and the index, in this order, so there is no problem if I write s2 equals pd.Series(numbers, labels). We can create a Series this way. Instead of a Python list, we can build a pandas Series using a tuple, a NumPy array or a dictionary, but we cannot use, for example, a set, because a set is unordered. So if I try pd.Series() and pass in a set, I get an error. This is not permitted: the set type is unordered. If I use a tuple here, it will be OK; I'll get no error. Keep this in mind. Let's try it with a dictionary. I'm creating a dictionary called my_dict, equal to dict of zip of two lists: the first list has the elements a, b and c, and the second argument of the zip function is another list, let's say 1, 2 and 3. This is a dictionary. Let's print it: a, b and c are the keys, and 1, 2 and 3 are the values. Now let's create another pandas Series called s3, equal to pd.Series(my_dict), and we can see how it automatically used the keys as indexes or labels, and the values as the values in the Series. Now let's see how we can grab some information from a pandas Series. In fact, this is simple. We use the index or the label to get the corresponding value. This is similar to dictionaries, when we want to get the value of a key. Let's try, for example, s3['a'], and the corresponding value is 1. We can also perform arithmetic operations with pandas Series. Pandas will try to match the indexes or labels in the Series and perform the operation on the corresponding values. If there are labels that don't match, it will put a null value, NaN, which means missing value. This is s2 and this is s3. Now let's try s2 + s3. It matched a, b and c from both Series, and at index d and e it put a NaN value.
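A minimal sketch of the dictionary-based Series and the label-aligned addition discussed above. The values of s2 are assumed to be the numbers 2 to 10 labeled a to e, as in the earlier part of the lecture.

```python
import pandas as pd

# s2 as built earlier in the lecture (assumed values and labels)
s2 = pd.Series([2, 4, 6, 8, 10], index=list("abcde"))

# Keys become the index, values become the data
my_dict = dict(zip(["a", "b", "c"], [1, 2, 3]))
s3 = pd.Series(my_dict)
print(s3["a"])            # look up a value by its label

# A set is rejected because it has no defined order
try:
    pd.Series({1, 2, 3})
except TypeError as err:
    print("set rejected:", err)

# A tuple is fine
print(pd.Series((1, 2, 3)))

# Addition aligns on labels; labels present in only one Series give NaN
total = s2 + s3
print(total)
```

Because NaN is a floating-point value, the result of the addition is a float Series even though both inputs hold integers.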
This means missing value, because s3 doesn't have d and e as labels. We also notice how it returned float values, even if the operands are integers. OK, that's all about pandas Series. In the next lecture we'll talk about pandas DataFrames. Thank you.

6. Pandas DataFrames I. Working with Columns: Hello and welcome back. In this lecture, we start talking about a very important concept, the pandas DataFrame. According to the official pandas documentation, a DataFrame is a two-dimensional, size-mutable and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a dictionary-like container for Series objects. A DataFrame is the primary pandas data structure. Let's start with some examples. I'm going to start Jupyter Notebook. I'm moving to the notebooks directory I created in a previous lecture. This is on my desktop, my notebooks, and here, in this directory, I am starting Jupyter Notebook. Jupyter Notebook is starting and a new browser tab will be opened. Here I am creating a new notebook, and the name of this new notebook will be Pandas DataFrames Part 1. First, I'm going to import pandas as pd, and I'm going to create a variable called data. This will hold the data of the pandas DataFrame; in fact, it is a list of lists. The first element is a list: its first item is the name, the second item, let's say, the age, and then the annual salary, let's say 40,000. This is the first element, which is a list. The second element is another list: John, 40 and 50,000. Here I can hit Enter and write the remaining elements on a new line. This is for better readability. Now I am creating a DataFrame object called df, equal to pd.DataFrame(). This is the constructor that creates a DataFrame object. The first argument is data equals my list, whose name is also data, and the second argument is columns equals a list with the name of each column.
The columns are called name, age and salary. Here, the number of column names must match the number of fields or columns in the data. If not, I'll get an error. I'm going to execute the cell. Now let's see what type of data df is: type(df), and type of df with, between square brackets, let's say name, which is the name of a column, or age, of course. We can see that df is of type pandas DataFrame and df['age'] is of type pandas Series. This is, in fact, a column. We can use the shape attribute to see how many rows and columns the DataFrame has: df.shape, and we see there are four rows and three columns. Let's see our DataFrame. This is our DataFrame: we have four rows and three columns. Another useful member of a DataFrame is info; it returns some useful information about the DataFrame. Now let's see how to work with columns. First, let's select some columns. When I ask for a single column, I get back a Series, and when I ask for multiple columns, I get back a DataFrame, a new DataFrame. Let's see, for example, the column called name: I write df and, between square brackets and single quotes, name, and it returned a Series, the column called name. I can also write df, a dot and the name of the column: df.age. This is another possibility to get a column. We can also create a new column of the DataFrame in place. We specify the column name as if it already existed, then the equals sign and the values in that column. For example, let's create a column called phone: df['phone'] equals the values in that column. These are some random values. A new column has been created; we can see the new column. Now let's see how to drop rows and columns. We can drop a row or a column using the drop method of the DataFrame object and a second argument called axis, which is 0 for rows and 1 for columns. The default value for the axis argument is 0, so if we don't specify it, it will remove a row.
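A minimal sketch of the DataFrame construction and column operations described above. Only the row "John, 40, 50,000" comes from the lecture; the other names, ages, salaries and phone numbers are hypothetical values made up for illustration.

```python
import pandas as pd

# Hypothetical data: a list of lists, one inner list per row
data = [
    ["Dan", 30, 40000],
    ["John", 40, 50000],      # the only row quoted in the lecture
    ["Anne", 29, 65000],
    ["Maria", 30, 55000],
]
df = pd.DataFrame(data=data, columns=["name", "age", "salary"])

print(type(df))               # a DataFrame
print(type(df["age"]))        # a single column is a Series
shape_before = df.shape       # (rows, columns)
print(shape_before)
df.info()                     # prints a summary of columns and dtypes

# Selecting columns
names = df["name"]                    # one column -> Series
ages = df.age                         # attribute access, same result as df["age"]
subset = df[["name", "salary"]]       # list of columns -> new DataFrame

# Adding a new column by assignment (made-up phone numbers)
df["phone"] = ["555-0100", "555-0101", "555-0102", "555-0103"]
print(df)
```

Attribute access such as df.age only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the square-bracket form is the safer default.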
For example, let's drop row number two, that is, the row with index 2: df.drop(2, 0). This is the row and this is the axis; 0 means rows and 1 means columns. We see how row number two, the one with index 2, has been removed. Now let's drop the column called age, so df.drop() with the name of the column, age, and the second argument, which in this case is mandatory, axis equals 1. This means column, and the age column has been dropped. When we use the drop method, the pandas DataFrame is not modified in place. Pandas does this so that you don't lose information by default. For example, let's print out the DataFrame, and we can see there is still a column called age. If we want to modify the DataFrame in place, in this case to drop a column, we use another argument called inplace, which should be True. Let's drop the age column again, this time in place, so the third argument is inplace equals True. Let's see the DataFrame. This time there is no age column; the DataFrame has been modified in place. OK, keep in mind that if we want to drop a row instead of a column, we use the same drop method. The first argument is the label of the row, and the second argument should be axis 0; axis 0 means rows and axis 1 means columns. Let's try another example: df.drop() where the first and only argument is 1. It's not necessary to specify axis equals 0, because that's the default. In this case, it will remove this row. Of course, it's not in place. If I want to modify the DataFrame in place, I should specify another argument, inplace equals True. Now let's see how to rename columns. When it comes to renaming columns, there are two ways: we can use the DataFrame rename method to rename just some columns, or use the DataFrame columns attribute to rename all columns at one time. Let's see how to rename just two columns of our DataFrame. I'm going to rename name to first name and salary to annual salary. df.rename(): the first argument is columns equals, and this is a dictionary.
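The drop calls described above can be sketched like this. The rows are hypothetical, and axis is passed as a keyword here because recent pandas releases no longer accept it positionally.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    [["Dan", 30, 40000], ["John", 40, 50000],
     ["Anne", 29, 65000], ["Maria", 30, 55000]],
    columns=["name", "age", "salary"],
)

# Drop the row labeled 2; a new DataFrame is returned, df is unchanged
without_row = df.drop(2, axis=0)

# Drop the "age" column; again df itself is unchanged
without_age = df.drop("age", axis=1)
print("age" in df.columns)    # True: the original still has the column

# With inplace=True the original DataFrame is modified and None is returned
df.drop("age", axis=1, inplace=True)
print(df)
```

An alternative to inplace=True is to reassign the result, for example df = df.drop("age", axis=1), which has the same effect on the name df.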
The dictionary maps each old column name to its new name: name, colon, first name; comma; then the old name salary, colon, the new name annual salary. And I want to modify the DataFrame in place. Let's see the DataFrame. We see how these two columns have been renamed. The second possibility is to use the columns attribute, and in this case I must provide all columns of the DataFrame as a list. In the next example, I'll keep first name and annual salary, but I'll change the phone column to mobile phone. I write df.columns, which is an attribute, equals, and then a list of columns. So first name remains the same, annual salary too, and the last column will be renamed to mobile phone. Let's display the DataFrame, and the last column has been renamed. OK, that's all about working with DataFrame columns. In the next lecture we'll see how to select only some rows of our DataFrame. See you in a few seconds.

7. Pandas DataFrames I. Working with Rows: In this lecture we'll take a look at how to select rows. We use the same DataFrame from the previous lecture. This is our DataFrame; we have four rows and three columns. There are two methods used to select rows: loc and iloc. Let's see the first method, called loc. When using this method, we pass in the label and it returns a row, which is in fact a Series. Both rows and columns are pandas Series. We pass in the label between square brackets. Let's say, for example, df.loc and, between square brackets, 0. This returns the entry at index 0. We can also return a single value at the intersection of a row with a column, for example df.loc with 1 and, as the second argument, name. So I want the name at index 1, and it will return John. As I said earlier, it returns a Series. Now let's see how to return multiple rows.
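The two renaming approaches and the first loc lookups above can be sketched as follows. The exact spelling of the new column names and all row values except John's are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame as it might look after the age column was dropped
df = pd.DataFrame(
    [["Dan", 40000, "555-0100"], ["John", 50000, "555-0101"],
     ["Anne", 65000, "555-0102"], ["Maria", 55000, "555-0103"]],
    columns=["name", "salary", "phone"],
)

# 1) rename(): a dictionary maps old names to new names; other columns are kept
df.rename(columns={"name": "first name", "salary": "annual salary"}, inplace=True)

# 2) columns attribute: every column name must be listed, in order
df.columns = ["first name", "annual salary", "mobile phone"]
print(df)

# loc with a single row label returns that row as a Series
row = df.loc[0]
print(row)

# loc with a row label and a column label returns a single value
print(df.loc[1, "first name"])
```

rename() is usually the safer choice when only a few columns change, because assigning to df.columns depends on getting the order and the count of all names right.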
To return multiple rows, we pass in a list: df.loc, and here we pass a list that contains the entries we want to return. For example, I want to return the entries at index 0, 1 and 3, and it returned only those rows. We can also use slicing, and this is similar to the slicing we've seen with strings and lists, except that the stop value is also included. For example, df.loc[0:2] will return all rows from 0 included to 2 included. And if I want only some columns, we pass in a second argument, which is a list of columns: df.loc, where the first argument is the list of rows, let's say 1 and 2, and the second argument is a list of columns, let's say name and salary. It will return the row at index 1 and the row at index 2, and only these two columns. Of course, we can also use slicing here; instead of giving these values as a list, I can use the slice 0:2. OK, take care with the syntax. If we want to return all rows but only some columns, we can write df.loc with a colon as the first argument; the colon means all rows. And if we want only some columns, I pass in a second argument, which is the list that contains those columns. For example, I want only the age and the salary, and it returns all rows and only two columns. Now let's see the other method, called iloc. This method uses index-based slicing, the Pythonic way: the stop index is not included. df.iloc[0] returned the row at index 0. The main difference between loc and iloc is that loc uses labels, and iloc always uses integer positions that start from 0. For example, if we change the index that pandas created automatically and use strings as the index, we will still give the iloc method the argument 0 for returning the first row. I'll show you many other examples in the next lectures to make it clear. Now let's see how to return only a row and a column using the iloc method: df.iloc, where this is the row and this is the column. Now let's return the rows from 0 up to 3.
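A minimal sketch contrasting label-based loc with position-based iloc, as described above. All rows except John's are hypothetical, and the by_name frame is an extra illustration that is not part of the lecture.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    [["Dan", 30, 40000], ["John", 40, 50000],
     ["Anne", 29, 65000], ["Maria", 30, 55000]],
    columns=["name", "age", "salary"],
)

some_rows = df.loc[[0, 1, 3]]                     # a list of row labels
inclusive = df.loc[0:2]                           # label slice: stop label 2 is included
cells = df.loc[[1, 2], ["name", "salary"]]        # rows 1 and 2, two columns
two_cols = df.loc[:, ["age", "salary"]]           # all rows, two columns

first_row = df.iloc[0]                            # position 0 -> first row
one_value = df.iloc[1, 0]                         # row position 1, column position 0
exclusive = df.iloc[0:2]                          # position slice: stop 2 is excluded

# With string labels, loc needs the label but iloc still takes positions
by_name = df.set_index("name")
print(by_name.loc["John"])
print(by_name.iloc[0])
```

The inclusive stop of loc slices is the detail that most often surprises people coming from plain Python slicing, so it is worth checking the row count after a slice.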
With 3 excluded and all columns kept, we write df.iloc[0:3]. It will return the rows from index 0 included to 3 excluded, with all columns. We can also try something like this: df.iloc, where the first argument is the list of rows, 1 and 3, so it will return the row at index 1 and the row at index 3; then a comma, and now some columns, and here I use the slice 0:2, so it will return the columns from 0 to 2 excluded. OK, that's basically how we return rows: we use the loc or the iloc methods. Now I'll show you how to return a random row of our DataFrame. For that, we are going to use the sample method: df.sample(). It will return a random row. If I execute the cell again, it will return another one; each time it returns a random row. If I want to return two random rows, I can pass in here an argument called n, n equals the number of rows. This time it will return two random rows. Keep in mind that each row is independently selected, one by one, so the order in which they appear in the DataFrame we see here may be different from the order in the original DataFrame. Another possibility we have here is to return a fraction of the DataFrame, for example df.sample(frac=0.2). It returns a random 20% fraction, so it returned 20% of our DataFrame. Now it returned 50% of the rows in our DataFrame. In the last example of this lecture, I'll show you how to read and convert some columns of the DataFrame into a Python dictionary. I am going to use the Python built-in function called zip, which pairs up two columns of the DataFrame so that dict can turn them into a dictionary. The values in the first column are the keys and the values in the second column are the values of the dictionary. For example, I want a dictionary that has the name as key and the age as value: dict of zip, where the first argument represents the keys and the second argument represents the values. So df['name'], the names, will be the keys, and the second argument is df['age'].
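The iloc slices, random sampling and dictionary conversion described above can be sketched as follows. The data is hypothetical apart from John's row, and since sample() is random the sketch only inspects how many rows come back, not which ones.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    [["Dan", 30, 40000], ["John", 40, 50000],
     ["Anne", 29, 65000], ["Maria", 30, 55000]],
    columns=["name", "age", "salary"],
)

first_three = df.iloc[0:3]          # positions 0, 1, 2; all columns
picked = df.iloc[[1, 3], 0:2]       # rows at positions 1 and 3, first two columns

one = df.sample()                   # one random row
two = df.sample(n=2)                # two random rows
half = df.sample(frac=0.5)          # a random 50% of the rows
print(len(one), len(two), len(half))

# zip pairs names with ages; dict turns the pairs into key/value entries
name_to_age = dict(zip(df["name"], df["age"]))
print(name_to_age)
```

If reproducible samples are needed, sample() also accepts a random_state argument that fixes the random seed.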
And this is our dictionary, our Python dictionary. Okay, that's all. In this lecture we've seen how to select rows using the loc and iloc methods. Thank you.

8. Pandas DataFrames II. Filtering Data: Hello and welcome back. In this lecture we'll focus on filtering data sets. This means selecting a subset of rows in our data frame, based on a certain condition that should be met. We'll continue with the same data frame from the previous lecture: we have here a data frame with four employees, their name, age and salary. Let's see how to filter data. We can write something like this: df['salary'] > 50000. We see how it returned True or False for each row, based on the condition. This is, in fact, a Series of booleans. Now let's select and return all rows where the salary is greater than 50,000. This is a condition based on a column. For that I write df and, between square brackets, df['salary'] > 50000, and it returns all rows where the salary is greater than 50,000. We can also use the loc method with the same result. Let's try df and, between square brackets, df.loc, a pair of square brackets, a colon, which means all rows, and then the salary column, because I test only the salary column. The condition in this case is, let's try, less than 50,000: df[df.loc[:, 'salary'] < 50000]. It returned all rows where the salary is less than 50,000; in our data frame there is only one row. If we want to select only one column, we add a pair of square brackets and, between single quotes, the name of the column, let's say name. It returned only the name where the salary is less than 50,000. If I want more columns, I use double square brackets, that is, a list of column names inside the indexing brackets. If I simply add another element here, let's say age, I'll get an error. I should pass a list. So this is the correct syntax, and it returned the name and the age where the salary is less than 50,000. Be careful with the syntax here; it's a little bit strange. Perfect. Now let's return all rows where the age is equal to 30.
df[df.loc[:, 'age'] == 30]: a colon for all rows, then age, the column. This is a condition based on a column, == 30, and it returned all rows where the age is equal to 30. Let's see how to use negation. This line of code will return all rows where the salary is not less than 50,000: df, a tilde, which means negation, and a pair of parentheses. Between the parentheses I write df.loc, a colon, the name of the column, salary, and the condition, less than 50,000. We see how it returned all rows where the salary is not less than 50,000. The first row hasn't been returned. So the tilde is used for negation.

Now let's calculate the maximum and the minimum of all values in a column. What is the maximum salary? df['salary'].max() returns the maximum salary. Let's see the minimum age: df and, instead of age between a pair of square brackets, I use the dot notation, which is another possibility: df.age.min(). The minimum age is 29. Now let's see another useful method called idxmax. It returns the index of the row where the column, in this case the salary, has its maximum value. Let's say r = df['salary'].idxmax(), and let's see the value of r: it is 2. We can see that the row with the maximum value in the salary column has index 2. If I want to see that row, I write df.iloc[r], r being the index, and this is the row.

Let's print out our data frame again. We can change the default index and use another column as the index, for example salary. By default, pandas uses a numeric index starting from zero. I write df.set_index('salary'), and the second argument is inplace=True. Let's see the data frame: we can see how the salary is the new index. If you want to reset the index to its initial value, you can use the reset_index method: df.reset_index(). We can see how the change was not in place, so the data frame hasn't been modified.
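Here is a short sketch of the filtering, negation, idxmax and index calls described above. The data is invented for illustration, and the sketch looks the row up with loc because idxmax returns an index label, which only coincides with the position while the default numeric index is in use.

```python
import pandas as pd

# Hypothetical employee data (invented for this sketch).
df = pd.DataFrame({
    "name": ["Anna", "John", "Maria", "Paul"],
    "age": [35, 30, 41, 29],
    "salary": [40000, 50000, 80000, 55000],
})

mask = df["salary"] > 50000                      # a Series of booleans
high = df[mask]                                  # rows where the mask is True

# A condition written with loc, followed by a list of columns to keep.
low_names = df[df.loc[:, "salary"] < 50000][["name", "age"]]

# The tilde negates the condition inside the parentheses.
not_low = df[~(df.loc[:, "salary"] < 50000)]

# idxmax returns the index label of the row holding the maximum value.
r = df["salary"].idxmax()
top = df.loc[r]

# set_index and reset_index return new frames unless inplace=True is passed.
by_salary = df.set_index("salary")
restored = by_salary.reset_index()
```

Without inplace=True, df itself keeps its original numeric index, which is why the sketch stores the results in new variables.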
If I want to modify the data frame in place, that is, to change the index in place, I pass inplace=True here, and the index has been reset. Okay, I'm going to delete this cell and this one by pressing D twice, and I'm going to change the index again: I want the index to be the salary column. Perfect. Now I'll show you the difference between loc and iloc. If we want to return a value using loc, we use the key, in this case the salary. So I write df.loc[50000], and it returned the row where the key, or the index, is 50,000. If you try df.loc[1], so position or index 1, you'll get an error. But if I use iloc[1], it returns the row at position 1, and iloc[0] is the first row. So we always pass a number, starting from zero, to the iloc method. I want to be sure that everything is clear, so I'm going to change the index again: df.set_index('name', inplace=True). This is my data frame now: the name column is the new index. Let's see how to return a row using loc. For that I write df.loc['John']. This is the label, and it returned the row. But if I want to return the same row using iloc, I write df.iloc[1], position 1, and it returned the same row.

For the next examples, I'm going to use the original data frame, so I'm going to run the cell where I created the data frame again. This is our data frame, the original one. Now let's try multiple conditions, like salary greater than 50,000 and age greater than 30. For that we use parentheses around each condition and the ampersand between them. Let's write df, a pair of parentheses with the first condition, df['salary'] > 50000, an ampersand, and another pair of parentheses with the second condition, df['age'] > 30. It returned all rows where the salary is greater than 50,000 and the age is greater than 30. In this case, there is only one row. Take care: if you use and instead of the ampersand, you'll get an error. So here, and returns an error.
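The multi-condition filter above can be sketched like this, again on made-up data. The sketch also shows the error raised by the and keyword, and includes the vertical bar, which works the same way for a logical or.

```python
import pandas as pd

# Hypothetical employee data (invented for this sketch).
df = pd.DataFrame({
    "name": ["Anna", "John", "Maria", "Paul"],
    "age": [35, 30, 41, 29],
    "salary": [40000, 50000, 80000, 55000],
})

# Each condition goes in its own parentheses; & combines them element by element.
both = df[(df["salary"] > 50000) & (df["age"] > 30)]

# | is the element-wise "or".
either = df[(df["salary"] > 50000) | (df["age"] < 30)]

# The Python keyword "and" asks for a single True/False from a whole Series,
# which pandas refuses to guess, so it raises a ValueError.
try:
    df[(df["salary"] > 50000) and (df["age"] > 30)]
except ValueError as err:
    and_error = str(err)
    print(and_error)
```

The parentheses matter because & and | bind more tightly than the comparison operators, so leaving them out changes what gets compared.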
It's mandatory to use the ampersand and not and. If you want to use the logical or operator, you write a vertical bar, a pipe, instead of the ampersand. Let's see the employees with a salary greater than 50,000 or the employees that are younger than 30: instead of the ampersand I use a pipe, a vertical bar, and I want the employees that are younger than 30. There are two rows. So, as a conclusion, when you want to check multiple conditions, you put each condition between parentheses and use, between conditions, an ampersand for logical and or a vertical bar, a pipe, for logical or.

In the last example of this lecture, I'll show you another useful method of a pandas Series, and that's the between method. As you might guess, the between method is useful when you want to find values that fall within a specific range. Let's try an example. Let's pull out all employees that have a salary between 45,000 and 59,000 inclusive: df, and now the condition, df['salary'].between(45000, 59000). This is a method, and it takes two arguments, in this case 45,000 and 59,000. These arguments are inclusive. It returned the rows with a salary between these values, and there are two rows. Of course, we could get the same result here using two conditions and the ampersand sign between them: the first condition is salary greater than or equal to 45,000, and the second condition is salary less than or equal to 59,000. But the between method makes it a lot simpler.

In this lecture, I've shown you how to filter data. Usually we are not going to build data frames in this manner; we are actually going to read the data from CSV, Excel or even HTML files. In the next lectures, I'll show you how to read data from CSV and Excel files and how to analyze that data. See you in just a few seconds.

9. Reading and Analyzing CSV Files with Pandas: In this lecture we'll see how to import data into our project. This is, in fact,
the first step of any data science application. Pandas, as a library, has the ability to import data from a wide variety of sources, like CSV, JSON or Excel files, HTML pages or SQL databases. In this lecture we'll focus on how to work with data from CSV files. First, let's import our module: import pandas as pd. Then I'm going to create the data frame: df = pd.read_csv(...). This is the function used to load the data from a CSV file into a pandas data frame. The first argument is the CSV file, and in this case the file is in the current working directory. If it's not there, you should use a valid relative or absolute path. In our case, the name of the file is countries of the world.csv. There is also a second argument, delimiter, and here we put the delimiter. By default the delimiter is a comma, but if the CSV file uses another delimiter, like, for example, a colon or an exclamation mark, you should use this argument. For this example, I've chosen a CSV file that contains facts like area, population, birth rate and so on about the countries of the world. You can find this CSV file attached to this lecture. I am executing the cell.

Now that we've created our pandas data frame, let's see a concise summary of this data frame by calling the info method: df.info(). I am executing the cell. We see a lot of useful information, like how many rows and columns are in the data frame, what data type is stored in each Series, the memory usage and so on. One interesting fact is that some rows have missing, null or NaN values. That's why we have 227 values in the Country column and only 205 values in the Climate column. Let's see the content of our data frame. This is our data frame. If a data frame has many rows and columns, pandas will not display all rows and columns by default. Here we see how it skipped from row 29 to row 197. Just imagine how it would be to display 100,000 rows.
It would probably freeze. In our case, there are only 227 rows and 20 columns in the data frame. If you want to see more rows and columns, there is the pandas set_option function that can be called. For example, let's see all rows and columns of this data frame. I can do this because I don't have so many rows, and I hope it won't freeze: pd.set_option(...). The first argument is the option, and the second argument is the value: 'display.max_rows', 500. It will display a maximum of 500 rows. Then pd.set_option('display.max_columns', 50), and pd.set_option('display.width', ...) with the value 500. Now let's see the data frame: it displayed all rows in the data frame, all 227 rows.

Now let's practice on this data frame what we've learned in the previous lectures. Let's see the first 5 rows. For that, I'm going to use the head method, and it displayed the first 5 rows. If I want to see the first ten rows, I pass in 10 as the argument, and it displayed the first 10 rows. Let's see how many rows and columns are in the data frame: df.shape. We have 227 rows and 20 columns. This is a tuple, so we can get only the number of rows or only the number of columns in the data frame. For example, the number of rows is df.shape[0]. Of course, the number of columns is df.shape[1]. Now I want to see only some columns, for example the name of the country and the GDP: df, and I am going to use a list of columns. So I want to see the country and the GDP. I must use the exact name of the column, so I'll copy and paste the column name, and it displayed only these two columns. If I want to see only one row, I can use the loc method. Let's see the row at index 8: df.loc, with 8 between square brackets. The country at index 8 is Argentina. Now let's see more rows, but not all of them. Let's see the rows at index 0, 4 and 8: df.loc, and I am using the list 0, 4, 8.
It will display these three rows, at index 0, 4 and 8. What if I want to see only some columns? I use a second argument, which is also a list, with the names of the columns: country, population and, let's say, net migration. It displayed only three rows and three columns. Now let's see all rows, but only these three columns. I'll copy and paste the cell here and, instead of a list that contains the indexes, I use a colon. This means all rows, and I've got all 227 rows. I can also use the iloc method like this: df.iloc[0:5, :]. I want the rows from index 0 to 5, with 5 excluded, and all columns. And this is the output. Let's return some rows and some columns with df.iloc. The first argument is a list that contains the rows, 1, 3 and 100, and the second argument represents the columns. Here I'm going to use the slice 0:4. This means country, index 0, region, index 1, population and area, and we see the expected output.

If we want to get a random row, we use the sample method: df.sample(). This is a random country, France, and here is another random country. If I want three random countries, I can use an argument here. Another way to return random rows is to use the argument called frac. It returns a fraction, and I want to return 5% of our data set: df.sample(frac=0.05). This means 5%, and it returned these rows.

Now let's do some filtering. I want to see all countries with a population greater than 100 million: df, and now the condition, df['Population'] > 100000000. And these are the countries. If I want to see how many countries in the world have a population greater than 100 million, I can use the len function, and there are 11 countries. If I want to see only some columns, I add here another pair of square brackets, and the element inside is the list of columns. I want to see the country, population and birth rate, and it returned only these three columns.
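The CSV workflow so far can be sketched end to end as shown below. To keep the sketch self-contained it reads a small in-memory CSV instead of the lecture's file; the country names, figures, column names and the semicolon delimiter are all invented for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the lecture's CSV file, using ";" as the delimiter.
csv_text = """Country;Population;Birthrate
Atlantis;150000000;12.5
Borduria;8000000;10.1
Freedonia;230000000;18.3
Genovia;500000;9.7
Wakanda;120000000;21.0
"""

# read_csv accepts a path or a file-like object; delimiter overrides the default comma.
df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

rows, cols = df.shape                                   # shape is a (rows, columns) tuple

some_rows = df.loc[[0, 2, 4], ["Country", "Population"]]  # rows by label, chosen columns
first_two = df.iloc[0:2, :]                               # positions 0 and 1, all columns

# Filter on a condition, then keep only some columns with a list of names.
big = df[df["Population"] > 100_000_000][["Country", "Population", "Birthrate"]]
n_big = len(big)                                          # how many rows passed the filter
print(big)
```

With a real file, only the first argument of read_csv changes: it becomes the relative or absolute path of the CSV file.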
Let's try to sort the output by population in descending order. For that, we use the sort_values method. This is a data frame, so, if I want, I can assign it here to a new variable. I've created a new data frame, and I can call the sort_values method on this data frame. The first argument is the population column, since I am sorting by population, and then ascending=False, and the output is now sorted in descending order. Here it wasn't necessary to create a new data frame object. I could do something like this: I am going to delete this cell and here, instead of creating a new data frame object, I write .sort_values with the population column, and it returned a new data frame sorted by population, this time, by default, in ascending order.

Now I want to get the index of the row where the GDP per capita has the highest value. In order to do that, I'll call the idxmax method. Let's create a variable called x: df, the name of the column, in this case GDP per capita (I am going to copy and paste the name of the column), then .idxmax(). That country has index 121. Let's see what the name of the country is: df.iloc[x], and that country is Luxembourg.

Now let me show you another useful method, and that's the isin method. It is useful for checking for multiple values within a single Series. Let's try a very simple example, and to demonstrate it, I'll first test it on a simpler data frame. I am going to create a new pandas data frame like this: data equals a list of lists, and now my_df = pd.DataFrame(data=data, columns=...). The first column is called name and the second column is called skills. This is the new data frame. Now I want to extract all the rows from this data frame where the value in the skills column is either programming or marketing. How does it work? I am starting by extracting the Series, the column my_df['skills'], and then I'll call the isin method on this object, my_df['skills'].
This is a Series, and I am going to call the isin method on this object. isin takes a list as its argument, let's say programming and marketing, and I can store the result in a new variable, let's say skills. Let's see what type of object skills is: it is a Series of booleans. If I want to see the employees that have programming or marketing as skills, I write df[skills]. Sorry, the name is my_df. This is the new data frame, and there are three employees, Dan, Paul and Helen, that have these skills. Let's go back here. The isin method takes a list or a tuple as its argument, and what it does is compare the values in the skills column with the elements of the list that is provided as the argument. If the value in the current row of the skills column is equal to any element of the list, it returns True; otherwise it returns False.

Another thing to notice is that sometimes there are leading or trailing white spaces at the beginning or at the end of a string. In this case, we use the str.strip method of a Series object to strip white spaces, including new lines, or a set of specified characters from each string in the Series. Let's see an example. I'll put some white spaces in front of marketing. I'm executing the cell again and creating the skills Series, and we can see False here, because I have some white spaces in front of the marketing value. In this example, the first value of the Series is False, and we can see that Dan doesn't belong to the new data frame anymore. Now let's call the strip method. Here I write .str, since this is a string method, then .strip(), and I am executing the cells again. Now we see that the first value in the Series is True, and Dan was selected. Perfect.

Now let's try the isin method with our countries of the world data frame. I want to select all countries in either Eastern Europe or Western Europe, and I want to display them sorted by population in descending order.
This is our data frame. We can see there is a column called Region, and there are Western Europe and Eastern Europe values in it. I am going to create a variable of type Series called region: region = df['Region'].str.strip(), because in the CSV file there are spaces in front of the region (this is the CSV file), then .isin with the list of values, Eastern Europe and Western Europe. region is a boolean Series: it contains True if the country belongs either to Eastern Europe or to Western Europe, and False otherwise. Now let's see those countries: df[region], region being my Series variable. These are all countries either in Eastern Europe or in Western Europe. Let's sort them by population in descending order: df[region].sort_values(...). The first argument is the population column, and the second argument is ascending=False. And these are the countries: Germany, France, United Kingdom, Italy. We see how they are sorted by population in descending order. Okay, that's all about the isin method, and my advice is to play around with this method, because it can be really useful sometimes.

Let's see other useful methods. For example, I would be interested to see the three largest countries in the world in terms of population. I can try something like this: df.nlargest, with the first argument 3, and then the population column. It returns the three largest countries in terms of population: China, India and the United States. Now let's see the five largest countries in terms of GDP per capita: 5, and the column is GDP per capita. These are the countries: Luxembourg, Norway, United States, Bermuda and Cayman Islands. We can also call the nlargest method using keyword arguments: df.nlargest(n=2, columns=...) with the birth rate column. And these are the countries; they have a huge birth rate. Of course, as you have guessed, if there is an nlargest method, there is also an nsmallest method.
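The isin, str.strip, sort_values and nlargest steps above can be sketched on a tiny made-up data frame. The names, regions and population figures are invented, and the stray spaces around some region values are added on purpose to show why strip is needed.

```python
import pandas as pd

# Made-up data; some region strings carry leading or trailing spaces.
df = pd.DataFrame({
    "Country": ["Atlantis", "Borduria", "Freedonia", "Genovia", "Wakanda"],
    "Region": ["WESTERN EUROPE ", " EASTERN EUROPE", "OCEANIA",
               "WESTERN EUROPE", "AFRICA "],
    "Population": [150, 8, 230, 5, 120],   # invented figures, in millions
})

wanted = ["EASTERN EUROPE", "WESTERN EUROPE"]

# Without stripping, values with stray spaces do not match the list.
raw_mask = df["Region"].isin(wanted)

# str.strip removes the surrounding white space before the comparison.
clean_mask = df["Region"].str.strip().isin(wanted)

# Use the boolean Series as a filter, then sort the result in descending order.
europe = df[clean_mask].sort_values("Population", ascending=False)

# nlargest returns the n rows with the largest values in the given column.
top3 = df.nlargest(3, "Population")
print(europe)
```

Comparing raw_mask and clean_mask shows the effect of the white space: only one row matches before stripping and three match after it.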
Let's see the smallest countries in the world in terms of area: df.nsmallest. I want to see five countries, and I want the smallest countries in terms of area. And these are the countries. Let's sort them by population: .sort_values with the population column. They are sorted in ascending order, and I can write here ascending=False. Now they are sorted in descending order.

10. Reading Excel Files. Groupby and Other Useful Operation: Hello and welcome back. In this lecture we'll focus on how to read and analyze Excel files with pandas. In order to do that, another module, called xlrd, is required. Make sure you have it installed, or install it if necessary by opening cmd.exe and running pip install xlrd. The module has been installed. Now back to our Jupyter notebook. After importing the pandas module, I am going to import the xlrd module. Then I am creating a data frame object called df: df = pd.read_excel(...). This is the function used to read Excel files. The first argument is the Excel file. For this lecture, I've created an Excel file called salaries.xlsx. It stores information about employees in three columns: name, salary and country. You can find this file, salaries.xlsx, as an attachment to this lecture. The second argument is called sheet_name and is the name of the sheet in our workbook. In this case, there is only one sheet, called Sheet1. Let's see our data frame object. This is our data frame.

Let's do some simple statistics. Let's calculate the mean value of the salaries: df['salary'].mean(), and this is the mean value. Let's calculate the maximum value: df['salary'].max(), and this is the maximum value, 81,000. Now the minimum value, and I use another notation: df.salary.min(). It's the same, and we see that the minimum salary is 32,000. Let's see how many salary values are in the salary column: df['salary'].count(). We have 13 salaries.
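Here is a minimal sketch of the column statistics above. It builds a small data frame in memory with invented values instead of reading salaries.xlsx, so it runs without the workbook; with the real file, the data frame would come from pd.read_excel. Depending on your pandas version, reading .xlsx files may need the openpyxl package rather than xlrd.

```python
import pandas as pd

# Invented stand-in for the salaries workbook (not the lecture's 13 rows).
df = pd.DataFrame({
    "name": ["Ana", "Bob", "Carla", "Dan", "Eve"],
    "salary": [32000, 81000, 50000, 64000, 43000],
    "country": ["USA", "UK", "USA", "Germany", "UK"],
})

mean_salary = df["salary"].mean()   # average of the column
max_salary = df["salary"].max()     # largest value
min_salary = df.salary.min()        # dot notation selects the same column
how_many = df["salary"].count()     # number of non-missing values

print(mean_salary, max_salary, min_salary, how_many)
```

The dot notation only works when the column name is a valid Python identifier and does not clash with an existing data frame attribute, so the bracket form is the safer default.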
We can also calculate the standard deviation by calling the std method, and this is the standard deviation of the salaries. If you want to see the unique values in a column, for example the list of countries that the employees are from, you can use the unique method: df['country'].unique(). It returned an array of unique values. This is not a Python list; it is a NumPy array that can easily be transformed into a Python list. Let's store it in a variable called x. type(x) returns NumPy array, and if I want a list, I simply call the list constructor, list(x), and I've got a Python list. If you want to see how many unique values are in a column, you can use the len built-in function or the nunique Series method. len(x): there are four unique values, so four different countries. We can also use df['country'].nunique(), and we get the same result. If you want to see how many times each unique value occurs in that column, you can use the value_counts method. In fact, in this example we'll see the number of employees from each country: df['country'].value_counts(). In this case, there are four employees from USA, three from UK, and the rest from Germany and Brazil.

Now a short recap of conditional selection. I want to see all employees from UK who have a salary greater than 50,000. We can do this in one or in two steps. If we want to do it in one step, and this is the most frequently used way, we write df and then the conditions, using parentheses and the ampersand between them: df['salary'] > 50000 in parentheses, an ampersand, and another pair of parentheses with df['country'] == 'UK'. It returned all rows where the salary is greater than 50,000 and the country is equal to UK. Now I want to show you how to do it in two steps. I am going to create a new variable called mask, equal to the conditions. Don't forget the parentheses; they are mandatory. mask is a Series of booleans.
We have True for a row if both conditions are met. Now I write df[mask], and I've got the same result. This is done in two steps.

Now let's see how to sort the rows of the data set by a column: df.sort_values('salary'). We notice how it sorted by salary in ascending order. If you want to sort by salary in descending order, you should use a second argument, and that is ascending=False. Of course, you can also sort by other columns, including columns that store string values. In that case, it will sort the rows alphabetically. Let's sort them by name, and it's sorted by name alphabetically.

Another very useful operation we can perform on a pandas data frame is to group rows together based on a column and then perform aggregate functions on them. This is done using a groupby statement, and it's similar to GROUP BY from SQL. A groupby operation involves some combination of splitting the data, applying a function and combining the results. This can be used to group large amounts of data and compute operations on these groups. In the following example, we'll group by country and then call some aggregate functions. An aggregate function takes in many values and returns a single value. First we choose the column we group by, and pandas will gather all the rows together based on that value. I am creating a new variable called c. I am grouping by country: df.groupby, and the argument is country, the column name. c is a pandas DataFrameGroupBy object. Let's run some aggregate functions on this groupby object. Let's return the maximum salary of all employees from each country: c.max(), and it returned this data frame. With c.min(), these are the minimum salaries from each country. Let's return the mean salary value for each country. These are the mean salaries, and this is the standard deviation. The result of these pandas aggregate functions is another pandas data frame, so we can perform further operations on that data frame.
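The unique values, mask, sorting and groupby steps above can be sketched as follows, on invented data. One assumption to flag: the sketch selects the salary column before aggregating, because recent pandas versions raise an error when asked for the mean of a text column such as name.

```python
import pandas as pd

# Invented stand-in for the salaries workbook.
df = pd.DataFrame({
    "name": ["Ana", "Bob", "Carla", "Dan", "Eve"],
    "salary": [32000, 81000, 50000, 64000, 43000],
    "country": ["USA", "UK", "USA", "Germany", "UK"],
})

countries = list(df["country"].unique())    # NumPy array turned into a Python list
n_countries = df["country"].nunique()       # number of distinct values
per_country = df["country"].value_counts()  # occurrences of each value

# Two-step selection: build the boolean mask first, then use it as a filter.
mask = (df["salary"] > 50000) & (df["country"] == "UK")
uk_high = df[mask]

by_salary = df.sort_values("salary", ascending=False)

# Split the rows by country, then aggregate the salary column of each group.
c = df.groupby("country")
max_by_country = c["salary"].max()
mean_by_country = c["salary"].mean()
print(mean_by_country)
```

Each aggregate result is indexed by country, so a single group can be looked up with loc, for example max_by_country.loc["UK"].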
For example, let's see the maximum value in the data frame where the label is UK. I am creating a new data frame called df1: df1 = c.max(). This is our new data frame, and now df1.loc['UK'], and it returned this row, this Series. Instead of creating new variables and calling methods on those variables, you can write just a single line of code. For example, let's see the minimum salary of all employees from USA: df.groupby with the same argument, country, then .min(), then .loc['USA']. I've done it in one step. At the end of this lecture, I want to show you one last useful method that can be used with groupby, and that's the describe method. It returns a lot of useful information all at once: df.groupby('country').describe(), and we see a lot of useful information. If you want each row to be a column, you can further call the transpose method: .transpose(). Okay, that's all about reading and analyzing Excel files for the moment. Thank you.

11. Reading and Analyzing HTML Pages with Pandas: Hello and welcome back. In this lecture we'll take a look at another very useful function of pandas, and that's reading and parsing HTML files. In order to avoid any possible problems, another module should be installed. That module is called lxml and can be installed simply by typing pip install lxml in a command-line interface. The pandas read_html function will read HTML from a URL, a file-like object or a string containing HTML, and parse all HTML tables found in the content into one or more pandas data frame objects. The function always returns a list of data frame objects, one for each table found in the HTML. Let's start our first example with an HTML multi-line string. I have created this very simple web page, and I'll copy and paste the source code into a Python variable. This is a multi-line string, enclosed by triple quotes. This is the HTML source code of the web page.
I am creating a variable called dfs, from data frames: dfs = pd.read_html(...), and here I'll pass in the variable html_string. This function returns a list of pandas data frames. We see this is a list. If I want the first table, I should access the data frame at index 0: df = dfs[0], and this is the first table, saved in a data frame. If I want the second table, I use index 1. This is the second table. All operations we've seen earlier are also applicable to these data frames, for example df['year'], and it returned a Series, the year column. We notice how pandas automatically found the header to use, thanks to the table header tag. This is the header, and this is the tag. But it is not mandatory for it to be defined, and actually it's often missing on the web. So what happens if it's not present? I've modified the HTML code and removed the table header. This is the new HTML code, without a table header. Let's read the HTML again. We see how pandas automatically named the columns 0, 1 and 2, and the table header is in fact the first row of the data frame. In this case, we need to pass the row number to use as the header. I will use a second argument for the read_html function, header=0, so the first row will be the header, and I'll execute the cell again. Now it looks great.

Okay, in the next example we'll read and analyze a table from a page hosted online at a specific URL. I have here this Wikipedia page that contains a list of European cities by population. I'm going to copy the URL, and I am creating a variable called url. Then dfs = pd.read_html(url), and I pass in a single argument, which is the URL. Let's see the dfs variable. This is the list of data frames, and we see how it contains many tables. I'll store the first table in another variable, of type data frame, df = dfs[0], and this is my data frame, the European cities sorted by population.
Let's try a new example. This time I want to analyze the historical stock price of Amazon. I am searching on Google for Amazon historical stock price, and I am going to open the NASDAQ web page. This is the web page I want to read with pandas. I am creating a new variable called url, equal to the URL of the page, and dfs = pd.read_html(url). The read_html function returns a list of pandas data frames, and each data frame stores an HTML table. In this example I want to analyze this table, so df = dfs[0]. Let's see which table is at index 0: that's not my table. Index 1, and index 2: this is the table I want to analyze. Let's try to find the day on which there was the maximum open stock price. There is a column called Open that has index 1, if we start counting columns from zero. Now, using the iloc method, I want to select only that column: df.iloc[:, 1]. The colon means all rows, and then the column at index 1. This is the column. Let's find the index of the row where the value in our column, the column named Open, has the maximum value. For that, I'll use the idxmax method, so .idxmax(), and the row that contains the maximum value has index 4. Let's store it in a variable called i and return that row: df.iloc[i], the index being 4. This is the Series, the row that contains the highest open price. And that's how we read and parse HTML documents with pandas. Thank you.

12. Working with Missing Data: In this lecture we'll see how to work with missing data. As data comes in many shapes and forms, pandas aims to be flexible with regard to handling missing data. Within pandas, a missing value is denoted by NaN and is what most developers know as null or missing data. Many times we want to join or merge data sets that don't fit together perfectly. Let's say, for example, there is a column that exists only in one data set.
When we join these data frames, we'll introduce null, NaN or missing values. Let's go to coding and see some examples. I am going to create a new Jupyter notebook, and in this example I'm going to create a pandas data frame from a dictionary that has missing values. I'll use the Python None keyword for those values. This is my dictionary. From this dictionary, I will create a pandas data frame: df = pd.DataFrame(my_dict). This is the data frame, and we see here NaN, or missing, values. We've got a data frame with three columns, A, B and C, and three rows, 0, 1 and 2, and there are some missing values in columns B and C, but no missing values in column A.

Now let's explore a useful method called dropna that helps us drop missing values from the data set. If there are just a few missing values, there are times when we simply want to drop them: we don't want to include them in our analysis because they're incomplete. If we call this method without any arguments, it will drop all rows that have at least one missing value. Keep in mind that it doesn't modify the data frame in place; it just returns a new, modified data frame: df.dropna(). We see that the data frame was not modified. If we want to modify the data frame in place, we pass in the inplace=True argument, and the data frame has been modified in place. I am going to create the data frame again. Perfect. By default, the method removes missing values across the rows, so in fact it removes rows that have missing values. If we want to remove columns with one or more missing values, we use the argument axis=1. Let's write df.dropna(axis=1). axis=0, which is the default, means rows, and axis=1 means columns, and it removed all columns with missing values. Another useful argument of the dropna method is thresh. It's used to drop only those rows or columns that have fewer than the threshold number of non-NaN values.
For example, if thresh is 2, it will keep all rows that have at least two non-NaN values. In our example, it will keep the row with index 2, because it has at least two non-NaN values: df.dropna(thresh=2).

Now let's check another method called fillna. This method is used to fill missing values in our data frame. We use this method each time we want to fill the holes in our Series or to replace the NaN values with something else. Let's check our data frame again and see how to fill all or just some of the missing values. Let's try df.fillna(value='Python'), and it fills all missing values with Python. Let's try another example, df.fillna(value=100). We see how it fills the missing values with 100. We can also calculate the value to fill with using other pandas functions. For example, we fill the NaN values in column B with the mean value of all other values of that column: df['B'], since I want to fill only the missing values in column B, then .fillna(value=df['B'].mean()), and we see how it fills with 5, the mean value of the other two values in this column. There are many approaches for filling missing data in pandas data frames, and they depend mainly on what type of data you are dealing with.
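The dropna and fillna calls from this lecture can be sketched together as shown below. The dictionary is a hypothetical one, modelled on the description above (three columns, three rows, missing values only in B and C), not the exact one used in the video.

```python
import pandas as pd

# Hypothetical dictionary; None marks the missing values.
my_dict = {"A": [1, 2, 3], "B": [4, None, 6], "C": [None, None, 9]}
df = pd.DataFrame(my_dict)

rows_complete = df.dropna()          # drop rows with at least one missing value
cols_complete = df.dropna(axis=1)    # drop columns with at least one missing value
at_least_two = df.dropna(thresh=2)   # keep rows with at least two non-missing values

filled = df.fillna(value=100)        # replace every missing value with 100

# Fill only column B, using the mean of its non-missing values.
b_filled = df["B"].fillna(value=df["B"].mean())

print(at_least_two)
print(b_filled.tolist())
```

None of these calls changes df itself; each one returns a new object, which is the behavior described above when inplace=True is not passed.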