Python pandas tutorial | road to machine learning part 3 | Michal Hucko | Skillshare

Python pandas tutorial | road to machine learning part 3

Michal Hucko, Python | Docker | Kubernetes

Lessons in This Class

14 Lessons (2h 14m)
    • 1. Introduction (1:41)
    • 2. Reading and attributes (20:27)
    • 3. Selecting (15:35)
    • 4. Indexing (11:09)
    • 5. Multi index columns (5:34)
    • 6. Updating (9:02)
    • 7. Joining (9:04)
    • 8. Describe (19:51)
    • 9. Iterating (5:38)
    • 10. Group by (14:19)
    • 11. Strings, datetimes, sort (8:44)
    • 12. Plotting (7:12)
    • 13. Writing dataframe (3:32)
    • 14. Project (2:35)
48 Students

About This Class

In this tutorial I teach pandas essentials. Pandas is a Python library dedicated to data analysis. It is one of the most important tools of modern machine learning developers. In the tutorial we cover all the functionality needed to get a head start on data analysis in pandas.

For the whole course we prepared detailed examples, documented and executed in JupyterLab notebooks. You can find the source code on the course GitHub. We highly recommend that you first try to code the examples along with the lecture. Use the source code as a last resort to overcome errors. This way you will learn much more from the course :).

During the lessons we present example code run on a Kaggle dataset, which you can find on this link. To download the dataset from Kaggle you need to be registered. Registration is free. Kaggle is an ideal place to find datasets for learning machine learning :).

The agenda of the course is as follows:

  1. Reading the dataframe and accessing attributes
  2. Selecting from the dataframe 
  3. Indexing the dataframe 
  4. Column multi index
  5. Updating the dataframe
  6. Joining the dataframes 
  7. Describing the dataframe (intro to exploratory analysis)
  8. Iterating the dataframe
  9. Group by 
  10. String, dates and sort 
  11. Basics of plotting 
  12. Writing the dataframe
     

Meet Your Teacher

Michal Hucko

Python | Docker | Kubernetes

Teacher

Hello world!! My name is Michal Hucko and I am a passionate Python developer. I am a former university teacher. I was doing my PhD in computer science, but because of an unfortunate situation I decided to postpone the study for now. That's why I want to teach computer science online. I hope I can help you understand the modern world of machine learning and distributed computing.

Besides programming I like to spend time with my wife, my brother and my friends. I am a passionate fitness guy and sometimes I play computer games.

About my engineering career

For the past 5 years I have been working as a machine learning DevOps developer. I work mostly with Docker, Kubernetes and Python. Currently I am working for one of the biggest computer companies in the wor...

Transcripts

1. Introduction: Hey, guys, my name is Michal Hucko and welcome to the pandas tutorial. This course is part of a series of courses called the Road to Machine Learning, where I'm teaching you how to apply machine learning algorithms in Python. This course is about the pandas library. Pandas is a Python library used by data scientists all around the world to process big data. In this video I will cover pandas basics. I will show you pandas data types, how to read data into pandas, how to select data from data frames, how to index data frames and how to describe data frames, and I will cover much more. If you are interested in this stuff, please join my course. This course is also suitable for absolute beginners, because from the start I will cover the absolute basics: how to install pandas and how to work with pandas properly. My name is Michal Hucko and I used to be a PhD student at the university, where I studied computer science. Right now I am working as a machine learning DevOps engineer. If you are interested in my research, which I did in text processing and emotion detection, feel free to check out my publications on the Internet. You can find more details about my career on my LinkedIn profile. I hope, guys, that you are as interested in pandas as I am, and I hope I will see you in the next video of this course. Thank you very much for watching.

2. Reading and attributes: Hello, guys. My name is Michal Hucko. Welcome to another video of the pandas tutorial. In this video we're going to talk about data frame basics and reading data frames. I will explain a little bit what a data frame is, and I will show you how you can read data into a pandas data frame. Okay, so before we start, let's discuss the data we will use. Feel free to use any data you have on your computer or in your company, which you are working with daily.
I highly recommend using some structured data with multiple columns and lots of rows. In our case, I will use housing price regression data, and I will show you where you can find it. Feel free to use anything; you can use data for classification or regression. If you don't know anything about classification or regression, just wait until our next video, where I will explain more details about machine learning and the terminology I'm using right now. Okay, so let's talk about the data which we're going to use. For this purpose I will use data from a website called kaggle.com. Kaggle.com is a place where you can find multiple free data sets used for machine learning. Kaggle also provides you with some example code for machine learning problems like regression and classification. So feel free to log in to kaggle.com, because you need to log in in order to be able to download something from Kaggle, and then put into the search the name "House Prices: Advanced Regression Techniques". Then in the Data tab you can find the whole data set, and at the bottom you can download it with this button. After downloading, you will get a ZIP file, which you can place in your desired location. In this case, I'm going to use the pandas tutorial folder, which I will use through the whole course, and inside it the house price data set folder. After you unzip the file, you will find data_description.txt, which is a text file with the description of the data, and sample_submission.csv. This file is not interesting for you; we will not use it in this course. Then there are test.csv and train.csv, which hold the test and training data. What the training and test sets are I will explain in later videos and tutorials, so for now just think about these as the data. Okay. During this course we will use pandas.
If you are interested in which versions of pandas and the related libraries I'm using, you can find them on my GitHub, linked in the description below. On this GitHub page you will find the examples which I'm going to code for you in this tutorial, so feel free to go there, check out the code, download it, and follow along. But I highly recommend that you go through the examples on your own. Please write the code. You need to transfer the code into your brain through your hands, not just by copying, not just by pressing Ctrl+C and Ctrl+V. So please go ahead and write the code with me. Now, as you can see, I'm in my terminal. If you don't know anything about Python, or the basics of pandas or Jupyter, which we're going to use throughout this tutorial, I highly recommend checking out my previous videos on Skillshare, linked on the screen. Please feel free to go through these videos first, learn some basics and then come back. If you have the knowledge from my previous courses, it should be sufficient for you to start with this tutorial. So feel free to follow along with me. As you know, this is the sign which says that we are running the virtual environment called pandas, and I have already installed all the libraries. As I mentioned, there is the requirements.txt file, as you can see in my folder over here, with all the libraries I'm using. For example, if you go down to P, you will see that we're going to use pandas 1.0.5, and under N there is NumPy 1.19.0. So feel free to install these. All the other libraries will be installed along with them. If you don't care about the individual libraries, just run pip install -r requirements.txt, which will install all of the packages for you.
Something like this: pip install -r requirements.txt, press Enter, and it will install all the libraries which I have installed locally on my computer and which I'm going to use in this tutorial. Okay, but let's go into the coding. We will run Jupyter from the desired location; in this case, we're going to run it from the tutorial folder. So I'm going to run jupyter lab, and this will open JupyterLab for me. As you can see, I'm on localhost:8889/lab. This may vary from setup to setup; for example, yours may be running on 8888. Just understand that this is the same thing, only a different port; you don't need to worry about this. As you can see, I have prepared some examples for you. For each of these lessons, I will go through these examples with you. Some of them I'm going to write on my own, and some of them I'm just going to show you from the file. But really, go through this code with me and write it. So I'm going to create a new notebook with the pandas kernel; "pandas" here is the name of the environment. If your environment has a different name, that name will be shown here. Okay, so before you do anything with pandas, you need to import pandas. I also highly recommend importing NumPy, because we will use NumPy quite often, and the alias for it is np. It is quite common in the data science community to use the alias pd for pandas and np for NumPy. Okay, so what's the situation right now? As you can see, I have here the folder with the house price data set, and if you click inside there, there are these two CSV files. One is called train and the second is called test. This is the data which we're going to use. If you open train, you will see there are a lot of columns. There are 81 columns and about 1400 rows.
The data set I'm going to use describes houses which were sold in America, and it is intended for prediction of the target value, which in this case is the sale price. So all these columns describe a house which was sold for this sale price. For example, here you can see the sale condition, which was normal. Here is the sale type, which is some abbreviation. The description of all these fields can be found in this txt file. If you open this txt file, there are the names of the columns, the names of the values and their explanations. For example, the column Alley stands for the type of alley access to the property, and there are three values available in this column: gravel, paved, and no alley access. As you can see, it's quite simple. There is some numerical data and some string data. We will cover this later. Right now the data set is located in this folder. Feel free to place your data set in some desired folder. It is quite good practice to have a folder for data, which is sometimes called data and sometimes something else, so feel free to call it what you want. Going back to reading: the first thing I want to show you is how you can read a CSV file into a pandas data frame. Many of you don't know what a pandas data frame is; maybe you have some intuition about it. This is exactly the lesson in which I'm going to explain the data frame logic in pandas. In order to create a data frame, pandas has a constructor called DataFrame. If you press Tab, you will get the autocomplete, and there you need to specify what the data frame should be built from. In this case, we want to create it from the constructor, so we can create the data frame, for example, from a list.
If I run it, as you can see, I created a data frame with column 0, because I didn't specify the name of the column, and with rows numbered from zero. So it's fairly easy to transfer basic Python data types into a data frame. I can, for example, also transfer a dictionary into a data frame. Let's say I have a key, and the values will be this list of numbers. When I run it, as you can see, the key is treated as the column name and the value is treated as the values of the data frame. If I want to change the index, I need to specify the index argument. In this case it was numbered from zero, and I can, for example, start it differently. If I run it, the index is changed. So as you can see, the DataFrame constructor is fairly easy to use and you can do a lot of stuff with it. But you have the data in a CSV file, and you need to read the CSV. For this purpose pandas provides a function called read_csv. If you call read_csv, specify the location of the CSV like this and run it, it will read the whole CSV and put it in a data frame for you. As you can see, there are 1460 rows and 81 columns. You can assign this data frame to a variable, and now we have the data frame in the variable df. As you can see, because there are so many rows, JupyterLab is clever enough not to print the whole data frame into the output, because that would take some time and might overload your CPUs if you are running it on a data frame with 1,000,000 rows, which is quite a common use case, and with a lot of columns. So in this case only part of the data frame has been printed for you. Quite often it is enough to print the head or the tail of the data frame.
When you call head, only five rows are printed for you, so you get a nice little overview of the data frame. Okay, so read_csv is the basic function for reading CSV files. As you can see here, the read_csv function has a lot of arguments. For example, there is filepath_or_buffer, which is the first argument and is not optional; you need to specify it, and you can name it like this. Then you can specify the separator, which says which character separates the values in this CSV file. It's quite simple. CSV stands for comma-separated values. So what does this file look like? If you go here and open the file with a text editor, you will see that the values are separated by commas and new lines. Python then knows internally that when it sees a comma, a new value begins, and when it sees a new line, a new row begins. But sometimes, for example when you export a CSV from Excel, there are different delimiters, like tabs and so on. Here you can see that if I change the separator to a semicolon, everything ends up in one place, because Python is looking for the semicolon to separate the values, and because there are no semicolons, it treats everything as one value. So you can override this parameter and set it to whatever you want. Then there is header, which says which row specifies the header. You can even skip the header. What is the header in this case? It is Id, MSSubClass, MSZoning and so on. Sometimes you have a CSV file which is missing the header.
If you want to put the header there manually, you can say that you don't want to read a header in this case. Then there are the names of the columns, which you can override. For example, in this case I'm overriding them with numbers, so if I read it like this, the names in the header will be these numbers. Then there is skiprows, if you want to skip some of the first rows. For example, it's quite typical for Excel to save CSV files with a bunch of additional data above the CSV content, and you can skip it with skiprows. Then there is nrows, the number of rows you want to read. Sometimes you don't want to read the whole data frame, so you can specify how many rows you want to read. And then, and this is quite useful, if there are dates in your data you can parse them directly with parse_dates by specifying the indices of the columns where the dates are located, and they will be read as dates as well. Some of you are now confused: well, I don't understand most of the things you said. Don't worry, we will explain it later. Some of you have found the answers you were looking for in this tutorial, so I'm glad if I helped you understand a little bit how pandas works. Besides reading CSV files, you can also read Excel, HTML, JSON, Parquet and SQL tables, which are different data sources. Some of you know Excel; some of you know HTML or JSON. Parquet is a highly compressed data format which is widely used for big data, and it's connected with databases like Hive and so on. SQL is connected with relational databases: you can connect directly to a table in the database and read it directly into pandas. This is used quite often. Okay, so as you can see, this is the same code which I ran before. I'm reading the head of the file, but now look inside the data frame. As you can see, every data frame consists of three things. There is the columns part, and in this case the names are strings.
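The read_csv arguments described above can be tried without the Kaggle files. The following is only a minimal sketch: the inline text stands in for a CSV export, its column names mimic the housing data, the values are made up, and the names a, b, c, d are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for a CSV export: one junk line, a header line,
# then three semicolon-separated data rows.
raw = io.StringIO(
    "exported by some tool\n"
    "Id;MSZoning;SalePrice;Sold\n"
    "1;RL;208500;2008-02-01\n"
    "2;RL;181500;2007-05-01\n"
    "3;RM;223500;2008-09-01\n"
)

df = pd.read_csv(
    raw,
    sep=";",               # values are separated by semicolons, not commas
    skiprows=1,            # skip the junk line above the header
    header=0,              # the first remaining line holds the column names
    nrows=2,               # read only the first two data rows
    parse_dates=["Sold"],  # parse this column as datetimes
)
print(df.head())

# With the default comma separator every line ends up in a single column.
raw.seek(0)
wrong_sep = pd.read_csv(raw, skiprows=1)
print(wrong_sep.shape)

# Ignore the header line in the file and supply our own column names.
raw.seek(0)
no_header = pd.read_csv(
    raw, sep=";", skiprows=2, header=None, names=["a", "b", "c", "d"]
)
print(no_header.columns.tolist())
```

For a real file you would pass the path to train.csv instead of the StringIO object; the other arguments work the same way.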
But the names of the columns can be treated the same way as dictionary keys, so anything hashable will do. For example, you can use an integer value as a column name, or you can use a tuple as a column name. It's quite common to use a tuple; I will show you later in the tutorial how to use this as a multi-index. The same rule holds for the index. In this case we are using a numeric index starting from zero and going up, but you can use, for example, a string index or a tuple index, which is again quite common. The third important thing in the data frame, besides the index and columns, is the values. The values are everything inside, and there can be almost anything. I will show you and explain later what can be inside the values. So these are the most important parts of a data frame. Moving on, quite a common use case is to read multiple CSV files or multiple data sources into one data frame and concatenate them. It is also the case here, because you have two files, the train and the test CSV, and let's say you want to read them into one data frame. What you can do is specify a list of paths to the files, to the data sources, and then iterate through this list and read each of the data frames. I specify the header argument and set index_col to None, and I'm appending each data frame to the list. Then at the end I'm concatenating the data frames one under another. If I run this... yeah, I hadn't run the import cell yet in this JupyterLab session. If you then look at the shape, it is bigger; it's not these 1400 rows, because I have the train part and the test part concatenated one after the other.
So if you look at this, as you can see, this time we have many more rows and everything is connected. This is quite a useful command to have in your pandas repertoire, and you will use it a lot. Moving on: because the data frame is an object, it has a lot of essential attributes which we're going to use on your data science road. Let's start with the basic attributes. One of them is the axes attribute, which returns two elements in a list. The first is the index; in this case it is a RangeIndex starting from 0 and going up to 1460, so the last label is 1459, because the stop value is not included. Then there is the column index, where you have the labels of the columns. You can return the shape of the data frame. Another useful attribute is T, which stands for transpose. The transpose of a data frame comes from mathematics and works the same way as when you transpose a matrix: all the rows are treated as columns and vice versa, so all the columns are treated as rows. In this case you will have 81 rows and 1460 columns, because I'm working with the original data frame in these examples. If you read the original data frame here and go back to the transpose, you will see exactly this. Then you can return the shape. Because some of the attributes return a data frame, you can chain the calls. As you saw previously, shape just returns the shape of the data frame, so how many rows and columns it has. The shape of the transpose is 81 by 1460, but the normal shape is 1460 by 81. You can directly access the columns with columns, and you can directly access the index with index. These are quite straightforward and intuitive attributes.
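Reading several sources in a loop and stacking them can be sketched roughly as follows. This is an illustration under assumptions, not the notebook code from the course: two tiny inline tables with made-up values stand in for train.csv and test.csv, and ignore_index=True is an extra choice made here so that the combined frame gets a fresh 0..n-1 index.

```python
import io
import pandas as pd

# Made-up miniature versions of the train and test files.
train_csv = "Id,LotArea,SalePrice\n1,8450,208500\n2,9600,181500\n"
test_csv = "Id,LotArea\n3,11250\n4,9550\n"

# Read every source into its own data frame and collect them in a list.
frames = []
for source in (io.StringIO(train_csv), io.StringIO(test_csv)):
    frames.append(pd.read_csv(source, header=0, index_col=None))

# Stack the frames one under another; columns missing in a frame become NaN.
combined = pd.concat(frames, axis=0, ignore_index=True)

print(combined.shape)    # (4, 3)
print(combined.T.shape)  # (3, 4)
print(combined.axes)     # [row index, column index]
```

With real files the loop would iterate over a list of paths instead of StringIO objects.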
Then you can directly access the data types of the columns with dtypes; I will talk about data types in more depth later. As you can see, if you print the head of the data frame, the Id column consists of integers, and that is the reason why Id is int64, a 64-bit integer. MSSubClass again holds integer values, so it's an integer. Then you can see there is the MSZoning column, which has some strings inside, and when you are talking about strings, words or sentences, pandas treats these as the object data type. There is also a good example in LotFrontage, which is float64. Then there is the attribute empty, which says whether the data frame is empty, and ndim, which says how many dimensions there are. We have only two dimensions; we will talk about multi-dimensional data frames later. Then there is size, which stands for the number of elements; the resulting value is basically rows times columns. And you can create an array of arrays from the data frame with values. As you can see, this is an array with inner arrays, one value for each row and column combination. So this is basically like creating a NumPy matrix. If you don't know anything about NumPy, you can think of this as a two-dimensional list of lists. And this command at the bottom is not important right now. Fine, these were the basic attributes of the data frame. I highly recommend that you go through this code and try it on your own. Before you run a command, please try to answer what the output will be. This is the right way to learn new things: guess the output before you run the command. If you're right, you will be satisfied that you understood the topic correctly. And if you answered wrongly, you will learn something new.
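To practice guessing the output, the constructor and the attributes from this lesson can be tried on a tiny frame. This is a minimal sketch with made-up values; the column names are borrowed from the housing data and the string index a, b, c is hypothetical. Depending on the pandas version, the string column may be reported as object or as a dedicated string dtype.

```python
import pandas as pd

# Dictionary keys become column names; the index argument replaces 0, 1, 2.
df = pd.DataFrame(
    {
        "Id": [1, 2, 3],
        "MSZoning": ["RL", "RL", "RM"],
        "LotFrontage": [65.0, 80.0, 68.0],
    },
    index=["a", "b", "c"],
)

print(df.dtypes)   # one data type per column
print(df.empty)    # False, the frame holds data
print(df.ndim)     # 2, rows and columns
print(df.size)     # 9, rows times columns
print(df.shape)    # (3, 3)
print(df.values)   # NumPy array with one inner array per row
print(df.index)    # the row labels
print(df.columns)  # the column labels
```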
Okay, guys, these were the basics of the pandas data frame. I hope I will see you in the next video. Thank you very much.

3. Selecting: 4. Indexing: 5. Multi index columns: 6. Updating: 7. Joining: 8. Describe: 9. Iterating: 10. Group by: 11. Strings, datetimes, sort: 12. Plotting: 13. Writing dataframe:

14. Project: Hello, guys. Michal Hucko here, and this is another video of our pandas tutorial. Thank you very much for watching all the previous videos about pandas. I hope you learned something new. This video is dedicated to your project. As part of each course, this course also has a project, and in this project I will ask you guys to process a data set of your own choice. If you don't have any idea which data set to process, feel free to refer to the project description in the Skillshare course, where you can find a couple of examples which we are providing for you. But I highly recommend working with your own data; it is much more fun, and you will make much bigger progress with a data set of your own choice. Feel free to work with anything. You don't need to limit yourself to numerical data; feel free to include string data and date data as well if you want. At the end of the project, please take some photos or screenshots of your progress and of what you accomplished in the Jupyter notebook. You can then post these pictures into the Skillshare project section, where I can check them and give you feedback on what you could do better, or you can just share your progress with all the other students in the class. We are here for you. If you have any problems with analyzing the data set, feel free to write in the discussion, where I'll be glad to answer all your questions. Now, what should you process in your data set? Feel free to apply all the knowledge which you learned in this course.
So, for example, if your data set is split into multiple CSV or Excel files, feel free to concatenate them into one big data frame. After that, check the data types and try to fill in any NaN and None values in the data set. In the case of numerical values, feel free to use the mean values of the columns. In the case of object values, feel free to use some reserved keyword for the missing values. You can also plot box plots for the numerical values, which are quite useful in the data science world. Then at the end you can just take photos of your Jupyter notebook, as I said, and post them into the project section of Skillshare. Okay, guys, feel free to start with the project right now and send us your progress.
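The filling step of the project could look roughly like this. It is a sketch under assumptions rather than a reference solution: the frame is made up, and the placeholder word "missing" is a hypothetical choice for the reserved keyword.

```python
import numpy as np
import pandas as pd

# Made-up frame with one gap in a numerical and one in a string column.
df = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 68.0],
        "Alley": ["Grvl", None, "Pave"],
    }
)

# Split the columns by type: numerical ones get the column mean,
# everything else gets the placeholder keyword.
numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(numeric_cols)

df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
df[other_cols] = df[other_cols].fillna("missing")

print(df)
```

Using the mean keeps the column average unchanged; whether that is appropriate depends on your data, so treat it as a starting point.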