Excel to Python using Jupyter: Part 2 | Binjamin Barsch | Skillshare
Search

Playback Speed


1.0x


  • 0.5x
  • 0.75x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 1.75x
  • 2x

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

    • 1.

      Introduction

      1:48

    • 2.

      Project

      1:07

    • 3.

      Lesson 01 Getting Started

      3:51

    • 4.

      Lesson 02 Excel Like Operations in Python

      11:49

    • 5.

      Lesson 03 Advanced Operations and Statistics

      10:19

    • 6.

      Lesson 04 Plotting Examples

      7:03

    • 7.

      Lesson 05 Converting Data

      2:01

    • 8.

      Closing

      0:40

  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

1

Student

--

Project

About This Class

Why Take The Class?

In this class I will be showing you more common Excel operations like sorting, filtering, and adding calculated columns and how to do it in Python. We will show how to take Excel functions like like =AVERAGE, =MAX, =MIN, and even VLOOKUP and do the same in Python.

 

What You Will Learn:

- Excel-Like Operations in Python

- Advanced Operations and Statistics

- Plotting Examples

- Converting Data

 

What you will need:

-   Any browser with the internet to practice coding Python online using Jupyter lab:

https://jupyterlite.github.io/demo/lab/index.html

Lastly, I have over 15+ years experience in computer science and data science with a focus in Python. I work as a lead software engineer and I have also tutored, mentored, and coached hundreds of students and developers in programming.

Happy Coding!

Meet Your Teacher

Teacher Profile Image

Binjamin Barsch

Full Stack Python Developer | Software E

Teacher
Level: Beginner

Class Ratings

Expectations Met?
    Exceeded!
  • 0%
  • Yes
  • 0%
  • Somewhat
  • 0%
  • Not really
  • 0%

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.

Transcripts

1. Introduction: Good day and welcome to this introduction or Excel to Python using Jupiter Part two. This is a skill shared course. In this introduction, this course will continue with introducing you to the Python program language. We will assume you are new to Python, but familiar with Excel and spreadsheets. We'll be using Jupiter Lab as a Python coding environment. Just note that this is part two of the course and focus on more Excel to Python examples. We will show you Excel operations like sorting, filtering and adding calculated columns. We use pandas and show you how to group data, find top end values and visualize data with plots. We will also cover advanced topics like how to do V lookup, index matching, and Python. Lastly, we will cover cleaning data over tables, and performing statistical operations. We will use the same Excel data to set the demonstrated examples we used before. This an example of the data, four columns, ID, carme, model and price. This is a section of the examples I'll cover. For example, we commonly work with average, Max and Min in Excel, and we will show you how to do the same examples in Python as well. V Lou and index matching is quite popular and useful in Python, as well, just like you can use it in Excel. So we will show you how to do that quite easily in Python. Great. Let's get started. 2. Project: Now, a first hand approach to understand what we'll cover is actually a project. So here you'll see what you'll be covering in this part two of a course. So again, download the Dam data dot CSV file from the resources section of this course, and you need to process the data for this project and plot the data as well. So this is data you'll be working with city buses, population in millions. So you're going to have to extract cities with population above 21.5 million and save it into a CSV file. Calculate the average price of all the cities and the histogram of the cities. Again, these are the things you'll learn in the course. So further things you have to do cred a bar plot of the city with cities with population above 21.5 million, add labels for the X and Y axis for your plot, add a title, and finally save your plot and upload it as well as a CSV file. Great. Now you know what you work with and what you'll be learning in this course. Great. Let's get started. 3. Lesson 01 Getting Started: Great, let's get started with Lesson one. Let's get familiar with the Python coding environment you will use, namely Jupiter Lab. You go to this link, Jupiter lab dot Github IO slash Demo slashlablash index dot MO, and you'll see a browser popping up that gives you the option to click on my Notebook option. So here we're going to be using Python Podide Notebook option. Again, this is what you'll see when you go to the link and you see the Notebook option and we'll select Python Piodide. So this is what you done before. But now in this Part two, we will be using card data two dot CSV five found in the resources section of the course. You just download that and upload that to your Duple lab environment like we've shown you in Part one of the course. So click on upload button on left side of a toolbar to do that. So again, what you need to do when you want to read in any data, especially CC file, first, I would say import Pandas as PD. And then we're going to store the data in DF variable. This actually stands for data frame. This is how Panda stores data. So we can store it in DF and you equate that to how we're going to do that. So how we're going to read in data, we're going to use PD dot read CSV. This will us read in the CSV and we'll give it a name, card data two dot CSV. Then if you print out DF, you'll get your dataset that was in the formula CSV file and you can see the data being represented here. Now, not the difference in layout when not using print. If you just say DF and execute that, we get a very nice table layout in JupraLab which is a bit easier to read and see as well. We'll be using this in the course. Now a very useful property for data frames by accessing any properties, you can say dot and various properties. We'll cover barriers. But the first one we should always know about is the head method or the head function, which shows you the first five rows of your dataset or your data frame. Now, we can specify different number, for example, ten or one or 11 or however many you want, but it is a very useful property or method to use and access with data frames. Now a very useful one ASOR is dot info for your data frame. D info gives you information about the columns and rows. The most important thing when you're doing this is how did Pandas read in your dataset? We can see we have various columns and we can see they're no now, there are values represented, and the D type is actually most important one to look at. Because this tells you how Pandas read in the CSV file and how it's currently stored. Currently, the ID column stores as integer. Car mac is stored as object, Pandas represents strings and texts as object, which is what you expect. Model year is represented as int. Any datet fields we don't actually want as integer, we actually want as a date time field, which you'll get to because that allows us to filter and modify the data how we want. We'll change this but later on. Price column is represented as a float, which is what we expect as well. So this was a recap just to get it started to where we left off from Part one. The only difference is we're using DF, not to store the data, not the data variable, and we're using car data two a CSV file. Great. Let's move on to the next lesson. 4. Lesson 02 Excel Like Operations in Python: Great. Listen to El operations in Python. So we've read in the data before. In the previous lesson, now we're going to do it again. Repeating that step, input pandas PD, DF equals P dot read CSV and the filename card data to dot CSV. Just remember the underscores are quite important as well. The first thing we're going to do is sort by column. Now, this is quite common to do in Excel and we can easily do this in Python as well, using Pandas. How do we do that? Well, again, we reference our data frame, DF. All our data is stored in DF and we use the dot to get certain parameters or methods that we can actually change the data and modify it. So there's a built in one just like we use head and info. There's one called sort values as well. So what you're going to do is sort values. And there are certain parameters we can select for that built in method. So we're not just sorting values, we want to sort values by price. And then how do we want to? Another property is in terms of how should it be sorted, while we want to make it ascending false? Why do we That's order that we want to sort it by. So let's see once we run that execute it, we get the sorted by price, and we see the highest the largest prices on top, and it goes in this order. So I have just shown you the first five. You can obviously change it, you know, make it ten as well. Then this was just not run before, let me run that over here. Now let's do it over here. There we go. That's a good practice of how you can see if things go wrong, what to do. The previous sort of DF actually wasn't run. That's why it didn't work. It's a good practice too. When you see a problem, try to solve it and many times when it happens is you haven't run the previous cell, you haven't executed the previous cell. Those are useful to see how that can be fixed. So next we can filter rose. So for example, we want to filter rose based on car prices larger than $20,000. So we can do that quite easily in Python. Now, we're not necessarily modifying the data frame. We are filtering it. So how we filter it is we have to create a new data frame or create a create a subset of data frame that checks if the values are bigger and smaller than 20,000. The first one we actually have to look at is DF price. If we just look at the inner bracket, what's inside the bracket. What's inside the bracket is we're just checking is the current value in the price column bigger than 2000. If it is, it'll give you a Boolean value. Boolean is either true or false. It's going to actually give you a new data frame. Now, because we are actually only accessing one column data price, this is actually called a series, which is a single table with a single column. But this gives you values of true or false. And what we're doing there is actually we're taking the true and false values, and we're now applying that to our outer data frame. By doing that, we'll filter. We only want to see the rows that are true. That is by default what will happen. The false ones will not be shown. And if you run it, this is what you'll see. You'll see your output being execute folded DF, you'll see only the cars carmX with value bigger than 20,000. This is how you filter rows, and you can filter data in various ways. And this is one way to do that. Now to do some summary statistics. So functions in Excel like average and Max and Min are quite common to use, and we can do that quite easily with pandas. So if you just look at, if you want to access, for example, only the column core price in the data frame. We can access just like you've done head and info, um, now we can also work out the mean just for that column. We can work out just the max for the column using dot max. We can work out the minimum value for that column. These are all the different various properties or specifically methods, we have access to with pandas by default. I'm just printing it out here and this is the results that we're getting. So adding calculated columns. So we can add a new column with formulas. For example, the price of the 10% discount, for example. So how to do that? Well, first, again, let's access the one column of price. All we do we add a 10% discount with times by 0.9, and we save that into a discounted price. We save it into a discounted price. Let's run that and you can see that if you first show the first five, this is a discounted price in the new column over here. So again, this can modify and add it to your data frame. So we can also use pivot tables. Pivot tables are quite common and feature in Excel, and we can do that as well with Python. Now, I do that using again, axis the data frame, and all we do is we use a group by method and we specify we want to group by karmic. And how do we want to group it? Well, we want to average of the prices. So let's see what we get. So there were a few groups. There was Honda and Kyoto were grouped. The others were unique, but they were grouped in the category. Worked out based on the average of the Honda and Kyota I make. I think Kia also there were more than one options for Kia. Now, you can also find the top end values. In Exil, you can use conditional formatting or filters to highlight the top end values. Python is pretty straightforward. Again, you have this method called dot enlargest and you can specify, okay, give me the top three largest values based on the price. And when you run that executed, you get the top three quite quick and easy to do all built independence. Change in case, you might want to use upper and lower functions in Excel, you have that functionality. In Pandas as well. Again, you're going to access maybe the karmic because that's strings. That's text. So we don't just say upper here because now we accessing in different properties of that. We have to access the string properties. That's how we say dot string. And part of Dot string by default, we have a dot upper method, and that just creates all the letters in uppercase. There are various other properties of string, and this is just one of them. Great. This is what you see. The car makes now we put in uppercase. You can see, as I mentioned before, two Kyotos 202 kyo. That's the previous example was just showing that how you group them together, find an average. Now you can also add a column based on condition. This is you do quite often in El using if statement, the condition if true, if false. Now you can do this quite easily using Python, using a Lambda function. I can start here say if X is 15,000, then the result is yes, else, it's no. This is a format that you can write using a single line function. There are other ways to do it, but this is just one way. The nice thing with pans is we can apply this functionality throughout Again, we act as a column of price and we use a dot applying method, what we're applying, applying this function. We're checking if X bigger than 15,000, place yes, else place no, and we'll store that in a new column cord is it expensive. There you can see, if you run that, you see yes, no, yes, no, no, four. An cars above 15,000 would get a yes or no, no, or get a yes. So this order I spoke before about the model year, converting strings to dates. Exil you can use date value or format cells as dates. Python, you can easily do that with to daytime. In this case, what you're going to do is we're going to convert model year to a datetime format. I won't be string anymore, but a datetime format, and we're going to keep it as a year. We're going to save that and almost overwrite it into the existing column model year. When you get that, you see uh, the model has been, you know, changed. We can get rank as well. Rank is bolt in function in Excel. Again, for the price column, you can use rank and ascending a false and just showing the first five. You can use rank as well. Rolling average. This is the rolling average and Exil function you can use, you can use similar approach. Now there's bolting meta quad rolling. You can choose your Windows three or how wide value you want and Wout dot mean as well. Then you'll get the rolling average for the first five values as you can see, as to be expected. Great. Well done for getting this far. Hopefully you can see the benefits of Python and all the different functionalities you can easily do in Python from Exl. Great. Let us move on to the next lesson. 5. Lesson 03 Advanced Operations and Statistics: Let us move on to Lesson three, advance operations and statistics. So we've covered info examples in Python and how you can do what you can do in Excel and Python. Now we're going to use filters to find specific rows. Now, as we've done before, we can read in the CSV file using this line. And now we want to get specific rows. And like we've shown you before, if we want something specific, we want to create a table of true and false um, values, a table of that that we're going to use to filter out our main data frame. So we have, one in the middle mini one inside the inner brackets, then we have the ashore data frame we want to filter. So first, let's look at this one. All right. So we want to find all the rows that contain the comic Toyota. So to do that, we're going to first reference the column mic. And then we need to access a method that has string properties, so we say dot string. Then part of dot string is it has another method called contains. We've done before upper and lower and string dot string method has a lot of other methods or sub methods it can use, contains is one of them. So it checks. Does the string contain the word Toyota? And this will give us a single table of true and false. All the rows will give us does this row content iota or not? True or false? And then we apply this to our main data frame, DF over here on the square brackets. The result is a specific rows. And if you run it, we'll see our new data frame that just has two rows with a karmic Toyota and Tota. This is a quick and easy way just to filter data only what you want to see. Very useful and handy. Find and Replace is a common tool in Exl, in Python, you can also do it very easily. So now we are modifying the column specifically, we just use DF Comic first reference it, and then we use a built in method called replace and replacing Teota with corolla. When you run that, we can see just first print out the first five, we can see the Choda has been replaced with corolla. Very quick and easy way to do that. To remove duplicate rows. You can duplicates in various ways in El with pandas. If a rose is an exact duplicate of itself, can use dft drop duplicates and will remove that row completely. An example if you just want to print it out, although there are no duplicates in this example, but if you wanted to, you can use this. If you want to fall in missing data, which is sometimes the case. We can use full NA method and apply that to our data frame. And what do you want to fill it with? We can fill it with zero, one, or whatever you want. But something something that is maybe a bit more applicable is maybe let's fill it if we do find an empty cell, empty empty field. Let's replace it with the average other price column. But nice way to fill an empty value. Again, this example doesn't have that, but you can always use in your own if you need to. To maybe manu delete columns of rows, you can use the drop method. If you want to, for example, drop the price column. You can just by using DF, applying this to your whole data frame, drop and then specify the column name. We're just using the price column run that you see is no more price column once you execute this command. Be Lou and index matching, obviously very popular and useful in Excel and we can do the same in Python. Again, in this case, we are using two inner and outer data frame. The inner one is going to give us a table, a single column table or series of true and false values, and it's just going to check is ID equal to car ID in this case, we just made it five. It's going to check, is it equal to ID? If not, it's going to be false. We're going to get a list of single column of true and false. We apply that to the other data frame and we get car details. Then as you can see, there's only one ID five, and this is the row that we'll get out. Another useful thing to maybe use with V Lou is to combine data from two sheets. Now I've added additional data as well, using the merge functionality. What I did here was, I created a dictionary of two lists, one ID, so I can match it to the existing one. And what I'm actually adding is mileage. I'm just adding it to the first five rows. I create a new data frame and I can just add this dictionary here as input and to create a new dataframe, I say PD dataframe add the dictionary and I have my new dataframe I can use. Now what I'm going to do is I'm going to merge it pd dot merge, my original data frame with my new one DF two, and I want to merge it on ID and how it's going to be left? It's a left drawing, which means my original data frame DF that has all the data already. That's going to be my main data and it will stay there and only anything that matches it on ID will be added. Printed out, you'll see how mileage has been added to the first five. Create Pivot tables for summary statistics, something we covered a bit earlier as of, but this is a different way. In Python, by default, data frame has pivotables as a method and has certain parameters that you can use in terms of values. We can reference the price column index. It's going to be a model year and Ag funk is going to be me. That's what we're going to use to specify. This is how you print it out once you execute this row data. Then transposing data, again, something to flip rows and columns. Quite quick and easy to do this just DFT. And if you run that, you'll get your data frame that's represented here as flip. We did this before manually using average min and max. You can also use Pandas built in describe method, which actually gives you this by default, average deviation and various other things. If you say DF dot describe and you run that, you'll see you get statistics on data. Count to mean, sanar deviation, min, different quantiles and max for all the data. In this context, price is the only column that's relevant, but it's quite handy just to get a feel of your data. Aggregations. If you want to use custom formulas and pivot tables, you can. For example, if you want to groups specific column, group buying karmic and relative to price, can define the aggregation mean max and min and you can group that as well you use an aggregation method and you only get these functions displayed for you in your custom statistics and you can add various other functionality. But we just using mean Max and M printed out to get custom statistics for the various fields. If you want to, for example, maybe add a percentage change, you can use formulas like B two minus B one, a B one. You can do that quite easily with price change as well. Column if you add that. You can use built in method for price change, added to the column called price. If you get that, you'll get this value over here or you get the table over here, which is what you expect for the price change for the various values. You can also work out cumulative sum, sum range or cumulative formulas. So using a CM sum, you can apply it to the price column and you'll get cumulative price. And you print it out, this is what you'll get with all your data. Correlation as well, quite important and useful in Excel. Correlate range one and range two. Again, based on the model year, correlated to the price, and you want to find the correlation between model and price, and you can see a correlation of 0.68, which means the R quite correlated quite closely. Well done, this is some advanced operations and stats getting this far. Let us move on to the next lesson. 6. Lesson 04 Plotting Examples: Great. Let us now go to Lesson four, which is plotting examples. Here's a basic line plot that we're going to cover and Excel uses line trot for certain trends. Now, in Python, as we've shown you before, we use a plotting library called MatplotLib Pilot is PLT, and we're going to use that for plotting. Now, let's first read in our CSV file over here like we've done before. For line plots, specifically, this one we only want to use a first ten rows. I filtered it by just saying Df dot head ten and stored ESMA DF two. Now to turn off interactive mode, this is just for Jupraab. You can do this. And by doing plt dot Iof then how do we specify the line plot? Well we just say df dot plot. There's a method. Specify the X values, Y values, and what plot is it? We want a line plot and add a title, price trend by model year add a Y label, price. When we run that code, we get the following. Now to actually plot that, we have to say plt dot I on and plt dot show. This will give you plot of your data model year versus price. Now we're going to do a barhot. Now again, start off by switching of interactive mode. Again, in your own example, you don't really need to do this, but for my presentation, I will, we're going to do a bar chart. So great bar charts. It's useful to compare categories and categorical data. Now, to do that, in this case, we're going to use DF and say plot, again, comic wise price kind is bar like a bar plot and the title. Once you run this, you'll get a bar plot of the various data that we have subcomk versus price. And then to do a pie chart, and this is obviously useful to visualize proportions using Pie charts. We first have to group, we have to group by karmic, and then relative to price and cumulative sum, we're going to plot that all in one line. The kind pie chart or the pie chart. This is just a certain parameter four plotting pie charts. You don't have to worry about that, but by default 1.1 will give us the um, the value you'll see in the pie chart and the title the distribution by Karmic. Once you run that, you'll see this is the distribution and this was the percentages of that value that you saw over here. The auto PCT, this is the 1.1 F. This is how the percentages will be distributed. This is the values at C. This is a typical high distribution of a mic. So again, we're going to switch of interactive mode and do histogram. Again, for an Excel, we use histogram for frequency distribution, and again, we're using DF. Just remember for the line plot, we use DF two, but for the rest of plots, I'm using DF just to show the more data that you're working with. So to show histogram plot is kind just hist, a number of bins we can choose as five titles spice distribution, color, sky blue, edge color is black. Then we can add a X label as well for that and we get to run that plot that on show, then this is distri price distribution for the plots for the data. To do scatterplot, also quite useful to show relationships between variables, DFA plot, X axis, model year or Y axis price or the Y data or Xata kind is scatter title, price versus model, colors red and add a label as well. And when you run that, we're going to get by using these commands you get the scatter plot of a model year versus price. So we can also do a box plot as well to visualize data spread using box and whisker charts. So here we're going to use df dot box plot. This is a unique type of plot. Column is going to be price by mic. We're not going to use grid or vertical properties, just make it false. Add a title, subtitle, label. When you get that, you get your box plot and various distributions for Tyota Lexis, Kia, and Honda. These were options that had multiple rows. And the price distribution. Then we're going to do a multi line plot. Multiline plot, plot multiple series in one chart. Again, we actually using DF two. So the line plots we're using DF two and all plots of DF because the plots will be not proportionate. So we're using DF two and I'm just making a copy of that, storing it multi DF. What I want to work out is work out discounted price as well. I have that in the column and then I'm going to say the model price versus a discounted price. I can see the comparison between two different line plots. The purpose of this to show two different line plots, just with a 10% discount, as we've done earlier, if we do that, we ri out, we can see we get two different plots and you can see they match quite closely, obviously one just being 10% different price versus the discounted price. Bar chart also something quite useful to use to show contribution to total using stacked bar charts. You can group by, again, cumulative sum kind bar and stacked ba chart of prices. So you can also use. And there's the type of plot that you'll see. Great. This is just, again, a bit more in depth introduction to all the various types of plots you can use in Python and something you might use in Excel as well and how to jump from one to the other. Great. Hopefully, that was useful. Let's move on to the next lesson. 7. Lesson 05 Converting Data: All right. Listen five, this is where we're going to focus on converting data. In this section, we're going to look at how to convert data in different formats. Now if you input the following, we're going to have to convert data frame to CSV file. We have been working with data, we modified it and change it. In Exil, we usually save data to CSV format for easy sharing. We can also do that in Python. So if we modified and filtered and sorted and change our data frame however we want, now we can actually convert that and save it as a new CSV file, sat two CSV. And again, we can save it as cars that data CSV and leave the index as false. And then you should when you run that, you'll get this file saved in your upla lab as well. If you want to convert dataframe to JSON format as well. Now again, JSN is for web or database applications, you can easily do that with the two JSN method that's out of all the data frame. You can just maybe call it cars data or JSON or by records and lines equals true parameter. These might change based on what you output. To convert a data frame to HML, if you need to convert it for web display or reports, you can easily just use dot two HML and save it as cars data two HML and discrete index is false. While doing on converting data, you can also convert data frames to Excel as well, SQL Parke, feather formats. There's a lot of ways to convert the basin based on what you need. Great. This actually completed the course, hopefully you will learn something and now for you to do your exercise, specifically working on the project. 8. Closing: Now in closing, we're just going to summarize what you're focused on. In this course, we focus on building our Python skills from Excel. We showed you the Jupiter lab and environment, how to write Python code. We also showed you some more advanced features of Python for Excel. Win work with a small dataset, we showed you how to filter, sort, aggregate, group, and workout statistics. Showed you how to do various plots and convert data frames in different formats. There's much more to Python than we plan to cover. Stay tuned for next course and happy coding. Thank you.