Practical data analysis for coders (part 1): from .csv file to useful insights | Ahmad Baracat | Skillshare

Practical data analysis for coders (part 1): from .csv file to useful insights

Ahmad Baracat, Facebook / ex-Amazon Alexa

Lessons in This Class

17 Lessons (33m)
    • 1. Introduction (0:47)
    • 2. Let's talk about the project (1:14)
    • 3. Install Conda (2:28)
    • 4. Verify Conda installation (1:46)
    • 5. Create Conda environment (1:33)
    • 6. Install Jupyter (0:52)
    • 7. Install Pandas (0:55)
    • 8. Create a Jupyter notebook (4:55)
    • 9. Highlevel view of the data (3:27)
    • 10. DataFrame column selection (1:35)
    • 11. Find number of paid apps (1:21)
    • 12. Convert release date to DateTime (3:59)
    • 13. Install Matplotlib (1:43)
    • 14. Plot histogram of apps release year (1:30)
    • 15. Estimate apps minimum profit (3:07)
    • 16. Summary (0:31)
    • 17. What's next? (1:03)
18 Students

About This Class

The goal of this class is to give you a step-by-step guide to getting started with data science and data analysis on real-life raw .csv files.

Prerequisites: basic programming skills, preferably in Python

Concretely, you will learn:

  • best practices for properly setting up Python for your project using Conda environments
  • how to install packages (Jupyter Notebook & Pandas)
  • how to analyze a .csv file using Pandas DataFrames to extract useful insights for your business

Meet Your Teacher

Ahmad Baracat

Facebook / ex-Amazon Alexa

Teacher

I am currently a Software Engineer at Facebook. I used to work at Amazon Alexa solving computer vision problems using deep learning. In 2017, I was awarded a silver medal in the Understanding the Amazon from Space Kaggle Competition. Currently, I am building my AgriTech startup Priceless AI.

A few years back, I created 10+ apps/games for Windows Phone, Android & iOS with 300K+ customers, which were featured by Microsoft in 150+ countries. You can have a look at my games and apps on my personal website barac.at.

Transcripts

1. Introduction: I'm Ahmad, a former software engineer at Amazon Alexa here in the UK. In this course, I would like to walk you through an end-to-end process, from correctly setting up your Python environment to loading data and extracting some useful insights from it. The assumption is that you have some programming experience, maybe in Python. You will follow along with a real-life project that I'm currently working on: analyzing data from the Play Store to extract some useful market research insights. I hope you enjoy it.

2. Let's talk about the project: The project that we're going to use for this class is an actual project I'm currently working on. The aim of this project is to do data-driven market research on the Play Store, meaning that for a given query on the Play Store, I want to answer three questions: how many of the top apps for this query are paid, how much money are they making, and when were they released? So these are the three questions I want to answer. And this is actual data for the query NBA. We have the title of the app, the description, the number of installs, and the rating on the Play Store. We also have the free column, so whether or not the app is free, the price, somewhere here, yep, and when it was released. So this is the data that we're going to be working on to answer these three questions.

3. Install Conda: The first thing that you need to do in a Python project is to set up your virtual environment. A Python virtual environment is just a separation between your OS Python installation and the project that you're working on. Your operating system comes with Python preinstalled, at least on Mac. You generally want to keep your projects separate, each with its own installation.
That way, each project can have different packages that don't affect your OS. In general, it's good practice for each project to have its own Python environment, with packages at specific versions that don't affect other projects that you're working on. The easiest way to get a virtual environment up and running is to go to the Conda website and install the Conda package manager. It says here: Conda is an open source, cross-platform, language-agnostic package manager and environment management system. So you would just go to Installation and then Regular installation. For me, it would be macOS, and I'm just going to go to Miniconda, install the Python 3.7 version of the package, and follow the instructions.

4. Verify Conda installation: One problem I ran into is in the last step. It says you need to test your installation by running conda list in your terminal. If I run conda list in my terminal, I get "command not found". It turns out, if you scroll down, that for macOS Catalina, which is what I'm running now, you need to run these two commands in the directory where Conda was installed. To figure out where that is, you search for conda or miniconda, and there you go. This is the bin directory, and inside it there is the activate script, which is the one we need to source before running this. So how do we do that? The easiest way is to just open a terminal here. Then I just source activate, because we are already in the directory, and then we run conda init zsh. Now, if we go back here and run conda list, we still won't get anything, because we need to refresh the terminal window. So we open a new tab, run conda list, and there we go.

5. Create Conda environment: Now that we have installed Conda and verified that it is installed correctly, we need to create an environment and activate it. In the Getting started and Managing environments sections of the Conda documentation, you will find the commands to create an environment. We can just run conda create with the name option and call the environment play store data analysis. Then we just say yes. Cool. We didn't need to specify a specific Python version or specific packages to install while creating this environment. We'll do that manually. See, here we have the base Conda environment. Now we can just activate the play store data analysis environment, and the terminal tells you that you are now in that environment.

6. Install Jupyter: Now that we have activated the environment, we need to install the packages that we will need inside it. The first package we need is Jupyter Notebook. This is a package that allows us to write code interactively and do visualizations, among other things. So let me go here to install Jupyter. We can install either JupyterLab or the classic notebook, so I'll go with the classic notebook.

7. Install Pandas: Now that Jupyter is installed, we need another package, called pandas. This is the key thing about pandas: it offers data structures and operations for manipulating numerical tables and time series. So it's just a convenient way of loading and manipulating files such as CSV files, plus other cool stuff that we will discuss. To install it, we can just write conda install pandas and let it run.

8. Create a Jupyter notebook: Now that we have installed pandas and Jupyter Notebook, let's fire it up and create a new notebook.
First, let me create an empty directory to host our project. Then, to launch Jupyter, you just type jupyter notebook. This starts a local server, which shows nothing because it was started inside the empty directory that I just created. To create our first notebook, we need to click New. You probably won't have all of these environments, and you won't see this play store data analysis environment here. The reason is that we need to tell Jupyter to use the virtual environment that we're currently in. To do that, let me first kill the server by pressing Ctrl+C twice. I'll just search for "jupyter notebook inside a virtual environment", because I don't remember these commands. I've already clicked on this link before, so I know it is the one. So this is the command we're looking for. Essentially, it tells the Jupyter installation to use the virtual environment we're in. I'm copying it, and the only thing that we need to change is the name. For convenience, I'll call it... actually, let's call it data analysis 2, so you can see the difference. It doesn't need to have the same name as the virtual environment, but to keep things clear and clean it's better to give it the same name. Then we just launch Jupyter Notebook again, and now hopefully we'll have this data analysis 2 entry, which didn't exist before. So let's create our first notebook and give it the name analysis. So what is a Jupyter notebook? It consists of cells, which can hold an arbitrary number of lines of code or comments. For example, we can write x = 2 and then x, and it will output this variable because it's already defined.
If we type y, it says that y is not defined. So it's an interactive environment, but organized a bit differently. You can run cells by clicking Run here, or you can press Shift+Enter to run and go to the next cell, or Ctrl+Enter to run and stay on the same cell. These are just a few shortcuts. The other two important shortcuts that I use frequently are A, to create a cell above the current cell, and B, to create one below it. So A, B, Ctrl+Enter and Shift+Enter are the shortcuts I use most, plus D D, pressing D twice, to delete cells.

9. Highlevel view of the data: Now that we have created our first notebook, you'll see here that the icon is green, and that's because it's running. You can shut it down, and if you open it again, it starts up and the icon turns green again. So let's upload our first, and only, CSV file. It's on the Desktop: NBA apps. Uploading just means that we're moving it to the directory that we are in, so you can also do it from the Finder or from the terminal; you're just copying files around. I'm just showing you this for convenience. Now that we have this NBA apps CSV file, let's start analyzing it. The first thing I like to do is import pandas, which we installed, and load this CSV file as a DataFrame. A DataFrame is just a table, or you can think of it as one. I'm just pressing Tab to autocomplete. I hope I'm not overloading you with information. So it's pandas.read_csv, and then we can load this CSV file by name, because it sits in the same directory. What I like to do first is look at the content with df.head(), which shows you the first 5 rows.
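
A minimal sketch of the loading step just described. The transcript does not show the real file name or column names, so this example reads a small made-up CSV from memory; `title`, `installs`, `free`, `price` and `released` are assumed names for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lesson's CSV file; the real columns may differ.
csv_text = """title,installs,free,price,released
App A,"1,000+",True,0.0,"Mar 1, 2015"
App B,"500+",False,1.99,"Jun 10, 2018"
App C,"10,000+",True,0.0,"Jan 5, 2019"
"""

# read_csv accepts a path or a file-like object.
# With a real file in the notebook's directory you would pass its name instead.
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; head(1) returns only the first row.
print(df.head())
print(df.head(1))
```

With a real file, only the argument to `pd.read_csv` changes; the rest of the notebook works on `df` in the same way.
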
It has the title of the app, the description from the Play Store, the number of installs and all the other metadata that we may use for our analysis. We can also pass the number of rows that we want to see, so just one. Then I like to check df.shape to see the number of rows and the number of columns. So you see, these match. Then I like to look at the columns, so these are all the columns that we have. We can also look at the types of these columns, I think it's dtypes. So pandas has already inferred the types of the columns.

10. DataFrame column selection: Now that we have a very high-level overview of the data, let's get rid of this and filter down to the columns that we care about. We can say columns to filter, and select the columns that we care about: the title, for now, the price, whether the app is free or not, and the release date. So let's create this subset of the DataFrame, where we only get the columns that we care about. This will be the filtered DataFrame, and we say we only need these columns. This is the syntax: you just pass an array of the columns that you want to select or extract, and the result has only these columns.

11. Find number of paid apps: Now that we have reduced the DataFrame to a more manageable state, with only the columns that we care about, let's answer one of the questions: how many apps are paid? We know that the whole DataFrame has 250 rows. So how many of these are paid? We can say paid equals filtered, indexed with a condition. Let me explain the syntax. What we're doing here is saying: take the filtered DataFrame and only give me the rows where free is False, so the apps that are not free. Let's print the shape of the result. So we only have six, out of 250 for this specific query.
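
The overview, column selection and paid-app filter from lessons 9 to 11 could be sketched as follows. The rows are made up and the column names (`title`, `description`, `free`, `price`, `released`) are assumptions, since the transcript does not spell them out.

```python
import pandas as pd

# Made-up rows standing in for the Play Store data; column names are assumptions.
df = pd.DataFrame(
    {
        "title": ["App A", "App B", "App C", "App D"],
        "description": ["desc A", "desc B", "desc C", "desc D"],
        "free": [True, False, True, False],
        "price": [0.0, 1.99, 0.0, 4.99],
        "released": ["Mar 1, 2015", "Jun 10, 2018", "Jan 5, 2019", "Nov 20, 2011"],
    }
)

print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # column labels
print(df.dtypes)   # the type pandas inferred for each column

# Passing a list of column names selects just those columns.
columns_to_filter = ["title", "price", "free", "released"]
filtered = df[columns_to_filter]

# A boolean mask keeps only the rows where `free` is False, i.e. the paid apps.
paid = filtered[~filtered["free"]]
print(f"Number of paid apps: {len(paid)}")
```

Here `~filtered["free"]` negates the boolean column, which is one common way to express "free is False".
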
For this query, we only have six paid apps.

12. Convert release date to DateTime: We can clean this up a bit. We can say the number of paid apps is just the length of this DataFrame, and construct a string: number of paid apps is six. This is just to clean things up. Great. So we answered one question. Let's now move to the next one, which is: what is the distribution of release dates, or when were these apps released? Because this is just six rows, we can call head and just look at it. But if you had more rows, how would you do it? Visually. This column is a date, but its type actually happens to be object, because pandas, when loading this CSV, wasn't able to figure out that this column is a datetime column. So we can't really play around with it yet. The first thing that we need to do is convert this column to a datetime column. So search for "pandas convert column to datetime", and there is a function here: pandas.to_datetime. You give it the argument, which is the object that you want to convert. In our case, we don't want to pass the whole DataFrame; we just want to pass this column. It returns datetime versions of the strings, so we can assign the result to a new column. Let's call it released datetime. We can ignore this. And now the dtype of that column is actually datetime. This means that we can now start manipulating this column and visualizing it.

13. Install Matplotlib: Now that we have created this column with the datetime type, we can start visualizing it. One way to do it is to call hist, the histogram function, on this column. Let's first go to the documentation of this function to see what it creates.
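
The date conversion from lesson 12 could look roughly like this. It is a sketch on made-up rows; the column names `released` and `released_datetime` are assumptions based on how the columns are described in the lesson.

```python
import pandas as pd

# Made-up paid apps with release dates stored as plain strings.
paid = pd.DataFrame(
    {
        "title": ["App B", "App D"],
        "released": ["Jun 10, 2018", "Nov 20, 2011"],
    }
)

print(paid["released"].dtype)  # a string-like dtype, not a datetime

# pd.to_datetime converts a Series of date strings into datetime values.
# Pass the single column, not the whole DataFrame.
paid["released_datetime"] = pd.to_datetime(paid["released"])

print(paid["released_datetime"].dtype)
```

If `paid` is a filtered slice of another DataFrame, pandas may warn about assigning to a copy at this point; calling `.copy()` on the slice first is one way to avoid that.
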
It draws a histogram, a representation of the distribution of the data. The function calls matplotlib's hist on each series, and it generates something like this, which will be very helpful for visualizing the data later. So we need to install matplotlib in the environment. Let me go here, activate the play store data analysis environment, and run conda install matplotlib. So let's install that.

14. Plot histogram of apps release year: Now that matplotlib is installed, let's first extract the release year. We create a new column, and on the released datetime column we just apply a function, a lambda function. This essentially says: for each row of that column, so for each value, extract the year. Because this is a datetime, the year attribute is available. So we do that, and you can see that we extracted the year out of each datetime object, and we can plot it. This means that one app was released in 2011, others in later years, and so on. So that's great. We have now answered our second question.

15. Estimate apps minimum profit: Now we're left with the third and last question, which is: how much money are these apps making? If we look at the DataFrame, we currently have the price, but we're missing the number of downloads. So we need to go up here and select one of these two columns. Let's see what their types are. We need something that is an int, which is minInstalls, the minimum number of installs. I'll just add it here and do Kernel, Restart & Run All, which restarts this notebook and runs all the cells again. While it is running, you'll see that the dot at the top right corner of the notebook is greyed out, which means that the notebook is running some cell. Here you can see that the new column has been added.
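
The year extraction and histogram from lesson 14 could be sketched like this. The dates are made up, the column names are assumptions, and the plotting call is guarded so the sketch still runs where matplotlib is not installed.

```python
import pandas as pd

# Made-up release dates, already converted to datetime values.
paid = pd.DataFrame(
    {
        "released_datetime": pd.to_datetime(
            ["2011-11-20", "2015-03-01", "2018-06-10", "2018-09-02"]
        )
    }
)

# Apply a function to each value; datetime values expose a .year attribute.
paid["released_year"] = paid["released_datetime"].apply(lambda d: d.year)
# An equivalent vectorized form is paid["released_datetime"].dt.year

# Number of apps per release year: the same information the histogram draws.
print(paid["released_year"].value_counts().sort_index())

try:
    import matplotlib

    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    paid["released_year"].hist()
except ImportError:
    print("matplotlib is not installed; skipping the plot")
```

In a notebook the `matplotlib.use("Agg")` line is not needed; calling `.hist()` in a cell is usually enough to show the plot inline.
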
We now have minInstalls. So let's create a new column and call it profit, or minimum profit. This will be minInstalls times the price, times 0.7, because I know that Google takes 30% of the revenue on the Play Store, so developers are left with 0.7. And we can look at it. Yeah, something is happening. So we have the minimum profit here. For example, this NBA app made 1,000,700 dollars. And we can also plot it, so you see the distribution of the minimum profit. So this is it for the last question.

16. Summary: In summary, we learned how to properly set up our Python workspace using virtual environments. This is important to make sure that each project is self-contained, with its own packages and their respective versions, and does not affect the OS. We learned how to install packages like pandas and Jupyter Notebook. We learned how to load CSV files, how to manipulate them, and how to extract useful insights.

17. What's next?: So what's next? I would suggest you pick a CSV file or a dataset you care about and start using the same techniques and the same setup that we used to analyze it. Let this project and your curiosity guide you towards what you need to learn in terms of data science in general, or Python, pandas and Jupyter. Please, if possible, leave a review on Skillshare. I would like to know if this class was useful, how it was useful, and how I can improve. More importantly, tell me what other topics you care about in the field of data science, machine learning, and maybe practical software engineering tips and tricks. So, yeah, I hope you enjoyed it, and stay safe.
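
For reference, the minimum-profit estimate from lesson 15 could be sketched as below. The rows are made up, `min_profit` is a hypothetical column name, and the 0.7 factor is the lesson's own assumption that the store keeps 30% of revenue.

```python
import pandas as pd

# Made-up paid apps; `minInstalls` is the minimum number of installs.
paid = pd.DataFrame(
    {
        "title": ["App B", "App D"],
        "price": [1.99, 4.99],
        "minInstalls": [1000, 50000],
    }
)

# The lesson assumes the store keeps 30%, leaving 70% for the developer.
DEVELOPER_SHARE = 0.7

# Lower bound on profit: minimum installs times price times developer share.
paid["min_profit"] = paid["minInstalls"] * paid["price"] * DEVELOPER_SHARE

print(paid[["title", "min_profit"]])
```

This is only a lower bound under the stated assumptions: it uses the minimum install count and the current price, and ignores in-app purchases, refunds and price changes.
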