Introduction to Data Science for Complete Beginners | Fahad Reda | Skillshare

Introduction to Data Science for Complete Beginners

Fahad Reda, Data Science & MIS Mentor

14 Lessons (1h 56m)
    • 1. Course Outline (2:14)
    • 2. What is Data Science? (5:20)
    • 3. AI vs ML vs DL vs DS (16:08)
    • 4. Supervised-Unsupervised Learning (8:49)
    • 5. Supervised Learning (6:56)
    • 6. Examples Of Supervised Learning (7:23)
    • 7. Unsupervised Learning (8:14)
    • 8. Examples Of Unsupervised Learning (9:34)
    • 9. DS Apps and ML Terminologies (7:15)
    • 10. ML Terminologies in Details (12:01)
    • 11. Data Science Workflow (7:23)
    • 12. Data Types (1:57)
    • 13. DS vs ML (11:48)
    • 14. Skills Needed to be a Data Scientist (10:40)

3 Students

About This Class

Data science and machine learning are among the hottest fields in the market and have a bright future.

In the past ten years, many courses have appeared that explain the field in a practical rather than theoretical way.

In my experience counseling and mentoring, I faced many obstacles. The most important was the gaps in learners' education, and most of those gaps were on the theoretical side.

To fill this gap, I made this course. Thank God, it has helped many students properly understand the field of data science.

If you have no idea what the field of data science is and are looking for a very quick introduction, this course will help you become familiar with and understand some of the main concepts underlying data science.

If you are an expert in the field of data science, this course will give you a general overview of the field.

This short course lays a strong foundation for understanding the most important concepts taught in advanced data science courses. It is well suited to you if you know nothing about the field and want to start learning data science from scratch.

Meet Your Teacher

Fahad Reda

Data Science & MIS Mentor

Transcripts

1. Course Outline: Hello everyone, and welcome to this video. In this video, I'm going to teach you what data science is and who a data scientist is. I'm also going to teach you the difference between a data scientist, a machine learning engineer or machine learning scientist, a data analyst, and a data engineer. We're also going to learn the types of questions that a data scientist can answer. That topic comes from the machine learning section called supervised and unsupervised learning, and it's one of the main topics that any data scientist should know. That's why I cover it here. Then I'm going to show you some real-life applications of data science, which means I'm also going to show you what kind of value data science can add to any organization. Since it's an introductory course, I'm also going to show you some of the skills you need in order to become a data scientist. I'm also going to show you how you can start studying data science, and how you should practice data science using a platform called Kaggle. Then I'm going to mention some of the certifications in the field of data science. And at the end I'm going to show you some of the data science books that I read and really liked, and that helped me a lot in understanding data science. So I'm excited for you to enroll in this course and learn from it. And please don't forget to rate the course. Thank you so much, and see you in our next video.

Hello everyone. In this video I'm going to introduce myself. My name is Fahad Reda. I've been teaching data science for over five years now, and I've been an MIS mentor, which is a management information systems mentor, since 2008. I have a bachelor's degree in Management Information Systems from King Abdulaziz University, and I have a master's in Computer Information Systems from the same university.
My master's thesis was in the field of data science. I've also been a WordPress master for more than eight years. That's pretty much it. See you in the next video. Bye-bye.

2. What is Data Science?: Hello everyone, and welcome to this video. In this video we are going to learn what data science is. As you can see here, data science is an interdisciplinary field used to process, analyze, and derive insights from different types of data. Now, what does "interdisciplinary field" mean? Let me just bring my pointer here. Take MIS, which is Management Information Systems. MIS is also an interdisciplinary field; in short, it consists of IT and management. You combine management and IT and you get MIS. It's the same thing here in data science: if you combine all these majors or fields, you get data science. Think of each field as kind of like a Power Ranger. Remember the Power Rangers? Let's say the Blue Ranger is math and stats, the Yellow Ranger is CS (computer science), the Red Ranger is machine learning and AI, the Pink Ranger is business expertise, and the Green Ranger is data analysis. When they combine, they form the Megazord. So for us, a data scientist is what you get when you combine all of this: this powerful person, or Megazord. So data science is not one or two fields; it's more like four or five, and they are completely different fields. We have computer science, we have math and statistics, and we have business domain expertise. We have traditional software, we have something called machine learning, and then we have data analysis. If you combine all of these, you get something called data science. If you combine only math with computer science, you get something called machine learning. If you combine business expertise with math and statistics, you get something called data analysis.
So you get the gist of this Venn diagram. Basically, data science is the study of data, which involves developing methods of storing and analyzing data to effectively extract useful information so we can make informed decisions. So all this headache is about the fact that we have a bunch of data of different types. Maybe it's images, audio, a bunch of text, numbers, whatever we have. We want to analyze that data so that in the end we can extract some useful information from it, and then we make a decision based on that information. In the end, that decision is going to improve my business. This is the main idea of data science. Now, it relies on a plethora of techniques. You have to visualize the data, you have to use statistics on the data, and you have to build something called machine learning models. When you do all these things, you start getting some sense of the data. There are other parts of the data science process as well, such as data analysis. These are basically the techniques. So who is a data scientist? A person who does all these things: a person who is responsible for collecting, analyzing, and interpreting large amounts of data to identify ways to help business operations or business processes. You could say "big data", but there's a difference between a large amount of data and big data; we'll get to that later. Basically, we are doing all these things so we can increase our profits, decrease our expenses, improve the business processes, all this stuff. So again, what is data science? It's a field that analyzes and derives insights from data. We have some business problem, and we formulate that business problem as a data problem. Then we start extracting the data related to that problem.
Then you collect the data, analyze the data, do some visualizations, and apply some statistical tests. At the end, we build something called a machine learning model. From that model, we get some insights; from those insights, we make decisions; and those decisions will improve our operations or business processes. That's a lot to take in, but stay with me and you will learn more in the next video. See you in the next video. Bye-bye.

3. AI vs ML vs DL vs DS: All right, so in this video we are going to learn the difference between AI, which is artificial intelligence, ML, which is machine learning, DL, which is deep learning, and DS, which is data science. If you want to learn data science, you need to know these four things. All right, let's start. The first thing is artificial intelligence. What is artificial intelligence? If a machine mimics human behavior, we say that machine has artificial intelligence. Let me give you an example. Say I'm driving a car and I see a red traffic signal. My brain is going to tell me to stop the car right now because the traffic signal is red. Now say I'm driving a Tesla, and there is an application inside the Tesla. If the car sees, or detects, that the traffic signal is red and stops immediately by itself, then I can say that, yes, the Tesla has artificial intelligence. In simple terms, if any machine or computer mimics human behavior, we say that this machine has artificial intelligence, and that's it. If you check the Venn diagram, as a subset of artificial intelligence you have something called machine learning. I'm going to explain machine learning in detail in the next video, so bear with me right now. The first thing you should know about machine learning is that it is one of the areas of AI, because we also have something called deep learning. We'll get to that.
So what is machine learning? In machine learning, we use machine learning algorithms to predict outcomes from given data. To explain machine learning in detail, I'm going to have to explain traditional programming versus machine learning, and we're going to do that in the next video. So if you didn't understand what machine learning is, it's fine. We'll explain it in detail. Here in machine learning, we give the data to the algorithm, the algorithm finds a pattern, and from that pattern it gives me the outcomes. Now, if I have a lot of data, let's say it's big data, for some reason the machine learning algorithms are not going to give me the best results. In that case, we're going to use deep learning. I'll give you an example. Let's say I have data for a bunch of patients in a hospital, and the data is in a spreadsheet. It's in CSV (comma-separated values) format, like an Excel spreadsheet, and it's basic data. So machine learning algorithms are going to give me a good result. But what if I had images and I wanted to classify them? Or I have videos. Say I work at YouTube, and they give me a bunch of videos, let's say 20 terabytes or 11,000 videos, and they want me to classify those videos. Machine learning algorithms are not going to give me a good result, so I'm going to have to use deep learning. As you can see, deep learning is a subset of machine learning, and it still counts as artificial intelligence. But deep learning came into existence when machine learning algorithms didn't give the best results, because the data itself became complicated. If you go back 40 or 50 years, the only data we had was a bunch of numbers and text. Now we have images, videos, hashtags, tweets, links, and emojis. So we have these types of data.
So the data became complicated and became big. Here we can see that deep learning is a subfield of machine learning concerned with algorithms inspired by what? By the brain's neural network. It's called an artificial neural network. Here in machine learning, I said "algorithms". We have many algorithms: we have linear regression, logistic regression, k-nearest neighbors, decision trees, and we also have something called an artificial neural network. Deep learning is actually based on artificial neural networks. Then we have something called data science, and I already explained what data science is. Basically, in data science I have a bunch of data and I extract some information from that large volume of data, so I can use machine learning algorithms. And if I have very complicated data, I can also use deep learning. Since these two are subsets of artificial intelligence, you can see that data science covers all three areas: artificial intelligence, machine learning, and deep learning. This is what the Venn diagram means. I'll explain machine learning in detail in the next video, when I explain traditional programming versus machine learning. See you in the next video.

In order for us to understand machine learning, we need to know how traditional programming works. Remember when I said that in machine learning we give the data and we give the output, which is the outcomes, and then the machine learning algorithm finds something called patterns in the data and gives me the instructions, or the rules, of the program. Let me start all over again. In traditional programming, I give the data and I write the instructions, which are the if statements, the loops, and all that stuff in programming. The computer compiles it and gives me the output. In machine learning, I'm still the one who gives the data.
But instead of writing the instructions, I give the output. From that output and the data, the machine learning algorithm finds a pattern and gives me the instructions. You'll be like, how? Okay, let's go to the next slide. Say I'm driving a car and I have three categories: overspeed, normal, and slow. If I'm writing this program in traditional programming, I'm going to do it with if and else. If the vehicle speed is greater than 100, the driving label is going to be overspeed. Else, if the vehicle speed is greater than 40 but not greater than 100, which is between 40 and 100, the driving category is going to be normal. Else, which means less than 40, it's going to be slow. This is how it's done in traditional programming; I'm pretty sure you guys already know that. The data that I'm going to give is the vehicle speed. If I enter 50, the rule tells me that anything greater than 40 and up to 100 is normal. So right after I enter 50 as the vehicle speed, the computer compiles this code and gives me "normal". Now let's say I want to do the same thing in machine learning or deep learning. This is the outcome, this is what the program is giving me, and this is the data. In machine learning, I'm the one who's giving the output. Now you'll be like, how am I giving the output? Because I already have the data from last year, or from previous experimentation, or something like that. And I'm also the one who's giving the data here, the vehicle speed. I'm always giving the data, in both cases. Machine learning or deep learning is going to find the rules, or the instructions, which is this thing here, and it's going to find them by finding a pattern. Okay? Let's say I have a bunch of spreadsheets.
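The hand-written speed rule described above can be sketched in Python. This is only an illustration, not code from the course: the function name `classify_speed` is made up, and treating a speed of exactly 40 as slow is an assumption, since the lesson only says that speeds greater than 40 count as normal.

```python
def classify_speed(speed):
    """Traditional programming: the programmer writes the rule by hand."""
    if speed > 100:
        return "overspeed"
    elif speed > 40:
        # Covers speeds above 40 up to and including 100.
        return "normal"
    else:
        # Assumption: exactly 40 falls here, together with anything lower.
        return "slow"


for speed in (50, 150, 100, 30):
    print(speed, classify_speed(speed))
```

Here the data (the speed) and the instructions (the if/else rule) both come from the programmer, and the computer only produces the output.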
Say I have a spreadsheet here with a column called drive speed, and then I have the outcome. For drive speed, say vehicle one has 50, so the outcome is going to be normal. Say the next one has 150. According to this rule, if the vehicle speed is greater than 100, it's going to be overspeed, and so on. So this is vehicle 2, all the way to vehicle 1,000 or 10,000. I feed this entire data set into the algorithm. This is the data, and these are the labels, the outcomes. The machine learning algorithm is going to go and check: vehicle one was 50, normal; vehicle two was 150, overspeed. Now say there is a vehicle 999 with a speed of 175. According to the rule, or the pattern, it found, anything higher than 100 is overspeed. I'm pretty sure there is going to be other data here too, say a vehicle 844 with a speed of 100, which is going to be what? Normal, because there is an equals sign here. So take 175. Since 150 is overspeed, obviously 175 is also going to be overspeed, even though it wasn't there in the data. The machine learning algorithm found the pattern by itself. It was like: "You know what, in what this guy gave me, anything greater than 100 was labeled overspeed. Since 175 is higher than 100, I'm going to label it overspeed." Sometimes it makes mistakes. It all depends on the data that you give it. Okay, this is how it works. Let's take a smaller example. Here in the data, I have two numbers, x and y. X is five and y is five. The program I'm going to write in traditional programming is an addition program, which means I'm going to end up adding these two numbers. In traditional programming, since the instructions say x plus y equals z, the output z is going to be 10: 5 plus 5.
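Returning to the vehicle-speed spreadsheet, here is a minimal sketch of learning labels from examples instead of writing the rule. The lesson does not say which algorithm is used; a 1-nearest-neighbor lookup is assumed here because k-nearest neighbors is one of the algorithms named in lesson 3. The rows for speeds 50, 150, and 100 come from the lesson, while the rows for 30 and 120 and the name `predict_label` are hypothetical.

```python
# (speed, label) pairs: the data plus the outcomes supplied by a person.
TRAINING_ROWS = [
    (50, "normal"),      # vehicle 1 in the lesson
    (150, "overspeed"),  # vehicle 2 in the lesson
    (100, "normal"),     # vehicle 844 in the lesson
    (30, "slow"),        # hypothetical extra row
    (120, "overspeed"),  # hypothetical extra row
]


def predict_label(speed, rows=TRAINING_ROWS):
    """Return the label of the training row whose speed is closest."""
    _, nearest_label = min(rows, key=lambda row: abs(row[0] - speed))
    return nearest_label


print(predict_label(175))  # closest known speed is 150
print(predict_label(105))  # closest known speed is 100
```

A speed of 175 never appears in the data, yet it is labeled overspeed because the closest known example is 150. A speed of 105 is labeled normal, which disagrees with the hand-written rule: with so few examples the learned pattern is only approximate, which matches the point that the result depends on the data you give.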
Simple, easy-peasy. It's the same thing if I'm doing machine learning: I'm going to give it the data. As I said, I'm always giving the data. But instead of giving it the program, I'm going to give it the output. So the machine learning algorithm is going to be like: "Okay, I have two numbers here, and I also have an output that is a number. Let's try 5 minus 5; that equals 0. Is it the same as the output? No. Then subtraction is ruled out. Let's try 5 multiplied by 5. That's 25. Is it the same as 10? No. Then multiplication is wrong. 5 divided by 5 equals 1. Is it the same as the output? Nope. So division is ruled out. 5 plus 5 is 10. Is it the same as the output? Yes. Then the pattern that I found is addition." This is in simple terms, guys. Okay, let's take a real example. I'm pretty sure most of you have a smartphone, and some of you are probably watching this course on one. On a smartphone, we have something called the weather widget, which basically gives me the temperature for today. And somehow it's also giving me the temperature for tomorrow, and for Saturday, Sunday, and Monday, which means it's giving me a five-day forecast. I mean, how does it know that? I'm still in today. I don't know; maybe tomorrow it's going to rain, and after that clouds and rain and whatever. So how does the widget or the app work? Here, it took the data from last year. This is the date, this is the maximum temperature on that date, and then the minimum temperature, the humidity, the rainfall, and the outlook, which is the output: is it cloudy, and so on, as written here. The same thing happens here. The algorithm is going to find a pattern in the data. Let's say the max temperature was 43, the minimum was 27, the humidity was 54, and the rainfall was 0. Then there's a big chance that the day is going to be sunny. Sometimes it's wrong, but most of the time it's correct.
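The trial-and-error search from the addition example earlier in this lesson can be written out as a short Python sketch. It is a toy illustration of the idea only, not how real learning algorithms work internally; the names `CANDIDATES` and `find_operations` are made up for this sketch.

```python
import operator

# The candidate "programs" the search is allowed to try, in the lesson's order.
CANDIDATES = {
    "subtraction": operator.sub,
    "multiplication": operator.mul,
    "division": operator.truediv,
    "addition": operator.add,
}


def fits(fn, x, y, z):
    """True if applying fn to the inputs reproduces the known output."""
    try:
        return fn(x, y) == z
    except ZeroDivisionError:
        return False


def find_operations(examples):
    """Return every candidate that matches all (x, y, output) examples."""
    return [
        name
        for name, fn in CANDIDATES.items()
        if all(fits(fn, x, y, z) for x, y, z in examples)
    ]


print(find_operations([(5, 5, 10)]))             # ['addition']
print(find_operations([(2, 2, 4)]))              # ['multiplication', 'addition']
print(find_operations([(2, 2, 4), (5, 5, 10)]))  # ['addition']
```

The second call shows why the data matters: a single example such as 2, 2, and 4 fits both multiplication and addition, and only more examples settle which pattern is the right one.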
It's the same concept as the previous example, but with this data. This is how machine learning differs from traditional programming. I hope it was clear. Thank you so much. See you in the next video.

4. Supervised-Unsupervised Learning: All right, so now we are going to learn what types of questions data science can answer. We are going to start with something called supervised learning and unsupervised learning; I'm pretty sure you could tell that from the title of this video. Let me remind you that we are now talking about machine learning. So what types of questions can data science answer? Let's pick the first one: is this A or B? What do I mean by that? Take the question "Will this customer renew their subscription?" Say I work as a data scientist at Netflix, I have a bunch of data, and I want to know if this customer is going to renew their subscription or not. The answer is going to be yes or no, 0 or 1, A or B. These types of questions are called binary classification. I'm classifying, and we call it binary because there are only two options. So yes or no, 0 or 1, A or B are called binary. Another example: is this an image of a cat or a dog? Let's say I have two folders here, one for cats and one for dogs, and I have, say, 100 images in each. I'm going to feed both of them into the machine learning algorithm. The algorithm is going to take these images and learn from them. Then I'm going to give it a new image, and I want the machine learning algorithm to tell me if it is a cat or a dog: 0 or 1, A or B. So this is another example of binary classification. Same thing here.
If I have an online store like Amazon or eBay, which one will bring me more customers: giving them a coupon that deducts $5, or a coupon that deducts 25 percent of their checkout total? I want to know which one is going to bring me more customers, this one or that one: 0 or 1, A or B. Get it? Does this person have diabetes or not? Yes or no: yes, he has diabetes; no, he doesn't. These types of questions are called binary classification. Binary classification is a part of classification, which belongs to supervised learning. I wanted to explain binary classification first, and now I'm going to come to this image. So we have something called machine learning. In machine learning, we don't have just these two types, this one and this one; we actually have three or four. But I'm going to cover only two, and I'll tell you why. The third one is called semi-supervised learning. I'm not going to explain it, because you're not expected to know it yet; this is an introductory course. Basically, it means that part of the data is supervised and part of the data is unsupervised. What is supervised versus unsupervised? We're going to learn that in more detail in the upcoming slides, so stay with me. The fourth one is called reinforcement learning. Reinforcement learning is discussed more in robotics and in artificial intelligence itself. But in real-world data science, most of the work is going to be supervised and unsupervised, or sometimes maybe semi-supervised. So we're not going to talk about those two; we'll focus only on these, because they are much more common. So we have supervised learning and unsupervised learning. The first question that is going to come to your mind is: okay, what are supervised learning and unsupervised learning?
And what are classification and regression, and what is clustering? I will explain supervised learning in detail in the next slide, so just bear with me. I just explained what classification is. Classification is when the output is going to be categorical, which is yes or no. In the computer it's going to be 0 or 1. Those are numbers, not really categories, but the zeros and ones represent yes or no: 0 might represent yes and 1 might represent no, or the opposite in some cases. But you get the idea. When the question is a yes-or-no or A-or-B type of question, we call it classification. If there are only two categories, 0 and 1, we call it binary classification. If there are more than that, we call it multiclass classification. So now we know what classification is. If the output is categorical with only two categories, in the form of 0 and 1, it's binary classification. What if there are more than two? For example, I want to know if this image is a cat, a dog, or some third animal. Now we have three, so it's not binary classification; it's called multiclass classification. We'll get to that later on. For now, you have to know that classification is a subset of supervised learning, that when the output is categorical we call it classification, and that when the output has two categories it's binary classification. These are the things that I want you to take from this slide. That's it. What about regression? When the output is a number instead of a category, we call it regression. How? Say I want to predict prices for a bunch of houses. I work at a real estate company, I got some data for a bunch of houses around New York City, for example, and I built a model on this data. A new house or building is going to be built in some district in New York.
And I want to know how much it's going to cost. So the output is going to be the price, right? Price is a number, so it's regression. How many new followers will I get next week? Say I have a YouTube channel, I keep producing viral videos, and I want to predict how many new followers I will get. Every day I'm getting, let's say, 100 to 180 followers. I want to know how many followers I'm going to get next week if I keep doing this. As I said, I'm pretty sure that most of you are using smartphones, and some of you are watching this course on one. You're going to have a widget or an app called Weather, the one that we discussed earlier. Here, this one. So the output is going to be what? Temperature, right? Then you also have a categorical output here, the outlook: cloudy, sunny, or rainy. Let's say we have those three. Since that output is text, it's going to be what? Categorical. But if it's temperature, it's going to be what? Regression. So "What will the temperature be next Tuesday?" is a regression problem. What about unsupervised learning? What is unsupervised learning, and what is clustering? We're going to learn that in the next video. Before I go: we have supervised learning and unsupervised learning. If the output is categorical, it's classification. If the output is a number, it's regression. If the output is yes or no, it's binary classification. That's all I want you to take from this slide. Thank you. I'll see you in the next video.

5. Supervised Learning: Okay, so in this video we're going to talk about supervised learning in a little more detail. These things were already covered in the previous videos. Right now, if you look here, this thing is called an artificial neural network, or ANN.
Here we can see that we have a picture of a dog. We call this part the input layer. Then we have this thing called a hidden layer, and maybe another hidden layer, and another. At the end you have something called the output layer. Here we can see that this neural network is classifying the image as 95 percent dog and 5 percent cat. This is how a neural network works, and this is a binary classification problem. So let's dive a little deeper into supervised learning. What do I mean by supervised learning? I explained classification and regression in the previous video, but what do I mean by supervised learning? I want you to focus here. Here we can see that "cat" is a label. This image is labeled, which means I know the type of the data: I know that these are images of cats. It is called supervised because I know the data. Let's say you work at a university, and you can get, let's say, three types of data: student data, professor data, and college or university admin data. You're working as a data scientist for the university, and the manager gives you a database, let's say a CSV file. You ask, "What type of data is this?" He says it's student data. So now you know what type of data it is. You have the label, and the label is "student data". It's just like here. Let's say I have a folder called "cats" on my PC. Since this folder has a label, it's supervised learning: I know what type of data it holds. When I open this folder, I'm going to find, let's say, 10,000 images of various cats. You can see that here. This one has pointy ears and a mustache, and this one has small ears.
This one has pointy ears and a small mustache. The mustache and the ears are what we call features, but we're going to talk about that later on. For now, you have to know that supervised learning, or supervised, means that I have a label in my data; I know the data. So this is what supervised means. Here we have a label and a bunch of images of cats, and I give them to the machine learning algorithm. It studies a bunch of cats and dogs. The dog image is not showing here, but the idea is that we have two folders, one for dogs and one for cats, and the dog folder has a label and the cat folder has a label. So this is supervised learning. I have 10,000 images of cats and 10,000 images of dogs. I take these two sets of images and give them to the machine learning algorithm, and at the end it is going to tell me which images are cats and which are not cats, which means dogs, because the only images I have are either cats or dogs. So the idea is this. You have 10,000 images of each category, and the categories are cats and dogs. You give them to the machine learning algorithm, and it studies these images by finding patterns. Those patterns are pointy ears, a mustache, a big nose or a small nose, eyes with or without eyebrows. These are all called features, and it's going to study them. Same thing for the dogs: I have dogs with pointy ears, but I also have dogs with ears that hang down. These are called features, and at the end it's going to learn them. Then I'm going to go and find a random image of a dog that is not included in either folder. Say I get a new image from Google; I just downloaded a new image.
I'm going to give this new image to the machine learning algorithm, and the algorithm, according to the training it got from these two folders, is going to classify for me whether the new image is a dog or a cat. Yes or no, 0 or 1: binary classification, just like I told you. So here we can see that it can classify cats and not-cats, which means dogs. This is how it works, in simple terms. Okay, this is still an introductory course. There is no Python, no coding, nothing. I'm just trying to explain the fundamentals, and step by step in the upcoming courses I'm going to go deeper and deeper. But for now we're just laying the foundation. Okay? And just like I said, we have something called classification and something called regression. Classification is sorting items into categories, just like here with the cats and dogs in this example. Regression is predicting real values, which means numbers like dollar prices or weights or stuff like this. Okay? So this is how supervised learning works. I will see you in the next video.

6. Examples Of Supervised Learning: So in this video, we're going to take more examples of supervised learning. We're going to take the classification section. First is predicting diseases, and we're going to take an example of predicting diabetes and one of predicting cancer. Then we're going to take something called sentiment analysis, and the example is analyzing sentiment in Twitter tweets. And the last example is going to be a face mask detector, which detects whether a person is wearing a mask or not. The first example that we have here is predicting diabetes. Here we have the data of patients with their medical variables. These medical variables are called features: BMI, diabetes pedigree function, age, and then the outcome. Okay. 1 means yes, he got diabetes. 0 means no, he didn't get diabetes.
Now, let me explain it to you. Here we have the data. We're going to feed it to the classifier, to the algorithm. Then the algorithm is going to classify: let's say 0, which means no, these people don't have diabetes, and 1, which means yes, these people have diabetes. 0 or 1, yes or no: binary classification. And this is what the data looks like. This data is a bunch of numbers and text. It's simple data, so we're going to use machine learning algorithms. Let's say we're using logistic regression. Okay? This is the name of the problem, and you can check this example in detail from this link. You can find these slides on my website, the dash minus.org. Okay, so this was example number one, and this is how the data is going to look in real life. Next, we have predicting cancer. Here we still have data in the field of medicine, but previously it was a bunch of numbers and text. This time it's images, and these are brain images. Since the data became complicated, no longer a bunch of numbers and text but images, we're going to jump to deep learning. If you remember, we said one of the reasons deep learning came into existence is that the data became complicated, which means we now have images. So here we're predicting brain cancer. The name of the example is classification of brain tumors from MRI images, and these are the MRI images. It uses something called a convolutional neural network; in short, we call it a CNN. A CNN is a type of neural network, or artificial neural network. We're going to take that in the deep learning course. This is the link if you want to check it in more detail, okay? If you see these small circled areas, these are the tumors. The output is going to be 1 for yes, he got a brain tumor, and 0 for no, he didn't, or the opposite, it depends. So this is a binary classification.
It's still in the field of medicine, still supervised learning, and it's still classification. But here we can see the data is no longer a bunch of numbers and text. Here we have images. For simple data, we can use machine learning; for complicated data, we use deep learning. There are some scenarios where you can also use deep learning on simple data, but we'll talk about that later on. Another example we have here is face mask detection. Here we can see that it's not an image, it's actually a video. I am giving these examples in this order intentionally, because I'm taking it step by step: simple data, then complicated data, then very complicated data. Here we have a video, and it's a live video. In that camera we have an application, and in that application we have a machine learning model or deep learning model. Here we can train it on a lot of images of people not wearing masks and a lot of images of people wearing masks. Then the model is going to detect: no mask, 95.99%, this guy is not wearing a mask, which is, let's say, 0. And this is 97.77% that this guy is wearing a mask, which is 1. Binary classification, but this time it's deep learning and the data is video. Right? Next, we have sentiment analysis. We are analyzing Twitter tweets. Let's say I'm working in a company and I just launched a new campaign, and people are talking about this campaign. Here we can see a tweet: we are so happy you found us, feel free to send a message if you need any help, and let us know when you're building, and there is a smiley emoji. Since there is the word happy and there are no curse words or bad words or negative words, this can be classified as a positive tweet. If there are words like dangerous and ugly and other negative words, we can classify it as a negative tweet. And if there are, let's say, both negative and positive words, the model finds it difficult to decide.
You can see it's neutral: it's neither positive nor negative. This type of thing we call multiclass classification, because we have 0, 1, 2, for example. So it's not binary. If it was only positive and negative and there was no neutral, it would be a binary classification. And here we can see that we have text, only text. Okay? This field is called NLP, which is natural language processing. So it's not an M; let me write it again: N, L, natural language processing. Natural language processing is a field of machine learning where our data is going to be only text. Okay? So here we also have a classification problem. After that we're going to talk about unsupervised learning, in the next video. Thank you so much. See you next video. Bye bye.

7. Unsupervised Learning: In the previous video, we learned some examples of supervised learning. In this video, we're going to learn what unsupervised learning is, which is the other section of machine learning. In supervised learning, we have labeled data. It's like learning when you have a teacher, okay? If you are a kid and you're learning and you make some mistakes, you can show them to the teacher and the teacher is going to correct them. The teacher is going to tell you that this is an apple, so it's labeled; it's supervised by the teacher. In unsupervised learning, it's like you are learning without a teacher. You have no idea. You're going to do it by yourself according to whatever patterns you find. So let's say we have the same example. Previously, if you check the supervised learning video, we had a folder which was labeled as cats. Okay? But here we have a bunch of animal images. Let's say we have a folder called animals. You'll be saying, okay, this is a label. It's labeled, but I might have different types of animals in it, okay.
And inside of it I have 10 thousand images of cats and 10 thousand images of dogs. Now I, as a human, know these are images of cats and these are images of dogs, and the label is animals. But the machine learning doesn't know that in this folder there are only cats and dogs. Only I know, because I'm the one who collected these images. So what's going to happen is that all the images are going to be fed into the machine learning algorithm, okay? And the machine is going to cluster them, or group them, according to the patterns it finds. Let me just go step by step. Let's say I'm the machine learning algorithm and I'm going to find the patterns. Let's say I'm going to group some images according to their pointy ears: this dog has pointy ears and this cat has pointy ears. I'm going to put them in a group, or a cluster actually; that's going to be cluster 1, okay? It's going to be a folder called cluster 1. If I click on that cluster 1 folder, I'll see all the images of animals, dogs and cats, with pointy ears. Another cluster is going to be pointy ears and a mustache. Okay? Another cluster is going to be no pointy ears. Another cluster is going to be small noses only: this small nose, and this small nose, and this small nose. This one will be taken, and this one will be taken. Another cluster is going to be big noses, okay? Another cluster is going to be a square, sorry, a circular face like this one. This is a circular face, this one is a circular face, but this one is not a circular face. This one is an oval face, like this one, so these two will be a cluster, and so on. It's going to find many patterns, whatever it finds. Okay, so let's say here we have something called cluster number 1, a group of similar images, and it grouped them correctly: all the cats are in one group and all the dogs are in the other group.
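The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the course: it assumes each image has already been reduced to two made-up numeric features (ear pointiness and nose size, both between 0 and 1), and it uses a tiny hand-written k-means, one common clustering technique, with a naive starting point. The function name `kmeans` and the sample values are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Tiny k-means sketch: returns a cluster index for every point."""
    # Naive start (an assumption for simplicity): use the first k points as centers.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each center to the average of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Hypothetical features per image: (ear pointiness, nose size). No labels are given.
animals = [(0.9, 0.2), (0.2, 0.8), (0.8, 0.1), (0.1, 0.9), (0.85, 0.25), (0.15, 0.7)]
labels = kmeans(animals, k=2)
print(labels)
```

The algorithm is never told which images are cats or dogs; it only sees the numbers and groups points that lie close together. Whether a cluster corresponds to "cats" is something a person checks afterwards.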
And this is how clustering works, okay? It identifies similarities and groups by them. Here the similarities were the pointy ears and the small nose and the mustache and whatever. For example: are there patterns in the data that indicate certain patients will respond better to this treatment than others? Say you have a vaccine or a new medicine, and let's say if you give this vaccine to diabetic people, it will have some side effects. Let's say you had 5,000 diabetic patients and 5,000 non-diabetic patients, and you give them the vaccine, okay? All the 5,000 diabetic people had side effects, and the others didn't have side effects. So it's going to group them. It looks like classification, but the data aren't labeled, okay? So it's going to group them, or cluster them. Forget about this example if it makes things complicated, and take only the example with the images. We're going to take some other examples in the next video. So we have something called clustering, and we have something called anomaly detection. Okay? And another one called dimensionality reduction, which we'll talk about separately. So we've talked about clustering, and now we're going to talk about anomaly detection. Anomaly detection basically asks: is a hacker intruding into our network? Let me just clear this thing up. Okay. So let's say I am working as a network admin in a company. I have a bunch of departments: an HR department, a marketing department, an accounting department, and so on. And I am giving each one 10 megabits of Internet speed. Okay. I'm the network admin, and I told all the departments: do not download movies, do not download big files from the Internet. If you want to download something, tell me, okay.
But suddenly I found that HR is using all of the 10 megabits I allocated, and they were requesting more, which made this department and this department have a slow connection. So I called the HR people and I asked them, is there anyone who's downloading a movie? They were like, no, there's only one guy here and he's just working on office software. Then I'm going to know that there is something wrong. There might be an application requesting a lot of bandwidth, and maybe there is a hacker in my network. So anomaly detection is going to be like this. This is the Internet speed, okay? It's going like this while the guy is working. Let's say he downloaded some files: it went up like this and then it went back down after he finished downloading. And then suddenly it went like this. This is when the hacker started hacking. Okay, so this is anomaly detection. Anomaly detection means finding abnormalities in the data: something weird just happened. In the next video, we're going to see some real examples of unsupervised learning. All you have to know right now is that if there is no label in my data and I have no idea what this data is all about, I'm going to use unsupervised machine learning algorithms to group this data, to kind of classify, or group, or cluster it for me. Later on I'll go and check each cluster and see according to which pattern the algorithm clustered the data. So see you in the next video, where we're going to see real-life examples of unsupervised learning. Bye-bye.

8. Examples Of Unsupervised Learning: In the last video, I explained what unsupervised learning is. In this video, we're going to take some real examples of unsupervised learning. So let's start. The first example we're going to have is a very simple one. Let's say here we have an input, and as you can see, it's not labeled; it's not even written that these are fruits.
Okay. There is a bunch of fruit here, and I feed it to the model. This model groups them, or clusters them, according to their color. Here we can see it clustered these according to the red color, these according to the green color, and these according to the orange color. If I add a lemon, okay, it's going to put it here. And if I add a tomato, it's going to cluster it here, based on the color. So here we can see that these were clustered according to the red color, and color was the pattern. This is very simple unsupervised learning, and this is how clustering works. Now, let's go to a little bit more detailed one. Let's say I'm working at Amazon.com and these are all my customers. These are different types of customers: I have teens, girls, boys, men, women, single women, single men, all types of people. Okay. So let's say there are a bunch of girls who always buy makeup. Every time a new makeup product is released, or there is a promotion, or a new palette is released, or whatever, these people are immediately going to buy it. Maybe there are some boys who are buying that makeup too. Amazon checks the behavior, the buying behavior, and it's going to cluster them into, let's say, a makeup cluster. Then there are people who always buy the new gadgets. If a new iPhone is released, they immediately buy it or pre-order it, and a new cover, a new power bank, or whatever. So these are clustered as the young and tech-savvy. Maybe these are the people who do reviews on their channels. Let's say there are people who buy traveling tools like tents and whatever. And there are people who buy stationery, and they are clustered as students. So these clusters were made according to buying behavior. Okay. Why do we cluster them? Let's say a new palette is released or there is a promotion for some makeup. They're not going to send that promotion email to all the people. They're only going to send it to the specific segment.
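To make the idea of sending a promotion only to one segment concrete, here is a deliberately simplified Python sketch. It is not a real clustering algorithm and not how Amazon does it: it assumes a made-up purchase history and simply puts each customer into the segment of the category they buy most often. The customer names, categories, and the function `segment_by_top_category` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical purchase history: customer -> categories of past purchases.
purchases = {
    "customer_a": ["makeup", "makeup", "books"],
    "customer_b": ["gadgets", "gadgets", "makeup"],
    "customer_c": ["makeup", "stationery", "makeup"],
    "customer_d": ["stationery", "stationery", "gadgets"],
}

def segment_by_top_category(purchases):
    """Group customers by the category they buy most often (a stand-in for clustering)."""
    segments = defaultdict(list)
    for customer, categories in purchases.items():
        top_category, _ = Counter(categories).most_common(1)[0]
        segments[top_category].append(customer)
    return dict(segments)

segments = segment_by_top_category(purchases)

# A makeup promotion goes only to the makeup segment, not to everyone.
promo_recipients = segments.get("makeup", [])
print(promo_recipients)
```

A real system would group customers from many behavioral signals at once, but the payoff is the same: the promotion email goes only to the people most likely to care.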
Okay, so this is a real-life example of customer segmentation, which is what I just explained. It's called clustering. Here we are clustering them according to their buying behavior, and previously we were clustering them according to their color. Okay, another example: here we have movie recommendation in Netflix. It's the same thing as the recommendation engine in Amazon. Let's say you buy a data science book, or a bunch of data science books. Every time a book is released in the field of data science or machine learning, they're going to tell you that most of the people who bought this book also bought that book. Same thing here. Let's say we have Anna here. The girl's name is Anna, and the boy's name is David. Anna has been on a Netflix subscription for a whole year, so every time she watches a movie or a TV show, her watching behavior is recorded in the Netflix system. And let's say Anna always watches horror movies and TV shows. Okay. Let's say David has been subscribed for three months only. And David is like, you know what, whatever new TV shows or movies are coming, I'm going to watch them. I don't care if it's action or horror or adventure or whatever. He started watching, let's say, action and horror and adventure, and let's say he was watching a little bit more horror than the others. Because he watches a lot of horror, he was added to the horror people group. And let's say Anna is like the big boss or chief of that group, because she always watches horror. So she's also in the same group. So if a new TV show is released and it's horror, and Anna immediately watches it, David will be shown that if you watched this show, there's a big chance you're going to love the new show, which is also horror. This is how the recommendation works: according to watching behavior. Same thing in Amazon.
If you buy a new iPhone, they will immediately tell you that most people who bought the iPhone also bought a cover for the iPhone, or a screen protector, or a power bank. It's the same thing: here it was watching behavior, there it's buying behavior. This is the movie recommendation system in Netflix, and this is the recommendation engine in Amazon. These are examples of customer segmentation and recommendation systems. Now let's go to the anomaly detection examples. Remember in 2008 when swine, sorry, the bird flu was kind of trending? This is what Google Trends was showing for the flu keyword. Okay, here is the flu activity until November, and then in December it immediately spiked. This is the same thing I explained in the network admin example, when the hacker was hacking. This is how anomaly detection works: something unusual happened. Okay, here we can see Google Flu Trends, from Wikipedia. It was a web service from Google that provided estimates of influenza activity for more than 25 countries. By aggregating Google search queries, it attempted to make accurate predictions about flu activity. This project was launched in 2008 to help predict outbreaks of flu, just like right now we are going through COVID-19. We're going to see another example in the next slide. So this is how they predicted the flu in 2008 and 2009. Let's go and see COVID-19 here. Again, the sources are here; you can check them later on. This is the anomaly detection of COVID-19 on 25th January 2020. This is the legend here. Gray means no cases. Light pink means from 1 to 5 cases. Dark pink means 6 to 20 cases. Red means 21 to 100 cases. And dark red means more than 100 cases. Here you can see that Wuhan, which is where COVID-19 started, is very, very dark red, and the areas around it are in red. And it spreads step by step.
Let's say here we have India. Here we can see after how many days or weeks COVID-19 is going to reach India. According to that, they can predict and prepare India, or whichever other countries are around China, for this outbreak. Here we can see the anomaly detection on April 14, 2020, and how it went all over the world. You can see the legend here. Okay? Same thing if there was a fire in a specific area, or a tornado, or an earthquake, or a storm. Same idea, same method. This is how anomaly detection works: something weird happens suddenly. Hope you liked the video and the examples. I will see you in the next video. Bye-bye.

9. DS Apps and ML Terminologies: In this slide, we are going to see some examples from multiple fields. In health care, we saw the diabetes classification problem: whether this guy has diabetes or not. Here we also have something called medical image analysis, which was the MRI brain cancer example that I just explained. In transportation, we have something called self-driving cars, like in Tesla: the car drives itself. In finance, we have something called customer segmentation, just like the one I explained with Amazon. In e-commerce, we have identifying consumers, which is almost the same thing as customer segmentation, and recommendation systems, the recommendation engines like in Amazon that I explained. Then analyzing reviews. Reviews are written in text, which means I'm going to use NLP; I'm going to use something called sentiment analysis. If the reviewer uses nice words, it's going to be a positive review. If the words are bad, it's going to be a negative review. In manufacturing, we can predict whether there will be a problem in a machine or not. Let's say I have a bunch of machines, and after a while, let's say every two years, a machine gets some issues.
So before that machine actually gets an issue, there is a system that's going to say, yeah, it's time for this machine to have some maintenance. And in banking, we have something called fraud detection. Let's say I always buy from secure websites like Amazon and eBay. But let's say for some reason I bought from an unsecure website. The bank is immediately going to know that someone might be stealing my credit card, so they will stop that transaction. This is fraud detection. If you want to know more about applications of data science in real life, I would highly recommend that you read this book by Dr. Eric Siegel. It's a great book. And managers who want to know about data science, or how data scientists can add value to any organization, should read this book. Next, we have something called machine learning terminologies: terms we always use in the field of machine learning and data science, and you should know about them. The first one is algorithm. What is an algorithm? In traditional programming, an algorithm is a set of instructions. Okay? But in the field of machine learning, algorithms are a set of rules and statistical techniques that are used to learn patterns from the data and draw significant information from it. Remember when I said we give the data to the machine learning algorithm, and it finds those patterns by itself. Now, think of the algorithm as kind of like a tissue box. If you open a tissue box, you'll see a bunch of tissues inside of it. Okay? The algorithm is like that box: you open it and you will see some statistical techniques, okay? A bunch of weird formulas and stuff like this. And this algorithm is the one that finds the patterns in the data. Okay? Then we have something called a model. Now, a model is one of the main components of machine learning. We have the data.
Then we feed it to the machine learning algorithm. The machine learning algorithm is this box, and inside of it is a bunch of statistical techniques and stuff. When it finds the patterns in the data, it creates a model for me. Okay? And the model is trained on the data by this algorithm. So I have data and I feed it to the machine learning algorithm. That algorithm is full of statistical techniques that find the patterns. The algorithm is the one that finds the patterns. After it finds the patterns, it will create a model for me. Then I will check whether this model is accurate or not, whether it gives me the right output or not. I'm going to explain this in detail in the next slide, so if you didn't get it, bear with me, okay? A model is the main component of machine learning. A model is trained by the machine learning algorithm. So these are the steps: data, then machine learning algorithm. After the machine learning algorithm finds the patterns in the data, it will create a model for me. Okay, predictor variable: it's a feature of the data that can be used to predict the output. Remember, in the diabetes data we have a bunch of columns and rows. These columns are features. Let's go to the last column of the diabetes data: 1 means yes, he got diabetes, and 0 means no. This last column is the response variable: the feature, or the output variable, that needs to be predicted. We want to predict this column, so this is the response variable. The response variable depends on these other variables, these features, so those are called predictor variables, and this one is called the response variable or the output variable. The predictors are also called attributes or features; most of the time in machine learning they are called features. Okay? Training data helps the model to identify the key trends or patterns that are essential to predict the output. Testing data: after the model is trained, it must be tested to evaluate how accurately it can predict an output. This is done using the testing data.
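As a small illustration of separating training data from testing data, here is a Python sketch using only the standard library. The 80/20 proportion, the shuffling, the fixed seed, and the made-up file names are assumptions for the example; the helper `train_test_split` is written here by hand and is not a library call.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then hold back a fraction of them for testing."""
    rows = list(rows)                   # copy, so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split repeatable
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training rows, testing rows)

# Hypothetical data: 10,000 image file names.
images = [f"cat_{i}.jpg" for i in range(10_000)]
train, test = train_test_split(images)
print(len(train), len(test))
```

The important property is that the two parts do not overlap: the algorithm learns only from the training rows, so the testing rows are genuinely new to the model when it is evaluated.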
Let's say I have 10 thousand images of cats. I'm not going to use the entire 10 thousand. I'm going to use only 8 thousand for training. I'm going to feed them to the algorithm, and the algorithm is going to make a model for me. Then the other 2 thousand that are left, which haven't been seen by the algorithm, I'm going to give to the model and ask it to tell me whether it predicts each image correctly or not. Again, I'm going to explain this in detail in the next video. Okay. See you in the next video.

10. ML Terminologies in Details: Okay, let's take an example here. Let's say you went to your friend's house and she offered you a bunch of cupcakes. You took this cupcake and you really liked it. And you're like, you know what? Can you give me the recipe? And she was like, no, I won't give you the recipe. So what you did is you took a bite of this cupcake and you started to analyze the recipe in your head. And this is what the recipe in your head looks like, okay? Now, because you kept eating and taking bites from this cupcake, more and more, the recipe was changing a little bit. Let's say instead of 10 grams of flour it became 15, or it became 20. In order for you to know that you're actually going to make exactly the same cupcake that your friend made, you took another cupcake from her house and brought it to your home. Okay. This is the original cupcake, the one that you'd like to make at home. And in your brain, you started writing the recipe, the ingredients. You know that it's going to have milk, but you don't know how many milliliters or liters it's going to be, and so on. When you took a bite, the algorithm in your brain started to find the patterns, and the patterns were this recipe. Okay? So your brain, the algorithm, was finding the rules, or the recipe. Here, the rules are the recipe, okay.
After you made the recipe, the first cupcake you made from this recipe was this one. So we'll call it model 1. You took a bite from model 1, and you're like, you know what? It tastes almost 91 percent like the original one. And you took another bite from the original: the first one you already ate at her home, so you took this one from her home in order to match it against the one you made at home. This one was made at home, after your brain made the recipe in your head. So the recipe, the rules, came from the algorithm, and the model was the first cupcake you made. Let's say you made a lot of cupcakes, and at the end you were able to make something, let's say, 95 percent similar to the original. Maybe the secret was in the toppings or whatever. So this is the difference between algorithm and model. Okay. The training data was the original cupcake that you kept taking bites from. The more bites you took, the more your brain started to find the correct patterns. Okay, maybe there was something, let's say a cream, here inside, and if you only took a bite here, you wouldn't get to this area. Once you took more and more bites, you reached the center area, and you learned you had to add more butter. So the more bites you took, the more accurate the recipe your brain gave you. That was the training data. The test data, obviously, was the original cupcake you compared your own output against. And then you're like, you know what? I still need to do more training, so you take more bites from this one. These two, this one and this one, are the original cupcakes. Okay, the toppings may look different in the image, but you can see that it's the same cupcake. Okay. I hope this example cleared things up, or maybe it made things more complicated, I don't know. But this is how it works in real life. I'm pretty sure the diabetes example is clearer.
Let me just go to the diabetes example here. Okay? We have rows 0, 1, 2, 3, 4, all the way to, let's say, 10 thousand. And I'm here to predict the outcome, 1 or 0. Since it's a binary classification, we're going to use the famous logistic regression. It's called logistic because there is also something called linear regression. So the classifier is logistic regression. Okay? This is the algorithm. We already chose the algorithm because we know that logistic regression is used for binary classification. Here we have 10 thousand rows. I'm not going to take the entire 10 thousand. I'm going to take, let's say, 8 thousand for training and the remaining 2 thousand for testing. And these columns are called attributes or features. Okay? And which one do I want to predict? I want this one. I want to know whether this person has diabetes: 1 means yes, 0 means no. So according to the terminologies that we have, the response variable, the feature or output value that needs to be predicted, is the last column, which is 0 or 1. The predictor variables are the other features, because of which I will be able to predict this one. The training data is the first 8 thousand rows, and the testing data is the last 2 thousand. Maybe I can change that, but for simplicity I'm going to make the training data the first 8 thousand and the testing data the last 2 thousand. And the algorithm that I chose is logistic regression. Why did I choose logistic regression? Because logistic regression is known for binary classification. Okay? And the model: after I trained on the data, I had a model. Then I apply the test data to that model, and then I know the accuracy of that model. I hope it's clear now. See you next video. Bye bye.

In the last video I explained machine learning terminologies. I think I need to explain them a little bit more. So remember the example of the cupcake; this time, let's go to the diabetes data, okay?
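The accuracy mentioned above is simply the share of test rows the model gets right. A minimal Python sketch follows; the ten outcomes and predictions are invented for illustration and do not come from the course's diabetes data, and the function `accuracy` is written here by hand.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical outcomes for 10 test patients (1 = diabetes, 0 = no diabetes).
actual    = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical model predictions for the same 10 patients.
predicted = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]

score = accuracy(predicted, actual)
print(score)
```

Here the model is wrong on two of the ten patients, so the accuracy is 8 out of 10. The same calculation applied to each model's predictions is what makes it possible to say that one model performed better than another.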
Let's say we have the dataset here, and here, and here. This is the dataset of people who got diabetes and people who didn't get diabetes. Since it's supervised learning and it's a classification problem, I'm going to use three algorithms, okay? Here I'm going to use logistic regression, here I'm going to use a decision tree, and here I'm going to use a support vector machine. These are all algorithms, and these algorithms are used for classification, okay? After I train this algorithm on the data, I have a model, and I will call this one the logistic regression model, because this model was created after the algorithm learned the data, or found the patterns. I feed the data to this algorithm, the algorithm finds the patterns and learns them, and after it learns them, it creates a model. I'll call this one the logistic regression model. Why? Because this model was created by the logistic regression algorithm. I think that's pretty clear. Same thing here. I have the data and I trained the decision tree on it. After the decision tree learned from the data, it created another model, and this one is called a decision tree model. Why a decision tree model? Because this model was created when the decision tree learned, in other words found the patterns in, that data. And same thing here: this data was fed to the support vector machine algorithm, and then I have another model called the support vector machine model. So now I have model number 1, model number 2, and model number 3. After I fed the test data to these models, this one had an accuracy of 87 percent, this one 91 percent, and this one 96 percent. So I'll say that the support vector machine performed better than the other models, or other algorithms. This is how it works. I kind of felt that the last video wasn't clear enough. That's why I made a Part 2 of machine learning terminologies, because of this thing that I just explained.
This thing that I just explained is kind of like the foundation for data science and machine learning, and I hope now you know the difference between the algorithm and the model. Okay? An algorithm is a bunch of statistical techniques that are used to train, okay, to train on the data. And a model is created, which represents the data. So I have data here. I feed it to the algorithm. The algorithm learns the data by finding the patterns, and then it creates a model for me. And I'm going to call it the logistic regression model, because that model was created by the logistic regression algorithm. And then I'm going to feed the test data to the model, because the model has now learned. The model was created after logistic regression learned from the data and found the patterns, and then I feed it the test data. Logistic regression didn't see the test data, because I was training only on the training data. Okay. And then I will have the accuracy: how accurate the model is on this test data. I hope now it's clear. See you in the next video.

11. Data Science Workflow: In this video, I'm going to explain the data science workflow: what a typical data science workflow looks like. First of all, we're going to have a problem, a business problem. And then I'm going to formulate that business problem into a data problem. And then, after I have something called a data problem, I'm going to get permission from the database admin or from the organization to get the data. And then I'm going to collect the data and store it in a database or dataset. And then I'm going to prepare the data by checking the features, handling missing values, and doing some statistical analysis like mean, median, mode, descriptive statistics, and these types of things. And then we're going to explore the data by drawing some charts and histograms and stuff like this.
And then at the end I'm going to do the experiment, which is building my dataset, my training set and test set, building the model, and training and testing. And then at the end I'm going to do my prediction. Okay, this is what a typical data science workflow looks like. Let's say we have a problem that customers are not buying my product. So this is a business problem. The problem is that, since they're not buying my products, that means I'm not marketing my products correctly. Which means I have to do something called customer segmentation. I need to segment. Segmentation. Sorry, I'm writing with my mouse. So I'm going to segment my customers. Let's say these are the teens. These are the grown-ups, or adults. These are the boy teens. These are the girl teens. These are the women. And so on. And then I'm going to market to each segment with its own product. Okay? Like if there's a PS5 or Xbox, I'm obviously going to target the teens, boys and girls, for example. Okay? So this was the business problem. This is the data problem. I'm going to get permission from the company to get the data. And then they're either going to give me access to the database, or they're going to give me a dataset. Let's say this is the database. Okay. Let's say this is the database of a university. And I have professors' data, I have data on employees who are working as admins, and I have students' data. Let's say I want only the professors who work in the science faculty. So this is a dataset. A dataset is like a subset of a database. Okay, if I want only the students, then that will be the student dataset, which is a subset of the big, large database, and so on. So they're either going to give me access to their database, and I can extract whatever data I want, or, most of the time, they're going to give me a dataset with the features that I want. Okay?
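To make "a dataset is a subset of a database" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything about the database is an assumption for illustration: the table name `people`, its columns, and the sample rows are made up, and a real university database would look different.

```python
import sqlite3

# In-memory stand-in for the university database (hypothetical table and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, role TEXT, faculty TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Amal", "professor", "science"),
        ("Basil", "professor", "arts"),
        ("Carla", "admin", None),
        ("Dina", "student", "science"),
        ("Emad", "professor", "science"),
    ],
)

# The dataset: only the rows and columns needed for the data problem,
# here the professors who work in the science faculty.
science_professors = conn.execute(
    "SELECT name, faculty FROM people WHERE role = ? AND faculty = ? ORDER BY name",
    ("professor", "science"),
).fetchall()
conn.close()

print(science_professors)
```

Having access to the database means you can run queries like this yourself. Being handed a dataset means someone else ran the query and gave you only the result.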
Features means that, for the student data, I need the GPA, I need the email, I need the mobile number, the name, the age, and so on. These are the features, the columns. And after that, after I collect the data, I'm going to store it. Obviously I have to store it so I can get it back. And I'm going to prepare the data, which is train and test, and get it prepared in a form that I can work with. And then I'm going to do some descriptive statistics to explore the data and find out that everything is fine: there are no missing values, I have the right features, they are written correctly, and so on. And then at the end I'm going to do my experimentation and then prediction. This is what the data science workflow usually looks like. See you in the next video.

In the previous video, I explained the data science workflow. Now I'm going to explain it again, but this time I'm going to mention the job description for each process. What do I mean by that? Let's see. Here we have collecting and cleaning the data. This is the part where the data engineer comes in. And then we also have cleaning and EDA. EDA is short for exploratory data analysis, which basically means I'm exploring the data and applying some statistical concepts and charts and stuff like this. This is the part where the data analyst comes in. And then we have building the model and deploying the model. What do I mean by deployment? Whenever I create a model, I'm either going to add it as a feature to an existing app or system, or deploy it as a product. I'll give an example. Amazon Alexa is an example of an AI product which has a model built in, which is basically trained on a lot of voice commands. Okay, so that device, Amazon Alexa, is a product. What about a feature? Snapchat. If you used an earlier version of Snapchat, there were no filters which make you look like a dog or put some weird stuff on your head and face and stuff like this.
But then they told you to update the Snapchat app, and then you got something called filters. So filters are features that were added to an existing app like Snapchat. Okay, so building the model and model deployment is the job of machine learning people; most of the time, model deployment is done by software engineering people. Okay, now the whole thing is what data scientists do. So basically you're going to end up learning everything, okay? But in real life you will be, let's say, a data scientist in building the model, or a data scientist in doing the EDA. So a data scientist is going to be an expert in something. You know, a data scientist who came from a statistical background will be good at cleaning and EDA, and a data scientist who came from a software engineering background will be good at model building and model deployment, and so on. Okay, so this is what the data science process looks like.

12. Data Types: Now we're going to talk about data sources, okay? Every like you do on Instagram or Twitter, every email you read or send or write, every swipe of a MasterCard when you buy something, whether it was from your grocery store or from a website. Everything on Twitter: even if you just saw a tweet, you didn't like it, you didn't retweet it, you didn't comment on it, you just saw it, that is counted as a view. So that's a data source. Every single click is a source of data. We basically use the data to describe the present, and we also use it to predict the future. Then we have types of data. We have quantitative, which I will call numerical. And then we have qualitative, which I'll call categorical. We will take examples. Quantitative data deals with numbers, and it can be measured. Qualitative data describes the thing, and it can be observed, I mean it can be seen, but it cannot be measured. We'll take an example here. Same data here.
The quantitative data is that the fridge is 60 inches tall, this fridge has two apples, and it costs 160 to 100. These can be measured. So this is quantitative data. Same data here, same fridge. The qualitative data is that the color is red, it was built in Italy, and the fridge smells like fish. Okay, I hope this simple example explained the difference between quantitative and qualitative data.

13. DS vs ML: Okay, in this video, I'm going to explain the difference between the data scientist, the data analyst, the machine learning scientist, or sometimes they call it machine learning engineer, and the data engineer. Now let's just take it step by step. The first one is the data engineer. Basically, the data engineer's job is to collect the data and store it, for me or for the company. If you need data, let's say you have a business problem, just like we said in the previous videos, that our sales are not that high, people are not buying my product. So this was a business problem. Now, once I, as a data scientist, converted or reformulated the business problem into a data problem, which means I need customer segmentation, I immediately wrote down the features that I want. And I said to the data engineer: data engineer, I want you to get me this data. So the data engineer's job is to build a data pipeline in order to get the data and give it to me as a data scientist. So he basically maintains the data access. Okay, this is the database, the big database, this is the pipeline, and this is the dataset. Okay? So he builds this pipeline for me so I can get the data, and he maintains the data access. This line should be maintained by him. And he's going to build a data pipeline and the storage solution. Obviously he's going to have to store the data in some place, on the cloud or something. He's an information architect, because he knows how the database works. We will see the details.
In this video we will see the skills that you need to have in order to become a data engineer. So basically, data engineering is the science of collecting and validating information, which is data, so that data scientists can use it. And his job is to build the pipelines and prepare and transform the data for data scientists. Basically, in short, his job is to get the data for data scientists, okay, by building the data pipeline. Now to the data analyst. After the data engineer builds the pipeline, he basically establishes the pipeline for me, and once I start getting the data, I have a data analyst who's going to prepare the data for me and explore and visualize the data, so I can make sure that whatever features I need in the data are correct. So he's going to perform simple analysis: descriptive statistics, charts and histograms and stuff like this. He makes simple reports and dashboards to summarize the data, so that I, as a data scientist, can have a general look at the data. And obviously he's going to be the person who's going to clean the data. If there are some unnecessary features, he's going to remove them. If there are some missing values, he's going to fill them, say with the average, and stuff like this. Okay? He's going to prepare the data for me. So basically, data analysis is the process of cleaning and transforming data and discovering useful information. Obviously, when the data analyst applies descriptive statistics, he's going to draw some charts for me. Okay? Obviously you can get some useful information from that, and he might detect some outliers here, okay? And the data analyst is the person who uses statistical tools like Excel, SAS, Minitab, or the R language to interpret datasets. Okay, and this dataset was given by the data engineer after he built the data pipeline. Okay, now we have the data scientist. Basically, the data scientist covers this one, and this one. I mean, he also prepares the data and explores the data.
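The cleaning steps attributed to the data analyst above (removing an unnecessary feature, filling a missing value with the average, descriptive statistics, spotting an outlier) might look like this in a small sketch. The records, the feature names, and the outlier rule of thumb are all assumptions made up for illustration.

```python
import statistics

# Made-up customer records; None marks a missing value.
records = [
    {"age": 25, "spend": 120.0, "note": "n/a"},
    {"age": None, "spend": 95.0, "note": "n/a"},
    {"age": 31, "spend": 110.0, "note": "n/a"},
    {"age": 40, "spend": 2500.0, "note": "n/a"},
    {"age": 28, "spend": 105.0, "note": "n/a"},
]

# 1. Remove a feature that is not needed.
for record in records:
    del record["note"]

# 2. Fill missing ages with the average of the known ages.
known_ages = [r["age"] for r in records if r["age"] is not None]
mean_age = statistics.mean(known_ages)
for record in records:
    if record["age"] is None:
        record["age"] = mean_age

# 3. Descriptive statistics for one feature.
spend = [r["spend"] for r in records]
summary = {"mean": statistics.mean(spend), "median": statistics.median(spend)}

# 4. Flag possible outliers with a simple rule of thumb (an assumption,
#    not a standard test): anything above three times the median.
outliers = [s for s in spend if s > 3 * summary["median"]]

print(summary, outliers)
```

Notice how far the mean sits from the median in `summary`: a single extreme value pulls the mean up, which is one reason descriptive statistics and charts are checked before any model is built.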
And he does the experimentation and prediction. Okay? He knows machine learning, and he knows everything that the data analyst knows. And he knows how to build a machine learning model. Now we have the machine learning engineer, or specialist. Basically, we can see that he does exactly the same as what the data scientist does. But I put a double check mark here, because the machine learning engineer is the one who actually builds the model and deploys it. Okay? I mean, most of his job is going to be here, in this section. So machine learning is the study of computer algorithms that improve automatically through experience and by the use of data, which is basically what I told you: I have data, I feed it to an algorithm, and the algorithm learns by itself. This is machine learning, and the machine learning engineer is the one who does the training and testing and stuff like this. Why didn't I explain data science and the data scientist? In the first videos of this course, I did explain what data science is and what a data scientist is. And here we can see that the data scientist is actually the big boss. He knows this one. I mean, a data scientist can work as a data engineer, he can work as a data analyst, and he can work as a machine learning engineer. Okay? Now you'll be like: okay, what if I am working as a data scientist? You might be working as a data scientist, but your job might be more that of a data analyst, or sometimes more that of a data engineer. If you want to become a data scientist and you are, let's say, a database admin or a database developer, okay, then the data engineer field will be suitable for you. If you are coming from a statistical background or you are a business analyst, data analyst is going to be the perfect job for you.
If you have a software engineering background, or you are a programmer or developer, machine learning engineer will be suitable for you. If you want to become a data scientist, you need to learn this. And this. And this. Sometimes, let's say you're working in a company as a data scientist, you might be doing more machine learning and less data analysis. But that doesn't mean you shouldn't know this, because you'll be supervising their job. Here in data engineering, the task is to extract and acquire the data from different sources and maintain it. Remember the example that I gave you in the previous videos about NLP, the Twitter sentiment analysis of positive, negative, and neutral tweets. The data engineer is going to build the API and extract the data from Twitter and prepare it in a CSV file and give it to the data analyst. And the data analyst is going to analyze the data and visualize the data and draw some charts and do statistical analysis and stuff like this. And then we have the data scientist, who is going to extract the information from the data by building a model with machine learning. Now you'll be saying: okay, what is the difference between the data scientist and machine learning? It's the same thing. You'll be like: look here, it's the same thing, then what is the difference? I mean, in this chart, you told me that they do everything the same. Let me give you an example. Let's say you're working at Uber. The Uber company, or Lyft. I'm pretty sure you know one of them. Okay. And we have two persons: a data scientist and a machine learning engineer. Sometimes when you call an Uber, the application is going to show you that the driver will come in five minutes. So you prepare yourself, everything is ready, you go downstairs. Suddenly the five minutes became 15 minutes, which means the algorithm that is working in Uber needs more training. So for the machine learning engineer who's working at Uber, his job is to improve the machine learning algorithms that are used in Uber.

Now what about the data scientist, the one who's working at the Uber company? He's going to check the timing of the people who are calling Uber. And let's say it was school time: that means that for the person who's calling an Uber driver at this time, there is a big chance that he or she is a student. So I might give him a student coupon or discount, these types of things. This is what the data scientist is going to do. This is the difference between machine learning and data science. But again, they're both going to use machine learning algorithms. Both are going to use R and Python and statistics and math and stuff like this. Okay, I'll explain it more. Have you seen this? I don't know this chef, but in this type of TV show, okay, the chef has got everything ready. See, the plates are ready. All the ingredients are ready with the exact measurements. This is the data scientist. Okay. I'm pretty sure when he prepared his show, he asked his assistant to go to the farm and get him fresh vegetables, and go to the butcher shop and get him fresh meat. The person who gets these things, the raw things, is called the data engineer. Here we are getting the data, okay? And then the data analyst gets the data from the data engineer and prepares it exactly how the data scientist wants. Let's say he brought two kilos of tomatoes, but the recipe needs one kilo. So the analyst is going to take only one kilo, and he's going to, let's say, chop it into small pieces. Same thing: he's going to take only the rib-eye steak and prepare it, and prepare everything, all the seasoning, everything. He's going to put everything here, and the data scientist is going to come and explain and make the dish, the final product. Obviously the data scientist, or the chef, knows these things. I mean, he's an expert, but he always needs his assistants.
This is the analogy. I'm trying to give you an example of the relationship between the data scientist, the data analyst, and the data engineer. The machine learning engineer is obviously going to be the guy who's cooking. But it's the same: machine learning and the data scientist are the same. And in terms of learning resources, if you go and check all the courses on Udemy, they'll be like: data science and machine learning bootcamp, data science and machine learning diploma, data science and machine learning certificate. Because they're the same. But on the job, there is a bit of a difference, because the data scientist is more of a generalized job. Okay, I hope this example clarifies for you the difference between the data scientist, the data analyst, and the data engineer. I will see you in the next video.

14. Skills Needed to be a Data Scientist: In this video, I'm going to explain the skills that you need in order to become a data scientist. Let's start with soft skills. Soft skills are usually learned as you practice. We have critical thinking. Why critical thinking? Because critical thinking is going to help you when you transform the business problem into a data problem. And then you're going to be like: okay, which features do I need, and from which database should I take them? Critical thinking is going to help you a lot with this. Creativity in finding solutions: your job as a data scientist is always finding a solution, okay, fixing a problem. Being a good communicator or storyteller: as a data scientist, you will know all this mumbo jumbo, the math and coding skills, and your job is going to be to convince the stakeholders that whatever you're doing is going to help their business. So you need to get their permission. If the stakeholders, or the managers, don't approve, you won't be able to do the project. And storyteller means that after you do the project, and you run the algorithm, and you find the solution, it's going to be your job to explain it to them.
Problem-solving: as I said, your job is going to be problem solving most of the time. And you should always be willing to learn, because there's always something new: new software, a new package, a new method, big data, communities, IoT, and stuff like this, okay? And hard skills. Obviously the first one is math and statistics. Now, you're not supposed to be a math whiz, okay. There are certain subjects in math you need to know, but not everything in math and not everything in statistics. Okay? Ethical skills. Why ethical skills? Okay. Ethical skills, because you're dealing with data, you're actually managing data. So you'll actually be working with very critical data, like employee data, and you're basically going to know everything about them. So it's much better if you do not talk about this type of information and data with other people, especially outside the organization, because it's not going to be ethical. Team player skills, because you will be dealing with other departments. Let's say you're working on customer segmentation. So you go to the sales department, you get the data, you have your own data analyst, data engineer, and so on. So you'll be working with a project manager, or you'll be managing the project, and working with a lot of people. Lifelong learning skills, like time management, the Pomodoro technique, how to take notes, this type of thing, because you'll be constantly learning. Communication skills are also mentioned here. Earlier it was mentioned as storytelling, but here communication skills means writing emails, requesting some data from the data engineer, and stuff like this. Real-world project skills: you'll be working with real data and real projects. So you need to have solved at least one real project. You actually solved a real project, or worked at a real company, okay? Machine learning is the core skill, like what overfitting and underfitting are, these types of things.
Data visualization skills: you should know how to draw a chart, how to make a chart, understand the chart, and explain the chart, whether it was in Excel or R or Python. Data wrangling and pre-processing, which is a data analyst skill, but you should know it. And coding skills: Python and R. You should know Python or R, and I recommend that you know both of them. Okay? These are the most common skills you need in order to become a data scientist. See you in the next video.

In this video I'm going to tell you how you can start studying data science. Now, these are the resources that I have, and I used these resources to learn data science. You can enroll in a bootcamp. Okay? Basically, a bootcamp is going to give you a crash course in each section, like a crash course in Python, a crash course in Pandas, a crash course in Matplotlib, a crash course in data analysis, a crash course in statistics and math, and a crash course in machine learning. So if you're a complete beginner, I would recommend that you first take at least a theoretical course, and then you enroll in a bootcamp. You can check out the bootcamps, whether online or offline. Bootcamps are still one of the best places to learn data science, as most people who take them end up getting hired, okay? And then you have courses, just like the one you're watching right now. You can take courses from whatever platform you like, but make sure that the course is exactly what you need. We have something called paths and tracks. A track basically means a bunch of courses, a bunch of courses in a sequence, okay? Most people, when they take courses, are like: after I finish this course, what should I do? That's the answer: you go to a path called, for example, the Data Science in Python track or the Data Science in R track. So you just enroll in that track, and then you just start taking the courses in sequence. Okay.
They're kind of like the bootcamps, but they are online most of the time, while bootcamps are online and offline. Okay. We have webinars. You can learn data science from watching a webinar. For example, some platforms provide webinars from companies that use data science in real life. For example, if I have a company, a retail company, and I have worked on a real-life problem, let's say a churn prediction problem, I will do a webinar saying: before applying this project, this was the issue, and after applying the project, this is how I solved the issue. So I would recommend webinars after you finish a path or track or course, because basically webinars are for those who want to know how data science is used in real life. So watch them after you finish the bootcamp or course or track. There are also some general webinars. For example, if someone is offering a course, they'll do a webinar. Basically they're marketing their own course. So you can watch that webinar, which can be a general webinar or general overview. So there are two types of webinars, okay? And then we have books. There are many books. I would recommend that you start reading books in data science. If you don't like reading books, then you can go to these other resources, but I will still recommend reading books, and especially research papers. And then we have articles. You can read articles from multiple blogs. Articles are still a good way for complete beginners to learn about data science and engage with the author in the discussion. And we have competitions, which you can enter after you finish learning data science. In order to get some money and real-life experience, you can just enter a competition. You might end up getting a certificate or money. Okay? Either way, don't enter competitions and projects unless you have taken a practical course, so you can show your skills in these areas. I have used literally all of them, and they're great. Again, you choose whatever you like.
If you are one of the people who like to read, then books and articles are going to be very, very good, or one of the best places to learn. If you like videos, then bootcamps, courses, and paths and tracks are going to be the perfect resources for you. If you'd like to attend a real study group, or make groups, or stuff like this, then a bootcamp is going to be the perfect resource for you. If you'd like to just watch online, you can get an online bootcamp, or online courses like the one you're watching right now, or online paths and tracks. Okay. These are the resources that I have used to study data science. Now, in this slide, I mentioned some of the certifications that you can get in the field of data science. Okay? This one is from EMC. You can take this course, sorry, this certificate, which teaches in R. And I will leave the slides, so you can download them and just choose whatever certification you like and enroll. It depends on your background and criteria. Some of them are from EMC, some of them are from SDSC, some of them are from Microsoft, from Google. So choose whatever you like, okay, and check them in detail. I just mentioned some of the best certifications in the field of data science. And these are the best books that I have used in order to learn. You can read this book before you even start thinking about getting into data science. It will teach you how to think as a data scientist. You read this book after you take a bootcamp in data science, as this is a very detailed book. It's titled Data Science from Scratch. It's not for beginners. It's for intermediate, or beginner to intermediate, somewhere in the middle. But it's not for beginners. Okay? Obviously it's going to teach you Python. And we have this book. You can read this book if you want to know how to use data science in business, which I highly recommend.
You can read this book to get a general overview of data science. And this is also a great book. If you want to become a data scientist, you can read these books; they basically give you a general overview. These books will teach you how to think as a data scientist, how to dissect a problem into small pieces and solve it from a data science point of view. This book talks about the history and some theoretical stuff about data science. With this, we come to the end of this course. Thank you so much for watching. Don't forget to review and rate the course five stars. If you think it doesn't deserve five stars, please contact me and let me know how I can improve this course, and stay tuned for the upcoming courses. Thank you so much, bye-bye.