Want to be a Data Scientist? | Kumaran Ponnambalam | Skillshare

Lessons in This Class

17 Lessons (3h 2m)
    • 1. What this course is about

    • 2. About V2 Maestros

    • 3. What is Data Science - Part One

    • 4. What is Data Science - Part Two

    • 5. What is Data Science - Part Three

    • 6. What is Data Science - Part Four

    • 7. Data Science Use Cases

    • 8. Data Science Life Cycle - Setup

    • 9. Data Science Life Cycle - Data Engineering

    • 10. Data Science Life Cycle - Analysis and Production

    • 11. Skills Required for Data Science - Part 1

    • 12. Skills Required for Data Science - Part 2

    • 13. Roles in Data Science

    • 14. Challenges for a Data Scientist

    • 15. Building your skill set

    • 16. Looking for Opportunities

    • 17. Conclusion

About This Class

Data Science is considered the sexiest job of the 21st century. It is a hybrid field requiring unique skills. You have heard about it everywhere, but you are not sure what exactly the job is and whether it's a good career choice for you. This course aims to educate you about:

  1. What is the life of a Data Scientist like?
  2. What skills are required to become one?

Data Science courses and degrees are expensive, so it is a good idea to know what you are getting into before spending your money.

Meet Your Teacher

Kumaran Ponnambalam

Dedicated to Data Science Education


V2 Maestros is dedicated to teaching data science and Big Data to the world at affordable cost. Our instructors have real-world experience practicing data science and delivering business results. Data science is a hot and happening field in the IT industry. Unfortunately, the resources available for learning this skill are hard to find and expensive. We hope to ease this problem by providing quality education at affordable rates, thereby building data science talent across the world.

See full profile


1. What this course is about: Hello, and welcome. This is your instructor, Kumaran, and in this section we are going to see what this course, "Want to be a Data Scientist?", is all about. So let's get rolling. The demand for data science and data scientists has been growing day by day. It is considered the sexiest job of the 21st century, the latest buzzword, and you keep hearing about it all over the place: on the Internet, in all your channels, in your email, whenever and with whomever you are talking. It is the hot and happening field in software, with significant projected demand for jobs around the world. If you look at any article or newsletter about projected job demand in the IT field, you will see that data science is one of the highest-ranked jobs all over the world for the foreseeable future. Everyone is interested in becoming one, and you see a lot of buzz around people saying, "I want to be a data scientist; I also want to do data science." But there are a lot of questions in your mind. What are these questions? What exactly is a data scientist? This is a pretty unique term. It is not like "programmer", "architect", or "designer" — you see the word "scientist" coming into play in the IT field, so you are curious about what is so unique about this job. What do data scientists do? What does a data scientist's everyday work look like? Will they be doing a lot of programming, or a lot of analysis and design? Will they be writing documentation and presentations, or just managing projects? You also don't know what skills and training are required. Data science is a brand-new skill set that is coming into place; you most likely did not learn it in your bachelor's or master's degree.
So you have to acquire a few new skills to become one. What are those skills? What challenges does a data scientist face? This is both a good and a bad thing: any new job provides unique challenges, and some people love them while others hate them. So what does the job of a data scientist give you? How does it challenge you? Are you going to love it or not? Is it a good fit for me? This is the big question everyone has in mind: you want to see whether this job profile is going to be a fit for you, so that once you become a data scientist, you will still be interested in continuing in the job for a long time. How do you acquire the required skills? This is another challenge — these are brand-new skills, so how do you get them? And finally, how do you look for jobs? What are the websites; how do you go searching for jobs? So what is the objective of this course? The answers to all these questions are currently scattered all over the web. There are so many tidbits out there about what data science is, what the job looks like, what the challenges are, and what skills are needed. But definitive answers are hard to find, because there is so much bias in what people write, and a lot of personal opinion comes into play. This course provides you one place to get all of those answers. The objective of the course is to put them together in one place so you can go through them and make an informed decision as to whether you want to become a data scientist or not. The course explains the field of data science and what it takes to build the skills for it, and its objective is to help you decide if you want to become a data scientist.
There is an "if" question and a "how" question that will be answered by this course. That is the objective of this course. The expected audience is anyone who is thinking about getting into data science: software professionals will be the first category of people interested; project managers, because all of a sudden data science projects are coming into their picture; mathematicians; and business domain experts. What this course is not — and I want this to be very clear up front — is technical training. There is no technical training presented as part of this course; it just gives you enough information to make a decision about whether you want to be a data scientist or not, and there are no software skills taught in this course. I hope this course is going to be informative for you and help you make a decision. So best of luck, and let's keep talking. Thanks.

2. About V2 Maestros: Hello, and welcome to V2 Maestros, the data science experts. This is an introduction to our company and instructor. Who are we? V2 Maestros is a training organization fully committed to data science education. We are not focused on any other skills; we are purely focused on data science, and we have focused on building deployable data science skills — skills that people can take and use in their everyday work. We also deliver our training through online media. Our goal is to develop data science practitioners who can deliver business results in commercial organizations. We are not into theory, and we are not into research; we are into developing data science practitioners who can go into commercial organizations and use their skills to deliver business results, and we have significant industry experience in data science practice. So, about the instructor — this is me. My name is Kumaran Ponnambalam.
I am a seasoned data science leader with a passion for teaching: I love teaching, making courses, and sharing skills with other people. I have 22 years of overall experience in software development, with experience in machine learning, databases, cloud systems, and project management domains, and I have experience in teaching data science through online courses. You can look at my blog, where you will find more details about me and some of the work I have been doing. Thank you for taking up this course, and I hope it is going to be of interest and of value to you. Thank you.

3. What is Data Science - Part One: Hello. This is your instructor, Kumaran, and in this section we are going to see what data science is. Data science is something we have been hearing a lot about, but what exactly does data science consist of? What is it really about? We are going to see two things in this session: the first is "what is data", and the second is "what is learning". We are going to see some definitions of the things that constitute data science. Some of the things in this session may seem obvious — things you are already used to — but it is good to take a second look at the definitions of each of them, because they mean a lot in data science; in fact, they form the very foundation of data science. So let us go through all these definitions. First, what is data science? Data science is the skill of extracting knowledge from data. You have data, which is raw; you look at that data and extract knowledge. Knowledge may also be called information, insight, or signal — there are different terms used to describe it — but it is basically something useful that you extract from data.
You then use this knowledge to predict the unknown: you learn something about the past from data, and then you use that information to predict what is going to happen in the future. That is what data science is all about. One of its goals is to improve business outcomes with the power of data. You can do prediction, but what is the use of it? The use is that you want to apply data science to improve business outcomes, and you are going to improve those outcomes using data. Data science employs technologies and theories drawn from a broad range of areas, not restricted to a single domain: mathematics, statistics, information technology, data-related technologies, and programming languages. We actually use a host of different techniques, theories, and fields when it comes to data science. And what is a data scientist? A data scientist is a practitioner of data science: somebody who uses the theories, technologies, and skills of data science to produce better business outcomes. A data scientist typically has, or should have, expertise in data engineering, mathematics and statistics, and the relevant business domain, and typically investigates complex business problems and uses data to provide solutions. The most important thing here is "uses data to provide solutions": data is the driver for a data scientist. So let us go into some of the definitions around data. What exactly are we talking about when we say "data"? What are the various things you need to learn about when talking about data? We are going to go through a set of definitions here; again, they may be obvious to you, but it is worth taking a second look at all of them. The first thing we are going to talk about is what is called an entity.
An entity is a thing that exists, which we research and predict about in data science. An entity is a thing, an object, something that exists in the real world, which we are going to be working on. In a data science problem you have a set of entities that you care about; you do some research on them, get data about these entities, and then work on that data to make predictions. Entities always have a business context — the business problem you are trying to solve, in which the entity exists. An example of an entity is a customer: a customer of a business is an entity, probably the most popular entity of all, about whom we do a lot of research and prediction. A person at a hospital — a patient — is another entity. Now, the customer of a business and the patient at a hospital might actually be the same person, but they have different business contexts. Different business contexts mean that for the same person we care about different information: the person does different things as a customer than they would as a patient. Entities can be non-living things too — for example, a car. Cars and other non-living things are also entities about which we collect information and make predictions. Moving on to the next item: characteristics. What are characteristics? Every entity has a set of characteristics: these are properties of the entity, information about the entity. We may call them static information, because they are bound to the entity — like name, telephone number, age. Characteristics also have a business context: in different business contexts you care about different characteristics for the same entity, the same person, in that given business context.
For example, for a customer, the characteristics you care about are age, income group, gender, and education. For a patient you would again care about age — so the characteristic "age" repeats — but now you have a different set of characteristics specific to being a patient, like blood pressure, weight, and family history. So again, the business context and the business requirement drive which characteristics of an entity you care about. For cars, you talk about make, model, year, the type of engine (like four-cylinder or six-cylinder), and the VIN number of the car. These are all examples of characteristics; you might also call them properties. The next thing to care about is: what is an environment? The environment is the ecosystem in which the entity exists or functions. An entity does not exist in a vacuum; there is an environment in which it exists. In that environment are other entities — other entities of the same type, and other entities of different types. A patient entity exists in a hospital along with other entities of the same type (other patients), but there may also be other entity types, like doctors and nurses, non-living entities like ambulances, and even a system used to monitor patients might be an entity. All of these entities exist in a shared environment, and multiple entities exist in the same environment.
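The entity-plus-characteristics idea maps naturally onto a simple record type. As a rough sketch (the class names and field values below are invented for illustration, not taken from the course), here is the same real-world person modeled as two different entities, each carrying only the characteristics its business context cares about:

```python
from dataclasses import dataclass

@dataclass
class Customer:           # business context: retail
    name: str
    age: int
    income_group: str
    gender: str
    education: str

@dataclass
class Patient:            # business context: healthcare
    name: str
    age: int              # "age" repeats across contexts
    blood_pressure: int
    weight_kg: float
    family_history: str

# The same person, seen through two business contexts:
c = Customer("Jane Doe", 42, "high", "F", "masters")
p = Patient("Jane Doe", 42, 128, 64.5, "diabetes")
```

Note how `age` appears in both contexts while the remaining fields differ: the business context, not the person, decides which characteristics are recorded.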
The environment affects an entity's behavior — that is the most important thing. The same entity might behave differently in different environments, or even in the same environment under different conditions. Examples of environment: for a customer, the country, the city, and the weather where the customer resides; for a patient, again the city, the climate, and the hospital the patient is currently in; for a car, whether it is mainly used for city driving or highway driving becomes its environment, and also the climate — cars perform differently under different weather conditions, like hot weather versus snow, and show different behavior. In all of these cases, what you see is that the environment affects how the entity behaves. Now comes the event. What is an event? An event is a significant business activity in which the entity participates. An entity doesn't simply sit there: it does something, or somebody does something to the entity, and that is what you call an event — some business activity. An event again happens within an environment. An entity like a person goes to the hospital; in the hospital the entity is treated, a set of tests is administered to the given patient, and then some results come out of those tests — all of these are events. Other examples of events: a customer browsing a website, a customer making a store visit, a customer getting a sales call from some company trying to sell something. In the case of patients, it is things like doctor visits and blood tests. For a car, the smog test, or a comparison test — if you go to any car-related website, you see that they do comparison tests. All of these are events in which an entity participates. Behavior: an event happens, and an entity participates in it — but what does the entity do in that given event? That is the entity's behavior.
An entity might have different behaviors in different environments and different situations. For example, in the case of a customer, what the customer says in a phone call is the customer's behavior. The clickstream from a website visit — which links the customer clicks while browsing the website — is another type of behavior. The response the customer gives to a sales offer — was the customer happy or not — all of these are different behaviors of the customer. For patients: the patient complaining about something, the patient falling asleep, the patient showing any kind of symptoms — all of these are behaviors of the patient. For cars, things like acceleration and stopping distances represent a form of behavior of the entity. These are all things you see in the real world: entities, events, and behavior. Now comes the introduction to data.

4. What is Data Science - Part Two: Introduction to data. Next, there is something called an outcome. What is an outcome? It is the result of an activity deemed significant by the business. You have events; in those events there are entities, and entities behave differently in different events. But all these events typically have some form of outcome that is important to the business. An outcome is the result of an activity — the result of a business activity, for example. Outcomes are values, and outcome values can be of a few types. They can be boolean, like yes/no: somebody took a test and passed or failed — boolean is basically a yes-or-no type of data. They can be continuous, that is, a numeric value: somebody took a blood pressure test, and the blood pressure reading is a continuous value that can range anywhere within a numeric range. Or they can be some form of classification — classes.
For example, somebody wrote a review of a movie, and the rating they gave might be a class like "excellent", "very good", or "fair" — that is a classification type. Outcomes can be of any of these different types. Examples of outcomes: in the case of a customer, whether the customer made a purchase is a boolean, and the sale value — how much they bought it for — is a continuous value (continuous meaning the value can fall anywhere in a range, say 0 to 1000). In the case of a patient, the outcome can be a blood pressure reading, which is a continuous outcome, or the type of diabetes, which is identified as a class — like Type A or Type B diabetes. In the case of cars, the smog level is a classification (levels like A, B, C); the stopping distance is continuous — you do a test for the car, which is an event, and in that event you measure the distance it takes to come to a full stop when you jam the brakes; the smog test pass/fail result is a boolean outcome; and the type of car — say, sports car versus family sedan — is a classification. These are different outcomes that happen as a result of some event. Outcomes are very important in data science, because what you typically try to predict in data science is outcomes in the future. You will see more about that later. Now comes what is called an observation. What is an observation? An observation is a measurement — a measurement of an event. You measure things about an event that are deemed significant by the business. It captures information about the entities involved.
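The three outcome types described above — boolean, continuous, and class — can be sketched in a few lines of Python. This is a hypothetical car-test event; every name and value here is made up for illustration:

```python
# One hypothetical car-test event with its three kinds of outcomes:
car_test_outcomes = {
    "smog_test_passed": True,         # boolean outcome: yes/no, pass/fail
    "stopping_distance_m": 38.4,      # continuous outcome: any value in a numeric range
    "smog_level": "B",                # class outcome: one label from a fixed set
}

# A class outcome is only meaningful relative to its set of allowed labels.
ALLOWED_SMOG_LEVELS = {"A", "B", "C"}
assert car_test_outcomes["smog_level"] in ALLOWED_SMOG_LEVELS
```

The distinction matters later: predicting a boolean or class outcome is a classification problem, while predicting a continuous outcome is a regression problem.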
A given event may have multiple entities involved, and the observation captures the characteristics of those entities, their behaviors, information about the environment in which the event happens, and the outcomes. So an observation is information about all of these things that happen in an event: you go and collect all this information and record it in some form. Observations are typically recorded in what is called the system of record. Anywhere you go, you see people recording information — in the old days they recorded it in journals, logbooks, and the like; now everything is automated and computerized, with scanners capturing the information automatically or somebody entering it into a computer. Whatever it is recorded in is called the system of record. Examples of observations: in the case of customers, a phone call record — also called a CDR (call detail record) in the telecom domain; a buying transaction — somebody goes to the store, buys something, goes to the point-of-sale counter, and the transaction is recorded there; or an email offer — an email comes to you offering some product at some price, and you end up buying something. All of these are observations. For a patient: a doctor visit record, a test result, data captured from a monitoring device — all different types of observations. And finally, the car: a service record is an observation — the car goes for service, and the mechanic's findings are logged in the service record; a smog test result is an observation. All of these observations are captured in some form, recorded, and stored. So finally we get to data. What is a dataset? A dataset is a collection of observations. Every observation records an event about a set of entities, and a collection of observations for a business becomes a dataset. Each observation in a dataset is typically a record.
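The dataset-of-observations idea can be sketched with plain Python (rows and values invented for illustration): each record is one observation of a patient-test event, and each key is an attribute.

```python
# A tiny hypothetical dataset: each dict is one observation (a logical record),
# and each key is an attribute (a characteristic or an outcome of the event).
dataset = [
    {"test_id": 1, "age": 61, "weight_kg": 92.0, "blood_pressure": 150},
    {"test_id": 2, "age": 34, "weight_kg": 70.5, "blood_pressure": 118},
    {"test_id": 3, "age": 55, "weight_kg": 88.0, "blood_pressure": 142},
]

def column(data, attribute):
    """Pull one attribute (a 'column') out of every observation (a 'row')."""
    return [record[attribute] for record in data]

ages = column(dataset, "age")  # -> [61, 34, 55]
```

This is exactly the worksheet analogy from the lecture: the list is the worksheet, each dict is a row, and `column` reads one labeled column across all rows.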
We would call it a logical record; at the physical level, a given observation might be recorded in multiple forms across multiple user interfaces, with master-detail relationships and so on. That is all fine — here we are talking about a logical record that represents one observation. Typically an observation has an ID, like a transaction ID, a test ID, or a serial number. So a dataset is a collection of observations, and each record has a set of attributes that point to characteristics, behaviors, and outcomes. If you look at an Excel worksheet, you will see that each row typically represents a record — an observation. The worksheet itself is a dataset: every row is an observation, and every column is an attribute that points to one of the characteristics of the entities, the behaviors, or the outcomes. Data can be structured — database records and spreadsheets. It can be unstructured — Twitter feeds and newspaper articles are examples of unstructured data. And there is what is called semi-structured data, like email. As a data scientist, you typically deal with all of these types: structured data, unstructured data, and semi-structured data. A data scientist collects and works on datasets — data is the bread and butter of a data scientist: data, data, and more data. Data is collected as datasets, stored, worked upon, and predictions are made based on those datasets. So the dataset is the core of data science. What is structured data? The example you see on the right side is structured data: data where the attributes are labeled and distinctly visible. Every attribute in that particular UI — the IDs, names, and numbers — is labeled separately. Everything is labeled and distinctly visible, whether it is shown in the UI or stored in the database.
That is what you call structured data: data that is labeled and stored separately. It is easily searchable and queryable because the attributes are labeled separately; even when stored in a database, they sit in different columns, so it is viable to write an SQL statement to query this data. It can, of course, be stored easily in tables — database tables or Excel worksheets; structured data is easy to store in general. Unstructured data, on the other hand, is not labeled. It is continuous text, like the example on the right side — a text review about a car. In this continuous text, the attributes are not distinctly labeled, but they are present within the data. The highlighted items you see are different attributes: "compact" is the type of the car, "hatchback" is the body type, "six-speed transmission" is the transmission the car has. All of them are present inside the data, but not distinctly labeled. That is what you call unstructured data: continuous text with no format, where your data is hidden or embedded inside the text. Querying it, of course, is not going to be easy — and by querying we mean not visual inspection, but writing computer programs to extract information from it, which is not going to be easy. Now comes the third form, which is semi-structured data. The example here is an email: part of the data is structured, and part of the data is unstructured. In an email, some attributes are distinctly labeled — the from address, to, cc, and subject are distinctly labeled and available as separate columns or separate pieces of information — whereas others may be hidden within the text of the body. So you have both structured and unstructured data mixed up in the case of semi-structured data.
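The email example can be made concrete with Python's standard `email` module: the headers are the structured part you can query by label, while the body is free text you would have to mine. The message below is invented for illustration.

```python
from email import message_from_string

# A hypothetical semi-structured record: labeled headers plus a free-text body.
raw = (
    "From: offers@example.com\n"
    "To: customer@example.com\n"
    "Subject: Spring sale\n"
    "\n"
    "Get 20% off any compact hatchback with a six-speed transmission.\n"
)

msg = message_from_string(raw)
subject = msg["Subject"]    # structured part: look up an attribute by its label
body = msg.get_payload()    # unstructured part: plain text to be mined
```

Reading the subject is a one-line lookup; extracting "hatchback" or "six-speed transmission" from the body would need text-mining code — exactly the structured/unstructured split the lecture describes.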
XML documents are another example of semi-structured data: some information is available in attributes, and some information is in the text portions of the document. So, in summary, what have we seen with respect to data? We have seen the entity, characteristics, environment, event, behavior, outcomes, observations, and finally the dataset. These are the key foundations on which data science is built, so it is good for you to know and understand each of them. This completes this part of the section; we will continue in the next part of the presentation. Thanks.

5. What is Data Science Part Three: Hello. This is your instructor, Kumaran, continuing on what data science is. We are going to talk about what "learning" means in data science parlance — learning, which is discovering knowledge from data. The first thing to note is: what is a relationship? Relationships again form one of the foundations of data science, and when we talk about relationships, we mean relationships between attributes. Attributes in a dataset exhibit relationships: you have a set of observations in a dataset, and the attributes you see in those observations exhibit what are called relationships. Relationships model the real world and have a logical explanation. When we say they model the real world, we mean that a relationship is something that is actually happening in the real world — it is not something out of the blue that you would see only in the dataset. Whatever data you have, the relationship it shows is something that exists in the real world. For example, age and blood pressure levels: the relationship between them is that as age goes up, the propensity for higher blood pressure goes up. The higher your age, the higher your blood pressure levels might be.
And there is always a logical explanation associated with it. The reason given in the medical field is that with more weight you have more fat and more clogged arteries, which leads to higher blood pressure. So something is happening in the real world, and there is a logical explanation for it. Explanation is a very important part of data science: when you see a relationship, you should be able to explain why it is happening, because that is how you can say whether the relationship is real or merely incidental — whether it just happened by chance. For attributes A and B, the relationship can take several forms. "When A occurs, B also occurs": things that happen together — say, whenever a sale of a cell phone happens, a sale of a cell phone cover also happens. "When A occurs, B does not occur": a negative relationship, where A and B are mutually exclusive; mutual exclusiveness is again a kind of relationship. The third form is "when A goes up, B also goes up", and there is also "when A increases, B decreases", which is another negative relationship. When the values seen in two attributes show any of these patterns, that is a relationship. Not all entities will exhibit relationships: there will always be some entities where you see relationships, and some entities that don't exhibit any relationship at all. The goal of learning is to look for attributes which together exhibit some form of relationship. Relationships can involve multiple attributes too — like "when A is present and B increases, C will decrease": multiple attributes together might exhibit some form of relationship. So this is an overview of what relationships are.
Now let us go and see some examples of relationships. Take a customer: as age goes up, spending capacity goes up, so there is a relationship between age and revenue from the customer. There is a logical explanation: as age goes up, the person is possibly making more money, so their spending capacity is also higher. Now, when we talk about relationships in data science, these are not very concrete relationships — it is not literally a formula that holds all the time. Those kinds of 100% relationships would be great, but what we see here are overall, general tendencies: when age goes up, spending capacity goes up. Not all older customers are going to spend more, but most of them will — that is what we mean by a relationship. Another one: urban customers buy more internet bandwidth. There is a relationship between the location of the customer and the bandwidth purchased by the customer, possibly because they are doing more browsing. If you look at patients, again there are a lot of relationships you can see, like: older persons have a higher prevalence of diabetes — a relationship between age and disease level; or overweight patients typically have higher cholesterol levels — a relationship between weight and cholesterol. Again, there are scientific reasons why these things happen. If you take a car, there is a relationship between the number of cylinders and the mileage it gives: the more cylinders, the less the mileage, because there is more fuel burning when there are more cylinders. Sports cars have higher insurance rates — this is not a physical relationship, but you will see it as a business relationship: whenever the car is of a sports type, its insurance rates are typically higher.
So there is a relationship between the type of the car and the insurance rates. Some more things about relationships: one thing you want to worry about when you see a relationship between two attributes is whether the relationship is consistent or incidental. Relationships can also be described as patterns — patterns of behavior you see in data. Sometimes the pattern of behavior is consistent: it happens all the time, repeatedly, and when it happens all the time you can actually predict such behavior in the future. But there can also be incidental patterns — incidental relationships that just happened by chance. There might be no logical explanation for an incidental behavior or an incidental pattern. So whenever you see a relationship, it is very important to make sure the relationship is consistent rather than incidental. Consistent relationships are what you need for data science. Relationships are also called correlations — that is the technical term you will see being used. Correlation between two attributes is what you see when A goes up and B goes up, or A goes up and B goes down. Correlation is the technical, mathematical term you use when you talk about relationships. Finally, you hear people talk about signals and noise. In data science, signals are nothing but consistent patterns — consistent relationships you see in data. Noise is incidental patterns — incidental relationships you see in data. So if you have been hearing the terms signal and noise, they are nothing but relationships: relationships that are meaningful, versus relationships that happened by chance, which are not predictable, which are just incidental. So that is the difference between signals and noise.
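The signal-versus-noise distinction above can be made concrete with the correlation measure just mentioned. A minimal sketch with made-up data: spending is generated to rise with age (a signal), while shoe size is generated independently of spending (noise). All the numbers and variable names here are illustrative assumptions, not real customer data:

```python
import math
import random

random.seed(42)

# Hypothetical data: spending tends to rise with age (a "signal"),
# while shoe size has nothing to do with spending (just "noise").
n = 500
age = [random.uniform(20, 70) for _ in range(n)]
spending = [50 + 3.0 * a + random.gauss(0, 15) for a in age]
shoe_size = [random.uniform(6, 13) for _ in range(n)]

def correlation(xs, ys):
    """Pearson correlation: near +1 or -1 means a strong linear
    relationship; near 0 means no linear relationship."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(correlation(age, spending), 2))        # strong positive: a signal
print(round(correlation(shoe_size, spending), 2))  # near zero: noise
```

The consistent relationship shows up as a correlation close to 1, while the incidental one hovers near 0 no matter how much data you collect.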
Now comes the question: what is learning? We keep talking about machine learning, deep learning, and all forms of learning — so what exactly is learning? Learning implies learning about relationships. That is the most important thing you want to know about data science. Data science includes machine learning, and learning here means you are trying to learn about the relationships between attributes. That is what learning is all about. It involves taking a domain — a hospital domain, a business domain — understanding the entities and the attributes that represent that domain, collecting data about all of them, and understanding the relationships between these attributes. This understanding of relationships between attributes is what learning is all about. A model is the outcome of learning: what you do after you learn about something is build a model of it. Now, this kind of learning happens all the time inside the human brain. We are constantly collecting data; the human brain is continuously learning about things and continuously building models, and we use these models all the time without even knowing it — subconsciously, we are continuously learning. What we are talking about in data science is this kind of learning made into a proper process, where the learning happens outside the human brain, in machines. That is the main difference between learning that happens inside the human brain and learning that happens in machines: there is more of a process to it, there is more data in it, and there is more rigor in doing it. So what is a model? A model is a simplified, approximated representation of a real-world phenomenon. There is a real-world phenomenon happening, and when you build a model, you are first trying to build a simplified model — you are not trying to put too many things into the model.
You are just trying to take the most important things about the real-world phenomenon and build a simplified, approximated representation of it. You can actually go and build models as complex as you want, but usually when people build models, they want them simplified, so the model brings out all the important factors you care about and ignores everything you don't. So it is a simplified, approximated representation of a real-world phenomenon, and it captures the key attributes of the entities and their relationships. One example of a model is a mathematical model. A mathematical model represents the relationships as an equation. You can write an equation that represents the relationship between attributes. For example — and this is just a formula I got from somewhere on the web — here is a formula for how you might determine blood pressure: blood pressure equals 56, plus the age of the person times 0.8, plus the weight of the person times 0.14, plus the LDL level of the person times 0.9. What you see here is that you are trying to compute blood pressure — blood pressure is one attribute — from three other attributes: age, weight, and LDL. Now, this is an approximate computation of blood pressure. It is never going to give you the exact value, but it could be approximately close to the real-world value. So here is a formula which represents a mathematical model of how blood pressure can be related to three other attributes: weight, age, and LDL levels. Another kind of model is a decision tree model. It is a logical model where you ask a series of questions — questions about various attributes — and then, based on the answers, come up with an outcome.
Say you want to predict something like buying a music CD. For that, you can come up with a decision model like this: if the age of the customer is less than 25, and the gender of the customer is male, then he will buy a Beyoncé CD. So you use two attributes, gender and age, and based on them you are trying to predict the outcome, which is whether the customer will buy a Beyoncé CD or not. This is another type of model. The accuracy of your models depends on the strength of the relationships between the attributes. Sometimes the relationship between the attributes is so strong that you can predict with near-100% confidence: if I see this, I am sure this is going to be the outcome. Sometimes the accuracy is not that high, and in that case you might want to combine multiple attributes and see if you can increase the accuracy level. Sometimes there is no relationship at all. So it can come in any form, on any kind of varying scale. But a model, overall, is a simplified, approximated representation of something that is happening in the real world. 6. What is Data Science Part Four: Once you have a model, what you can do is prediction, so a model can be used to predict unknown attributes. A simple example: we already saw that there is a formula, blood pressure equals 56, plus age times 0.8, plus weight times 0.14, plus LDL times 0.9. You have here a formula that relates four attributes: blood pressure, age, weight, and LDL. What this means is that if you know three of these four attributes, you can predict the fourth one — that is what we call prediction. Note the difference between compute and predict: when you say compute, you are guaranteeing 100% accuracy because you know the exact formula; when you say predict, you are mostly approximating. So you have four attributes here, and if I know any three of them, I can just rearrange this formula.
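The two model types described above can be sketched in a few lines of code. Both the blood-pressure coefficients and the age-25 rule are the lecture's illustrative examples, not validated medical or marketing models:

```python
def predict_blood_pressure(age, weight, ldl):
    """Mathematical model: a linear equation relating attributes.

    The coefficients are the illustrative ones from the lecture,
    not a clinically validated formula.
    """
    return 56 + 0.8 * age + 0.14 * weight + 0.9 * ldl

def will_buy_cd(age, gender):
    """Decision (rule-based) model: a series of attribute checks."""
    return age < 25 and gender == "male"

# An approximation of the outcome, never the exact value.
print(predict_blood_pressure(age=50, weight=80, ldl=30))
print(will_buy_cd(age=22, gender="male"))
```

The first function is the mathematical model as an equation; the second is a one-level decision model — a real decision tree would chain many such questions.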
Compute or predict whichever attribute I want: if we know three of them, we can predict the fourth one. This is what you call prediction — prediction from a model. The equation can be considered a simple prediction algorithm. Relationships can be a lot more complex, leading to more complex models and prediction algorithms. The equation is a very simplified model of something really simple; as the problem gets more complex, it will lead to more complex learning, more complex models, and more complex prediction algorithms. So that is what learning is all about: data sets, relationships, modeling, and prediction. Now let's talk about what predictors and outcomes are. Whenever we talk about data in data science, we talk about predictors and outcomes. So what are they? Outcomes are attributes that you want to predict. Whichever attributes you want to predict are called outcomes — in the earlier formula, we want to predict blood pressure, so it is called the outcome. Predictors are attributes that you use to predict the outcome. You have a set of attributes; the one you want to predict is the outcome, and everything else that you use to predict the outcome is called a predictor. You might have ten attributes in your data set: one of them may be your outcome, and three others may be your predictors. Not all attributes have a relationship with the outcome attribute — only those that have a good relationship with the outcome variable will become predictors. Predictors and outcomes will obviously show some form of relationship, because that is how you can predict outcomes from the predictors. So learning is all about building models that can be used to predict outcomes — the output — using the predictors — the input. Here are some examples; we are going to go back to the same three examples.
In the case of a customer, the predictors are age, income range, and location, and the outcome can be: is the customer going to buy your product or not. For a patient, the predictors can be age, blood pressure, and weight, and the outcome can be: is the patient diabetic or not. An example for a car might be: the predictors are things like the number of cylinders and acceleration, and you might want to predict whether the car is going to be a sports car or a family car. These are what you call predictors and outcomes. One of the most important things to understand is humans versus machines. Humans understand relationships and predict all the time. It happens in the human brain without us even being conscious of it: we keep collecting data, we keep understanding relationships, we keep building models in our heads, and we keep predicting. Any time you say, 'I think this is going to happen,' it means you are using a model that you built inside your head to predict something. If you say, 'I think it may happen,' it is a weak model; if you say, 'I am 100% sure this is going to happen,' it is a very strong model. But a human being can only handle a finite amount of data. Take shopkeepers, for example — you have seen them. They know their best customers, their long-time customers. They know what their customers like and what their customers want, and whenever a customer comes in, they usually address them by name and immediately know what the customer wants, even without the customer asking, and they go and get the item for them. But a human being can only handle a finite amount of data, so they can know the preferences of a hundred customers — not ten million of them. What happens then? That is where machines — computers — come into play. We want to store all this customer information in computers
and let the computers learn about the preferences and help you. Machines come into play when the number of entities and the data volume is large — huge — and that is where machine learning comes in: when you start working with computers to collect all the data, do all the learning, build all the models, and do the prediction. That is when it becomes machine learning, predictive analytics, and data science. So what is data science? Entities, relationships, modeling, and prediction. It is all about picking a problem in a specific domain; understanding the problem domain — the entities, the attributes, the behaviors, and the events; collecting data sets that represent the entities — you go and collect all the data that you need; and then discovering relationships from the data. That is what you call learning, and when computers do it, it is called machine learning. Machine learning is not something out of this world: it is all about machines learning about certain things — discovering relationships from the data and then building models that represent those relationships. The model can be a mathematical model, it can be a decision tree model, and there can be other types of complex models too. What we do when we build models is use past data. With past data, you know both the predictors and the outcomes — you know the values of the predictors and the values of the outcomes — and using those values you establish relationships, and from the relationships you build models. Once you build a model, you can then start predicting for current or future data, where you know the predictors but you do not know the outcomes. So you use the past to learn and build models, and then you predict the future, where you do not know the outcomes. Here is an example of what a website-shopper analysis would look like in data science.
The problem would be to predict whether a shopper will buy your smartphone, and what you are going to do about it. You get the past purchase history of all the shoppers. You collect shopper characteristics like age, gender, and income level. You collect seasonal information about when they buy — what kinds of things they buy during winter versus summer, versus Halloween, versus a normal Wednesday. You collect all the relevant data that is there. Then you build models — models that describe relationships, about what goes up and what comes down when the customer buys and when the customer does not buy. You basically try to relate the other attributes that you know to the outcome: you look at the values of the other attributes when customers are buying, and the values when customers are not buying. Say you see that when the value of the attribute age is greater than 25, the customer buys, and when age is less than 25, the customer does not buy — there is a relationship. You try to use this relationship to build a model, and then you try to predict: whenever you see a customer whose age is greater than 25, you say, yes, this person is going to buy. So you make predictions. When a new shopper is browsing, you use the model to predict, in real time, whether the shopper is going to buy a product or not. And what will you do with the prediction? Now that you know whether the customer is going to buy or not, you can take actions, like offering chat help. These days, whenever you go to a website, you see a small pop-up come up saying, do you want to talk to a live agent? Live agents are costly — they are human beings, and you pay them a lot of money — so you only want to offer live-agent help to shoppers who you think are going to buy your product. That way you can make an intelligent decision as to which shoppers you want
to offer a live agent, based on this prediction. This is an example of how data science would work for you. Thank you. 7. Data Science Use Cases: Hello, this is your instructor Kumaran here, and we are going to be looking at some data science use cases, to see how the world is benefiting from data science. The use of data science is growing exponentially — it has been growing exponentially for the past few years — and it is spreading across multiple domains: business, science, finance, and personal life too. Recent advances in computing power, in terms of hardware and software — a lot of open-source software is coming into the world, like the whole Hadoop ecosystem, along with predictive algorithms — have combined to make it very cost-effective to apply data science commercially these days. Okay, let's see some examples of using data science. Let us start with finance; finance is all about making money and saving money. Fraud reduction — credit card fraud reduction — is a very important application where data science is being used. What happens in credit card fraud is that fraud exhibits certain patterns. Whenever you look at transactions related to credit card fraud, they exhibit some pattern, some kind of relationship between the various entities and their attributes, and it is these patterns, captured in the historical data, that are used to build models of fraudulent transactions. The historical data has good transactions and fraud transactions, and they are used to build models of what a fraudulent transaction is going to look like. Whenever a new transaction occurs, that transaction is immediately evaluated, using computers and the model, to come up with what is called a fraud score.
A fraud score basically tells you whether a particular transaction is fraudulent or not. It is a score, say from 1 to 100, and whenever the score crosses a specific threshold, the transaction is immediately flagged as a possible fraudulent transaction. Then some action is taken, like calls being made to the credit card owner asking whether they really made these transactions. Sometimes the credit card is immediately blocked from further transactions until the verification is made. So fraud detection is a very important application of data science in the financial world. The second application is in retail. You will have seen that whenever you go to a website, do your shopping, and put some items in your shopping cart, you immediately see some recommendations coming up — in the case of Amazon, you would immediately see a recommendation like 'items frequently bought together.' How do they make these recommendations? Again, items exhibit patterns in how frequently they are bought together — like cell phones and accessories, or certain books. Items that are frequently bought together exhibit affinity patterns. Based on that, they build what are called affinity scores between items: between any pair of items there is an affinity score assigned, and the higher the affinity score, the more frequently those items have been bought together. What happens next? Whenever one of those items is bought by a new shopper, items with high affinity scores to that item are immediately recommended. So you use the affinity scores to recommend more items to the shopper, with the idea that if past shoppers bought them together, possibly the next shopper is going to do the same, and that allows you to do more cross-selling and up-selling.
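The affinity-score idea above can be sketched very simply: count how often pairs of items appear in the same basket, then recommend the highest-scoring partners of a newly bought item. The item names and baskets below are made up for illustration; real recommenders use far more data and normalization:

```python
from collections import Counter
from itertools import combinations

# Hypothetical past shopping baskets.
baskets = [
    {"phone", "phone case", "charger"},
    {"phone", "phone case"},
    {"phone", "charger"},
    {"book", "bookmark"},
    {"phone", "phone case", "book"},
]

# Affinity score for a pair = number of baskets containing both items.
affinity = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        affinity[(a, b)] += 1

def recommend(item, top_n=2):
    """Recommend the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), count in affinity.items():
        if a == item:
            scores[b] = count
        elif b == item:
            scores[a] = count
    return [other for other, _ in scores.most_common(top_n)]

print(recommend("phone"))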
Next, contact centers. Contact centers have traditionally been used for customer service, but their use has grown today to do more sales — a lot more high-end sales and support — and they have also started using data science to improve their performance. How do they do that? They have started scoring callers as well as agents. Past interactions are used to score callers based on their value: how much business value the caller has, what type of caller they are, how much business they have already done with the company. That is used to score the callers. They also assign scores to agents, based on their ability to sell — a high-selling agent versus a low-selling agent — or their ability to handle a specific type of problem, like agents who can handle a specific product, or a specific type of issue, say a network issue versus a phone issue, things like that. What they then do is try to match the right callers with the right agents, based on these scores. The idea is that once you match the right callers with the right agents, it is going to optimize your business outcomes. Then there are call recordings. Whenever you are talking to a contact center, they always say your call may be recorded for quality purposes, and what they do with these recordings is apply machine learning algorithms to them to understand the quality and outcome of the call, and use that for future enhancements. Finally, let us look at health care. Predicting disease outbreaks has been an interesting recent development: you can predict disease outbreaks by looking at what people are searching on Google and what they are tweeting on Twitter. Data is collected from public domains like Google searches and Twitter feeds, and this data is always linked with location information.
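The outbreak-detection idea just described — geo-tagged posts mentioning a disease — can be sketched as a simple count-and-threshold check. The locations, baseline counts, and the 3x threshold below are all made-up illustrations, not a real epidemiological method:

```python
from collections import Counter

# Hypothetical geo-tagged posts mentioning a disease keyword.
posts = [
    ("flu", "Springfield"), ("flu", "Springfield"), ("flu", "Springfield"),
    ("flu", "Springfield"), ("flu", "Shelbyville"),
]

# Baseline: expected weekly mentions per location (made-up history).
baseline = {"Springfield": 1, "Shelbyville": 1}

# Count current mentions per location.
counts = Counter(loc for _, loc in posts)

# Flag locations where mentions exceed, say, 3x the baseline.
outbreak_alerts = [loc for loc, n in counts.items() if n > 3 * baseline.get(loc, 1)]
print(outbreak_alerts)
```

Real systems would also model seasonality and language, but the core signal is the same: an unusual spike of disease mentions tied to one location.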
Whenever you are googling something or tweeting something, the location from where you are doing it is collected, and this information — what you are tweeting about, what you are googling about, along with the location — is used to come up with patterns: are people making these kinds of queries about a specific disease from a specific locality? The moment you start seeing patterns — people tweeting more about a specific disease in a specific location — there is a possibility that an outbreak is happening there. This kind of information is now being used to start predicting disease outbreaks. The good thing about predicting disease outbreaks is that the government can react in a more proactive manner. If you see that a disease is starting to break out in a specific locality, the government can immediately start marshaling its resources: sending preventive care, more medicines, more doctors, things like that. They can organize a couple of days in advance and prevent further outbreaks in the same area. So data science is helping to prevent, or at least manage, these disease outbreaks. These are some of the interesting applications of data science — just a few popular ones; in fact, there is a lot more happening out there. I hope you will be able to do some more reading and learn about all of them in the near future. Thank you. 8. Data Science Life Cycle - Setup: Hello, this is your instructor Kumaran here, and in this section we are going to see what a data science project's life cycle is all about. We are going to talk about data science projects, what their activities are, and how they are sequenced. Let's start with some introductory notes. Data science efforts are typically executed as projects.
When any company or business wants to do anything with data science, they typically create projects — just as, when people want to build software, they create software projects. For a project, they set some objective, some goal, and then go about executing it. Similarly, data science efforts are also executed as projects. One thing to note here is that data science projects should be considered research projects. They are not like build-and-operate projects; they do not have things written in stone that you can just go and execute and be done with. These are research projects: there is a lot of thinking involved, and a lot of rework involved, until you achieve the objective. So they should be considered research projects, not software build-and-operate kinds of projects. The projects do have a start and an end, like any other project, and the projects have phases and activities, with transitions happening between phases and activities. Data science projects involve a lot of back and forth between the phases, so it is not a linear waterfall model — it is more like an iterative model, if you want to relate it to something from software development. In this section we will talk about what the data science project phases and activities are, the importance of each of these activities, how they transition from one to another, and also some best practices. Here is an overview of the data science project phases and activities. You will see that there are four broad categories, or stages, in a data science project: the setup phase, the data engineering phase, the analytics phase, and the production phase. In the setup phase, you are prepping the team for what they have to do. The data engineering phase is all about getting data, cleaning data, and working the data into good shape.
Then you can move to the third stage, the analytics stage. Analytics is all about exploring the data and getting some meaningful information out of it — it is all about learning and predicting. Once you finish the analytics phase and come up with some recommendations, you can then go to the production stage, where you actually build out data products that do everything you just did in an automated, repeatable fashion, and keep producing the outcomes you desire. Now moving to the first phase, which is the setup phase. The first thing you want to do in any data science project is what we call goal setting. Every data science project will — and should — have a goal. If anybody wants to run a data science project along the lines of, 'okay, let us look at the data and see what we can get out of it,' that project is doomed to failure. A data science project should have a specific goal for the team to go after. The team's efforts will all be focused on achieving this goal, and the activities will be based on what you want to achieve. Remember: projects without goals are like cars without a driver. If somebody comes and tells you, 'okay, we're going to do a data science project — just look at the data and see what we can come up with,' that project is going nowhere. That has been the experience of many, many people who have tried to do data science projects. Some examples of goal setting are: predict which customers will churn in the next three months; take the tweets we are getting about our company and group them based on the sentiment of the tweets; or identify patients who have a possibility of getting a heart attack in the next three months. So you are going to predict customer churn, you are going to predict the sentiment of tweets, or you are going to predict which patients will have heart attacks.
The goals can be anything like this, but the most important thing is to have a well-defined goal before you start on your project. The second very important thing you want to focus on is understanding the problem domain. Even in software projects, I would say that understanding the business domain is a good thing; in a data science project, it is necessary for all team members to have some basic understanding of what the business problem domain is all about. When we say problem domain, we are talking about the business basics: whether you are in the finance field, the CRM field, or the medical field, understand some basics about the business — what is the business all about, how does the business make money, what business processes are involved, what are the workflows, and what are some of the key performance metrics in the business. In larger data science teams, there is always somebody called a domain expert. Domain expertise is a critical part of a data science team, so large teams typically have a domain expert, who may not be a technical person — not a statistics person, not a programming person — just somebody who knows the business. You keep them on the team to help you understand the problem domain. Why is this important? Machines only know numbers and strings; they only do garbage in, garbage out. They need humans to associate any meaning with those numbers and strings. Machines don't understand business; human beings understand business. In data science, it is important for you to understand and validate anything the machine comes up with, and that can only be done by humans — and for humans to do that, they need an understanding of the problem.
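In practice, some of that domain understanding gets encoded as explicit validation rules applied to the data. A hypothetical sketch, where the valid ranges stand in for what a domain expert would supply — the bounds below are illustrative assumptions, not medical reference values:

```python
# Hypothetical valid ranges supplied by a domain expert.
VALID_RANGES = {
    "age": (0, 120),   # years: nobody is 600 years old
    "ldl": (10, 400),  # illustrative bounds, not medical guidance
}

def validation_errors(record):
    """Return the fields whose values fall outside the domain ranges."""
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(field)
    return errors

print(validation_errors({"age": 600, "ldl": 120}))   # impossible age
print(validation_errors({"age": 45, "ldl": 1000}))   # needs expert review
```

The code is trivial; the hard part — knowing that 600 is an impossible age but being unsure about an LDL of 1000 — is exactly the domain knowledge the expert contributes.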
Knowledge of the domain helps the team to understand the entities involved, the relationships, and the patterns. Any kind of knowledge discovery you do, you need to validate, and that validation can only be done if you know what the problem domain is all about. This understanding of the problem domain helps you validate all the assumptions and, more importantly, helps you identify errors. Say the data has some errors — how do you know? For example, you are looking at a data set and the age of a person shows up as 600 years. The moment you look at it, you know that it is a wrong number, because nobody is 600 years old. But you can only do that because you know the domain: age is a very commonly used term, and everybody understands what it is about. But what about something like cholesterol level? How do you know what a valid cholesterol level is? If somebody has an LDL level of 1000, is it possible? Is it a normal number, a high number, an invalid number? You can only tell if you know the domain, and that is why a domain expert is required. After understanding the domain, the next phase is understanding the data associated with it. We have covered data in some of the other sections, so back to it: business processes and workflows generate data — a lot of data, some captured, some not. Where the data is captured, there are multiple sources: application data entered in various entry applications, reports, visualizations, automated data coming from sensor feeds, web clicks in a browser — every click is also a data feed — point-of-sale transactions being recorded, and social media data feeds as well. All of this is business data being generated through multiple sources and stored in multiple systems.
Some of it is on the corporate network, some of it is outside; there is data everywhere that you might want to use. Data, of course, can be structured, unstructured, or semi-structured; again, we have seen this before. Data has different origins and is stored in different silos, and it might have a lot of logical relationships. Relationships, of course, are the key to any kind of data management. Understanding the data you have is a very important thing for a data scientist. What is it that you want to understand about the data? You want to understand the source of the data. How reliable is the data? Is it machine-generated, or is it entered by human beings? Is there a possibility of somebody manipulating the data entry, or putting in wrong data and getting away with it? How good the data you use for your analysis is determines how good your predictions are going to be. So the data has to be valid; you have to make sure the data has not been manipulated by somebody for some other reason. You need to understand what kinds of processing and transformation steps have been performed on the data. More importantly: has some data been discarded by somebody during processing because they thought it was not important? Is some duplicate data making its way into the processing? Are you losing some data because of summarization? All those things you need to understand. You also need to know how the data is stored, whether in enterprise databases, in the cloud, or in file feeds, and how the data is synchronized between these different sources. When somebody enters data in place A, the data might also be going into place B, so how do the two really synchronize with each other? And what relationships exist within the data? Let's see what kinds of things:
Things like foreign-key relationships between the data: the ID here should match the ID there, and so on. Also the order of creation: which data gets created first? Something like: an agent first goes and enters something in system A, then goes and creates a record in system B, then does something in system C. This is where the understanding of your business process helps you understand how the data is created and in what order. Understanding the data also helps the team identify possible sources of your predictive patterns. Where are these patterns coming from? You always validate, whenever you see a pattern, whether it is valid or not, so it is important for you to understand where the data came from and how it was created, and to understand the patterns themselves. Sometimes the patterns are artifacts of the way the data was created. These things are really too complex to explain fully at this point, but an understanding of the data in general is a good thing for a data scientist to have.

9. Data Science Life Cycle - Data Engg: The next phase we're going to talk about is the data engineering phase, where you set up your data. Data engineering is all the dirty work you have to do to get the data from various sources into the form you want it to be in. The data is out there all over the place, often unmanaged. You've got to get your act together, gather all the data, clean it up, and put it all into one single, logical, nice destination where you can then do any of your further analysis. The first stage in data engineering is data acquisition: acquiring data. Here your job is to acquire data from different data sources. It may be in enterprise databases, maybe sitting in an Oracle database or MySQL databases. Acquisition might have to be done through cloud APIs; there are a lot of applications on the cloud.
These applications give you APIs on the cloud, like Salesforce, for example; you've got to go and get the data through the APIs. The data might be coming through scanner or sensor feeds, like barcode scanners. It may be coming through social media; you might have to download social media data from Twitter and Facebook. All of them are sources of data, and each of them presents a different kind of use case and a different kind of challenge for you. Sometimes the data feeds might be coming in real time, sometimes in bulk, sometimes at intervals; all of that creates different problems for you. One of the most important things about data acquisition is sanity checking: making sure that you have got all the data that you need and that no data has been lost in the transport layer. So sanity checking is an important part of data acquisition. Acquisition is the most cumbersome and time-consuming step to set up. Why is it cumbersome and time-consuming? Because when you have all these data sources, what comes with them are things like security. There are people who own these databases; there are security policies involved; there are sharing policies involved. So you are going to spend a lot of time establishing connections to the machines involved, and to the human beings who control those machines, and this can be really frustrating. As the data scientist, if you are really close to the data, that is like heaven: if you are already in the IT department and the data is also in the IT department, you possibly don't have a lot of issues. But maybe you are not in the IT department, or you are a consultant in a different department, and your data is sitting in enterprise databases, or it is sitting in the cloud.
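The sanity checking mentioned above can be as simple as agreeing with the data owner on a row count and a checksum for each feed. A minimal sketch, assuming a hypothetical hand-off protocol where the owner publishes both values:

```python
import hashlib

def transport_ok(payload, expected_rows, expected_sha256):
    """Sanity-check an acquired feed: the row count and the checksum
    must match what the data owner reports (hypothetical protocol)."""
    rows = payload.strip().splitlines()
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return len(rows) == expected_rows and digest == expected_sha256

feed = "id,amount\n1,10\n2,20\n"
expected = hashlib.sha256(feed.encode("utf-8")).hexdigest()
print(transport_ok(feed, expected_rows=3, expected_sha256=expected))
```

If either check fails, you know data was lost or altered in the transport layer before you spent any analysis time on it.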
When your data sits in other departments or in the cloud, it becomes all the more cumbersome: talking to all the people involved, explaining to them why you need the data, what data you need, and in what format, and getting them to share it. Going through all the organizational red tape involves a lot of time and effort. So this is a very cumbersome, frustrating step; this is the dirty work you have to do. Next comes data cleansing. Once you get the data, you have to cleanse it. Why? Because data has different degrees of cleanliness and completeness; not all the data you get is clean and complete. Structured data from corporate applications, say data sitting in a database, is usually clean and complete, so you don't have to worry about it: it is already clean, already complete, already in the format you want it to be in. No problems. But data that you're getting from the Internet, from social media, or from voice transcripts might need significant cleansing: it can be dirty, incomplete, and in all kinds of formats. If you look at any Twitter feed, the tweets are not complete sentences; there are a lot of abbreviations and all kinds of junk sitting there, and all of it needs to be cleaned up. Next, examine missing data; that's another big, important point. You might be missing attributes for certain columns, or missing values for certain attributes. How are you going to handle them? Are you going to give them a value? If you put something like the mean there, your machine learning algorithm doesn't know it is imputed and will treat the mean as the real value. If you put zero as the value for some number, your algorithm will take zero as an actual value. How do you tell your machine learning algorithm that zero means "not available" here but is a real value somewhere else? It's not an easy thing to do.
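The missing-value decision above can be sketched in a few lines. This is only one option, mean imputation, shown under the assumption that missing entries arrive as `None`:

```python
from statistics import mean

def impute_missing(values):
    """Replace missing entries (None) with the mean of the values
    that are present. Filling with 0 instead would mislead a learning
    algorithm, which cannot tell 'zero' apart from 'not available'."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]

print(impute_missing([10, None, 20, 30]))
```

Note the trade-off the lesson describes: the algorithm downstream will treat the filled-in 20 as a genuinely observed value, which may or may not be acceptable for your use case.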
A lot of times you have to put a replacement value in there, and that may affect your machine learning algorithms, so missing-data handling is a very key decision to make here. Other cleansing examples are things like normalizing date formats: sometimes dates arrive as mm/dd, sometimes as dd/mm, in all kinds of formats, and you want to normalize and standardize them into one format before you can start using them. The same goes for standardizing decimal places: sometimes the data comes as 1.23, and sometimes the exponential format is used for a number; all of that needs to be standardized. Then, once again, there is the classic one: is it "last name, first name" or "first name, last name"? How are names represented in the data? Whatever formats they arrive in, all of them need to be standardized; that is one part of the cleansing process. More importantly, if you are getting text feeds from somewhere, you have to do a lot of cleansing on the text; text cleansing is a whole domain in itself. All of that work needs to be done before you can start using the data for any further analysis. Next is data transformation. Data, after cleansing, might have to be transformed into a different format or a different shape before you start using it. The reason for data transformation is to extract information from the data while discarding unnecessary baggage; what counts as unnecessary baggage is again determined by the goal with which you are examining the data. If you don't need some data, or some level of detail, you can summarize it and discard the rest. Typical transformations involve processing and summarization: you summarize records at logical activity levels, and transformations help cut down the data size and minimize the processing needed later.
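One of the cleansing chores above, normalizing mixed date formats, might look like this; the list of input formats is an assumption, since a real feed may carry others:

```python
from datetime import datetime

# Assumed input formats seen in the incoming data; extend as needed.
KNOWN_FORMATS = ("%m/%d/%Y", "%d-%m-%Y", "%Y-%m-%d")

def normalize_date(text):
    """Re-emit a date written in any known format as ISO yyyy-mm-dd."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

print(normalize_date("03/25/2016"))  # → 2016-03-25
print(normalize_date("25-03-2016"))  # → 2016-03-25
```

Beware that a date like "03-04-2016" is genuinely ambiguous; code can standardize formats, but only domain knowledge about each source tells you which convention it uses.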
Why do you want to do transformations? You want to get the data into a shape that you can understand better. For example, you can collapse a number of records into one logical record that represents the entire thing that happened. One example: a visitor comes to a website and clicks through a number of pages. You might want to summarize all of those clicks into one single record, if that is the only level of detail you need. You might also want to do language translations between multiple languages. If there is medical sensor data coming in, say a sensor that captures your blood pressure every second and sends you a reading, maybe you want to summarize it by interval: take a 30-minute interval and say, in this 30-minute interval, what is the maximum reading, what is the minimum reading, what is the average reading, things like that. It again depends on your use case what kind of transformation and summarization you want to do. After transformation comes data enrichment. Enrichment is about adding additional attributes to the data that improve the quality of the information; you want to add more information to your data that can make your analysis a lot better. What kind of information can you add? For example, you can add demographic information from a customer database to a point-of-sale transaction record. The point-of-sale record is just going to have the customer name, the customer's credit card number, and what products were bought. Now you can get the customer's demographic information from a third party, things like the customer's age, marital status, education, and income level, and you can attach that to this data.
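The 30-minute blood-pressure summarization described above can be sketched as a small bucketing step; the timestamps and readings here are made-up (and sparse, to keep the example short):

```python
from collections import defaultdict

def summarize_by_interval(readings, interval_secs=30 * 60):
    """Collapse (timestamp_secs, value) sensor readings into
    per-interval (min, max, avg) records, shrinking the data set."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts // interval_secs].append(value)
    return {
        bucket: (min(vals), max(vals), sum(vals) / len(vals))
        for bucket, vals in sorted(buckets.items())
    }

# Hypothetical blood-pressure readings at various second offsets.
readings = [(0, 118), (900, 122), (1799, 120), (1800, 130), (2700, 126)]
print(summarize_by_interval(readings))
```

A per-second feed of 86,400 readings per day collapses to 48 interval records, which is exactly the "cut down the data size" goal of this step.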
Once you attach that to the data, what you can do is some analysis as to which products which people buy. Take, say, milk: who buys milk? Is it people who are male or female? Is it people who are over 25 or below 25? You can do all those kinds of analysis once you enrich your data with additional information. Similarly, you can build logical groupings of patients by past medical history: you can attach a patient's past medical history to his current visit, and then look and see how people with different kinds of medical history fare, or what kinds of treatments are performed on them. So enriching data is a very important step: adding more data, more meaningful data, gives you better insights into the data you already have. Once you're done with all of this, you're going to persist your data: you save your data in some neat, sensible fashion. Processed data is stored in a reliable, retrievable data sink. You want to process all your data and put it into a nice, reliable, retrievable data sink, with all the relevant information captured in a single record as much as possible. You have data coming from multiple sources; the best thing to do is organize it as one logical record, one single long record that contains all the information you need. You shouldn't be doing a lot of foreign-key kinds of lookups; you rather want to denormalize the data and put it all in the same record, so that further querying and analysis is made really easy for you. An example would be a point-of-sale transaction: you can take the point-of-sale data, add the customer demographic information to it, and add the item characteristics to it, like the type of item purchased: is it dairy, is it produce, things like that.
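The demographic enrichment just described is essentially a keyed join. A minimal sketch, with hypothetical field names (`customer_id`, `gender`, `age`):

```python
def enrich(transactions, demographics):
    """Attach third-party demographic attributes to each
    point-of-sale record, keyed on customer_id. Records with no
    demographic match are passed through unchanged."""
    return [
        {**txn, **demographics.get(txn["customer_id"], {})}
        for txn in transactions
    ]

txns = [{"customer_id": 7, "product": "milk"}]
demo = {7: {"gender": "F", "age": 23}}
print(enrich(txns, demo))
```

The output record carries both the transaction and the demographics in one flat record, which is also the denormalized, single-record shape the persistence step below asks for.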
You can also add sales-associate performance information to the record, so that you can then do analysis of a sales associate's performance based on the product being sold, based on the customer demographics, and so on. So you want to put all of it together in a single straight record and store it; that is the step called data persistence. Finally, scaling and query performance are pretty important factors. This gets into the data architecture domain, where the data architect's job is to design your data sink in such a way that it can hold all the data you have, with reasonable scaling and good query performance, to help you in the next step, which is the analytics step. You can store the data as flat files or in traditional SQL databases, and of course today you have all the big data technologies like Hadoop and Hadoop-based databases, like HBase, in which you can store your data. This completes the second phase of a data science project.

10. Data Science Life Cycle - Analysis and Production: Hello, this is your instructor Kumaran here, continuing on the data science life cycle. The third phase is analytics, where you're trying to learn from the data and do your predictions. The first step in analysis is what is called exploratory data analysis, or EDA in short form, a very famous abbreviation in data science. What are you going to do in EDA? You want to understand individual attributes' patterns. Say you take age as an attribute: you want to understand things like the range, the minimum and maximum values, the frequency distribution, the mean, things like that.
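The single-attribute EDA just listed (range, central tendency, frequency distribution) fits in a few stdlib calls; the ages below are made up for illustration:

```python
from collections import Counter
from statistics import mean, median

def profile_attribute(values):
    """Basic single-attribute EDA: range, central tendency,
    and frequency distribution."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "freq": Counter(values),
    }

ages = [25, 30, 30, 41, 30, 25]
p = profile_attribute(ages)
print(p["min"], p["max"], p["median"], p["freq"].most_common(1))
```

In practice you would run a profile like this over every attribute as the very first look at a new data set, before any relationship analysis.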
The next thing you want to do is understand relationships between the attributes: what is the relationship between age and buying pattern, or the relationship between income and gender, things like that. How does a change in one affect the other? In other words, you're learning all about relationships in this phase: you draw some graphs, do some analysis, and understand more about what you see in the data. You then do reasoning: is the behavior explainable? Whatever relationships and patterns you're seeing in the data, is there an explanation for why they are so, or not? If you don't find an explanation, then possibly there is an error, or maybe it's a new pattern; that's something you want to discuss and then figure out. You also look at outliers and decide what to do with them, whether to include or exclude them, depending on what the outlier values are; it's a use-case-by-use-case decision. Possible errors in processing can also be found through exploratory data analysis; that is a very good use of the process. Take the example of patient weights we discussed a few slides back: the moment you see a weight of 600, you immediately know something is wrong with it; that is a possible error. Then there are what you call outliers: suppose there are a couple of patients who are 70 or 75 years old and everybody else is less than 40 years old. Maybe you decide to eliminate those two records as outliers; that is one possible outlier-handling decision. And of course you want to understand the relationships between the patient's weight and the diabetes level, the cholesterol level, the family history, and so on. Finally, you validate your findings with the domain experts: you say, hey, this is what I'm seeing in the data, does it match what you know?
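The outlier decision above (the 70- and 75-year-olds among mostly under-40 patients) is often started with a simple rule of thumb. One common sketch, assuming a "more than k standard deviations from the mean" criterion, which is only one of many possible definitions:

```python
from statistics import mean, stdev

def flag_outliers(values, k=2):
    """Flag values more than k sample standard deviations from the
    mean. Whether to keep or drop them is decided case by case."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

ages = [31, 28, 35, 40, 33, 29, 75]
print(flag_outliers(ages))
```

The code only flags candidates; whether a flagged value is a data error, a rare-but-real case to keep, or a record to drop is the judgment call the lesson describes.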
You want to talk to the domain experts and see whether your findings gel with what they already know, or whether it is something new. The next step is inferential analysis. What you do in inferential analysis is look for signals: you look for patterns, you look for consistency in those patterns, you look for correlations, you look for reasoning. This overlaps somewhat with exploratory data analysis, but inferential analysis is more in-depth, more focused, and more methodical. You check whether the patterns are consistent and reproducible. What we mean by consistent is: do you see the same pattern month after month after month? Say you see that as weight increases, cholesterol levels increase. Does that happen for your patients every month? Every month you get a new set of patients; do you keep seeing the same pattern? Do you see the same pattern across cities, across countries, across different races? All of that is part of inferential analysis. Then you do some statistical tests to see whether the findings from the data set you have can be extrapolated to the entire population. Say you have data from San Francisco: will the results be the same if you extrapolate to the entire US, or the entire world, or will they be different? All those tests you do as part of inferential analysis. Again, take the example of patient weight versus diabetes. You might first take data from one state, say California, do the analysis, and then see how California compares with New York: are the patterns I am seeing in California the same there? Then you look at races: do Asian Americans in California show the same pattern as Asian Americans in New York? Do Asian Americans show the same pattern as African Americans, versus other groups?
So you do all these kinds of segmentation and profiling during inferential analysis, and you come up with and validate all your findings during this process. Once you're done with inferential analysis, the next stage is modeling. This is where all your machine learning algorithms kick into play: you apply machine learning algorithms to build models. In model building you typically try to build multiple models using different algorithms on different data sets. There are techniques in machine learning for how you segment your data sets into multiple subsets and then use them to build and test models, and for how different algorithms can be used; this is what the domain of machine learning is all about. If you take a course in machine learning, it is essentially this one line expanded through the entire course. You of course have to test your models for accuracy; again, there are methods in machine learning for how you do that. Finally, you identify your best-performing models. When we say best-performing, we talk about accuracy, response time, and the resources used, so you again have to make some trade-offs as to what your best-performing model is. Let us say one model gives you 80% accuracy and takes one minute to run, and another model gives you 85% accuracy but takes one hour to run. Which matters more to you? Are you hung up on the 85 versus the 80, or is it okay to have 80% accuracy with a reasonable response time? You have to look at all three things, the accuracy, the response time, and the resources used (the computing power required for building models), and then decide what your best model is. The model you build at the end could be as simple as a decision tree or an equation, or it can be as complex as a neural network.
Which it is depends on the problem and the data in question. At the end of the process, you have a model you selected based on the different algorithms and the different tryouts you ran while building models. Then you're going to go and do your predictions using new data; again, there are ways to test the predictions and test your models, which you will see as part of machine learning courses. You have to keep validating your model accuracy. You don't just build one model, test it once, and walk away; you will be trying different models, sometimes even combinations of different models, and seeing which one gives you the best accuracy possible. You'll be trying multiple variations in this process of trial and error. That's why I called it a research project at the beginning: you're going to do a lot of research here, try different things, and see which one works best for your specific project. Response time and resources are especially critical when you have to make predictions in real time, like a web shopper who has just come to your website and is browsing through it, making clicks, and you want to predict in real time whether the shopper is going to buy or not. Those decisions have to be made in real time, within a second, with as minimal resources as possible, so you pick your algorithms accordingly. You also want to keep measuring improvements as you work through different combinations of the prediction algorithms. Prediction algorithms have two parts: one is the model-building part, and the second is the prediction part, and you have to look at performance for both of them. Sometimes an algorithm takes longer to build the model but is very fast at the prediction part; different algorithms behave differently there.
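The accuracy-versus-time trade-off above (the hypothetical 80%/1-minute model versus 85%/1-hour model) can be reduced to a tiny selection step; the numbers below are the lesson's illustrative ones, and in practice accuracy would come from held-out testing and timings from profiling:

```python
# Hypothetical measurements for two candidate models.
candidates = [
    {"name": "model_A", "accuracy": 0.80, "train_minutes": 1},
    {"name": "model_B", "accuracy": 0.85, "train_minutes": 60},
]

def best_within_budget(models, max_minutes):
    """Pick the most accurate model that fits the time budget --
    the accuracy/response-time trade-off in miniature."""
    feasible = [m for m in models if m["train_minutes"] <= max_minutes]
    return max(feasible, key=lambda m: m["accuracy"])

print(best_within_budget(candidates, max_minutes=5)["name"])    # → model_A
print(best_within_budget(candidates, max_minutes=120)["name"])  # → model_B
```

The point is that "best model" is defined by the budget, not by accuracy alone; change the budget and the answer changes.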
So again, you have to keep measuring how all your algorithms perform, keep comparing them, and see which one is the best one to choose. You might even have simulations. A simulation may be as simple as a mathematical simulation, or you might build software that can simulate certain use cases. A simulation is used to validate what your algorithm is saying: that in this given situation, this would be the outcome. You build a simulator that can simulate the environment, simulate what the entity is doing in that environment, and then see if the outcome you're predicting is what you actually get. Simulations are complex pieces of software; sometimes people do build them just to validate the predictions. Once you do all this model building and prediction, the last step is to come up with a set of recommendations. At the end of the project, recommendations need to be provided to the project owners: here is what we have done, here are the algorithms to be used, and here are the expected benefits. You put all of it together in a nice presentation and present it to the product owners. And here comes a catch: a data science project may have no recommendations to make if the data does not exhibit any explainable patterns. We have been talking about data science being all about learning from relationships. If the data you have does not exhibit any pattern between the outcome and any other variable, if the outcome is not predictable from the data you have, there is nothing you can predict; it is as simple as that. That does not mean the data science project is a failure. You can have a project whose goal is: let's look at our customer database and see if we can predict customer churn, and at the end of the project,
you can come out and say: based on the data that we have, we cannot predict customer churn. That doesn't mean the data science project is a failure. A data science project will only work if the data has patterns, so it is not the fault of the data scientist if the data does not exhibit any patterns. Of course, it is the fault of the data scientist if the data has patterns and the data scientist fails to find them; but if the data does not have any patterns, it is not the data scientist's fault. This is another important thing to note. Sometimes unexpected patterns are discovered that may lead to other benefits. You might be looking at the data with a specific goal in mind, like predicting customer churn, but you may see some nice patterns that could be used to predict something else; you might use that data to predict upsells, for example. So a data science project might have side benefits. You might say, okay, I see this nice pattern here, maybe we should dig in deeper; then you create another data science project for that and continue down that path. A data science project can come up with these side benefits, and in fact a lot of them can come up during the process once you start looking at the data. And of course, you finally make a presentation of the recommendations to the stakeholders. The last thing to note here is the iterations that are required. Even though the steps are listed here as if they're supposed to be done in sequence, you do keep going back and forth between them, based on intermediate or end-of-analysis feedback. After you do all your analysis, you share it with the domain expert and with the other project stakeholders.
They might come back with feedback that forces you to go back and redo the analysis based on the new light that has been shed on the data. People may have different objectives and different perspectives, and that might give you new triggers to go back and look at the data. This is common in data science: people respond to the findings in the data, and that can take you down multiple analysis paths. Then comes the final phase, the production phase, or the productionizing phase, where you implement continuous processes that utilize all the work done in the earlier phases and start running on a continuous basis. Here comes what is called building data products. So what is a data product? A data product is an application that works on data, gets something out of the data, and uses it to achieve some objective; it's as simple as that. Once the data modeling and prediction algorithms are firmed up and you know exactly what you have to do, then you go build a data product. Building a data product is basically productionizing: taking the code that is no longer changing from the analysis phase and making it production-quality code, with all the error checking in place and all the management and monitoring in place, so that it will do all the steps we have talked about, all the data engineering steps. You automate getting data feeds from all your data sources, and then you automate these applications to run regularly, look at the data coming in, and start cleansing the data, transforming the data, and persisting the data. Then all your analysis code kicks in: it will look at the data regularly and start building models.
So all of these are data products; in a word, they have to run continuously and regularly, keep ingesting data, and keep producing these models. And of course the prediction part, which runs in real time, in batch, or whatever way it has to run, is another data product: it runs regularly and uses the model that has been built to make predictions when and wherever they are required. Building these data products, the final part, is more like software engineering; it is really more of a software project, because you know exactly what the data product has to do and you just need to convert it into a software product. You need quality software rigor in both development and testing, and it can be deployed in enterprise or cloud models, depending on what the data product is supposed to do. The most important thing here is that you also need operationalized data feeds from all the data sources; they have to be continuous. When I say continuous, it need not be instantaneous, where you get the data the moment it happens: sometimes you get it daily, sometimes once every 15 minutes, or at 30-minute intervals. It depends on your use case, but the feeds have to be operationalized so that the data keeps coming regularly; you don't have to chase somebody every day to get the data, it's all automated. And as we discussed, the data products perform all the cleansing and transformation, plus error reporting; error reporting is a key thing you want to be doing here. Finally, purging of old data might be necessary. As you keep accumulating data, there's going to be a lot of it, especially the raw data once you have transformed it into the form you want; you might keep the raw data for 10 or 15 days and then throw it out.
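The 10-to-15-day raw-data retention idea above can be sketched as a small purge step; file names, dates, and the retention window are all hypothetical and a use-case decision, not a fixed rule:

```python
from datetime import date, timedelta

def purge_old_raw(files, today, keep_days=15):
    """Keep only raw-data files newer than the retention window.
    Older raw data can be dropped once the transformed copy exists."""
    cutoff = today - timedelta(days=keep_days)
    return [(name, d) for name, d in files if d >= cutoff]

files = [("raw_0301.csv", date(2016, 3, 1)),
         ("raw_0320.csv", date(2016, 3, 20))]
print(purge_old_raw(files, today=date(2016, 3, 21)))
```

In a real data product this would run on a schedule alongside the ingestion and transformation jobs, as one more automated, monitored step.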
That completes all the steps you have to do in a typical data science project. But there is always something called continuous improvement. Once you deploy a data product, there are always changes in the business environment that might affect all your learnings in production. This is something to remember: everything you built as a data product, the algorithms and the models, may see its accuracy go down because of changes in the business environment, and the learnings in production have to be revalidated periodically, at appropriate intervals, to make sure they continue to show the accuracy levels they were originally determined to have. Revalidation needs to happen when the business processes change: something changes in the business process, the way the entities behave is changing, the environment in which the product works is changing, so obviously you have to revalidate everything you're doing. There might have to be a child project, a maintenance or improvement project, that comes up periodically to validate that what you have been doing is still fine. And the search for a better model should be ongoing; this is important. You can't just build once and stop there; it has to be continuous. So, in summary of what we have seen so far: data science projects follow a life cycle. Data science projects are research-type projects; there is a lot of experimentation, and sometimes no answers, and that's why we keep calling it a research-type project. In data science, the data drives the results, not the algorithms; data is more important than the algorithms themselves. Multiple iterations might be necessary before reasonable results are achieved; this is another thing to remember. There is not a straight series of stages in a data science project where things are done once and finished.
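The periodic revalidation described in this lesson can be reduced to a tiny monitoring check; the monthly accuracy figures and the drift tolerance below are illustrative assumptions:

```python
def needs_revalidation(live_accuracy, baseline, tolerance=0.05):
    """Flag a deployed model once its live accuracy drifts below the
    level it was originally validated at (tolerance is an assumption
    chosen per use case)."""
    return any(acc < baseline - tolerance for acc in live_accuracy)

print(needs_revalidation([0.84, 0.83, 0.76], baseline=0.85))  # → True
print(needs_revalidation([0.84, 0.83, 0.82], baseline=0.85))  # → False
```

A check like this, run at regular intervals against live prediction results, is what turns "continuous improvement" from an intention into an automated trigger for the maintenance project.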
Hope this has been helpful to you. Thank you for listening. 11. Skills Required for Data Science - Part 1: Hello, this is your instructor Kumaran here. Welcome to the session on data science skills and roles. It's about what the roles are and what skills are required for them. So let's talk about the skills that are required to be a data scientist. You would have been seeing a lot of diagrams like this across the web about what kind of skills are required for data science. They talk about hacking skills, math and statistics knowledge, machine learning, substantive expertise, scientific methodology, a hacker mindset, domain expertise, specialization, advanced computing. You keep hearing all these terms all over the place, and you might be wondering what the real skills are that you need to build, coming from where you come from. Most of you are going to be coming from an IT background, so what is it that you have to learn? Let's walk through what is required. Data science practitioners in general need a wider variety of skills than any other professional. That is something you want to understand and accept: as an IT professional, you can work in a relatively narrow field with a relatively narrow skill set and be pretty okay, but if you want to be a data science practitioner, you need a wider variety of skills. The computing skills you need generally fall into two main domains: you need to know a programming language, plus the workbenches and tools associated with it, and you need to know a database technology. If you know multiple of each, that's more than good, but for starters you want to at least master one programming language and one database technology. We will see what they are in the coming slides.
Math, statistics, and machine learning concepts are required at a basic level, so you need to learn them. Other skills that you might need are visualization skills, soft skills, and management skills. So let us walk through each one of them. Let's start with the most controversial skill, I would call it: math and statistics. Controversial may be too strong a word to use, but the reason I'm using it is that every time people look at the skills required for data science and they see math and statistics, it kind of puts them off. A lot of people who do engineering might have had a course in statistics, and not a lot of people liked that course, for whatever reason; there are people who really loved it, and there are people who don't like it. So the moment they see statistics and math showing up as a requirement for data science, they start wondering: I am an IT professional, why should I know statistics and math? I've never been comfortable with those topics before. I'm not saying all of us; maybe at least half of us are not comfortable. Maybe that's the nature of the subject and the nature of people. Whatever the reason, the question is: do I have to learn it, and how much do I need for becoming a data scientist? Let us first address that question. Similar to any other engineering major, understanding of some math concepts is essential for data science. Mathematics is the foundation of engineering and a lot of other subjects too, so it's good to know. The math concepts that you need to understand for data science are things like central tendency (mean, median, mode), variance, standard deviation, probability, regression, and some inferential statistics. Pretty much any book on basic statistics should cover all of these. The problem is that many of these books start showing you formula after formula after formula.
That's when you get confused and frustrated. There are, of course, some books that just focus on the concepts and not all the formulas; you might want to go after those. Expert-level skills are needed only for specialists. If a statistician is a part of the team, then that person might need expert-level skills; otherwise, basic-level skills are okay. The reason is that a lot of these techniques are already implemented as libraries in various programming languages, so you're not going to be writing a lot of code to implement any of these. You're not going to write code to compute mean and standard deviation; there are already libraries that do it. You just need to understand the concepts and be able to call those libraries to do the job. The recommendation here is: learn the basics. It is always helpful; it's good to have some statistics background if you are in the data science field. Machine learning concepts are the key area for data science. One thing you want to realize, and this is something a lot of people are confused about: practitioners of data science do not implement the algorithms. When I say implement, they're not going to sit and write code to implement the actual algorithms; they only use them. It is very similar to operating systems: not everybody in the IT field is going to sit and write operating system code. Not everybody is going to develop their own operating system. There are a few operating systems, and some people who coded them, but most of us just sit and use the operating system. In the same way, in machine learning, you are not going to go and implement machine learning code; you are mostly going to be using it. So that's one distinction you want to understand.
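To make the point concrete: the statistics concepts listed above (mean, median, mode, variance, standard deviation) are each a single library call. Here is a minimal sketch using only Python's standard-library statistics module, with made-up numbers:

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 25]   # hypothetical sample

# Central tendency
print(statistics.mean(data))     # arithmetic mean, ~18.14
print(statistics.median(data))   # → 18 (middle value)
print(statistics.mode(data))     # → 15 (most frequent value)

# Spread
print(statistics.variance(data)) # sample variance
print(statistics.stdev(data))    # sample standard deviation, ~4.53
```

No formulas were written by hand here, which is exactly the situation a data science practitioner is in: understand what each measure means, then call the library.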
If you are very interested in writing machine learning algorithm code itself, you might want to focus on the academic field or something like that; the typical practitioner won't be doing that. Multiple algorithms exist in machine learning. There are different types of algorithms based on different domains: algorithms based on math, on probability and statistics, on neural networks, on information theory, a lot of them. None of them is the best for all cases; that's why multiple algorithms exist. Each one performs differently; they give the best performance in different use cases, so you need to know all of them, and new ones are still being invented. There are the basic algorithms, and everybody is trying to come up with hybrids, especially in the academic world; a lot of them are being invented and implemented. Understanding what these machine learning algorithms do is pretty mandatory, at least the basic nine or ten algorithms. Implementations are provided in various programming languages as libraries; mostly all these algorithms are available as libraries. In fact, if you go to something like R, there are multiple libraries, multiple implementations of each of the machine learning concepts available, so there's no need for implementation. You just need to know how these algorithms work and be able to call the libraries and use them. Differences between algorithms: one important thing you need to learn about machine learning is what these algorithms are, how they differ from each other, what their advantages and shortcomings are, in which use cases they work and do not work, and what their accuracy and performance levels are. These are the things you want to understand, because that helps you pick the best algorithm possible whenever you have a data science project. The recommendation is: mandatory. You will need to learn to use them.
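As an illustration of "call the libraries, compare, and pick the best algorithm", here is a hedged sketch using scikit-learn, a popular Python machine learning library. Its availability is an assumption (the course does not name it), and the dataset is scikit-learn's bundled iris toy data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two candidate algorithms, used as black boxes -- no algorithm code written
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Cross-validated accuracy lets us compare the candidates and pick the better fit
for name, model in candidates.items():
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {accuracy:.3f}")
```

The practitioner's job is exactly this loop: try several algorithms on the same data, compare accuracy, and understand why one wins on this particular use case.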
There's no way you're getting away from it, unless you just want to focus on the data engineering part of data science. Moving on to the other technology, let's start with the database technology: SQL. When we say SQL, it's all about writing SQL statements and queries, extracting data, inserting data, and the various tools and workbenches that are available to use SQL and manipulate data. It is the most stable, most popular, and most widely used skill for data science. So yes, SQL is a very popular skill for data science. Knowledge of writing SQL is mandatory, I would say; you cannot be a data scientist without knowing how to write SQL statements. It's one of the very mandatory skills that you need. It typically comes with a database product like Oracle, MySQL, or Microsoft SQL Server; there is always an associated database product in which the data is going to reside. Even if the data product's data is going to reside in something like a Hadoop system, the source data may still be sitting in a SQL database. As for tools and workbenches: this is a very mature technology, so obviously there are a number of tools and workbenches available that make your life a lot easier when it comes to analytics and visualization. The shortcomings include scalability for very large datasets and handling unstructured data; these are the two main reasons why Hadoop came into the picture. People went and developed Hadoop technology because SQL had these shortcomings. The recommendation is: you need it if you want to work with data, so there are no two ways about it. The next database technology you want to bother about is the Hadoop ecosystem. Now, learning SQL is maybe easy, or you may already know the skill, but Hadoop is a fairly new technology that's been around for only a few years.
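The kinds of SQL statements mentioned above (creating tables, inserting data, querying it back out) can be tried without any database server at all, using Python's built-in sqlite3 module. The sales table here is a made-up example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Create a hypothetical sales table and insert some rows
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100.0), ("west", 250.0), ("east", 50.0)])

# Extract data: total sales per region
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
rows = cur.fetchall()
print(rows)   # → [('east', 150.0), ('west', 250.0)]
conn.close()
```

The same SELECT/INSERT patterns carry over directly to Oracle, MySQL, or SQL Server; only the connection setup changes.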
It is the most popular technology when it comes to problem domains where there are large datasets and there is unstructured data. That is the main reason why the whole HDFS file system and the MapReduce technology came into the picture. The tools to be learned here are, first, HDFS and MapReduce, then Hive, HBase, Impala; there are a bunch of them, and new ones are still being invented. Mahout is the machine learning library that comes with Hadoop, and it has a limited set of very scalable algorithms, so it's very functional, but the usability is not that good with Mahout. Hadoop is open-source technology, which makes it very easy to adopt: you can easily download it, install it, and use it on your own system. The main issue with Hadoop is that it is not easy to use. When I say not easy, I mean relatively, when compared to other things like SQL. It needs significant programming effort for doing any kind of job. If you look at Hadoop and its tools, it's mostly command-line tools that are available; the user interfaces are not that great, the outputs are not that great. You need to write a lot of code to do any kind of job with the Hadoop ecosystem. The tools and workbenches are just evolving; people are just coming out with tools and workbenches, still discovering the technology, and they are still works in progress. The skills development requires significant time, effort, and resource investment. It's not like you can just have a laptop on which you sit and learn all your Hadoop technology: Hadoop runs only on Linux, and if you want to install a basic Linux-based Hadoop setup, what they call a one-box setup, you probably need several gigabytes of memory on your box for it to have any kind of reasonable response time, even with simple things. So there is some resource, time, and effort investment required here.
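The MapReduce model at the heart of Hadoop can be illustrated in miniature with plain Python. This is a conceptual sketch of the map, shuffle, and reduce phases, not actual Hadoop code; in Hadoop, the same two functions would run distributed across many machines:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big models", "data keeps coming"]
print(reduce_phase(map_phase(lines)))
# → {'big': 2, 'data': 2, 'models': 1, 'keeps': 1, 'coming': 1}
```

Word count is the classic MapReduce teaching example; understanding this split between a map step and a reduce step is most of what tools like Hive and Pig are generating under the hood.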
If you want to learn about the Hadoop ecosystem, the recommendation is: check whether you are comfortable before adopting this technology. Check before you decide to dive deep into it and learn it; it involves significant time, effort, and resource investment. So remember that. 12. Skills Required for Data Science - Part 2: Programming languages. The first programming language we want to talk about is R. It is the most popular language for data science, closely followed by Python. R started out as a programming language for statisticians, mainly to do statistical work. R has been around for quite some time, more than 10 years, but of late it has evolved from just being a programming language for statisticians to supporting a lot of libraries and tools that span all stages of data science. When I say all stages of data science, I mean right from data acquisition to data transformation to analysis to visualization; it can do all this stuff. There are a lot of good libraries available that make it very easy to use, with very minimal coding involved in R. It also has a workbench where you can write one line of code, immediately run it, and see the results on the other side; you can develop your code step by step. It's amazing; it's a really good tool for doing data science kind of work, really good with data. The only problem it has, we will come to later. It's open source: R itself and the tools, IDEs, and libraries are also open source. There is a very good support ecosystem on the web; you can easily find all kinds of help. It has an excellent set of libraries that makes any job a breeze, so it's excellent in that sense. The only limitation is scalability.
And that is a big, big limitation. Its scalability in terms of how much data it can handle: the amount of data R can handle is limited by the amount of memory you have on your machine. It has to load all data into memory, and only then can it use it. That is the biggest limitation, because of which R is mostly used during the experimental phase of data science: getting the data, looking at the data, taking a small piece of data, and trying out your algorithms. That is where R is being used. The moment you get into production, if you have huge data, you have to recode your solution in another programming language that can scale. So that is a big limitation for R. But R is a very important skill for data science, because a lot of the time you're just sitting and working with data, playing with data, wrangling with data, and R is a great tool for that; you'll see R at the top of the skills required in many data science job postings. The recommendation is: choose R or Python, one of them. If you want to learn both, great, but you always have to start somewhere, so pick R or Python. The next programming language is, of course, Python, equally popular as R for data science. It comes from a different background, but it's equally popular. Python is a general-purpose programming language that can do more than just data science: you can build web applications with it, you can do a lot of things other than data science with Python. So it's a very mature language. It is, of course, open source, it is easy to program in, and it has extensive libraries, tools, and packages. The scalability of Python depends on the packages used. It is a straightforward programming language; to get scalability, you either need to build it yourself or use a package that is built for scalability.
So there's nothing different there. The recommendation here is: choose R or Python, one of them. Which one should you choose? Actually, both of them look very similar when you start using them, so it's not going to make a big difference. If you already have a background in Python, go with Python; otherwise you may want to pick R. Do some research and then choose whichever you're comfortable with. SAS, Stata, SPSS, and MATLAB are a set of products available in the market that do similar things. These are data science related workbenches, alternatives to R and Python. They have excellent support for all the basic data science functions: acquiring the data, transforming the data, statistical analysis, machine learning, all of them. The only issue with them, a big issue, is that they're not open source; they are products that need to be purchased, and they are very costly products. That is what is limiting their usage. These products have been around for quite some time; they're not new products. There are companies who have invested in them and are using them, and there they are dominant. But their usage has been relatively declining because of the emergence of the open-source alternatives; that's what is happening in this field. The recommendation here is: learn them only if your organization is already using them. I wouldn't recommend going straight into them without learning R or Python, unless your organization is already using them. That is the recommendation I have; other people might have different opinions, but this is mine. So much for the technical stuff. Domain knowledge is a very important key skill, as we have seen in other places, like when we looked at the data science life cycle and at what data science is; domain knowledge is a very critical skill.
Data is all numbers and characters; machines can only crunch them, they cannot interpret them. Domain knowledge is critical to understand what the data says, and it is a skill that you cannot just build by training; you're not going to have trainings available to build domain knowledge. It is something you have to learn on the job and build as a skill. The recommendation is: if you have been working in a specific domain, say you're working in a company which is in the finance domain or the medical domain, try to learn more about it and build upon it. Use the company expertise that you have; talk to people who know the domain well and build some skills there. If you're a fresher out of college and you want to get into data science, the recommendation is to focus more on data engineering type jobs to begin with, which are primarily programming jobs for extracting data, cleansing data, and stuff like that, and then build domain knowledge on the job. Presentation skills are another very important thing. The ability to showcase and present results in a convincing manner is a make-or-break skill in data science. You can sit and do all the work you want with data, but the most important thing is to finally convince the project stakeholders of what your findings are, and that you can only do with good presentation skills. It's not just about getting good data; you also need to know how to present the data to people in a convincing manner. That is a very important skill. Tools here include any of the graphical packages in Microsoft Office, of course, then graphical libraries in Python and R, or any of the workbenches, or JavaScript. There are a lot of tools for presentation available, and of course it is good to learn them.
And again, you may not be going to specific training on these; rather, you learn them as a part of other trainings, like R or Python, or even Microsoft Office. Data science projects usually find shortcomings in other people's work on the domain. When you're looking at data and you're saying that there is an opportunity here to improve, the first question is: why was this opportunity not exposed earlier? Who was the person working on this domain, and why was that person not aware of this opportunity before? So obviously it can look like finding fault with somebody else's work; at least some people are going to think like that, and they can get defensive. So it's very important for you to have presentation skills where you can talk to people in a way that doesn't hurt them, doesn't make them feel bad, and rather tells them: okay, this is how the data is; it's not your fault that this has not been discovered so far, but we're now discovering something here, and we're going to improve upon it. Presentation is a pretty important skill; it cannot be overstated how much this skill is required, especially if you are a senior data scientist or a consultant kind of person. Then there are the non-technical skills. You're not going to learn these anywhere; just remember that you have to build them on the job. Teamwork: more teamwork is required than in a regular IT team, and you need the ability to work with non-technical and non-cooperating teams. You'll find that a lot: there are a lot of non-technical people who are going to question the data science, and you've got to work with them. There could be non-cooperating teams; you've got to learn to work with them too. You need a startup mindset; that's why they keep saying hacker mindset. Don't expect smooth processes and well-set requirements; they're not going to be there.
There won't be smooth processes; there are going to be issues, things are going to be changing very dynamically, and you should be able to work with that. You need a hacker mentality, because you've got to hack the data to get information out of it. You should be comfortable in commotion: there's going to be commotion around data, a lot of people talking a lot of things this way and that way. You need to be comfortable in an environment that can have commotion and still be able to get your work done. You can't expect smooth processes, especially in data science. You need to be able to handle frustration, especially when you're not seeing the signals you want to see, or when the accuracy levels of your prediction algorithms are not up to the mark; there's going to be a lot of frustration, which you have to work through. And you should finally be able to handle criticism. People are going to criticize the work, the data, the research, the findings, everything. So be ready to handle criticism. This is a huge set of skills required for data science: some you can get by training, some you cannot get by training, and some you need to learn on the job. Thank you. 13. Roles in Data Science: Hello, this is your instructor Kumaran here. This presentation is about the various roles that you will see in data science. "Software professional" is a generic title; when you say somebody is a software professional, you could call them a software programmer, a software analyst, or something like that. And so is "data scientist": when you say "data scientist", it points to a broad category of roles and responsibilities. A software professional might play different roles in a team based on skills, interests, and experience, like coding, design, testing, architecture, or management.
So does a data scientist: a data scientist can also play different roles, again based on their skills, interests, and experience. This is something you want to realize: "data scientist" is a very general, broad role or category, and there are a lot of sub-roles and sub-categories in which you might actually be working. In small teams, everyone does everything, or just one person does everything; in large teams, there are distinct roles and responsibilities. So "data scientist" is a broad category in terms of role definition. What are the various roles in data science? We start with the first role, which is the data engineer. The data engineer's responsibilities are to write code; the data engineer is going to be writing a lot of code, mostly code around data: code for data acquisition, persistence, operationalization, anything to do with data transformation. All of that is usually written by a data engineer. The data engineer's job is also to help the analyst (there is a role called analyst we're going to see) with building predictive algorithms, and to help the leader (another role we're going to see) with reports and visualization. So this person is like the handyman for the team, doing a lot of coding work, a lot of iterative work, to help the other roles. The skills and experience required for a data engineer are basically what I would call an IT degree and 0 to 5 years of experience; this is like the junior programmer kind of person in your team. The skills required are R or Python, SQL, and big data, to start with. This is the position into which, if you are a fresher, or if you have one or two years of experience, you can expect to get. It does not require any kind of domain experience, but as you work in the role, you build the domain experience that is going to help you in future advanced roles.
The next role we're going to see is that of an analyst. Again, I'm saying it's a role; it might be called differently in different jobs or different companies, but here are the responsibilities associated with the role I have named analyst; it may be called something else. The main responsibilities are wrangling with the data: looking at the data and seeing what kinds of things you have to do with it, deciding what kinds of transformations are necessary, what kind of cleansing is necessary. The analyst wrangles with the data and analyzes it for different patterns: the analyst does the exploratory analysis to identify data patterns and relationships, predictive relationships, stuff like that; builds and tests predictive algorithms, looks at the various algorithms, tries them on the data, sees what kind of accuracy is achieved, and keeps iterating; and then finally produces the recommendations as to how you can use the data to produce some good predictions. So this is kind of the meaty technical role in data science, and a very critical role, as you can see. The skills and experience required for this role are an IT or master's degree, even a PhD (you will see a lot of data scientist profiles asking for a PhD degree), and 5 to 20 years of experience, a significant amount, since this is more like a software architect or senior analyst kind of position. A database, programming, or statistical background is required for an analyst; statisticians also come to play this role after they pick up some programming skills. R or Python, SQL, and big data: again, the standard set of skills required for a data scientist. There has to be significant domain experience for an analyst; an analyst needs that to be successful in the job.
So this is the role definition for an analyst: a key technical role, like the technical leader or the architect kind of role in the data science world. The next role is that of a statistician, which is a very specialized role. The responsibilities here include recommending algorithms and strategies. Statisticians are not usually in small teams; they're usually in large teams, trying to solve really complex problems. They are going to be analyzing the algorithms themselves and coming up with new algorithms if required, coming up with new strategies if required, to solve the problems. They do a lot of inferential analysis, which is more like running statistical tests on the data to see if the inferences hold good, the level of statistical validation. They are not usually found in small teams; a dedicated statistician kind of person is usually found in pretty large teams. The skills required for a statistician are a statistics degree, usually a doctorate, and 5 to 20 years of experience. Statistical workbenches are what they would usually be familiar with, just manipulating data, not really any kind of programming work; R is also used by them, more as a workbench kind of thing rather than as a programming language. Significant domain experience is again required for a statistician to do the job better. The last role we're going to look at is that of a leader. The leader is more like a project manager, a team leader, or an engineering manager, however you want to call it; this person is the manager of all the other people working in the team. Their responsibilities are primarily project management, managing the entire project, and buy-in from stakeholders: they are the one who is going to be talking to the stakeholders to get the buy-in on what predictions the team is going to produce.
People and expectation management are big: working with people both inside the team and outside the team to get things done. Managing expectations for data science is another critical job that they have to do. Visualization is a responsibility for them too: they are the ones who are going to be preparing the presentations and recommendations and stuff like that. So these are, again, like the project managers in the team. Their skill set is a professional degree; even MBAs who can do project management, with a project management certification, can do this, or somebody with an IT background, stuff like that, with 5 to 20 years of experience. Once again, they come from a management or a software engineering background. This is an opportunity for people who are non-IT people, somebody with an MBA or something, to come into data science and do something here. Project management skills are required, of course, and domain experience is preferred; it's good for the leader to have some domain experience. Again, you are not going to be seeing a person for every role; people combine roles and do the job. This is just to understand what the roles, responsibilities, and skill requirements are. Additional roles that you will find, mostly in large teams, are test and validation teams, the same way you have quality assurance teams in software development. You may have test and validation teams whose job is to write test cases for data engineering tasks and execute them, similar to software engineering quality assurance. Reviewers are another important role that you might want in your team. These are basically senior domain experts and specialists. Their job is to review the findings and recommendations and comment on them; because these are domain experts, when you come up with findings, they can review them and say whether your recommendations are good and implementable.
They can also point out any shortfalls in your recommendations or shortfalls in the analysis, things you might have missed as part of your analysis. They are basically like a devil's advocate. It is good to have them as a part of your team if you have a large team and you can afford these reviewers. Then comes the operations part: when you go to production with all your data products and are getting all your data feeds, you need people who can manage that production environment, basically maintain all your databases and their setups, run all your data products as processes, and ensure they keep working day in and day out, making sure the data is coming in when it has to come in and all your programs are running on time. So operations is another part of the data science team. Typically, they may be a part of a different team; they may not be a part of your data science team, but that is a role. Sometimes, in very small teams, the data scientists themselves play this kind of operations role. For team composition, you can start with one person; you can call them the commander, because they do everything. This is where you see job postings called "data scientist"; there are always job openings called "data scientist" where, most probably, you're expected to sit and do all these things. In two-to-four-person teams, the roles are combined between the experienced and junior members; the statistician is usually missing, one of the senior people usually acts as the analyst and project leader, and the junior people do all the coding and the handyman work. Large teams have a dedicated member for each role, and there may be multiple members playing one role, similar to a software development project. This is how it transitions from a one-person team to really large teams, and it is always recommended to
Have external review teams as a part of your skills. Cough your team set. So these are the different roles desire, different roles that you have in a data science roll group on. When you call somebody the date assigned us you're basically going to be doing one are more off these drills. Eso hope does think helped you in understanding what the rolls off the data science signs scientists are. Thank you. 14. Challenges for a Data Scientist: Hello. This is your instructor Cameron on in this section, we are going to be seeing what are devious challenges for a data scientist. One These challenges, what are these challenges that are unique to their signs? The challenge is injured are going to be both positive and negative. A challenge is a challenge. It is going to be positive. If you like the challenge and you want to work on that challenge, it is negative. If you don't like the challenge, it's all about your perspective and your preference. So a challenge is a challenge. We lough challengers in something we love some kind of challenges we do not like other kind of challenges. So it's all about our own perspective and preference. Getting into data science is a significant investment of time and effort. You're going to be doing a lot of studying a lot of prepper, so some careful consideration is required as to what you can expect in Adidas and job. Are you up to it? You? You like the challenges offered by the job, but it's always better to know a friend and then realizing later of trying to do something you do not like So what are the various challenges? The first challenge? Indeed. Our science is the technology challenge. It is a new and evolving field, especially thinks like big Data technologies. It's a new one. Evolving field. Multiple new technologies and tools are required here for data science. Well, things like programming languages like our and Pitre are okay. 
Now, people are coming up with more tools and techniques, mostly tool sets and products at a much higher level, and that is something that is still evolving. New options are coming out rapidly, with a lot of people working on new kinds of technology, and the stabilization of this technology is going to take a few years. It is very similar to twenty or so years back: when you look at database technologies, there were about twenty or thirty database products in the market. Now you see that everything has stabilized, a lot of players have gone away, and only two or three technologies are staying. That kind of technology stabilization in data science is going to take a few years, so be ready to adapt and discard as you go. With respect to technology, there is going to be continuous additional effort in learning these new technologies and adapting to them. You're going to be investing a lot of time in training and learning; it is going to be a continuous process. If you like learning new things all the time, new technologies as they keep coming out, this is going to be your dream job. The next challenge is that of the generic skills that are required for data science. The first, of course, is math and statistics. Some people love it, some people hate it, so that is the challenge. Not everyone is comfortable with math and equations: when you are using all these equations, maybe you're not that comfortable, while some people are really cool with it. As a data scientist, you have to learn the basics of math and statistics, and you might have to use them to justify your work. The question is, are you comfortable with it? If not, data engineering within data science is kind of a good alternate option that you might want to pursue. Domain knowledge is another such skill; we have seen in other presentations why it is critical.
Knowing and specializing in a domain is critical for long-term success and growth, and it's a great opportunity to learn the business side of things if you like it. But if you just want to be into your programming and you don't want to be focused on domain knowledge, then again, data engineering is a good alternative within data science. So math and statistics and domain knowledge are considered key skills in a general data science role; it's up to you whether you want to take up those challenges or not. Data itself is a challenge. Being comfortable with large amounts of data is paramount for success in data science. I've seen a lot of IT professionals who are not that comfortable working with data. The question is, are you comfortable with huge tables and Excel sheets and all kinds of data? When you have a screenful of data, numbers, and strings, are you comfortable with all of them? Bad and incomplete data can be frustrating. When you work on data, trying to wrangle with it, bad data and incomplete data can be very frustrating. Getting data into a proper shape can be a very irritating and time-consuming process: you do something, that something might take five minutes, you come back and see it is not enough, then you do something else, and you keep going on and on. This can be a very time-consuming process. So that's for data: data by itself will be a challenge in data science if you are not comfortable with it. Next, searching for signals. Whenever you're doing the analysis part, you're trying to search for signals within data; you're trying to look for relationships between various entities and attributes. So what is going to happen there? Looking for signals in data is like solving a murder mystery. This is something you will find with experience, that it is like solving a murder mystery: you are looking at clues, you are searching for the clues, you're trying to trace those clues to something.
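The wrangle-and-retry loop described above can be sketched in a few lines of plain Python. This is a minimal illustration only; the column names and records are invented for this example, not taken from the course:

```python
# Hypothetical raw feed with the kinds of problems the lecture describes:
# a missing value, a bad value, and a duplicate row.
raw = [
    {"customer": "A101", "age": "34", "spend": "250.0"},
    {"customer": "A102", "age": "",   "spend": "99.5"},   # missing age
    {"customer": "A103", "age": "29", "spend": "n/a"},    # bad spend value
    {"customer": "A101", "age": "34", "spend": "250.0"},  # duplicate row
]

def wrangle(rows):
    """One pass of the iterative clean-up loop: remove duplicates,
    coerce fields to proper types, and drop rows we cannot repair."""
    seen, clean = set(), []
    for row in rows:
        key = tuple(row.values())
        if key in seen:            # skip exact duplicates
            continue
        seen.add(key)
        try:
            clean.append({
                "customer": row["customer"],
                "age": int(row["age"]),        # raises on missing age
                "spend": float(row["spend"]),  # raises on "n/a"
            })
        except ValueError:
            continue               # discard rows we cannot repair
    return clean

cleaned = wrangle(raw)
print(cleaned)  # only the fully valid, de-duplicated records survive
```

In real work this loop runs many times — you clean, look at what is still wrong, and clean again — which is exactly the five-minutes-at-a-time grind the lecture warns about.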
You go for something, you find something else, and you might not find anything at all. You might just love it or hate it, and you might not solve it at all; that's the thing, you may not solve the mystery at all. But this can be pretty intellectually rewarding for you. The curiosity of a cat is required to be a data scientist, and you may get into trouble because of it. It is a journey that is long, but the end result can be rewarding. That said, searching for signals can be very frustrating. You might break your head, you might have sleepless nights wondering what is happening with the data, why the data is not behaving as it is supposed to behave. It is a challenge; you will love it or hate it. Data science projects are research-type projects. They are more research kinds of projects, not well-defined projects. It's not like a software development project where the requirements are set in stone: for every field in the UI, you know exactly how the field has to work, what validations have to be done, how it has to be recorded. That's not how research projects are. The requirement is mostly an objective, and then you just go after the objective and keep changing paths; even the objective might start evolving as the project goes on. So the requirements are not set in stone; they will keep changing as the project progresses. Rework is a given: you're going to be doing a lot of rework with data. And with data, confusion and conflicts will prevail in a data science project. The project might not even have an end result, in the sense that at the end of the project you might find that there are no signals at all in your data. Are you ready for all of this? Operational processes, lifecycle methodologies, best practices: all of them are still evolving for data science.
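The "searching for signals" idea from the passage above — scanning attributes for relationships — can be sketched as a brute-force pairwise correlation scan. This is a minimal sketch with invented attribute names and numbers, not a method prescribed by the course; in practice a strong correlation is only a clue to investigate, not a conclusion:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, plain standard-library Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical table of numeric attributes (values invented for illustration).
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits":   [12, 24, 33, 41, 55],
    "returns":  [5, 4, 6, 5, 4],
}

# Scan every attribute pair; strong correlations are candidate "signals"
# worth tracing further, like clues in the murder-mystery analogy.
for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    marker = "  <-- possible signal" if abs(r) > 0.8 else ""
    print(f"{a} vs {b}: r = {r:+.2f}{marker}")
```

Here only the ad_spend/visits pair would be flagged; the other pairs are the dead-end clues the lecture talks about.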
It is going to take a couple of years before you have some solid stuff in these areas: operational processes, lifecycle methodologies, as well as best practices. So, do you do well in organized work, or do you do well being in the middle of chaos? If this is the research-type project you want, if you want to do the research kind of stuff, then this is a great place to be in. Know that before you're getting into data science. The work-stream and team dynamics are more fluid in a data science project compared to a software development project, for the various reasons we gave before: there are going to be technology challenges, process challenges, changing directions. All of these make the teamwork dynamics more fluid. There is going to be a lot more interaction within the team and a lot more conflict within the team, so be ready for all of that. If you are interested in data science, you must be a team player to work in large projects. If you're a lone warrior, maybe you want to be focused on small projects where it is just you, sitting and doing all the work. But in large teams, teamwork is even more important than it is in a software development project; I would stress that. The next item is a very important one: the ecosystem's understanding of data science. What do I mean by the ecosystem? Ecosystem here means the other teams and departments in an organization that you work with. These other teams and departments may be your customers, your vendors, people from sales, marketing, logistics, operations, IT, other people in general, and top management. This is what we call here the ecosystem. And what is the challenge here? Data science is a very new field, and not everyone understands it in the same way. The term data science itself has so many meanings, so many definitions; it's still evolving. One of the reasons I created this course is because that understanding is something that is not there. That is why there is a whole course just to understand what data science is. So if this is your case, then what is the case with your whole ecosystem? People have different understandings of what data science can do for them. Everybody has heard about it on the Internet, in the news, in chats, and so on. Some people think that it is a solution for all problems, that it is magic; some people think it is pure hype. So there are all kinds of opinions, anywhere between magic and hype. The problem with all of that is that their expectations will not be aligned: they have different expectations than what data science can actually do for them. What that means is that everybody is still learning; currently, not everybody understands data science and what it can truly deliver to them. So you have to be patient, and you should be educating them, not getting frustrated. This is a challenge with data science: given that it is a new field, the ecosystem's understanding of data science is not fully there. The next challenge, of course, is working with other teams, like the business and IT. What is the conflict? Data science teams might come out of the business team or from the IT team; whichever place they come out of, there is always conflict with the other team, the one that did not build the data science team. Sometimes your data science team comes from within IT, in which case the data science team is going to be doing a lot more interfacing work than IT usually does, and the business might not like it. If the data science team is coming from the business side of things, the IT team is going to be asked for requirements all the time and things like that, and they're not going to like it. So there's going to be a lot of conflict between business and IT.
When it comes to data science, there is always the matter of stealing somebody else's job, because whatever we are planning to do with data science, somebody else is already doing it, at least in their minds. When you talk about predicting things, like what is going to happen next, there are people doing that already. Prediction has always been happening in businesses, but predictions have been done by people; that is what we call experience. They use that experience to predict how the future is going to behave. You have an experienced salesperson, and that experienced salesperson is going to start talking to a customer and say whether this customer is going to buy or not. That comes out of their own experience, which is nothing but their own internal data and their own internal modeling and learning. Now that is going to be replaced by a computer. There is always going to be this challenge, because you're going to be replacing that intelligent salesperson with a computer algorithm, which is now going to predict whether the customer is going to buy or not. So there is always this sense of stealing somebody else's job, and that is going to happen with data science as well. You will see some resistance; you will see some negative reactions. So working with other teams is going to be a challenge if you are a data scientist. And this is the last of all: taking criticism. Be ready and be comfortable with more than the expected amount of criticism. The criticism is going to come from the ecosystem; it's going to come from your own team and from the top management. There is going to be criticism about data science itself; there's going to be questioning about what it can do. There is going to be questioning and criticism about the work you do as a data scientist; they're going to question the outcomes of your project. So all kinds of things are going to be coming at you.
So before becoming a data scientist, know how sensitive or thick-skinned you are, and learn to positively cope with criticism. Be ready for criticism. It's not like you have a requirements document that you code against and then deliver as per that document on time; that is not going to happen. People are going to criticize the work you do until data science as a discipline evolves and people understand it better. That's going to take a few years, and until then it's a very fluid situation. You're getting into it now; are you ready for all of that? So this is a list of challenges you might face if you're a data scientist. It's up to you to assess them and see if you're going to be ready for all of that. This is how the day-to-day of the job is going to look. Evaluate this and make an informed decision; that's my recommendation. Thank you. 15. Building your skill set: Hello. This is your instructor, Kumaran. In this section, we are going to be seeing how you can build the skill set you need for becoming a data scientist. So, the skills to focus on: we have seen these earlier in the roles and skill requirements for data science. You have to focus on a programming skill like R, Python, or even Java; or SAS, SPSS, or MapReduce, whatever skill you choose to focus upon. You need to focus upon a database skill like SQL or the Hadoop ecosystem. And you've got to learn something about math and statistics, and also something about building domain expertise. So you've got this bunch of skills to focus upon; have this checklist in place. Now, how are you going to develop these skills? Before that, you want to decide upon the role you are going to play. Are you going to be a lone crusader, in which case you're going to learn all the skills? Are you going to be a part of a team, so you can just focus on one role? Are you going to be doing consulting, not a full-time job?
If you are a consulting person, again, you've got to learn and master a lot of skills, as opposed to a full-time job, where you can just focus on one skill. Again, you've got to choose your role: whether you're going to be a data engineer, an analyst, or a statistician. What is the role for you? You can choose based on your experience, or the lack of it, and also based on what kind of things you want to focus upon. The first thing you want to do in building your skills is exploiting what you already have. If you have been working in the IT field for some time, you've already picked up some technical skills. You may have skills in SQL, in database development (developing programs that work with databases), or in database management (DBA kind of work); you might already be doing that. So try to exploit them, use them, and project them when you go to interviews: programming skills, design and architecture skills, stuff like that. Remember, if you look at a data science job requirement that says ten years of experience in data science, they are never going to find a person like that. So use these skills that you already have and project them as part of your data science skills. Then domain skills: if you have been working in a domain, adopt it. If you were working in a finance company, try to learn something more about finance so you can say, OK, I'm familiar with this domain. Try to learn more about your domain and build your skills in that domain. This is not something you are going to learn in a training course.
You have to use your current work experience to build these kinds of skills. And then, finally, management skills: if you're doing consulting, if you have been doing project management, if you're doing delivery, like an IT manager or an engineering manager, all those skills are going to come into play and help you in your job as a data scientist. If you have been doing any of these, that is going to give you a lot of experience in working with teams, interfacing with different departments, working with people, getting their concerns, and getting them on board with things. All of that is going to come into play when you are doing data science, so try to exploit all these skills that you already have. Now, building skills: what is your type? What kind of a learning person are you? Different people have comfort levels with different kinds of learning: things like self-study versus guided study. Some people like doing self-study; some people like being in a guided environment. Some people try to learn by reading; some people try to learn by listening. There are again options that people have: online versus a classroom kind of environment; again, different comfort levels for different people. So once you choose which form of study suits you, be ready for a lot of hands-on exercises. Data science involves a lot of hands-on work, and you're going to have a lot of fun with all these exercises. You will definitely need technical help. The web is a great resource for technical help: there are a lot of forums out there where you can go and ask questions and get help. When you're doing any of these studies, try to build a good buddy system with someone you can talk to, interact with, and understand things together.
A buddy system is a good way to learn data science. Data science is not something like a "Mastering Java" book that you go read and then you're familiar with it; it's not going to be that easy. Choose a course that suits you based on the content, the flexibility, and the guidance that is available in the course. Make an assessment as to what you need and then choose based on that. So let's look at self-study. What are the advantages of self-study? It is cheap. It is flexible: do it in your own time. And most importantly, you can create your own syllabus; you can choose the things that you want to learn and ignore the things you don't. The challenge is that this is a new field: data science is a brand new field, and one book is not going to suffice. It's not like taking a "Mastering Java" book and then being done with it; it's going to take a lot of books. Tracking to success requires a lot of discipline and a lot of persistence. And you're going to need somewhere to get case studies and project work; when you do self-study, that is not something you will get from reading a book, so find a way to do some project work and get some hands-on exercises on what data science is all about. If you take a guided course, the advantage, of course, is going to be mentoring and guidance. Given that this is a brand new field where things are still evolving, this kind of mentoring and guidance can be really priceless. Guided courses give you schedules and force completion. A guided course has its own schedule and its own assignments; you have to complete things and take tests, and that enforces a lot more discipline. There is usually a peer student network whom you can work with for getting things done and learning more. The challenge with guided courses, of course, is that they are going to be pricey.
You need to make time for them, and some topics can be challenging or uninteresting. There could be some topics in there that you don't want to learn, or are not comfortable learning, but you have to do them because they're a part of the course. There will be certain things that you already know, but you still have to do them because they're a part of the course. If you are doing your own self-study, you can choose what you want to learn; in a guided course, you have to take all of it. So that is a challenge there. What are the learning options? To start with, degrees and certificates from universities. A lot of universities are coming up with degrees and certificates for data science; some call it data science, some call it analytics or data analytics. A lot of different names are being given to these degrees and certificates, but all the universities are realizing that data science is going to be a big upcoming field with a lot of student traction, so they are trying to come up with offerings. They are available both in classroom mode and in online mode. It is a full-fledged curriculum, and the professors in the universities are teaching them. There is recognition for these degrees and certificates, which is good: you can say you got a degree or a certificate with this recognition, as opposed to saying you learned it through self-study. That is always a huge difference. They take a lot of calendar time, especially part-time; it takes time and effort to get degrees and certificates, and it's not going to be straightforward. It is also very pricey: university courses are very pricey, and you've got to spend a lot of money to get them. So that is something you want to consider based on your resource availability. The next option is MOOCs, the massive open online courses that are coming out there. This is the new form of learning that has evolved in the last couple of years.
Data science is again at the top of that list. Coursera offers a specialization certificate in data science; last I checked, there were 2.8 million registrations for at least one course within that certificate, which is massive. Udacity and edX also offer courses and certificates in data science. This is very cheap: the whole Coursera specialization is something like $500, and the Udacity and edX offerings are all less than $1,000, and you can get a full-fledged certificate. They're backed by some really good universities, so that is good. But it is mostly self-study and peer help; yes, they have lectures out there, but a lot of it is going to be self-study and peer help. That is how you would mostly do it, so again, assess them and review them and see if you want to do them. Next comes the online marketplace, where there are a lot of websites with a number of offerings of online courses. At this point, I want to say that we at V2 Maestros are also coming up with our own courses on data science, so do check out our website and look for the courses; a little marketing here for our own company, as we are also going to be doing online courses on data science. In the online marketplace, there are a number of offerings out there. They are cheaper, they are flexible to your schedule, and they have access across the globe. So if you are a traveling person, it doesn't matter: you can go anywhere and learn. If you're in another part of the globe, it doesn't matter; you can still learn. The issue with it is limited mentorship: you're alone, on your own. And make sure that you read the reviews and ask for opinions on the various options that are available in the online marketplace; again, something you want to give careful consideration to. Then there is a new option that is coming up, called the boot camp option.
There's a new category of training coming up; there are a few companies offering this boot camp type of training. It is intensive hands-on training that is provided over a few weeks or a couple of months, and there are real full-fledged projects offered as a part of these boot camps. But the issue is that you need to make significant time investments. Some of these boot camps are really full-time, so you've got to devote something like three or four weeks to the boot camp and you can't do anything else; those are the time commitments that are required. And on top of all of that, there are geographic constraints and costs, of course. But it's a good option for somebody who's new to IT, because you get some real programming and real project experience pretty quickly, and that is something you can show on your resume: that you did a boot camp kind of project, as opposed to having just read a book or completed an online certificate. It has more value that way. Again, you have to assess the suitability of this kind of training; do your own evaluation. The final thing: OK, I'm going to be learning all these things, but how are you going to practice your skills? People are always asking you about experience, so where are you going to get that experience from? You can learn from anywhere; there are a lot of opportunities to learn, but there are very limited opportunities for you to gain experience with the skills that you're learning. The first place to look for that is your own organization. See if your current organization has some opportunities for data science; that is a great place for you to start, for multiple reasons. First, you are already in the organization, so you can move around within the organization and get some activities done there. And second, you're already familiar with the domain: you're familiar with the business, you're familiar with the company and exactly how your company does business.
Those are all advantages for you; you can use them to participate in a data science project your company is having. You have experience with the company, and that is a great skill set for data science, so you can use it to get into a data science team if your company is starting one. The other option is that you can start your own data science project and tell your boss, or somebody who's willing to listen, that you want to try out something: you want to try out something with the data that you already have and see if you can predict something. Create your own data science project and try to see if you can come up with something for your company. That is a great place to start practicing skills for data science. Then there are open competitions available. There are a number of websites that host a lot of open competitions. Companies offer data sets to these websites for public consumption, and they also put up a problem statement, and anybody across the world can participate and try to solve it. There are some incentives and prizes for them. The great thing about this is that these are real-world scenarios and problems you're trying to solve. You can learn from websites like Kaggle, CrowdANALYTIX, the KDD Cup, and HackerRank. You can go look at the existing competitions, look at the submissions that other people have made, and learn how they solved the problems. You can practice on the same data sets and see what you can come up with, and you can also participate in these competitions to see how you perform. These are all great options for you to practice the skills that you have built, so do use them. People are going to be asking whether you have practiced your skills, about your practical experience, and these are a couple of places where you can go and build that experience. So I hope this presentation has been useful to you in terms of how you build your skill set. Thank you. 16. Looking for Opportunities: Hello. This is your instructor, Kumaran. In this section, we are going to be looking at where we can find opportunities to work in data science. We'll start with data scientist salaries. You might have already heard about it: data scientist salaries are going through the roof, and that is a fact, not just hype. Salaries are high for data scientists. Europe and Asia lag: data science is still picking up in Europe and Asia. The US typically is at the forefront of technology. Any technology usually starts in the US: it starts in Silicon Valley, it starts in the New York or Bay Area, then starts spreading across the US, and then it takes some time before it gets into Europe and Asia. So data science is still picking up in Europe and Asia, and currency ratios also play a factor in what salaries are being offered in various other countries. A good US salary survey is available from burtchworks.com; they have a very good salary survey for data scientists in the US. The salary survey tells you about different job categories, different skills required, and different areas in the US, and what kind of salary they command. Do take a look at it. Salaries are generally better than in other IT disciplines; this is one of the highest-paying jobs within the IT domain, and data science managers, of course, do even better. It's a great opportunity for an experienced professional to transition from a regular IT job into a data science job. The data science salaries are good and new opportunities are coming up, so this is something experienced professionals should seriously consider: the transition into data science. So what are the opportunities in various countries and geographies?
Let's start with the USA. The USA is the leading country for data science. Software and web companies are building upon their data science expertise: big companies like LinkedIn and Google and Facebook and Amazon. These companies have proven to the world what you can do with data science. They have huge data science teams and huge investments in data science, they have made millions of dollars using data science already, and they have shown the world how data science can be used in business. Now there are a number of new software startups coming up that are focused on some form of data science: some of the companies offer data science as a service, some offer tools and technologies. So there's a lot of work going on in data science in the US. In other places, other commercial companies are beginning to exploit data science to better their business outcomes. They're seeing what the big companies like LinkedIn and Google and Facebook have done, and they're trying to see if they can do the same thing with their data. These companies have invested a lot of money and collected a lot of data over the years, so they're trying to see if they can use data science to do something with the data that they already have and start predicting business outcomes. So the US is the leading country for data science at this point. Consulting and outsourcing jobs are just picking up. Consulting: yes, there are a lot more opportunities in consulting when it comes to data science, because a lot of companies want to first do a feasibility kind of study, getting a consultant to look at the data and see if there are opportunities before they start investing in their own team. And outsourcing is just picking up; people are not outsourcing that much in data science yet, but it might come up later. The next question is Europe: if you are residing in Europe, what are your opportunities for working in data science?
Generally, Europe lags the USA by a couple of years in terms of technology adoption. Any new technology typically starts in the US and then slowly moves to Europe and Asia. Data science adoption is just picking up in Europe: there is a lot of sporadic interest in data science across various businesses, and you can expect a sizable increase in these opportunities down the line. In a couple of years, you will see a lot more interest in data science, companies making larger investments in data science, and a lot more job opportunities coming up across Europe. Expect opportunities in the finance, banking, CRM, and science sectors; those are the domains where you would see a lot of growth in data science in Europe. Educational institutions, again, are starting to build up their data science offerings, but the good news is that there are a lot of web-based offerings that anybody can learn from anywhere in the world, so education should not be a problem for people in Europe. The job opportunities will start trickling in as time goes on. So that is Europe. What about Asia Pacific? Australia is leading in both opportunities and salaries when it comes to data science; Australia has been the forerunner for data science in the Asia Pacific region. Singapore is showing a lot of promise, and there seems to be a lot of interest in that small area: businesses are showing interest in building data science teams and doing data science work, so you will find opportunities in Singapore for data science. China is slowly starting to pick up on the technology-adoption side. We have not seen a lot of business adoption there yet, just technology adoption that is starting to grow; of course, you are going to see more opportunities coming up in China.
India is building up a significant talent base for data science. There are a number of learning marketplaces in India offering data science courses, and students have shown interest in picking up those courses. Offshore companies are building up talent in anticipation of work: a lot of companies are investing in India, US-style, and building up data science talent, and India will continue to be a major offshore resource provider, like it has been in IT. So the US and Europe will see a sizable pool of talent coming from India in the years ahead. This is how the various geographies size up in terms of their data science opportunities today. Where do you find jobs? Let's start with the positions people are hiring for. There are, of course, more technical jobs than management positions. Data science is still considered part of an IT team or an engineering team, so the regular IT manager or engineering manager typically continues to manage the data science team as well; hence there are more technical jobs than management positions at this point. Titles for the jobs vary: data science, advanced analytics, data engineering, data management, predictive analytics. There are a lot of different kinds of titles coming out. What is important for you is to look into the job requirements to understand what the exact role is: look into the job requirements and see what the job demands from you, to understand what exactly the role is going to be, because a lot of different names are being used across the world. The data scientist role is pretty new, so not many people have more than two years of experience. What that means is that if somebody posts an opening asking for serious data science experience, they are not going to find a lot of experienced people applying for it. So education and enthusiasm might be great alternatives to experience.
You can still apply and show that you are interested in coming up to speed and learning something new; they might be interested in recruiting you. There are not going to be a lot of people who have more than two years of experience in data science; even if a position demands something like seven or ten years of experience in data science, they are not going to find somebody with that kind of experience that easily. What about job sites where you can go searching for data science jobs? The Internet will continue to be the best search option for opportunities; in fact, almost 90% of job searches today happen over the Internet. All the regular, popular general job sites do cater to this specialization. Wherever you have looked for IT jobs before, you can go there and look for data science jobs; there is nothing special about this, since data science is considered one of the big categories within IT jobs, so you can go there and look for the specialization. I am not going to talk more here about which job sites to use; you might already be familiar with a lot of these job sites, so go and use whichever one you are comfortable with. You should be able to search for these jobs there. There are also some specialized sites where you can find data-science-specific jobs. These are websites dedicated to data science: Data Science Central, Kaggle, and KDnuggets are some websites dedicated to data science. A lot of people looking for data science professionals do post on these websites; they have job boards where these openings appear. So do check out these sites, because they offer data-science-exclusive jobs. Next, education and training.
When we talk about education and training, we mean jobs like professor in data science, lecturer in data science, or data science instructor. Even those are starting to pick up, so that is another opportunity for you. Consulting and offshoring are still in their infancy, but consulting jobs are also starting to pick up, so you might look for those kinds of jobs too: consulting for data science. Temporary jobs and contract jobs are also starting to pick up, but they are still somewhat in their infancy when it comes to data science. The best place to look for a data science job today is your current company; this is my personal opinion. Why is your current company the best place to look for a data science job? Because you already know its domain, business, and organization. Domain knowledge and expertise is a very important skill when it comes to data science, and you already have that experience from your current company. Your current company is also a great place to apply your newly developed skills: you went through training and built up a skill set, and you need a place to apply it. How can you do that? You can offer to work on a data science project. If you can, come up with an opportunity yourself: talk to your manager, or their manager, and say, "I think I can use data science here. I can take some data we already have, look at it, and see if I can come up with some kind of business opportunity, some kind of improvement to the business, using prediction and other techniques." Whatever you can do within your own company is great, because you can find a project within your company to do your thing. You will get some experience, and your company is also going to benefit out of it.
And you are going to be seen as a forerunner within your company when it comes to technology; all of that is great for you. So create opportunities, if you can, within your own company: lobby for projects, get your managers convinced of what data science can improve in the business, and show possible hard revenue or cost savings in those projects. Most importantly, the benefit that you project for data science has to be something like hard revenue or hard cost savings. If you project and show some dollar numbers, people are going to be interested. They may sign off on a project; you can do some small project that does not cost a lot of money if you are using existing resources. Show some revenue, and if that revenue projection proves true, your company might grow in data science. At the very least, you will get an opportunity to work on a real project and gain some experience. So your current company is a great bet when it comes to using your newly built skills and gaining some experience. I hope this presentation was useful for you. Thank you. 17. Conclusion: This is your instructor, Kumaran. We have come to the end of the "Want to be a Data Scientist?" course. A big thank you to you for registering for the course and being a student. We hope this course has been useful for you. We at V2 Maestros set out with a course objective to educate you on what the field of data science is all about, and I believe we have achieved that. So thank you for your time. We hope this was a rewarding experience for you. If you liked the course, please do leave a positive comment for the course, and please recommend the course to the people you know, so everybody else can benefit from it. Do look forward to more courses on data science from V2 Maestros. This is your instructor, Kumaran, signing off. Thank you. Bye.