Microsoft Azure Machine Learning (ML) Fundamentals | Qasim Shah | Skillshare

Microsoft Azure Machine Learning (ML) Fundamentals

Qasim Shah, Digitization and marketing expert

12 Lessons (1h 35m)
  • 1. Introduction (3:12)
  • 2. What is Azure ML (14:04)
  • 3. How does ML Studio work (17:59)
  • 4. Data science (9:08)
  • 5. Getting started with ML Studio (10:32)
  • 6. Conducting an experiment (14:42)
  • 7. Converting data with Windows PowerShell (1:44)
  • 8. Uploading data onto Azure ML Studio (3:05)
  • 9. How to process data (6:22)
  • 10. How to test and evaluate models (4:48)
  • 11. How to deploy a web service (2:40)
  • 12. Business case (6:39)

About This Class

Welcome to the Microsoft Azure Machine Learning (ML) Fundamentals course for beginners, a one-of-its-kind course!

The flipped-classroom model with hands-on learning will help you dive directly into the course as you begin your learning journey. Be sure to watch the preview lectures that set course expectations!

In this course, you'll learn and practice:

  1. Working in Azure ML Studio

  2. Creating a machine learning experiment

  3. Modeling a real business use case

  4. Learning the basic concepts of machine learning

  5. Understanding best practices, and much more

Transcripts

1. Introduction: Digital transformation is happening in a big way. Customer experience and customer-centric initiatives are taking root everywhere. Automation and AI generate new insights and amplify productivity for marketers. Use of AI is expected to grow more than 50% in the coming years. Machine learning has a lot to do with making this happen. What if I had a crystal ball? I could predict the future. Well, that's what machine learning promises to do: predict the future. Machine learning has the power to predict outcomes for business. It can generate brand awareness, reduce churn, drive advocacy. But again, these are just a few examples off the top of my head. Hi, my name is Qasim Shah, and welcome to Azure Machine Learning for beginners. I've been in the industry for over 15 years, working for various organizations across the globe. For the last 10 years I've been driving digitization for the organizations that I've been involved with. The use of AI and machine learning has allowed me to create great insights for organizations, to drive business growth and take them into the new digital era. Machine learning will become the underlying infrastructure that accelerates automation. The New York Times reported that 49% of work activity today can be automated. Automation offers the advantages of lower costs and increased returns. That's a recipe that every business wants to have. The time for machine learning is now. This course is organized in a way to get you familiarized with machine learning as Microsoft has done it. In short, I've organized this course in a flipped-classroom setting, allowing you to do the hands-on training in the beginning, whereas the lectures are towards the end.
So for the people that are not familiar with machine learning and data science, I would suggest that you go ahead and do the lectures first; they are the last three lectures. For the people that are a little bit familiar with machine learning and the concepts of AI, you can dive right in and start hands-on to see what Microsoft Machine Learning Studio allows you to do. As you can see from the agenda, there are six different lessons. The first three are hands-on tutorials and the second three are concept lessons. The first tutorial will get you familiarized with Microsoft Azure Machine Learning Studio: what it looks like, what it feels like, and what you're able to do with it. Then we'll go ahead and dive into creating a first experiment. And lastly, we'll look at a real business case and see how an organization has used machine learning and implemented it. The last three lectures are more concept-focused. We'll talk about what Azure Machine Learning is, we'll look at what the Machine Learning Studio in Azure can do for you, and then lastly we'll look at data science in a very broad view. Data science is a very in-depth topic, but I will take you through an overview of it to help you understand the generic terms and concepts that we're going to work with throughout this course. And if you're not a technical person, don't worry about it. I will walk you through everything step by step, so you can familiarize yourself and come out of this course comfortable with machine learning and knowing what machine learning and AI actually are. I hope you will join me in this journey to understand what machine learning is and how we can implement it within our organizations. 2. What is Azure ML: Hello, everybody, and welcome to the first lesson in Azure Machine Learning, or Azure ML, for beginners. In this first lesson, we're going to look at what Azure Machine Learning actually is, and at the different topics we're going to cover.
First we'll look at machine learning as done by Azure. We're going to look at what you can do with Azure Machine Learning. Then we're going to look at the differences between a couple of options that Microsoft Azure has given us in machine learning, which is the Studio as compared with the Service. And then, lastly, I will take you through opening an Azure account, which Microsoft allows you to do for free. We're going to look at how to go about doing that, which will help us throughout this course when we look at doing some demonstrations on actual use cases in machine learning. So what is machine learning? If you Google machine learning, you'll end up being more confused than getting any sort of clarity on what it actually is. Now, the definition you see on the screen encapsulates the ideal objective, or ultimate aim, of machine learning as expressed by many researchers in the field, which is: machine learning is the science of getting computers to learn and act like humans do, and improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions. So, in essence, machine learning is trying to teach a computer how to be more human and think like a human. The forecasts or predictions from machine learning can make apps and devices smarter. For example, when you shop online, machine learning helps recommend other products you might like based on what you've purchased. When your credit card is swiped, machine learning compares the transaction to a database of transactions and helps detect fraud. And when your robot vacuum cleaner vacuums a room, machine learning helps it decide whether the job is done or whether it needs to vacuum more. So what is machine learning as we're going to see it in Azure?
Now, as I mentioned in the agenda, we'll look at the couple of different options that Azure has in machine learning, which are the Studio and the Service, but in general it is basically a cloud-based environment that can be used to develop, train, test, deploy, manage, and track machine learning models. It fully supports open-source technology, so you can use tens of thousands of open-source Python packages with machine learning components such as TensorFlow or scikit-learn. Rich tools such as Jupyter notebooks or the Visual Studio Code Tools for AI make it easy to interactively explore data, transform it, and then develop and test models. The Azure Machine Learning service also includes features that automate model generation and tuning, to help you create models with ease, efficiency, and accuracy. It also lets you train on your local machine and then scale out on the cloud. Most things nowadays are cloud-based or eventually will be cloud-based, and Azure Machine Learning already encapsulates that within itself. So as you can see from the diagram on the screen, Azure Machine Learning lets you prepare the data, lets you experiment with the data to train and test the model that you've developed, and then finally lets you deploy your model on the web as a web service and continuously monitor it, to help you make better and more knowledgeable decisions in your business. So what all can you do with Azure Machine Learning? Truthfully, the possibilities are endless. The Azure Machine Learning service can auto-generate a model and auto-tune it for you. It helps you build and train highly accurate machine learning and deep learning models.
You can deploy them locally within your organization or as a production web service on the cloud. It helps you evaluate model metrics, retrain, and redeploy, because one of the things we will see throughout this course is that in machine learning it's very rare that you will get it right the first time. You will always need to retrain the models and redeploy them, because things are changing on a constant basis, and the whole objective of machine learning is training your computer. It's very hard to get it right the very first time; we need many iterations, over and over again, in order to get to a model which will work for our business. And most importantly, what makes Azure more compelling to a lot of us, especially people who are not that technical, is that it has a very nice portal which allows us to manage the machine learning models that we've created and the data that we have. There are many different components that we can choose from among the machine learning components available in open-source Python packages, such as, like I mentioned, scikit-learn, TensorFlow, PyTorch, and CNTK, just to name a few. And once you have a model, you use it to create a container, such as a Docker container, that can be deployed locally for testing and then as a production web service in either Azure Container Instances or the Kubernetes service. And again, all of this can be managed and deployed using the Azure portal, which we will see as we move along throughout this course. Now, Azure Machine Learning gives us two options on how to manage the machine learning that we're going to be doing. The first option is ML Studio, and the second option is the ML Service. ML Studio is more GUI-based.
It is basically a collaborative drag-and-drop visual workspace where writing code is not really required, and it can be used to test and deploy very quickly and very efficiently. It uses pre-built and pre-configured machine learning algorithms and data-handling modules. We can use Machine Learning Studio when we want to experiment with machine learning models quickly and easily, and when the built-in machine learning algorithms are sufficient for our solutions; whereas the Machine Learning service is for when we want to work in a Python environment, we want more control over our machine learning algorithms, or we want to use open-source machine learning libraries. Now, one thing to keep in mind is that models created in Studio cannot be deployed or managed by the Machine Learning service. So if we're using Studio, we have to stick with Studio, whereas if we're using the ML service, we can deploy those models in Studio as well, because the service gives us more control over our models and over what we're developing. Let's go through a demo of how to open up an Azure account, which we can do for free. And the best part about it is Microsoft gives us a $200 credit in this free Azure account when we're using it. All we need to do is sign up for the Azure Machine Learning service, and we navigate to the URL azure.microsoft.com, under the Machine Learning service. And here we have an option to start our free subscription to the Azure Machine Learning service. We're going to go ahead and click on "start free." We also have the option to buy, but for this demonstration and this course, and in the beginning if you are not familiar with Azure Machine Learning, I would always suggest creating a free account, especially if you have the option to create one. So here it's going to ask you to sign in.
If you don't have a Microsoft account, you would need to create one, but I already have one, so I'm going to just select my account to use and sign in. Once I've signed in, it gives us a basic form that we need to fill out. Once you've filled out the form, it wants to verify you; again, this avoids spam and avoids bots. It will need to verify the phone number that you have provided. Basically, it just sends a text message with a code that you need to input. Go ahead and verify that code. Another thing that it requires, and some people might not like this, is that you do need to provide a credit card number in order to create the account. But rest assured, the credit card is not charged unless you specifically decide to upgrade to a paid account. Again, just so Microsoft can avoid bots and avoid people creating unnecessary accounts, it does require a credit card number to be put in there. So as soon as you have put in a valid credit card number, here is the agreement that you need to agree to. And then also, if, for example, you would like to receive tips and offers from Microsoft related to Azure and machine learning, you can check this box. It's a preference, if you like to read updates and news on marketing. Here I just leave it unchecked; I try to avoid unnecessary emails clogging up my inbox. But if in the beginning you are new to Azure and machine learning, I would suggest you select this box, so you can stay updated in terms of what Microsoft and Azure are doing in machine learning. So finally we're going to click on "Sign up," where it's going to confirm our information, and as soon as that's done, it's going to take us to the dashboard of the Microsoft Azure Machine Learning service. So again, like I mentioned at the beginning of the course, there are two options that Microsoft gives us, which are the Machine Learning service and the Machine Learning Studio.
So this is the service, and as you can see, it gives us very detailed information on what we can do in terms of creating models with Python and the different languages, whereas the Studio is more drag-and-drop focused. Once you have created an account in the Machine Learning service, it also translates into the same account for Machine Learning Studio. So what we're going to do is navigate to the Studio version, which is studio.azureml.net, and we're going to use the same credentials we just used to create the service account. And here we go. So this is the Microsoft Machine Learning Studio, whereas this is the Microsoft Machine Learning service. Right off the bat, you can see there is a major difference in terms of the look and feel of the service as compared to the Studio. The Studio is very drag-and-drop, very GUI-focused, whereas the service is a lot more technically focused, where we can do a lot more programming and play a lot more with the models that we want to use in machine learning. Those are the steps you need to follow in order to create an Azure Studio and an Azure service account. And once you have those created, that's when we can start creating our experiments, our web services, and our notebooks. And again, we will go through what the Machine Learning Studio has to offer in much more detail in the third lesson of this course. So thank you for watching the first lesson. I hope you enjoyed learning what machine learning is and what Azure Machine Learning is. As your homework after this lesson, please go out and create your free account with the Azure Machine Learning service, go through some of the options that are available there, and do the same for the Machine Learning Studio. Please familiarize yourselves with the look and feel of both of them.
For this course, we will focus more on the Machine Learning Studio and what it can do for us. Thanks again for watching; please go out and create those accounts, and we will see you back for the second lesson. 3. How does ML Studio work: Hello, everybody, and welcome to the second lesson in Azure ML for beginners. In this lesson we're going to look at how Azure ML Studio works. If you remember from the first lesson, this course is primarily concerned with Azure ML Studio as compared to the Azure ML service. The different topics we're going to cover in this lesson are as follows. First of all, we'll look at some key machine learning terms, so you can familiarize yourself as you move along throughout this course. You'll hear them quite often, so it's good to know what they actually are and what they actually mean. We'll also look at some secondary terms to keep in mind, which don't occur as often as the key ones. We'll look at the interactive workspace that Azure ML Studio has, and then we'll go on to look at the different components of an experiment: if we want to run an experiment in our business, if we have a use case, what are the different components that we must keep in mind? Then we'll go on to look at deploying a predictive analysis web service, and finally we will close out with some basics to keep in mind with Azure ML Studio, which will help us in our next lesson when we actually do a tutorial on Azure ML Studio. Moving on to some key ML terms to keep in mind: machine learning terms, believe me, can be very confusing, so here are some definitions of key terms to help you throughout this course and throughout your journey in machine learning. The first and most important ones are data exploration, descriptive analytics, and predictive analysis. Data exploration is the process of gathering information about a large and often unstructured set of data in order to find characteristics for a focused analysis.
Descriptive analytics is the process of analyzing that data set in order to summarize what happened. So where in exploration we're looking at the raw data, when we get into the analytics we're looking at summarizing that raw data in some way that makes sense. And then predictive analysis is the process of building models from historical or current data in order to predict future outcomes. So predictive analysis is basically the core of machine learning: it helps us to predict, for example, how many chocolate bars we're going to sell in the future, or how many customers will most likely be entering our store on the weekends, or on Mondays, or on Tuesdays. The next one is supervised versus unsupervised learning. Supervised learning algorithms are trained with labeled data; in other words, data comprised of examples with the answers we want. So, for instance, a model that identifies fraudulent credit card use would be trained from a data set with labeled data points of known fraudulent and valid charges. Most machine learning is supervised. Unsupervised learning is used on data with no labels, and the goal is to find relationships in the data. So, for example, you might want to find groupings of customer demographics with similar buying habits; that is an example of unsupervised learning. And, lastly, model training and evaluation. A machine learning model is basically an abstraction of the question you are trying to answer, or the outcome you want to predict. Models are trained and evaluated from existing data. Training is when you train a model from data: you use a known data set and make adjustments to the model based on the data characteristics to get the most accurate answer. In Azure ML, a model is built from an algorithm module that processes training data plus functional modules, such as a scoring module. Lastly, evaluation: once you have a trained model, you need to evaluate it using the remaining test data.
So you use data you already know the outcomes for, so that you can tell whether your model predicts accurately. Basically, you look at the historical data, see what happened, then you look at what the model actually predicted, and then you can compare the two to see if the model is accurate or if it needs to be tweaked. So those are some key terms to keep in mind. There are some other terms that we will come across quite often, a few of which I'm sure you have already seen, and a few of which you have seen in this course. I won't go through all of them, just a few key ones to keep in mind. First of all, algorithm. What is an algorithm? You've heard me say it quite a few times throughout the first lesson and this one. An algorithm is basically a self-contained set of rules used to solve problems through data processing, through math, or through automated reasoning. A couple of others that we'll go through quickly: anomaly detection, just like the name says, is a model that flags unusual events or values and helps you discover problems; for example, credit card fraud detection looks for unusual purchases. Categorical data is data that's organized by categories and that can be divided into groups; for example, a categorical data set for autos could specify year, make, model, and price. A continuation of that is classification, which is basically a model for organizing data points into categories, based on a data set for which the category groupings are already known. And one last one that I would like to go through, because we will be looking at it quite often, is regression. Some of you who are familiar with statistics might have come across this during your college days.
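The train-then-evaluate idea described above can be sketched in plain Python with a toy fraud "model." The threshold rule and every number here are invented for illustration; a real Azure ML experiment would use the Studio's training and scoring modules instead:

```python
# Labeled transactions: (amount, is_fraud) -- outcomes we already know.
data = [(12, 0), (25, 0), (900, 1), (40, 0), (1200, 1), (15, 0), (700, 1), (30, 0)]

# Hold out the last quarter of the data for evaluation.
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

# "Training": pick a threshold halfway between the largest valid charge
# and the smallest fraudulent charge seen in the training data.
valid_max = max(amount for amount, label in train if label == 0)
fraud_min = min(amount for amount, label in train if label == 1)
threshold = (valid_max + fraud_min) / 2

def predict(amount):
    """Flag a charge as fraud (1) if it exceeds the learned threshold."""
    return 1 if amount > threshold else 0

# Evaluation: compare predictions with the known outcomes in the test set.
correct = sum(predict(amount) == label for amount, label in test)
accuracy = correct / len(test)
print(f"threshold={threshold}, test accuracy={accuracy:.2f}")
```

The point is the workflow, not the model: the held-out rows are never used for training, so the accuracy number tells you whether the model generalizes or needs to be tweaked.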
Or if you have a master's degree, during your master's days you most likely came across this term called regression. It's basically a model for predicting a value based on independent variables, such as predicting the price of a car based on its year and make. We'll be doing quite a few regression analyses in machine learning, and most machine learning does quite extensive regression analysis in order to predict the outcomes that we're looking for. The rest of the terms are defined in the handout that comes with this lesson, so please go through them; I've also included a number of other terms that you will come across here and there in your machine learning venture. Please familiarize yourself with these terms, because it's good to know, when you read something, what it actually means, and it makes doing machine learning, and some of the work that we'll be doing in Azure Studio, much easier. So what's an interactive workspace? Azure ML Studio gives us an interactive visual workspace to easily build, test, and iterate on a predictive analysis model. If you remember from the first lesson, when I compared Studio with the service, I mentioned that Studio is more of a visual display, a drag-and-drop, whereas the service is more coding, more language-based. This interactive workspace is basically that dashboard that Azure Machine Learning gives you in order for you to work on your models. Developing a model is an iterative process: as you modify the various functions and their parameters, your results converge until you're satisfied that you have a trained and effective model. The Studio gives you an interactive visual workspace in order to do that. You can drag and drop data sets and analysis modules onto an interactive canvas, connecting them together to form an experiment, which you run in Machine Learning Studio.
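To make the regression term concrete, here is a minimal least-squares line fit in plain Python, predicting a car's price from its age. The data is made up, and this is the textbook formula rather than anything Azure-specific:

```python
ages =   [1,     2,     3,     5,     8]      # car age in years
prices = [18000, 16500, 15000, 12000, 7500]   # sale price in dollars

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n

# Slope and intercept of the least-squares line: price = a * age + b.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices)) / \
    sum((x - mean_x) ** 2 for x in ages)
b = mean_y - a * mean_x

def predict_price(age):
    return a * age + b

print(f"predicted price of a 4-year-old car: ${predict_price(4):,.0f}")
```

Each extra year of age lowers the predicted price by the slope, which is exactly the "predict a value from independent variables" idea the definition describes.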
And we'll look at how to conduct experiments a little later on. To iterate on your model design, you can edit the experiment, save a copy if desired, and run it again, because, like I mentioned, most machine learning is an iterative process. You will be changing different variables here and there quite often, and quite often you will need to run that model again to see if the prediction is as accurate as you want it to be. And then, when we're ready, we can convert the training experiment to a predictive experiment and then publish it as a web service so that the model can be accessed by others, because a model is of no use if you're going to keep it stored on your PC. We want to make sure that if we have developed a model for our organization, it is available for the right stakeholders to use. One thing also to keep in mind: I've included in the handout for this lesson an overview diagram of the Azure Machine Learning Studio capabilities. Please go through that; it gives you a good overview of all of the capabilities that the Machine Learning Studio offers. It's a very simple interface, and again, we will go through it in the next lesson. Looking at the components of an experiment as it relates to machine learning: an experiment basically has two main items, data sets and modules. A data set is data that has been uploaded to Machine Learning Studio so that it can be used in the modeling process. A number of sample data sets are already included in the Studio for you to experiment with when you sign up, and again, we will look at that in the next lesson; we can also upload our own data sets as and when needed. A module, on the other hand, is an algorithm that you can run on your data.
ML Studio has a number of modules ranging from data ingress functions to training, scoring, and validation processes. Just to give you some examples of the modules that are included (and we will actually look at them visually in the next lesson): one of the modules is Compute Elementary Statistics, which calculates elementary statistics such as mean, standard deviation, and so on. It also has Linear Regression; that goes back to the beginning of the lesson, where I said you need to familiarize yourself with these terms. The Linear Regression module basically creates an online gradient-descent-based linear regression model. And there are a number of other modules already pre-built in ML Studio. Just as a side note, if you're working in the ML service, that's where you would actually create your own models manually, as and when you need them, using programming languages. Now, once we have developed an experiment and tested it, and the prediction is as accurate as it possibly can be based on the historical data that we've looked at, we can deploy our model as a web service. To develop a predictive analysis model, we typically use data from one or more sources, transform and analyze that data through various manipulation and statistical functions, and then generate a set of results. Developing a model like this is an iterative process: as we modify the various functions and their parameters, our results converge until, again, we're satisfied with the prediction. ML Studio gives us an interactive visual workspace to easily build, test, and iterate on a predictive analysis model; like I mentioned, we can drag and drop as and when needed. And then, once we're happy with it, we can, with the click of a button, deploy it as a web service, and it can be made available to the rest of your organization or to a certain set of stakeholders who will be using this predictive analysis or this set of data.
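As a rough idea of what a module like Compute Elementary Statistics produces, here is a plain-Python stand-in. The dictionary output format is an assumption for illustration, not the module's actual schema:

```python
import math

def elementary_stats(values):
    """Summary statistics for one numeric column, like an elementary-stats module."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    return {"count": n, "mean": mean, "std": math.sqrt(variance),
            "min": min(values), "max": max(values)}

stats = elementary_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

In Studio you would get this per column by dragging the module onto the canvas; the arithmetic underneath is no more than this.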
Now, some Azure ML basics. The best way to show you some of the basics is that Microsoft has actually developed a very, very useful infographic that takes you through what Azure Machine Learning Studio is and the basics that we need to keep in mind. You can see the URL to download it from there, and I have also included it as a handout for this lesson. Just to take you through it quickly: this infographic gives you a good overview of ML Studio and of Azure Machine Learning. Like I mentioned, machine learning simplifies data analysis and empowers you to make decisions based on data, because nowadays lots of organizations make decisions based on hunches, based on past experience. But in today's day and age, data is everything. We collect tons and tons of data in everything that we do, and that data is of no use to us if we're not using it properly. Machine learning allows us to use that data to make knowledgeable decisions that will differentiate our business from our competitors. So how does machine learning allow us to do that? This infographic basically takes you through answering a few questions. For example, ask yourself: do you want to predict values, find unusual occurrences, discover structure, predict categories, detect anomalies, or cluster similar data points? First of all, ask yourself what it is that you actually want to do, what you actually want to know: whether you want to know how many cars you're going to sell next month, or how many customers are going to come into a restaurant on a certain day or on a holiday. That's the first step: knowing what you want from the data. Because if you don't know what you want from the data, you'll be lost in a black hole; the data will overpower you. So it's very good to have clarity in terms of knowing what you want.
And then after that is where machine learning helps you decide whether the prediction is between two categories or between multiple categories, whether it's a simple answer or whether it's a complex answer. For example, a prediction between two categories is like you see on there: is this tweet positive, yes or no? Will this customer renew their service, yes or no? Which of these two coupons draws more customers? Again, a question with two possible answers. Whereas a more complex question is: what is the mood of this tweet? There can be several different answers. Which service will this customer choose? Depending on the number of services you offer, there can be a multitude of answers. And lastly, which of these several promotions draws more customers? So you can see the comparison between a simple predictive analysis and a more complex predictive analysis that involves more than two answers. Azure ML works by teaching the software to find patterns in the current data so it can seek out patterns in future data. Again, the whole point of machine learning in Azure is to predict the future for you. So here it gives a very good example. Let's say that you rent cars and you want to predict the demand for your product; there are very simple steps. The first step is to get the data, for example, the cars that you rented last year. The next step is preparing the data: cleaning the data, combining the data sets, preparing it for analysis, because the cleaner your data is, the more accurate the prediction will be. The third step is training the model: feed the information into the machine to teach it what you want it to predict. The fourth step would be to score and evaluate the model: test the model's ability to predict against the original data and evaluate its success. And then finally, predict future demand: use the model to predict future spikes and shortfalls in demand.
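The five steps just described (get, prepare, train, score, predict) can be sketched end to end in plain Python. This is an illustrative toy for the car-rental example, not Azure ML Studio itself, and every number in it is invented:

```python
# 1. Get the data: (month, cars_rented) for last year; one record is bad.
raw = [(1, 120), (2, 135), (3, None), (4, 165), (5, 180), (6, 195)]

# 2. Prepare the data: drop records with missing values.
clean = [(m, y) for m, y in raw if y is not None]

# 3. Train the model: fit demand = a * month + b by least squares.
n = len(clean)
mx = sum(m for m, _ in clean) / n
my = sum(y for _, y in clean) / n
a = sum((m - mx) * (y - my) for m, y in clean) / sum((m - mx) ** 2 for m, _ in clean)
b = my - a * mx

# 4. Score the model: mean absolute error against the known data
# (a real experiment would score against a held-out test set).
mae = sum(abs((a * m + b) - y) for m, y in clean) / n

# 5. Predict future demand, e.g. for July (month 7).
july = a * 7 + b
print(f"slope={a:.1f}, MAE={mae:.1f}, predicted July demand={july:.0f}")
```

If step 4 showed a large error, you would loop back to step 3 (change the model) or step 2 (clean the data better), which is exactly the iteration the lesson describes next.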
Step four is where you basically confirm, based on historical data, whether this model is correct or incorrect. And like I mentioned, since this is an iterative process, if in step four you find out that it really didn't predict properly, then you go back to step three and change your model, or you go back to step two and prepare your data, cleaning it more efficiently and more effectively. And again, I'll let you, as homework, go through the rest of this infographic, which gives you different examples of what regression models are available, what anomaly detection models are available, clustering, and so on. Again, these are all the pre-built models that are available in Azure ML Studio. So please take the time, as homework, to go through the rest of this infographic; go through some of these models that are already available, familiarize yourself with them, and learn what they actually do and what they actually predict, and that will help you figure out which one will be right for your organization. So thank you again for watching the second lesson. I hope you found some insight into what Azure ML Studio can offer you. In the next lesson, we will do a live demonstration of what Azure ML Studio actually looks like and what different options are available in ML Studio for us to analyze our data. 4. Data science: Hello, everybody, and welcome to the lesson on understanding data science. Data science can be very intimidating, but I'll introduce the basics here without any equations or computer programming jargon, since this is a fundamentals course. In this lesson, we're going to cover five different topics. We'll look at the different types of questions that are within data science. We'll look at what it entails to ready your data. Then we'll talk about asking the right questions as it pertains to data science.
Next, we're going to look at predicting our own answers, and finally, I will talk about not reinventing the wheel, because there are a lot of resources already available on the Internet for you — there's no need to do something over again if it's already been done. It might surprise you, but there are only five questions that data science answers, even though it's a very complex subject. In essence, there are five basic questions. So let's start with the question: is this A or B? This family of algorithms is called two-class classification. It's useful for any question that has just two possible answers. So, for example, if you work for a tire manufacturer, one of the questions we can ask is: will this tire fail in the next 10,000 miles? And the answer to that is either a yes or a no. Or in retail, we can ask: which brings in more customers, a $5 coupon or a 25% discount? Again, it's an either/or answer. The second question that data science answers is anomaly detection. For example, if you have a credit card, you've already benefited from anomaly detection. Your credit card company analyzes your purchases so that it can alert you to possible fraud — charges that are, quote unquote, weird. It might be a purchase at a store where you don't normally shop, or buying an unusually priced item. And again, this question can be useful in many ways. For example, if you have a car with pressure gauges, you might want to know: is this pressure gauge reading normal? If you're monitoring the Internet, you want to know: is this message from the Internet typical? And nowadays, with lots of viruses and malware out there, this can be very, very useful. The third question that it answers is: how much, or how many? And that uses regression algorithms. For example, you can ask: what will the temperature be next Tuesday, or what will my fourth-quarter sales
be? So again, the question is how much or how many. The fourth question is: how is this organized? And that uses clustering algorithms, because sometimes you want to know the structure of a data set — how is this organized? For this question, you would use clustering algorithms. Some examples of this question are: which viewers like the same types of movies, or which printer models fail the same way? Because by understanding how data is organized, you can better understand and predict behaviors. And the final question is: what should I do? That uses reinforcement learning algorithms, which were inspired by how the brains of rats and humans respond to punishment and rewards, because these algorithms learn from outcomes and decide on the next action. Typically, reinforcement learning is a good fit for automated systems that have to make lots of small decisions without human guidance. Most of the AI features coming in IoT are using reinforcement learning algorithms, because they're teaching machines how to act more like humans. For example, a self-driving car is a very good example of one major item that is using reinforcement learning algorithms. Before data science can give us any answers — or any answers that are usable — we have to make sure that we have high-quality raw materials to work with, the raw materials being data. And in data science, there are certain ingredients that must be pulled together that are considered core ingredients. The first one is relevance — and as the name says, we have to make sure that we use data that is relevant to the answer we're trying to predict. The next ingredient is connected data. And as you can see on the screen, we have two different options: disconnected data and connected data. In the disconnected data, there are lots of fields that are missing, so when machine learning is trying to predict an outcome,
it's going to have a hard time building a relationship between those columns because of the missing data, whereas with the connected data, the machine learning will have lots of information to work with to predict a valuable answer. The next ingredient is accuracy. If you look at the target in the upper right, there's a tight grouping right around the bull's-eye — that, of course, is accurate. Oddly, in the language of data science, the performance on the target right below is also considered accurate: if you mapped out the center of those arrows, you would see that it's very close to the bull's-eye. The arrows are spread out all around the target, so they're considered imprecise, but they're centered around the bull's-eye, so they're considered accurate. Now look at the upper-left target here. The arrows are very close together — a tight grouping — so they're precise, but they're inaccurate because the center is way off the bull's-eye. The arrows in the bottom left are both inaccurate and imprecise. The final ingredient is sufficient data. Now think of each data point in your table as being a brush stroke in a painting. If you have only a few of them, the painting can be fuzzy, and it's hard to tell what it is. The same goes for data: if you only have a few data points, it's going to be very hard to predict an answer. So you want to make sure you have sufficient data in order to predict the answers that you want. We've been talking about how machine learning and data science predict the future, or predict what you want to know. But it can't be just any question — it has to be a sharp question. A vague question doesn't have to be answered with a name or a number; a sharp question must be answered with a name or a specific number. Now let's say you find a magic lamp with a genie who will truthfully answer any question you ask — but he's one of those mischievous genies who tries to make his answer as vague and confusing as he can.
So you want to pin him down with a question so airtight that he can't help but tell you what you want to know. Now, if you were to ask him a question like "What's going to happen with my stock?", you might get an answer like "The price will change." That's a very truthful answer, but it's not really helpful. But if you were to ask a sharp question like "What will my stock's sale price be next week?", the genie has no option but to give you a very specific answer. Again, it's very important that the question we're trying to answer is very specific — very targeted to what we want to know and to what will help our business case. To develop a predictive analysis model, you will typically use data from one or more sources, transform and analyze that data using various manipulation and statistical functions, and generate a set of results. Developing a model like this is an iterative process: as you modify the various functions and their parameters, your results converge until you are satisfied that you have a trained, effective model. One good thing about Azure Machine Learning Studio is that it gives you an interactive, visual workspace to easily build, test, and iterate on a predictive analysis model. It's a very useful tool, especially when you're starting out with machine learning and predictive analysis. Azure Machine Learning Studio is a great tool to use when getting used to machine learning and starting to integrate AI into your business decisions and your business cases. And lastly, don't reinvent the wheel. There are lots of resources available on the Net that you can use to help in your machine learning journey, and one of the best ones is the Azure AI Gallery. It contains resources, including a collection of machine learning experiments and models that people have built and contributed for others to use.
And these experiments are a great way to leverage the thought and hard work of others to get you started on your own solutions. And again, everyone is welcome to browse it, because it's free. If you click on the experiments, you'll see a number of the most recent and most popular in the gallery, or you can browse them all, or browse by industry. So there's a multitude of different options that you can use to browse through the experiments and find the one that matches your business or your use case very specifically. So thank you, everybody, for watching. I hope you found this information on data science useful, and as your homework assignment, please go out to the Azure AI Gallery and browse around the different experiments that are out there, because you might find one that matches very closely to your business or to your use case. Thank you again. 5. Getting started with ML Studio: Hi, everybody. In this lesson, we're going to look at the Azure portal — specifically, at ML Studio. We're going to familiarize ourselves with how to log in and with the different options that are available, which will help us out in the future lessons when we actually start developing experiments and doing predictive analysis. The first thing I'm going to do is go to the Azure ML Studio portal. That portal can be found at studio.azureml.net. After I get there, I'm going to sign in. Once you have signed in, you'll see a few experiments that I'm working on, but the first time you sign in, this will be empty. So this is basically called the Azure workbench — or, in other words, it's basically the dashboard you see when you first log in to Machine Learning Studio. On the left-hand side are all the different options that are available, depending on the type of subscription that you have. This is the free subscription that Microsoft offers for the first year when you're working with the Azure portal.
So these are all the options that they allow you to use for free, but if you have a paid subscription, there are many more options available, depending again on the level and type of subscription that you have to the Azure service. For this machine learning course and for this machine learning demo, the free version works just as well. Under Projects, we can create new projects, and Experiments is where it lists all the different experiments, or predictive analysis items, that you are doing. One good thing about Studio is that if you click on Samples, Microsoft has a ton of pre-built experiments already there for you that you can use to start working on your own, or that you can use as examples to develop your own. So if there are certain experiments that you want to do in marketing — let's say for direct marketing — you can use one already built by Microsoft and just expand on it or modify it based on your specific needs. That's one big thing about ML Studio: the ton of experiments that are already pre-built for you. So as the first homework assignment, please just go through and click on some of the experiments that sound interesting, just to look at how they look and feel, because when we develop our own experiments, this will definitely help out. Just to give you an example, let's say that I want to quickly take a peek at what an experiment looks like. I will pick a direct marketing experiment, and it will take me to the already pre-built experiment that Microsoft has done for us. Here's where we can zoom in to see all of the different variables that are being used — and again, we will go through these in more detail when we actually develop our own experiment in the next lesson. Just to recap: this Experiments section lists the experiments that you're currently working on, and Samples are the pre-built samples that Microsoft already has for you.
The Web Services section allows you to see all of the different web services that you have published. So after you have completed your experiments and published them as web services, they will show up here, allowing you to view the links and the URIs for all of the experiments that you have done that you want to use, or that you want to have your business stakeholders use. We also have an option for Notebooks — these are more used for programming, or if you want to build your own modules, you will see those here. The Datasets section allows you to see all the datasets that you have uploaded; or — as with experiments — Microsoft has a number of pre-uploaded datasets that you can use for testing when you're developing experiments. Let's say you want to develop an experiment to do predictive analysis on a certain item, but you want to test whether your logic is correct: you can use one of these pre-built datasets to practice creating experiments and to familiarize yourself with what kinds of logic will work in what circumstances. Trained Models is where, after you have completed your experiments and trained the different models for conducting predictive analysis, those models will show up — depending on how many experiments you are conducting, you will see that many trained models here. And then, finally, the Settings tab allows you to change different settings of your workspace, such as the name and the description. You can see how much disk space you have used: for the free workspace, Microsoft gives you 10 gigabytes of space, but for a paid workspace it can be considerably larger, depending again on what tier of subscription you have with Microsoft. One of the good things that you are able to do is invite other users to your workspace.
So, for example, if you want other users within your organization to see your experiments or collaborate with you on the experiments that you're doing, this is where you can invite them to your workspace. On the bottom here, you can see Invite More Users, and here is where you can input their email addresses, and they will get an invite to join your workspace. This is a very good tool, because most experiments that you will be doing — or most experiments that organizations do — involve more than one person. It usually involves a team, so this is where the entire team can collaborate and work on a single experiment or multiple projects. You can have project-based teams or experiment-based teams; it just depends on the type of setup that you have or that you would like to have. And another good thing is that it gives you an option to join or read machine learning forums. Some of the forums give you very good insight in terms of answering questions you might have about what kind of regression or what kind of analysis you should run on what type of data. So as your homework assignment, please go through the list of forums that are out there, because you might find some that could be very useful for you and your organization, or for a specific project or experiment that you're doing. It's a very good practice to see what others are doing in your industry. You can see there are quite a few different forums available, with a variety of different questions — and if you have one specific question, you can also post your own question and hopefully get answers from other experts in the field. And lastly, what if we want to create a new experiment, or let's say we want to upload our data? You can see the option for New in the bottom left-hand corner. If you click on that, this is where it allows you to upload your own dataset.
Here you can go ahead and choose and upload whatever data it is that you have for your organization. Additionally, you can choose different modules — these are already pre-built modules by Microsoft, since we're working in Studio for the ML service; you can also build your own modules in different programming languages. Then there's Experiment — and again, these are pre-built experiments by Microsoft to make our job a lot easier. Depending on what type of analysis you want to do with your data, you can pick and choose: whether it's K-means clustering, whether it's an R model, whether it's linear regression or binary classification. Depending on your problem — depending on what question you are trying to answer — there are tons of pre-built samples already there for you, or you can always start with a blank experiment and build your own, if that is your preference. So again, this depends on the level of expertise within your organization or within your team. If you have sufficient expertise, you're more than welcome to build your own experiment; or, if you lack some expertise within your team, I would highly suggest you use one of the pre-built samples that Microsoft has done for you. It just makes the job a lot easier, especially if you don't have a statistical expert on your team. And the Projects window is where you can see and create new projects. Projects house multiple experiments within them. So it just depends: if multiple departments are going to be using machine learning, you can have a project for each department — for the marketing department, for the logistics department — and then within those projects we can have multiple experiments, within the marketing project, for instance.
We can have multiple experiments for direct marketing, for email marketing, for social media — to see what kinds of outcomes we can predict, and what campaigns are working and what campaigns are not. The last option it gives us for creating something new is a notebook, and notebooks are for people who are more experienced in machine learning. So if you have a Python expert or a programming expert — a Jupyter expert — this is where you can create your own logic. You can create your own models, or use one of the pre-built models that Microsoft already has and just add onto it. Again, there are pre-built examples for you: this notebook sample gives data scientists a complete walkthrough on using Jupyter Notebook within Azure Machine Learning Studio. So there are multiple walkthroughs and multiple examples. That was a quick overview of Azure ML Studio; I hope you got a little familiar with what it looks and feels like. In the next lessons, I will be taking you through a few experiments in terms of designing our own and doing some predictive analysis to see how it works. I hope you enjoyed this lesson, and I look forward to seeing you in the next. 6. Conducting an experiment: Hello, everybody, and welcome back to Azure ML for Beginners. In this tutorial, we're going to walk through how to use Studio for the first time to create a machine learning experiment. The first thing I'm going to do is log in to my Studio account. For this experiment, we're going to use one of the datasets that Microsoft has pre-uploaded into Azure ML Studio in its library. The whole concept behind it is trying to predict the value of a product based on its features. For this specific one, we're going to use a dataset related to automobiles, and the whole point is to predict what price to sell an automobile at, based on the features that are within it.
And this could be translated to any product that you're trying to sell or trying to go to market with. Let's say you have developed a specific product and you're trying to figure out what price point to release it into the market at: this can help you decide the optimal price point based on the features and on other variables that you can use in place of the variables we're going to look at in this example. So basically, in this experiment, we're going to look at the five main steps involved in building an experiment — to create, train, and score our model. The first step in performing any kind of experiment is getting the data. There are several sample datasets included within Studio, and like I mentioned, we're going to be using the one specific to automobile price data. So the first thing we're going to do is click on New in the bottom left-hand corner, and we're going to start a blank experiment. Now, the experiment is given a name by default, as you can see up here, and we're going to change that to "Automobile Price Example". On the left-hand side, you can see all of the different options that we can use when we're designing an experiment. The first thing we want to do is get the data. I'm going to click on Saved Datasets, and you'll see a couple of options. The first one is My Datasets — anything that you've pre-uploaded yourself or used previously will show up here — and if you click on Samples, this will give you all of the samples that are pre-uploaded by Microsoft. The easiest thing is just doing a search on top, so I'm going to search for "automobile". And here we go: we have the automobile price data. I'm just going to drag that out into the experiment field. Now, once it's there, I can also see what the data actually looks like: if I right-click on it, I can go to Dataset and then Visualize, and this will give me a visual depiction of what the dataset actually is. Now, just a tip:
datasets and modules have input and output ports represented by small circles — so you see one small circle here as an output on this dataset. To create a flow of data through our experiment, we'll need to connect an output port of one module to an input port of another. At any time, we can click the output port of a dataset or a module to see what the data looks like at that point in the data flow. Again, for this specific dataset, what we're ultimately trying to predict is the price, which is the last column you see here. This is what we're trying to predict, based on the historical data that we've uploaded in this dataset. The next step after uploading the data is preparing the data, and most data usually requires some pre-processing before it can be analyzed. For example, you might have noticed some missing values in some of the columns. To give you an overview: if we go back into the data, we can see that this normalized-losses column has many values that are missing, which might skew our data — which might skew our predictive analysis — so we're just going to remove it altogether. The first thing we're going to do is add a module that removes the normalized-losses column completely, and then we can add another module that removes any row that has missing data. What I'm going to do is select columns — let me just expand this so you can see what it looks like. Under data manipulation we have two options: Select Columns in Dataset and Select Columns Transform. I'm going to take Select Columns in Dataset and just drag it out here. If you click on it, on the right-hand side are all of the properties for this specific Select Columns in Dataset module. I launch the column selector, and what we want to do is exclude column names — as you can see, I have excluded the column by the name of normalized-losses. I'm going to click OK, and that shows up here. The next thing I'm going to do is match the raw data with this.
So in this flow, we see we have the raw data, and it's going to remove the normalized-losses column from this data. The next thing I want to do is clean any missing data that we have, so I'm going to search for "clean data" — there we go, Clean Missing Data — and I will drag that out here also. After I drag the Clean Missing Data module out into the field, the Properties pane is where I can select what I want to do with any cells that have missing data: I can substitute them with a specific value, or with the mean, median, mode, etc., or I can remove the entire column, or I can remove the entire row. For this example, what I'm going to do is remove the entire row, and I'm going to link this dataset with Clean Missing Data. So now we have the flow: the raw data, then we remove one column, and then we clean the dataset. At this point, I'm going to click on Run. Why did I run the experiment now? Basically, by running the experiment, the column definitions for our data pass from the dataset, through the Select Columns in Dataset module, and through the Clean Missing Data module. This means that any modules we connect to Clean Missing Data will also have the same information. So this is basically a clean version of our raw data, and anything that goes below this will start using this clean version. I always do this as a preference, but you don't necessarily have to. After we've done that, our data is cleaned up and defined. The next step in machine learning is defining the features. Features are individual measurable properties of something that we're interested in. So in our dataset, each row represents one automobile, and each column is a feature of the automobile, such as four doors, automatic or manual, and so on. So finding a good set of features for creating a predictive model requires experimentation and knowledge about the problem that you're trying to solve.
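The two cleaning steps just performed — dropping the normalized-losses column, then dropping any row that still has missing data — can be sketched in plain Python. The toy rows below only imitate the real dataset; in Studio, the Select Columns in Dataset and Clean Missing Data modules do this work:

```python
# Plain-Python sketch of the two cleaning steps: exclude one column,
# then remove every row with a missing value. The rows are invented
# stand-ins for the automobile dataset; None marks missing data.

def exclude_column(rows, column):
    # Equivalent to excluding a column name in the column selector.
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def drop_rows_with_missing(rows):
    # Equivalent to Clean Missing Data with "remove entire row".
    return [row for row in rows if all(v is not None for v in row.values())]

rows = [
    {"make": "audi",  "normalized-losses": None, "horsepower": 102,  "price": 13950},
    {"make": "bmw",   "normalized-losses": 192,  "horsepower": None, "price": 16430},
    {"make": "volvo", "normalized-losses": 95,   "horsepower": 114,  "price": 12940},
]

cleaned = drop_rows_with_missing(exclude_column(rows, "normalized-losses"))
```

The audi row survives because its only missing value was in the excluded column; the bmw row is dropped because horsepower is still missing.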
Some features are better for predicting the target than others. Also, some features have a strong correlation with other features and can be removed. For example, the MPG you get driving in the city and the MPG you get on the highway are closely related, so we can keep one and remove the other without significantly affecting the prediction. So what we're going to do is build a model that uses a subset of the features in our dataset — and we can always come back and select a different set of features, because this is, again, an iterative process. We can go back and forth, and most of the time you will need to. So I'm going to select a set of features: I'm going to use Select Columns in Dataset and launch the column selector. With rules, I'm going to include column names, and I want to include only these columns in the analysis, because these are the features I believe are the best predictors of what the price of a vehicle should be. And again, this can be adapted to any product that you are trying to sell or trying to go to market with. Sometimes you will see this red exclamation mark — it basically means, if you hover over it, that the input dataset is unconnected, because obviously there is no connection here. So what I'm going to do is just connect this data flow. Once that's done, it produces a filtered dataset containing all the features we want to pass through the algorithm, and that leads us to our next step, which is choosing the right algorithm for this experiment. So what we're going to do now is use our data to train the model, and then we'll test the model to see how closely it's able to predict the prices. Now, classification and regression are two types of supervised machine learning algorithms. Classification predicts
an answer from a defined set of categories, such as color, and regression is used to predict a number. Since we want to predict a price — which, again, is a number — we'll use a regression algorithm. For this example, we're going to use a simple linear regression model. We're going to train the model by giving it a set of data that includes the price; the model scans the data and looks for correlations between an automobile's features and its price. After we've done that, we're going to test the model to see how accurately it predicts the price. And one thing we're going to do is use our data both for training the model and for testing it, by splitting the data into separate training and testing datasets. The reason you want to do that is because this way you can kill two birds with one stone: you will not only train the model, but you will also see whether the variables you have set here, and the regression model you're using, accurately predict the data you want to predict — because you don't want to train the model and then find out later on that it's actually the wrong model. This way we can train it and check whether it's correct at the same time, and if it's not predicting properly, we can always go back and use a different type of algorithm. So what I'm going to do is split the data: I'm going to search for Split Data, and in Split Data we can choose whether we want to split it 50/50 or 60/40. What I'm going to do is split it 75/25, so we'll use 75% of the data to train the model and 25% of the data for testing it, to see if it's working properly. After I run the model, you can see the check marks shown down here — it has basically completed our experiment down to this point, down to splitting the data. So it's got our data, we removed a column, we cleaned the missing data, we're using a subset of features, and we've also split that subset of features into 75% for training and 25% for testing.
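A minimal stand-in for the split just performed: the first 75% of the rows for training and the remaining 25% held out for testing. The row count is invented, and a real split is usually randomized — this sketch keeps row order for simplicity:

```python
# Toy stand-in for the Split Data module: 75% train, 25% test.
# Real splits are typically randomized; this keeps row order.

def split_rows(rows, fraction=0.75):
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]

rows = list(range(1, 201))      # pretend these are 200 automobile rows
train, test = split_rows(rows)

print(len(train), len(test))    # 150 50
```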
So now what we want to do is select the type of algorithm we want to use. Just to give you an overview of what different algorithms are available: if you click on Machine Learning, you can see all of the algorithms available for you to use. What we want is a simple linear regression, so that's the algorithm I'm going to use. Next, I also want to train my model, so I'm going to search for Train Model. In training my model, I want to use the linear regression algorithm, and this is the data I want to use. Within Train Model, we can use the column selector in the Properties pane — this is basically what we want to train the model for: what are we trying to predict? And as I said in the beginning, we are trying to predict the price that we want to sell our vehicle for, so I'm going to put that in this column. Again, we're training our model to use linear regression to predict the price. After that's done, I'm going to run the experiment again, and when you see the hourglass, it means it's running the experiment — depending on the size of your data, that can take some time, but since we have a very small subset of data just for this demo, it works really quickly. So now that we've trained the model using 75% of our data, we can use it to score the other 25% of the data to see how well our model functions. What we're going to do is find Score Model, drag that out into the field, take the other 25% of the data that we split, and use it for scoring the model. And finally, we're going to run the experiment one more time. Once that's done, we've trained the model, and we've also scored the model to see how accurately it's been trained to predict the price. The last thing we want to do is evaluate our results.
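The train-and-score idea can be sketched without ML Studio using ordinary least squares on a single feature. Everything here is invented for illustration — the (horsepower, price) pairs lie exactly on price = 150 × hp + 1000, so the fitted line should recover those coefficients and score the held-out rows perfectly; Studio's Linear Regression module handles many features and real, noisy data:

```python
# One-feature sketch of the Train Model / Score Model steps.
# Data is invented and lies exactly on price = 150 * hp + 1000.

def fit_linear(points):
    # Train: least-squares fit for y = slope * x + intercept.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

train = [(60, 10000), (80, 13000), (100, 16000),
         (120, 19000), (140, 22000), (160, 25000)]   # 75% of the rows
test = [(180, 28000), (200, 31000)]                  # held-out 25%

slope, intercept = fit_linear(train)

# Score: predict prices for the held-out rows and measure the error.
actual = [price for _, price in test]
predicted = [slope * hp + intercept for hp, _ in test]
error = mean_absolute_error(actual, predicted)
```

With real, noisy data the error would be nonzero, and evaluation statistics like this mean absolute error tell you how far off the model is.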
So I'm going to click on Evaluate Model, drag that out into the field, connect the output of the Score Model into the input of the Evaluate Model, and then run the experiment to finally complete the evaluation. Once that's done, we want to see the results: we click on the output port and click on Visualize, and it will give us a visual depiction of the results. The following statistics are shown for our model: the mean absolute error, the root mean squared error, the relative absolute error, the relative squared error, and the coefficient of determination. If you are not familiar with these terms, they are generic statistical terms, and as a very good homework assignment, please look them up and get a sense of what they actually mean, to be familiar with what these numbers stand for. For each of the error statistics, smaller is obviously better — a smaller value indicates that the predictions more closely match the actual values — and for the coefficient of determination, the closer its value is to one, the better the predictions; you can see ours is 0.91. So now that we have our first completed machine learning experiment, we can use it to continually improve the model and then deploy it as a predictive web service. We can keep changing the features we use in our prediction, because — as you remember — I selected a subset of the features available for these vehicles; we can use a different subset of features to see if the prediction is better with those compared to the ones we have used now. So again, it's a continually changing, iterative process, and we'll want to keep doing that in order to get an optimal model that we can use for predicting the price of each vehicle. Thank you again for joining me for this tutorial, where we ran through a sample experiment. I look forward to welcoming you back for the next lesson. 7. Converting data with Windows PowerShell: Hi, everybody.
In this lesson, I'm going to take you through a quick walkthrough of how we can convert the data set format. The original data set uses a blank-separated format, and Machine Learning Studio works better with a CSV file, so I'm going to show you quickly how we can convert the data set by replacing the spaces with commas and turning it into a CSV file. There are many ways to convert this data, but one of the best and easiest is by using a Windows PowerShell command. I'm going to go ahead and open up Windows PowerShell. Once you have PowerShell open, you'll need to navigate to where you have saved the file. The dir command lists all of the directories that are available; I have saved the file on the desktop, so I'm going to navigate to the desktop. After that, I can type in a simple PowerShell command to convert that file into CSV. As soon as the command finishes, you can see that the file has been converted to a CSV file, and I'll go ahead and double-click to open it up just to confirm. Thank you for joining me in this quick tutorial on how we can convert files to CSV using Windows PowerShell. 8. Uploading data onto Azure ML Studio: In this lesson, we're going to look at how to upload data into Azure Machine Learning Studio. When you're working in data science, 50 to 80 percent of the time spent on a machine learning project is used to get clean, organized data. It's one of the most important aspects of machine learning, because that's what everything else is based on. So we have to make sure that we have the right data, that it's clean, and that it's organized in a way that will help us predict an answer. So where can we get our data? First and foremost, again, is your local machine or your local network, whether it's a static CSV file or a text file, or, more importantly, what's happening more and more nowadays: online sources.
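The PowerShell one-liner itself isn't reproduced in the transcript, so rather than guess at it, here is the same space-to-comma conversion sketched in Python; the file names in the commented lines are hypothetical stand-ins for the file saved on the desktop.

```python
import csv
import io

def blanks_to_csv_text(blank_separated_text):
    """Convert blank-separated records to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    for line in blank_separated_text.splitlines():
        if line.strip():
            # split() collapses runs of spaces, mirroring the blank-separated format
            writer.writerow(line.split())
    return out.getvalue()

# Hypothetical file names; the video works with a file saved on the desktop:
# with open("german.data") as src, open("german.csv", "w", newline="") as dst:
#     dst.write(blanks_to_csv_text(src.read()))
print(blanks_to_csv_text("A11 6 A34\nA12 48 A32"))
```

Splitting on whitespace rather than a single space keeps the conversion robust if any record happens to contain runs of blanks.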
Whether it's websites, SQL databases, blobs, or document DBs, more and more organizations keep their data in online repositories. Again, Azure ML Studio gives you the ability to either upload data directly from your local machine or go out and grab it from any of these online sources. Enough theory; let me show you how it's actually done and how we upload data. What we're going to do is use data from a credit card company, and I'm going to use this data set to train a predictive analytics model. When we're done, our model should be able to accept a feature vector for a new individual and predict whether he or she is a low or high credit risk. After we log on to Azure ML Studio, I'm going to go to Datasets, and then in the bottom left I'm going to click New Dataset. Again, we're going to use the local file that we've just downloaded, so I'm going to go ahead and choose the file. Here is where we can enter a name for our new data set to help us identify what it is. I'm going to name it German credit card and provide an additional description, just to help identify it and keep the data separate; it's always good practice to fill in the optional description. There we go. In our Datasets view, this is the dashboard where we can see all the data that we've uploaded: what's been submitted, who it was submitted by, the description, the data type, when it was created, the size, and what project it is associated with. So what we've done right now is simply uploaded this German credit card risk information to our data sets. I think that's it for this lesson, and I look forward to welcoming you to the next one. Thank you. 9. How to process data: Hi, everybody, and welcome to this lesson on data processing. In this lesson, we're going to look at what data processing is.
We'll look at the different facets of processing data, in terms of editing your metadata, preparing data, and splitting data, followed by a hands-on tutorial where I can show you what I mean by all of these terms. In the last lesson we looked at where we can get our data, whether from a local source or an online source. When we have that data, we have to make sure, first of all, that it is useful, because we're going to be using it to predict an outcome, an answer that will affect our business, and in order to do that, we need to make sure that machine learning can use the data to predict a valuable outcome. Next, we need to make sure that we remove any unnecessary information, because nowadays IT systems collect a lot of data, and much of it could be unnecessary for the outcome we want to predict. That leads us to the last point, which is preparing the data for a specific problem. If you remember, in one of the earlier lessons I mentioned that we need to ask a very specific question for machine learning to work properly, and in order for ML Studio, or machine learning in general, to predict an outcome, we need to make sure that the data we have can answer that specific question. Enough with the theory; let's go through an actual demonstration of preparing our data. After I log into ML Studio, I'm going to go ahead and start a new experiment, and it's always good to give your experiments names that will help you identify what they're being used for. As you remember from the previous lesson, the data we uploaded was from a German credit card company and identifies credit risk, so we're going to name this experiment accordingly. After I've done that, on the right-hand side I can see Saved Datasets, and if I expand that, I can see all of the data sets I have uploaded myself, and here is where I can find that German credit risk information.
I'm going to drag that out into the field. Once that's done, I can right-click and choose Visualize to see what the raw data actually looks like, and you can see there are about 1,000 rows of information in this raw data. One important thing to note is that this data did not come with any column headings, and since that was the case, ML Studio assigned generic column headings to these columns. The first thing to do in processing this data is assigning names to these columns, to help identify what they are, not only for our own understanding but also for when we eventually publish this model as a web service: the headings identify the columns to whoever will be using the service. In order to rename the columns, we need to edit the metadata. I'm going to drag the Edit Metadata module out into the field and link the raw data to Edit Metadata. As you can see, a red exclamation mark shows up on the Edit Metadata module; that's because we have not set any properties for it. So in the Properties pane on the right-hand side, I'm going to launch the column selector, and I want to select all of the columns, because none of them has a name assigned. Back in the Properties pane, we can see a New column names option. This is where we're going to type in the names of the 21 columns in the data set, and each name needs to be separated by a comma. Our new column names should look something like this, with all 21 names, and as soon as that's done, you can see that the red exclamation mark is gone. What I'm going to do now is run the experiment, and when I visualize the Edit Metadata output, you can see that each of the columns now has a name associated with it, helping us understand what these numbers actually mean. Next, in experiments we need some data to train our model and some data to test that model.
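Outside of ML Studio, the same "attach headings to a headerless file" step can be sketched in plain Python. The column names below are a hypothetical three-column slice; the real German credit data set has 21 columns.

```python
import csv
import io

# Hypothetical subset of the 21 column names used in the lesson
COLUMN_NAMES = ["Status of checking account", "Duration in months", "Credit risk"]

def add_headings(raw_csv_text):
    """Return each headerless row as a dict keyed by column name instead of position."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    return [dict(zip(COLUMN_NAMES, row)) for row in reader]

rows = add_headings("A11,6,2\nA12,48,1\n")
print(rows[0]["Duration in months"])
```

Once every field is addressable by name rather than position, both the later modules and the eventual web service consumers can refer to "Credit risk" instead of "Col21".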
So what we're going to do now is split the data set into two separate data sets, one for training and one for testing. I'm going to drag the Split Data module out into the field and connect Edit Metadata to Split Data. As you can see from the Properties pane, ML Studio splits the data by default in a 50/50 fashion: 50 percent of the data will be output from the first port, and 50 percent from the second port. Again, we can change this to whatever we want, 60/40 or 70/30; it just depends on how much data we want to use to train the model and how much we want to use to test it. We're going to leave the default and use equal amounts of data for training and testing. As I mentioned in the previous lesson, for this credit card data the cost of misclassifying a high credit risk as low is five times higher than the cost of misclassifying a low credit risk as high, so we need to compensate for this, and in order to do that, we're going to use a simple R script. What it's going to do is replicate each high-risk row five times, while each low-risk row is not replicated at all. That weights the high-risk and low-risk examples to help ML Studio predict an answer for us. To do this, I'll drag the Execute R Script module out into the field, and as you can see in the Properties pane, there is a default R script that it will execute. We go ahead and open that up and type in our customized R script. We need to do this for both of the output ports of Split Data, because one is used for training and one is used for testing, so we need to make sure we run the R script on both of these ports. The easiest way to replicate it is to just copy and paste the same R script, and there you have it.
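The R script itself isn't shown on screen, so here is the same two-step idea, a 50/50 split followed by replicating each high-risk row five times, sketched in Python. I'm assuming the risk column uses the German credit coding of 2 = high risk and 1 = low risk; that coding is my assumption, not something stated in the video.

```python
import random

def split_50_50(rows, seed=0):
    """Shuffle and split rows into equal-sized training and testing halves."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]

def reweight(rows, risk_index=-1, high_value="2", factor=5):
    """Repeat each high-risk row `factor` times; keep each low-risk row once."""
    out = []
    for row in rows:
        out.extend([row] * (factor if row[risk_index] == high_value else 1))
    return out

low = ["A11", "6", "1"]
high = ["A12", "48", "2"]
print(len(reweight([low, high])))  # 1 low copy + 5 high copies = 6
```

Applying `reweight` to both halves mirrors running the R script on both output ports of Split Data: the 5x replication makes a misclassified high-risk applicant cost the learner five times as much.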
What we have basically done in these few steps is preprocess our data. With Edit Metadata we have made the data useful by assigning the columns specific names, and by splitting the data and executing the R script we've prepared the data for our specific problem, because we've accounted for the high- and low-credit-risk applicants. Thank you for joining me in this lesson, and I look forward to welcoming you to the next one, where we train and evaluate models. 10. How to test and evaluate models: Welcome, everybody, and thank you for joining me in this lesson, where we'll talk about training and evaluating the model. One of the benefits of using Machine Learning Studio for creating machine learning models is that it gives us the ability to try more than one type of model at a time in a single experiment and compare the results. This type of experimentation helps us find the best solution for our problem. In the experiment that we're developing, we will create two different types of models, compare their scoring results, and decide which algorithm we want to use in our final experiment. So let's go ahead and take a look at the different models that are available. Here we go back into the experiment we have been developing over the past couple of lessons, and on the left-hand side, in the machine learning pane, you can see all of the different models available for us to use. For the purposes of this experiment, we're going to use two modules: one is the Two-Class Support Vector Machine, and the second is the Two-Class Boosted Decision Tree. What I've done is drag the Two-Class Boosted Decision Tree model into the field, and I'm also going to drag in the Support Vector Machine, so now I have the two models out in the experiment field. Next, I want to train the model using these algorithms, so I'm going to drag the Train Model module into the field as well.
If you remember, in the previous lesson, when we were talking about preprocessing the data, we created an R script to help us weight the low and high credit risks, so I also want to connect the R script output to Train Model. Once I've done that, you can see that there's a red exclamation mark on Train Model; that means some of its properties need to be set. In the Properties pane, we're going to launch the column selector, and the thing that we're trying to predict is the credit risk. If I start typing, it gives me a list of all the column names; we want to select Credit risk and click OK, and we're going to do the same for the Two-Class Support Vector Machine. Now, a little explanation about SVMs: boosted decision trees work well with features of any type; however, since the SVM module generates a linear classifier, the model it generates has the best test error when all numeric features have the same scale. So we need to convert all numeric features to the same scale, and to do that we normalize the data, which transforms our numbers into a 0-1 range. The SVM module converts string features to categorical features and then to binary 0/1 features, so we don't need to transform string features manually. Also, we don't want to transform the Credit risk column, which is column 21, even though it's numeric. Here you can see that I have added the Normalize Data module into the field. When I click on Normalize Data, we can set its properties, and for the transformation we want to do, we'll use the Tanh method. After that, I can launch the column selector. In the column selector, I make sure I begin with No columns, then Include, column type, all numeric columns; then I want to Exclude, column names, and here is where I can exclude Credit risk.
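To make the scaling idea concrete, here is one plausible tanh-style normalization sketched in Python: standardize a column, squash it with tanh, then shift it into the 0-1 range. To be clear, ML Studio's exact Tanh formula isn't spelled out in the video, so treat this as an illustrative variant rather than the module's implementation.

```python
import math

def tanh_normalize(values):
    """Squash a numeric column into the 0-1 range via tanh.

    Illustrative variant only: standardize, apply tanh (range -1..1),
    then rescale to 0..1. ML Studio's actual Tanh transform may differ.
    """
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5 or 1.0
    return [0.5 * (math.tanh((v - mean) / std) + 1) for v in values]

scaled = tanh_normalize([6, 48, 12, 42, 24])
print(all(0.0 <= v <= 1.0 for v in scaled))  # every value now lies in 0..1
```

Whatever the exact formula, the point for the SVM is the same: after normalization, a column measured in months and a column measured in Deutsche Marks contribute on comparable scales, so neither dominates the linear decision boundary.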
And lastly, after we've trained the models, we want to score and evaluate them. To evaluate the two scoring results and compare them, we use Evaluate Model, so we're going to connect the two Score Model outputs to Evaluate Model. What this is going to do is evaluate the model on the right-hand side and the model on the left-hand side; again, for one we're using the Two-Class Boosted Decision Tree, and for the other we're using the Two-Class SVM. Finally, we go ahead and run the experiment. Once that's done, you can see the green check marks, and we can right-click on Evaluate Model and choose Visualize to see the results. Evaluate Model produces a pair of curves and metrics that allow us to compare the results of the two scored models. Here we can select Scored dataset or Scored dataset to compare to highlight the associated curves. In the legend for the curves, Scored dataset corresponds to the left input port of Evaluate Model, which in our case is the boosted decision tree, and Scored dataset to compare corresponds to the right input port, which is the SVM. By examining these values, we can decide which model is closest to giving us the results we're looking for, and again we can go back and iterate on our experiment by changing the parameter values in the different models. Now, the science and art of interpreting these results and tuning model performance are outside the scope of this walkthrough, but please look out for our future courses, where I will take you through an in-depth look at how to interpret these results and how to iterate on our experiments if the results are not what we're looking for. Thanks for joining me in this lesson, and I look forward to welcoming you to the next one, where we will deploy this experiment as a web service. 11. How to deploy a web service: Hi, everybody, and welcome back. As you have noticed in the previous lessons, we've looked at uploading data.
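Behind those paired curves, the usual summary number is the area under the ROC curve (AUC), and comparing two models boils down to comparing their AUCs on the same test labels. Here is a small pure-Python sketch using the rank-sum formulation; the labels and scores below are hypothetical, not from the actual experiment.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scored outputs from two models on the same six test labels
labels = [1, 1, 0, 0, 1, 0]
tree_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]  # stand-in for the boosted tree
svm_scores = [0.6, 0.4, 0.5, 0.3, 0.7, 0.2]   # stand-in for the SVM
print(auc(labels, tree_scores), auc(labels, svm_scores))
```

With these made-up scores, the first model separates the classes perfectly (AUC 1.0) while the second mis-ranks one positive/negative pair, which is exactly the kind of difference the Evaluate Model curves make visible.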
We've looked at how to create an experiment and preprocess our data, and then went on to look at how we train, score, and evaluate our models. Here is where we have developed our experiment and our model to help us derive a credit risk score based on this German credit card information. Now, to give others a chance to use this predictive model, we can deploy it as a web service on Azure. Up to this point, we've been experimenting with training our model, but the deployed service is no longer going to do training; it's going to generate new predictions by scoring the user's input based on our model. So we're going to do some preparation to convert this training experiment into a predictive experiment, and in order to do that, we need to remove one of the models, because the main reason we included two models was to help us decide whether a boosted decision tree or an SVM better predicts the credit score. Let's say we've decided the boosted decision tree is the better predictor of credit risk, so we can go ahead and remove the SVM. Once we've done that, we are ready to deploy this model using the Two-Class Boosted Decision Tree. To get it ready for deployment, we need to convert this training experiment to a predictive experiment, and the simple step to do that is at the bottom: we click Deploy as a predictive web service. Again, you can see what happened: the trained model is converted to a single Trained Model module and stored in the module palette, and the Web service input and Web service output modules are both added. These identify where the user's data will enter the model and where the user's data is returned. We need to take one additional step with this particular experiment. If you remember, we added two R script modules to provide a weighting function for the data.
That was just a trick we needed for training and testing, so we can take those modules out of the final model. Machine Learning Studio removed one Execute R Script module when it removed the Split Data module; now we can remove the other one, connect the metadata editor directly to the Score Model, and run the experiment one last time. Here we can see that the web service has been deployed, derived from our experiment, and this is again where other users can go, input information, and derive a credit risk score. Additionally, it gives you an API key if you want to integrate it into any of the other websites your organization might be working on. So thank you for joining me in this lesson, where we talked about how to deploy the model as a web service. 12. Business case: Hello, everybody, and welcome to this tutorial on experimenting with a real business use case. In this experiment, we're going to demonstrate demand estimation using regression, and the data we have is from a bike rental company. This experiment demonstrates the feature engineering process for building a regression model, using bike rental demand prediction as an example, and I'm going to demonstrate how effective feature engineering leads to a more accurate model. The field we want to predict is the count value, ranging from 1 to 977, which represents the number of bike rentals within a specific hour. That's essentially what we're trying to predict: the number of bikes that will be rented in a specific hour. And because our goal is to construct effective features in the training data, I've built four models using the same algorithm but with different training sets.
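For anyone wanting to use that API key, here is a rough Python sketch of how a client might call a classic ML Studio request-response web service. The endpoint URL and API key are placeholders (ML Studio shows the real values on the web service dashboard), and the column names are the hypothetical abbreviated ones from earlier, so treat the payload shape as an assumption to verify against your own service page.

```python
import json
import urllib.request

# Placeholders: copy the real endpoint URL and API key from the web service dashboard
ENDPOINT_URL = "https://services.azureml.example.net/execute?api-version=2.0"
API_KEY = "YOUR-API-KEY"

def build_request(feature_names, feature_values):
    """Build the JSON request for a classic ML Studio request-response service."""
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": feature_names,
                "Values": [feature_values],  # one row of features to score
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )

# One hypothetical applicant (abbreviated; the real model expects all input columns)
req = build_request(["Status of checking account", "Duration in months"], ["A11", 6])
# urllib.request.urlopen(req) would then return the scored credit risk as JSON
```

The scored label and probability come back in the response JSON, which is what an integrating website would parse to display the applicant's predicted risk.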
So, because our goal was to construct effective features in the training data, I've built four different models using the same algorithm, and as you can see, there are four different feature sets: A, B, C, and D. Set A contains all of the base variables, while B, C, and D add time variables describing when each bike was rented. Each of these feature sets captures a different aspect of the problem: feature set B captures recent demand, set C captures the demand for bikes at a particular hour, and set D captures the demand for bikes at a particular hour on a particular day of the week. The four training data sets were built by combining the feature sets as follows: the first uses feature set A only, the second uses features A and B, the third uses A, B, and C, and the fourth uses all four. As you can see here on the screen, on the left-hand side is training set one, here are training sets two and three, and on the right-hand side is training set four. Now, for the model, we've used regression because the label column, the number of rentals, contains continuous real numbers. Given that the number of features is relatively small, the features are not too sparse, and the decision boundary is very likely to be nonlinear, I've decided, based on these observations, to use the Boosted Decision Tree Regression algorithm for this experiment. Overall, the experiment has five major steps: getting the data, data preprocessing, feature engineering, training the model, and finally testing and evaluating it. The data, again, I've already uploaded here: the data from the bike rental company. After getting the data, the next step is data preprocessing, or readying the data. This experiment uses the Metadata Editor and Project Columns modules to convert two numeric columns, weather and season, into categorical variables.
Then we also remove the four less relevant columns, which are instant, casual, registered, and time of day. So this Edit Metadata module is where we've changed weather and season, and Project Columns is where we've removed those four columns. The next step is feature engineering. Normally, when preparing training data, we have to pay attention to two requirements: first, finding the right data, and second, identifying the features that characterize the patterns in the data; if they don't exist, we need to construct them to provide better predictive power, and this is what we call feature engineering. In this experiment, I've constructed four copies of the data set resulting from Project Columns and used Execute R Script modules, as you can see here. After we've constructed the data, we've engineered the features that we want, the features we know will give us a predictive answer. The next step is training the model, and this is where we choose the algorithm we want to use. Again, there are many different kinds of machine learning problems, as you saw in the data science lectures, but for this experiment we've used the Boosted Decision Tree Regression model, a commonly used nonlinear algorithm, as I mentioned before. This is an iterative process, so if this regression model does not work out for you, you can always go back and change it to something else to see if that gives you a better predictive answer. I've also used a Split module to divide the input data in such a way that the training data is based on the year 2011 and the testing data is based on the year 2012. And finally, we train, score, and evaluate the model.
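The R scripts that build those feature sets aren't shown in detail, so here is a Python sketch of the idea behind feature set B, appending the rental counts from the previous hours as lag features. The column names and lag choices are my own illustration, not the experiment's exact definitions.

```python
def add_lag_features(counts, lags=(1, 2, 3)):
    """Feature set B sketch: append the rental counts from the previous hours.

    `counts` is the hourly rental count in time order; rows whose lags would
    reach before the start of the series are dropped.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(counts)):
        rows.append({
            "cnt": counts[t],
            **{f"cnt_lag_{k}": counts[t - k] for k in lags},
        })
    return rows

hourly = [16, 40, 32, 13, 1, 1, 2, 3]
rows = add_lag_features(hourly)
print(rows[0])  # {'cnt': 13, 'cnt_lag_1': 32, 'cnt_lag_2': 40, 'cnt_lag_3': 16}
```

Feature sets C and D would be built the same way, except the appended values would be historical averages for the same hour of day, and for the same hour on the same weekday, rather than the immediately preceding hours.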
The Score Model module scores a trained classification or regression model against a test data set; that is, the module generates predictions using the trained model, whereas Evaluate Model takes the scored data set and uses it to generate evaluation metrics, which you can again visualize. I've just run the model, and you can see there's an hourglass next to most of the modules; that just means the experiment is currently running, and it shows the elapsed time at the top right. As soon as the experiment is done, you can see green check marks on all of the modules. To see the results, we right-click on Evaluate Model and visualize them. As you can see, for the first data set, which used feature set A, we have a mean absolute error of almost 90, whereas for A and B the mean absolute error is 51. Then we compare that with our other data sets: for A, B, and C the mean absolute error is 47, and for all four, A, B, C, and D, it is 48. What this basically tells us is that feature sets A, B, C and A, B, C, D give us the best results in terms of predicting how many bikes will be rented in a particular hour, whereas if you compare those two there's not much difference, so the D features don't add much. So if we just rely on A, B, and C, we get an experiment whose predictions we can rely on and base business decisions on. So again, as a recap of what we did: we uploaded the data, we edited some of the metadata, we removed some of the columns, and then we broke this experiment into four different data sets and used a regression model to decide which variables play a deciding role in telling us how many bikes will be rented at a specific time.
Thank you for watching this tutorial, and I look forward to welcoming you to the next lesson.