Machine Learning with Python and Scikit-learn for absolute beginners

Engineering Tech, Big Data, Cloud and AI Solution Architect


Lessons in This Class

  • 1. Introduction (1:02)
  • 2. What is Machine Learning? (1:19)
  • 3. Machine learning process (2:03)
  • 4. Types of Machine Learning (3:17)
  • 5. Creating an Anaconda Spyder development environment (2:39)
  • 6. Python NumPy Pandas Matplotlib crash course (14:21)
  • 7. Creating a Classification Model using KNN algorithm (15:07)
  • 8. Saving the Model and the Scaler (4:08)
  • 9. Restoring the Model from Pickle file and using it locally (3:04)
  • 10. Exporting the model to the Google Colab Environment (4:20)
  • 11. Understanding the Flask web framework (4:08)
  • 12. Creating a REST API for the Classification Model (5:05)
  • 13. Linear Regression (9:02)



87 Students · 1 Project

About This Class

This course covers how to build machine learning models from scratch using Python and the scikit-learn library. The course structure is outlined below:

  • Machine Learning process
  • Python, NumPy, Pandas basics
  • Building classification models using Scikit-learn
  • Deploying classification models using Python Flask web framework
  • Building Regression models using Scikit-learn

As a prerequisite, students should have basic programming skills and high-school-level mathematics knowledge before getting started with this course. No prior knowledge of machine learning is required.

Meet Your Teacher


Engineering Tech

Big Data, Cloud and AI Solution Architect

Teacher

Hello, I'm Engineering.


Level: Beginner



Transcripts

1. Introduction: Welcome to this machine learning course using Python and scikit-learn, designed for absolute beginners. We'll start with a crash course on Python and various libraries. Then we'll dive into building machine learning models using scikit-learn. You will also understand how to create a REST API for your machine learning model using the Flask framework. This is a completely hands-on course. As a prerequisite, you need some programming background and high-school-level mathematics knowledge to get started with this course. No prior knowledge of machine learning is required. We'll be explaining all the concepts step by step and teaching you how to build machine learning models from scratch.

2. What is Machine Learning?: Let's understand machine learning. In machine learning, we read patterns from data using a machine learning algorithm and then create a model. Then we use that model to predict output for new data. For example, if a model is trained to predict customer behavior, you can feed in a new customer profile and it can predict whether the customer would buy or not based on their age, salary, and other parameters. If a model is trained to classify an image as a cat or a dog, you can feed it a new image and it will predict whether it's a cat or a dog. A sentiment analysis model can read text and predict whether the sentiment is positive or negative. So what exactly is a model? A model can be a class or an object, or it can be a mathematical formula. And how do you deploy and use the model? The model can be stored in the file system in binary format. It can be stored in a database column, in a blob or other formats. You can take the model, create a REST API, and make it accessible to applications over the HTTP protocol. Or you can simply take the model code and embed it in another program.

3. Machine learning process: Let's take a closer look at the machine learning process and understand when our model is ready for deployment. In machine learning, the algorithm looks at the data, derives patterns, and creates a model. Let's start from the data. Typically we receive raw data, and then we do the data preprocessing. Data preprocessing involves steps like data cleansing, data standardization, fixing issues with null values, missing records, unknown values, and various other things. During data preprocessing, we also convert categorical values to numerical values, because machine learning models work with numerical values. This step can be performed within the machine learning pipeline, or it can be performed by another team, for example a team which specializes in big data and Spark, which is a very popular technology for data preprocessing. For many models, we also do feature scaling, that is, bringing all the features to the same scale so that the model will not get biased or influenced by a particular feature. Once that is done, our data is ready for the machine learning algorithm. Depending on the problem we're trying to solve, we might repeat this process several times to get the right data for our machine learning algorithm. We feed the data to an algorithm and get a model. But is that the final model? Once we get a model, we test its accuracy. We refine the model to get higher accuracy. We might go back to the data preprocessing step, generate the data again, and feed it to the algorithm again until we get a model with the desired accuracy. Apart from accuracy, we also check whether the model is overfitting or underfitting.
And once we are happy with the model, we deploy a particular version to production. So that is the final model, and that gets used by different applications.

4. Types of Machine Learning: Let's understand different types of machine learning algorithms. We talked about learning from customer behavior based on a certain profile and applying that learning. Let's look at it in detail. When we say customer profile, it could be age, salary, country, gender. Based on that, let's say we know whether a customer has purchased in the past or not: one stands for purchased, zero stands for not purchased. If we feed that information to a machine learning algorithm, it can look at this past purchase data. It will look at these different features and the behavior in terms of purchased or not, then create a model. Here the output is always one or zero: 1 means purchased, 0 means not purchased. This type of machine learning is called classification, where we are predicting a certain number of classes from the input data. Let's look at another example of classification. When we feed an image to a model and the model recognizes whether it is a cat or a dog, that is also classification. If we train a machine learning algorithm with different images which belong to three classes, say cat, dog, and cow, and we create a model, that is also classification, because our prediction is always a limited set of values. There is another type of machine learning called regression, where instead of predicting a class, we predict a certain value, which could be a continuous value such as a house price. You might have information about area, number of bedrooms, and distance to the bus stop or city center. Based on that, if you have to create a model which will predict the house price, that type of machine learning is called regression, where you predict a continuous value instead of predicting which class the output belongs to. Classification and regression are called supervised machine learning, because the algorithm learns from the data. It learns from a set of features and the behavior: you are feeding information about the house price for a set of features, or you are feeding information about whether the customer bought or not. The algorithm learns from that, and then it predicts output for a new set of variables. This is supervised machine learning, where you tell the algorithm what to look for in a particular dataset. There is another type of machine learning called unsupervised machine learning, where you feed certain data to an algorithm, but you don't say what to look for. For example, you could feed in salary, country, gender, and how much the person is spending, and ask the algorithm to group them in a way that lets you take certain decisions based on that. Typically you create clusters: using unsupervised machine learning, you could create different clusters like young spenders or high-income, high spenders. Based on that, you can decide which customer group to target in your marketing campaign. This is unsupervised machine learning. In supervised machine learning, we split the data into training data and test data. Typically 70 to 80% of the data is kept for training the model, and the remaining 20 to 30% is used for testing the model.

5. Creating an Anaconda Spyder development environment: We'll use Anaconda Spyder for machine learning development. Search for "download Anaconda" and go to their website. Click on Pricing, scroll down, and select the Individual Edition, which is free. Click Learn More.
Click Download and pick the right version for your operating system. Once downloaded, click on the installer. Accept the terms and conditions; installing for just the current user and the default selected directory is fine. Make sure there are no spaces in the directory. I would recommend selecting both tick boxes so that Python and Anaconda get added to your environment variables. Click on Install. The installation takes about 20 to 30 minutes. Once completed, click Next; we don't need to select the other options, so click Finish. Search for Anaconda Spyder and launch Spyder. Before we start using Spyder, we'll first create a working directory where we will store all the files. I'll create a directory under my user folder; this will be my working directory. Go to the top right-hand corner and select that directory, and that will be the working directory. Now let's create a new Python file and write a hello world program. Save the file, select it, and run it using the run icon, and we can see "hello world" in the console.

6. Python NumPy Pandas Matplotlib crash course: We'll be covering Python, NumPy, Pandas, and Matplotlib in this lab. If you are already familiar with these Python libraries, you can skip this lecture and move to the next one. Let's create a new file using Spyder and start coding. In Python, you can declare variables without giving data types. And if you now populate a string value into a variable, Python will not complain. In the Spyder Variable Explorer, you can see all the variables and their values. Let's say a equals 3 and b equals 5, then print a plus b. Select these lines and run them, and we can see that the output is getting printed. In Python, you can perform all kinds of arithmetic operations. Python has a data type called list, and you declare it within square brackets; you specify a list of elements, and you can then grab elements by specifying the index number. The index number starts with 0. We'll print this out: 10 at index zero, 20 at index one, and so on. To grab the last element, you specify minus one. You could also specify three in this case, but minus one would also give you the last element; that way, when the list is very long, you can easily grab the last element by specifying minus one. And if you do minus two, it will give you the second-to-last element, that is, 30. So this is how we can declare a list and grab different elements. And a list can have a combination of different data types. In Python, you can write a loop by giving a condition, a colon, and hitting Enter. In Python, both single quotes and double quotes are fine. Spacing and indentation matter in Python. So if we write it like this, the loop body is whatever is indented, and the loop ends when the indentation ends. Now, if I write something here that is outside the block, the statement outside gets printed; if we change the condition, it'll print both. There are many ways you can write a for loop in Python. I can say for i in range ten, and this should print the value of i starting from 0 to 9, so these are the ten values. You can also loop through a list: for my_list, you can say for i in my_list, and it prints all the elements of the list. Let's do another operation on the list: picking all the values from the first list, multiplying them by three, and creating a new list. In Python, you declare a function with the def keyword. Let's write calculate_sum taking, say, a and b, and we can return the sum. Now we can call this passing two values, and we'll get the sum. You can also return multiple values, and we can see that both variables are getting populated.
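To recap, here is a compact sketch of the Python basics covered in this lecture (the variable and function names are illustrative, not necessarily the exact ones typed in the video):

```python
# Basic Python: lists, loops, functions, and multiple return values.
my_list = [10, 20, 30, 40]
print(my_list[0])    # 10 -> index starts at 0
print(my_list[-1])   # 40 -> -1 grabs the last element
print(my_list[-2])   # 30 -> second-to-last element

# A for loop over a range and over a list.
for i in range(10):
    print(i)                            # prints 0 through 9
tripled = [x * 3 for x in my_list]      # new list with each value multiplied by 3

# A function declared with def, returning multiple values.
def calculate_sum_and_product(a, b):
    return a + b, a * b

total, product = calculate_sum_and_product(3, 5)
print(total, product)                   # 8 15

# Writing a file: 'w' overwrites, 'a' appends.
with open("sample.txt", "w") as f:
    f.write("sample content\n")
with open("sample.txt", "a") as f:
    f.write("more content\n")
```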
So this is how you can return multiple values from a Python function. To create a file in Python, use with open and then write some content. You can see my file in the File Explorer; it has the sample content. Note that the mode is 'w' here; that is for writing. You can add more content with the append mode, 'a'. Let's execute this and check out the file; you can see more content getting added. You can also go back to the 'w' mode; now you should see only the new content, as everything else gets overwritten with the new content. So this is how we can create a file in Python. Let's now understand NumPy. NumPy is a popular Python library for scientific computing. First we need to import NumPy; we'll import numpy as np, and now we can do all NumPy operations using np. Many of the popular machine learning libraries, like scikit-learn, are designed to work with NumPy arrays. Let's declare a list, and we can create a one-dimensional array from the list. Let's check out this value, the sample one-dimensional array; it is a NumPy array object. We'll now create a two-dimensional NumPy array. It has four rows and three columns, and we create it from a two-dimensional list. You can easily reshape NumPy arrays. This is a four-row, three-column array; we can reshape it to two rows and six columns. Note that when you reshape, the original array does not get reshaped; you can store the result in a new array, which has two rows and six columns. You can reshape provided the total number of elements matches; you cannot reshape it to two by five, because it has twelve elements. If we reshape with, let's say, one and minus one, it will create one row and the maximum number of columns. Similarly, you can reshape to one column and the maximum number of rows possible by specifying minus one and one. You don't have to count how many rows or columns there are. We'll store this as a new array. So this is how we can reshape NumPy arrays. Sometimes during machine learning processing you might have to extract rows and columns and do some operations, and this reshaping will be very useful. You can grab a portion of a NumPy array. This means: give me the first row up to the third row, but not including the third row, and the second column up to the fourth column, but not including the fourth column. Let's see what we get. The original array doesn't get changed; we populate the result into a new array and see the output, the new sample. We got the rows at index 1 and 2 and the column at index 2, because there is no column at index 3. Pandas is a popular Python library for data analysis. You import pandas saying import pandas as pd; that is the convention. Pandas has one-dimensional arrays known as Series. So this is how we declare a Series; it's one-dimensional. The advantage with Pandas is that you can give your elements a name. For example, I can have 10, 20, 30, 40, but I can give them labels. Let's check it out: you can see the index a, b, c, d. You can grab an element by specifying the index number as usual; if you do sample_series[2], you get 30. You can also grab it by saying sample_series['c'], and that will give the same value. You declare a DataFrame, which is a two-dimensional structure, using the pd.DataFrame function. You can pass a two-dimensional list and you'll get a DataFrame; we can see the Pandas DataFrame. With Pandas, you can also give rows and columns labels, so now we have row1 through row4 and column1 through column3. And you can grab elements by specifying the row name and column name, or by specifying the index number for each row and column.
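Before moving on with DataFrames, here is a short sketch of the NumPy reshaping and slicing and the Pandas Series operations described above (the array contents are assumed for illustration):

```python
import numpy as np
import pandas as pd

# A 4-row, 3-column NumPy array built from a list of lists.
sample_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# reshape returns a new array; the original is not modified.
two_by_six = sample_2d.reshape(2, 6)     # 2 rows, 6 columns
one_row    = sample_2d.reshape(1, -1)    # 1 row, as many columns as needed
one_col    = sample_2d.reshape(-1, 1)    # 1 column, as many rows as needed
# sample_2d.reshape(2, 5) would fail: 2 * 5 != 12 elements

# Slicing: rows 1 and 2, columns from index 2 (row 3 and column 4 are excluded).
new_sample = sample_2d[1:3, 2:4]         # [[6], [9]] -- only column index 2 exists

# A Pandas Series is a labelled one-dimensional array.
sample_series = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
print(sample_series.iloc[2])     # 30, by position
print(sample_series["c"])        # 30, by label
```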
So column3 is 3, 6, 9, 12, which is this column. You can grab multiple columns by specifying both the columns. To grab rows, you specify loc and give the row name, and you will get the row. To grab a portion of the DataFrame, you can specify both row and column names and get that portion. So we're getting column2, column3 and row2, row3 from the second sample DataFrame. You can also specify index locations with iloc, instead of labels, to get a portion of a DataFrame. This is row 0 all the way up to row 2, not including row 2, and column 1 up to column 3, not including column 3. If you don't specify anything, you get all the rows and all the columns. And if you want everything up to the last column, you say colon, minus one. So we got 1, 4, 7, 10 and 2, 5, 8, 11; that is column1 and column2 for all the rows. We said grab all the columns up to the last column. So this is how we can grab all the rows but exclude the last column. A subset of a DataFrame is a DataFrame if it is two-dimensional; if you are grabbing one row or one column, it could be a Series. In Python, you can use type to check the type of any variable. You can easily convert a DataFrame to a NumPy array by invoking .values. Many machine learning libraries are designed to work with NumPy arrays, so do the conversion using .values. This is now a NumPy array; you see two opening and closing brackets, so it's a two-dimensional NumPy array. You can store this in a new NumPy array variable. We grabbed a portion of the DataFrame and converted it to a NumPy array with .values. This would convert the last column to a NumPy array. Let's look at an example of filter operations on DataFrames. We are saying here: get me those samples where column1 values are greater than 4. Wherever it is greater than 4 it gives you True; otherwise it gives you False. Then you apply that condition on the main DataFrame. With Pandas, you can easily read CSV files, or even a file on GitHub. There is the read_csv method; let's read a sample CSV file from our repository. We would say the store data CSV file, and Pandas will load the CSV file into a DataFrame. If we check our df now, we can see that the CSV data has been loaded into a DataFrame. We can check the file also. So this is how, using Pandas, you can easily load all the rows and columns into a DataFrame. With df.describe, you can get various statistical information about the DataFrame, like how many rows there are and what the mean and standard deviation are. You can get additional info with df.info: what the data types are and what the columns are. df.head will give you the first five rows. You can take a sample of a DataFrame by doing head, and you can also specify how many rows you want in the head. This DataFrame has three columns; we can grab the first two columns and convert them to a NumPy array. Now let's go to the Variable Explorer and check X. It has the first two columns, because we excluded the last column, and it has been converted to a NumPy array. To convert the last column, you simply grab the last column; you do not have to specify a range, and the last column will get converted to a NumPy array. It's a one-dimensional array. Finally, let's look at the Matplotlib library. Using Matplotlib, you can visualize the data by drawing different plots. Spyder has a Plots tab where the plots get created. You import matplotlib like this. Now let's declare two lists, and we'll plot x and y. By default, we get a line plot.
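A sketch of the DataFrame operations just described; the CSV file name and the column names are placeholders for whatever the course files actually use:

```python
import pandas as pd
import matplotlib.pyplot as plt

# A labelled DataFrame built from a two-dimensional list.
sample_df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                         index=["row1", "row2", "row3", "row4"],
                         columns=["column1", "column2", "column3"])

print(sample_df["column3"])                                       # 3, 6, 9, 12
print(sample_df.loc["row2"])                                      # one row by label
print(sample_df.loc[["row2", "row3"], ["column2", "column3"]])    # sub-frame by labels
print(sample_df.iloc[0:2, 1:3])                # rows 0-1, columns 1-2 by position
X = sample_df.iloc[:, :-1].values              # all rows, all but the last column, as NumPy
filtered = sample_df[sample_df["column1"] > 4]  # boolean filtering

# Reading a CSV and exploring it (file name assumed).
df = pd.read_csv("storepurchasedata.csv")
print(df.describe())      # count, mean, std, ...
df.info()                 # column names and data types
print(df.head(3))         # first three rows

# A quick plot of two of the columns (column names assumed).
plt.scatter(df["Age"], df["Salary"])
plt.title("Sample plot")
plt.xlabel("Age")
plt.ylabel("Salary")
plt.show()
```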
To get a scatter plot, you say plt.scatter, and you will get a scatter plot. You can give labels to your plot and also a title: a sample plot with x and y axes. Let's create a plot for the data we read from the CSV file. We'll create a new plot; the x-axis will have age and the y-axis will have salary, and we'll grab the columns and pass them to the plot function to get the plot. So you can see the plot for the data we read from the CSV file. This is an example of a histogram. So this is about NumPy, Pandas, Matplotlib, and some basic Python. This is not everything that is out there in those libraries; however, this much knowledge is sufficient for you to get started with machine learning programming using Python.

7. Creating a Classification Model using KNN algorithm: We have the store purchase data: data for different customers, their age, their salary, and whether they purchased or not. Based on this data, we'll build a machine learning classification model which will predict whether a new customer with a certain age and salary would buy or not. Here age and salary are the independent variables. We'll build a machine learning classification model using KNN, which will get trained with this store purchase data. Let's understand the k-nearest neighbors, or KNN, machine learning algorithm through a very simple example. Imagine we have cats and dogs shown in this diagram. On the x-axis we have weight and on the y-axis we have height. All the green ones are cats, because obviously they would have less weight and less height, and all the blue ones are dogs. If we know the height and weight of a new animal, let's say this new one in the center, can we predict whether it's a cat or a dog? The KNN algorithm decides that based on the characteristics of the nearest neighbors. If the k value is five, we look at the five nearest neighbors, and based on that we decide which class the animal could belong to. For example, in this case there are three greens and two blues. That means there are three cats and two dogs which have similar characteristics to the new animal. So this animal is more likely to be a cat, because the majority of the animals in the nearest neighborhood belong to the cat class. This is the k-nearest neighbors technique, where the outcome is predicted based on the characteristics shown by the nearest neighbors, and the k value is typically five. Let's apply this technique to the store purchase data. We have the data in the project folder. In Spyder, you can select your project folder here, and then you can go to Files and see all the source code and files. So this is the store purchase data we have, using which we'll build a machine learning classification model. Let's create a new Python file; we'll name it for the ML pipeline. We'll import the standard libraries. We're assuming you are familiar with NumPy and Pandas, which is a prerequisite for this course. In Spyder, as soon as you type, you get all the errors and warnings. It's saying we are not using NumPy and Pandas yet; that's fine, we'll be writing the code for them shortly. Now let's load the store purchase data into a Pandas DataFrame. We'll have a training data DataFrame which will store the store purchase data. Note that we will not be training with the entire data; we'll keep some records for training and some for testing, which we'll see next. But the training data Pandas DataFrame will store the entire CSV file data. You can run the entire file by clicking the run icon, or you can run a selection. Let's run the selection.
You can go to the Variable Explorer, click on the training data, and we can see that age, salary, and purchased have been loaded into the training data DataFrame. Let's get some statistical info about the training data. We can see various statistical information about the data: how many records we have, which is 40, plus the mean, standard deviation, and some other statistics about the data. We'll store the independent variables in an array. We'll take the rows up to the last column and store them in the independent variable X, which is a NumPy array. Let's do that; this should populate age and salary. Next, let's go to the Variable Explorer and check. We can see that age and salary have now been populated into a NumPy array. We'll populate the purchased column, which is the value to predict, into another NumPy array. This should populate the last column and store it in y. This is our y, which is the dependent variable, or the one we're trying to predict. We have age and salary in the X NumPy array, and we have y, which is the purchased data: zero for not purchased, one for purchased. That is stored in a NumPy array. Now we have the independent variables and the dependent variable in two separate NumPy arrays. Next, using scikit-learn, we'll separate the data into a training set and a test set. We'll use an 80-20 ratio: 80% of the data for training and 20% for testing. Scikit-learn is a very popular library for machine learning using Python. Scikit-learn comes pre-installed with Anaconda Spyder. If you're using a different Python environment, you might have to install scikit-learn using pip install; pip install is the command to install any Python library. Anaconda Spyder comes with scikit-learn, NumPy, Pandas, and many other libraries that are required for scientific computation and machine learning. We're using scikit-learn's train_test_split to split the dataset into two parts. Once we do this, we get the training set and the test set. The training set will have 32 records: we said 80% of the data will be used for training, and we have a total of 40 records, of which 32 will be used for training. So this is X_train and y_train, 32 records for training, and X_test has eight records; similarly, y_test will have eight records. This is the data for testing the model. Next, we'll feature scale the data so that age and salary are in the same range and the machine learning model does not get influenced by salary, which is in a higher range. Let's run this; now we can see the scaled data. The standard scaler distributes the data in a way such that the mean is 0 and the standard deviation is 1. Now both age and salary are in the same range. Next, we'll build a classification model using the k-nearest neighbors technique. We'll have five neighbors and use the Minkowski metric to build this classifier. The Minkowski metric here works based on the Euclidean distance between two points; Euclidean distance is nothing but the shortest distance between two points. That's how it decides which neighbors are the nearest. Next, we'll fit the training data to the classifier to train it. This is where the model is getting trained. This is the classifier object which has been trained with the training data, where age and salary are the input variables and purchased is the output variable. The classifier is our model. We'll quickly check the accuracy of the classifier by trying to predict for the test data. The classifier has a predict method which takes a NumPy array as input and returns the output as another NumPy array. So this is our X and this is the y.
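Putting the steps above together, a minimal sketch of the training pipeline, assuming the CSV has age, salary, and purchased columns (the actual file and column names may differ slightly from the course files):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Load the store purchase data (file name assumed).
training_data = pd.read_csv("storepurchasedata.csv")

X = training_data.iloc[:, :-1].values   # independent variables: age, salary
y = training_data.iloc[:, -1].values    # dependent variable: purchased (0/1)

# 80/20 split: 32 records for training, 8 for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling so age and salary are on the same scale.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# KNN classifier with 5 neighbours and the Minkowski metric (p=2 -> Euclidean distance).
classifier = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)     # predictions for the test set
```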
Let's see what the prediction is. Comparing with the test set, the model predicted accurately for all but one of the records. We can also check the probability of prediction for all the test data. Here we can see that wherever we have more than 0.5 probability, the model is predicting that the customer would buy; otherwise, that the customer would not buy. The probability is helpful when you want to sort the prediction data and find the customers who are more likely to purchase. Here, the third one is more likely to purchase, because the probability is 0.8, or 80%. We'll check the accuracy of the model using a confusion matrix. A confusion matrix is a statistical technique to measure the accuracy of a classification model. The way it works is pretty simple: if the actual value is one and the model predicted one, it's a true positive. If the actual value is one and the model predicted zero, it's a false negative. Similarly, actual zero with predicted zero is a true negative, and actual zero with predicted one is a false positive. It can also be represented in this matrix format. Once we know all four types, we can easily determine the accuracy: the accuracy is true positives plus true negatives divided by all four types of predictions. No matter which classification technique you are using, KNN or any other, a confusion matrix can be used to calculate the accuracy of the model. Scikit-learn and other machine learning libraries have built-in classes to generate the confusion matrix from the actual and predicted data. Let's create the confusion matrix. We'll pass the actual values of the test set, that is y_test, and the predicted values, that is y_pred, and get the confusion matrix from scikit-learn's confusion matrix class. Go to the Spyder Variable Explorer and we can see the confusion matrix here: we have three true negatives, four true positives, and only one false negative. So this model is very good, because we have only one false positive or negative out of eight records. Let's calculate the accuracy of the model and print it: 0.875. So our model is 87.5% accurate. This model can predict whether a customer with a particular age and salary would buy or not with 87% accuracy. You can also get the entire classification report to understand more about precision, recall, and F1 score. So we've taken this store purchase data and created a classifier which can predict whether somebody would buy or not. That model, or classifier, can be used to predict whether a customer with a particular age and salary would buy or not. Let's try to predict whether a customer with age 40 and salary 20,000 would buy or not. Note that this model takes a NumPy array and returns a NumPy array, so we have to create a NumPy array from the age and salary, feature scale that data, and then feed it to the classifier. Because the classifier was trained on feature-scaled data, you have to make sure the data you are feeding is also feature scaled with the same technique, which is the standard scaler in our case. The prediction is 0: the customer would not buy. Somebody with age 40 and salary 20,000 would not buy. With this model, we can also check the probability of the prediction for the same data. The classifier has a method for predicting probabilities, using which you can get the probability. The probability is 0.2, or 20%; that's why the model said the customer would not buy. Let's try to predict for a customer who is age 42 with a salary of 50,000. This time the model says the customer would buy. Let's check the probability: it's 0.8, or 80%. So there is an 80% chance of this customer buying. Our machine learning model is ready; it's a classification model.
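Continuing the same sketch (it reuses classifier, sc, X_test, and y_test from above), the evaluation and the two single-customer predictions might look roughly like this; the printed values are only indicative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

cm = confusion_matrix(y_test, y_pred)          # rows: actual, columns: predicted
accuracy = accuracy_score(y_test, y_pred)      # (TP + TN) / all predictions
print(cm)
print(accuracy)                                # e.g. 0.875 for 7 correct out of 8
print(classification_report(y_test, y_pred))   # precision, recall, F1 score

# Predicting for a single customer: build a 2D NumPy array and scale it
# with the SAME scaler that was fitted on the training data.
new_customer = sc.transform(np.array([[40, 20000]]))
print(classifier.predict(new_customer))        # e.g. [0] -> would not buy
print(classifier.predict_proba(new_customer))  # e.g. [[0.8, 0.2]] -> 20% chance of buying

older_customer = sc.transform(np.array([[42, 50000]]))
print(classifier.predict(older_customer))      # e.g. [1] -> would buy
```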
This classification model can predict whether a customer with a certain age and salary would buy or not. So this is the classifier we have, which is the model, and we are feeding data to this model to get output. Next, we'll see various model deployment techniques: how we can save this model and deploy it in other environments, including some of the cloud provider environments.

8. Saving the Model and the Scaler: We have built a KNN classification model which can take age and salary as input parameters and predict whether a particular customer with that age and salary would buy or not. Let us now understand how to save the model we have created. To recap the model training process: we read 40 records from the dataset and identified 32, that is 80%, for training. Those are represented here. Then we used the standard scaler to scale the values so that the mean becomes 0 and the standard deviation becomes 1 for both age and salary. For many models, scaling is required; otherwise the model might get influenced by values which are in a higher range, salary in our case. You can use the standard scaler or any other scaling mechanism. Once the data is scaled, we feed it to the model in a two-dimensional NumPy array format, and we get an output which is also a NumPy array with one column. Internally, the model applies the KNN technique. It looks at the output for each record and tries to optimize so that the overall accuracy goes up. There are various ways we can save the model. For some, we can extract the formula, and in some cases we'll have to save the model in binary format so that we can restore it and then use it to predict output for a new set of data. We'll see that in action shortly. If anybody wants to predict with the model, they need two things: they need the classifier model, and they would also need the standard scaler. If they use some other technique to feature scale the data, the model might not give a correct result, because we have used a particular standard scaler, so we will also export it along with the model. With the classifier model and the standard scaler, they can do the prediction in any Python environment. Let's see how we can save and export these objects to other environments. Python has a technique called pickling, using which you can store Python objects in a serialized, or byte-stream, format. In another Python environment, you can deserialize these objects and use them in your code. So let's understand how we can pickle the model and standard scaler we built in the previous lab. We import the pickle library. We could name the file after the KNN model, but if we do not want to reveal which technique we used to create this model, we can simply name it classifier.pickle. Using the pickle dump method, we can store the classifier object, which we created and trained earlier, into this classifier.pickle file. Similarly, we can create the pickle file for the scaler; we'll store the standard scaler in an sc.pickle file. Here, 'wb' means the file is opened for writing in binary mode. Let's execute this code, and we can go to the File Explorer and see that classifier.pickle and sc.pickle have been created. You can also verify the same in the operating system's explorer. These two are binary, or serialized, files for our classifier and standard scaler objects. In this lab, we have seen how to save the model and the standard scaler in binary format using the Python pickle library. Next, we'll see how to use the pickle files in another Python environment.
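Before moving on, a minimal sketch of the pickling step described above, using the classifier.pickle and sc.pickle file names and the classifier and sc objects from the earlier sketch:

```python
import pickle

# Serialize the trained classifier and the fitted scaler to binary files.
with open("classifier.pickle", "wb") as f:   # 'wb' = write in binary mode
    pickle.dump(classifier, f)

with open("sc.pickle", "wb") as f:
    pickle.dump(sc, f)
```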
9. Restoring the Model from Pickle file and using it locally: Till now, we have seen how to create a model and store it in pickle format. We have also stored the standard scaler object in binary format using the pickle library. Next, we'll see how to deserialize and use these pickle objects in another Python environment; it could be on-premise or it could be in the cloud. We'll first try to use the pickle files in the local environment. Let's create a new Python file; we'll call it use_model. We first need to import the pickle library, and we also need to import NumPy. Next, we'll deserialize and store the classifier in a local object in the new program. We'll use the pickle load method to load classifier.pickle, opening it in read-binary mode. Similarly, we'll read the scaler into a new object; sc.pickle will be loaded into a local scaler object. Next, we'll use the local classifier and the local scaler to predict whether a customer with age 40 and salary 20,000 would buy or not. Before running it, let's clear all the old variables; you can click here and remove all old variables. You can also clear the console by right-clicking and selecting clear console. Now let's run this program. We can see the new prediction, which is 0, matching the previous prediction. Let's check the new probability; this is again 0.2 for the customer with age 40 and salary 20,000. We deserialized the classifier object and the local scaler object, and then we tried to predict whether a customer would buy or not using these deserialized objects in a new Python program. This program doesn't know anything about how the model was built or trained; it picked up the model and scaler from the pickle files and used them to predict. We can also try to predict for age 42 and salary 50,000. Earlier we got an 80% probability; we should see the same output here, 0.8, and the prediction is one: the customer would buy. So you've seen how to use pickle files in another Python program which doesn't know anything about how the model was built and how it was trained. We tried this in a local environment; next, we'll try it in a cloud environment.

10. Exporting the model to the Google Colab Environment: Next, we'll take the pickle files to the Google Colab environment and try to predict there. Google Colab is like a Jupyter environment with some visual customization, and it has a lot of pre-built libraries for machine learning and deep learning. You can just log in using your Gmail or Google ID and then create a new notebook and start coding. Let's create a new notebook; I've already logged in. We'll give this file a name. We can go to the tool settings and change the theme to dark or adaptive; let's set it to dark. Colab is like a Jupyter notebook environment: you can simply type code and hit Shift+Enter and you'll see the output, or you can click on the Run icon here and run the program. You can right-click and delete a cell, or you can simply click here and delete the cell. In Colab, you'll find most of the machine learning and deep learning libraries pre-installed. If something is not installed, you can do pip install here and install it. Colab is like a Linux environment; you can type an exclamation mark and ls and see all the files that are present here. Currently there is nothing but a sample data folder within your Colab environment, and all the files get saved to Google Drive. We'll transfer these two pickle files to the Colab environment from our GitHub repository.
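To recap the lesson 9 workflow before we pull the files into Colab, a minimal sketch of deserializing the pickle files and predicting in a fresh program, which knows nothing about how the model was trained:

```python
import pickle
import numpy as np

# Deserialize the model and the scaler from the pickle files.
with open("classifier.pickle", "rb") as f:   # 'rb' = read in binary mode
    local_classifier = pickle.load(f)

with open("sc.pickle", "rb") as f:
    local_sc = pickle.load(f)

# Predict for a new customer: scale first, then predict.
new_customer = local_sc.transform(np.array([[40, 20000]]))
print(local_classifier.predict(new_customer))        # e.g. [0] -> would not buy
print(local_classifier.predict_proba(new_customer))  # e.g. [[0.8, 0.2]]
```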
We've already uploaded the pickle files to our ML model deployment repository on GitHub. Select classifier.pickle, right-click on Download and copy the link address, go to the Colab notebook, and do a Linux wget with that path; make sure the file path is the raw path. Get the file, and do ls to see whether the file has been copied or not. Next, let's get the standard scaler: click on sc.pickle, right-click on Download, copy the link address, then do a wget and get the standard scaler pickle file. Now we can see both pickle files are available in the Colab environment. We've uploaded the models to the Colab environment. Here in this notebook, we don't know how the models were built or trained, but we can use them to do the same predictions as we did earlier. Create a classifier object; we'll call it the Colab classifier. Create a scaler object, and we'll use that classifier and scaler to predict. Simply type the variable name and hit Enter and we'll see the output. The prediction is 0; it is the same as what we got earlier for a customer with age 40 and salary 20,000. We'll get the probability also. You can print it in the same cell too; the last line gets printed. So we're seeing a 20% probability of somebody with age 40 and salary 20,000 buying the product. We'll do the same for age 42 and salary 50,000. The prediction is one. The probability shows 0.6 because we did not put in the right age; let's run it again. This time we're getting 80%. This is how we can train models in one environment and take them to a completely new environment and run them there. You might be giving the model to another team or a third party; they do not need to know how you built and trained your model. All they know is that it's a classifier: it takes values in a certain format and gives output.

11. Understanding the Flask web framework: Next, we'll understand how to expose the machine learning model with a REST API. REST stands for Representational State Transfer. REST is a popular way of exchanging data in the real world. You can build an application using Java, Scala, or any other technology, and you can expose it with a REST interface to the outside world. If any client wants to use your application or access the data, they can do so using REST. Data is typically exchanged in XML or JSON format over the HTTP protocol. Flask is a popular framework for building a REST API for a Python application. Let's first look at a hello world Flask REST API application; then we'll dive into exposing our machine learning model through a REST API. In Spyder, create a new Python file; we'll call it flask_helloworld. To build a Flask REST API, import Flask and the associated request object from the flask library. You can go to the Flask documentation to learn more about how to create a Flask application. For now, just follow this syntax, and with very few lines of code you can build a REST API. We'll declare an endpoint, /model, which will receive POST requests in this application. Using POST, you can send some data to the REST API and receive a response; if you use GET, you can only receive a response. Let's have a hello world function. In this example, we will send the data in JSON format and receive it in JSON format. Whatever data we receive in the request in JSON format, we store in request_data. We'll pass the model name in the request, which we will retrieve and display to the user.
Anybody can POST a model name to this /model endpoint, and it displays a simple string, "you are requesting for" that model; with Python string interpolation, we're displaying the model name. Now let's add a main method. We'll specify the port number so that when the app is started, it runs on that particular port. Let's launch the application in the local environment; if anybody wants to use it, they will invoke it with this /model URL. Now, to run it, we'll go to the command prompt and start the program. Let's look at the command prompt and start the hello world program; the app is now started. We have created a simple REST API which is running at port 8000. Let's now see how to push data to this app and receive a response. We'll create a new Python file; we'll call it the REST client. Since we'll be sending the data in JSON format, let's import json first. We also need to import the requests library; requests is the HTTP library, and you can just hover over it and read more about it. Using requests, you can send HTTP requests. Now let's have a variable for the URL. For the server name, we can put localhost, or we can put the IP address that was displayed in the console, 127.0.0.1 with port 8000, which is pointing to the local host. We'll have very simple request data in JSON format with one key and one value, and we're passing KNN as the model name. Now we'll send a POST request, passing the URL and the data in JSON format. From the response object, we can extract the text and print it out. Now let's run it and see the output. We can see the output "you are requesting for a KNN model", which is coming from the REST API.

12. Creating a REST API for the Classification Model: Next, we'll create a REST API for the machine learning model so that anybody can invoke the REST API and do predictions. Let's create a new Python file; we'll call it classifier_rest_service.py. Let's copy the code from the hello world Python application. We'll import pickle and NumPy, and we'll load the pickle files. We'll use the local classifier to predict for any age and salary. We'll retrieve the age and salary from the request: first the age, then the salary. We are now passing the age and salary variables to the classifier to predict, and whatever prediction we have, we'll return it: "the prediction is", with the prediction variable filled in at runtime. Now let's run this application. We'll say python classifier_rest_service.py, and now it is running at port 8000. Let's create the machine learning client; we'll call it the ML REST client. Let's copy the code from the earlier client. Instead of having the model name KNN, we now have two parameters: age, which is a numeric value, let's say 40, and salary, 20,000. We are passing two variables now. With these two variables, we are going to call the classifier's predict method to get the prediction, which is going to be 0 or 1, and send that prediction back to the client. Now let's run it; we'll run it on a different port. Let's clear the console and add two print statements for age and salary so that we know what is being passed. Let's run it and see if everything is fine. It compiled fine. Let's now run it from the command prompt; it's running on the new port now. We'll go to the ML client and call it with age 40 and salary 20,000: the prediction is 0. If we call it with age 42 and salary 50,000, the prediction is one.
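The transcript does not reproduce the exact code, so here is a minimal sketch of what the classifier service and its client might look like, assuming the pickle files created earlier and a JSON body carrying age and salary (the /model endpoint, file names, and field names are assumptions):

```python
# classifier_rest_service.py -- a sketch of the prediction endpoint.
import pickle
import numpy as np
from flask import Flask, request

app = Flask(__name__)

with open("classifier.pickle", "rb") as f:
    local_classifier = pickle.load(f)
with open("sc.pickle", "rb") as f:
    local_sc = pickle.load(f)

@app.route("/model", methods=["POST"])
def predict():
    request_data = request.get_json()
    age = request_data["age"]
    salary = request_data["salary"]
    print(age, salary)   # so we can see what is being passed
    scaled = local_sc.transform(np.array([[age, salary]]))
    prediction = local_classifier.predict(scaled)[0]
    return f"The prediction is {prediction}"

if __name__ == "__main__":
    app.run(port=8000)
```

And a matching client sketch, posting age and salary and printing the response text:

```python
# ml_rest_client.py -- posts age and salary to the service and prints the response.
import json
import requests

url = "http://localhost:8000/model"
data = json.dumps({"age": 40, "salary": 20000})
response = requests.post(url, data=data, headers={"Content-Type": "application/json"})
print(response.text)     # e.g. "The prediction is 0"
```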
Instead of just the final prediction, we can also return the probability from the REST API. We can see the probability is 0.8, and if we change it to age 40 and salary 20,000, we should get 0.2. We have seen how to create a REST API with which other clients can access the machine learning model and get the prediction. These clients might be written in Python, Java, or any other language; they can send data over HTTP and receive a response over HTTP. When you make a REST call, you do not need to know anything about how the application is written. This is how you can expose your Python machine learning model to other applications, even ones that are not written in Python.

13. Linear Regression: Let's understand linear regression through a simple example. Unlike classification, where we predict the class of the output, here we predict continuous values. For example, if this chart shows the car price for a certain number of cylinders, then given a number of cylinders, can we predict the car price? This type of prediction is called regression. Now, given these data points, how do we determine the price of a new car for a certain number of cylinders? Using linear regression, we can easily solve this problem. Linear regression is nothing but trying to find the line that best fits these points. And how do we determine this line? It's calculated based on the formula y = a + bx, where a is the intercept and b is the coefficient of the line. Now, for any new point, if we know the x value, then we can easily determine the y value using this formula. Scikit-learn and other machine learning libraries provide a class to which you can feed different data points and get this regressor, or predictor. How does the model determine the best-fitting line, and how do we know the accuracy of the prediction? That is done through a simple concept called R-squared, which is also known as the coefficient of determination. What it measures is how good the line is compared to the line represented by the mean value of all the points. For example, if this is the mean value of all the data points, we could also predict using this mean value. But if we are coming up with a new line with linear regression, we need to see how good that line is compared to this mean line. To calculate the R-squared value, the concept is simple. You calculate the error for each of the points, that is, how far the line is off from the actual value. For any point, if this is the actual value, the point where the vertical red line intercepts the predictor is the predicted value. The distance in red represents the loss, or the error, in prediction. You calculate the loss for each point, square it, and add it up; you get the sum of squares of residuals, which is shown in the numerator here. Similarly, you calculate how far the mean line is from the actual values, which is represented in green here; that is the total sum of squares. The lower the error, the lower the value of the sum of squares of residuals, so the numerator will tend to 0 as the model becomes more accurate. That means the R-squared value will be closer to 1 for a higher-accuracy model. So the higher the value of R-squared, the better the accuracy, and R-squared can never exceed 1. R-squared is also known as the coefficient of determination. You may or may not remember the exact formula of R-squared, but for any model you will find a method to get the R-squared value. All you need to check is whether it is close to 1 or not.
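The R-squared definition described above, written out as a small sketch (a standard formulation, not copied from the course slides):

```python
import numpy as np

def r_squared(y_actual, y_predicted):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    y_actual = np.array(y_actual, dtype=float)
    y_predicted = np.array(y_predicted, dtype=float)
    ss_residual = np.sum((y_actual - y_predicted) ** 2)   # error of the fitted line
    ss_total = np.sum((y_actual - y_actual.mean()) ** 2)  # error of the mean line
    return 1 - ss_residual / ss_total
```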
If the value is close to one, then you know that your model is very accurate. Let's apply this concept and solve a use case. Then we'll see how to extract the formula and use it to predict output for a new set of values. We have a new dataset, the house price CSV file. It has two fields, distance and price: distance represents the distance of the house from the city center, and price represents the house price. As you can see, the higher the distance, the lower the price. Now, how do we calculate the price of a new house which is at a particular distance from the city center? We need to build a machine learning model using the linear regression technique, which will learn from this data and create a model with which we can predict the house price for new data. Let's import the standard libraries. This time we'll also import Matplotlib so that we can plot the house price and distance. Next, let's load the dataset into a Pandas DataFrame. As you can see, the data has been loaded into the Pandas DataFrame. Let's describe it to get some statistical info; we can see there are 40 records, along with the mean, standard deviation, and other values. Let's separate out the independent and dependent variables: X will have the distance to the city center, and y will have the house price. At this point, we can also plot the house price and distance to see how it looks on a chart. We can see that there's a linear relationship: as the distance increases, the house price goes down, and it does so in a linear fashion. Now, using linear regression, we'll have to find a line that best represents these points, and using that we'll predict output for new data points. We'll comment the plot out for now and run it again. Now, using scikit-learn's train_test_split, we'll create the training data and test data, using 32 records for training and eight records for testing. Scikit-learn provides a linear regression class with which we can create a regressor object; that will be our model. This regressor is the line, or the model, which has been trained on the training data. From the regressor, we can easily calculate the R-squared value; there is a score method which gives us the R-squared. We'll print it: the R-squared value is 0.807. From the regressor, we can also determine the intercept and coefficient. Our intercept is 610710. Let's now get the coefficient: the coefficient is -72635, because our house price goes down as the distance increases; that's why we are seeing a negative coefficient. Now, anybody who wants to use our model can take this intercept and coefficient and get the house price. We do not need to send them the regressor class in binary format or export the model; all we need to share is the formula. So our formula becomes y = intercept + coefficient * x, that is, 610710 minus 72635 multiplied by the distance. We'll first predict using the predict method; we'll feed the data to the regressor and get the prediction. So this is the predicted house price. Let's compare it with the actual house price. We can see that in some cases it is very close, and in some cases it is a little bit off from the actual price. These are the actual prices, and these are the predicted values. We can also plot the predicted values and the actual values: we create a scatter plot for the actual values and a line plot for the predicted values. So this line represents our regression line, or our predictor.
Now, for any new point, we can easily determine the house price given the distance to the city center. Let's predict the house price for a house which is at a 2.5 mile distance from the city center. The value comes to around 429120. We can also get the same output using the formula y = intercept + coefficient multiplied by the x value; again we get about 429120. Now, to share this model with anyone, we can share the formula. We can also create pickle files and create REST APIs, but this is one more option that is available for exporting linear regression models.
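Finally, a minimal sketch of the whole linear regression workflow from this lesson, assuming a two-column CSV of distance and price (the file name and the example numbers are placeholders):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the house price data (file name assumed).
dataset = pd.read_csv("houseprice.csv")
X = dataset.iloc[:, :-1].values   # distance to the city center
y = dataset.iloc[:, -1].values    # house price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

print(regressor.score(X_train, y_train))   # R-squared, e.g. around 0.8
print(regressor.intercept_)                # a in y = a + bx
print(regressor.coef_)                     # b, negative here since price falls with distance

# Predict for a house 2.5 miles from the city center, two equivalent ways.
print(regressor.predict([[2.5]]))
print(regressor.intercept_ + regressor.coef_[0] * 2.5)   # same value from the formula
```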