## Build Your Own Machine Learning AI Spam Classifier Within 30 Minutes

#### Rakesh Chinta, CEO NAG CORP, Harvard University, Google

10 Videos (41m)
• Step 1 to 2: Deconstructing our spam test list (7:03)
• (1:38)
• Introduction to the naive Bayes theorem (2:30)
• Importing the data set (1:16)
• Creating the app.py (1:00)
• Processing the data (1:18)
• Step 2 explanation (5:54)
• 4-4.2: Theory of algorithms, Bayes' theorem (4:15)
• Applying count vectorizer and splitting training and testing data (7:49)
• Accuracy, precision, recall, F1 score (8:03)

Machine learning is everywhere. From self-driving cars to face recognition on Facebook, machine learning is what drives it behind the scenes. If you’ve ever used Gmail or Yahoo Mail, you have probably seen a folder named “Spam” where unwanted mail ends up. Have you ever wondered how that works? That’s machine learning at work, too!

#### The basics of machine learning

The term “machine learning” has a certain aura around it. Journalists and entrepreneurs talk about it as if it were something out of this world. In reality, though, the idea is much simpler.

Machine learning is a field of computer science in which computers learn to do a task without being explicitly programmed for it. First, the algorithm is shown a set of data in order to train it for the task. Then we give the algorithm data it has never seen before and have it perform the task on that data.

Thus, a machine learning algorithm can be thought of as having two phases: “training” and “prediction”. For each of these phases, we use various mathematical methods.
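To make the two phases concrete, here is a minimal sketch in plain Python. It is not the method used later in the course: the function names `train` and `predict`, the single "number of exclamation marks" feature, and the data are all invented for illustration. Training learns one average per label, and prediction picks the label whose average is closest to a new value.

```python
# Toy illustration of the two phases; names and data are hypothetical.

def train(examples):
    """Training phase: learn the average feature value for each label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, value):
    """Prediction phase: pick the label whose learned average is closest."""
    return min(model, key=lambda label: abs(model[label] - value))

# Invented training data: (number of exclamation marks, label)
training_data = [(0, "ham"), (1, "ham"), (5, "spam"), (7, "spam")]
model = train(training_data)

# Values the model has never seen before
print(predict(model, 6))
print(predict(model, 2))
```

The important point is the separation: `train` only ever sees the training data, and `predict` only uses what `train` returned.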

There is a wide variety of machine learning algorithms. Depending on how these algorithms “learn”, they can be categorized as supervised or unsupervised:

• In supervised learning, the algorithm is provided with data along with the correct answer for it. So, if we were to develop an algorithm to predict house prices and gave it the size of the land together with the price for each example, it would fall into this category.
• In unsupervised learning, the algorithm is provided with data, but not with the answers. It is up to the algorithm to find structure in the data and work things out from there. Such algorithms are commonly used for tasks such as market segment analysis: we don’t know in advance what market segments exist for a product, so the algorithm must figure them out.
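The difference shows up first in the shape of the data. The sketch below contrasts a labelled data set with an unlabelled one, and then groups the unlabelled values with a very small one-dimensional two-cluster k-means. The numbers, the `two_means` helper, and the "budget"/"premium" names are assumptions made for illustration, and the helper assumes the input has at least two distinct values.

```python
# Supervised data: every example carries its correct answer (the price).
# All numbers here are invented.
labelled_houses = [(100, 200_000), (150, 300_000), (200, 400_000)]  # (size, price)

# Unsupervised data: observations only, no answers attached.
monthly_spend = [12, 15, 14, 90, 95, 88]

def two_means(values, iterations=10):
    """Split numbers into two groups without being told which is which."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        # Assign every value to the nearer centre
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its group
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

budget, premium = two_means(monthly_spend)
print(budget, premium)
```

Nobody told the algorithm that two customer segments exist or where the boundary lies; it found the grouping from the values alone.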

Based on the type of output they produce, machine learning algorithms can also be categorized into two types:

• Classification: These algorithms produce outputs that categorize the data. For example, an algorithm that takes in medical information about a patient and produces a diagnosis that can only be one of “no cancer”, “lung cancer” or “colon cancer” would be of this type.
• Regression: In regression, the outputs are continuous values. For example, consider the earlier example of predicting house prices: the predicted price depends on the size of the land and can be any number in a range. Unlike classification, we don’t have outputs that neatly categorize the data.
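As a rough sketch of the difference in output type, the code below fits a straight line to invented house data with ordinary least squares (regression, continuous output) and then, purely for contrast, turns that number into one of two fixed labels (classification). The data, the 350,000 cut-off, and the function names are hypothetical.

```python
def fit_line(sizes, prices):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    variance = sum((x - mean_x) ** 2 for x in sizes)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Invented training data: land size and sale price
sizes = [100, 150, 200, 250]
prices = [210_000, 290_000, 410_000, 490_000]
slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    """Regression: the output can be any number."""
    return slope * size + intercept

def price_band(size):
    """Classification: the output is one of a fixed set of labels."""
    return "expensive" if predict_price(size) >= 350_000 else "affordable"

print(predict_price(180))
print(price_band(180), price_band(120))
```

A real classifier would normally be trained on labelled categories directly; thresholding a regression output is used here only to show the two kinds of output side by side.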

As you’ll see later in this article, we’ll train our filter using a collection of spam and non-spam (aka “ham”) emails. Because we provide the right answers to train the filter, and its output for a given message in the prediction phase is either “spam” or “ham”, this filter is an example of a supervised classification algorithm.
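To give a feel for what such a filter can look like, here is a minimal sketch using only the Python standard library. The course outline mentions naive Bayes, so the sketch assumes a word-count naive Bayes model with add-one smoothing. It is not the course's own code, and the messages and the function names `train_filter` and `classify` are made up for illustration.

```python
import math
from collections import Counter

def train_filter(messages):
    """Training: count words per label. messages is a list of (text, label)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary

def classify(model, text):
    """Prediction: return the label with the higher log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior: how common this label was in training
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids log(0) for words unseen in this label
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training data with the "right answers" attached
training_messages = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("lunch at noon tomorrow", "ham"),
    ("project meeting notes attached", "ham"),
]
spam_model = train_filter(training_messages)

print(classify(spam_model, "claim your free prize"))
print(classify(spam_model, "notes from the meeting"))
```

The sketch assumes both labels appear at least once in the training data. With only four training messages it is a toy, but it has both ingredients named above: labelled examples for training and a categorical output at prediction time.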


#### Rakesh Chinta

Rakesh Naga Chinta is an entrepreneur, SDE intern at Google, strategic business analyst, and author of several best-selling books.

A Harvard alumnus with a burning passion for problem-solving and entrepreneurship.

He previously worked as a software engineering GSoC intern at Google and now runs several startups and ventures, where his skills are tested and sharpened every single day.
