

Machine Learning for Absolute Beginners - Level 1

Idan Gabrieli, Pre-sales Manager | Cloud and AI Expert


Lessons in This Class

24 Lessons (2h 12m)
  • 1. ML Level 1 Promo v2 (2:39)
  • 2. Welcome! (6:38)
  • 3. The Rise of Artificial Intelligence (4:22)
  • 4. Artificial Intelligence (6:14)
  • 5. Classical Programming (3:12)
  • 6. Machine Learning (7:13)
  • 7. Deep Learning (7:51)
  • 8. Applied vs. Generalized AI (4:19)
  • 9. Why Now? (9:21)
  • 10. Introduction to Machine Learning (1:20)
  • 11. “Black Box” Metaphor (3:04)
  • 12. Features and Labels (5:09)
  • 13. Training a Model (4:49)
  • 14. Aiming for Generalization (11:23)
  • 15. Classification of ML Systems (1:44)
  • 16. #1 - Supervised Learning (6:16)
  • 17. Classification (5:23)
  • 18. Regression (7:13)
  • 19. #2 - Unsupervised Learning (3:40)
  • 20. Clustering (5:09)
  • 21. Dimension Reduction (5:48)
  • 22. #3 - Reinforcement Learning (5:57)
  • 23. Decision Making Agent (7:04)
  • 24. Quick Recap and Thank You! (6:40)



About This Class


The concept of Artificial Intelligence is used in sci-fi movies to describe a virtual entity that crossed some critical threshold and developed self-awareness. And, like in any good Hollywood movie, this entity then turns against humankind. OMG! It's a great concept to fuel our basic survival fear; otherwise, no one would buy a ticket to the next Terminator movie ;-)

As you may guess, reality is entirely different. Artificial Intelligence is one of the biggest revolutions in the software industry. It is a mind-shift in how we develop software applications. Instead of using hard-coded rules to perform something, we let the machines learn from data, decipher complex patterns automatically, and then use that knowledge across multiple use cases.

AI-Powered Applications

There is a growing number of AI-powered applications across a variety of practical use cases. Websites use AI to recommend relevant products and services to visitors. The ability to recognize objects in real-time video streams is driven by machine learning. It is a game-changing technology, and the game has just started.

Simplifying Things

The concepts of AI and ML can be a little intimidating for beginners, especially for people without a substantial background in math and programming. This training is a soft starting point that walks you through the fundamental theoretical concepts.

We are going to open the mysterious AI/ML black box, take a look inside, and get familiar with the terms used in the industry. It is going to be a super interesting story. It is important to mention that there are no specific prerequisites for starting this training; it is designed for absolute beginners.

Would you like to join the upcoming Machine Learning revolution?

Meet Your Teacher

Idan Gabrieli

Pre-sales Manager | Cloud and AI Expert




Transcripts

1. ML Level 1 Promo v2: Hi, and welcome to this training program about machine learning. My name is Idan, and I will be your teacher. Machine learning, and the umbrella term artificial intelligence, are exciting and engaging topics gaining tremendous momentum every year. It's a mind-shift in how to develop applications: instead of using hard-coded rules for performing something, we let the machines learn things from data, decipher complex patterns automatically, and gain new knowledge. Companies are looking for ways to utilize those technologies in practical use cases, as features in products: physical products, like our mobile phones, or virtual products, like a communication system or a website. It's a game-changing technology, and the game has just started. The market demand for skilled people is growing, and as a result the data science community is becoming the hottest place in the high-tech industry. Still, machine learning is a complex topic divided into many subtopics, and we can easily get lost while trying to figure out where to start and what kind of skills we should develop. This training program provides a comprehensive yet straightforward and sequential learning path for beginners. You can follow the complete learning path step by step, or decide which levels are relevant for you. Level one is the starting point for laying out the theoretical foundation of AI and ML. We'll start by learning the connection between AI, ML and deep learning. Then we'll zoom in on the terms used in machine learning: what supervised and unsupervised learning are, reinforcement learning, the training data set, the testing data set, a trained model, optimization algorithms, machine learning pipelines, and more. Level one is designed for anyone, anyone who would like to understand the big picture. Sign up today and start exploring the exciting concepts of machine learning and data science. Thanks for watching, and I hope to see you inside.

2. Welcome!: Hi, and welcome to the first level in this training program about artificial intelligence and machine learning. My name is Idan, and I will be your teacher. You probably know that over the last couple of years, machine learning has become a core technology for many products. Those AI/ML algorithms are everywhere. There is a countless, growing amount of practical applications of machine learning already deployed in many places. The big players in the market, like Google, Amazon, Facebook, Microsoft and many others, are investing a huge amount of funding in ongoing AI research initiatives while trying to lead the market with new practical use cases. Many respectable higher-education organizations have some flavor of AI education and research programs. There are hundreds of new startup companies whose core product or service is based on AI. Alongside the market's first movers, most of the large enterprise companies, for example telecom operators, are building dedicated AI teams to better utilize the growing amount of collected data. As part of the digital strategy of many large organizations, AI is becoming a top-priority direction. Just check for yourself how many data scientists, machine learning engineers, AI researchers, software engineers, product managers and project managers are needed right now in this area. The market demand for skilled people is growing rapidly, correlated of course with the amount of money being invested today by the industry.
I don't know if you have heard about it before, but some countries, like China, the US, Russia and others, have already developed a dedicated national strategy for AI. Those countries identified AI as a major technology for the upcoming future. In 2017, Russian President Vladimir Putin said something like this: "Artificial intelligence is the future, not only for Russia but for all humankind. It comes with colossal opportunities, but also threats that are difficult to predict. Whoever becomes the leader in this field will become the ruler of the world." Sounds a little bit scary, but I just wanted to emphasize that, unlike other technologies, AI is perceived differently by the business industry as well as by governments. There will be a national-level race on AI between countries: which country will lead that technology 10-15 years from now, leveraging the economic and maybe also the military potential of AI? This introduction is just to let you know that AI is coming, big time, and I think each one of us should have some level of understanding about this important topic. I would like to share with you some insights I gained while diving into the subject of AI. When I started my personal journey exploring the subject, I was quickly overwhelmed by the amount of knowledge and skills that I needed to understand some of the basic, fundamental things about AI and ML. It is easy to get lost. Many educational sources that are available on the internet, or in books, quickly jump into complex math and statistical calculations, which will kill your motivation to learn the subject, and some people will probably say: this is for PhD graduate students, it's too complex to understand. On the other side of the spectrum, some of those educational sources focus on practical implementation, meaning, for example, using some programming language like Python or another development framework. My assumption is that a large portion of the students who are going to join this training would like to understand the concepts of AI without really going into the practical side, and without trying to read some heavy statistics books. Another challenge is the number of buzzwords we hear around AI. For me, it was a challenge to understand how things are connected to each other, so I could better understand the big picture and the relations between subtopics before diving into a specific one. As an outsider or beginner trying to understand the evolving AI world, it may look like magic tricks performed by a fast computer, programmed by really smart people. We enter some data into some machine learning black box, and somehow we get amazing output on the other side. What's going on inside that box is a mystery. So my main objective in this level, level one, is to make the AI subject more accessible, easier to digest. In this training we're going to open that black box; we will look inside and understand, at a high level, how this box was created, what the building blocks inside are, what our expectations should be while using such a box, and more. That's it for now; I think we're ready to start.

3. The Rise of Artificial Intelligence: Hi, and welcome back. We are ready to start our training. Hey, let me ask you something: what is the first thing that comes to your mind when I say AI, artificial intelligence? It is an interesting exercise; think about it for a second. Is that thought located more on the positive or the negative side of the spectrum when talking about AI? I'm sure you have seen,
or at least heard about, the TV show Game of Thrones. One of its famous phrases is "winter is coming", a warning that the hard, cold winter season is going to hit their lands and make their lives harder. If we apply this phrase to artificial intelligence, saying "AI is coming", some people will immediately picture the famous Terminator movie series, where Skynet, the artificial intelligence entity created by humans, takes over the world after developing self-awareness, and super-smart killer robots are coming from all directions. Okay, it's funny and scary at the same time. Those people really consider AI to be a future danger to all humankind. In other cases, AI as a technology is perceived as a real risk of taking over many human jobs and increasing the overall world unemployment rate, because automation is a core use case of AI. Think about an AI doctor performing basic medical diagnostics, replacing in some cases a human doctor, or self-driving public buses working without human drivers. Some jobs are going to be automated, and AI is going to be used for making such automation happen. On the other side of the spectrum, moving to the positive side, AI is perceived as a better way to handle many challenging and complex tasks, and overall to help make progress in multiple domains. For example, AI can be used to quickly develop important new medicines, or to improve road safety by enabling self-driving cars. It is used to improve the interface between humans and all kinds of electronic devices. There are hundreds of AI applications that already power many types of products and services we are using today. To be honest, we can't predict the future, but we have a responsibility to shape it. AI will have a positive as well as a negative impact on society, like many other technologies already out there. But one thing is for sure: AI is coming, and it is coming very fast. The industry has embraced the power of this technology, and we are just scratching the surface of the potential use cases of AI. Maybe you will be the next AI entrepreneur, creating some amazing new product powered by AI. Just remember to send me 5% of your total revenue. Okay, just kidding; 10% will be fine. So, we know that AI is coming and the big waves are just around the corner. But hold on for a second: what is the definition of AI? How is it related to machine learning? Let's talk about it in the next lecture.

4. Artificial Intelligence: I guess by now you have heard the words AI, artificial intelligence, ML, machine learning, and maybe also deep learning. As a starting point, let's define those terms at a high level and see how they are related to each other, starting with AI. Artificial intelligence was founded as an academic discipline a long time ago, back in 1956, with the fundamental idea that some complex intellectual tasks that are performed by humans on a daily basis can also be performed by machines. Those machines can mimic or simulate human cognitive functions such as learning and complex problem solving. Nobody will argue that today machines and computers are already very good at performing some tasks: they can calculate things at an amazing speed, and they can be used in a production line to perform automated, repetitive tasks with great accuracy. But this is not AI. Those machines and computers were pre-programmed with the knowledge to perform those repetitive tasks.
Let's take a more complex scenario, like recognizing an object in a picture, something that our mind is doing all day long with amazing speed and flexibility. We can easily scan a picture with our eyes and, in a second, identify the objects in that picture. This kind of task will trigger billions of neurons in our mind that are used to identify those objects based on patterns we learned over the years. Trying to mimic such a task in a computer, like identifying an object in a picture, is very complex, and until recently it was impossible to achieve a usable result with standard programming. So when a machine can mimic complex cognitive functions, like identifying an object in a picture, recognizing a human voice, and many other very complex tasks, it is often described as artificial intelligence. Let's take, for example, the scenario of playing a chess game between a computer and a person. I don't know if you have played chess before, but it is quite a complex cognitive task. It is very hard to consider all the options in a chess game, taking into account the future moves that may be performed by the other player. For a long time, the dominating approach for creating an application that can play chess was to create a huge amount of explicit rules that mimic that task. The computer is programmed with the needed rules, the needed knowledge for selecting the best next move, while leveraging, of course, the horsepower of a fast computer. A computer can perform many complex calculations while checking a very large space of options and strategies. Such a program can easily win many professional chess games, so it would be considered an AI entity. And for many years, such complex programs, running on fast computers and simulating very complex tasks, were considered AI. Now, there are two main downsides to this approach, the approach of trying to mimic complex human thinking by creating a huge amount of explicit rules. The first one is that someone needs to think through and program all the game logic and strategies into the program, which can be very hard work. You would probably need a team of professional chess players to design such logic and then translate it into lines of code. Many lines. The second downside is that the program will only be as good as it was initially programmed. Unlike the human brain, this program is not learning anything. If it is defeated by some human player, it will not learn anything from that experience. There is no closed feedback loop that will create a new set of rules to help it win the next game. Someone has to open the program code and make it better by changing or adding more rules. So something is missing here, and maybe you can already guess by now what it is. Well, the missing part of AI is the flexibility to learn. The human mind has amazing flexibility to learn, to adapt. And now we are moving to the next phase of the AI evolution path, which is based on the concept of machine learning: machines that can learn.

5. Classical Programming: In classical, or so-called traditional, programming, the input to the computer is, first of all, a set of rules: what to do in which case. That is the whole concept of programming: create an application program that covers the needed tasks and use cases, considering all situations. The programmer will study the task that needs to be done and then write a set of rules to perform it, test it several times using different scenarios,
evaluate the results, and correct the hard-coded rules inside the program. After programming the knowledge into the program, the second important input is the data, which represents something. In a chess game, the input data will be the move just performed by the other player. Now, based on the predefined programmed rules, the provided data will be analyzed, and we will get some answer, some decision. In our example, it will be the next move to be performed by the computer, like moving the queen to a new location; we're trying to win the game. This traditional approach has a major limitation: when the task is very complex and things are changing, the program becomes very complex and also very hard to maintain. We talked about the chess game, but there are many examples of complex tasks where traditional programming is not really working anymore. Let's use another example to emphasize this concept, which is very important: speech recognition. If I try to write a program that can identify only two words, like "yes" and "no", then I need to write a group of rules that the program can use to distinguish between those two words. For example, the program can measure the duration of each word ("yes" takes longer to say than "no"), or maybe use the sound wave's pitch level. I need to manually find rules that can help the program identify the right word. This approach may be useful for identifying the words "yes" and "no", but what about identifying 100 different words? What about 10,000 different words? It becomes a scaling challenge. It doesn't make sense anymore to approach it with traditional programming, programming the logic of how to identify words by ourselves. We need to consider a different approach, and, as you may guess, it's about machine learning.
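To make the scaling problem concrete, here is a minimal sketch in Python of what such hand-coded rules might look like. The features and thresholds are invented for illustration only, not taken from the class:

```python
# Hand-coded rules for telling "yes" from "no" in a recorded word.
# duration_sec and mean_pitch_hz are invented illustrative features.

def classify_word(duration_sec, mean_pitch_hz):
    if duration_sec > 0.45:     # rule 1: "yes" usually takes longer to say
        return "yes"
    if mean_pitch_hz > 180.0:   # rule 2: a hand-tuned pitch threshold
        return "yes"
    return "no"

print(classify_word(0.30, 150.0))  # -> "no"
print(classify_word(0.60, 150.0))  # -> "yes"
```

Two words already need two fragile, hand-tuned rules; scaling this style to 10,000 words is exactly the maintenance problem just described.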
6. Machine Learning: What if we could somehow let the computer learn new knowledge by looking at the data? Using the example of the chess game: if the computer wins or loses a specific game by performing a group of steps, it can learn from that experience. Like a human player, it will create a new set of rules that will help it win the next game, and keep improving all the time. From a programming perspective, it is a completely new approach, a new state of mind. Take a look at this simple drawing that nicely represents this simple but very powerful new programming approach. Imagine there is some machine learning black box that is supposed to perform a specific task. I'm calling it a black box because, at this point, I don't know what is going on inside that box, but I know two things: the input and the output of that box. There is an expected input into the box, and then some expected output from that box. Now, this machine learning black box is a little bit more special than other boxes. We can provide it with examples of input and the related output, and using this information it will learn: it will extract patterns from the provided data and automatically establish some level of knowledge, or rules, that can be used to perform a specific task. This process is called training, and it is based on multiple types of mathematical algorithms, something we will cover in the next section. Let's say that the task of our machine learning black box is to receive a picture file as input and identify whether the object in that picture is a dog or a cat. How can we teach, or train, this machine learning black box to identify such pictures automatically?

Well, if I asked you to teach a small child whether a picture shows a dog or a cat, you would probably take a nice book with many pictures of dogs and cats and present those examples to that little child: this picture is a dog, this picture is also a dog, and this one is a cat. After showing that child some examples, he or she will be able to identify a new picture of a dog or a cat they never saw before. Their brains created some patterns that help them perform this task; they learned. Now, if we apply that analogy to the world of machine learning, we are actually doing the same thing. We first need to teach the machine to identify some patterns in the data, and then we will be able to use it to do something useful. This training, or learning, phase expects that someone will provide it with examples, in the same way we show a little child a group of examples. To be more accurate, this is one type of machine learning, called supervised learning, which is very common; there are additional options that will be discussed later in this training. Now, practically speaking, during the training or learning phase, the ML black box will have two input streams: the data, and the answers about the provided data. In our example it will be a large group of pictures as the data, and, for each picture, the identified object: is it a dog or a cat? This is called labeling the data; those are the expected answers. The ML black box will digest this information, a process called training, and finally it will output a set of mathematical transformation rules for identifying whether a new picture is a cat or a dog. Those mathematical transformation rules are encapsulated together into something that is called a trained model. This is the output of the machine learning during the learning process, and finally this trained model will be used to classify whether a new, unseen picture is a dog or a cat. Think about it for a second: it almost looks like magic. We can train some machine learning black box to perform some complex task only by providing it with examples. We don't need to think up and design the complex rules that are needed to identify the patterns in the data; the machine learning black box will create those rules automatically. This is why it is called machine learning: those machines are basically learning from the data without being programmed with a preset of rules. In addition, and this is a key thing to remember, we can use the ongoing new data stream that is coming into the machine learning black box to adapt the knowledge inside that box, so it can handle a dynamic environment, learning new things, all of that without opening or changing a single line of code in the program. Everything is automated. This approach is so powerful that machine learning is now the most popular and also the most successful subfield of artificial intelligence. This is why those terms are used almost interchangeably now.
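To connect this story to real tools, here is a minimal sketch of the training idea, assuming the scikit-learn library. The pixels_of helper and the image file names are hypothetical stand-ins for a real feature-extraction step:

```python
# A toy supervised-learning sketch with scikit-learn.
# pixels_of() is a hypothetical helper that turns an image file into a
# flat list of pixel values; a real project needs an image pipeline here.
from sklearn.linear_model import LogisticRegression

X_train = [pixels_of("dog1.jpg"), pixels_of("dog2.jpg"),
           pixels_of("cat1.jpg"), pixels_of("cat2.jpg")]   # the data
y_train = ["dog", "dog", "cat", "cat"]                     # the labels

model = LogisticRegression()   # the "black box" to be trained
model.fit(X_train, y_train)    # training: patterns are extracted here

print(model.predict([pixels_of("new_photo.jpg")]))  # e.g. ['dog']
```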
In addition to artificial intelligence and machine learning, which is a sub-branch of AI, there is another important term, called deep learning. Let's review that in the next lecture.

7. Deep Learning: So far, we saw that the machine learning box is supposed to learn something; there is a training process. But we didn't open that black box yet; we don't know what's going on inside. Still, let's assume that there is a variety of methods to train a model inside the machine learning box, which will represent some kind of knowledge. Those methods are based on different kinds of mathematical and statistical algorithms that will be covered later. Now, imagine that the ML knowledge, also called the trained model, that was created during the training phase has some size or capacity: a small brain, a medium brain, a very large brain. Using this analogy, we can assume that a large brain can hold much more knowledge than a small brain. In machine learning, we're not using small, medium or large brains to describe the capacity of a trained model; it is actually described using the concept of layers, and a layer is the basic building block of deep learning. A single layer is like a data transformation phase, and you can have several layers connected to each other, so the output from one layer can be the input to another layer. Each layer receives an input, transforms it with a set of mathematical functions, and then passes those values to the next layer, and so it propagates until reaching the final layer. We can have one layer in our machine learning system, two layers, or hundreds of layers. When describing some machine learning system, we don't see those layers; they are hidden. We just see the input and the output of the whole machine learning system. If we put a very small number of layers inside the ML system, like one or two, it is called shallow learning. The learning algorithm will catch a relatively small amount of patterns while learning from the data, which could be more than enough for a specific use case. In addition, the training time in shallow learning will be fast, making it an attractive option in most simple use cases. We can compare it to a small brain. Just think about it for a second: in nature, there are some very sophisticated insects with very small, tiny brains, which are still sufficient for handling many tasks. So shallow learning is not a bad thing; it is just a more simplified approach that is good enough in some use cases. In a later lecture, we'll talk about the algorithms used for shallow learning. On the other end, if we put more hidden layers in our machine learning box, in our model, there is a chance to collect more knowledge, more patterns, from the data. This is called deep learning, and it refers to the simple fact that more layers are used inside to represent the patterns, the knowledge. The depth of the model is basically the number of layers contributing to the model. Those layers in deep learning are also called neural networks, as they are, at some level, inspired by the biological neural networks that we have, humans and animals. Now, how many layers are used inside a machine learning system? Well, today there are systems with thousands of layers. As you may guess, a bigger brain can handle much more complex tasks.
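If you are curious what "layers" look like in code, here is a minimal sketch assuming the Keras API (any deep learning framework expresses the same idea); the layer sizes are arbitrary illustrations:

```python
# Stacking layers: each Dense layer transforms its input and passes the
# result to the next one. Sizes are arbitrary, for illustration only.
from tensorflow import keras

# One hidden layer: "shallow learning".
shallow = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(100,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Many hidden layers: "deep learning".
deep = keras.Sequential(
    [keras.layers.Dense(64, activation="relu", input_shape=(100,))]
    + [keras.layers.Dense(64, activation="relu") for _ in range(10)]
    + [keras.layers.Dense(1, activation="sigmoid")]
)
print(len(deep.layers))  # 12 layers stacked back to back
```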
Deep learning is probably the most exciting field in machine learning, because it is helping to handle much more complex use cases with larger data sets. Still, nothing is black and white, nothing is perfect; there are also downsides to consider when using deep learning. Deep learning requires more complex algorithms to train a model and, because of that, also much more computing resources compared to shallow learning. Hardware, for example, is a common bottleneck when handling the training of a complex deep learning model. There are, of course, all kinds of hardware vendors that provide dedicated AI acceleration hardware just to handle deep learning. There are also public cloud providers, like Amazon and Google, that provide AI cloud services to train a deep learning model, so you don't have to purchase the hardware by yourself. At this point, we can go back and finally understand the following simple diagram and see the connection between the three terms. Artificial intelligence is the umbrella concept of machines that can perform complex intellectual tasks usually performed only by humans; those machines can mimic or simulate human cognitive functions. Inside the AI circle, as part of the evolution of AI, a subfield evolved which is called machine learning. Machine learning adds the missing power, the important self-learning capability, to machines. Using machine learning, we can now handle much more complex scenarios while learning from the data: instead of using rules-based programming, we let the machines learn from the data. The next thing is the amount of knowledge we would like the machines to learn. Sometimes we will have use cases based on shallow learning, and some more complex use cases will be based on deep learning. So deep learning is basically a subfield of machine learning. Which options to select while approaching some specific task is the job of a data scientist, and, as you may guess, it's not an easy task. It requires substantial experience to select the right algorithms, as well as to perform all kinds of complex fine-tuning during system development, things we'll talk about later in the training.

8. Applied vs. Generalized AI: Some people will jump and say: Idan, are you telling us that we can now create machines that can learn and perform multiple tasks like a human being? It's like creating Terminator robots that can walk, run, talk, communicate and make quick decisions. That's a little bit disturbing. So the answer is: let's try to relax; we're not there yet. The existing industry implementation of machine learning is focused on performing very narrow tasks: this is the input, and this is the output, meaning each ML use case is used to accomplish a single, specific task. Those kinds of AI use cases are called weak AI, or also applied AI, as they are focused on narrow tasks that can actually be applied in practical applications. Humans are different. The human brain can process and identify information from multiple senses, like a specific smell, sound waves, the surface structure of an object in a picture, and then process that information in a second and decide on actions. We, of course, take that for granted, but this is an amazing biological machine: learning is data collected from the sensors located all over our body, automatically translated into knowledge. We are learning all day long. But machine learning is not really learning or handling data the same way our brain is learning. Our brain holds billions of small, interconnected neurons that communicate with each other using electrical signals; it's a very complex structure. On the other hand, machine learning is basically just a simulation in a computer's memory, and the knowledge is stored using mathematical functions and parameters. We are not creating and storing information in a complex neural network structure like the one in our brain. It is extremely complex, and almost impossible today, to create an AI machine that will perform multiple tasks that are coordinated with each other, like the human brain. This is called generalized AI, also called strong AI,
meaning the intelligence of a machine that can understand and learn almost any intellectual task that a human being can learn to perform. I don't see, in the near future, a situation of AI robots that behave almost like humans; the technology, and even the mathematical theory, is not there yet. But you never know: maybe 10-20 years from now, each one of us will have two or three AI friends. Probably such complex social capabilities will first be supported in the virtual space, meaning the internet. As I mentioned, right now the practical side of AI is focused on applied AI, narrow tasks of machine learning that are visible and can provide some tangible business value. So 99.999% of all AI use cases you will encounter in the future will be related to applied AI.

9. Why Now?: Another interesting question is about timing. Why is AI becoming so popular today? Why now, and not 10, 20, 30 years ago? What are the driving forces pushing AI forward today? Well, first of all, it's not the first time that AI was considered the next big thing and a huge amount of funding was invested in this field. There is a phenomenon called, in a negative way, the "AI winter", and it already happened twice in the past. Each time, AI as a market discipline went through a cycle, starting with a strong hype around it: everyone is talking about AI, and there are some initial success stories. It creates a lot of expectations about the potential capabilities of the new technology, and then, after a few years, when the actual practical results do not fulfill the high market expectations, it leads to some level of disappointment. Investors take a step back and stop funding those initiatives, and the field of AI enters a cold winter season for a couple of years, with less investment. Still, looking at what's going on in the industry today, I don't think we are entering another cycle of AI winter. Many machine learning use cases have successfully materialized into very useful products and services, many companies are using AI in multiple industry domains, and it keeps growing all the time. Now, back to the question: why now? Why is it becoming so successful? What are the ingredients making AI flourish everywhere? As you may guess, data is the first ingredient. Data is the energy, the raw material, consumed by the ML engine. Everything is about data, and today, more than ever, data is available from a growing amount of data sources. Every action performed on electronic devices, like your mobile phone, is recorded and stored. Every action we perform on some social network is recorded. When we tag a picture on Facebook, we are creating some data about that picture. This is called user-generated data, and thanks to the internet and all kinds of web technologies, the volumes of data are growing exponentially. So data is everywhere, and the technologies to collect, distribute and store data are quite mature, enabling great flexibility for many organizations. We have a lot of data; that's great. Now the focus is shifting to what we can do with the data, and machine learning, which is based on algorithms, sits nicely in that sweet spot, helping to utilize the data in a more automated way, to find interesting patterns that can be translated into useful use cases. The second ingredient for machine learning is hardware. Think about a situation where you have one million images that will be used to train some ML system. Let's assume one image takes around five seconds to process on a particular hardware setup. If we have one million images, it will take around 58 days to process the data. That's a long time to train a model, and it will slow down the end-to-end process of training, testing and optimizing a machine learning system. In many cases, the hardware is a bottleneck when handling large data sets.
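The arithmetic behind that 58-day figure is easy to check; here is the back-of-the-envelope calculation in Python, using the illustrative numbers from the lecture:

```python
# 1,000,000 images at ~5 seconds each, processed one by one.
images = 1_000_000
seconds_per_image = 5
days = images * seconds_per_image / (60 * 60 * 24)
print(round(days, 1))   # 57.9, roughly the 58 days quoted above
```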
As you know, inside any computer there is a module called the CPU, the central processing unit, that does most of the calculations. The CPUs we're using today in a standard laptop are much more powerful than a very expensive, ten-year-old CPU. Those CPUs are becoming faster and more sophisticated; however, they are used to perform many tasks inside a computer, which makes them more generic and less optimized for specific applications. Let me give you an example: most heavy-graphics video games will not run so smoothly using just a generic CPU. They need something else, which is called a GPU, a graphics processing unit. This is a dedicated, high-performance chip that handles fast graphics data manipulation. You probably have such a GPU in your laptop, as they have already become a standard feature in many computers. Now, what is their relation to machine learning? Well, one of the most common methods to train and run a machine learning system is called deep learning. This method requires a lot of computing horsepower, and, as a big surprise, the GPU technology that was initially developed for video games has now evolved into a new market use case, meaning machine learning. With the help of powerful new GPU chips, the training time is reduced, making the process of training machine learning models much faster and more affordable. Giants like NVIDIA, Intel and many others are investing unbelievable amounts of money and resources to try to lead this domain while developing dedicated AI hardware. Based on the evolving ecosystem of cloud computing, there is also an alternative for training an ML model: instead of purchasing an expensive hardware setup to train models, we can rent computing power directly over the cloud. Public cloud players like Google, Microsoft and Amazon provide cloud-based services to train ML systems, and this approach is a great alternative for small to medium players, who can easily utilize the latest AI hardware without buying anything. The last thing I would like to mention is the evolving development frameworks. There is a growing amount of high-level programming languages that help create machine learning programs while utilizing pieces of code developed by someone else; in a programming language these are called libraries. For example, the well-known Python language includes hundreds of libraries to easily run the needed algorithms. You basically don't need to be a math expert to run an algorithm; someone already encapsulated it inside some library that can be used when developing a new AI program. There are also new types of AI frameworks that help manage the life cycle of training models, handling versions of models. The software industry, in that perspective, has transformed into more high-level programming; things are becoming easier to develop and maintain, which helps make those advanced approaches more accessible. Okay, that's it for the introduction section about AI, machine learning and deep learning.
I hope it was interesting as well as useful. Thanks for watching so far, and see you in the next section.

10. Introduction to Machine Learning: Hi, and welcome back. In the previous section we had a high-level overview of the topic of artificial intelligence and its connection to machine learning and deep learning. We briefly talked about the interesting new approach of letting machines learn from data instead of writing complex rules-based programs. Now we're going to dive a little bit deeper, trying to uncover the basic terminology of machine learning. This high-level introduction will help us develop the ground floor, and then we will be able to build more complex topics on top of it. I'm going to talk about the "black box" metaphor that is sometimes used to describe a machine learning system; the concepts of features, labels and examples, which represent the input and output of such a system; the meaning of a model and the life cycle of a trained model; and finally, we will also talk about the main challenges of training a model, which are called underfitting and overfitting.

11. "Black Box" Metaphor: The first concept I would like to talk about is the black box metaphor that is sometimes used to describe a machine learning system. Any machine learning solution is supposed to perform a specific task, like predicting the price of a real estate property, classifying the type of object in a picture, or maybe deciding on the next action in a chess game. Most probably, such a machine-learning-based system will be used as a module in a larger solution. Think about an ML system that can classify whether incoming emails are spam or not spam, meaning a spam detector. We will have a complex email solution that does many things, and inside that solution there will be an ML classifier as a small module that works with other modules in this larger email system. This ML classifier is supposed to get incoming emails as input and produce a binary output per email: spam / not spam. In that context, we can describe this ML classifier as a black box. On one side, we feed that box input data, meaning emails; there is a brain, representing the knowledge encapsulated inside, that takes the input and produces the output. It's actually a simple, high-level representation of an ML system: input, the brain in the middle, and output. It is commonly described as a black box because, in many cases, we don't know what's going on inside that box, the brain. There is some challenge in articulating and representing the patterns inside. It is not clear how the ML system created that knowledge during the training phase, and also what to expect while using such a box: is it going to be 80% accurate, 90% accurate? How smart is the brain in that box? Still, this black box metaphor is a little bit misleading. All those challenges are well known, and there are all kinds of methods to answer some of those important questions that I just raised. Anyhow, I hope that by the end of this training, you will be able to better understand what's going on inside that mysterious machine learning black box.
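One way to picture that module is as a tiny interface in code: the larger email system sees only the input and output, while the "brain" stays hidden inside. A minimal sketch, where the SpamClassifier class and its trained_model argument are hypothetical illustrations:

```python
# The "black box" view of a spam detector: input in, answer out,
# internals hidden. Everything here is a hypothetical illustration.
class SpamClassifier:
    def __init__(self, trained_model):
        self._model = trained_model          # the hidden "brain"

    def classify(self, email_text: str) -> str:
        # The larger email system never looks past this method.
        is_spam = self._model.predict(email_text)
        return "spam" if is_spam else "not spam"
```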
12. Features and Labels: Okay, let's get down to business. Using the simple diagram representing an ML system, we have input, output, and some brain, some knowledge, in the middle. In machine learning, features are the input data; a single feature represents a specific input variable. Think about the task of predicting the market price of a house. A single feature could be the size of that house, but this is not enough. We will need more relevant features that can better represent the house, so the system can better predict the market price. The features used as input to such an ML system could be the size, the number of bedrooms, the street or area, the overall condition, and so on. Another example would be an email spam detector; in that case, the features of such an ML system could be the content of the email, the sender address, the time or hour of day the email was generated and sent, the number of spammy words like "free money", and so on. In simple projects, we can have 10-20 input features, like we just saw, while some very sophisticated machine learning projects could use thousands or even millions of features. Mathematically, the list of input features is represented as a vector of size n, like x1, x2, x3, ... up to xn. As the saying goes: garbage in, garbage out. We'll talk about it later in this training, but it is important to mention that selecting the right features is a critical step in the process, something that is called feature selection. Okay, moving on. A label is the output of the machine learning system. It is the thing we would like to predict or classify using the system, after we have trained it. A label could be the price of a real estate property, as we just talked about; the identified object in a picture, like "this is a dog" or "that's a cat"; the type of incoming email, spam / not spam; the root cause of some event; the words being used in a video clip; and much more. Mathematically, the label is represented as y, as in the simple math equation y = f(x). Now we can describe the meaning of examples, which is another key term. An example is a single instance of data, represented by the letter x. There are two types of examples: labeled examples and unlabeled examples. A labeled example includes the features, which are the x, and also the label related to those features, which is the y. Think about an email a user manually labeled as spam: it is a labeled example. Or think about an image as input, which is basically many small pixels. All those pixels are input features, and inside the picture there is a specific object, like a specific type of animal; the label will be the name, the type, of that animal. If we have the image file and the type of animal inside, then we have one labeled example. If we have just the image file, without the identified object inside, then we have an unlabeled example. A large group of labeled examples can be used as a training data set to train a model. To summarize: we have features as input to the machine learning system, and a label as the expected output. If we have a group of data instances that are already labeled, then they can be used as examples to train a model, which is called a training data set.
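In code, a labeled example is simply the pair (features, label). Here is a minimal sketch using the house-price task, with made-up values:

```python
# Features x = (x1, ..., xn) and label y, with invented values.
x = {"size_sqm": 120, "bedrooms": 3, "area": "city center", "condition": 4}
y = 450_000                      # the known market price: the label

labeled_example = (x, y)         # usable for training

unlabeled_example = {"size_sqm": 95, "bedrooms": 2,
                     "area": "suburb", "condition": 3}
# No y here: predicting it is the trained model's job.
```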
13. Training a Model: As part of the prerequisites of most machine learning projects, we need a training data set. A training data set is a large group of labeled examples. A machine learning system is going to learn patterns inside the training data set and store that knowledge in something that is called a model. This model is supposed to capture, as closely as possible, the relationship between the features and the target label. In a common type of machine learning method called supervised learning, the way to create this kind of model is based on analyzing a large group of labeled examples: we're basically training our model using the labeled examples. Once we have trained our model with those labeled examples, we can use that trained model to predict the label of unlabeled examples. Looking at the life cycle of a model in a machine learning system, we have two main phases. The first is the training phase, also called the learning phase, which means creating the model. The main idea is to utilize some learning algorithm that will build a model using the training data set. There are all kinds of learning algorithms, and we'll talk about them later. After training a model, we would like to use it in the real world to perform something useful, so the next phase, or stage, of a model is called inference. In machine learning, inference means applying the trained model in an actual machine learning system, working in a production environment, to make ongoing predictions. It is also important to mention that this inference model represents a specific snapshot of the trained model. In many practical machine learning use cases, the system will keep training new and better models all the time, using new data, and at some time interval it will replace the existing inference model with a new one. As a nice example, let's say we would like to develop a mobile application that can recognize the type of animal from a picture taken in real time by the end user. As part of the training phase, we will have a very large data set to train a model, like one million images of different animals. This training process will be performed on dedicated, very expensive hardware located in a centralized data center; all the images used for training are located in the same data center. After the training is done, we move to the next stage: we have a trained model that can be sent as a snapshot to the end users' mobile devices. So when someone takes a picture with his or her phone, the local application on the mobile device can immediately identify, or classify, the type of animal in that picture. The inference stage, which is the actual prediction or classification, is done at the device level. Another option is to perform the inference stage in the centralized data center: the end user takes a picture, the picture is sent as raw data to the cloud, and in the cloud an application uses the latest model to perform the inference. The result is then sent back to the end user's device. There are, of course, all kinds of pros and cons, like latency issues and so on. Because the end-user devices we're using today are getting faster and better, there are more and more use cases where the inference can be done on the end-user device.
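Here is a minimal sketch of that two-phase life cycle, assuming scikit-learn and joblib; the data variables and the file name are placeholders:

```python
# Phase 1: training (runs once, on powerful centralized hardware).
from joblib import dump, load
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                  # X_train/y_train: placeholders
dump(model, "animal_classifier_v1.joblib")   # snapshot the trained model

# Phase 2: inference (runs later, on a device or in the cloud).
snapshot = load("animal_classifier_v1.joblib")
print(snapshot.predict([new_picture_features]))  # placeholder input
```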
14. Aiming for Generalization: We just talked about the concept of training a model in the previous lecture. In that context, it is important to understand that this is probably the hardest and most complicated part of the whole process when developing a machine learning solution. Training a model is not an easy task; it's not like we just run some algorithm on the training data set and get the best trained model. The challenge is to make the model more generic, making sure it performs well on unseen data. We need to remember that the learning algorithm creates the model while trying to optimize something; it's all about optimization. It's a process of adjusting a model, step by step, to get the best performance on the training data set. On the other hand, the objective of the machine learning system is to be able to make good predictions on data it has never seen before. This is called generalization. A well-generalized model is a model where the patterns learned from the examples provided in the training data set can be successfully used also on new, unseen data instances. That's the whole objective of any machine learning solution: to make good predictions on new data, not only on the training data set. As I mentioned, this is not an easy task, and there are two main challenges to overcome before we can get a well-generalized model. Those challenges are called underfitting and overfitting. Starting with underfitting: underfitting refers to a situation where the trained model is not working well on the training data and, of course, cannot generalize to new data. The trained model didn't capture the underlying structure of the data. If this is the end result of the learning algorithm, then something is not working. Take a look at the following two simple charts. We have multiple points as the training data set, and one linear line, created by some learning algorithm, which is supposed to represent the model. The line itself is the model. We can easily see that, on the left side, this linear line is not really representing the patterns in the data, which is the problem of underfitting. On the other hand, the line in the second graph is not linear and can better represent the patterns; it has a better fit to the training data. Now, what are the main reasons for underfitting? The first one is that the model is probably too simple, and we need to build a more complex model that can better learn the underlying structure of the data. In that case, it makes sense to try a different learning algorithm; for example, here we moved from an algorithm that builds a linear line to an algorithm that can build a non-linear line that better represents the underlying data. The second reason for underfitting is that the training data set is not good enough. Maybe there are not enough examples, or maybe the input features of the provided examples are not informative enough, like providing an algorithm just the size of a house without other related features; it's not enough. On the other hand, underfitting is also a standard transition phase of any model during the training process. The learning algorithm builds and adjusts the model while performing a certain number of iterations. At the beginning of training, the model will underfit the training data, because it has just started to model the relevant patterns, and with each learning iteration the model's performance should improve, again and again, making the model a much better fit to the training data. So there is a transition from an underfitting model to a fitting model. If we keep trying to improve the model, there is a danger that we create a model that is overfitting the data set, and now we are moving to the second challenge. After reaching some optimal point, as the algorithm keeps running over the data set, the model's testing performance will start to degrade, which means the model is starting to overfit the training data: it is learning patterns that are too specific to the training data and will be irrelevant to new data.
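Those two charts are easy to reproduce. Here is a minimal sketch with NumPy, where a straight line (degree 1) underfits curved data, and a wildly flexible polynomial (degree 15) chases every noisy point:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 20)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)  # curved data + noise

underfit = np.polyfit(x, y, deg=1)   # too simple: misses the curve
balanced = np.polyfit(x, y, deg=3)   # captures the underlying shape
overfit = np.polyfit(x, y, deg=15)   # memorizes the noise as well
# (NumPy may warn that the degree-15 fit is poorly conditioned,
#  which is itself a hint that the model is too flexible.)
```

Plotting the three fitted curves against fresh points from the same sine curve shows the degree-15 line wiggling through the training points and missing the new ones.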
As a simple analogy, let's say we bought a few grocery items, you know, like 20 items, in a local supermarket during a vacation in a different country. The overall price of that basket was cheaper than we expected; it surprised us in a positive way. And when we got back home, we told our friends that the supermarkets in that country are unbelievably cheap. Do you think that is a reasonable conclusion? Well, not really. We over-generalized a pattern from a relatively small amount of samples. For the small number of items we bought in the supermarket, it makes sense; it fits nicely with the conclusion. Maybe we are right, and maybe we are completely wrong. It makes sense to check a much larger number of items in the supermarket before drawing a conclusion. Now, when the same thing happens in machine learning, it is called overfitting. Overfitting is a very common situation when training models. It means that the trained model we created performs very well on the training data, but it does not generalize well to new data; the model is not performing well on new data. Looking at the same graph, drawing a line that perfectly connects the points is an example of overfitting: for those points in our training data set, it is perfect, but when we use it for new data points, this model will not perform so well. But why, when using the training data set, do we encounter such an overfitting situation? Well, there are a few common reasons. The first: the training data set is a sample of a much larger distribution. If we take 100 items in a supermarket to compare, price-wise, with the same items in a different supermarket, it is a small sample; it's not all the items available in the supermarket, which can be millions of items. If we compare 1,000 items, or 10,000 items, we'll get a much better distribution of data about the items in the supermarket. The same challenge exists when training a model: the training data set is a sample, a group of examples. It should be a large enough sample that resembles, as well as possible, the true distribution of the data. This is the key issue to remember: the training data should represent the distribution of the data as much as possible; otherwise, the model will just overfit the training data. The next reason can be a model that is too complex. The objective of a model is to fit the data well, but at the same time to fit the data as simply as possible; it is a careful balance. If the model is too complex while trying to fit the training data perfectly, then we increase the risk of overfitting. Anyway, even if we took a very large data set, how can we discover such problems? How can we trust that the model will also do a good job on new data? Maybe it is overfitting the training data set and does not generalize well to new data. The answer is that we need to test the model's performance on a separate data set, to check and validate that our model is working well on new data; this is called the test data set. The concept is quite simple: we have one group of examples to train a model, and another group of examples to test the model. We'll talk about it more later in this training.
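In practice, this check is one function call away. A minimal sketch with scikit-learn, using its built-in iris data set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("train score:", model.score(X_train, y_train))  # often near 1.0
print("test score: ", model.score(X_test, y_test))    # the honest number
# A big gap between the two scores is the classic symptom of overfitting.
```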
As a quick summary of the key things to remember: training a model is not an easy task; it's actually the core job of a data scientist when building a machine learning project. The challenge is to make the trained model generic, making sure it performs well on unseen data, making sure the model generalizes. There are two main challenges to overcome: underfitting and overfitting. Underfitting is when the trained model is not working well on the training data set and, of course, cannot generalize to new data. Overfitting, which is the more complex problem, is when the trained model we created performs very well on the training data but does not generalize well to new data; the model is not performing well on new data. Okay, overall this section was a high-level introduction to the basic machine learning terminology. Please post a question if you would like to ask something. In the next section, we're going to talk about the main classifications of machine learning systems.

15. Classification of ML Systems:

Until this point, I explained that a machine learning system learns by training a model using labeled examples. This is only partially true. The process of learning, or training, which is a core concept in machine learning, comes in different flavors: different types of learning algorithms. It's actually a spectrum of options. Those options can be classified, or grouped, into the following three main categories: supervised learning, unsupervised learning, and reinforcement learning. From the names supervised and unsupervised, you can guess that it is related to the degree of supervision, okay, how much a human is needed to supervise and control the process of learning. Reinforcement learning is a completely different approach that we will discuss in this section. In each one of those categories, there are multiple learning algorithms that can be used, and the selection of the most relevant category, and of the best algorithm to perform the job, will be based on the required objective. If we would like to classify a picture, then this type of task should be performed by the most relevant learning algorithm; if we would like to train a machine how to play a game, then our learning algorithm will be different. In this section, we're going to talk about the concept behind each learning category and understand the types of tasks that can be achieved while using it.

16. #1 - Supervised Learning:

Okay, are you ready to make things a little bit more complex and interesting? Let's start with the first category, called supervised learning. Supervised learning is the most common use case of machine learning. It was actually the method I used so far for presenting the machine learning concepts in the previous sections, and most probably this will be your first touch point if you decide to move into the practical side. The name supervised learning originates from the idea that training a machine using this type of approach is similar to how humans learn under the supervision of a teacher. Think of a regular school class: we have a group of students and a teacher giving a lecture about some specific topic. The teacher will provide several examples while teaching something, and the students will use those examples, analyze and memorize them, which will help them extract the patterns from those examples at a later stage. Based on the provided information, the students will be able to solve similar problems. Above all, the teacher decided what kind of examples to present and how many; he or she basically supervised the learning process. In supervised learning, we train the machines by providing them a set of examples, where each provided example is a pair consisting of an input object and the desired output value for that object.
This is called a labeled data set. Okay, this is what we've talked about so far: the fact that both the input and output values are known qualifies the data set as labeled. Labeled data means input data that is already tagged with the correct output. As an example, let's say the machine I would like to build should identify whether an object in an image is a dog or not a dog. As part of the supervised learning process, I will need a large group of images, and for each image a label value saying whether the object in that specific image is a dog or not. Those images are the labeled data set, also called the training data set, that I need to use while training the machine. Practically speaking, in some use cases getting such labeled data is the biggest challenge, but it is the core prerequisite in supervised learning. As a preparation step, I would also analyze the list of pictures and remove some of them from the data set: maybe some pictures are missing a label, maybe some of them are not so clear, maybe some of them are by mistake related to a different type of animal, or maybe some pictures come in different resolutions and I need to normalize them to the same resolution baseline. This is called cleaning the data, and practically speaking, in some cases this process will require substantial time and effort. Finally, when the data is ready, okay, clean and normalized, a supervised learning algorithm will analyze the training data set while trying to decode the relationship between input and output: what kind of patterns can be found to transform the input into the output, looking at all the provided examples? It's like the concept of reverse engineering; the algorithm will search for what kind of steps are needed to get from the input to the output, and finally it will produce a trained model that can be used for mapping new input into a predicted output.

Okay, get ready for our first mathematical formula, which is going to be unbelievably complex: y = f(x). A machine learning black box with an input and output is basically some kind of data transformation that can be presented as this generic formula. x is the input into the machine, which can be a group of values; as we talked about, they are also called features. y is the output of that machine, the target value. Now, the function f with input x is basically some mathematical transformation function, or mapping function, discovered by the algorithm during the training process. Machines are very good at optimizing functions under some constraints, and the learning algorithm will use the labeled data set to find the optimal parameters for that transformation function. Once the learning algorithm identifies those optimal values, we have a trained model, which can then be used in an ML system that is supposed to do something. So the aim, or target, of a supervised learning algorithm is to find the best mapping function f that will be used to map the input variable x to the output variable y, based on the training data; the short sketch below illustrates this fit-then-predict flow. There are two very typical tasks that are performed using supervised learning: the first one is called classification, and the second one is called regression. Let's talk about each one of them.
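Here is a minimal sketch of that fit-then-predict flow (a hypothetical example of mine, with made-up weight and height values standing in for image features, and a simple nearest-neighbor classifier standing in for the learning algorithm):

```python
# A minimal sketch of supervised learning: labeled pairs (x, y) are the
# training set, fit() searches for the mapping f, and predict() applies
# the learned f to an unseen input.
from sklearn.neighbors import KNeighborsClassifier

X = [[30, 60], [25, 55], [4, 25], [5, 30]]  # made-up [weight_kg, height_cm]
y = ["dog", "dog", "not dog", "not dog"]    # labels supplied by the supervisor

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)                   # learn f from the labeled examples
print(model.predict([[28, 58]]))  # apply f to an unseen input -> ['dog']
```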
17. Classification:

Until now, I provided the example of identifying whether an object in an image is a dog or not a dog. This type of task is called classification, and it is a very popular use case in supervised learning. Think about an email service: how can the system identify whether an email is spam or a regular, legitimate email? This is a classical use case of classification. The machine learning solution implemented in such an email service should automatically classify whether a new email is spam or not; we can't even imagine our life today without such a feature in an email system. This type of classification is also called binary classification, meaning there are only two options, two classes. During the training phase, the classification algorithm will be given labeled data points, with emails that are both spam and not spam. Using this information, it will create a model with a mapping function from x to y. Then, when provided with an unseen new email, the model will use this mapping function to determine whether or not the email is spam. Other classification tasks will require multiple values, okay, multiple classes, which is called multi-class classification, like the task of identifying whether the color of a specific flower is yellow, green, red, or blue; here there are four classes. We can build a binary classifier or a multi-class classifier using shallow learning or using deep learning algorithms.

For example, one of the common classification algorithms under the shallow learning category is called support vector machines (SVM). Okay, as a reminder, we're still under the supervised learning category and under the classification task, which means we would like to build a machine learning system that classifies data. SVM is an algorithm for creating such a classifier: data points are treated as points in space, mapped in such a way that the classes are separated into distinct regions. Let's see that in a visual way. Imagine we have a data set of people with their weight and height; in machine learning terminology, this type of information is called features. Okay, we talked about it: one feature is the weight, and another one is the height. In addition, each person is classified as male or female, so this is a binary classification task. Now, that group of data points can be placed in a two-dimensional space, okay, x1 and x2, the weight and the height, like you can see here. Can we draw a line that will somehow separate the two groups, male and female, using this information, the weight and height features? I can, of course, manually try to draw a line here, and maybe move it a little bit there; there are many lines that might classify the data, but they are not going to be the optimal line. It is better to find the line that represents the largest separation, or margin, between those classes. The job of the support vector machine algorithm is to search for this optimal line, or better, call it a hyperplane, okay, because in two dimensions it's a line, and it's going to be a plane in three dimensions. In our example, this line should break those points into two groups, two classes, and it is basically a simple math formula: it will find the line with the maximum margin, meaning the maximum distance between the data points of both classes. Okay, a larger margin will increase the chance of correctly classifying future data points. So when getting a new data point, the machine learning system will use this line to decide whether the point belongs to class A or class B, is it a male or a female? Again, I don't want to go too deep here, so let's zoom out to the big picture. We talked about the first typical task in supervised learning, meaning classification. One common method to build such a classifier in shallow learning is called the support vector machine algorithm; a minimal sketch follows.
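Here is a minimal sketch of that weight-and-height example (my own toy numbers, using scikit-learn's linear SVC as the support vector machine implementation):

```python
# A minimal sketch of the SVM idea: a linear SVM searches for the separating
# line (hyperplane) with the largest margin between the two classes.
from sklearn.svm import SVC

X = [[80, 180], [85, 178], [90, 185],   # made-up [weight_kg, height_cm]
     [55, 160], [60, 165], [52, 158]]
y = ["male", "male", "male", "female", "female", "female"]

clf = SVC(kernel="linear")  # linear kernel: the decision boundary is a line
clf.fit(X, y)
print(clf.predict([[70, 172]]))  # classify a new, unseen person
```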
The task of classification is very common in practical machine learning systems. Let's move to the next type of task in supervised learning.

18. Regression:

The second very common method in supervised learning is called regression, and maybe you have already encountered it as a statistical method to analyze and predict data; I used it multiple times during my engineering degree a long time ago. It is a very straightforward method to predict a continuous number based on historical data. Let's say that we would like to build a machine learning system that can predict the price of real estate products, like houses. If you ask a real estate expert how to evaluate the price of a specific property, then he or she will use multiple attributes: it can be the size of the property, the number of rooms, the overall condition, the location, the average price of similar houses, and many more attributes. Going back to our machine learning system, the objective of the algorithm will be to predict the price based on such attributes; it's like making an AI real estate agent. A price is an example of a continuous number. A continuous number can be the age of a person, a product's weight, a score in an exam, the income of a person, annual company revenue, and many more. On the other hand, gender is not really a continuous number; it's a group of possible options, like female or male, so we would handle it using classification, as we saw earlier. To be able to predict a continuous number, one of the most relevant types of algorithms is based on regression. The concept of regression analysis is widely used for data analysis, but in addition, it is also used as one of the most basic forms of machine learning.

So what is regression? Regression is a set of statistical methods for estimating the strength of the relationship between a dependent variable and one or more independent variables. Such relationships can be linear or nonlinear. The most common form of regression analysis is linear regression, but there are also different types of regression algorithms, for example logistic regression and polynomial regression. At this point, let's talk about the first option, called linear regression. A linear regression algorithm learns a model which is a linear combination of the features coming from the input examples. There is a dependent variable, labeled y, which we would like to predict, and an independent group of variables, labeled x1, x2, and so forth; these are the predictors. y is basically a function of the x variables, and the regression model is a linear approximation of this function. The basic assumption, which is not always true, is that there is a linear relationship between the dependent and independent variables. Looking at this simple one-dimensional linear regression graph, we have one input feature x, which is the independent variable, and y, which is the dependent variable, the predicted output. All the points in the graph are the training data set, and we can easily see that there is a linear relation between x and y. The algorithm will search for the best-fit linear line y = wx + b by finding w and b, where w is the slope of the line, describing how strong the linear relationship between x and y is, and b is the intersection with the y-axis. Now, how does the algorithm know that this is the best line? Well, it is using something that is called a cost function.
It will take every available point and measure the distance between the actual point that we have in the data set and the corresponding point on the line; the line that the algorithm created is the model, and the distance represents the error in the model. In linear regression, the cost function is called mean squared error (MSE), which is basically the average of the squared errors between the predicted values and the actual values: MSE = (1/n) * Σ(y_i − ŷ_i)². So the goal is to reduce this error, of course taking into account not just one point; we need to take into account all the available points in the training data set. Finally, this line, which is the trained model, can be used to predict new data points: we insert a new input x1 and get as output the predicted y1, which is on that specific line. The generic equation of linear regression with multiple input features will look something like y = w1*x1 + w2*x2 + ... + wn*xn + b, where the xi are the features of the data, and the w's and b are the parameters that are discovered during training. This is the idea behind building such a model: the x's are the features in the input data, and the algorithm is trying to find those parameters w and b that describe that specific line as well as possible. Now imagine x1, x2, x3 as a group of attributes that describe a real estate product; using regression, we can build a model that can be used to predict the market price of such a real estate property. A minimal sketch of this idea follows.
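Here is a minimal sketch of linear regression on made-up house data (a hypothetical example, not from the course): the algorithm finds w and b, and we can then predict the price of an unseen house.

```python
# A minimal sketch of linear regression: find the slope w and intercept b
# that minimize the mean squared error on the training points.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

size_m2 = np.array([[50], [80], [100], [120], [150]])  # input feature x
price_k = np.array([150, 230, 290, 350, 420])          # target y (thousands)

reg = LinearRegression().fit(size_m2, price_k)
print("w (slope):", reg.coef_[0], " b (intercept):", reg.intercept_)
print("MSE on training data:",
      mean_squared_error(price_k, reg.predict(size_m2)))
print("predicted price for 110 m^2:", reg.predict([[110]])[0])
```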
Okay, let's do a quick recap. We talked about the most common type of machine learning, which is supervised learning. In supervised learning, we supervise the learning process by deciding which labeled data instances will be part of the training data set, and there are two main tasks in supervised learning: we can use shallow learning algorithms, like SVM for classification tasks and regression algorithms for prediction, or use neural networks under the concept of deep learning. Let's move to unsupervised learning.

19. #2 - Unsupervised Learning:

We talked about supervised learning; using this method, we must provide labeled data, or so-called examples, as part of the training phase. If I'm building an image classifier that should identify the type of animal in a given image, then I need a large number of example images that are labeled with the type of animal in each image, and this information is used during the training phase. Unfortunately, the vast majority of available data in many applications and many industry use cases is usually unlabeled: we know the input features x, but we don't have the labels y to train our model. If we still want to use supervised learning, then we can consider several options, like searching for available labeled data from other sources, which can be free on the internet, or maybe purchasing a labeled data set from a third-party company. The next option would be to label the data somehow as a manual process: let a group of people, experts, go over some portion of the data set and label it. Okay, this can be an expensive and very slow process, and in some cases the amount of manually labeled data will not be enough to train a good model; still, it is a practical option for using supervised learning. As you may guess, the next option to consider is to use unsupervised learning, which is not as widespread and frequently used as supervised learning. Unsupervised learning is learning without a teacher supervising the learning process; the goal is to automatically identify meaningful patterns in unlabeled data. We don't need to provide the algorithm a labeled data set, which makes it a very attractive option for some use cases. Unsupervised learning is used for two main fundamental tasks: the first one is called clustering, and the second one is called dimension reduction. Clustering is about summarizing and grouping similar instances together into clusters. It helps to find a small number of attributes that represent the patterns in the data, and by doing that, uncover the underlying structure of the data set. Clustering is a method widely used for search engines, customer segmentation, image segmentation, simple data analysis, and more; we'll talk about it in the next lecture. The second type of task is called dimension reduction, which is about reducing the complexity of the input data. This method, under unsupervised learning, is sometimes used to preprocess the input data and compress it before feeding it into a supervised learning algorithm. Okay, the idea is to compress the data while maintaining its structure and usefulness. Let's review each one of them.

20. Clustering:

Clustering is probably one of the most common use cases of unsupervised learning. It is the task of identifying similar instances, with shared attributes, in a data set and grouping them together into clusters: grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. The output of the algorithm will be a set of labels assigning each data point to one of the identified clusters. Take a look at this graph, which represents the many data points we have in an unlabeled data set. We can easily see that some points are concentrated in specific areas, meaning they may have something in common. This is where clustering algorithms can do the job: they can use the input features of the data set and automatically assign each data point to a cluster. Let's color all the data points in three different colors: red, green, blue. In that scenario, the clustering algorithm will find those three clusters and automatically label them with something that is called a cluster ID: cluster number one, cluster number two, cluster number three. So, after clustering, each cluster is assigned a unique number, called the cluster ID, and each data point, or instance, is assigned to one cluster ID. This kind of information, identified automatically by the algorithm, may be a useful insight about the data set that we can use. Clustering is used in a wide variety of use cases in the industry. For example, it is used for market segmentation, also called customer segmentation. I mean, all business companies would like to better know and understand their customers, okay, who they are and what's driving their purchase decisions. This kind of segmentation can help to adapt products, services, and marketing campaigns to each identified segment. For example, suppose a business has data about customers, such as demographic information and also their historical purchasing behavior; a clustering algorithm can identify sub-segments of the whole market where a particular type of product is very successful, helping to design a focused marketing message for that specific segment. Another interesting use case is called anomaly detection, or outlier detection: for example, a scenario where you need to detect defects in the manufacturing process of some product.
So there would be all kinds of sensors measuring different physical characteristics of the products, and then you can run such clustering algorithms to find data points that are too far from the center of a specific cluster, okay, which makes them look like an anomaly, like an unusual size of the product and so on. Another example would be taking pictures of products during the manufacturing process and then trying to identify products with defects, using, again, the method of clustering. The last one that I would like to mention is called semi-supervised learning. This is a method that sits between supervised learning and unsupervised learning. The idea here is that we can run a clustering algorithm on an unlabeled data set, which will create a few clusters as labels, okay, like cluster number one, cluster number two, etcetera. Then I get a very small number of clusters that I can manually label, like: this cluster is red, this cluster is blue, this cluster is green, whatever criteria I would like to use. Then I can propagate those labels to all the instances in the same cluster, and now, suddenly, I have a labeled data set that can be used for training a model in supervised learning. A very interesting approach. Before moving on, a minimal sketch below shows the clustering idea in code. Let's move to the next common task in unsupervised learning, called dimension reduction.
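Here is a minimal sketch of clustering with k-means, one common clustering algorithm (my own illustration, with three randomly generated groups of points standing in for real customer or sensor data):

```python
# A minimal sketch of clustering: k-means receives no labels and assigns
# each point a cluster id entirely on its own.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(2)
# Three unlabeled groups of 2-D points centered at different spots.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ((0, 0), (5, 5), (0, 5))])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=2).fit(X)
print(kmeans.labels_[:10])      # cluster id (0, 1 or 2) assigned per point
print(kmeans.cluster_centers_)  # the discovered centers of the three clusters
```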
21. Dimension Reduction:

Dimension reduction, okay, what is it? In supervised learning, while using some classification or prediction algorithm, one big challenge to handle is the number of input features that the algorithm needs to analyze. Let's say that those features are dimensions. Now think about a high-resolution image, which is a million-dimensional piece of data in pixels. Each pixel in the image is actually three dimensions, described using red, green, and blue, so we would have three million dimensions to describe a single image. So what's the problem here? More features, more dimensions, will require much more processing time and computing resources, like memory, storage, and networking. Sometimes many of those features are correlated to each other and therefore redundant for the algorithm, and in other cases some of those features have a very weak influence on the machine learning outcome. So what if we could perform some preprocessing on the data, as a step before applying the supervised learning algorithm, and reduce the data size, reducing the dimension of the data? This is where dimension reduction algorithms come into play. They can be used for reducing the number of variables under consideration, helping to simplify the data without losing too much information. This is a common preprocessing step for prediction and classification tasks. As a simple way to visualize that process, take a look at this three-dimensional pipe: we can reduce one dimension and describe the pipe in a two-dimensional plane, for example as a circle, when looking at the pipe from above in the x-y plane, or as a rectangle, when looking at it from the side in the x-z plane. This is a very simple example of dimension reduction.

It will be useful to talk about a very common use case of this approach. Let's say I would like to build an image classifier as part of some object detection system using supervised learning, and for performing that training task I have 10K images with 640 by 640 resolution. Each color image is basically a large group of pixels, and each pixel can be represented by a combination of three basic colors: red, green, and blue (RGB). The red, green, and blue channels use eight bits each, holding integer values between 0 and 255, which makes around 16.7 million possible colors; okay, this is the space of options. In our case, with this resolution, the whole uncompressed size of an image will be around 1.2 megabytes, and if we need to process 10K images at 1.2 megabytes each, it's a lot of data. In many cases, the objective of an ML system does not require such a level of detail. Think about the situation of identifying a cow in an image: the machine does not care about the full color range of the cow to understand that this object is a cow, so it makes sense to transform and compress the data in that picture as a preprocessing step. In our example, what if we could reduce the space of possible color options? Instead of using the 16.7-million-color space, we can reduce the image quality to just a 256-color space; in that case, the image size will be around 0.4 megabytes. Then I can take all my compressed 10K images and feed them into my supervised learning algorithm, which will be a much faster process because there is less data to process. This can be done by performing something called image segmentation, a sort of dimension reduction, by clustering pixels according to their color and then replacing each pixel's color with the mean color of its cluster, like replacing a group of shades of red with one single red color, which is still enough information for the algorithm to identify a specific object. A minimal sketch of this color quantization idea follows.
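Here is a minimal sketch of that color quantization idea (my own illustration; random pixels stand in for a real photo, and a small 64 by 64 stand-in image keeps the demo fast, instead of the 640 by 640 images from the lecture):

```python
# A minimal sketch of color quantization: cluster the pixel colors with
# k-means, then replace every pixel by the mean color of its cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(3)
pixels = rng.randint(0, 256, size=(64 * 64, 3))  # stand-in RGB image, one row per pixel

kmeans = KMeans(n_clusters=16, n_init=4, random_state=3).fit(pixels)
quantized = kmeans.cluster_centers_[kmeans.labels_].astype(np.uint8)
# 'quantized' now uses only 16 colors instead of up to ~16.7 million --
# a much smaller color space to feed into a supervised learning algorithm.
print(np.unique(quantized, axis=0).shape)  # (16, 3): sixteen remaining colors
```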
22. #3 - Reinforcement Learning:

The last learning type we will encounter in machine learning is called reinforcement learning, and it is a completely different approach compared to supervised or unsupervised learning. In reinforcement learning, we're not using a group of labeled or unlabeled examples as input to train a model. I guess it may sound a little bit strange, but don't worry, it will be clear in a few minutes. This method is used as a framework for decision-making tasks based on goals. It can be used to achieve a complex objective while performing multiple sequences of actions. For example, it is widely used for building AI systems that play all kinds of computer games while trying to achieve superhuman performance, for teaching robots to perform tasks in a dynamic environment, for building real-time recommendation systems for websites, and much more. It is not as popular as supervised and unsupervised learning, but it is gaining momentum as ML practitioners try different approaches to handle complex tasks. Let's take the example of playing a chess game. The objective is to win the game by deciding how to play multiple turns that are correlated to each other. Every move we would like to play has thousands of future options to consider, while also anticipating what the other player is going to do, and every move the two players make changes the ongoing state of the game; they influence the environment, which is the game board, by taking sequences of actions. It is a dynamic environment. To make it even more complicated, sometimes the results of actions are delayed: we decided to play the game with some strategy, and only later during the game will we know whether we made a good decision or not.

The feedback is delayed, and it is sometimes difficult to understand which actions led to which outcome over multiple steps. This type of task, which involves some level of bidirectional interaction between the machine and the environment, does not easily fit into what we talked about so far under supervised or unsupervised learning; we can't use techniques here like classification, clustering, or making predictions based on historical data. The way to handle such a task is by using the concept of reinforcement learning, and the best place to find an example of a system that can interact with the environment is to check what Mother Nature developed over billions of years. The concept of reinforcement learning is very similar to the way humans and other animals learn, and some of the algorithms used in reinforcement learning were inspired by biological learning systems. Each one of us can be described as a sophisticated biological machine that interacts with the physical environment in an endless feedback loop. Almost every action we perform generates some kind of feedback: we try things and get feedback, and based on the feedback, we learn. Let me give you a very simple example. If I try to pick up a 20-kilogram weight in the gym for a freestyle training, then the immediate feedback may be that it's too heavy for me to exercise with, so I can decide to try a much lower weight and drop it, for example, to 10 kilograms, and maybe the feedback will be that it's too light for me. Again, based on the feedback, I can raise it to 15 kilograms and continue to perform those adjustments until getting the weight that is perfect for my training goals. How did I know which one is the best? I didn't; I tried a few options and learned from the experience, based on the feedback I got while trying each option. Many things we learn during our life are based on such a continuous feedback loop, based on actual experience. Think about how you learned to drive a car: we can't learn how to drive only by reading a user guide. There are, of course, basic rules we need to learn and follow while driving on the road, but the actual part of operating a vehicle and handling a variety of road situations is something that we must learn from experience: the interaction with the car as a machine we need to operate, the interaction with the road conditions, the interaction with other drivers. Learning from interaction is a fundamental idea in our daily life, and this is a great analogy for reinforcement learning: learning from interaction.

23. Decision Making Agent:

So, going back to machine learning: reinforcement learning is a method used to let machines learn how to behave, based on interaction with the environment, while focusing on some end goal. We need to define this end goal, like winning a chess game, but we don't need to tell the machines which actions to take; the machines must discover which actions will help to achieve the goal. They can select the actions from a space of possible options. Those algorithms are penalized when they make the wrong decisions and rewarded when they make the right decisions. The visual way to describe a system that is using reinforcement learning is with two building blocks: a learning agent, which represents the machine, and the outside environment. This learning agent must be able to sense the state of the environment to some level and be able to take actions that can influence the state of the environment.
To be clear, this agent is not necessarily a physical, fully functional robot or something like that; the agent could be some subcomponent in a larger system, or some software module. As part of a sequence of interactions, the agent will decide which actions to perform on the environment. Those actions will, of course, change the state of the environment, and then the new state will be translated into some numerical reward value that will be used as a feedback signal to the agent. The idea is that this reward signal helps the agent navigate and understand which actions will help to achieve the goal. It's like a feedback loop, helping the agent learn from its own experience and then select the next best strategy to get the most reward over time. Using again the example of a chess game, the chess-playing agent will play on the game board and perform ongoing moves while making decisions. Those moves are the actions performed by the agent on the environment; the environment in our example is the game board. The goal of the game is winning, so the agent will be rewarded for winning a game.

Let's use a new diagram and some simple math to describe this situation. The process of learning here is based on multiple steps on a time dimension; time is represented by the small letter t, like t0, t1, t2, t3, etcetera. At each time step t, the agent first receives and analyzes the state of the environment, represented by the capital letter S at that particular time t, and using this information, combined with the knowledge gained so far, it will select some action, represented by the capital letter A, again at the same time t. Now, one step later, at t+1, as a consequence of its action, the agent receives, or calculates, a numerical reward signal, called R at time t+1, and it will actually find itself in a new state, S at time t+1. So we have a sequence like S0, A0, R1, S1, A1, R2, and so on; this is the whole idea of a sequence of state, action, and reward. Assuming the agent has just started to interact with the environment, how does the agent decide which action comes next? Well, it is similar to the concept of learning something by trial and error. You can't teach a child how to ride a bike just by explaining to him all the rules of riding a bike; he will learn by trying many times and learning from each experience. Reinforcement learning is building a prediction model by gaining feedback from random trial and error, and leveraging the cumulative insight that was collected from previous interactions. In our chess game example, the agent will start to play without knowing anything, exploring the space of options and then taking actions. During the first games it will be a very bad player, and as a result it will get very strong negative feedback while losing games. Now, the algorithm running inside the agent is trying to maximize the reward, meaning winning the game, so it will try different strategies as part of the trial-and-error method for making better decisions. Some of those actions will eventually lead to a better result, and the agent will learn from that cumulative experience. So this is the concept of reinforcement learning: it is used in applications where the machine must make a sequence of decisions, and those decisions come with positive or negative consequences that are collected as feedback; a minimal sketch of this loop follows.
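Here is a minimal sketch of that state, action, reward loop (my own toy example, not from the course): a tiny Q-learning agent in a five-position corridor, where reaching the rightmost position gives a reward of +1 and everything else gives 0.

```python
# A minimal sketch of the agent-environment loop with Q-learning:
# the agent learns, by trial and error, that moving right leads to the goal.
import random

N_STATES, ACTIONS = 5, (-1, +1)        # positions 0..4; move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0                              # S_0: agent starts at the left end
    while s != N_STATES - 1:
        # choose A_t: explore randomly sometimes, otherwise act greedily
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)   # environment changes state
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # R_{t+1}: delayed reward
        # update the agent's knowledge from (S_t, A_t, R_{t+1}, S_{t+1})
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s_next

print(max(ACTIONS, key=lambda act: Q[(0, act)]))  # learned best first move: +1
```

Each pass through the loop is one step of the S, A, R sequence described above, and the cumulative experience stored in Q is what gets reinforced from game to game.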
The feedback, going back to the agent, is used to learn from the experience and basically get better and better with each iteration, like playing a chess game thousands of times: sometimes you win, sometimes you lose, but with every game you learn something and get better. The cumulative knowledge on how to achieve a specific goal is literally reinforced again and again by experience. Now we know why it is called reinforcement learning.

24. Quick Recap and Thank You!:

Hi, and welcome back to our last section in this training. I would like to recap the things we covered so far, to create an end-to-end story. Artificial intelligence is an umbrella term for the fundamental idea that some complex intellectual tasks that are performed by humans can also be performed by machines. When a machine can mimic complex cognitive functions, like identifying an object in a picture or recognizing a human voice, and many other complex tasks, it is described as artificial intelligence. Still, machines can perform complex tasks using pre-programmed rules, but in many cases this traditional programming approach cannot scale up to handle very complex tasks. Something is missing here. The missing part of AI is the flexibility to learn, which is the next phase of the AI evolution path, moving to the concept of machine learning: machines that can learn. So, instead of using pre-programmed rules, we let the machines learn from the data, extract knowledge automatically, and then use that knowledge to perform a complex task. The knowledge in a machine learning system is basically some mathematical transformation: mapping functions that are encapsulated in layers. We talked about shallow learning, meaning using a very small number of layers, okay, this is the knowledge, the brain of the machine learning system, or using deep learning with many layers. The parameters of those mapping functions are identified automatically during the training process by a specific optimization algorithm.

We also talked about the options to train a machine learning system. The first, very common option is called supervised learning. In this method, we need to supervise the learning process by providing the machine learning algorithm labeled examples as a training data set. This training data set will be used to train a model that can then be used by an ML system in a production environment, something that is called inference. Training a model is a complex process of selecting the relevant algorithm, tuning the input features, cleaning the data, and testing the performance of the model, while trying to overcome two main challenges, underfitting and overfitting; the final goal of a trained model is to be well generalized and to handle new, unseen data. There are two typical tasks that can be handled under supervised learning: classification and regression. Classification is a very popular use case in supervised learning, and it is used to classify the input stream, like an image classifier or a spam classifier; we can build a classifier using shallow learning or using deep learning algorithms. The second very common method in supervised learning is called regression. It is a very straightforward method to predict a continuous number based on historical data.
The concept of regression analysis is widely used for data analysis, but in addition, it is also one of the most basic forms of machine learning. Moving next to unsupervised learning: one of the main challenges when using supervised learning is to get a labeled data set, because in most practical applications we don't have it; we will have an unlabeled data set. Unsupervised learning is learning without a teacher supervising the learning process; the goal is to automatically identify meaningful patterns in unlabeled data. It is used for two main fundamental tasks: clustering and dimension reduction. Clustering is about summarizing and grouping similar instances together into clusters, okay, like cluster number one, two, three, etcetera. It is widely used by search engines, customer segmentation applications, image segmentation, and simple data analysis. Dimension reduction is about reducing the complexity of the input data while maintaining the structure and usefulness of the data; this method is sometimes used as a preprocessing stage for the data before feeding it into a supervised learning algorithm. The third learning option is reinforcement learning. It is used in applications where the machine must make a sequence of decisions while interacting with the outside environment. Those decisions come with positive or negative consequences, which will be translated by the agent as a feedback loop to learn what is working and what is not working. The cumulative knowledge that the agent creates on how to achieve a specific goal is reinforced again and again based on experience, okay, the same way humans learn from experience. So those are the three options for machine learning systems; it was a quick recap to connect the dots.

That's it. I want to thank you for watching this training. I hope that you enjoyed it and learned some interesting things along the way. It would be awesome and useful if you could rate the course and share your experience. I'm planning to create multiple courses under the concept of machine learning, and also on other interesting topics, so check out the bonus lecture at the end of the training and see which courses are already available today. I hope to see you again in my next training courses. Bye bye.