Demystifying Artificial Intelligence: Understanding Machine Learning

Christian Heilmann, Principal Program Manager at Microsoft

Get unlimited access to every class

Taught by industry leaders & working professionals

Topics include illustration, design, photography, and more

Lessons in This Class

- 1. Introduction (1:44)
- 2. What is Machine Learning (5:25)
- 3. How We Teach Machines (5:48)
- 4. Machine Learning to Help Humans (5:28)
- 5. Tools for Machine Learning (3:44)
- 6. Visual Uses (7:54)
- 7. Speaking Human (6:07)
- 8. Audio & Video (6:32)
- 9. Personalizing Your Machine Learning (5:08)
- 10. Ethics of Machine Learning (5:32)
- 11. Machine Learning & Creativity (4:33)
- 12. Final Thoughts (0:32)

9,773

Students

Projects

About This Class

Curious about Artificial Intelligence? Start here with Machine Learning — what it is, what it isn't, and how we all interact with it every day.

Join product developer and keynote speaker Christian Heilmann for a fascinating class all about Machine Learning. From how we all use it to where it's headed in the future, you'll learn the ins and outs of how machines are processing our data, finding patterns, and making our lives easier every day. With a focus on how machine learning can power human interfaces and ease our interactions with technology, lessons are packed with tools and tips for developers, designers, and the curious-minded. Key lessons include:

Machine Learning myths, capabilities, and limitations
Tools to incorporate machine learning into your products
Visual and audio uses for machine learning
Ethical considerations for everyone

Whether you're a developer looking to incorporate machine learning into your work or are just curious about artificial intelligence today, this class is a behind-the-scenes glimpse into the world of cutting edge technology.

After taking this class, you'll have a clear understanding of how we all interact with artificial intelligence every day, what that means for your life, and how to harness it to make the world a better place.

Meet Your Teacher

Christian Heilmann

Principal Program Manager at Microsoft

Teacher

Chris Heilmann has dedicated the last 20 years of his life to making the web work and thrive. As a lead developer on some of the largest web products, he learned that knowledge is not enough without teamwork and good handover. He is the author of several JavaScript books and the Developer Advocacy handbook. He is currently a Principal Program Manager at Microsoft and spends a lot of time pondering how machine learning and AI can aid humans and replace jobs we're too important to do.

Related Skills

AI & Innovation AI Fundamentals

Level: Beginner

Hands-on Class Project

Share your favorite Machine Learning tool or resource!

Excellent machine learning is all about the tools you use to harness it. Share a product you've made or a tool you're interested in using to help spread the word about your favorite resource! Ask questions, share your thoughts, and help us create a space where we can all learn more about artificial intelligence.

Resources

Find Christian on his blog, Twitter, and GitHub

Tools from the class:

Transcripts

1. Introduction: Hello, I'm Chris Heilmann, I'm from Berlin, Germany, and I work for Microsoft at the moment. I've been a web developer for 20 years, and right now I'm getting into the whole AI and machine learning space to see how computers can help us even more. Artificial intelligence and machine learning are, to me, the next evolution of computing, as revolutionary as the first factory was to the job market. In this class, you're going to learn about machine learning and AI, not from a data science point of view, but from an interface point of view. I will help you understand where to get information, where to find tools to use, and how to use these tools to build your own interfaces to make them more human. So, you don't need to be a technical person to take this course, you just need to be an interested person and somebody who wants to learn more about artificial intelligence. Machine learning can do amazing things for people, and I think there is a great opportunity to build interfaces that are understandable by humans and that lower the barrier to entry to your systems. Where somebody in the past could only use your websites with a keyboard or a mouse, nowadays people can do it by voice, or by just looking into a camera and logging in that way. That is an exciting, almost science fiction idea that we can use nowadays, but not enough people do yet. I hope that this inspires you to ask questions, ask questions of me, ask questions of other people, and also question the headlines that you will see about machine learning and artificial intelligence and the systems that you use. I want you to use machine learning and artificial intelligence to make human interfaces and make them available to people who are not necessarily too excited about technology. So, when you build something cool with what you learn here, please tell us about it. Well, I'm excited to give this class, so let's get started. 2. 
What is Machine Learning: Machine learning is a way to tell a computer to do things repetitively, over and over again, until it finds differences, until it finds patterns, and until it actually sees what the data is about. Machine learning is not magically learning from something over and over again. You have to ask the computer a very detailed and very precise question to get good answers. We cannot have magical information coming out of computers because computers cannot think. They can just simulate how thinking processes work. There's two schools of thought; there's a few more, but two big ones. One of them is like The Terminator, where people are afraid of artificial intelligence, where people are afraid that machines are going to take our jobs away, that they're spying on us, and that they're actually killing us in the end, which is just movies really, if you think about it. The other side is the Jetsons, Star Trek kind of thing, where we have this ubiquitous computer that we talk to, that is super exciting for us, that is our friend, that's just there when we need it. Her is a great example of a movie for that, where people fall in love with an artificial intelligence because it is catered to them; it's the perfect partner to talk to and it's the perfect machine to do things for you. We're not in either of these spaces. Of course there's terrible people using machine learning for evil things, and there's people who are making it much easier to use computers and to use your mobile phone. Just take the last generation of mobile phones, now doing automated shots for you, making sure your selfie looks great, making sure the background is in sync and in focus without you doing anything. A lot of this is machine learning and deep learning, but we don't talk to people about it anymore, we just use it. Sociologically, and also from a psychology point of view, we're in a very interesting spot there. 
We've got people who are either afraid or people who actually prefer computers to people. I think we need to go back to the middle a bit and understand that all of these things are tools for us, for humans, to be more creative. So if the Terminator school of thought is worried about jobs being taken away by computers, the Star Trek school of thought should be happy that some jobs are being taken away by computers and robots, because they're unhealthy for humans and they don't make any sense for humans to do. There's a great opportunity now, with automation and computers doing things for us, to free ourselves from things we never thought we could. My parents, for example, were working class. They always worked in factories, worked in coal mines. They had to work to live because there was no robot that could take over the unhealthy, repetitive, boring jobs that they had. So with these jobs going away, and they will go away because it will be much more cost effective for robots and machines to do them, we have the freedom as humankind to become more creative and to learn about the next new jobs that we don't even know about yet. Work doesn't have to be something that we just live off and do because we have to get money for it. Work could be something that hasn't even been there yet. We could free ourselves from the shackles of production by understanding that computers can do these things. We just need to understand that we need to distribute wealth, and we need to distribute intelligence and education, better. I hope this is one start where you get excited to learn more about this and you're not feeling afraid of machine learning and artificial intelligence anymore. There's a few ways AI can help humans. The first thing is automation, that's the big one right now. There's error prevention, where a computer can tell us: this is actually wrong, why are you doing this again? 
There's data reduction and muffling the noise, when you have a lot of data and you only want to find that one thing that's different from all the others. Computers are great at finding those differences. There's prediction based on historical data. How many times have you used your computer, and what can it do better for you? A great example of that would be the keyboard that you're using on your smartphone. It realizes what you wrote and gives you automated words after you type the first two letters, because it knows these are the words that you're using all the time. Then there's plowing through massive amounts of data. Information from sensors, image data, an audio recording: each of these is a lot of information, and we as humans don't think of it as that. But finding emotion in an audio recording, for example, is a really hard task. So computers are good at plowing through that amount of data and giving you only the results back, rather than you having to look at all that data yourself. The last bit is, of course, creating more human interfaces: allowing us to talk to a computer like we would to a human and getting information back that is fit for human consumption, and not just a list of text or a list of results. In this first section, I want you to take away that artificial intelligence is not magic. It is not the case that computers can think, that computers can be creative and fill in gaps that they don't know about. You are responsible for what you tell the machine and you are responsible for what you get out of it. Intelligent machines can only be as intelligent as the questions we ask them. You can benefit from a lot of recorded information and patterns being found by other people already, but you've got to keep in mind that a computer's thinking is just an illusion of thinking. It's not another human being and it will not replace human beings. 
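The keyboard example above is prediction based on historical data in its simplest form. The following is only a toy sketch of that idea, assuming the "history" is a plain string of previously typed words; the function names `build_history` and `suggest` are made up for illustration, and real phone keyboards use far more sophisticated language models.

```python
from collections import Counter


def build_history(typed_text: str) -> Counter:
    """Count how often each word appears in what the user typed before."""
    return Counter(typed_text.lower().split())


def suggest(history: Counter, prefix: str, limit: int = 3) -> list[str]:
    """Return the most frequently used words that start with the prefix."""
    prefix = prefix.lower()
    matches = [(word, count) for word, count in history.items()
               if word.startswith(prefix)]
    # Most used words first; alphabetical order breaks ties predictably.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]


# Hypothetical typing history, invented for this example.
history = build_history("hello there hello team the meeting is there hello")
print(suggest(history, "th"))  # ['there', 'the']
```

The suggestions are only as good as the recorded history, which is the point made in the lesson: the machine is not thinking, it is counting.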
We have the chance to be as creative as we ever wanted to be if we just learn to get computers to do the boring, repetitive tasks. 3. How We Teach Machines: Welcome to this section of the course, where we are going to look into the magic that happens, how computers appear to be thinking. Where's the information coming from, what's going on there. I'm not going to talk about all the details, of course, but I'm going to hopefully make you understand that there's something there that is quite fascinating and isn't quite that obvious at first when you think about it. So, when it comes to movies and when it comes to interfaces in movies, there's always this magical moment where information comes out of nothing. The best example is the trope of zoom and enhance in any CSI episode, where there's just this little grainy footage and it's like, enhance this, enhance this, and there's always great information in there, where you zoom in on a screw of a license plate and then you see the reflection of a killer in the background. Sadly enough, the world doesn't work that way. If the information isn't there, and you have data that is corrupted or data that is bad quality, there's not much you can do to actually find that information. However, over the last few years, more and more things cropped up that look like that. There's a great artificial intelligence paper that shows you how you can get a face, for example, from an 8 by 8 pixel matrix, and just keep refining it until you find out which face it probably was. Cameras that recognize people in train stations and things like that have become better and better at changing grainy footage into something else. We're not quite there at the CSI world where we can do this, but what is happening here is that we have so much data that has been recorded and analyzed over the years, with machine learning and deep learning algorithms applied to it, that we now can compare much better than before. 
One of the great examples I want to show you is a thing that Google released a few months ago called AutoDraw, and what it can do, as you can see here, is that you can start drawing something, and if you're artistically challenged like I am, it does magical things for you. So, in this case, I'm trying to paint a pair of glasses. I'm hardly getting a round spec here and hardly closing the lines right. But if you see up there, I can now click on this and I get a perfect pair of glasses, and I can even have different shapes. So, it recognized from the outlines that I drew that I probably want to do some glasses and not a bicycle, where I would have had a handle on top of it, which could kind of look the same, and most of the time when I try to paint glasses it looks like a bicycle, so I'm very happy that this thing exists. The interesting thing is that the information that we have about these things doesn't come magically. Of course, a computer can find the rounding between two lines and make it straighter, like when you use your outlines in Illustrator or whatever you use; it does these things for you. But finding out that I wanted to do a pair of glasses is based on something much more interesting. A few years ago, Google released a game called Quick, Draw!, where it asked people to draw something and say what it is. So, it says, draw a line in under 20 seconds. So, you now draw a line and it says: Oh, I know, it's a line. So, the computer actually says that it's a line. Draw a train in under 20 seconds, and this is where I'm out, because it's not going to happen for me. But millions of people used that game and had fun playing it and competing with their friends, and this is the dataset that was used to start off the AutoDraw tool later on. 
So, every time you upload a photo to Facebook, every time you upload a photo to Twitter, every time you give a comment on something, the machines start recognizing that and start filtering it, and when 10 people say the same thing, then most likely that is looking like a train. We've been uploading information for years and years for free because we wanted to use the systems for free, and in the background, machines have been recording that for a long time. So, the last place where this information comes from, which is a very, very interesting one, is a Google system called reCAPTCHA. Lately, reCAPTCHAs are like: here's five photos or 20 photos of something, tell us where something is. This used to be distorted text, and that was when Google had Google Books and some of the scans didn't work out. So, it used these systems to have humans clean up those data sets for free while giving you more security on your comment forms. Nowadays, you will see much more that it is street names or street signs and cars, which of course points to this data set being used for self-driving cars to learn more about their surroundings. So, as humans, we are being monitored and we are being recorded all the time, but it doesn't necessarily have to be an evil or insidious thing. It's quite interesting when it becomes a game where people draw something and then later on other people like me, who can't draw, can benefit from it, or it can be that you want to make sure that it's not a bot trying to log in to your website but a human, and that human teaches a computer later on to recognize street signs around it, so the car doesn't run into other cars or pedestrians. This is how computers know how to fill in gaps, this is how computers know. It's all a game of data, of massive data, and that's where Cloud computing, that's where machines on demand come in. 
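The "when 10 people say the same thing" idea above is, in its simplest form, a majority vote over human-provided labels. Here is a minimal sketch under that assumption; the function name `consensus_label` and the threshold of 10 votes are illustrative only and do not describe how Quick, Draw! or reCAPTCHA actually aggregate answers.

```python
from collections import Counter
from typing import Optional


def consensus_label(labels: list[str], min_votes: int = 10) -> Optional[str]:
    """Return the most common label once enough people agree on it, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    # Only trust the crowd when the winning label has enough votes behind it.
    return label if votes >= min_votes else None


# Hypothetical answers from people describing the same drawing.
answers = ["train"] * 10 + ["bus"] * 3
print(consensus_label(answers))  # train
```

With fewer agreeing answers the function returns `None`, which mirrors the lesson's point that a label only becomes usable training data once many humans have confirmed it.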
You can do a lot of these things on your own computer, but it's most of the time makes more sense to rent a computer for a few seconds that is much more powerful than yours to do that kind of data mining, and data mining is for everybody out there. It's happening. So, let's make sure that we do it to fill gaps in information. There's a great opportunity to upload a bad image and find 50 ones that are almost the same and do the outlines for you. We are in a world where the zoom and enhance is not far off because we have so much data to compare with. 4. Machine Learning to Help Humans: So now, we're going to take a look at some examples how machine learning helps building very human interfaces. So, what I want you to understand is that machine learning can help humans become much easier or become much better at understanding what the world around them is, by comparing what we have to lots and lots of other information and making it better that way. So, one of the examples that you probably have been seeing for a while is Google Translate. A lot of people used Chrome as their main browser because it was the first browser to automatically translate a website for you when it wasn't in the language that you had. The Google Translate app on your mobile phone goes even further by analyzing images. So, you can go to a street sign in Cyrillic for example, and hold up your phone, and it gives it to you in English, translating it, what the street name is, in case you only have directions that were in English. In the past, translation services just translated from English to German for example, and did it word by word. But the more people use these systems, the more we understood that one word following another makes a much more natural sentence, and that way the translations became better as well. Google analyzed books, Microsoft analyzed books, and analyzed books as well to understand what idioms might be about? What metaphors might be about? How humans talk to each other? 
So, having a translation nowadays from one service to another is almost at the point where you can read it and understand what's going on. So, translation was probably the first thing where machine learning was used on the web. We didn't even realize it, but it was so useful to have that it became a very, very normal thing. Nowadays, people don't even know how much energy and effort went into it so that tweets can be in another language and you still understand what's going on there. Another interesting example I always like is Google Maps. There's very clever things in there. You could, for example, just go to the search in Google Maps and say, "How far am I from the capital of France?" It will analyze this and work out that the capital of France is actually Paris, France, and then it shows me the distance from here in New York, and also shows me that there is an eight hour flight, and it actually offers me flights to book. So, in the past, for something like that, I would have to type in Paris, France. Then I would have to type in New York as well. Then I would have to go to another website and say, "Which flights might be available?" By analyzing the patterns of how we use these systems, every click, every mouse move, every interaction, the machines have become much more intelligent at giving us the things that we actually wanted. I would never have come up with the idea of typing in how far away am I from the capital of France. But a kid learning about geography, for example, would do that. They would not necessarily do it like I did right now in a browser; they would go to their Google Home, or to their Alexa, or whatever other machinery, and ask, "How far away is the capital of France?" and the machine would say, "Paris is the capital of France. It's so and so many miles away. Here's flights in case you want to go." This is where I want computers to go. I want them to give us the next answer as well, rather than only the answer that we came for. 
We're still at a space where people like me, who have been using computers for so long, have been conditioned to think of computers as dumb interfaces that need to have the right question. But this is a perfectly human question to ask the machine, and you get back something like that. If you want to see something pretty amazing, you could spend a bit of time looking at Seeing AI. Seeing AI is an application on iOS from Microsoft that I built with a friend of mine, who is blind. He's a blind user, well, he's a blind human, and he's also a blind programmer, which is fascinating to see. But he wanted to not have to ask people what's on the menu. So, he wanted to have an app where you can take a picture of the menu in a restaurant and say, "Show me the headlines" or "Read me the headlines." Or he wanted to have special glasses with which you take a picture, and it tells him, "You're looking at a dog, you're looking at a cat, you're looking at the Eiffel Tower, you're looking at the top of Tower Bridge." All of these kinds of things that we know because we compare them with millions of photos that people have already taken and tagged as Tower Bridge, or it's the shape of a dog so that's probably a dog. So, this kind of tool allows him to become much more independent and not need anybody else around him. So, you can watch all these videos there and you can download it and play with it yourself. That is based on the APIs that we're going to cover later on as well for you to play with. I hope these examples inspire you to build human interfaces that actually make things easier for humans without them having to do anything extra, without having to understand how they work, but just as a great thing in the background: you have no alternative text on any images? AI can create that for you. Not a problem at all. Machines are there to help us humans when we fail to do things. 
So, these interfaces show you that we can think ahead of what your end users want to do next rather than telling them to do it step-by-step. The easier it is to use an interface, the more people will use it. The more happy users you have, the more income you will have, and the more successful your products will be. We've got all these things here, and they show how it can be done without being in your way, only being there when you need it. This is how I want you to think about artificial intelligence and machine learning. 5. Tools for Machine Learning: Hello. In this part of the video series, I want to introduce you to the players that offer APIs for you to get started with machine learning and artificial intelligence. There's lots of players in that market right now. It's a big thing, a lot of investment is going on, but the ones that have been doing it for years and years are the biggest companies in IT, and all of them have different offerings that you can play with. For example, this one here is the Google Cloud API. You can see here you can try it for free and there's guides and resources, and all of these things are more or less the same. If you sign up for them and try them out, there's lots of good documentation on how to get started, and some of them even have try-before-you-buy interfaces where you can play with the information and see what kind of data you are expected to bring in and what kind of data you can expect to come out. So, this is Google Cloud, which is a very, very big player, available worldwide and in several languages, and one of the big companies that are playing with that. There's going be at Google [inaudible] , there's always lots of talks about Google Cloud and how to use it as well. Amazon is, of course, the next big player in machine learning with AWS. 
AWS is the Cloud platform of Amazon, Amazon Web Services, which allows you to do all kinds of things for machine learning and artificial intelligence as well. A lot of the things in AWS are also connected with the other services in Amazon. So, if, for example, you want to interact with an Alexa and use the benefit of having its natural language processing, you can write a skill for Alexa rather than writing your own service, and use the under the hood services that power Alexa. So, you can use that one as well. IBM Watson is another one of the big players in machine learning, and it's been very good in its marketing. Remember, for example, that it played on Jeopardy and won there years and years ago. IBM Watson, the platform itself, is very much about healthcare and about predicting what kind of diseases people could get, but of course, they have a normal AI set and machine learning set that you can use on their platform as well. It's a B2B offering in most cases, but there is the Bluemix Infrastructure, where you can set up smaller servers, or use it locally as well, and call an API and get the data back. Microsoft's Cognitive Services is what I will talk about in the next few videos, mostly because I know about it and I work for them, so that's the benefit there. I've been using the other ones as well; I'm not saying you need to use one or the other. Make sure that you read through the documentation, make sure that you look through the demos, and see which one makes most sense for you. For example, if you want to have a server farm in Germany, then probably the Microsoft offer is a better one than having a server farm only in California or New York. So, think about where you can spend your money best, and also don't spend more money than you need to, because it can get very expensive very quickly if you have a lot of datasets and they need very complex computation. 
So, make sure that you have enough money on the side, and yet it would still be much cheaper than doing everything on your own machine or your own computer, because that means you have to change that thing all the time, and every half year you really have to upgrade it for the new computational needs that we have. Just make sure when you sign up for one of them that you will also be able to afford it after a while, and also that they will still offer it in the future. So, playing with the big players might be a safer bet than playing with a cool startup that offers everything for free now but will be gone in a few months' time, and your data is gone with them. So, Google Cloud machine learning, machine learning on AWS from Amazon, IBM Watson with the Bluemix Infrastructure, and Cognitive Services from Microsoft are the things that I've been using, and I will be talking about Cognitive Services in detail to show you what you can do with them and how they would be beneficial for your interfaces in the next videos. 6. Visual Uses: Welcome back. In this video, I'm going to show you how we make computers see things, or how we seemingly make computers see things, because all we do is compare visual information and see what computers can find in it. Visuals have become a bigger thing. People don't write anymore. It's not fun to write on your phone. Voice recognition works as well, but you cannot just go around in public and talk to your phone because it just feels weird. So, a lot of people just communicate with images only. We take selfies, we take pictures of things, we send emoji to each other. A lot of times, we forget that not everybody can see them, that somebody might be visually impaired, or they might just be on a terrible connection. Right now, I'm here with my UK SIM card and everything is on an EDGE connection. So, when people send me only pictures in a chat, I don't know what's going on. 
So, I want the computer to tell me at least what is in that image before I give it my sweet, sweet data and pay a lot of money for downloading an image that I may not actually want. So, over the years, we've been collecting images on the internet from wherever. It's trillions of images in Bing, and in Google, and in other search engines. Everything has been indexed. Everything has been categorized. Everything has been compared with others. That way, we can actually give a good assumption of what a certain image is going to be. I showed you earlier things like the vision APIs and the demos that allow blind people to see what's going on around them. Now, we're going into details about these APIs, what they do, and what you can do with them, how you can empower your end users to do something useful with the information that they have. I'm going to be covering mostly the Cognitive Services from Microsoft because this is something I can answer your questions about later on as well, and I've got lots of colleagues working in those departments, even locally here, in case you don't want to wait a long time for your answers because I'm in different time zones all the time. What we have here is the Cognitive Services APIs from Microsoft. This is an API offering that allows you to send data to an endpoint and get information back. So, in order to use these things, you can use the demos here on the website, just to try them out. But when you want to try them with your own systems later on, you can have a developer write you an automated script to put, for example, images into a folder and get the information back, or you can send a request to a URL, an endpoint, much like you would go to google.com or microsoft.com. You just say to the API endpoint, here is my image, and then you get a data set back with the information that you wanted to have. 
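"Send an image to an endpoint and get a data set back" can be sketched with nothing but the Python standard library. This is a rough illustration, not the actual Cognitive Services client: the endpoint URL and the key are placeholders, and the `Ocp-Apim-Subscription-Key` header name is an assumption here, so check your provider's documentation for the real URL, authentication header, and response format.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the endpoint and key from your provider.
API_ENDPOINT = "https://example.invalid/vision/analyze"
API_KEY = "your-subscription-key"


def build_image_request(image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw image bytes."""
    return urllib.request.Request(
        API_ENDPOINT,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            # Assumed header name; providers differ in how they accept API keys.
            "Ocp-Apim-Subscription-Key": API_KEY,
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON data set that comes back."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; only send() talks to the server.
request = build_image_request(b"\x89PNG placeholder bytes")
print(request.get_method(), request.full_url)
```

Splitting the building and the sending of the request is a design choice for this sketch: it lets you inspect exactly what would go over the wire before spending any of your API quota.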
When we started this, we tried to make it a bit more viral, make it interesting for people to see what these things can do. So, one of the big demos was the How-Old demo that quite became a viral sensation and annoyed a lot of people as well, including myself. Because I learned that as soon as you have a beard, it recognizes you to be a bit older, and I am old but not that old in most of the cases. But, you can go to how-old.net, for example, and click on this photo here, and say use it or upload your own photo. It recognizes the gender and the age, or perceived age, or perceived gender of the person in that image. Once again, if you didn't like it, you can complain about it with this link and see what's going on. We also wrote a long blog post explaining how how-old.net works, what APIs it uses and the code is available on GitHub to try it out yourself as well. So, if you click on to view source here, you can actually get the information and you can get the code to play with it yourself. So, using the APIs and services, I'm going to talk to you about, you can build an interface like that pretty simply if you know how to build a web interface and you know how to send data to an endpoint and get data back using whatever you want to use, react, angular, all the systems out there. You can build something like that for your end users in a nice way. The real important thing is, when you think about it, is facial recognition because that is where the most the future of a lot of things are happening. Logging into your website by looking into a camera would be a nice thing to have, and weirdly enough, it's not that hard to do. Recognizing that one person is in one photo and also in another one, is another interesting thing to offer for your end users. So, these are APIs that we consider, to a degree, dangerous because you want to make sure that you're doing everything right. But, when they work, then they're actually beautiful. 
Because I love, for example, going to Facebook and finding out when people took pictures of me at conferences that I don't know about. I found some nice interesting pictures that way. So, you want to make sure when you use these APIs that your end users are aware that that's what's going to happen, and you also want to make sure that you explain to people that some of these things are guesswork. So, for example, the How-Old demo guessed people to be older or younger, and they were happy or unhappy about it, but you just want to present it as a guess, and say this is what it is. This is what machine learning boils down to a lot of times. Machine learning gives you educated results that are guesswork. They're not 100 percent there. Computers make mistakes. Or rather, they don't make mistakes, but we make mistakes by asking the wrong questions or giving them the wrong data. So, if it becomes personal, and as personal as facial recognition, you want to have an interface that makes people feel welcome and not scared. That said, if you want to use the Face API, there are several things that it can do. It can do face verification. So, it finds a person in one photo, and then it finds the same person in the other photo, and it tells you that the two faces belong to the same person with a confidence level of 0.73, in this case. And if you take two different people, it will tell you that there are two different people in these two images and it's not the same person. So, that could be a first step toward a login system, a first step of making sure that the person is the right one. Of course, you don't necessarily want what Facebook was doing in the beginning, and had to undo as well, which is to automatically tag everybody, because people might not want to be recognized in some photo depending on where they were. 
Imagine Clark Kent being in this train station and people saying it's Superman. That's not something you want to have happen automatically. He should be allowed to say he's not. Once you have face detection, there's an amazing amount of information that we put in this API, partly also because of the cool demos that we built in the past. So, every recognized picture, like the lady here, has a face ID. It has the rectangle, which tells you where the face is in this image and what is the other part of the image. It has attributes, like for the hair, whether it's a bald person, whether the hair is invisible because somebody is wearing a hat, for example, and the hair color with different confidence levels. So, in this case, it's brown with 1, blonde with 0.69, and so on. I'm not going to read this out right now because there's lots of information, and the API will get more information over time. But, you can already see there's lots of information in there, and there's lots of cool things you can do with it. So, I want you to think about what you could do if an uploaded picture has all these things. The Emotion API is a very interesting one as well. It recognizes the faces of people and their emotions. So, the emotions are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. That way, you can actually find out when something went wrong with your image or you can automatically categorize images into different databases. You can also, when you do it on a live version, see, for example, in your user testing when users say something and mean something different. There might be a discrepancy between the two. So, this is a great way to do automated user testing and get an extra data point on whether people are really excited about what you show them on the website, or if they just told you so because they actually want to appease the interviewer, so to say. So, this is facial recognition and emotion recognition. 
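As a rough illustration of what you can do once a detected face comes back as data, here is a small Python sketch. The dictionary below only mimics the kind of result described above (a face ID, a rectangle, hair attributes with confidence levels, and scores for the eight emotions). The exact field names are assumptions for this example, not a guaranteed copy of the real API response.

```python
from typing import Optional

# Made-up detection result in an assumed shape, for illustration only.
sample_face = {
    "faceId": "hypothetical-id-123",
    "faceRectangle": {"top": 128, "left": 459, "width": 224, "height": 224},
    "faceAttributes": {
        "hair": {
            "bald": 0.1,
            "invisible": False,
            "hairColor": [
                {"color": "brown", "confidence": 1.0},
                {"color": "blond", "confidence": 0.69},
            ],
        },
        "emotion": {
            "anger": 0.0, "contempt": 0.0, "disgust": 0.0, "fear": 0.0,
            "happiness": 0.92, "neutral": 0.07, "sadness": 0.0, "surprise": 0.01,
        },
    },
}


def top_hair_color(face: dict) -> Optional[str]:
    """Return the hair color with the highest confidence, or None if unknown."""
    hair = face.get("faceAttributes", {}).get("hair", {})
    if hair.get("invisible"):  # e.g. the person is wearing a hat
        return None
    colors = hair.get("hairColor", [])
    if not colors:
        return None
    return max(colors, key=lambda entry: entry["confidence"])["color"]


def dominant_emotion(face: dict) -> Optional[str]:
    """Return the emotion with the highest score, or None if there are no scores."""
    scores = face.get("faceAttributes", {}).get("emotion", {})
    if not scores:
        return None
    return max(scores, key=scores.get)


print(top_hair_color(sample_face), dominant_emotion(sample_face))
```

Both helpers return a best guess rather than a fact, which is why they fall back to `None` instead of inventing an answer when the data is missing.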
So, these are two things that are very, very human in your interface. So, use them sparingly, but you can use them for quite amazing things if you want to play with them. 7. Speaking Human: In this part of the video series, we're going to talk about language. I'm not going to go into too much detail. I'm going to show you just some APIs to use in this case because it is a very, very deep topic and a very, very old topic. There's lots of intelligent people out there doing it, and I don't want to insult them with some half knowledge. I just know what I want, and I know what you can use in this. So, I want to make sure that I show you some of the opportunities that we have in this case. When we started using machines that actually take audio input, when people started talking to their computers, or people started typing whole sentences, we had to make machines more intelligent. We had to dive into real human language, into linguistics, into phonetics, into metaphors, and we became much more human again, and needed to become much more human again, than before. So, that way, a lot of people started working in IT that aren't supposed to be in IT according to the almost religious view that only programmers are allowed to be in IT. Google, for example, hires poets, singers, translators, and linguists to make the computers understand the intricacies of language, because language is one of the most complex things we have, and computers have nothing that they can actually do with it. With an image, at least you can analyze the pixels, you can find shapes, you can find the outlines. With text, you have to kind of guess, and computers are bad at guessing, but humans are good at analyzing things. That's why we have hundreds of years of knowledge in linguistics that we're now trying to help APIs understand, to help you with the kind of things that you want to do. 
What you want to build with these APIs are interfaces that allow people to make mistakes, because people make mistakes. We're just sloppy. When we type things in on a mobile phone, or we say something into a microphone and it's not as understandable as we want it to be, mistakes happen. So, the language analysis should be getting much better that way. Bing translation and Google translation, a lot of translation services have become much better since this linguistic approach came to it. We used to translate word by word, maybe sentence by sentence, but it still didn't make much sense. The results were understandable, but they weren't quite giving the meaning that we wanted them to give. So, now we actually compare by paragraph as well, and by the last heading that comes before it, and so on and so forth. So, that way we get more context out of it and automated translation becomes much better as well. So, if you want to play with some of these APIs, here are some of the things that are available to you. So, the first thing is the Text Analytics API. What you do with that is extract information from your text. So, you copy a text in, you send it to the API, and it gives you back an analyzed text as a JSON object with the data itself. For example, here I have "I had a wonderful experience, the rooms were wonderful, and the staff was helpful", and it found out that the language is English with a confidence level of 100 percent. The key phrases are "wonderful experience", "staff", and "rooms", and this is important information if you think about it for a comment. You don't want to read through thousands of comments. You only want to know which comment talked about the rooms, which comment talked about the people, which comment talked about the food, and this is what this API is about. If you take the negative example here, "I had a terrible time at the hotel, the staff was rude and the food was awful." 
It finds the key phrases food, terrible time, hotel, and staff, and the sentiment is awful, so it finds that the food and the staff were awful. So, these are the two things that you want your interface to care about later on. All this information is again available as a JSON object. This is where the data comes from, so you won't get the information as it's displayed in this demo here. It also handles Spanish, including a Spanish negative example. So, in essence, I think it's 12 different languages. Google also has quite a few languages, so there are many, many offers in different APIs. The next thing to talk about is language understanding, and this is where it gets really, really tricky and really interesting. That would be a topic for a whole video series of its own, and there are a few out there to look at. There's a service called Language Understanding, the LUIS service, and that one lets you teach it what you want to learn from a text. So, instead of having a text analysis that automatically tells you that this is about the rooms and this is about the staff and about the food, this is about building systems that take in language commands, and then find out what the commands are. So, in this case, the demo that you would see on this page is a remote light control, where you can type in things and it will understand them. So, if I now say "turn on the lights" and submit it, it would turn on the lights in the demo, and it would give you the query results for that. "Switch all lights to green", "turn the table light off", and so on and so forth. So, this is an example of how you could do something like an Alexa control or Google Home control in your own interfaces by using this kind of text input. So, for example, for a bot in a chat client or also for a search box. So, this one allows you to train your own model and make sure that you are expecting certain commands rather than just randomly analyzing text. 
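To show the difference between "analyzing any text" and "expecting certain commands", here is a toy Python sketch of the light control idea. It is emphatically not how LUIS works: a trained language understanding model generalizes far beyond keywords, while this stand-in only matches a few hard-coded words. The intent names (`TurnOn`, `TurnOff`, `SetColor`), the color list, and the result shape are all invented for illustration.

```python
import re
from typing import Optional

# Colors this toy example knows about (an arbitrary, made-up list).
COLORS = {"green", "red", "blue", "white"}


def parse_light_command(text: str) -> Optional[dict]:
    """Map a typed sentence to a light-control intent, or None if it is not a light command."""
    words = re.findall(r"[a-z]+", text.lower())
    if "light" not in words and "lights" not in words:
        return None  # not about lights at all, so no command
    target = "table" if "table" in words else "all"
    color = next((word for word in words if word in COLORS), None)
    if color:
        return {"intent": "SetColor", "target": target, "color": color}
    if "off" in words:
        return {"intent": "TurnOff", "target": target}
    if "on" in words:
        return {"intent": "TurnOn", "target": target}
    return None


for sentence in ["turn on the lights", "Switch all lights to green", "turn the table light off"]:
    print(sentence, "->", parse_light_command(sentence))
```

The useful part to copy into a real interface is the shape of the result: a small, structured command your code can act on, plus a clear "I did not understand that" case for everything else.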
So, this is a very interesting interface, and to me, it's actually the future of interaction. Think about it: voice recognition or text recognition that allows you to control things, rather than having to find the right button or click the right link, might be much easier for end users. So, this is something to deep dive into if you're interested in that, and it's a great opportunity for a non-technical person and a technical person to work together to build a cool interface for your controls or for the need that you have in your company. Now, you've learned how your services can understand the meaning in text and how you can define controls for people to say. Control sentences, turn on the lights, turn off the lights, and so on and so forth. Of course, as text, this is kind of interesting, but where you really want to have it is in speech recognition, and these are the APIs that we're going to cover next, turning text into speech and speech into text. 8. Audio & Video: In this video of the series, we're going to talk about probably the coolest sci-fi thing you can do with AI and machine learning, and that is speech recognition. In every science fiction movie, sooner or later we had something where somebody says like, "Okay, computer, allow me to do this thing, tell me those things." The interesting bit here is that we're getting very close to the problem where something becomes either too human and therefore creepy, or not human enough and therefore annoying. So, building an interface that recognizes speech and gives speech back that doesn't sound weird is quite a hard task right now. All companies that have their sort of personal assistants are doing great research right now into what would be the right language and what would be the right voice to use for these systems. 
So, there's a lot of things you can do wrong, but there's a lot of things that you can do right, because if voice recognition works the right way, it's a wonderful thing, and a lot of times people don't understand that the more you use these systems, the better they get. A speech recognition API or a speech recognition interface is a wonderful thing to allow people hands-free communication, so you can do it in the car or you can do it at home. But it's kind of limited to something that you can do in short steps. You want to make sure that people don't have to tell whole stories to your interface, and you want to make sure the recognition happens as early as possible. So, when it comes to the APIs to play with right now, I'm going to show you a few, and hopefully, the demos are going to work out so I can show you how that's done. What you also need to understand is that, to make them perfect in the future, it may be the right thing to offer your end users a way to train them for their voice as well. That can be done with a few sentences by now. It's not like reading pages and pages to Dragon NaturallySpeaking like it used to be in the past, so now it actually makes more sense. If you think about it, we're going back. In the 50s, people dictated letters to their secretaries, who wrote them down in shorthand and then typed them up. Now, the computer is basically our secretary that can do all these things, but we have to talk to it a few times to make sure that it understands our accent, and you can clean up accents quite nicely with some of these customized APIs that we're offering to you. So, the first thing I'm going to talk to you about is the Bing Speech API, which is speech recognition. So, we can try that out here right now. So, I hit the start recording button and it says, "English U.S." I could also do it with German. So, I allowed the access to my microphone and you can play with that yourself in the future as well. 
It says, "south the future as well", and "South the future as well", and so on and so forth. You can see the words appearing on the screen while I'm talking to it, and it's doing quite a good job of it. This is untrained. This is just what it would do out of the box. So, if you want even better recognition, you have to start using it. So, I stop recording now, I switch to German, and let's see if it handles that as well. I allow, [inaudible] As you can see, it recognized that in German. You probably can't read it, but I could use the translation API right now to turn it into English if I wanted to, and then, the other way around, use the text to speech API to speak it to somebody else. There are open APIs and open data sets that you can use to do this kind of recognition as well. By using a system like Microsoft's or Google's or Amazon's, you get one that has been trained in different languages already. I told you earlier that this is now a text in German that you wouldn't understand, and there's also an interesting API for that, the Translator Speech API. So, in this case you speak into it in one language and it automatically converts it to another language and then generates a synthesized voice that reads it out in the other language. We've used this in Sweden with the police to allow refugees from Syria to talk to police officers, quite successfully, and it's also part of Skype now. So, those are the barriers that I love to see recognition with machines remove when we use it the right way. Sometimes recognizing who's speaking is much more important than what is actually being said. You may want, for example, a login system that uses voice recognition as a second factor in two-factor authentication together with a token or with a password. This idea has been around for quite a long time. A lot of Hollywood movies had speech recognition in the 40s, and James Bond movies in the 60s, but now we can do this right inside the browser. 
I cannot do this right here right now because my microphone has a problem, but you can train it yourself by going through these different sentences. It asks you three times to say the same sentence and then it recognizes the differences in your accent, the problems of your pronunciation, how you breathe, where you make breaks, and these are all little indicators to recognize which speaker is speaking when. Once it's trained and you have pre-trained models, you can recognize different speakers in audio data that you have as well. So in the demo down here, we have different American presidents, and you can click the audio, and then it starts playing it and it recognizes that it was Barack Obama speaking in this case. That's a pretty important or interesting thing to do with speech recognition. So, these are all the APIs that we have to play with right now, and I want you to consider more of what you can do with these systems, and whether it's absolutely necessary to build your own or whether it makes more sense to tap into a third party service. Alexa, Cortana, and Siri are all available as APIs as well, so instead of training your own, you can just use these systems and benefit from all the training and all the planning that these companies have been doing for you. Think about voice recognition as the next interface that people will want to use and have to use. Think of it as something that is very personal, though, and something that doesn't scale, because if you have an office with 300 people speaking at the same time, that's not going to be a good interface either, no matter how science fiction it feels. So these are some of the APIs to play with when it comes to voice recognition, but the problem is that your voice is very unique and sometimes the systems that recognize very Californian, very trained voices are not the right thing for you. 
So when it comes to APIs, AI APIs and machine learning APIs, you want to get into personalization sooner or later, and that's what we're going to talk about next. 9. Personalizing Your Machine Learning: In this video of the series, we're going to talk about customization, and this means the things that you would expect from customization. Machine Learning and Artificial Intelligence seem pretty magical when they work. They don't do any good if they don't work, because that's actually frustrating. There are a lot of jokes about people with Scottish accents not being recognized by voice recognition and these kinds of things. We want to make sure that this does not happen for yourself or for your end users. So, I like, for example, to sometimes just dictate to my computer, and I taught my computers to recognize my voice, so I don't have to edit a lot of the text later on. You should do these kinds of things with all your services as well because that way you make them unique to yourself and to a degree also more secure, because other people will not be able to use the systems in the same manner you do. Customization is a very important part of making the solution useful for your end users on a very personal level. Much like you started talking to your Siri and it became better after a while, or you started typing on your Android keyboard and after a few months it recognized the words that you keep using and gave you autocomplete for those, your end users deserve to have that kind of quality as well. So, the more data you can get in and the more customized the results that come back, the higher the quality of the interfaces you will build. You also have to make sure that it's fun to do those things and it doesn't feel like a chore. So, if three sentences are enough to get a first 60 percent quality of recognition, make it three sentences. 
Don't let people say like, "Okay, war and peace, and please read it in before you can use our system." Customization is for the end user and not against them. The thing to do here is to have the systems that are in place and see when you find that there is a mistake in them. Say for example, you have an image dataset that has images of bees and the recognition services of Microsoft, of Amazon, of Google, of IBM are okay with recognizing bees, but you're a bee expert. You really know how all the bees, what all different bees are. You want to teach to computer the same thing. You can do that yourself by writing an own neural network or your own deep-learning network and spend a few months learning that in university before you actually do it. Or you can use some of the APIs that allow you to customize these things. Most services allow for those, for extra payment and have customized systems. Some of them also allow you to let your data only be yours and only hosted on your machine so they don't go back into the main dataset. But if you allow them to go back into dataset, of course, this is much cheaper and much freer because the companies get to make a better model from your data than from everybody else as all of a sudden they can offer to other beekeepers, as well to know how to recognize different bees. When it comes to language understanding, context is incredibly important. You can have a normal speech to text API that just gives you a transcript of what you said but sometimes you want to make sure that different parts are recognized, like control words, or you want to make sure that the text is understood in a certain context. For this, you can use the language understanding, LUIS API that has been around for quite some time and has been successfully used by people for all kind of context. For example, we had as a demo we interviewed children about their favorite books and the content made no sense whatsoever. 
But once we told the system that these are the children books that the children have been talking about or that the context was children's books, all of a sudden the recognition went up from 40 percent to 80 percent as well. Other things that people have been doing with is recognize, for example, background noise. So, we had an airport in the Netherlands I think where people had a voice recognition that didn't work at all. So, what we did is we recorded about 16 hours of background noise off that airport that happened during a normal day, taught the system that this is also part of the audio and that way the recognition went up again a few percentages to make people aware that this is working. So, all the things that the computer did need to know, we had to train the computer first and that's the same with accent, that's the same with things that we're listening for, and control algorithms or control sentences that you want to have. There's a custom speech service as well that also allows you to give different vocabulary and background noise. This is another one we used with that airport for example or it actually understands different things, different special words that you've been using. So, the custom speech service allows you to train a system with a certain vocabulary and a certain background noise format and a certain language accent problem that you become better at recognizing the text this way. So, this is the custom speech service. When it comes to customizing these services, it's very important to understand that they are actually expensive because the computation power is happening in a new fashion, whereas the pre-trained models of recognizing celebrities or recognizing Eiffel Tower and these kinds of things and images that has already been done for you. 
But it's a very good way to get your perfect results in a certain subject matter and expert field into a much better quality than by just using the normal connectivity or the normal connection systems that you have in AI and ML systems out of the box. 10. Ethics of Machine Learning: In this video of the series, I want to talk about the power and the responsibilities that we have when it comes to Machine Learning and Artificial Intelligence. A lot of what we do here is a very personal thing and we're recording people, we're analyzing what they're doing, and we're making sure that what comes back is to their benefit to some degree. So, the ethics of AI is a big problem, I'm not going to solve it in the video and I'm not going to tell you what to do, because that's not how ethics work, and every large corporation that's working in Machine Learning and Artificial Intelligence has AI for good departments and very very intelligent people in psychological and ethical environments, talking what we can do with this and how we can do it wrong. So, a lot of it, what we want to think about here or what you as creative people want to think about, is how you can phrase these things. How can we build interfaces that allow people to reap the rewards of Machine Learning, but give their data voluntarily and in a fashion that doesn't feel like that they're being watched or they're being recorded without knowing it. Is a very problematic thing and to me the next step in user interaction, how do we make sure that people know that they're giving away their data for a service, but they know also where the data goes, and they're actually feeling confident that you are the right person to get that information? We're in the middle of a massive media fight about this with companies record your things, what they do with it, so you don't want to be the next company to get into that fight and to have that problem. 
Machine Learning and Deep Learning are there to find information by answering the right questions from users. If your questions are already biased, or the data is biased, that will actually exacerbate the problem. Your system will be biased as well. So, you've got to make sure when you build something that the team building it and the data that's going in are as diverse as possible. That's a general way to make any product better. Your end users are not you. They're not the people in the office who have that fast connection, use only that one brand of computer, and actually know what the system is about. If a system is supposed to be intelligent, then the system also needs to know about outliers. It needs not only to have the happy path of where you want the information to go, but you've also got to teach it the error handling and the error cases that it should be aware of. This is how you avoid things like facial recognition not working on people of color. This is how you avoid problems like Asian people being seen as somebody who had their eyes closed. These are things that happened to large corporations. They were very, very obvious and very, very dangerous for them as well, and a PR nightmare. You stay out of that space by making sure that your systems don't make assumptions. So, by not assuming that your end users are like you, you actually build systems that will allow non-biased Artificial Intelligence. It will never be 100 percent because humans are biased and we are there, but hopefully a deep analysis of our data will show us our biases as well, and be a kind of error handling for things that we shouldn't do anymore. One of the main things to understand about Machine Learning and Artificial Intelligence is that the results are only as good as the questions you collate and the datasets you put in. So, the questions that you train your models on should be concise and simple. 
Don't expect the computer to be creative, don't expect the computer to be able to understand metaphors and make jumps and thinking like humans do. Computers are not good at that. So, by keeping your questions as simple as possible, you have to make sure that also your data sets coming back, will be inclusive for other users as well. So, it's nice to have a speech recognition for example, but somebody with a stutter, or somebody who cannot speak at all will not be able to use it. So, think about when using these cool systems and getting on into sci-fi mode and being excited about it, that humans have different needs and humans have different abilities as well. So, something as amazing as a voice recognition for a blind person is impossible for a deaf-mute person, and the other way around. So, we can use this as enhancements, but not the only way to access this kind of information. So, when it comes to getting consent from your end users, you want to make sure that you actually are on a legal path and you actually want to be on an ethical path as well. So, asking your users or telling your users upfront that you can get a better experience if you allow us to record this data, is one way of doing that. Yes, it's a different step in the interface, it's an extra button to press, but it makes sense for end users and to me, as somebody who cares much about privacy and security, it would make me trust you more if I get the right to say no, or if what I want to do with machine learning is an enhancement, and it always is an enhancement because Machine Learning is always a guess work. Machines don't do things right. They just guess that this is what the humans would want to do. In the end, there's always a human that should be able to say something, that it's wrong, or say something that it's right. 
So, you train it, you test it with real humans, and you always have a way for your end users to say no, or to say that is wrong, or to report it to somebody, and you should be very, very adamant about answering these reports really, really quickly, because these might be things you don't want to show in your interfaces, and when people report something, then there was probably a real problem with it. So, make sure that if you build human interfaces, you put a lot of human thought into them as well. 11. Machine Learning & Creativity: Whenever something is automated, people get worried about it. They wonder if their art, or their craft, or what they do will be obsolete soon. Yes, Machine Learning and Artificial Intelligence will make a lot of jobs obsolete. It will make a lot of things obsolete that we take for granted right now as a normal income for a human being. Self-driving cars are one of those. Self-driving trucks too, all the things that basically are dangerous for humans to do, because we get tired, we get bored when we do the same things over and over again, when it's not taxing us mentally. Then the question is, what will happen to these people? I have a very positive spin on that. I think that with automating things, it should be possible for those people to find a new creativity that they didn't have before. What we need to make sure, though, is that those people can afford being creative and not just be unemployed and unhappy about this. This is where the revolution of Artificial Intelligence will actually have to be part of politics, and will have to be part of the social culture that we have as well. We just cannot have it as a thing that only rich people with the newest smartphones can use. It is part of everybody's life already, so we have to democratize it to a degree as well, along with all the systems that we have out there. 
Now, when it comes to creativity, this is where people against Artificial Intelligence always say, "That's where the limits of computers are." They're totally right, and I'm totally okay with that. I'm totally okay with a computer not being creative because I don't want them to be creative. Creativity can be dangerous, creativity can be beautiful, but it can also be hard. So, I don't want computers to be creative about the things they control. For example, how much tax I have to pay, or whether the electricity in my house should be this high, or what the oxygen level in my house should be in the future when we're going to be living in space. So, whenever people want to show off their creative systems, or how strong their AI systems are, they show how far computers have come to being close to humans. That was the case when Deep Blue beat Kasparov at chess, or whichever chess master it was. When Google's computing systems played Go better than a human. When we started having computer-generated music made by analyzing all the music of the Beatles and finding out which are the things that people liked the most. All of a sudden, we showed these things that always look a bit creative, but are somewhat weird as well. Of course, there's going to be creativity there, but I think AI and ML are there to automate things. So, the things that we are bored with, that we don't want to do, should be done by those. There is no creativity in computers yet. We don't have any machine that thinks. And once we have a machine that thinks, that might be very dangerous for humans, because computers that are trying to protect us, with their main job being to protect us, will probably be very annoyed with us because we're doing stupid things all the time to endanger ourselves. So, the creativity that you have right now, or that you feel, is your best weapon against being automated and against having Machine Learning and Artificial Intelligence as your enemy. 
I look forward to my job not being necessary. I look forward to most of my coding being optimized by a machine, and to an algorithm picking the algorithms. That's totally fine too. I was really excited, when computing started, that I had to talk to a computer in a certain fashion. But I'm just as excited nowadays that I can talk to my computer, that I can look into a camera to unlock my computer, that I can be a human and do human things, and that I have time to do human things because the computers are intelligent enough. So, when it comes to the creativity of computers, a lot of it is good showcases for the power of AI systems. It's not necessarily real creativity. What we should be more worried about is actually the end users, the consumers of our creative output, demanding real creative things. If you consider pop music over the last few years, or even the whole boy band era of the 90s, this was algorithmic music. It was predictable and defined for a certain time. I saw contracts of boy bands stating that they had to disband after a few years because the marketing model around that band was done. This is something that we've been doing to creativity and to media for the last few years. So, now it's time for us creative people, or you creative people, to fight back and make sure that you can't be automated, by doing something so creative that a computer is just confused by it.
12. Final Thoughts: I thank you very much for following this course, and I hope I inspired you to play with a few things that beforehand you thought were beyond your grasp or reach. It was definitely beyond my grasp and reach, and still is, and I've got a lot of friends who are much better at it and who are happy to answer any questions you might have as well. I'm really looking forward to seeing what you can do with the inspiration that you got here. If you weren't inspired, please tell me what I can do better next time as well.
So, I thank you very much and make sure that your computers will work for you and not you for your computers.
