Artificial Intelligence for Beginners: How ChatGPT Works | Alvin Wan | Skillshare

Artificial Intelligence for Beginners: How ChatGPT Works

Alvin Wan, Research Scientist

Lessons in This Class

    • 1. Introduction
    • 2. Why understand ChatGPT
    • 3. How to “compute” words?
    • 4. What is a "transformer"?
    • 5. Conclusion

About This Class

What is "ChatGPT"? How does it work? How is this related to all the other buzzwords? "transformers", "large language models", "autoregressive decoding"... Let's break all this down.

This class is a "how it works" course that shows you how ChatGPT works from the inside out. In particular, we cover the general technology -- more broadly called "Large Language Models". We’ll cover many topics and takeaways:

  • What Large Language Models are, and how they relate to ChatGPT
  • How neural networks process and generate text
  • Concepts for processing language like word2vec
  • How a transformer processes and generates text of any length
  • Critical concepts in transformers, such as autoregressive decoding

This class does not require any prior knowledge but does assume you've taken my Artificial Intelligence for Beginners course. Regardless of your background, you’ll walk away with the fundamentals for discussing and learning more about Large Language Models.

Interested in more machine learning? Try my Computer Vision 101 (Applied ML) classes.

Interested in learning how to code? Check out my Coding 101 (Python), OOP 101 (Python), or VR101 (HTML) class.

Interested in data science? Check out my SQL 101 (Database Design) or Data 101 (Analytics) class.

Meet Your Teacher

Alvin Wan

Research Scientist

Top Teacher

Hi, I'm Alvin. I was formerly a computer science lecturer at UC Berkeley, where I served on various course staffs for 5 years. I'm now a research scientist at a large tech company, working on cutting edge AI. I've got courses to get you started -- not just to teach the basics, but also to get you excited to learn more. For more, see my Guide to Coding or YouTube.

Welcoming Guest Teacher Derek! I was formerly an instructor for the largest computer science course at UC Berkeley, where I taught for several years and won the Distinguished GSI (graduate student instructor) award. I am now a software engineer working on experimentation platforms at a large tech company. 4.45 / 5.00 average rating (943 reviews) at UC Berkeley. For more, see my Skillshare or Webs... See full profile

Level: Intermediate

1. Introduction: At some point you may have heard about GPT, maybe ChatGPT or GPT-4. You may have heard Microsoft call it a spark of artificial general intelligence. There's a lot to digest. Let me help you do just that.

Hi, I'm Alvin, a research scientist at a large company. I've been conducting research in AI for six years, previously at Meta Reality Labs and Tesla Autopilot, receiving my PhD in AI at UC Berkeley. In particular, I studied how to make neural networks run really fast. In this course, I hope to use my background to break down cutting-edge technology into digestible, intuitive concepts. Throughout, I'll use an illustration-first approach to conveying intuition. No walls of text, super complex diagrams, or even a smidgen of math. My goal is to help you build the foundations for understanding ChatGPT and its related technologies.

The material in this course assumes you've taken my AI for Beginners course, linked in the description. Beyond that, no technical background is required. Whether you're an engineer, a designer, or anyone else curious to learn, this course is made for you. Let's get started.

2. Why understand ChatGPT: Here's why you should understand ChatGPT. We'll start with the benefits. The biggest benefit to taking this course is being able to understand discussions around the topic. There are tons of random related terms all across the web: transformers, large language models, ChatGPT, GPT this, GPT that. Our goal is to know how these different terms are all related. We'll cut through the marketing clutter and jump straight to technical understanding; the jargon doesn't need to be intimidating. With this knowledge, you can then understand the latest innovations in the area. What does it mean to make a transformer faster? What is a transformer, even? To summarize, here are two benefits to understanding how ChatGPT works.
Knowing the terminology to hold discussions on the topic, and knowing how to read and understand the news as you learn more about the topic. To be clear, we won't exhaustively cover all terms or all topics. However, this course gives you a foundation for learning more. Basically, we'll cover the intuition and the big ideas behind the technology. A big part of this is knowing the terminology, so that will be our focus.

Let's now jump straight into the content. Here's an overview of ChatGPT and related concepts. First, ChatGPT is a product. It's specifically the name of OpenAI's product. Second, large language models are the technology more broadly: a technology that can take in text input and generate high-quality, natural-sounding text as output. This is just like Kleenex and tissues. Kleenex is a specific brand; tissues are the generic product name. In this case, ChatGPT is a specific trademarked brand. Large language models are the general technology. As a result, these large language models, or LLMs for short, are the focus of our course. Finally, transformers are the building blocks of large language models. We'll be focusing on these building blocks moving forward.

Said broadly, our goal is to dissect the intuition behind large language models. I'm going to present a simplified version of these models that conveys the key ideas for why they work. No need for big complicated diagrams or unnecessary math equations. We'll have fairly straightforward diagrams that stick to the main points.

Here's a brief introduction to large language models. In our AI masterclass, we discussed compartmentalizing our ML knowledge into four categories: data, model, objective, and algorithm. Data describes the inputs and outputs: what we learn from and what we predict. Model describes how to make predictions. Objective describes the goal: what the model is optimizing for. Finally, algorithm describes how the model learns.
We haven't discussed algorithm much, and we will again skip it this time around. For large language models, the input is text. This can be from websites, books, online forums, and more. The model transforms the input text to generate output text. The specific objective for the model is to predict the next word given the previous words. We'll break down this process later in this course and, like before, we'll skip over the algorithm.

To recap, we've discussed the benefits of understanding ChatGPT in the context of fast-moving news and discussions. We've also briefly introduced ChatGPT, the product, versus the broader class of technology, large language models. That's it for the overview. For a copy of these slides and more resources, make sure to check out the course website. This is it for why you should understand ChatGPT. Let's hop right into the technology so you're well equipped to understand the barrage of information out there.

3. How to “compute” words?: In this lesson, we'll discuss how to "compute" words. Here's an example of what I mean. Take the following equation. We have king - man + woman. What do you expect this equals? Pause the video here if you want to take a second to think. One reasonable output would be queen. This is what we'd expect if we could apply math to words. However, we can't really do this. There's no such thing as adding words together, so we need to convert words to numbers, which we can add and subtract.

Let's use a specific example. In this case, we want to translate from French into English. We have French in purple on the left, which translates into I love you in green on the right. To simplify this example, we'll focus on just translating one word at a time first. In this case, we focus on translating Je into I. To do this, we need to convert Je into numbers. Fortunately for us, there already exists a dictionary of sorts that maps words into numbers. That mapping is called word2vec.
Word2vec defines a mapping from words to vectors. Vectors are simply collections of numbers. From now on, we'll refer to a collection of numbers as a vector. Here's an example. Here we have the French word Je. Word2vec maps this word to the vector (0.8, 0.1, 0.1), which is illustrated below with some boxes in purple. We can also have the English word I. Word2vec maps this to (0, 0.1, 0.9). Finally, we have You. Word2vec maps this to (0.1, 0, 0.9). Note that this mapping is just an example. In reality, word2vec uses 300-dimensional vectors, meaning each word corresponds to 300 numbers, way too many to show here.

Let's now use these vectors in our translation example. Now, we translate Je into the corresponding word2vec vector on the left in purple. Some mysterious computation is performed in the middle, then we get another vector in green. This vector in green then corresponds to the English word I. Now, let's discuss what goes in that box with a question mark. How are these vectors transformed? Here's how. That box's goal is to run meaningful computation.

Here's what I mean. This was the example we had previously, where king minus man plus woman equals queen. Here's how we can actually add and subtract numbers, or even add and subtract words. Start with the words on the left-hand side. We have king, man, and woman. Translate each word into its corresponding word2vec vector. This gives us three vectors. Then, starting with the king vector on the left, subtract the man vector and add the woman vector. Doing this gives us a new vector on the right, and that resulting vector happens to correspond to queen. So this is what we really mean when we "perform math on words." In reality, we're performing math on the vectors these words correspond to. So now we can abbreviate the entire process by just writing this equation. This equation, by the way, is a real result. If you looked up the word2vec mappings, you would actually be able to compute this equation.
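As a rough illustration of the king - man + woman example, here is a minimal Python sketch. The two-number vectors are made up for this illustration (they are not real word2vec values, which have 300 numbers each), and the names `TOY_VECTORS` and `nearest_word` are hypothetical helpers, not part of any word2vec library.

```python
import math

# Made-up toy vectors: the two numbers loosely mean (royalty, femininity).
# Real word2vec vectors are learned from text and have 300 numbers each.
TOY_VECTORS = {
    "king": [1.0, 0.0],
    "queen": [1.0, 1.0],
    "man": [0.0, 0.0],
    "woman": [0.0, 1.0],
}


def add(a, b):
    """Add two vectors number by number."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Subtract vector b from vector a number by number."""
    return [x - y for x, y in zip(a, b)]


def nearest_word(vector, vectors):
    """Return the word whose vector is closest (Euclidean distance) to `vector`."""
    return min(vectors, key=lambda word: math.dist(vectors[word], vector))


# king - man + woman, computed on the vectors rather than the words.
result = add(subtract(TOY_VECTORS["king"], TOY_VECTORS["man"]), TOY_VECTORS["woman"])
answer = nearest_word(result, TOY_VECTORS)
print(answer)
```

With these toy numbers the arithmetic lands exactly on the queen vector. With real word2vec vectors the result is only close to queen, which is why the last step looks up the nearest word.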
Addition and subtraction in this vector space have real meaning. So knowing that, we can now fill in our mystery box. Given the input to translate into English, we subtract the French vector and add the English vector. More generally, to accommodate any task, we can represent any combination of additions, multiplications, subtractions, and so on. This small graph in the center represents something called a multilayer perceptron. We won't dive into this much. Just think of this small graph in the center of our figure as any set of adds, multiplies, subtracts, and more. This allows us to represent any word-to-word translation task with this architecture.

Now, notice that our pipeline ends with a vector. We need to convert that vector back into a word. So for our last goal here, we want to convert numbers back into words. Here's our pipeline from before. We ended up with some vector. To convert back into a word, we'll find the closest vector that maps to a word. In this case, our closest vector maps to the word I, and with that we've finished our pipeline.

Let's now recap what we did from start to finish. First, we converted words into vectors. Then, we transformed those vectors. Finally, we transformed vectors back into words. Here is our final diagram. On the left, we converted the word Je into a vector, then we performed some computation in the middle with that vector. This outputted another vector in green, and we then looked up the closest vector that corresponded to a word. That finally led us to the word I, and that completes our pipeline, translating from one French word into one English word. Pause here if you'd like to copy down this figure or take a moment to digest or recap.

So we've converted one word into another word. However, we ultimately want to convert many input words into many output words, as we show here in our example. That will be our next lesson. For a copy of these slides and more resources, make sure to check out the course website. That's it for running computation on words.
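The three-step pipeline recapped above (word to vector, compute, vector back to word) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the Je, I, and You vectors follow one reading of the lesson's example numbers, the Tu vector is invented for this illustration, and a single fixed offset stands in for the multilayer perceptron, which in a real model is learned rather than hand-written.

```python
import math

# Step 1 data: word -> vector lookups (toy values, not real word2vec output).
FRENCH = {"Je": [0.8, 0.1, 0.1], "Tu": [0.9, 0.0, 0.1]}  # "Tu" is a made-up entry
ENGLISH = {"I": [0.0, 0.1, 0.9], "You": [0.1, 0.0, 0.9]}

# Stand-in for the "mystery box": subtract a French vector, add an English one.
# A real model would learn this computation (a multilayer perceptron).
OFFSET = [e - f for e, f in zip(ENGLISH["I"], FRENCH["Je"])]


def translate_word(french_word):
    # 1. Convert the word into a vector.
    vector = FRENCH[french_word]
    # 2. Run computation on the vector (here: add the fixed offset).
    transformed = [v + o for v, o in zip(vector, OFFSET)]
    # 3. Convert back into a word: pick the closest English vector.
    return min(ENGLISH, key=lambda word: math.dist(ENGLISH[word], transformed))


print(translate_word("Je"))
print(translate_word("Tu"))
```

The nearest-vector lookup in step 3 matters because the computed vector rarely matches a word's vector exactly; the closest match is taken as the output word.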
Now you understand the basics of how large language models run computation on inputs to produce outputs. In the next lesson, we'll learn how large language models take in multiple input words and generate multiple output words.

4. What is a "transformer"?: In this lesson, we'll cover what a transformer is. The transformer allows us to take in multiple input words and generate multiple output words. This is what we've explained so far. We've converted one French word into one English word. Now, we want to convert multiple French words into multiple English words. To do this, we'll modify our goal. Instead of translating from one word to another, we'll take in the previous word and predict the next word. To do this, we'll change this diagram into this one. Now, our model takes in the French phrase and the previous word, shown below in italics. With both of these inputs, the model predicts the next word, shown in italics on the right. The purple text, in this case the French, is what we call the prompt, to distinguish between the two types of inputs.

Let's now run this diagram on our inputs. To generate the first word, we pass in the prompt and a magical start word. We've denoted the start of sequence word as a start quotation mark here for simplicity. In reality, the start token is some unreadable tag. That magical start token, along with the prompt, then produces the first word, I. To generate the next word, we again use the prompt in purple on top. On the bottom, we now feed in both previous words: the start of sequence, represented by a quote, and the previous word I. Now we predict the next word, love. We do this again. We feed in the prompt on top. On the bottom, we feed in all previous words: the start of sequence quote, then I, then love. All of these inputs produce you. One last time. We feed in the prompt on top. On the bottom, we feed in all the previous words: the start of sequence, then I, then love, then you.
All of these inputs produce one output word, end of sequence. This end of sequence is denoted as an end quote in our diagram, and that's it. Once we see the end of sequence word, we are done. We have now generated a sequence of multiple output words. We call this process autoregressive decoding. Autoregressive decoding predicts the next word one at a time until we reach the end of sequence word.

Here's an overview of the entire process. This was the generic version of our diagram from before. On top, we have our prompt in purple. Below, we have all previous words. These inputs then pass through the mystery box to produce the next word. Now, we fill in the mystery box. We'll fill in this box using the same process we did before. First, convert all words into numbers. We feed in every word in our prompt, one by one. We also feed in the start of sequence word, which in this case is again denoted by the start quote. All of these inputs are first converted into vectors. Somehow our mystery box then outputs a vector, which corresponds to the next word, I.

Next, we need some way to incorporate "context." Effectively, our previous word, which is the start quote, needs context from the prompt to be translated into the correct first word. Here, I'm using the term context very vaguely. Somehow, automagically, we need to incorporate information from the purple prompt into the green vector representing the previous word. In this case, we'll incorporate context by simply taking a weighted sum of all the vectors. This produces just one final vector, which we feed into the mystery box.

Next, we add computation, just like we did in the previous lesson. We replace that mystery box with any number of adds, multiplies, subtracts, etc. This is represented by a small graph in the center of our figure. Like before, this graph formally represents a multilayer perceptron. But we won't need to know the details of the perceptron to understand what's going on.
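To make the weighted-sum step concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the prompt is taken to be the three pieces Je, t', and aime, the two-number vectors are made up, and the weights are hard-coded, whereas a real transformer computes them from the vectors themselves.

```python
def weighted_sum(vectors, weights):
    """Combine several vectors into one by scaling each and adding them up."""
    dims = len(vectors[0])
    return [
        sum(weight * vector[i] for vector, weight in zip(vectors, weights))
        for i in range(dims)
    ]


# Made-up vectors for the prompt words and the start-of-sequence word.
vectors = [
    [1.0, 0.0],  # "Je"      (prompt)
    [0.0, 1.0],  # "t'"      (prompt)
    [0.0, 0.0],  # "aime"    (prompt)
    [1.0, 1.0],  # "<start>" (previous word)
]

# Hard-coded weights for this sketch; they add up to 1.
weights = [0.5, 0.25, 0.0, 0.25]

# One final vector that mixes prompt information into the previous word.
context = weighted_sum(vectors, weights)
print(context)
```

However many words go in, a single vector comes out, which is what lets the same computation box handle text of any length.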
We've now successfully converted our prompt and the start of sequence into the first word, I. Now we do the same thing we did before: predict the next word one at a time from all the previous tokens. This is the exact same process. Next, we take the prompt, the start of sequence, and the previous word I. Taken all together, this produces the next word, love. We continue this process iteratively. Finally, we get the end of sequence word as output, and we stop here. This now completes our pipeline. We can now take in multiple words as input and predict multiple words as output.

We've added two new concepts in this lesson. We predict the next word one at a time until we reach the end of sequence word. We also add context by incorporating the prompt into the previous words. We added context in this case by simply using a weighted sum.

This was our final diagram from before. On the far left, we convert the purple prompt into vectors. We also convert the previous words in green into vectors. We then incorporate context from the prompt into the previous words by taking a weighted sum. This weighted sum is then fed into a multilayer perceptron to perform computation. That computation produces a vector, and like before, we find the nearest vector that corresponds to a word, which in this case is the end of sequence word. This now completes our pipeline for converting multiple input words into multiple output words.

There is one detail we've left out, which is how this weighted sum is computed. So that you have another term in your pocket, this weighted sum, which adds context, is more formally called self-attention. Stay tuned, as I plan to either add a new lesson to this course or release a new mini-course describing how this works. For now, you understand the entire multi-word pipeline. For a copy of these slides and more resources, make sure to check out the course website. That concludes our introduction to the transformer.
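The word-by-word loop described in this lesson can be sketched in Python as follows. This is a minimal sketch, not a real transformer: `toy_next_word` is a hypothetical stand-in that looks up a hard-coded translation, the `<start>` and `<end>` tags are placeholder names for the start and end of sequence words, and the French phrase is assumed to be "Je t'aime".

```python
START, END = "<start>", "<end>"

# Hard-coded answers standing in for a trained model (assumption for this sketch).
TOY_TRANSLATIONS = {"Je t'aime": ["I", "love", "you"]}


def toy_next_word(prompt, previous_words):
    """Stand-in for the transformer: predict one next word from prompt + previous words."""
    target = TOY_TRANSLATIONS[prompt]
    position = len(previous_words) - 1  # words generated so far, not counting START
    return target[position] if position < len(target) else END


def autoregressive_decode(prompt, next_word_fn, max_words=10):
    """Generate words one at a time until the end of sequence word appears."""
    previous_words = [START]
    while len(previous_words) <= max_words:
        word = next_word_fn(prompt, previous_words)
        if word == END:
            break
        # Feed the new word back in as part of the input for the next step.
        previous_words.append(word)
    return previous_words[1:]  # drop the START word


output = autoregressive_decode("Je t'aime", toy_next_word)
print(" ".join(output))
```

The `max_words` cap is a design choice for safety: a real model is not guaranteed to ever produce the end of sequence word, so decoding loops usually stop after a fixed number of steps.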
You now understand the intuition for how a transformer works and how to generally produce output text from an input text. Note that we've left out a number of details in the architecture, but this is a minimal representation of the key ideas and a good starting point for learning more about large language models.

5. Conclusion: Congratulations on making it to the end of the course. You've now covered the fundamentals of large language models, effectively how ChatGPT works. We've discussed a number of different terms. Word2vec, which maps words into a vector space where addition, subtraction, etc. are meaningful. Autoregressive decoding, which is how transformers produce multiple output words by generating one word at a time from all the previous words. Transformers, the building blocks for large language models. And large language models themselves, the general technology, versus the specific brand and product, ChatGPT.

You now have the tools to understand conversations about the field and a high-level intuition for how large language models work. Remember that there are more resources and a copy of the slides on the course website. If you have any questions, feel free to leave them in the discussion section. I'll also leave links with more information in the course description. If you'd like to learn more about AI, data science, or programming, make sure to check out my courses on my Skillshare profile. Congratulations once more on making it to the end of the course, and until next time.