GPT-3 for Chatbots: Building Conversational AI with Fine-tuning | Mariam Omar | Skillshare

GPT-3 for Chatbots: Building Conversational AI with Fine-tuning

Mariam Omar, Skilled in AI, Chatbots, Robotics

Lessons in This Class

  1. Introduction to the finetuning course (1:11)
  2. Understanding GPT Models and Fine-Tuning for Specific Domains (1:59)
  3. Mastering Fine-tuning for NLP Applications (1:55)
  4. Introduction to OpenAI Playground (5:09)
  5. Maximizing Model Accuracy with Data Preparation and Formatting (2:09)
  6. Fine-tuning a GPT Model: Understanding Data Formats (9:46)
  7. Data Cleaning with Python: Removing Missing Values and Duplicates (9:40)
  8. Generating Questions and Answers from Text for Fine-tuning AI Models (7:27)
  9. Using ChatGPT to generate Python code for data manipulation (5:05)
  10. Running and Executing Python Code on Google Colaboratory (8:11)
  11. Building a Well-Structured Project Directory (3:37)
  12. How to Choose a Pre-Trained Model for Fine-Tuning (3:13)
  13. Introduction to Fine Tuning Process in Python (3:09)
  14. Fine-tuning a Pre-trained Model: A Three-Stage Process (11:49)
  15. Testing Your Fine-Tuned Model on OpenAI Playground (3:54)

133 Students
1 Projects

About This Class

In this course, you will discover the power of GPT-3 in creating conversational AI solutions.

We will start with an introduction to chatbots and their use cases, and then dive deep into GPT-3 and its capabilities. You will learn how to fine-tune the model for specific tasks, such as customer service, lead generation, or entertainment. We will cover techniques for improving the accuracy and fluency of the chatbot's responses, as well as strategies for handling user input and managing conversation flow.

Next, we will explore different ways to integrate GPT-3 chatbots with various platforms and channels, such as messaging apps, voice assistants, and social media. You will learn how to use APIs and SDKs to connect your chatbot to these platforms and leverage their features, such as natural language processing, voice recognition, or rich media support. We will also cover best practices for designing chatbot user interfaces and testing and deploying your chatbot in production.

By the end of this course, you will have a solid understanding of how GPT-3 works and how to use it to build powerful and engaging chatbots for your business or personal projects. You will have hands-on experience with fine-tuning GPT-3 models and integrating them with various platforms and channels, and you will be ready to apply these skills in real-world scenarios.

You can find all the course resources HERE

Meet Your Teacher

Mariam Omar

Skilled in AI, Chatbots, Robotics

Teacher

Hello, I'm Mariam Omar, and I'm thrilled to be sharing my passion for AI, Chatbots, and Robotics with you on Skillshare. I've spent the last 8 years working in this exciting and dynamic field, and I've had the opportunity to work on a wide range of projects, from designing autonomous drones to creating virtual assistants for healthcare.

One of my favorite things about AI, Chatbots, and Robotics is that they have the power to transform the way we live and work. With the rapid advances we're seeing in these technologies, there's never been a more exciting time to be involved in this field.

As an instructor on Skillshare, I'm committed to providing you with the knowledge and skills you need to succeed in this rapidly evolving industry.

My courses will be designed ...

Level: Intermediate

Transcripts

1. Introduction to the finetuning course: We all know that ChatGPT is an impressive, powerful artificial intelligence chatbot. But have you ever wondered whether you can use ChatGPT's impressive abilities to build your own project that helps you in your work based on your own data, such as a chatbot that supports your customer service department? If you have, then you are in the right place. I am Mariam Omar, an artificial intelligence specialist. In this course, I will guide you through what fine-tuning is and through the whole process until you can create your own fine-tuned model. In the first module, we will learn what fine-tuning is, why it is important, and what types of projects people use fine-tuning for. In the second module, we will work on data preparation: how to prepare our data in a format that can be given to a pre-trained model, so that we can fine-tune our own model for our specific domain. In the third module, we will configure the fine-tuning process and reach the point where we can test and evaluate the performance of our model. By the end of this course, you will have gained valuable skills in fine-tuning and you will be ready to implement your own project. So let's get started.

2. Understanding GPT Models and Fine-Tuning for Specific Domains: Welcome to the first lesson in this course. In our lesson today we will speak about GPT models: what they are, what their limitations are, and why we would think of fine-tuning our own models. This will leave us with an understanding of fine-tuning: what it means and where it actually helps. GPT-3 models are a large, powerful family of language models. They can all deal with text prompts, questions, or any other information, according to what they were trained for. They apply natural language processing, a subfield of artificial intelligence concerned with how computers handle, interpret, and generate human language.
These pre-trained models, previously trained by OpenAI, have learned how to deal with human language, which makes them an excellent starting point for many applications. But we need to adapt them to meet our own needs. Although they are very powerful and impressive, they have limitations, and those limitations are what make us think about fine-tuning our own. One limitation is that they have an upper bound on quality, and a person needs to do a lot of prompt engineering in order to receive a good response. Another limitation is that there is a maximum size for the prompt, so you have to craft the prompt carefully, and you can't provide the model with many examples in order to receive a good response. Other limitations we can speak about are cost and latency. We know that OpenAI charges per token, so it's not a smart choice to depend on the base models forever and keep paying for them. This takes us to fine-tuning: why would we fine-tune our own models and train them to work in our specific domains? But what is fine-tuning? Fine-tuning is a technique for taking pre-trained models and adapting them to our own tasks. First, we adapt the model by removing the final layer and feeding the model our own prepared datasets, which creates a model that knows exactly what outputs it should give.

3. Mastering Fine-tuning for NLP Applications: Fine-tuning is a powerful technique for adapting pre-trained models to specific NLP applications. In this lesson, we'll cover the key concepts and the steps you need to follow in order to be ready to fine-tune. Before deciding to fine-tune your model, you need to understand the problem you're working on: what is the task at hand, and what are its requirements, such as speed and accuracy? You also need to know the characteristics of the dataset you're using, such as its size, its domain, and the type of data it contains.
By answering these questions, you are ready to determine whether fine-tuning is the solution to your problem or not. Once you have decided to fine-tune a model, you need to choose the pre-trained model you want to build on. There are many types of pre-trained models, and each has strengths and weaknesses. Some of them were trained for text generation; others are great for coding. So you have to choose a model that was trained on data similar to the data you're using, or in your specific domain. This will make fine-tuning more effective. After you choose the pre-trained model, you have to start working on preparing your data. Preparing the data includes changing its format to one that the pre-trained model can understand, through steps such as tokenization and encoding. Having your data ready is the key to starting the fine-tuning itself, which means continuing to train the pre-trained model on the dataset that you prepared. Once you have fine-tuned your model, you have to evaluate its performance. You have to try the model: give it input and see what outputs it gives you. This will help you do some error analysis to learn where your model is struggling and how to improve its performance. So after understanding your problem, preparing your data, choosing the pre-trained model, tuning the model, and evaluating it, you will be on your way to mastering fine-tuning for NLP applications.

4. Introduction to OpenAI Playground: Hello and welcome to a new lesson. In our lesson today we are going to preview OpenAI's Playground. The Playground is an online, easy-to-use interface that allows users to experiment with and test the capabilities of OpenAI's models and of the chat models they created. To use the Playground, you don't need any previous knowledge of programming. Any person who doesn't know anything about programming can use the Playground.
That is because the Playground is an interface that lets you test the pre-trained models provided by OpenAI, or you can use it to test your own fine-tuned models. So to see what this Playground is, let's go. We're going to a website called OpenAI Playground. Once you search for OpenAI Playground, it takes you to OpenAI's page where you can generate your API keys. Here you can view your API keys, or you can create a new secret API key. Remember, these API keys are the ones we use to fine-tune our own models, to build our own chatbots, and so on. Whenever you generate a new one, you have to copy it and keep a copy, because it is a secret key that will not be shown to you again on this page. So this is the page where we can create API keys and copy them. But from this page we want to go to the Playground link at the top. Once I click on Playground at the top, this is the Playground, whose features we will preview today. First of all, the Playground is a testing tool for models, so on the right side over here you get to choose from this drop-down list which model you are using. Here I have OpenAI's different models, Davinci, Curie, Babbage, and Ada, and they have different specifications. We will talk about them later. I can pick which one of them to test. Once you choose the model, you have this box over here. This is the input and output box. It is the place where you type a question as your prompt, such as "write me car names", and send the prompt. In the same place you receive the response coming back from OpenAI. So this is an input and output box where your prompt appears and the response to that prompt appears as well. There are some other settings that you can adjust in order to control the response. For example, let's talk about the temperature parameter. The temperature is a parameter that ranges from 0 to 1.
If you set it to zero, it means you don't want the model to be creative when responding to you. If you set it to one, it means you want the model to be more creative and to give you varied answers each time you try. As for the maximum length parameter, we're talking about tokens. This number caps how many tokens the model sends back to you; the prompt and the response together must also fit within the model's overall token limit. So let's change this parameter to two, for example, refresh, and send the prompt again. The answer you receive is very short, because the response is limited to about two tokens. But give it more, let's say ten, and when you submit you get a longer answer, because the response can now run to around ten tokens. Next, the stop sequence is a setting that lets our fine-tuned model know when to stop responding to a prompt. This is a character sequence that we will decide on and use whenever we test our fine-tuned models. So these are the settings I can change here, and there are some other features on the Playground as well. For example, any prompt you write can be saved. It's also shareable, which is good: you can share it with people, or you can copy it and keep it wherever you want, for example in your documents. Another thing I want to talk about is the bar at the top. You have different tabs here, and these are very helpful because you can find the documentation, OpenAI's blog, and examples of prompts that you can use to test GPT models. If you go to the documentation section, we will leave the Playground and this prompt will go away.
But there you will find lots of descriptive guides and files on whatever you may need: more tutorials, tokenization, pricing, fine-tuning, and any other document that can help you when dealing with OpenAI's GPT models. So that's it for the Playground. Again, it is an easy interface that allows users or developers to test the capabilities of GPT models.

5. Maximizing Model Accuracy with Data Preparation and Formatting: Hello and welcome to another lesson. In our lesson today we are going to speak about the benefits of preparing and formatting data before using it to fine-tune a model. First of all, we have to know that preparing data is crucial for fine-tuning. Why? Because it affects the accuracy and the efficiency of the model. First, well-formatted data lets the model generalize patterns properly, which means our data has to be labeled, encoded, and structured before it is given to the pre-trained model. Well-formatted data is also a good starting point because it reduces errors and helps make sure that the model does not learn biased information. A well-formatted dataset also reduces the time and effort needed for processing and cleaning data. Data comes in different formats, and different formats can be used for machine learning or natural language processing projects, so you have to ensure that the data is prepared in a way that is suitable for feeding to the pre-trained model. Depending on the data you have, you need to know how many steps lie ahead to format it. If you have raw text, you need to know that raw text cannot be given to the model as it is, but there are some preparation steps we can take to get it ready for a pre-trained model. Another type of data is CSV files, which are commonly used for spreadsheets.
This also has to be prepared in a way that the model can understand. The last format I'm going to speak about is the JSONL format. This is the final format that we will work toward, whatever type of data we start from, so that the data is ready to be used directly for fine-tuning a pre-trained model. So in this lesson, we spoke about the benefits of formatting and preparing data before fine-tuning a model, and about the types of data formats that can be used to fine-tune a model. In the upcoming lessons, we'll talk about techniques and ideas for how to prepare data in the best way.

6. Fine-tuning a GPT Model: Understanding Data Formats: Hello and welcome to a new lesson. In our lesson today we're going to speak about the format of the data that we will use for fine-tuning a GPT model. First of all, we will talk about preparing our data: we will preview different datasets and discuss their strengths and weaknesses. Then we will mention tips on how to get good data that is ready for fine-tuning a model. Finally, by the end of this lesson I want to explain adding suffixes to data. I'm going to go to my Google Drive. I have here a group of datasets that I prepared for you. I want us to look at them and see what the strengths and weaknesses of each dataset are. For example, this first file is called Arduino. Arduino is a specific component, a microcontroller. So let's imagine we are making a chatbot, and this chatbot is to answer any question about Arduino. I need to have a look at my data. First of all, as you can see, I have a group of questions over here on the left, and I have answers to these questions. These are different questions: "What is Arduino?", "What type of license does the Arduino hardware product use?", and so on. I want to show you that sometimes a question is asked in a different way, like "What is the Arduino project?" So "What is Arduino?" and "What is the Arduino project?": these two questions may have the same answer, or a close answer, and that is acceptable. But if you look here: "When did the Arduino project begin?" "The Arduino project began in 2005." This row appears twice, and it is a waste of time for the model to read it again. So I have a duplicate here, and first of all I should remove this row. You have to preview your data and make sure that each single question has a single related answer. Sometimes I may use the same answer, but I have to feed my model with different question structures. So using two sentences with the same meaning, written in different ways, is acceptable. But giving the same answer to two prompts that ask different things is not acceptable.

Okay, let's preview another dataset. I'm going to go to this one. This one is about earthquakes. It's also a group of questions about earthquakes. This dataset can be used, for example, if you are a teacher or an instructor building a tool that helps your students ask any question about this unit. So I have a varied group of questions, each of which has a different answer. As you can see, it is acceptable to have a very long answer for one question. But as you can also see, I have only 25 rows here, and this is the next point I want to speak about. We should not fine-tune a model on only 25 different inputs and results, that is, prompts and completions. We should give the model at least around 200 different prompts and completions to make sure that the fine-tuning stage goes well. So this is a good dataset, yet it doesn't have much information. Let's preview another one.
I'm going to go with this one, the mental disorder data. This is also a set of questions and answers, and it was created in a smart way that I'm going to share with you in the upcoming classes. First of all, "What is a mental disorder?", and this is the answer over here. So this is a set of questions with a set of answers. That is good. But if I go down, I have empty rows, and I shouldn't have empty rows in my dataset, so these empty rows should be deleted. Another thing: "What are the causes of mental disorder?" is one question, and "What causes mental disorder?" is another. As I said, these two questions are written in different ways, but they both have the same answer. That's okay, and in fact it's good, because having different questions with the same answer helps the model understand that a question can sometimes be asked in various ways. Another thing I want to pay attention to is that, as you can see here, some sets of questions are repeated. I have a set of questions over here, and I may have the same questions again with different wording. These have to be reviewed properly, because they were prepared with an online tool that generates questions and answers. If I scroll down, I can see that I have around 138 rows once the empty rows are removed. So, at roughly 150 rows, I can say that it is acceptable. Such a dataset is acceptable for fine-tuning a model.

Now let's preview the final one I've prepared for you, which is this one over here. This is different. Why? Because it's not in the shape of questions and answers. It's built as prompts, which are different topics, and on the right side are tweets by a specific person. Such data can be used, for example, to mimic someone's way of responding to people. If you look, one topic is "The importance of staying hydrated during exercise."
And on the right side I have his comment, his tweet: "Just finished a long run and feeling awesome. Remember to stay hydrated while working out." Good. Then for "Tips for improving your sleep habits" I have "Struggling to fall asleep? Try reading a book before bed", and so on. So these are different tweets on different topics, and such data can be used to build a chatbot that speaks like a real person, in a friendly way. You can see here it says, "Guys, I highly recommend it," and he uses exclamatory sentences. Sometimes he starts by asking you a question and then gives his opinion about it. Good.

Now, as you saw in the previous datasets, there was something that I didn't really pay attention to or talk about. Looking at each row in my dataset here, we have a prompt. This is the prompt, or the question, that I expect my user, the one who's going to use my application, to type into the model. And this is the answer that I expect my model to give back. As you can see here, at the end of each prompt we included a backslash n (\n), and this is a suffix. Suffixes are supposed to be added to my data before it is given to the model. This makes the pre-trained model understand that this is where the prompt ends and this is where the completion, or the response to the prompt, starts. The completion should include a suffix as well, so there are two different suffixes. A suffix can be a backslash n; it can also be a word, such as the word END in capital letters, or anything else. It has to be repeated in each row, and I have to make sure that this specific suffix does not appear anywhere else in my data. Now, going back to the tweets data, let's think about how to add the suffixes and prefixes in one click. I can do the following. I can take this and paste it in the second column.
Going back to the first column, this is the first prompt that I have. I can type an equals sign, so we're applying a function to this cell, and I want to add something to it between double quotation marks, because I'm adding a string, for example backslash n. The n can be capital or small; it depends on the suffix we want to use. Then you can press Enter. As you can see, it suggests auto-filling the rest of the rows: do you want me to add this backslash n to every cell as well? I can go with yes. Now I have created a set of prompts with the suffix that I need. So I'm going to copy this column, go back to the first one, and paste as values only. Then I can remove this helper column. So now I have my dataset prepared with the same suffix added to each prompt, and I can do the same thing over here as well. That's it for today. To wrap up, we previewed a group of datasets. We discussed how to use these datasets for different applications using GPT models and fine-tuning. We saw that we should remove duplicates and empty rows, and that we should add suffixes. This matches what we said in the previous lessons: whenever I want to use data to fine-tune, that data has to be labeled, structured, and encoded. And this is the final shape of the data that I'm going to use for my fine-tuning. But rather than using it as an Excel file, I have to change its format to a JSONL file, and that is what we're going to explain in the upcoming lessons. So stay tuned, and thank you for watching.

7. Data Cleaning with Python: Removing Missing Values and Duplicates: Hello and welcome to another lesson. In our lesson today we're going to work on our data. We want to make sure we have clean data to use for fine-tuning. Today's technique is using Python code for data preparation.
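The suffix step that lesson 6 performs with a spreadsheet formula can also be done in Python. The following is only a minimal sketch under stated assumptions: the sample rows, the column names prompt and completion, the helper add_suffixes, and the two suffix strings are all illustrative choices, not values taken from the course files.

```python
import csv
import io

# Hypothetical sample standing in for the tweets spreadsheet (topic -> tweet).
SAMPLE_CSV = """prompt,completion
The importance of staying hydrated during exercise,Remember to stay hydrated!
Tips for improving your sleep habits,Struggling to fall asleep? Try reading a book.
"""

# Illustrative suffix choices (assumptions, not values fixed by the course).
PROMPT_SUFFIX = "\n\n###\n\n"
COMPLETION_SUFFIX = " END"


def add_suffixes(rows, prompt_suffix, completion_suffix):
    """Return new rows with a fixed suffix appended to every prompt and completion."""
    prepared = []
    for row in rows:
        prompt, completion = row["prompt"], row["completion"]
        # The suffix must not already occur inside the text itself.
        if prompt_suffix in prompt or completion_suffix in completion:
            raise ValueError(f"suffix already present in row: {row}")
        prepared.append({
            "prompt": prompt + prompt_suffix,
            "completion": completion + completion_suffix,
        })
    return prepared


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
prepared = add_suffixes(rows, PROMPT_SUFFIX, COMPLETION_SUFFIX)
for row in prepared:
    print(repr(row["prompt"]), "->", repr(row["completion"]))
```

The check inside the loop mirrors the rule from the lesson that the chosen suffix should not appear anywhere else in the data; raising an error is just one possible way to surface that.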
So here is what we're working on: we have a CSV file, which is a dataset, and we're going to read this file. Then we want to remove any missing values and any duplicates, because in fine-tuning, duplicates are bad. Then we want to save the new dataset into a new file. This is our plan for today's lesson. First I'm going to open a new browser tab and go to something called Google Colab. Google Colab is an online tool that allows you to run and execute your Python code easily. So this is where we're going to work. This is the main page that pops up whenever you go to Google Colab. You have to make sure you're logged in to your own email, the one linked to your Drive, because the Drive is where we're going to keep our data and use it. What we want to do is open File and start a new notebook, and we have to make sure that the data we want to work on is on our Drive. So I'm going to my Google Drive to make sure that I prepared my datasets. Going to the Drive, I can show you that I created a folder called dataset. Inside this folder, I uploaded my data. If I preview the data, you can see that I have a dataset as a spreadsheet, a CSV file, with some information about computers: their speed and other information that I need, such as the price and where I can find them, the country, the city, and so on. Maybe I want to build a model that uses this information to predict a computer's price according to the internal memory, the speed, the CPU, and any other details I need to know about the computer. Good. So this is the dataset we're going to work on. I put it in my Drive, in a folder called dataset, in a file called hardware software. Now here I'm in my new notebook file in Google Colab.
First of all, you have to make sure that Colab is connected to your Drive. I'm going to go to the left over here; clicking on this icon, I can preview my files. You can see that it is connecting to something, but it's not actually connected to my Drive. Clicking on the third button over here, I can mount my Drive to Google Colab. So I'm going to request mounting Google Drive. It may ask for permissions and so on: Connect to Google Drive. If you didn't connect before, it may ask you for access, so you can grant access. The Drive is supposed to appear over here, and as you can see, here it is. We're ready to go. Now we're going to start typing some Python code in order to read the file and work on it. The first thing is that we need to import the pandas package, so we type import pandas as pd. This is the package that will allow us to deal with CSV files. Then I click Run and wait for pandas to be imported. Good. Now I'm ready to use pandas whenever I want. The next thing is to add a second code line. Now we want to set the path of our file, the path of the CSV file that we added to our Drive. I'm going to click on drive over here, then go to My Drive, and then look for the folder I called dataset that has my file inside. There it is. Once I open it, we can see my file. I will click on it and choose Copy path. This is the path of the file. So I'm going to go here and type path equals, open a quotation mark, and paste the path here. Make sure that you have pasted it correctly. It gave me this check sign, which means that my file is ready to be read. What do we want to do next? We're going to add another code line, and now we're going to try to read our file.
So we're going to define a variable: df equals pd.read_csv, and the CSV file can be found using the path that I already wrote above. I'm ready to run my code. Good. Now, to make sure that I can read my data, let's go with the following: df.head(). I can use this function to show the first five rows of my dataset. Let's run it. As you can see, I have access to my dataset and it presents the first five rows in it. I can see that this is a computer that is Intel, and this is the speed and any other information I need to know. Now we want to start doing the steps we agreed on. First of all, we want to start by removing the missing values. So going back to our Colab, we start a new code cell. We want to drop the missing values, so we go with df.dropna. We open the parentheses, and we can pass an argument called inplace. inplace=True means you want to replace the values in the dataset you read and loaded here with the new values. Yes, I do. So I'm going to go with inplace=True and run my code. Good. This specific code removes any rows with missing values from my dataset. If there were any missing values that might affect the functioning of my trained model, I want to remove them, so I can use this function to remove them. The next step is to add another code cell. What do we want to do? We want to remove the duplicates. As we said, training a model with duplicate data affects the performance of the model. So we go again with df.drop_duplicates; it is suggested here, so you can just click on it. And again, we go with inplace=True.
We have the statement over here, and I'm going to run my code. Good. So the second statement here is just to remove any duplicates from the dataset that we are using. Now our next step is to save our new dataset into a new file. So I'm going to open a new code cell. Let's save this dataset to our Drive. I'm going to go with the following: df.to_csv, on the DataFrame I created. I'm going to save it as a CSV file. I open the brackets and name the new file: I open a quotation mark, and inside it I'm going to go with formatted_dataset. This is the name of the new file that I'm creating, and it's going to be a CSV file. I also need to add index and make sure the index parameter is False. My statement is ready. I'm going to run it, and there we go. So what did this code do? This code imports the pandas package, which helps me deal with data. I identified the path of the file, which I kept on my Drive, after I mounted my Drive. Then I started by reading the file and presenting some of its values. I dropped the unwanted or missing values, I removed the duplicates, and I finally saved my file. By the way, to show you the result, I'm going to go to my Drive over here, and I can find that there is a file called formatted dataset. This is the new dataset. I can download it, then preview the data and make sure everything is fine. So this was one of the approaches that people may use to prepare their datasets before they use them for fine-tuning. Thank you for watching. 8. Generating Questions and Answers from Text for Fine-tuning AI Models: Hello and welcome to a new lesson. In our lesson today, we're going to be speaking about preparing questions and answers from a text.
As we said in previous lessons, sometimes we may need to use text, just raw text, in order to fine-tune a model. But we can't send this raw text as it is to the model; we need to do some formatting. The formatting can be divided into two stages. The first one is preparing the data as prompts and completions, with the suffixes included in the data. The second one is to change this data to a JSON format. Then we can start the fine-tuning process. Today we're going to speak about the first step: how can I prepare questions and answers from a text? I'm going to share a good tool with you that you can use. As you can see here, I have an essay about animals. This essay is around 500 words, and as you can see, it is divided into different paragraphs. I can copy all of it and take it to an online tool, where I can use this text to generate questions and answers, so that I can train my model on those questions and answers as prompts and completions. So we're going to go to the web browser and type in the name of the QA generator. This is our tool, the one that I'm sharing with you today. This is a tool that was built on OpenAI, and you can use it without paying for the tool itself. We're just going to pay for the tokens, the ones that GPT already charges for. Look here: before we start, we need to enter our API key, because we're using an API model to do this. The tool can help you by signing up, or logging in if you have an account. But let's say I don't have an account; I can go with bringing my own API key. Just by adding my API key here and verifying the key, I can start using the tool. So I have my API key prepared over here.
I'm going to copy it, go back to the website, paste it, and try to verify my key. It says that my key has been verified and stored locally in my browser. Good. I'm going to go with OK. Now I'm ready to go. Look at this: it's so easy and smooth, and it's really helpful. It says to enter the text you want to generate questions and answers from over here. So I'm going to go back to the text I prepared and copy it, no matter how long it is. Once I have it ready, I go back and paste it over here. You have to give the program an idea about how your paragraphs are separated. It asks whether the paragraphs are separated with a single return, or, let's say, a blank line or something else. I'm going to keep it as a single return. Or maybe I can go with giving a blank line at the beginning or the end of each paragraph, which will make my data clearer and prepare it to be used. Good. Now I'm going to review my paragraphs, and here the tool has divided my text into different groups. Now, there is something that I need to explain here: "Generate three sets of up to five questions for each paragraph." If I ask the tool to do this, I'm asking it to generate three different sets of up to five questions for each paragraph. So, multiplying, I get up to 15 questions per paragraph. Three sets means that my questions are going to be repeated, maybe worded in a different way, but they're going to be supplied with the same answer. So let's try asking for two sets of five questions for each paragraph. Now, I have small paragraphs, so I'm going to go with, for example, three questions for each paragraph. Good. Now I have the estimated cost here. This is, of course, the cost that I'm going to be charged by OpenAI.
I can check the balance of my API account. In the beginning, when you start using the API services, you have around $18 of free credit, so that will be enough to use this for a long time, because they charge a very low amount for this. Good. If I wanted to remove this text, I could go with Change text; otherwise I go with Start generating. It's going to take some time, but believe me, it's worth it. As you can see, it's giving me results. This is the first set, three questions about the first paragraph, and this is another set for the first paragraph, but with different questions. "What are the benefits of animals to humans?" "What do animals offer us?" These two questions are alike, and they may have the same answer, or part of the answer may overlap. That is good. It's giving me different structures for the same question, different sentences, which is smart, because this helps the model understand more about how a question can be asked and what exactly it should respond if the question looks like this. So the model is learning and generalizing patterns in an effective way. Now the only thing that I need to do is copy my questions and answers this way and, for example, start a new Google Sheet, where I can just paste my questions in a new sheet. So I'm going to go to the first row over here and paste. As you can see, once we give it some space, I have groups of questions and their answers. What I need to do next, as we discussed in other lessons, is remove unwanted rows, check my data to make sure I don't have any duplicates, and add suffixes, as we explained in previous lessons. Good. So now I have a list of questions and answers on my topic.
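The tidy-up described here (drop unwanted rows, drop duplicates, add suffixes) could also be done in a few lines of Python instead of by hand in the sheet. This is a minimal sketch under stated assumptions: the two marker strings are hypothetical, since the transcript does not say which suffixes the course uses, and the answers are made-up sample text.

```python
# Hypothetical markers (assumptions): pick your own and use them consistently.
PROMPT_SUFFIX = " ->"       # a space, then a marker that ends every prompt
COMPLETION_SUFFIX = " END"  # a marker that ends every completion

# Made-up sample pairs in the spirit of the animal essay.
qa_pairs = [
    ("What are the benefits of animals to humans?", "They provide food and companionship."),
    ("What do animals offer us?", "They provide food and companionship."),
    ("What are the benefits of animals to humans?", "They provide food and companionship."),
    ("", "An answer with no question."),
]


def prepare(pairs):
    """Drop empty and exact-duplicate rows, then add the markers."""
    seen = set()
    rows = []
    for question, answer in pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            continue  # unwanted row: one side is missing
        if (question, answer) in seen:
            continue  # exact duplicate of an earlier row
        seen.add((question, answer))
        rows.append({
            "prompt": question + PROMPT_SUFFIX,
            # Completions start with a single space and end with the marker.
            "completion": " " + answer + COMPLETION_SUFFIX,
        })
    return rows


rows = prepare(qa_pairs)
for row in rows:
    print(row)
```

Note that only exact repeats are removed: the two differently worded questions with the same answer are both kept, which matches the idea of showing the model several phrasings of one question.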
Some questions are repeated but worded in a different way, which is more helpful for my model. This was a very nice and easy tool to use. After you prepare your data and your suffixes and everything is fine, you can download this sheet as an Excel file, and in the upcoming lessons we will learn how to change this Excel sheet's format to a JSON format, which is the format we are going to use to fine-tune our model. That's it for today. Thank you for watching. 9. Using ChatGPT to generate Python code for data manipulation: Hello and welcome to a new lesson. In our lesson today we're going to learn how to work on our data in a smart way, using one of the approaches that I spoke about previously: we want to use ChatGPT to generate Python code for us. This Python code is going to be used to work with a big dataset and change it into the format of prompts and completions, which you will then use to fine-tune your model. So we will go to ChatGPT and do some prompt engineering. We have to describe the features and requirements of our project and our dataset well in order to get good Python code that can change the whole dataset into the shape of prompts and completions, all with one click. So I went to ChatGPT and wrote the following. First of all, I want Python code, and this code is to convert a prepared dataset into the shape of prompts and completions. So I told ChatGPT that I need Python code for the following, and I wrote every detail that I need to mention to ChatGPT before it generates my code, such as where the data is going to be loaded from and which file, and that it also needs to import some necessary libraries. It will take the first value in each row, which is the name of the movie, and include this first value in a prompt.
The rest of the data in each row, which is the rest of the details in my CSV file about each movie, is going to be prepared as an appropriate completion. I'm just including some of the data and not everything from the dataset. For example, each movie has details such as release date, director, actors, and genre; that's the information that I need to include in my prompts and completions. I also want ChatGPT to add these lists as new columns in the original dataset. These columns will hold the generated prompts and the generated completions: two new columns that will be added by the Python code. Then I have to save my data, the new DataFrame, in another file. This file is the one that I'm going to take and use for the fine-tuning process. I also had to mention some details about my data. For example, what is my data about? A group of movies. These are the features of the movies that I included in my dataset, so I wrote every feature that was included for each movie. I also gave examples: the name of the movie, the other features, and the movie description. So when I send this to ChatGPT, it will be able to understand my dataset very well, and it will know exactly what I need my Python code to do. Once I sent this to ChatGPT, I received the following. Let's preview the code. First of all, it imports the pandas package. This package helps us deal with data and CSV files in Python. It's also going to import openai and ask for the API key. This can help me communicate with OpenAI and complete the fine-tuning process, such as sending requests and receiving responses. Good. Now the first thing is that it's going to create a new DataFrame. This DataFrame is going to fetch my data from my CSV. But remember, this is where you have to place the name of your file. What did you name the file that you put inside your Drive?
Or, rather than writing the name of the file, we can write the path of the file: where did I save it? We'll do that later. Then it creates the two lists I asked for, the prompts and the completions. These two lists are appended to each time the code reads a row of my data: it takes the name of the movie and adds it to the prompts, and it takes the other features and adds them to the completions. These are the new lists that will be added to my DataFrame. Good. Now, in this for loop over here, it's going to read every row and create the prompts and the completions. In these two lines, it appends whatever new prompt it generated to the prompts list, and it adds whatever new completion it created to the list of completions. Finally, when these two lists are ready, it adds them to the DataFrame and saves the modified DataFrame as a new CSV file in my Drive under the name prepared data, which is a CSV file. Once we execute this code in the next lesson, we will be able to see the new DataFrame that was generated by my code. So that's it. I find this really helpful: asking ChatGPT to generate code for us that works well with our data. 10. Running and Executing Python Code on Google Colaboratory: Hello, welcome to another lesson. In the previous lesson, we previewed the Python code. That code was built for us to prepare prompts and completions from a big CSV data file. In this lesson, we will be running this code step by step until we make sure that we get prompts and completions ready to be used in fine-tuning our model.
This is the Python code that was prepared by ChatGPT according to the prompt engineering we did in the previous lesson. We previewed the code, and we saw its parts and what we need to change to make sure that it meets the requirements of our dataset. Now, in this lesson, we're going to execute the code. I'm going to copy the code as it is and go to an environment that allows me to execute Python code. For today's lesson, we're going to use Google Colab. We spoke about it in previous lessons, but let me remind you of what it is. Google Colab is the Colaboratory, an editor made by Google that allows you to run your Python code. Once I go into Colab, it suggests my previous projects, or previous notebooks; that is what we call the files on Google Colab. Otherwise, I can just start a new notebook to work in. This is the interface of the notebook when you open it, and as you can see, there are some things that we can do. We can change the name of the notebook according to what we're going to do. We can also add code or text to this notebook. Before running the code, let's go to our Google Drive. Why? Because we want to read the CSV file that is on our Drive. As you can see, this is my Drive, and in it I created a folder called dataset, where I included the datasets that I may need to use. This is the one we will be working on. Let's preview the dataset; let me open it with Google Sheets. As you can see, in the first column we have the movie's name, and then the world earnings, the genre, the director, the release date, the actors, and other details about the movie. This is the dataset I'm going to be working on. I'm going to go back to my Colab.
I'm not going to need this file over here, the one that I opened as an Excel file, so I'm just going to remove it. I need a CSV file to be able to read it. Back in the notebook, first of all, I have to make sure I can access this file from Google Colab. So I'm going to go to Files on the left over here; as you can see, it's connecting to a runtime to enable the file browser. Once it's activated, I have to do something called mounting: I have to mount my Drive to Colab. This is done by clicking on the icon over here. Once you click on it, it starts mounting, and sometimes it may ask for permissions. It's mounting my Google Drive. Once it's mounted and everything is fine, you will see that my Google Drive folder appears over here. So here it is, and I can enter my Drive and check for the file that I added. Going to the folder called dataset, I can find the dataset I included in that folder. This is the CSV file that we will be working on. Now let's take the Python code, copy it, take it to our Google Colab, and add it as text. I'm going to keep this text at the top, because I want it to be there for me whenever I need it. Good. There we go. So this is the text I prepared. Why did I bring it here? Because I want to do the steps one by one, so this can be my guide, and it can help me if I'm not very familiar with Python code. OK. Now, first of all, I have to add a part of the code for the path. This is the path of the file that I want to read. So I'm going to start by creating a variable called path. And what is the path of the file that I want to read? I'll go to my file here, click on it, copy the path, and paste it over here between double quotation marks. There we go. Then I can run the cell to make sure that there are no mistakes in this statement. That's all good.
So let's preview what we're going to work on. First of all, we have to import the pandas package, and we also have to import the openai package. We also have to specify our API key, and so on. Now let's start executing our code. First of all, I'm going to install the openai library, since we want to use it; it's going to take some time to install. Then let's go with the second part, which is importing the pandas package. I'm going to write import pandas as pd. I'm creating an alias called pd, so whenever I want to use the pandas package, I'll use pd, as long as I've run this code. Now, the second thing is to read my file. To do that, I create a new DataFrame by reading the CSV file, so I have to include the name of the file, or the path of the file, where it can be found. Let's add another code cell and paste the statement. Rather than writing the name of the file here, I'm going to go with path, the variable that I declared above, which holds the path of the file. We're going to run this code now, so I can read the data in the CSV file. Good. Now, after I read my file, I'm going to start taking the data and adding it to new lists. I'm going to create the two lists, the generated prompts list and the generated completions list. With these two statements, I create two lists so the code can append to them whenever it creates a new prompt or a new completion. So let's add another code cell and go with the following. Now I have two lists, called prompts and completions, and they're ready to take values. Now we have the for loop over here, which is going to go through the dataset and create prompts and completions accordingly. So I'm going to take the whole loop together. Let's take the rest of the code as well and execute it.
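The generated script itself is not reproduced in the transcript, so the following is only a sketch of what a script with this shape might look like. The column names, the two sample movies, and the wording of the completion sentence are all assumptions, and the data is read from and written to in-memory text so the sketch runs without Drive.

```python
import io

import pandas as pd

# Made-up rows; column names are assumptions based on the fields mentioned.
csv_text = """name,world_earnings,genre,director,release_date,actors
Movie A,1000000,Drama,Director One,2001-05-04,Actor X
Movie B,2500000,Comedy,Director Two,2010-11-19,Actor Y
"""

# In the lesson this would be pd.read_csv(path) with the Drive path.
df = pd.read_csv(io.StringIO(csv_text))

prompts = []      # one generated prompt per row
completions = []  # one generated completion per row

for _, row in df.iterrows():
    # The movie name becomes the prompt.
    prompts.append(str(row["name"]))
    # A few selected details become the completion.
    completions.append(
        f"Released {row['release_date']}, directed by {row['director']}, "
        f"starring {row['actors']}, genre: {row['genre']}."
    )

# Add the two lists to the DataFrame as new columns.
df["generated_prompt"] = prompts
df["generated_completion"] = completions

# In the lesson the result is saved as a new CSV file on Drive;
# here it is written to an in-memory buffer instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Only the two new columns are needed later for fine-tuning, so `df[["generated_prompt", "generated_completion"]]` would be enough to carry forward.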
As you can see, the for loop is going to fetch data from the CSV file and add this data to the lists that I created above. These two lists are then added to the original data as new columns, and all of this is saved in a new CSV file called prepared data. So I'm going to run the code. Once I run the code and it's done, I should be able to see the file called prepared data added to my Drive over here. Let's go to the files; I'm just going to close and reopen the folder to refresh it. As you can see, here's my file. Once I click on it, you can see the name of the movie, its earnings, the genre, the director: whatever data I included previously is here. Next to them I have a column called generated prompt, which was added by the Python code, and another called generated completion. These two columns are the only ones that I'm going to need to fine-tune my model. So that's it: we executed the code, and my data is ready to go. Thank you for watching. 11. Building a Well-Structured Project Directory: Hello, welcome to another lesson. In our lesson today we're going to learn how to build a good project directory and how to structure and manage data within it. This is an essential skill for any data science or fine-tuning project, because it ensures that we can get back to the data and understand it easily. There are many benefits to building a good project structure. One of them is that, if you are working with a team, each team member can come back to the data and continue working on it. This removes misunderstandings and overlapping work, which saves time and improves our professionalism. Another benefit is that a good project structure allows me to come back to the project and reuse it. For example, I can take these data files, adjust them, and include them in other projects.
I'm going to share some tips that you can use to create a good project directory. First of all, always remember to keep your data in folders. Inside those folders you can include subfolders, but remember to use descriptive names. Use understandable names, so that whenever you want to look for a specific file, you can search for it and find it. Another thing is that you should always keep a README file. In the README file, you can keep information about how you did the project, how to use it, which datasets you used, which Python code you used, and so on. As you can see, I prepared an example of a project directory here, showing what the project structure could look like. First of all, you can include different folders. Some of them may be for code, the scripts. Let's imagine that you use different scripts: you keep a copy of the code you used for preparing data, the code for data analysis, and the code used for training the model. I keep my code over here. You can also keep notebooks. Then you can keep the data files that you used, the original ones. This is a CSV file, and if you used an Excel one, you can include it as well. I also have other files here, more data, but these are the processed ones. This one is called prepared data, and this is the prepared data from the text. It is recommended to name them with numbers: for example, if you prepared the data in more than one stage, you can have prepared data one, prepared data two, and so on. You can also keep the outputs of the process or the project, for example copies of the models, and a README file. In the README file you can include whatever information you need, for example: what are the suffixes that you used in the data before fine-tuning? What was the model that you used? What was the model's ID? Which API key did you use to build and fine-tune this model?
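A layout like the one described can be created with a short script. The folder names below are assumptions chosen for illustration (the on-screen example is not captured in the transcript), and the sketch builds everything inside a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names (assumptions): rename to suit your project.
FOLDERS = ["scripts", "notebooks", "data/raw", "data/processed", "outputs"]

README_TEXT = (
    "# Fine-tuning project\n\n"
    "Notes to keep here: datasets used, suffixes added to prompts and\n"
    "completions, base model name, fine-tuned model ID.\n"
)


def create_project(root: Path) -> list[str]:
    """Create the folders and a README, and return the resulting layout."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text(README_TEXT, encoding="utf-8")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = create_project(Path(tmp) / "finetune_project")

print("\n".join(layout))
```

To create the structure for real, call `create_project(Path("my_project"))` with a folder name of your choice instead of using the temporary directory.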
You can also write reports about these models and keep them. This will make it easier for you, any team member, or any other person to refer back to the project, see what happened, take this data, and work on it for some other fine-tuning projects, and so on. So that was a brief example of how to create a good project directory and how to set up organized data within it, and we shared a small example of that. This project can be uploaded to your GitHub, or it can be kept in specific folders on your Drive or on your computer. Either way, this will be the key to succeeding in using this project, learning from it, and doing other projects in the future. 12. How to Choose a Pre-Trained Model for Fine-Tuning: Hello and welcome back. In our lesson today, we will learn how to choose a pre-trained model that meets the requirements of our own project. We know that GPT-3 is a set of large and powerful models, but we want to use one of them to create our own fine-tuned model. In this lesson, we will preview the specifications of those models and get to know which one of them would meet our requirements. To get to know those models and think about which one to choose, we should preview them at their source, which is OpenAI's website. So I'm going to go to OpenAI, specifically to the Playground. Why? Because I want to go through their documentation section. Here at the top, we can go to the documentation section. Scrolling down on the left, we can go to the Models section. As you can see, this is an overview of every GPT model made by OpenAI. They have reached GPT-4 by now, there is GPT-3.5, there's DALL·E, the one that can generate images, and there are some other models. We're going to be depending on GPT-3. GPT-3 is the basic set of GPT models, the ones that we can use for fine-tuning.
Scrolling down to GPT-3 specifically, we can preview GPT-3. Now, we want to see the basic GPT-3 models. Those are Davinci, Curie, Ada, and Babbage. These are the only models that people can use for fine-tuning; no other models can be fine-tuned except these four. The latest model is also listed here, but when it comes to fine-tuning, we only depend on these four basic ones. You need to know which of them can take more tokens, what each of them costs, and what their capabilities are. This is what you need to consider whenever you choose between them. Davinci is the biggest one and the most capable. It's amazing, but it has a higher cost compared to Curie, Babbage, and Ada. Curie is good at sentiment analysis, so if you are working on a project that has to classify data into positive and negative, Curie would be a good suggestion for that. Ada is the fastest one. So if your project needs a fast, responsive model but not very impressive responses, let's say very direct responses, you can depend on Ada, and so on. So these are the four basic models. To choose between them, you have to think about what type of data you are using and what the requirements of your project are, thinking about the speed and about the resources that you have: how much do I want to pay, and what can I afford for this fine-tuning process? That is how you get to know which one of them to choose. We will use the name of the model to do the fine-tuning process on that specific model. 13. Introduction to Fine Tuning Process in Python: To give you an overview of the steps of the fine-tuning process from beginning to end, I prepared this for you. To do the fine-tuning, first of all, we have to download Python.
How do we download Python? We go to the Internet, search for Python, and download it. You just have to make sure you download the latest version of Python to be able to execute your Python code. Here we go. On my laptop, I'm going to search for Python, and as you can see, I have version 3.11, which is the latest version. So this is the first step, and I'm ready for it. The second thing we have to do is open our CMD. What is the CMD? CMD is the Windows command processor. When you go to the search bar on your Windows 10 or 11 laptop and type cmd, you find something called Command Prompt. What is this? It's an interface where you can run commands. These commands can operate on your programs or your operating system, and they can also connect your laptop to the Internet, sending requests and receiving responses. So this is where the whole fine-tuning process is going to happen: in our CMD, our command processor. The next thing we have to do is locate our folder in the CMD. The whole process is going to happen in a folder that we prepared previously. In the next class, I'm going to share this folder with you through a link in the description of the lesson. After we locate our folder, we have to download some needed libraries and packages, such as the openai library, pandas, and some other libraries that we will need to work on our data and do the fine-tuning. After that, we're going to run Python code, and we're going to run some commands to use OpenAI's tools and prepare our data for fine-tuning. Then we're going to run the Python code to finish the fine-tuning process, and after we do that, it will provide us with a model ID.
We'll use this ID to know what happened to the fine-tuning process and to follow our model, the one that we fine-tuned, until we make sure it was created and everything is OK. So this is a brief, very short explanation of the steps. Again, to do the fine-tuning, we need to download Python, download some libraries, run some Python code, run some commands to communicate with OpenAI and do the fine-tuning, and finally check whether our model was fine-tuned. So this is a fast overview. In the upcoming lesson, we're going to execute these steps: first preparing our data in the best way, then fine-tuning, and then testing our fine-tuned model. Thank you. 14. Fine-tuning a Pre-trained Model: A Three-Stage Process: Hello and welcome back. In our lesson today we're going to start executing the fine-tuning process, which I can split into three main stages. The first stage is to change the format of the data we prepared to a JSON format. We spoke about that in previous lessons: we said that any file that you want to give to a pre-trained model should be in a JSON format, and we'll see how to change that using Python code. The second stage is to take this data and give it to a pre-trained model to do the fine-tuning. The third stage is to test our fine-tuned model. As a review, today we're going to work on the following. First, we'll make sure that we've downloaded Python, and we're going to work in our CMD, as we explained in the previous lesson. We will open our folder and start working on our Python code. The first Python script helps us change the data file to a JSON file. Once we've finished this, we are sure that our data is ready to be given to the pre-trained model for fine-tuning. So let's go and start executing our steps. First of all, we're going to go to our CMD. There it is, our CMD. Remember, we have to open our folder.
So I'm going to go with cd and open the folder that I called test. This folder is included in the description below the lesson, and it includes the two Python scripts that I'm going to use. The first one changes the format of the data to a JSON format; the second one communicates with OpenAI and asks it to do the fine-tuning using our API key. Here is the explanation of all the steps that we did. I also included my API key here, because I'm going to need it to link my pre-trained model and my fine-tuned model with my API key, so that I can use the model in further applications. And this is the data that we're using, the data file. Let's preview our data. As you can see, I have the prompts on the left side, and on the right, the second column is the completions. On the prompt side, I have a question, after it a space, and after this space I included a suffix. This helps the model understand that this is where the prompt finishes. On the right side, I have the completions. Each completion starts with a blank space, a single space before the completion, and this space makes the model understand that this is where the completion starts. At the end, I added a suffix. This suffix helps the model understand that this is where the completion for that prompt ends, and so on. So my data is ready, and I downloaded it. Make sure you have clean data and a good amount of data. Make sure everything is set. Make sure you have no extra information in any other cells or any other sheets, and so on. Once your data is ready, you are good to go. Going back to our CMD, let's start by running our first Python script. Before we run it, let me preview the code. This is it. I'm going to open it as a text file and see what it does. First of all, it imports some needed libraries, and then it creates a workbook.
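The prompt and completion layout just described can be sketched in code. This is not the course's script, which reads an Excel workbook: the sketch starts from a small in-memory list instead, the rows and the two marker strings are made up, and writing one JSON object per line (JSON Lines) is an assumption based on the prompt/completion format that OpenAI's legacy fine-tuning expected.

```python
import json

# Made-up rows in the described layout: each prompt ends with a space plus a
# marker, each completion starts with a space and ends with a marker.
# " ->" and " END" are hypothetical markers, not the course's own.
rows = [
    {"prompt": "What do animals offer us? ->",
     "completion": " Food and companionship. END"},
    {"prompt": "Where do polar bears live? ->",
     "completion": " In the Arctic. END"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    lines = []
    for record in records:
        pair = {"prompt": record["prompt"], "completion": record["completion"]}
        lines.append(json.dumps(pair, ensure_ascii=False))
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(rows)
print(jsonl_text, end="")

# To save it next to the script (hypothetical file name):
# from pathlib import Path
# Path("data_test.jsonl").write_text(jsonl_text, encoding="utf-8")
```

Keeping the serialization in a small function makes it easy to swap the input source later, for example rows read from a spreadsheet.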
In this workbook it's going to load the data from the test file. Remember, you may have named your data file differently. As you guys can see, I named mine test. So if you have a different name, you have to write the name of your file over here. Good. Then it starts going through the sheets inside the file; we only have one sheet. Then, using this for loop, it starts creating the prompts and the completions in this format, to save them in a JSONL file that is called data_test.jsonl. So the new file that's going to be created by this code is named data_test, and inside this file you will find our data, everything we included in the Excel file, but in the JSONL format. That's how I prepare my data to give it to a pre-trained model to fine-tune it. Good. So after I make sure my code is ready and everything is changed, the names of the files and everything, let's go to our CMD to run this code. We're going to take the name of this file over here, Excel to JSON. So let's just copy the name. We're going to paste the name of the file, add .py, and send this to be executed. Once it has executed, as you guys can see over here, we look back at our folder and we have an extra file. Here it is. It's called data_test. Let's preview our new file. As you guys can see, here is the prompt, and next to it is the completion. And all of this is prepared in the JSONL format. Good, so my data is ready now for fine-tuning. Our next step is to take this data and send it to fine-tune a model. So I'm going to go to the next step in my CMD, and that is to run the Python code that helps me fine-tune my model, which is over here. Let's preview our code. I'm going to open this code as a text file again. Let's have a look. It also imports the needed libraries. It sets the API key, and after it sets my API key, it starts communicating with OpenAI in order to create a new fine-tuned model.
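Going back to the conversion step for a moment, here is a minimal sketch of what such a script can look like. It is not the script from the lesson folder. It assumes the data was exported as a CSV file with the prompt in the first column and the completion in the second, with the separator and suffix already typed into the cells. The function names and the " ->" separator in the sample row are placeholders chosen for this illustration.

```python
import csv
import io
import json


def rows_to_jsonl(rows):
    """Turn (prompt, completion) rows into JSONL text, one JSON object per line."""
    lines = []
    for row in rows:
        # Skip empty or incomplete rows so they do not end up in the training file.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        lines.append(json.dumps({"prompt": row[0], "completion": row[1]}))
    return "\n".join(lines) + ("\n" if lines else "")


def convert_csv_to_jsonl(csv_path, jsonl_path):
    """Read a two-column CSV file and write the matching JSONL file."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with open(jsonl_path, "w", encoding="utf-8") as dst:
        dst.write(rows_to_jsonl(rows))


if __name__ == "__main__":
    # Hypothetical sample row: " ->" stands in for the prompt separator,
    # and "###" is the completion suffix mentioned in the lesson.
    sample = io.StringIO('"What is a mental disorder? ->"," A placeholder answer. ###"\n')
    print(rows_to_jsonl(csv.reader(sample)), end="")
```

To convert a real file, you would call convert_csv_to_jsonl with your own file names. The cells are copied as they are, so the leading space in each completion and the suffixes stay exactly as you typed them.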
And it's going to be requesting a list of the previous fine-tuned models and other data. What I care about in this code is the last part over here, because this is where I get to decide the name of the file that I want to use, the name of the model that I want to create, and the name of the base model that I want to depend on. So looking at this part over here: first of all, where am I going to load my data from, the data that I'm going to use for fine-tuning? It is taken from a file called data_test.jsonl. This is my file. And what is the name of the new fine-tuned model? I can name my model as I want. I called it the disorder model. And which pre-trained model am I going to depend on? That is davinci. So whatever the name of this file over here was, you have to include it over here. Remember, it's the JSONL file. You have to choose the name of your model, and you also have to choose the pre-trained model. Now, when I make sure that my code is ready, I can start running this code. The thing is, I also wanted to share something with you: in order to be able to communicate with OpenAI and send requests, you have to add your API key to the environment variables of your laptop. What does that mean? Go down to the search bar and search for environment variables. Once the settings box opens, below, over here, you have this Environment Variables button. Once you click on it, you have to add the OpenAI API key in the variables. This variable allows my system to communicate with OpenAI using this API key, and it holds my key. So if you don't have it over here, you're just going to do the following: click New and name it OPENAI_API_KEY. It has to be written in capital letters, this way. And then you can go to the Playground of OpenAI, get your API key, include it, and save it.
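Before running the fine-tuning script, it can help to confirm that the variable is actually visible to Python. The sketch below is only an illustration: the helper name load_api_key is made up for this example, and it reads the OPENAI_API_KEY variable described above without contacting OpenAI.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your environment variables first.")
    return key


if __name__ == "__main__":
    try:
        key = load_api_key()
        # Only report the length, never print the key itself.
        print(f"Found a key with {len(key)} characters.")
    except RuntimeError as error:
        print(error)
```

If the check fails right after you saved the variable, close the CMD window and open a new one, because a window that was already open does not pick up new environment variables.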
So if this was saved over here correctly, it will make your system able to communicate with OpenAI, send requests for fine-tuning, and receive responses after the fine-tuning process. Good. So to execute this code, the fine-tuning code, I'm going to go back to my CMD and type the name of the script. Okay, let's again copy its name, go to our CMD, paste it, and add .py, because this is a Python script. Once it sends the request to the API, the API has to respond back, so wait a bit. This response came back from OpenAI. It shows me what happened, and it gives me a very important thing, which is the ID of my new fine-tuned model. So this is the whole history of the fine-tuned models that I created before, and most of these cases succeeded. I'm going to show them to you in my next lesson, when I show you how we test our fine-tuned model. Going back to the end of the response that I received, I want to see what happened over here. As you guys can see, this is the ID of the model that I'm fine-tuning today. I'm going to need this ID in order to check the status of this model, the one that we're fine-tuning now. davinci is the pre-trained model we're using for the fine-tuning process, and the name of the new model is the disorder model, and so on. So this is the important information we need. And as you guys can see over here, it says it is still pending. Pending means that the fine-tuning is still running; the model is not fine-tuned yet. If you guys look over here, the previous time I tried to fine-tune a model, it failed. There was a problem with something. If you go back above, you will see the history of every fine-tuned model, each one of them with its ID. So this can help me remember what I did previously, if I need the ID of a previously fine-tuned model, and so on.
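Since a job that reports pending is still running, one way to wait for it from Python is to ask for the status again every so often. This is only a sketch of that idea. The get_status argument is a hypothetical function that you would write yourself to return the current status text; the status names used in the demo (pending, succeeded) are the ones seen in the response above, and nothing here calls OpenAI.

```python
import time


def wait_for_job(get_status, poll_seconds=30.0, max_checks=120, sleep=time.sleep):
    """Call get_status() until it stops returning 'pending' or the checks run out."""
    status = get_status()
    checks = 1
    while status == "pending" and checks < max_checks:
        sleep(poll_seconds)  # wait before asking again
        status = get_status()
        checks += 1
    return status


if __name__ == "__main__":
    # Fake status source for demonstration only: two pending answers, then success.
    fake_statuses = iter(["pending", "pending", "succeeded"])
    final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
    print("Final status:", final)
```

The max_checks limit keeps the loop from running forever if the job never leaves the pending state.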
So in order to check what happened with my fine-tuning process, I can use an OpenAI command that I included for you inside this file. This can be the final stage. This command is prepared by OpenAI, and it is used to follow the status of the fine-tuned model, but you have to include the ID of the model over here. So go back to your CMD and copy the ID, the ID of the fine-tuned model. I'm going to copy it, go back over here, and paste the ID after the -i. Take this command the way it is, go back to your CMD, paste it, and run. Okay, good. It says that 'openai' is not recognized as an internal or external command. It's good that we have this issue over here. To resolve this problem, I'm going to install the OpenAI library. So I type pip install openai. Once I do this, I'll get openai installed. Good. It's ready to be used. Okay, let's try again. Paste our command with the ID of our model, send it to OpenAI, and see what happened to our fine-tuned model. And this is ours. As you guys can see, the follow command ran, and it says that our model was created. Good. So it did create our fine-tuned model, and this is the ID of the new fine-tuned model. So that's it. This is the whole process for fine-tuning a new model. In the next class we're going to see how to test our fine-tuned model.

15. Testing Your Fine-Tuned Model on OpenAI Playground: Hello and welcome back. In our lesson today, we're going to be testing our fine-tuned model. Where do we test it? Our fine-tuned model will be tested on the Playground. It's the place from which we first got our API keys, and it's the place where developers or other people can test whether their OpenAI account is working, test their API keys, and so on. So we're going to go to a new web browser tab and go to the OpenAI Playground. And once we open it, okay, there you go. I'm going to click on Playground at the top.
All right. This is the Playground, and this is where I get to test my fine-tuned model. As we said previously, on the right side over here is the place where you choose the model that you want to deal with, the model that you want to respond back to you. Once you click on it, you can preview the GPT models. You have these base models, and further down you can find your fine-tuned models. Our model is the one we created together in the previous lesson, the disorder model. So here it is. Now, once I choose my model, I can also change the temperature to make sure the model is not using extra information or trying to be creative by adding more information from the base model. I can also decide whether I want my answers to be shorter than a specific limit of tokens. And I also need to use this stop sequence, the suffix that I added to my data before I did the fine-tuning. So the first thing is, I'm going to write my prompt over here. Since this model is about disorders, I'm going to write: what is mental disorder? And I remember that the suffix, the separator that I added at the end of each prompt, was the backslash-n sequence. So I'm going to add it over here. I also have to add a stop sequence, the suffix that I added to my completions in my data, which was triple hashtags. Once you write the stop sequence characters that you used, you press Tab, and it is added. Now you can submit your prompt and see what response you get. As you see, I get a specific answer and a short one, even though I didn't change the limit over here. Let's imagine I forgot to add these suffixes over here and over here. Let's see what's going to happen. So I'll just remove my suffix over here, and I'll remove the stop sequence. I'll put, for example, ten tokens as the maximum length and submit, and I'll get a very short answer. And if I increase it to 50, I'm going to submit it again.
And as you can see, I get a longer answer, and its length is controlled by this number. So the stop sequence is something that helps me. Here, as you can see, the answer should have stopped here, but the model still added some extra tokens over here in order to use all of the 50 tokens. So the thing is, those suffixes are here because they help our model understand that it has to stop, since every completion it read during training ended with these triple hashtags. This is how I get a specific, fixed answer to my question over here. That's it. This is how we test our model. We come to the right side, we pick which model we are testing, we write our prompt here, and then we define the suffixes of the prompts and the completions. And that's all.
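To make the role of the two suffixes concrete, here is a small local illustration. It does not call OpenAI. It only mimics what happens to the text: the prompt gets the same separator that was used in the training data, and the generated text is cut at the first stop sequence. The function names, the " ->" separator, and the sample text are placeholders for this sketch; the triple hashtag is the completion suffix from the lesson.

```python
def build_prompt(question, separator):
    """Append the training-time separator so the prompt matches the training data."""
    return f"{question}{separator}"


def apply_stop(text, stop):
    """Cut generated text at the first occurrence of the stop sequence, if any."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


if __name__ == "__main__":
    # Hypothetical raw output: a short answer followed by leftover tokens.
    raw_output = " A placeholder answer. ### more tokens the model kept generating"
    print(repr(build_prompt("What is mental disorder?", " ->")))
    print(repr(apply_stop(raw_output, "###").strip()))
    # Without a stop sequence, the extra tokens stay in the answer.
    print(repr(raw_output.strip()))
```

In the Playground, the cutting is done for you once the stop sequence is set; the sketch only shows why the answer ends where the triple hashtags would have appeared.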