Build an AI Agent (OpenAI, LlamaIndex, Pinecone & Streamlit)

David Armendariz

Get unlimited access to every class

Taught by industry leaders & working professionals

Topics include illustration, design, photography, and more

Get unlimited access to every class

Taught by industry leaders & working professionals

Topics include illustration, design, photography, and more

Lessons in This Class

- 1.
  
  Introduction
  
  1:46
- 2.
  
  Setting up the development environment
  
  15:19
- 3.
  
  Getting an OpenAI API key
  
  3:16
- 4.
  
  Understanding LlamaIndex and RAG
  
  3:39
- 5.
  
  What are agents?
  
  5:04
- 6.
  
  Vector embeddings
  
  2:37
- 7.
  
  Creating a tool to fetch papers from arXiv
  
  8:11
- 8.
  
  Creating a tool to download papers
  
  3:48
- 9.
  
  Defining the embed and LLM models
  
  3:40
- 10.
  
  Building the index and saving it locally
  
  11:55
- 11.
  
  Creating the RAG query engine tool
  
  10:33
- 12.
  
  Building and interacting with the agent
  
  9:03
- 13.
  
  Downloading the papers and fetching new papers
  
  5:15
- 14.
  
  Enhancing the prompt to download files
  
  7:39
- 15.
  
  Building a class to manage the index
  
  6:13
- 16.
  
  Building a class to interact with the agent
  
  4:53
- 17.
  
  Building a chat UI with Streamlit
  
  13:34
- 18.
  
  Getting an API key from Pinecone
  
  3:09
- 19.
  
  Creating an index manager for Pinecone
  
  11:29
- 20.
  
  Using the Pinecone index in the Streamlit app
  
  2:42
- 21.
  
  Deploying the app to Streamlit Community Cloud
  
  5:31
- 22.
  
  Conclusion
  
  0:53

Beginner level

Intermediate level

Advanced level

All levels

536

Students

Projects

About This Class

Are you ready to dive into the world of AI and create powerful agents using cutting-edge tools? This course is designed to take you from zero to hero in building intelligent AI agents with OpenAI, LlamaIndex, Pinecone, and Streamlit. Whether you're a beginner exploring AI or a seasoned developer looking to expand your skills, this course offers everything you need to build interactive, real-world AI applications.

What You'll Learn:

How to use OpenAI's API to generate intelligent responses.
Building and managing knowledge indexes with LlamaIndex.
Storing and retrieving vector embeddings with Pinecone for efficient AI searches.
Creating interactive user interfaces for your AI agents with Streamlit.
Best practices for integrating these tools to build scalable AI solutions.

Why Take This Course?

The demand for AI-driven applications is skyrocketing, and understanding how to create AI agents is a game-changing skill. This course provides practical, hands-on experience with real-world use cases. By the end, you'll have built a fully functional AI agent ready to deploy and showcase.

Who This Course Is For:

Developers and engineers interested in AI and machine learning.
Data scientists looking to explore AI-driven tools.
Entrepreneurs and innovators eager to build AI-powered applications.
Students and professionals seeking hands-on experience in AI development.

Join now and unleash the potential of AI agents in your projects!

Meet Your Teacher

David Armendariz

Teacher

Hi! My name is David Armendariz. I am from Ecuador.

I studied mathematics at USFQ (Universidad San Francisco de Quito). However, I love coding and that's why I transitioned to the software industry. I love to share my knowledge here in Skillshare.

I hope you enjoy my courses as much as I enjoy doing them and remember: never stop learning!

See full profile

Related Skills

ChatGPT AI for Development Development Programming Languages

Level: Beginner

Hands-on Class Project

In the class, we did prompt engineering to fix the issue of downloading all the papers.

In the class project you will have to take another focus and rewrite the download PDF function to instead accept an array of urls and an array of filenames and download all of the papers at the same time.

The result should be the same: have all the files downloaded in the same folder as we did in the lectures. Please upload a screenshot of the PDF download function, the agent logs and the PDF files downloaded.

James Capozzoli 3 likes

Class Ratings

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.

Transcripts

1. Introduction: Hi, and welcome to this course, building an AI agent with Open AI, Lama index, pine cone, and streamlt. I am the Vidar Mendais and I will be the instructor of this course. Who am I? I have a bachelor of science in mathematics, a master's degree in data science and analytics, focusing on LMS. I'm a full stank software engineer with more than six years of experience. I AW certified, Ashur certified, and I am a cybersecurity enthusiast. What are we going to do in this course? We will create an LLM agent based on OpenAI GPT 40 mini moodel. The agent's purpose will be to find and summarize research papers from the archive platform and we will use the Lama index framework in order to augment the knowledge base of the agent. What are we going to learn in this course? Going to learn basic concepts of AI like vector embeddings, vector indexes, retrieval augmented generation or RAC, Crum templates, react agents, how to optimize the agents instructions, vector databases, Lama index for augmenting knowledge, stream need to build a UI and deploy it, and Python and software engineering best practices. What tools are we going to use? The agent will use three tools, a rack query engine to fetch information from a knowledge base, a research paper fetch tool to research or find information about papers that we don't have in our knowledge base and a PDF download to if you want to download papers directly into your machine. I hope you like this course because I enjoyed a lot building it. I hope I can see you in the next lesson. Bye. 2. Setting up the development environment: Hi and welcome back. In this lesson, we're going to set up the development environment. A few things you have to keep in mind. We're going to use VS code as a code editor. We're going to create a GidHubRpo to share the final result with you and we're going to use PDM, which stands for Python Dependency Manager to manage the dependencies of the Python project. We're also going to need an Open AI API key to use their models. So what are the dependencies that we're going to have in this project? We're going to have Archive, which is a library to download the papers from the archive platform, python dot F to manage the environment variables. In this case, we're going to have the OpenAI API key. We want that to be a secret, so that's why we're going to use Python Mv. We're going to use Notebook because we're going to use Jupiter Notebook and we're also going to install the Lemma Index, which is the framework to help build LLM apps. So this is the boring part of the course, but we have to do it. Let's go to github.com, and this is my GitHub account. You go to your GitHub account. If you don't have one, you can create one, and we're going to create a new repository. So the repository name, I'm going to call it Archive researcher. You can call it whatever you want. We're going to make it public so that I can share the final result with you. We're going to add a Gid Ignore for a Python project. License, non and these are just apps that I have. Probably if you don't have any apps, this is not going to show for you, but just click on Create Repositor. A brand new repo is created for me. I'm going to open up a terminal, and I'm going to click here on code, and I'm going to copy this SSH. If you don't have an SSH key, configure it in your Gita account, then you should use HTTPS, but SSH is the recommended way to do it. I'm going to copy this.Go to the terminal and say git clone, and paste that. So now I have that repo in my local machine and I'm going to open VS code here. As you can see, we have the GTI nor, which is the same that I have here. If I open that GTI nor, you're going to find a lot of things that are usually ignored in a Python project. So we're going to use Python dependency manager when we initialize a project with PDM, these are the things that are going to be ignored. Let's go and search for PVM PVM Python. So the URL is pdmpject.org. If I click here, I'm going to see the website of PDM. PDM, as described is a modern iPhone package and dependency manager supporting the latest PEP standards. But it is more than a package manager. I boosts your development workflow in various aspects. The main purpose of PDM is dependency resolution, basically. Because when you install packages, those packages may depend on sub packages. So packages need to agree on which versions of sub packages you have to install. That's basically the idea of dependency resolution PDM is a great tool for managing that. So how do you install this? First of all, you need Python 3.9 or later to be installed. It works on multiple platforms, including Windows, Linux, and MacOS. As you may have noticed, I am using MacOS. Another thing that PDM does is that it can manage multiple versions of Python. So for example, if I want to use Python 3.10 or 3.11 or 3.12, I can do so. We're going to see how to do that in a moment. But I want you to go to the documentation and follow the steps for the recommended installation method. It says like PIP, PDM provides an installation script that will install PDM into an isolated environment. For Linux and Mac, you just have to copy this command and paste it in your terminal. In Windows, you can do it with PowerShell. But if you are on Windows, I highly recommend that you use Windows Subsystem for Linux, so that you have a Linux environment inside your Windows machine. If you don't want to do that, if you're not familiar with Windows Subsystem for Linux, then you can install PDM with PowerShell. Okay. So after you do that, you should have PDM in your machine. There are also instructions if you want to uninstall PDM, but I will say keep it. It's a great tool. So if I go to my terminal and type PDM, me make this bigger. You're going to see that this has a lot of options. PDM ad is one of the commands that we're going to use, and this is for adding packages to a file called Piroject dot Tumel. This is similar to package Jason if you come from the world of No JS. PDM, another command that we're going to use is PDM in it, and that is to initialize a Pipe project that Tumel for PDM. Another command that we are going to use that we can use is PDM Python, and that is for managing the multiple versions of Python that I was telling you some time ago. So let's see, another command which is very important is PDM remove, which is used for removing packages from the Pi project up tamelFle. Okay? So let's see what happens if I type PDM Python. So if I type PDM Python, I will have additional subcommands. The first one is list, and that is to list all the Python interpreters installed with PDM. If I type PDM Python list, you'll see that I have these four Python versions installed in my machine. 3.12 0.2, 3.11 0.5, 3.13 0.0, and 3.11 0.8. Now suppose I want to install a new Python version, I can type PVM Python install I should specify the Python version I want to install. If I type PDM Python Install Help, then you are going to see that I have this list flag. Let me type that PBM Python Install list and you're going to see all of the Python versions that are available. If I just type PDM Python Install as I did here, it's going to install the latest version which is Python 3.13 0.0. But suppose I want to install Python 3.12 0.7. We're going to use that version of Python in this project and we just do PDM Python Install and you don't have to type the whole C Python app. You just type 3.12 0.7 and that should be enough. So it will download that version of Python and install it and save the executable in this folder here. Now we are ready to initialize a PDM project. Let's go back to VS code and we are going to type in the terminal PDM it. Now, if I make this little bigger, it's going to prop me to choose the Python version to use in this project. I'm going to use 3.12 0.7. That is this option here. I'm going to have the option four. Okay. And that is going to create this folder called VENV, which is the virtual environment. This is where all the packages are going to be installed, and then it is asking me what is going to be the project name? I can just type Enter multiple times to keep the default values here. Okay. So it creates a Pi project that tumble pile. Again, if you come from the NojS world, this is similar to a package JSON file. Here you can see that it has data about the project, the description of the project, the authors of the project, the dependencies, what version of Python it requires, et cetera. What we're going to do is right now install the dependencies that we're going to need for this product. For that, I'm going to say PDM at Archive, index, typon dot and Notebook. If I type Enter, you can see that it is resolving for the environment. It is resolving all of those sub packages that are going to be installed. And this can take a while. In the meantime, I want you to notice at this PDM Python file. This is only telling that the Python executable for this project is inside this VENB file. You can see this is very specific to my machine. That's why it is on the GTI no. When you selected that Python, GTI nor template, it already has this file in the template, so it is already ignored. Okay. Now let's see here what's happening. It has passed 1 minute and it is still resolving for the packages. Now, it made all of the dependency resolution. It resolved 149 packages. That's a lot because we just installed four of these packages. But these four packages in total, they use behind the scenes 149 packages. So PDM is intelligent enough to resolve all of the versions so that we don't have any conflicts in the versions. Everything got installed successfully, no errors, and this pdm dot log file got created. Again, if you come from the No JS world, this is the package, the package log file. And this contains information about all of the dependencies, sub dependencies that got installed. Last thing that we are going to do, we're going to see we're going to create a dot EMV file. And this EMB file is going to contain our OpenAI API. We're going to tie OpenAI APIKey and here we're going to paste our OpenAI APK. One stratege or one best practice is to have atnbtEample file. So this EMB example is not going to be in the GitknT can be safely committed to the repo. We just put the same information as we have in the EMV file, but without the values. Here we didn't have the value yet. Here we will never have the value. That's a good practice so that anyone that clones the repo knows that, hey, I have to have an Open AI API king. That's it for this lesson. I hope you like it and see you in the next lesson. H 3. Getting an OpenAI API key: Hi and welcome back. Before we forget, let's get an Open AI API key. Go to open.com, go to products and go to API Login. They change this all the time. If you don't have an Open AI account, sign up and if you have one, then login. Going to login with my account. And go to the dashboard. And go to API keys. First of all, you have to have billing enabled. So go to settings, go to billing at the payment method here, and you can put $5. I think it's the minimum. So don't forget to do that. Let's go back to Dashboard, API keys, and let's create a new secret key. So this is going to be called Archive and the project, you can scope these API keys to projects. I'm not going to scope it for a specific product because I don't have any projects. So I'm going to use a default project, and the permissions, you can be very strict and choose what you want, for example, only for models, for audio, for chat completions, for embeddings. I'm not going to complicate myself. I'm just going to say, Oh, I'm going to create the secret key and this is going to be my secret key. Obviously, I'm going to delete it after I finish building this course. We now I'm going to copy it. I'm going back to Vs code and in the dot EMV file, I'm just going to paste that. That's the API key I'm going to use before we forget, let's do a comic. Let's type kit, have everything. Git commit and we're going to say a initialized project with PDM and adds dependencies. Then we can push these chains. Okay. Now if we go to Github and refresh, we're going to see that the dotnB file is not committed. TheNB example is committed, but that doesn't have anything. So that's safe. Otherwise, people will see your API key and that will be bad. Okay, so that's everything for this lesson. I hope you like it and see you in the next lesson. 4. Understanding LlamaIndex and RAG: Hi and welcome back. So what is Lama index? First of all, what is the problem with LMS? They are great, but they are pre trained on large amounts of publicly available data. How do we best augment LMS with our own private data? That's where Lama index comes in. We need a comprehensive toolkit to help perform this data augmentation for LLMs. So Lama Index offers data connectors to ingest your existing data sources and data formats like APIs, PDFs, dogs, even SQL data. It also provides ways to structure your data so that it can be easily used with LMS. We're going to see that the most common way of structuring this data is through a vector index, and it also provides an advanced retrieval query interface over your data. You can fit in any LM input prompt, get back context and knowledge augmented output. We cannot talk about ama index without talking about RAC. What is Rag? Rag is retrievable augmented generation. This is an approach in natural language processing that combines the strength of two king components. Information retrieval, which fetches relevant data from an external knowledge baase, database, or document repository, and text generation, which is using a language model such as OpenAIs, GPT four Omni or whatever to generate human like text based on the retrieved information. How does RAG work? First, a user poses a question or query. Then we retrieve relevant documents from an external source and then we generate a response using a language model that uses the query of the user, and the retrieve context. So why do we have to use RAG? Because we can have access to up to date information, allowing AI systems to incorporate knowledge beyond their training data. It improves accuracy under responses because now responses are based on verified and retrievable sources and we can have domain adaptability. We can easily tailor the system to specific industries or topics by linking to specialized knowledge phases. This is rack in an image. The first step is to retrieve and ingest these documents, pass them through an embedding model and storing those embeddings into a vector database. The second step is the user poses a query. Then this user query goes through the same embedding model and then we're going to search in the vector database, context or passages or documents that are very similar to the query that the user is posing. When we have that context, we're going to have the query, the context, and some prompt that we are going to design. We're going to pass all of this information to the LLM, and the LLM is going to generate a response, and that response is going back to the user. That's Rag in a nutshell. I hope you like this video. See you in the next lesson. 5. What are agents?: Hi and welcome back. Now let's talk about agents. What is an agent? An agent is an automated reasoning and decision engine. It takes in a user input and can make internal decisions for executing that query in order to return the correct result. The key agent components can include but are not limited to breaking down a complex question into smaller ones, choosing an external tool to use, plus coming up with parameters for calling that tool, planning out a set of tasks, storing previously completed tasking, a memory module, et cetera. So agents share five fundamental building blocks. Perception, reasoning, memory, planning, and action. The first building block is perception. Perception is the agent's ability to gather information about its environment. This goal involve processing text queries, analyzing sensor data, interpreting images, or even reading structured data tables from a database. The more effectively an agent can perceive the richer the context it can understand. With a stronger perception, agents can better adapt to changes and respond accurately to evolving conditions. Then we have reasoning. Reasoning is where the agent makes sense of the information, it has perceived. This involves interpreting contexts, waving different options, and forming logical conclusions. Reasoning underpins an agent's intelligence. It ensures the agent doesn't just react blindly but evaluates scenarios to make informed decisions. Advanced reasoning often involves leveraging large language models or other AI frameworks to understand the nuances of a given situation. Memory, memory is the agent's way of retaining relevant information over time. This can include short term context like last user request and long term knowledge, like a database of past interactions or general industry expertise. Memory gives the agent a sense of continuity. Instead of treating each interaction as an isolated interaction, the agent can build upon previous experiences, improving its accuracy and context awareness as it goes. Then we have planning. Planning is where the agent decides what steps to take to achieve its goals. It might break down complex tasks into simpler steps, sequence them in an optimal order and anticipate potential roadlocks. Planning ensures that the agent isn't just reacting to one request at a time, but proactively charting a path towards longer term objectives. This is crucial for tasks like supply chain optimization or project management or any other scenario where action taken now have future applications. We have action. Action is the actual execution of the agent's decisions. For example, sending an email, adjusting inventory levels, recommending a product or performing a system level operation. Without any action, all the perception, reasoning memory, and planning in the world will be wasted. Action closes the loop and allows the agent to have a tangible impact on its environment, delivering a real world result. How do they work together then? The perception fits the agent with data. Memory stores and recalls useful information from both the immediate and distant past. Reasoning uses that data and context to form a plan, and the plan maps out the steps needed to achieve the agent's goals and the action executes on those steps creating measurable value. There are tons of use cases. Anything can be an agent nowadays. We have software engineering agents, AI phone agents, sales agents, research agents. That's what we're going to do right now. AI chief of staff which can streamline daily operations, that's pretty crazy. Sale research assistants, agent staff accountant, month and close AI assistant. There are a lot of use cases that agents can have nowadays. How AI agents will work in practice? Well, you have to train the AI agent. You have to provide your use case, data, and playbook to tailor the AI's capabilities to your specific needs. Input data such as transcripts, call recording, invoices, qualification criteria, and key objectives for accurate adaptation. Then you have to configure the workflows and the integrations. You have to align the AI agent with your existing tools and processes. For example, setting up SIMDs integrations with CRMs, calendars, and business systems while defining actions, alerts, and escalation protocols that match your team's requirements. Then you have to deploy and manage the operations. You have to launch the AI agent to handle operations autonomously, track its performance through real time metrics, evaluate outcomes, and refine processes to achieve optimal results. 6. Vector embeddings: Let's talk about vector embeddings. Vector embeddings are numerical representations of data. Data can be text, images, or audio in a high dimensional space. That means they are vectors with a fixed dimension. Each piece of data is converted into a vector and it captures its meaning, context, or feature. How it works similar items are represented as vectors closer to each other. That's how you enable easy comparison. For example, the word king and queen will have vectors that are close together reflecting their semantic similarity. How do you measure that similarity? Well, there are different metrics that you can use, like the dot product or the cosine similarity. What are vector indexes? They are data structures that organize these vector embeddings for efficient search and retrieval. They allow AI systems to find the most relevant data points quickly based on similarity measures like cosine similarity or dot product, these metrics that I was talking about. They spit up queering large datasets and they enable scalable real time retrieval of relevant information. For generating these vectorumbeddings, we need an embedding model. In this course, we're going to use the text embedding three large, which is an advanced embedding model developed by Open AI. It is designed to transform text into high dimensional vector representations, capturing the semantic meaning of the input as we already explained. What are the key features of this embedding model? It can generate high quality representations. It produces embeddings that effectively capture context, relationships, and meaning within the text. It is versatile, it is suitable for a wide range of tasks such as information retrieval, similarity search, and clustering, and the dimensionality that puts dense, high dimensional vectors optimized for downstream tasks. I hope you like this lesson in the next one. 7. Creating a tool to fetch papers from arXiv: So let's start coding. First of all, we're going to create a tools pile. From the terminal, touch tools dot PY. In this module, we're going to use the Archive library. So if we go to the documentation of the Archive library, you will see that Archive is just a Python wrapper for the Archive API. So Archive already has an API and this library is just a wrapper to use that API in a simpler way. So how do you install it with PIP? We did it with PDM. We import Archive. So let's do that. Import Archive. And then we can fetch results by first constructing the default API client. Let's build that client. Then, for example, we can search for the ten most recent articles matching the keyword quantum. Then we just search that and we can iterate over those results. Results is a generator, so we can iterate over its elements one by one. And there's also an advanced query syntax documentation and it tells us to see the Archive API user manual, which is this link here, and the query can be something like this. AU means author. If I'm not wrong, and TI I don't know. Let's go to that link and see those prefixes. AU means, TI means title, ABS, abstract, comment, et cetera. If we want to search for all of these things at the same time, we just say all so that's what we're going to do in our function. We're just going to use and column some topic. That's what we're going to do. Let's start coding. We're going to define this function called fetch archive papers, which is going to take two parameters. The first is the title or the query, whatever the second parameter is going to be the papers count. How many papers we want to fetch. So the search query is going to be all column, and the title. That's going to be our search query. Then we're going to do this. I don't know why this is not indenting by itself. So one thing that we need to do is select the Python interpreter. I'm going in Mac is Command Shift pin in Windows, I think it's Control Shift pin and we're going to tap here select interpreter and we're going to select this dot V&V file. Now it has more capabilities, the code editor because it knows we're using this virtual environment. This query is not going to be quantum, it's going to be search query. The max result is not going to be ten, it's going to be the papers count this parameter is going to be passed or constructed by the agent. You're going to see that in action in a few lessons and then sort by the submitted date. That's okay. Now we're going to initialize an empty array of papers and we're going to um get the results. Results client that results search. Now we have a generator and we're going to iterate over each of the results. For result in results, our paper info is going to be a dictionary with a title. Result that title Look at this result title and this has more stuff. How do we know what the result has? What other attributes it has? Well, if we hover over result, we can see that is of type result. What is the type result? Well, we can Control click or Command click on the archive module here with Command F or Control F here on Windows, we can search for document, sorry, not for document for result. So we have to get our hands dirty here and you can see that this is the definition of result. It has an entry ID, which is the URL, the updated, when the result was last updated, the published, when the result was original published, title, the authors, which is a list of authors, the summary, which is a string, comment, authors comment to present, journal rev, the toy, et cetera. Let's see what this author is. Author is another type definition which only has the name as an attribute. With all of this information, we're going to get more attributes. The second attribute is going to be summary, resolve that summary. We're also going to get the published result that published. We're going to get the journal ref, we sold the journal ref, we're going to get the Di, we solve that Di A obviously you have the auto completion from the code editor. We're going to have the primary category, primary category. We are going to have the categories, we solve that categories, and also GitHub copilot is helping me a lot. We solve that PDF URL, and the Archive Archive, resulted Archive, URL, and not the archive. It's the entry ID and the authors, which is going to be an array, author dot name for author in resulted authors. Remember that this was a list of authors. That's going to be our paper info, and we're going to append this paper info to the papers list. Pen paper ink. Finally, we're going to return all of the papers. This is our simple function to fetch archive papers, and as you will see, this is going to be also a tool for the agent. That's what we're going to do for this lesson. I hope you like it. See you in the next lesson. 8. Creating a tool to download papers: Hi and welcome back. Now we're going to code our second tool. As you can see, these are only Python functions, so they are very easy to code. This second function is going to be a download PDF tool, which is going to receive a PDF URL, which is going to be a string and an output file name, which is going to be as well as string. For this, we're going to use the requests library. To make a request to download the PDF, and we're also going to use the OS library to create a directory if it doesn't exist because we want to get our project organized. So we're going to, first of all, try to create a directory called papers. And if it already exists, then don't create it. Don't throw an error, live with it. Okay? And here, we're going to put the accept pass. We're going to specify the error layer. Now we're going to declare the full output path, and we're going to say OS, the path, the join, papers, and output filing. That is going to be the full output path, the papers folder concatenated with the output filing. Then we're going to get a response by using the request library, we're going to make a GET request to the PDF URL. And if there is any error, we're going to raise. Raise for status. So this method just raises H TTP error if one occurred. In the except, what we're going to do is accept requests dot exception, dot request exception as E. We're going to return the strings and error, and we're going to print that error. If nothing happens, then we are going to open the full output path with the right permissions, WB, and we're going to name that file and we're going to file that we the response dot content. And we have to return something. We have to return string saying PDF downloaded successfully and saved us and we're going to put the full output path here. Okay? So as you can see, this is a very, very simple Python function that just downloads something. It can download a paper, it can download adjacent, whatever. In this case, we're just going to download the paper from the PDF URL. That's it for this video. Very, very simple. I hope you like it and see you in the next one. 9. Defining the embed and LLM models: Okay, so let's continue coding. We're going to create a new file called constants in the terminal, touch Constance PY. In this file, we are going to declare the embed model and the LM model to reuse it when we build the index and when we build the agent. First of all, we're going to call the Load dot M function. First, let's import from the load dotM load dot. What this does is load all of the environment variable from the ENB file. Now we can access the OpenAI APIKey with the OS module. We're going to import and let Github Copilot help me write this os dot OpenAI APK. It matches this nice. Now we're going to say embed model is going to be open AI embedding and we're going to import that from Plama index dot embeddings OpenAI, open AI from there, and I'm missing embeddings to this here. Open AI embedding and also open AI embedding model type. We have something else. Embedding, no, this no. We don't need this. We only need these two. We're going to create an instance of the open AI embedding. We're going to pass the API key, which is going to be the open AI APN key, and the model is going to be open AI embedding model type. Look at this, we have Ava, Baba Cree, DaVinci. These are very old models. The newer ones are embed Ada 002 embed three small and bed three large. We're going to use the embed three large. And for the LM model, I'm going to import from Lama index dot OpenAI, Import Open AI. We're going to create an instance of open AI. Again, we have to pass the API key, which is going to be our AI, I API key, and the model, which in this case is going to be string, and it's going to be GPT for mini. Okay, so these are the two models from Open AI that we are going to use. I want you to notice something. When we imported things from embeddings and from LLMs, we only had the Open AI option. That's because by default, Lama Index only has that plugin by default, only the open AI stuff. But if you want to use, for example, clot or Mistral AI or whatever, then Lama has all of these connectors that we talked in this slide, but you have to install them separately. I hope you like this video. See you in the next one. 10. Building the index and saving it locally: Hi and welcome back. Now we are ready to build the index. In order to build the index, we're going to create a Jupiter notebook called build Index, the terminal, touch build index IPYNB. Great. First of all, we're going to add a cell. And here, if you don't have the VENV, you can click. It will probably tell you select kernel here and you're going to choose at VENB which is the virtual environment, sorry for this project. Make sure that dot VENV is selected. So first of all, we're going to build our index. That means our knowledge base. That will be papers of a specific topic. That we already have them in our database or repository. So first of all, from tools, we're going to import the fetch archive papers and we're going to fetch some papers. The topic is going to be or the title, language models, that's say language models. We want to learn about language models, and the papers count is going to be ten. Okay. You can download 100 papers if you want. The size or the number of papers doesn't matter. Remember, indexes are built to handle queries in large datasets. Okay, so we're going to execute the cell, and we're going to print the titles of the papers that we retrieve. So paper, title or paper in papers. So these are the papers that we fetch that are related to language models. Okay, now that we have this, we're going to create a function called create Documents from papers. And what is a document exactly? Well, that is just a generic interface for a data document. So this document just connects to data sources. We're going to pass text that is going to contain the information of the title of the paper, of the authors of the summary of the published, of the journal reference, DOI, the primary category, the categories, the PDF URL, and the archive URL. We're going to put all of that information into single string and then we're going to pass that single string to the document interface of Lama index. First of all, from Lama index dot core, we're going to import the document, and then we're going to create documents from papers, and we're going to pass the list of papers. First of all, we're going to initialize an empty list and then we're going to iterate over these papers. So the content is going to be a single string with the information of the title, and I'm going to let Github copilot do the boring work here. The authors is going to be a list of authors separated by a comma. Remember, authors is a list. Remember this? It's a list. We're also going to put the summary we're going to put the published information. We're going to put the journal reference, journal reference. We're going to put the Di the primary category. The categories as well is list. Although we didn't make any processing of that list here, we just put that and look at this. All of the results categories, that is a list of strings. We can join all of those strings by this command. We're going to have the PDF UL and also the archive URL. Which this time Githukpilot fail. Oh, no, it didn't fail. It's archive URL. Yeah, Archive URL. Okay, great. So now we have our string. What we're going to do now is append the content to a document. And the document disappeared, the import disappeared because I have a setting that removes unused documents save. Okay. So let's bring it back and say that the text is going to be content. And obviously we have to return this list, right. And now let's call that function. Let's say documents, sequel to create documents from papers, papers. Okay. And let's see this list. So this list is a list of this document object, and each document object has an ID. It doesn't have an embedding yet. It has empty metadata. This can be useful in many applications having metadata, but we are not setting any metadata to this document. Although if you want, you can put metadata here and put any Python dictionary here. A information you want. Let me close this again. It also has more attributes, but the one we are interest is the text resource, media resource, text, and this is the string that we build. You can see it can be a very long string. Okay. So now that we have this, we're going to build our index. So how do we do that? Let's first import from ama index dot core. Let's import settings, and let's import vectors store index. Also from constants, let's import the embed model because remember, we need to pass the text through an embed model. Okay. So first of all, we're going to say settings chunk size is going to be equal to 1024 settings Chunk overlap equal to 50. I'm going to explain what is this in a moment. Let me first create the index. I'm going to say vector store index from documents. We're going to pass the list of documents and the embed model is going to be the embed model that we instantiated in that constant. Remember this is text embed three large. Okay, so what are these chunk size and chunk overlap? Okay, so chunk size sets the chunk size property to this number. That means that the data, the text here will be processed in chunks of 1024 units. In this case, units means characters. If for example, this text has 2080 sorry, 2048 characters, then it's going to be split into two chunks, but not quite because we have this other setting called chunk overlap. The overlap means that there will be an overlap of 50 units between consecutive chunks. This can be useful for ensuring the continuity between chunks when processing data. That means that one chunk can have some context of the consecutive chunk and vice verse. So these two settings are very important. They are called hyperparameters because these can be 128 if you want to keep more context, but you're going to have more chunks. So these are good defaults for these two properties. Now that we have this, we already have our index. Great. So behind the scenes, this is actually calling the OpenAI API to convert all of these into vectors. It's using the text embedding three large embedding model. Convert everything into vectors. Now, we can store this index by using the storage context, that persist method, and we're going to store this in a folder called index. Right. Now we have this index folder with all of these JSON files, and this is something that we probably want to have in the Git Ignore. So let's add the index here in the Getting because this is dynamic. If you search for something else, the index is obviously going to change. That's it for building an index. You can see that with Lama index, this is so easy. This index obviously is a local index. We can use a cloud based index like Pine Cone, service like pinecone, or we can use more sophisticated tools like Chroma TB, which is also a local vector database, but we have to deploy that. There are other services, Cloud services like Vate to store these indexes in the cloud. They use AWS behind the scenes or GCP. But for now, we're going to store the index locally in this folder. I hope you like this video and see you in the next lesson. 11. Creating the RAG query engine tool: Hi and welcome back. Now we are going to start building the agent itself. Let's create another file called agent IPYNV. This time, here you have to select the kernel. Remember, select VENV. Great. First of all, we're going to load our index from storage. That's the first thing. Everything that is stored in these JSON files, we're going to load that. Lama Index has a method called load index from storage. So from Lama index dot core, we're going to import the Starch context and the load index from storage. We need to import the embed model so this starch context is going to be storge context from defaults, and the persist directory is going to be index. Now we can load the index with this load index from storage. We pass the storage context and we pass the embed model. Great. Now we have our index our local index loaded. What we're going to do now is to build a query engine to. We're going to see how that query engine tool works behind the scenes. From Lama index.co dot tools, we are going to import the query engine to. We're also going to import from the constants the LLM model. So the query engine is going to be the index, but this index has a method called as query engine. We have to pass the LLM model, which in our case is going to be GPT four Omni, and we can pass another parameter called the similarity. Top K, similarity, top K, and we're going to say five. We're going to retrieve at maximum five vectors when we submit a query, we're going to find a maximum five similar vectors. Now we're going to define the RAC tool as a query engine tool. And again, the import disappears, so we have to do it again. Core tools, import query engine to. From defaults, and the defaults are going to be the query engine. We have to also provide the name of this tool. So the name is going to be research paper, query engine tool. And it's also a good practice to give it a description. And this description is going to help the agent know what this tool is all about. So I'm going to say that this is a rag engine with recent research papers. So this is the tool the agent is going to use in order to fetch information in our existing database or in our existing repository. Now I want you to show I want to show you the prompts that this query engine uses behind the scenes. By default, Lama Index uses a refined prompt before returning an answer. And we're going to learn more about this in a moment. First of all, let me import from iPython that display. I'm going to import markdown and display. These are just utility functions to see things a little bit nicer here in the screen. I'm going to define this display prompt dictionary, and I'm going to pass prompt dictionary for key prompt and prompt digt that items. I'm going to display some markdown here. The markdown is going to be the prompt key. And all of this is going to make sense in a moment. Just bear with me. And prompt that get template. Now that we have defined this function, we're going to say prompts dictionary is going to be the query engine that get prompts, and we're going to display the prompts. Okay. Query engine. Okay, we haven't executed this cell. Okay. Here it is. So this query engine that we defined here has two prompts. The first one is this response synthesizer text QA template, and the second one is this response synthesizer refine template. So when we retrieve the relevant information, one chunk of information, it's going to use the most relevant chunk Right. It's going to answer the question or whatever query the user post using that chunk and using this template. Context information is below. And given the context information and not prior knowledge, answer the query. The query is whatever we as a user put and the answer is the LLM answer. After it has an answer, it's going to iterate over the other chunks, the other relevant pieces of information. Because remember, we're going to have five of these at maximum five, with the other four, it's going to use this template here. The original query is as follows. We have provided an existing answer so this is the answer from this prompt here, we have the opportunity to refine the existing answer only if needed with some more contexts below. This context is going to be another chunk, another relevant piece of information. Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer. This is called a response mode in Lama index and we have this documentation here. Response mode by default is refined, create and refine answer by sequentially going through each retrieve text chunk and this makes a separate LLM call for node or retrieve chunk. The details, the first chunk is used in a query using the text qa template. Using this template here, using the most relevant chunk is the LLM is going to retrieve an answer. Then the answer and the next chunk as well as the original question are using an query with the refined template prompt and so on until all chunks have been parsed. With the consecutive chunks, it's going to use this template here and it's going to have a refined answer. If a chunk is too large to fit within the window, considering the prom size, it is split using a token text splitter. Allowing some text overlap between chunks, and the new additional chunks are considered as chunks of the original chunks collection. This is only if the chunk is too large. There's another response mode called compact. So the compact, similar to refine, but compact concatenate the chunks beforehand, resulting in less LLM calls. So instead of going through the other four chunks separately, it's going to merge all of those other four chunks and run this prompt with those contents merged. So this is very important that you understand what's happening behind the scenes. An agent has the ability to correct itself by using this creative refined technique. We're going to keep this approach, and we're going to end the lesson here. I hope you like it and see you in the next lesson. 12. Building and interacting with the agent: Hi, and welcome back. Now we're going to define the other two tools that the agency is going to use. Let's import from tools, the download PDF, and the fetch archive papers. In order to define these tools, we're going to import from Lama index cord tools. We're going to import function tool. And we're going to define download PDF tool, which is going to be an instance of this function tool, and we have to pass the function itself, which is download PDF. We have, again, to give it a name. So I'm going to call this download PDF file tool, and also we have oh, sorry, function tool from defaults. And we also have to give it a description. It is a best practice to give it a description. So I'm going to say that this is a Python function. That downloads PDF file by link. That's our PDF download tool and we're going to also define another tool called the FEG Archive tool, which is going to be the same thing. We pass the PE Archive papers function, give it a name, fetch from Archive. And we're going to give you the description saying download the we can put here Max PursultsRcent papers regarding the topic. We can put that placeholder there from Archive. C. And we have to close this. Okay, so now that we have defined these two tools, we're going to create a new cell, and we're going to create the agent. So from Lama index dot core, that agent, we're going to import a react agent. So we have multiple things here, we're going to create an instance of a react agent. Why react? Because it's going to This agent operates in two main stages. The first stage is reasoning, so it receives a query, the agent evaluates whether it has enough information to answer directly or if it needs a tool, and then it acts. If the agent decides to use a tool, it executes the tool and then returns to the reasoning stage to determine whether it can now answer the query or if it needs more tools. So it's as easy as saying react agent from tools. And pass a list of tools. So download PDF two, Rack tool, and fetch Archive two. We have to provide an LM that this is going to use. We're going to pass our LM model, GPT 40 Mini. Last but not least, we're going to say Vervos true so that we know what's happening behind the scenes, all the logs that this agent throws out. And that's it. Now we have created an agent. So we can start chatting with our agent. For this, we're going to need a query template. Let's create a query template, and probably we're going to refine this in future lessons. So this is going to say, I am interested in some topic, right? Find papers in your knowledge database related to this topic. Use the following template to query research paper, query engine tool to. I'm going to say provide title, the summary, authors and link to download four papers. Let me see four papers related to topic. Period. I have Whoops. And I'm going to say, if there are not, could you fetch the recent one from Archive? From Archive. Okay. So, this is a query template I have written let's see if this works. Let's create a new self and say answer equals to agent that chat. We're going to pass the query template and we're going to format to give you the topic. The topic is going to be multi model models. I expect from this list of papers that, for example, multi model models is going to be retrieved. And probably something else. But I don't know what other papers are going to be retrieved because this is just the title. And remember these searches for the summary, it has summary, it has categories and all that stuff. So probably it's going to fetch other papers as well. So let's execute this, and you can see the output is saying running step and this ID, the step input is the query template with the topic, which is multi modal models. Now the thought of the agent is the current language of the user is English. I need to use a tool to help me answer the question. So the action it's going to take, it's going to use the research paper, query engine tool, and the input is going to be this, provide title summary, authors, and link to download for papers related to multimodal models. So what this is doing is using this query engine. Remember, the query engine uses GPT four omni, it has the ability to provide an answer following this template here. Following or giving the title summary authors and link to download the paper. The observation is that it will return. All of this is generated by GPT 40 M. It's giving the title, the summary, the authors, and the PDF URL. We can better visualize this response by using this Markdown class. I'm going to say Markdown, answer dot response. Okay. Now we can visualize this better. This is the response from GPT 40 Mini. It just gave me a list of four papers. This is great. In this lesson, we're going to end here. In the next lesson, we're going to see if it can download all of these papers. I hope you like it. See you in the next lesson. 13. Downloading the papers and fetching new papers: Hi and welcome back. Now, in this lesson, we are going to download all of these papers. The agent, remember one of the features of the agent is that it retains memory of the tasks that were already completed. Since the agent retains this chat history, we can request to then load the papers without mentioning them explicitly. Okay. So let's type in this new cell answer agent that chat and tell the agent download all the papers you mentioned. Let's see what happens. So it is running this step and the step input is to download all the papers you mentioned the action or the thought, first of all, is I need to download multiple PDF files based on the provided URLs for the papers related to multimodal models. The action is download PDF file two. The action input is this PDF URL, and the output file is this one. This is only for one paper. You can see that in this folder, it only downloaded one paper. And then it is going to the other paper, which is cross lingual text reach visual comprehension, which is the second one. But look at this. Now it is saying that the thug is saying that it can answer without any more tools, and it still uses the um, it is not going through an action. The answer is action download PDF two. It is not downloading the second file nor the third or the fourth one. We will fix this by doing some prompt engineering in this chat. But for now, let's continue and well, let's put the answer in markdown so that we can see this better. Can see that the answer is download this use the download PDF tool and it is telling us the last thing it did. This one. This is not accurate. We're going to fix this. But for now, let's see what happens if we ask about a topic that is not available in this list that we found. Let's go and interact with the agent once again and let's talk about agent, query template, and we're going to format with a different topic like quantum computing, something like this. None of these papers talk about quantum computing, I think. Let's see what happens if we talk about this topic that is not available. We're going to obviously see the answer better here. But let's see the reasoning process. Now the template is talking about quantum computing. The thought is the current language of the user is English. I need to use tool to help me answer the question. The action is to research paper, query engine tool. The input is this template, but it looks like there's nothing related to quantum computing. So the thought is, it seems there are no papers available in the knowledge database related to quantum computing. I will fetch recent papers from Archive. Now it is using the third tool, which is fetch from Archive, and the action input is the title quantum computing and the papers count five. Okay. So now I found these five papers here. You can see probing entanglement, scaling across a quantum phase transition on a quantum computer, uniform additivity, or whatever, this complicated stuff. But it definitely found new papers without us intervening in this process. This is cool. The next step in the next lesson, what we're going to do is to actually fix the problem of downloading the papers. But 14. Enhancing the prompt to download files: Hi, and welcome back. So first of all, let's do a commit, so we don't lose all of our changes. I'm going to delete this paper so that next time that we fix this papers issue, you're going to see that everything is fixed. So let's add everything. Let's add a commit message saying, build first version of agent, and let's do a push. At. Now, what we're going to do next is modify this prompt here, download prompt because right now it is very simple, and maybe that's not the best way to do it. Of course, you can use Chat GPT or clot to enhance this prompt here, that's what I did. In fact, the prompt that it came up with to fix this issue, was an iterative process, first of all. I tried this multiple times to fix this issue. I tried multiple prompts, and this is the one that fixed the issue. I also tried other approaches, and I'm going to invite you to try other approaches. We're going to discuss that in a moment. For now, I'm going to tell the prompt, download the following papers and for each paper. I'm going to say, first of all, oops. What's happening? Process one paper at a time. Second, let's do it like this. Like this. Yeah. Second, state which paper number you are processing out of the total. Third, I don't know why this is Okay. Complete a full download cycle before moving to the next vapor. Fourth, explicitly state when moving to the next paper, and fifth, provide a final summary only after all papers are downloaded. So these are going to be the new instructions for the download step. Okay? So let's see what happens now. I'm going to run everything again just to be clear, clear all outputs and run all and let's see what happens. It's fetching again the four papers related to multimodal models. Now it's here. It is saying the thought I will start downloading the papers one by one, processing each paper in the order they were listed. Action, downloading the PDF two, the second action is again downloading the PDF two. But now for the cross lingual text Rich visual, the third action is the same using the download PDF two but with these comprehensive multimodal prototypes, and the fourth action is the same but with a chat garment. Rat. Now it downloaded the four papers. Now let's see what happens when it fetches the quantum computing one. Oh, oops. Now it's downloading the quantum computing ones as well. So we fixed part of the problem. We managed to download the four papers that we explicitly set to download, but now it is also downloading the other ones. So now we have to do some other thing for avoiding this situation and you can see this also failed this last step. Let me delete all of the papers again, for some reason, here, it is not explicit that it doesn't have to download the papers. Again, I went through an iterative process. This is trial and error and this one is simpler. You just have to say, do not download papers. Unless the user asks for it explicitly. So by just telling the AI, Hey, do not download the papers unless the user says, so let's see what happens. I'm going to clear all oututs again and run all. This time, I had to modify the create template, not the download template. So now, it is again looking in the database or in the index. What are the multimodal models paper? It found the four papers, and let's see what happens. All of this process is prompt engineering, just as when you make prompt engineering for HAGPT, you can also do prompt engineering for agents. Let's see what's happening in the papers folder. I downloaded the four papers successfully. The multimodal papers. To 23 seconds. And now it is fetching for new papers related to quantum computing. What I expect now is that it doesn't download the papers of quantum computing. Here it is. You can see now that it this time only found one paper, but that's good. It at least didn't download all of the papers. That's a good sign. Let's do a commit here. SG commit, build second version of the and let's do push. This time with the papers included. That's it for this video. Hope you like it and see you in the next lesson. 15. Building a class to manage the index: Hi, and welcome back. So what we're going to do now is to create two classes. One class is going to be called the index manager, and the second class is going to be called the agent class. So these two classes, their purpose will be to capture all of the logic that we created here in these Jupiter notebooks into a class so that we can reuse that and we're going to build a stream lit up that uses these two classes so that things get a little bit easier to manage. So first of all, we're going to create the index manager class. So here we're going to define a class called index manager, and we're going to create a constructor. And here, we're just going to have as a parameter the embed model. Self dot embed model is going to be the one we pass here. We're also going to define this empty array of vapors. Okay. So now we're going to define a fetch papers method, which is going to receive as parameters the topic we want to fetch and the papers count, which we're going to default to ten as we did in the Jupiter notebooks. So self dot papers, instead of being now an empty array, is going to call these fetch archive papers with a topic, and the papers count. Topic. Great. And we need to col in here. Okay. Now that we have these fetch papers, we're also going to create this method called create documents from papers. This method is going to be the exact same as this one. Actually, let me just copy and paste everything. Indent this. But this time it's not going to receive papers as a parameter because papers is already initialized here. I'm just going to say for paper in self dot papers. Also I'm going to get rid of these documents here. We're going to initialize documents somewhere else. We're going to say self dot documents append document. We need to import document from Lama index, right? And we can return the documents or we just don't do it, that's up to you. I really don't care. I'm just going to return them. Okay, and it says documents doesn't exist. That's why that's because we haven't defined documents. So well, let's do it here. Doesn't harm anyone. Let's do it here. Self documents and terrain. Now we're going to create a create index method, Dev Create index, and this is going to call this create document from papers function, and it's also going to gather and execute this logic here. Okay. So something like this, went we invent this and we have to import settings and vector store index. Okay. And here, we're going to assign this to an instance variable called index. Also, embed model, we have that in the constructor, so we can call self that embed model, and documents is here, self the documents. So I prefer initializing this here. There you go, and there you go. This is just a matter of organization. Doesn't affect the result. But now we have this class that has this method to fetch the papers. I will populate the papers array. Then after executing this method, you can create the index, and then we need another method to retrieve the index. This method is going to do the same thing that we are doing here. This same logic, we're going to put it here and obviously we have to import things, vectors to index, load index from storage, for some reason, this got duplicated. Bed model is going to be self taught and Bt model. And what else? We're going to not assign this but return this rate. So I think this is everything we need to do for this class. We can also define an arr method just to print the titles of the papers. So I'm going to say list papers, and we're just going to copy the logic of this. Like print Paper tile for paper Iself the papers, to show things if you want. That's it for this class. In the next lesson, what we're going to do is create the agent class. Don't forget to make a commit adds index manager Cass. Push. That's it for this lesson. I hope you like it. See you in the next lesson. 16. Building a class to interact with the agent: Great. Now we're going to build another class called the agent class. Let's create another file called agent dot PY. Here we're going to define this class called agent. In destructor, we're going to get the index and the LLM model. Self dot index is going to be index and self dot LLM model is going to be LLM model. In Deconstructuor, we're going to build the query engine, the RAG tool, the PDF download tool, the Fetch archive tool, and the build agent method to build the agent. So basically, take out the logic from this Jupiter notebook into a class. So first of all, build, build query engine. What this is going to do is basically do this. Take this line of code and put it here. We're going to say self dot query engine is going to be equal to self dot index as query engine, self dot M module, and similarity top K equal to five. This can be also parameter of the constructor, but let's hard code it to five. The second method is going to be build RAC tool. And basically, it's going to be this. So self dot Rat is going to be equal to query engine tool. Let's import that in this file and query engine is going to be self dot query engine. Let's get rid of this and there you go. Now, the other method is going to be the built PDF download tool and it's basically going to be just the so we copy this, self dot download PDF to. We have to import function tool and we have to download PDF from the tools file here. There you go. Now, build fetch Archive tool. Again, this is going to be this. Let's just copy and paste it here. Say self dot fetch Archive tool and import the fetch archive from the tools file. There you go. Now we're going to define another method, which is going to be build agent. Build agent is going to be just this here. Let's just copy this, paste it here, and import this at the top. There you go. And these are going to be self dot download PDF tool, self dot g tool, self dot fetch archive tool and self dot LLM model rate. Again, all of these parameters like verbos true. They can be set in the initializer if you want. I'm just going to hard code them and say variables, varios always true. Now I'm going in the initializer to call all of these methods in that order. First, the query engine, then the build Rag tool, then the build PDF download tool, then build whatever pet archive tool, and finally, build the agent. Great. We're missing just one method here. When we initialize this agent class, the agent is going to be initialized automatically, but I want a chat method, which is going to receive a message. It's going to return self, the agent, the chat. Message. Basically, this interaction here, we're not going to pass these query templates anymore. We're just going to pass any message. Message is going to be basically any string here. Okay, so that's it for this class. Again, that's get at Git commit as agent class and push. That's it for this video. Hope you like it and see you in the next lesson where we're going to build a sprint let app using these two classes. 17. Building a chat UI with Streamlit: Hi, and welcome back. What we're going to do now is to build a stream let app. The Stream lead is a Python framework to build apps, especially data apps like they say here. It turns data scripts into sharable web apps in MD, all inter pith. No front end experience required. You go to the gallery section, you will see a lot of examples that people have built with stream it. For example, for LMS, they have built chatbots or chat GPT with memory, a lot of things. You can see the trending ones, Math GBT, portfolio, whatever. And it has a lot of compatibility or a lot of tools, I will say, to build LLM maps. Okay. So we're going to start building this chatbot to interact with the agent. First of all, we need to install streamlt as a dependency. So PDM add streamlet and wait for the dependency to be installed. In the meantime, we're going to create a file called app dot PY. Okay. So here, we're going to import our agent class, and also we're going to import our index manager class. Also, we're going to import from the constants, the embed model and the LM model. Okay. And we're also going to import streamlt as ST. That's how people in the Python community used to import this library. It's still not recognizing it because it's still being installed. And we're going to start building the app while this gets installed. So first of all, we need to understand a concept in streamlt which is session state. So in streamlt we build a script, and this script is run like if it were in a wild loop. So everything we write here will be recreated if we don't put those variables in what's called the session state. There's a way there are multiple ways to cache these variables, and one of those ways is by using a decorator. This decorator is called ST stream lit cache. Here we can define a function, and I'm going to call this function initialize agent. Because we want the agent to be initialized just once. That's why we're caching this resource. We're going to say index manager is equal to index manager, remember, we have to pass the embed model, and then we can get the index by calling the retrieve index method that we built for this class. Finally, we're going to return an agent with index and the LM model. Okay. And now we're going to initialize initialize the agent and the session state. Okay. So how do we do this? We say if the agent is not in the extremit dot session, session state, then we're going to say stream session state dot agent, initialized agent. The first time this script runs, agent is not going to be in the session state, so it will be initialized. We also need to initialize in the session state the messages of the chat. If messages not in stream session state, then messages is going to be an empty array. So stream lead has a way to build chats. We're going to see how it uses the concept of rows. So one part of the chat is going to be the user or the human, and the other part of the chat is going to be the assistant or the AI. So we're going to see how that works in a moment. Then what we're going to do now is to display the chat messages. Remember, all of this is going to be run like in a while loop. So we need to every time this script is run, print the messages. For message in session state that messages, this is the syntax we use to write the messages with SD and this has this chat message variable. And what do we have to put? The name? Name can be the user, the assistant, the AI, or human. User and human are the same thing. Assistant and AI are the same thing. But message is going to contain the role. So we're going to say message dot role. So message role is going to be either user or it's going to be assistant. How do we know that? Because we're going to define that in a moment. Just bear with me. So here to write something in the chat, we use this markdown method, and we're going to print the message content here. Okay. You could say that messages is going to be an array of dictionaries and each dictionary is going to have a role and a content. The role is going to be either user or assistant and the content is going to be the message itself, which can be in Markdown. Now we're going to build the core functionality of this. We're going to say if prompt, and we're going to call this input method. Ask me anything about research vapors. Okay. So what does this mean? This means that if this prompt variable is not initialized, this syntax means, okay, initialize it to be this to be what this chat input returns. This is just a placeholder as you're going to see in a moment. So what we type here is going to be assigned to this prompt. Okay. So now we're going to append to the session state messages, list this dictionary here here because as I am going to input something, I am the user. So the role will be the user and the content will be what's assigned in this prompt. Great. So we appended that, but now we have to show it in the screen. So with SD, the chat message, and this time it's going to be the user. I'm going to mark down the prompt. So I'm going to display what I typed, basically. Then I'm going to display what the assistant responds. With ST chat message, assistant, I'm going to get the answer and the answer is going to be retrieved with the agent and remember that we build this method chat chat and we pass the prompt and this returns something that has a response attribute. Now we're going to print down the answer and finally, we're going to append to the session state messages the answer, but with the role of the assistant. So that's everything. We just build this app in 33 lines of code. And in order to run this, make sure that in your terminal, close the terminal and make sure to open it again and that it shows this. This means that you are inside the virtual environment, sir. And if you don't see that, you can also type PDM use sorry, PDM Ben B ENV and PDM BENV activate. That's the command. So print the command to activate the virtual environment. So you will have to basically type this and paste it. Okay. So now I am inside a virtual environment. Okay. So you should see this. When you see this or if you don't have CSH or Mac, just type this. Obviously, with a correct command, you will type stream it run up dot PY. This will open this chat here, but something is wrong. So let me see what it's cache resource, not cache. So let's try again. Okay, so let's add something that I missed, and it's going to be a tile just to see this a little bit nicer. So st Tile Archive, Papers, Chatbot, yeah. That's a good name. So I hope if you're fresh, you see this title. Okay, so as I told you, this is a placeholder for the chat input. But I can type anything here. I'm going to say, can you fetch papers related to quantum mechanics? And remember, quantum mechanics is not in the knowledge base. So here this little icon shows for indicating that I am the user, and this is the assistant. And here, I'm going to add something else just to make it a little bit better. First of all, I think it returned the answer, but I want to be more user friendly. And here in this line, I'm going to say with st dot spinner, and I'm going to say thinking. So while this is not, while the agent is thinking, we're going to show spinner so that it's more user friendly. Let me just copy this and do it again. Thinking with a spinner. Because remember these takes sometime. This is a more user friendly way to show the result. Let's see what lots are. So yeah, I think it finished, or I think it erred for some reason. Let's stop this and run this again and ask the same question again. Can you fetch papers related to quantum mechanics? So it is using the fetch from Archive tool, but it's still thinking. So let's wait for a moment, and there you go. This is the answer. Okay, so now we have a user interface to see the results of our hard work. Hope you like this video, see you in the next lesson. 18. Getting an API key from Pinecone: Hi and welcome back. So we are going now to store our index into Pine Ce. So pine cone is a service in the cloud to store indexes. That means it's a vector database in the cloud. So as it says here, you can build knowledgeable AI with its vector database at the core. Pine cone is the leading knowledge platform for building accurate, secure and scalable AI applications. So you can create a free account. Obviously, this has a free tier. You can choose the start plan for trying out and for small applications. It's free. You have Pinec serverless, you have Pinec inference and assistant that are some products that they are building, and you have to use the region US east one from eight of us. You can start for free, create an account, then login. I'm going to do that by myself. Here, And then what we are going to do here is to create an index. Click on this button, create an index, and I'm going to say Archive research. Here you can configure the dimensions of those vectors. Remember, vectors have dimensions. Those are the embeddings, or you can choose a preset. Remember, we're using text embedding three large, so we can just choose this directly. And look at this. The dimensions is automatically populated with 3,072 dimensions. So that's the dimension of the embeddings when we use this embed model. Serverless. I'm going to choose Ws. I can choose other Cloud providers because I am on the paid plan and I can choose other regions. But if you're using the free tier, then you are only going to be able to use AWS USEStO you can enable delete protection to prevent any user from accidentally deleting this index. As I'm going to delete this either way, it doesn't matter. Okay. Now you created your index in the Cloud. Now you need some API keys. I created an API key. You can give it a name. You can give custom permissions or if you don't want to complicate yourself for now, just give it all permissions, and then you will have to copy that and copy it in a safe place. In the next lesson, what we're going to do is to create another class called Index manager pinecone that inherits from the index manager and make some tweaks in order to save the data into Pine Cone. 19. Creating an index manager for Pinecone: Hi, and welcome back. So first of all, let's do a commit to save our work. So get at Git commit, and I forgot what we did. So, oh, yeah, we built, streamed it up. That's what we did. We do a Git push. Okay. So what we're going to do now is to install two dependencies, first of all, that we need in order to use piece. First of all, we need the pine cone client, and we are also going to need the Lama index vector stores pinecone. So let's let PDM do the heavy work here and install those dependencies. The other thing is we're going to create an index manager pinecone dot Pyle. And we're going to create another class called index manager Pine cone. But we're going to use inheritance here and we're going to inherit from index manager. So we're going to inherit some methods from the index manager class. So in that constructor, we're going to need, again, the embed model, the index name of the pinecone that we just created. We're going to call the parent constructor, and the parent constructor only needs the embed model. And we are going to create an instance of pine cone. Right now, it is not showing because it's installing, but let's import it ways. So from Pine cone, we're going to import this pinecone. Well, it finished installed, so now it recognizes. So we're going to need the API key. And that is going to get from environment variables. We didn't do that. I don't know why it's doing this. We didn't do that. Let's put here in dot ENB example, pin cone, API key here in the ENB, we're going to put the same thing but with the actual value. So I already have my API key copied. So you should also copy it going to paste it here. And obviously, I'm going to delete it later. Okay. So in order to load that API key, we need deload dot m. So we're going to import from dot EM. We're going to port load dot M. We're going to call that to get the environment b from that EMV file. Great. Now what we're going to do is say self dot Pine Ce Index, pc dot index, and we're going to pass the index name. Then we're going to initialize the vector store as pinecone Vctor store, and we need to import that. We're going to import that from Lama index dot vectorstores dot Pine cone, Import Pine Cone Vector store. Okay. So this vector store, we need to pass the pinecone index, which is going to be self dot pinecone index, what we defined here. Also we need the storage context. I'm going to say self dot storage context equal to, and we need to import the storage context from Lama index dot core, import the starch context. Storage context from defaults, and we're going to pass this time the vector store. Okay. Great. So what we have done so far is to call the constructor of this index manager and assign some other variables here. Now we need to write the create index method from the index manager. Remember, we have this create index. We have to rewrite that because now we're going to store things in Pine C. So we do the same. We initialize an MTRray. We call the create documents from papers, and we set the settings to this. We have to import settings as well. Let's get settings from here as well. Last but not least, we are going to upsert the vectors. Self or vector sorry, vector store index, which I think we also need to import from here. Vector store index. That what we're going to do here is from documents as we did here. Okay. From documents. We're on the self dot index like that. Self dot documents. The storage context is going to be self dot storage context and the embed model or the embed the embed Model will be self dot embed model. Okay. There you go. In order to retrieve the index, we're just going to return vector store index dot from Vector store, and we're going to pass the vector store, which is self dot vector store and the embed which is going to be self dot Embed model. That's it. This is everything we need to do in order to, um upsert data and retrieve the index from Pin C. So if we go here to the database, archive for search, no records yet. So what we're going to do is to create a Jupiter notebook, which is going to be called Pine C and in this UPIter notebook, what we're going to do is to upsert the vectors. From constance, let's import the embed model. Let's select the kernel first. We're also going to from the index manager pinecone import the index, index manager pinecone. We're going to create an instance of that class index manager PyCon, we pass the embed model and we pass the index name, which in our case is Archive research, Archive research. And now we're going to fetch the papers fetch papers, language models. We're going to fetch the last 101. We fetch the papers and now we are ready to create the index. Let's create the index. You can see, oops, we got an error, and that is because it says, I think this needs to be embed model, not embed that model. Let's try again. Let's see if that solves the issue. Yeah, so for some reason, this is embed model, but auto completion didn't work for me. So embed model, not embed. Embed model will upsert the vectors. You can see now we have upserted those vectors into the default name space. And you can see here the vector. These are basically just values. There are 3,072 of these numbers here, and this has some metadata, node content, and node type, doc ID, document ID, et cetera. So basically, these are vectors. We have ten of them, ten vectors. These are embeddings. That's it. There's no more magic to it. We can also retrieve the index by doing this index manager retrieve Index and BLA we retrieve the index, and we can also print the list of the papers that we fetched. So you can see now it's different. You can see video Panda. Actually, we're going to do some change here because this is difficult to read. Let's see. We go to the index manager, and here we're going to say for paper for paper in papers. Just print the paper tile. Stuff the papers. Okay. So that's it. That's everything we need to do. This is not going to reflect sadly because we have to restart the kernel. So let's do this index manager. Sorry. And we're not going to fetch Well, yeah, let's fetch the papers, but let's create let's not create the index again. I'm just going to list the papers and here it is. These are the papers that we have right now. Now what we're going to do is to make a commit to save our progress. Add index manager Pinec and get push. That's it for this video. Hope you like it, and let's continue on the next lesson. 20. Using the Pinecone index in the Streamlit app: Great. Now we have our vectors in Pine cone. We have this data, these titles in Pine C. Test. Let's test if this is actually working. So in our streamlt app, when initializing our agent, instead of using the index manager, we're going to use the index manager pinecone. Okay. So we have to pass the index name, which is going to be Archive research. That's the only change we need to do in order to now get the data from Pin C. Let's test this streamlt run up dot PI. Let's see what the result is now. So I'm going to use this time this query template that we used last time, which is here. Let's see. So let's replace this. I am interested in multi model models and multi model models. So let's see if the agent is capable of retrieving information from this pinecone index and retrieve those papers that are related to multimodal models. If we see here, where is that? Icon. These are the papers of language models. It says that here are some recent papers related to multimodal models. Zero resource speech translation and recognition with LMS and long form speech generation with spoken language models. We have one of them. Where is that? This one, long form speech generation with spoken language models, and the other one is what's the other name? Zero resource speech translation, zero resource speech translation and recognition with LM. I could successfully retrieve the data from PyCon. Isn't that exciting. That's it. Now we have built an agent that stores the index in the cloud. It retrieves that and it works perfectly. I hope you like this lesson and see you in the next one. 21. Deploying the app to Streamlit Community Cloud: Hi, and welcome back. So now what we're going to do is to deploy our app to streamed Community Cloud. As you can see, in the landing page, deploying to the Community Cloud is free. It's free. So you only have to sign up, create an account, connect your GitHub account, and then you will be able to find the repo. It has to be in your Github account and deploy that app. It's a very, very easy process. So what I'm going to do right now before doing this is to create a requirements dot TXT file. Why? Because Streamlt Community Cloud expects a requirements text file to install all the dependencies. It doesn't recognize this pdmt log file, so maybe in the future, they will, but for now, they need the requirements dot textFile. So PDM has this export command. So if you type this, it's going to print that in the terminal, but we want that to be in a file. So we use this greater than symbol and say, Okay, we want that in requirements dot text. So if you open this file, it's going to have all the dependencies that are in the PDM log, but as a requirements dot TXT file. Let's do a commit a requirements dot TXT file. Let's push. And now let's go to the Streamed Community Cloud to the dashboard. Once you sign up, sign up, connect your GitHub account and you can click on this button that says create App. I'm going to choose this option, which is deploy a public app from GitHub. So I'm going to search for the Archive researcher Ripple. Remember, this is my Ripple. I'm going to say that the main file path is app dot PY, and this is a randomly chosen subdomain. Your domain is going to be this that stream app. You can click on Deploy and that's going to deploy your app. But this is not going to work right now because we have to set environment marbles. Remember that we need the OpenAI API key and the Pine hone API key for this to work. If you go back to share that stream dot IO and click on the project settings and go to secret, you're going to be able to set the secrets or the environment bubbles here. I'm going to copy and phase, but you have to put these in quotes. Otherwise, it's going to complain. Save changes, and now the app is going to deploy with those environment bubbles. I'm going to click here If you click on this manage app, you're going to see that it installed the dependencies from requirements. I install all of this and Python dependencies were installed from requirements dot TXT, and that's it. Now you can say hello to the chat. This is an LLM, so it will know that it doesn't need any knowledge base in order to respond to hello. Now we're going to use the template that we had here. But just to keep in mind that you don't need this, you can experiment with other templates, but this is the one that works right now, so I'm going to use this. I am interested in multi model models here, multi models. Let's see if it can process this. I expect logs to show here, but not sure why it's not showing. Let me just refresh this. I'm going to copy this and refresh just to see there's something to this. Yeah. If that doesn't work, I'm going to reboot the app to see if that works. Well, it worked this time, but I don't see the logs here. Usually the logs are shown here, but I'm not sure why they are not showing right now. But anyway, it responded with one of the papers from the knowledge base. Remember, this non form speech generation with the spoken language models was in the knowledge base. So that's it. Now you can share this link with your friends and let them test you app. I hope you like this video, see you in the next lesson. 22. Conclusion: Congratulations on finishing this course. Thank you for joining this journey to learn how to create AI agents. You know how the tools to build, enhance and deploy AI solutions with real world applications. Keep experimenting, stay curious and remember that the possibilities with AI are endless. What are the next steps? Apply your knowledge to real world projects. That's the best way to learn. Share your achievements and connect with the community and keep learning and stay updating on advancements of AI. Your feedback matters. Please take a moment to leave a review or share your thoughts. Your feedback helps improve this course and future content, and don't forget to stay connected. Reach out with questions, project ideas, or just to share your progress. Together, we can make AI accessible and I pat C.

Build an AI Agent (OpenAI, LlamaIndex, Pinecone & Streamlit)

David Armendariz

Watch this class and thousands more

Watch this class and thousands more

Lessons in This Class

1.

Introduction

1:46

2.

Setting up the development environment

15:19

3.

Getting an OpenAI API key

3:16

4.

Understanding LlamaIndex and RAG

3:39

5.

What are agents?

5:04

6.

Vector embeddings

2:37

7.

Creating a tool to fetch papers from arXiv

8:11

8.

Creating a tool to download papers

3:48

9.

Defining the embed and LLM models

3:40

10.

Building the index and saving it locally

11:55

11.

Creating the RAG query engine tool

10:33

12.

Building and interacting with the agent

9:03

13.

Downloading the papers and fetching new papers

5:15

14.

Enhancing the prompt to download files

7:39

15.

Building a class to manage the index

6:13

16.

Building a class to interact with the agent

4:53

17.

Building a chat UI with Streamlit

13:34

18.

Getting an API key from Pinecone

3:09

19.

Creating an index manager for Pinecone

11:29

20.

Using the Pinecone index in the Streamlit app

2:42

21.

Deploying the app to Streamlit Community Cloud

5:31

22.

Conclusion

0:53

About This Class

Meet Your Teacher

David Armendariz

Related Skills

Hands-on Class Project

Class Ratings

Why Join Skillshare?

Learn From Anywhere

Related Classes