Transcripts
1. Introduction: Hi, and welcome to this course, Building an AI
Agent with OpenAI, LlamaIndex,
Pinecone, and Streamlit. I am Vidar
Mendais and I will be the instructor of this
course. Who am I? I have a bachelor of
science in mathematics and a master's degree
in data science and analytics, focusing on LLMs. I'm a full-stack
software engineer with more than six
years of experience. I am AWS certified, Azure certified, and I am a cybersecurity
enthusiast. What are we going to
do in this course? We will create an
LLM agent based on OpenAI's GPT-4o mini model. The agent's purpose will be to find and summarize
research papers from the arXiv platform,
and we will use the LlamaIndex framework to augment the
knowledge base of the agent. What are we going to
learn in this course? We're going to learn basic
concepts of AI like vector embeddings,
vector indexes, retrieval-augmented
generation or RAG, prompt templates, ReAct agents, how to optimize the
agent's instructions, vector databases, LlamaIndex for
augmenting knowledge, Streamlit to build
a UI and deploy it, and Python and software
engineering best practices. What tools are we going to use? The agent will use three tools: a RAG query engine to fetch information from
a knowledge base, a research paper fetch tool to find information
about papers that we don't have in our knowledge
base, and a PDF download tool if you want to download papers directly
to your machine. I hope you like
this course because I really enjoyed building it. I hope I can see you in
the next lesson. Bye.
2. Setting up the development environment: Hi and welcome back.
In this lesson, we're going to set up the
development environment. A few things you have
to keep in mind. We're going to use VS
Code as a code editor. We're going to create
a GitHub repo to share the final result with you,
and we're going to use PDM, which stands for Python
Dependency Manager, to manage the dependencies
of the Python project. We're also going to need an OpenAI API key
to use their models. So what are the dependencies that we're going to
have in this project? We're going to have arxiv, which is a library to download the papers from the
arXiv platform, and python-dotenv to manage the
environment variables. In this case, we're going
to have the OpenAI API key. We want that to be a secret, so that's why we're
going to use python-dotenv. We're going to use notebook
because we're going to use Jupyter Notebook, and we're also going to install
LlamaIndex, which is the framework
to help build LLM apps. So this is the boring
part of the course, but we have to do it. Let's go to github.com, and this is my GitHub account. You go to your GitHub account. If you don't have one,
you can create one, and we're going to
create a new repository. So the repository name, I'm going to call it
Archive researcher. You can call it
whatever you want. We're going to make it public so that I can share the
final result with you. We're going to add a .gitignore
for a Python project. License, none, and these are
just apps that I have. Probably if you
don't have any apps, this is not going
to show for you, but just click on
Create repository. A brand new repo
is created for me. I'm going to open up a terminal, and I'm going to
click here on Code, and I'm going to copy this SSH URL. If you don't have an SSH key configured in
your GitHub account, then you should use HTTPS, but SSH is the
recommended way to do it. I'm going to copy this, go to the terminal and say git
clone, and paste that. So now I have that repo on my local machine and I'm
going to open VS Code here. As you can see, we
have the .gitignore, which is the same
that I have here. If I open that .gitignore, you're going to find
a lot of things that are usually ignored
in a Python project. We're going to use the Python Dependency Manager, and when we initialize a
project with PDM, these are the things that
are going to be ignored. Let's go and search
for PDM, PDM Python. So the URL is pdm-project.org. If I click here, I'm going
to see the website of PDM. PDM, as described, is a modern Python package
and dependency manager supporting the latest
PEP standards. But it is more than
a package manager. It boosts your development
workflow in various aspects. The main purpose of PDM is dependency
resolution, basically. Because when you
install packages, those packages may
depend on sub-packages. So packages need to agree on which versions of sub-packages
you have to install. That's basically the idea of dependency resolution, and PDM is a great tool for managing that. So how do you install this? First of all, you need Python 3.9 or later to be installed. It works on multiple platforms, including Windows,
Linux, and macOS. As you may have noticed, I am using macOS. Another thing that PDM does is that it can manage
multiple versions of Python. So for example, if I want to use Python 3.10 or 3.11 or 3.12, I can do so. We're going to see how
to do that in a moment. But I want you to go to the documentation and follow the steps for the recommended
installation method. It says: like pip, PDM provides an installation
script that will install PDM into an
isolated environment. For Linux and Mac, you just have to copy this command and paste
it in your terminal. On Windows, you can do
it with PowerShell. But if you are on Windows, I highly recommend that you use Windows Subsystem for Linux, so that you have a
Linux environment inside your Windows machine. If you don't want to do that, or if you're not familiar with
Windows Subsystem for Linux, then you can install
PDM with PowerShell. Okay. So after you do that, you should have PDM
on your machine. There are also instructions
if you want to uninstall PDM, but I would say keep it. It's a great tool. So if I go to my
terminal and type pdm (let me make this bigger),
you're going to see that
this has a lot of options. pdm add is one of the commands
that we're going to use, and this is for adding packages to a file called
pyproject.toml. This is similar to package.json if you come from
the world of Node.js. Another command that we're going to use is pdm init, and that is to initialize a
pyproject.toml for PDM. Another command that
we are going to use is
pdm python, and that is for managing
the multiple versions of Python that I was telling
you about a moment ago. So let's see, another command which is very
important is pdm remove, which is used for removing packages from the
pyproject.toml file. Okay? So let's see what
happens if I type pdm python. So if I type pdm python, I will have additional
subcommands. The first one is list, and that is to list all the Python interpreters
installed with PDM. If I type pdm python list, you'll see that I have
these four Python versions installed on my machine: 3.12.2, 3.11.5, 3.13.0, and 3.11.8. Now suppose I want to install
a new Python version. I can type pdm python install, and I should specify the Python version
I want to install. If I type pdm python
install --help, then you are going to see
that I have this --list flag. Let me type that, pdm
python install --list, and you're going to see all of the Python versions
that are available. If I just type pdm python
install as I did here, it's going to install
the latest version, which is Python 3.13.0. But suppose I want to
install Python 3.12.7. We're going to use
that version of Python in this project,
and we just do pdm python install, and you don't have to type
the whole cpython@ prefix. You just type 3.12.7 and
that should be enough. So it will download that version
of Python, install it, and save the executable
in this folder here. Now we are ready to
initialize a PDM project. Let's go back to VS Code and we are going to type in
the terminal pdm init. Now, if I make this
a little bigger, it's going to prompt me to choose the Python version to
use in this project. I'm going to use 3.12.7. That is this option here. I'm going to choose
option four. Okay. And that is
going to create this folder called .venv, which is the virtual
environment. This is where all the packages
are going to be installed, and then it is asking me what is going to
be the project name. I can just press Enter multiple times to keep the
default values here. Okay. So it creates a
pyproject.toml file.
Again, if you come
from the Node.js world, this is similar to a
package.json file. Here you can see that it
has data about the project: the description of the project, the authors of the project, the dependencies, what version of Python it
requires, et cetera. What we're going
to do right now is install the dependencies that we're going to need
for this project. For that, I'm going to
say pdm add arxiv llama-index python-dotenv notebook. If I press Enter, you can see that it is
resolving for the environment. It is resolving all of those sub-packages that
are going to be installed. And this can take a while. In the meantime, I
want you to notice this .pdm-python file. This is only telling us
that the Python executable for this project
is inside this .venv folder. You can see this is very
specific to my machine. That's why it is in the .gitignore. When you selected that
Python .gitignore template, it already had this
file in the template, so it is already ignored. Okay. Now let's see
here what's happening. One minute has passed and it is still resolving
the packages. Now it has done all of the
dependency resolution. It resolved 149 packages. That's a lot because we just installed four
packages. But these four
packages in total
use 149 packages behind the
scenes. So PDM is intelligent
enough to resolve all of the versions so that we don't have any
conflicts in the versions. Everything got
installed successfully, no errors, and this
pdm.lock file got created.
Again, if you come
from the Node.js world, this is like the package-lock.json file. And this contains information about all of the dependencies and sub-dependencies
that got installed. The last thing that we
are going to do is
create a .env file. And this .env file is going
to contain our OpenAI API key. We're going to type OPENAI_API_KEY and here we're going to
paste our OpenAI API key. One strategy or
best practice is to have a .env.example file. This .env.example
is not going to be in the .gitignore, so it can be safely
committed to the repo. We just put the same information as we have in the .env file, but without the values. Here we don't have
the value yet. Here we will never
have the value. That's a good practice
so that anyone that clones the repo knows that, hey, I have to have
an OpenAI API key. That's it for this lesson. I hope you like it and see
you in the next lesson.
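The .env and .env.example convention from this lesson can be checked with a few lines of Python. What follows is a minimal sketch using only the standard library. The helper names parse_env and missing_keys are made up for illustration, and the parser is deliberately simpler than python-dotenv (no quoting or export handling).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env-style text into a dictionary."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        values[name.strip()] = value.strip()
    return values


def missing_keys(example_text: str, env_text: str) -> set[str]:
    """Names listed in .env.example that have no non-empty value in .env."""
    env = parse_env(env_text)
    return {name for name in parse_env(example_text) if not env.get(name)}


# .env.example is committed: it lists the names only, never the values.
example = "OPENAI_API_KEY=\n"
# .env is ignored by git: it holds the real values (placeholder shown here).
local_env = "OPENAI_API_KEY=sk-placeholder\n"

print(missing_keys(example, local_env))
print(missing_keys(example, "OPENAI_API_KEY=\n"))
```

Someone who clones the repo could run a check like this to see which variables they still have to fill in.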
3. Getting an OpenAI API key: Hi and welcome back.
Before we forget, let's get an OpenAI API key. Go to openai.com, go to
Products, and go to API Login. They change this all the time. If you don't have
an OpenAI account, sign up, and if you
have one, then log in. I'm going to log in with my account. And go to the dashboard. And go to API keys. First of all, you have
to have billing enabled. So go to Settings, go to Billing, add the
payment method here, and you can put in $5. I think it's the minimum. So don't forget to do that. Let's go back to Dashboard, API keys, and let's
create a new secret key. So this is going to be called
Archive, and the project: you can scope these
API keys to projects. I'm not going to scope it to a specific project because
I don't have any projects. So I'm going to use the default project. And the permissions: you can be very strict
and choose what you want, for example, only for models, for audio, for chat
completions, for embeddings. I'm not going to
complicate things. I'm just going to say All. I'm going to create
the secret key and this is going to
be my secret key. Obviously, I'm
going to delete it after I finish
building this course. Now I'm going to copy it. I'm going back to VS Code,
and in the .env file, I'm just going to paste that. That's the API key I'm going
to use. Before we forget, let's do a commit. Let's type git add everything. Git commit, and
we're going to say: initialized project with
PDM and added dependencies. Then we can push
these changes. Okay. Now if we go to
GitHub and refresh, we're going to see that the
.env file is not committed. The .env.example is committed, but that doesn't have any values. So that's safe. Otherwise, people would see your API key and
that would be bad. Okay, so that's everything
for this lesson. I hope you like it and see
you in the next lesson.
4. Understanding LlamaIndex and RAG: Hi and welcome back.
So what is LlamaIndex? First of all, what is
the problem with LLMs? They are great, but they are pre-trained on large amounts of
publicly available data. How do we best augment LLMs
with our own private data? That's where LlamaIndex
comes in. We need a comprehensive
toolkit to help perform this data
augmentation for LLMs. So LlamaIndex offers data
connectors to ingest your existing data sources
and data formats like APIs, PDFs, docs, even SQL data. It also provides
ways to structure your data so that it can
be easily used with LLMs. We're going to see that
the most common way of structuring this data
is through a vector index, and it also provides an advanced retrieval and query
interface over your data. You can feed in any
LLM input prompt and get back context and
knowledge-augmented output. We cannot talk about LlamaIndex
without talking about RAG. What is RAG? RAG is retrieval-augmented
generation. This is an approach in natural
language processing that combines the strengths
of two key components: information retrieval,
which fetches relevant data from an
external knowledge base, database, or document
repository, and text generation, which uses a language
model such as OpenAI's GPT-4o or
whatever to generate human-like text based on
the retrieved information. How does RAG work? First, a user poses a question or query. Then we retrieve
relevant documents from an external
source, and then we generate a response using a language model that uses
the query of the user and the retrieved context. So why do we have to use RAG? Because we can have access
to up-to-date information, allowing AI systems to incorporate knowledge
beyond their training data. It improves the accuracy
of responses because now responses
are based on verified and retrievable sources, and we can have
domain adaptability. We can easily tailor
the system to specific industries or topics by linking to specialized
knowledge bases. This is RAG in an image. The first step is to retrieve
and ingest these documents, pass them through
an embedding model, and store those embeddings
in a vector database. The second step is that the
user poses a query. Then this user
query goes through the same embedding model, and then we're going to search
the vector database for context, passages, or documents that are very similar to the query that the
user is posing. When we have that context, we're going to have the query, the context, and some prompt
that we are going to design. We're going to pass all of
this information to the LLM, and the LLM is going to
generate a response, and that response
goes back to the user. That's RAG in a nutshell. I hope you like this video. See you in the next lesson.
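The RAG flow from this lesson (ingest, embed, store, retrieve, build a prompt) can be sketched in plain Python. This is a toy illustration under loud assumptions: the "embedding" is just a bag-of-words count, the "vector database" is a Python list, and no LLM is called. The function names embed, retrieve, and build_prompt are invented for this sketch and are not LlamaIndex APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Embed the query and return the top_k most similar stored texts."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context and the user query into one prompt."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Step 1: ingest documents and store their embeddings.
documents = [
    "Transformers use self-attention to model long-range dependencies.",
    "Vector databases store embeddings for similarity search.",
    "PDM resolves Python package dependencies.",
]
store = [(doc, embed(doc)) for doc in documents]

# Step 2: embed the query, retrieve similar context, and build the prompt.
query = "How do vector databases search embeddings?"
context = retrieve(query, store, top_k=1)
prompt = build_prompt(query, context)
print(prompt)  # in a real system, this prompt would be sent to the LLM
```

In the course itself, the embedding model, the index, and the LLM call are handled by LlamaIndex and OpenAI. The sketch only shows how the pieces connect.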
5. What are agents?: Hi and welcome back. Now
let's talk about agents. What is an agent? An agent is an automated
reasoning and decision engine. It takes in a user
input and can make internal decisions for executing that query in order to
return the correct result. The key agent components
can include but are not limited to breaking down a complex question
into smaller ones, choosing an external
tool to use, plus coming up with parameters
for calling that tool, planning out a set of tasks, storing previously
completed tasks in a memory module, et cetera. So agents share five
fundamental building blocks: perception, reasoning,
memory, planning, and action. The first building
block is perception. Perception is the
agent's ability to gather information
about its environment. This could involve
processing text queries, analyzing sensor data,
interpreting images, or even reading structured
data tables from a database. The more effectively
an agent can perceive, the richer the
context it can understand. With stronger perception, agents can better
adapt to changes and respond accurately to
evolving conditions. Then we have reasoning. Reasoning is where
the agent makes sense of the information
it has perceived. This involves
interpreting contexts, weighing different options, and forming logical conclusions. Reasoning underpins an
agent's intelligence. It ensures the agent
doesn't just react blindly but evaluates scenarios to make informed decisions. Advanced reasoning often
involves leveraging large language models or other AI frameworks to understand the nuances
of a given situation. Then we have memory. Memory is
the agent's way of retaining relevant
information over time. This can include
short-term context, like the last user request,
and long-term knowledge, like a database of past interactions or
general industry expertise. Memory gives the agent
a sense of continuity. Instead of treating
each interaction as an isolated interaction, the agent can build upon
previous experiences, improving its accuracy and
context awareness as it goes. Then we have planning.
Planning is where the agent decides what steps to take
to achieve its goals. It might break down complex
tasks into simpler steps, sequence them in
an optimal order and anticipate
potential roadblocks. Planning ensures
that the agent isn't just reacting to one
request at a time, but proactively charting a path towards longer-term objectives. This is crucial for tasks like supply chain
optimization or project management or
any other scenario where actions taken now
have future implications. Then we have action. Action is the actual execution of
the agent's decisions. For example, sending an email, adjusting inventory
levels, recommending a product, or performing a
system-level operation. Without any action,
all the perception, reasoning, memory, and planning in the world would be wasted. Action closes the loop
and allows the agent to have a tangible impact
on its environment, delivering a real-world result. How do they work together then? Perception feeds
the agent with data. Memory stores and recalls useful information from both the immediate
and distant past. Reasoning uses that data
and context to form a plan, the plan maps out the
steps needed to achieve the agent's goals, and the action executes those steps,
creating measurable value. There are tons of use cases. Anything can be an
agent nowadays. We have software
engineering agents, AI phone agents, sales
agents, research agents. That's what we're
going to do right now. An AI chief of staff, which can streamline daily operations,
that's pretty crazy. Sales research assistants,
an agent staff accountant, a month-end close AI assistant. There are a lot of use cases that agents can have nowadays. How will AI agents
work in practice? Well, you have to
train the AI agent. You have to provide
your use case, data, and playbook to tailor
the AI's capabilities to your specific needs. Input data such as
transcripts, call recordings, invoices, qualification
criteria, and key objectives for
accurate adaptation. Then you have to configure the workflows and
the integrations. You have to align the AI agent with your existing
tools and processes. For example, setting
up seamless integrations with CRMs, calendars, and business systems
while defining actions, alerts, and escalation protocols that match your
team's requirements. Then you have to deploy
and manage the operations. You have to launch the AI agent to handle operations
autonomously, track its performance
through real-time metrics, evaluate outcomes, and refine processes to achieve
optimal results.
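The five building blocks from this lesson (perception, reasoning, memory, planning, action) can be sketched as a tiny Python class. This is only an illustration under stated assumptions: the keyword rules stand in for the LLM reasoning a real agent would use, and the class ToyAgent and the two pretend tools are hypothetical names, not part of LlamaIndex or of the course code.

```python
from dataclasses import dataclass, field
from typing import Callable


def search_papers(request: str) -> str:
    """Pretend tool: a real one would query a paper source."""
    return f"(pretend search results for '{request}')"


def download_paper(request: str) -> str:
    """Pretend tool: a real one would download a file."""
    return f"(pretend download for '{request}')"


@dataclass
class ToyAgent:
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)

    def perceive(self, user_input: str) -> str:
        # Perception: normalize the raw input into an observation.
        return user_input.strip().lower()

    def reason_and_plan(self, observation: str) -> list[tuple[str, str]]:
        # Reasoning and planning: decide which tools to call and in what order.
        # A real agent would ask an LLM; keyword rules stand in here.
        plan = []
        if "find" in observation or "search" in observation:
            plan.append(("search_papers", observation))
        if "download" in observation:
            plan.append(("download_paper", observation))
        return plan

    def act(self, plan: list[tuple[str, str]]) -> list[str]:
        # Action: execute each planned step and remember what happened.
        results = []
        for tool_name, argument in plan:
            result = self.tools[tool_name](argument)
            self.memory.append(f"{tool_name} -> {result}")
            results.append(result)
        return results

    def run(self, user_input: str) -> list[str]:
        return self.act(self.reason_and_plan(self.perceive(user_input)))


agent = ToyAgent(tools={"search_papers": search_papers, "download_paper": download_paper})
print(agent.run("Find papers on language models and download the first one"))
print(agent.memory)
```

The memory list grows with every executed step, which is what lets a later request build on earlier ones.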
6. Vector embeddings: Let's talk about
vector embeddings. Vector embeddings are numerical
representations of data
(text, images, or audio) in a high-dimensional
space. That means they are vectors
with a fixed dimension. Each piece of data is
converted into a vector that captures its meaning,
context, or features. How does it work? Similar items are represented as vectors
closer to each other. That's what enables
easy comparison. For example, the words king
and queen will have vectors that are close together, reflecting their
semantic similarity. How do you measure
that similarity? Well, there are different
metrics that you can use, like the dot product or
the cosine similarity. What are vector indexes? They are data structures
that organize these vector embeddings for efficient search and retrieval. They allow AI systems to find the most
relevant data points quickly based on
similarity measures like cosine similarity
or dot product, the metrics that I
was just talking about. They speed up querying
large datasets and they enable scalable
real-time retrieval of relevant information. For generating these
vector embeddings, we need an embedding model. In this course,
we're going to use text-embedding-3-large, which is an advanced
embedding model developed by OpenAI. It is designed to
transform text into high-dimensional vector
representations, capturing the
semantic meaning of the input, as we
already explained. What are the key features
of this embedding model? It can generate high-quality
representations. It produces embeddings that
effectively capture context, relationships, and
meaning within the text. It is versatile:
it is suitable for a wide range of tasks such as information retrieval,
similarity search, and clustering. And as for
dimensionality, it outputs dense, high-dimensional vectors
optimized for downstream tasks. I hope you like this
lesson. See you in the next one.
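To make the similarity metrics and the idea of a vector index concrete, here is a small standard-library sketch. The three-dimensional vectors are invented by hand for illustration (real embeddings have hundreds or thousands of dimensions), and BruteForceIndex is a hypothetical name: it scans every stored vector, whereas production vector indexes use smarter structures to avoid a full scan.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


class BruteForceIndex:
    """Smallest possible vector index: store vectors, scan all on query."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, label: str, vector: list[float]) -> None:
        self.items.append((label, vector))

    def query(self, vector: list[float], top_k: int = 2) -> list[str]:
        # Score every stored vector against the query, highest first.
        scored = [(cosine_similarity(vector, v), label) for label, v in self.items]
        scored.sort(reverse=True)
        return [label for _, label in scored[:top_k]]


# Hand-made vectors, invented for illustration only.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

index = BruteForceIndex()
for word, vector in embeddings.items():
    index.add(word, vector)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # close to 1
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # much lower
print(index.query(embeddings["king"], top_k=2))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings.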
7. Creating a tool to fetch papers from arXiv: So let's start coding. First of all, we're going
to create a tools file. From the terminal,
touch tools.py. In this module, we're going
to use the arxiv library. So if we go to the documentation
of the arxiv library, you will see that arxiv is just a Python wrapper
for the arXiv API. So arXiv already has an
API and this library is just a wrapper to use that
API in a simpler way. So how do you
install it? With pip, but we did it with PDM. Then we import arxiv.
So let's do that. Import arxiv. And then we can fetch results by first constructing the
default API client. Let's build that client. Then, for example,
we can search for the ten most recent articles matching the keyword quantum. Then we just run that search and we can iterate
over those results. Results is a generator, so we can iterate over
its elements one by one. And there's also an advanced
query syntax section in the documentation, and it tells us to see the
arXiv API user manual, which is this link here, and the query can be
something like this. au means author,
if I'm not wrong, and ti, I don't know. Let's go to that link
and see those prefixes. au means author, ti means title, abs, abstract,
co, comment, et cetera. If we want to search across all of these fields
at the same time, we just say all, so that's what we're going
to do in our function. We're just going to use
all, colon, and some topic.
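The field prefixes just mentioned can be put into a small helper. This is a sketch, not part of the course code: build_query and FIELD_PREFIXES are hypothetical names, and quoting multi-word terms as a phrase is an extra convenience added here.

```python
# Field prefixes listed in the arXiv API user manual.
FIELD_PREFIXES = {
    "title": "ti",
    "author": "au",
    "abstract": "abs",
    "comment": "co",
    "all": "all",
}


def build_query(field: str, term: str) -> str:
    """Build a single-field arXiv search query such as 'all:quantum'."""
    prefix = FIELD_PREFIXES[field]
    # Quote multi-word terms so they are searched as one phrase.
    if " " in term:
        term = f'"{term}"'
    return f"{prefix}:{term}"


print(build_query("all", "quantum"))
print(build_query("title", "language models"))
```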
do. Let's start coding. We're going to define this function called
fetch_arxiv_papers, which is going to
take two parameters. The first is the
title or the query, whatever, and the second parameter is going to be the papers count: how many papers
we want to fetch. So the search query is going to be all, colon, and the title. That's going to be
our search query. Then we're going to do this. I don't know why this is
not indenting by itself. So one thing that we need to do is select the
Python interpreter. On Mac it's Command
Shift P; on Windows, I think it's Control Shift P. We're going to type here Select Interpreter
and we're going to select this .venv one. Now the code editor has more capabilities because it knows we're using this
virtual environment. This query is not
going to be quantum, it's going to be the search query. The max results is
not going to be ten, it's going to be the papers count. This parameter is going to be passed or constructed
by the agent. You're going to see
that in action in a few lessons. And then sort
by the submitted date. That's okay. Now we're going to initialize
an empty array of papers and we're going
to get the results: results equals
client.results(search). Now we have a generator
and we're going to iterate over each
of the results. For result in results, our paper info is going to be
a dictionary with a title, result.title. Look at this: result.title, and
this has more stuff. How do we know what
the result has? What other attributes does it have? Well, if we hover over result, we can see that it is of type Result. What
is the type Result? Well, we can Control
click or Command click on the arxiv module here, and with Command F, or Control
F on Windows, we can search for document, sorry, not for
document, for Result. So we have to get our
hands dirty here, and you can see that this is the
definition of Result. It has an entry ID,
which is the URL; updated, when the result was last updated; published, when the result was
originally published; title; the authors, which is a list
of authors; the summary, which is a string; comment, the authors' comment if present; journal ref; the DOI; et cetera. Let's see what this Author is. Author is another
type definition which only has the
name as an attribute. With all of this information, we're going to get
more attributes. The second attribute
is going to be summary, result.summary. We're also going to get the published date,
result.published. We're going to get
the journal ref, result.journal_ref, and we're going to get the DOI, result.doi. And obviously you have the autocompletion
from the code editor. We're going to have the primary category,
result.primary_category. We are going to have
the categories, result.categories, and also GitHub Copilot
is helping me a lot. Then result.pdf_url, and the arXiv URL: it's not result.arxiv_url,
it's the entry ID. And the authors,
which is going to be an array: author.name for author
in result.authors. Remember that this was
a list of authors. That's going to be
our paper info, and we're going to append this paper info to
the papers list: papers.append(paper_info). Finally, we're going to
return all of the papers. This is our simple function
to fetch arXiv papers, and as you will see,
this is also going to be a tool for the agent. That's what we're going
to do for this lesson. I hope you like it. See
you in the next lesson.
8. Creating a tool to download papers: Hi and welcome back. Now we're going to code our second tool. As you can see, these are
only Python functions, so they are very easy to code. This second function is going
to be a download PDF tool, which is going to
receive a PDF URL, which is going to be a string,
and an output file name, which is going to be
a string as well. For this, we're going to
use the requests library
to make a request to
download the PDF, and we're also going
to use the os library to create a directory
if it doesn't exist, because we want to
keep our project organized. So we're going to, first of all, try to create
a directory called papers. And if it already exists, then don't create it. Don't throw an error, live with it. Okay? And here, we're going
to put the except, pass. We're going to specify
the error later. Now we're going to declare
the full output path, and we're going to
say os.path.join, papers,
and output file name. That is going to be
the full output path, the papers folder joined
with the output file name. Then we're going to get a response by using
the requests library: we're going to make a GET
request to the PDF URL. And if there is any error,
we're going to raise it with raise_for_status. This method just raises
an HTTPError if one occurred. In the except, what
we're going to do is except requests.exceptions.RequestException as e. We're going to return
a string saying an error occurred, and we're going to
print that error. If nothing goes wrong, then
we are going to open the full output path with
write permissions, wb, and we're going to name that file
file, and we're going to call file.write with the
response.content. And we have to return something. We have to return a string saying PDF downloaded successfully and saved as, and we're going to put the
full output path here. Okay? So as you can
see, this is a very, very simple Python function that just downloads something. It can download a paper, it can download
a JSON, whatever. In this case, we're
just going to download the paper from the PDF URL. That's it for this video. Very, very simple. I hope you like it and see
you in the next one.
9. Defining the embed and LLM models: Okay, so let's continue coding. We're going to create
a new file called constants. In the terminal,
touch constants.py. In this file, we are going to
declare the embed model and the LLM model to reuse them when we build the index and
when we build the agent. First of all, we're going to call the load_dotenv function. First, let's import it: from
dotenv import load_dotenv. What this does is load all of the environment variables
from the .env file. Now we can access the OpenAI
API key with the os module. We're going to import os and let GitHub Copilot help me write
this: os, and then OPENAI_API_KEY. It matches this, nice. Now we're going to say embed
model is going to be OpenAIEmbedding, and we're going
to import that from
llama_index.embeddings.openai.
I'm missing the embeddings part here. We import OpenAIEmbedding and also
OpenAIEmbeddingModelType. We have something else.
Embedding, no, not this. We don't need this. We
only need these two. We're going to
create an instance of OpenAIEmbedding. We're going to pass the API key, which is going to be
the OpenAI API key, and the model is going to be an OpenAIEmbeddingModelType. Look at this, we have
Ada, Babbage, Curie, Davinci. These are very old models. The newer ones are TEXT_EMBED_ADA_002, TEXT_EMBED_3_SMALL,
and TEXT_EMBED_3_LARGE. We're going to use
TEXT_EMBED_3_LARGE. And for the LLM model, I'm going to import from llama_index.llms.openai,
import OpenAI. We're going to create
an instance of OpenAI. Again, we have to
pass the API key, which is going to be
our OpenAI API key, and the model, which in this
case is going to be a string, and it's going to
be gpt-4o-mini. Okay, so these are the two models from OpenAI
that we are going to use. I want you to notice something. When we imported things from
embeddings and from llms, we only had the OpenAI option. That's because by default,
LlamaIndex only has
that plugin, only the OpenAI stuff. But if you want to
use, for example, Claude or Mistral AI or whatever, then LlamaIndex has all of these connectors that we
talked about in this slide, but you have to install
them separately. I hope you like this video.
See you in the next one.
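The constants module from this lesson could be sketched as follows. Two things differ from the narration and are assumptions of this sketch: the models are built inside a function named build_models instead of as module-level constants, and a hypothetical helper require_env fails early when the key is missing. The third-party imports sit inside the function so the helper can be tried without LlamaIndex installed.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable or fail early with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


def build_models():
    """Create the embedding model and the LLM used for the index and the agent."""
    # Third-party packages (python-dotenv and llama-index).
    from dotenv import load_dotenv
    from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType
    from llama_index.llms.openai import OpenAI

    load_dotenv()  # load the variables from the .env file into the environment
    api_key = require_env("OPENAI_API_KEY")

    embed_model = OpenAIEmbedding(
        api_key=api_key,
        model=OpenAIEmbeddingModelType.TEXT_EMBED_3_LARGE,
    )
    llm_model = OpenAI(api_key=api_key, model="gpt-4o-mini")
    return embed_model, llm_model
```

Keeping both models in one place means the index-building notebook and the agent use the same embedding model, which matters because queries have to be embedded the same way as the stored documents.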
10. Building the index and saving it locally: Hi and welcome back. Now we
are ready to build the index. In order to build the index, we're going to create
a Jupyter notebook called build_index. In the terminal, touch
build_index.ipynb. Great. First of all, we're
going to add a cell. And here, if you don't have
the .venv selected, you can click. It will probably tell you Select Kernel here, and
you're going to choose .venv, which is the
virtual environment for this project. Make sure that .venv
is selected. So first of all, we're
going to build our index. That means our knowledge base. That will be papers
on a specific topic that we already have in
our database or repository. So first of all, from tools, we're going to import
fetch_arxiv_papers and we're going to
fetch some papers. The topic, or the title, is going
to be language models. Let's
say language models. We want to learn about
language models, and the papers count
is going to be ten. Okay. You can download
100 papers if you want. The number of
papers doesn't matter. Remember, indexes are built to handle queries over
large datasets. Okay, so we're going
to execute the cell, and we're going to print the titles of the
papers that we retrieved: paper title for
paper in papers. So these are the papers that we fetched that are related
to language models. Okay, now that we have this, we're going to create a function called
create_documents_from_papers. And what is a document exactly? Well, that is just
a generic interface for a data document. So this document just
connects to data sources. We're going to pass text that is going to contain the information of
the title of the paper, of the authors of the
summary of the published, of the journal reference,
DOI, the primary category, the categories, the PDF URL, and the archive URL. We're going to put all of that information into
single string and then we're going to pass that single string to the document interface
of Lama index. First of all, from
Lama index dot core, we're going to
import the document, and then we're going to
create documents from papers, and we're going to pass
the list of papers. First of all, we're going to
initialize an empty list and then we're going to
iterate over these papers. So the content is going to be a single string with the
information of the title, and I'm going to let Github copilot do the
boring work here. The authors is going to be a list of authors
separated by a comma. Remember, authors is a list. Remember this? It's a list. We're also going to put the summary we're going to put
the published information. We're going to put the
journal reference, journal reference. We're going to put the
Di the primary category. The categories as well is list. Although we didn't make any
processing of that list here, we just put that
and look at this. All of the results categories, that is a list of strings. We can join all of those
strings by this command. We're going to have the PDF
UL and also the archive URL. Which this time
Githukpilot fail. Oh, no, it didn't fail. It's archive URL. Yeah, Archive URL. Okay, great. So now
we have our string. What we're going to do now is append the content
to a document. And the document disappeared, the import disappeared
because I have a setting that removes
unused documents save. Okay. So let's bring it back and say that the text
is going to be content. And obviously we have to
return this list, right. And now let's call
that function. Let's say documents,
sequel to create documents from papers, papers. Okay. And let's see this list. So this list is a list
of this document object, and each document
object has an ID. It doesn't have
an embedding yet. It has empty metadata. This can be useful in many
applications having metadata, but we are not setting any
metadata to this document. Although if you want, you
can put metadata here and put any Python dictionary here. A information you want.
Let me close this again. The document also has more attributes, such as text resource and media resource, but the one we are interested in is text, and this is the string that we built. You can see it can be a very long string.
Now that we have this, we're going to build our index. How do we do that? From llama_index.core, let's import Settings and VectorStoreIndex. Also, from constants, let's import the embed model, because, remember, we need to pass the text through an embedding model. First of all, we set Settings.chunk_size to 1024 and Settings.chunk_overlap to 50. I'm going to explain what these are in a moment. Let me first create the index: VectorStoreIndex.from_documents. We pass the list of documents and the embed model, which is the one we instantiated in constants. Remember, this is text-embedding-3-large.
So what are chunk size and chunk overlap? Chunk size means that the text will be processed in chunks of at most 1024 units. In LlamaIndex these units are tokens, not characters. If, for example, a text has 2048 tokens, you might expect it to be split into two chunks, but not quite, because we have this other setting called chunk overlap. The overlap means that consecutive chunks share about 50 units. This helps preserve continuity between chunks: each chunk carries some context from its neighbor, and vice versa. These two settings are very important. They are hyperparameters, so you can tune them: for example, the overlap could be 128 if you want to keep more context, but you're going to have more chunks. The values we set are good defaults for these two properties.
Now that we have this, we already have our index. Behind the scenes, this is actually calling the OpenAI API to convert all of these chunks into vectors, using the text-embedding-3-large model.
Now we can store this index by using the storage_context.persist method, and we're going to store it in a folder called index. Now we have this index folder with all of these JSON files, and this is something that we probably want to have in the .gitignore. So let's add index to the .gitignore, because this is dynamic: if you search for something else, the index is obviously going to change.
That's it for building an index. You can see that with LlamaIndex this is so easy. This index is obviously a local index. We can use a cloud-based service like Pinecone, or we can use a tool like ChromaDB, which is also a local vector database, but we have to deploy that. There are other cloud services, like Weaviate, to store these indexes in the cloud. They use AWS or GCP behind the scenes. But for now, we're going to store the index locally in this folder. I hope you liked this video, and see you in the next lesson.
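To make the chunk size and chunk overlap arithmetic from this lesson concrete, here is a small sketch. The helper chunk_windows is hypothetical and uses plain fixed-size windows over units (LlamaIndex counts chunk_size in tokens); the real splitter also tries to respect sentence boundaries, so actual chunk edges will differ. The second function, build_and_persist_index, is an assumed wrapper around the calls named in the lesson.

```python
def chunk_windows(total_units, chunk_size=1024, chunk_overlap=50):
    """Return (start, end) offsets of fixed-size windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if total_units <= 0:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, total_units)
        windows.append((start, end))
        if end == total_units:
            break
        start += step
    return windows


def build_and_persist_index(documents, embed_model, persist_dir="index"):
    """Build a vector index with the lesson's settings and save it to a local folder."""
    # Imported lazily so chunk_windows can be tried without LlamaIndex installed.
    from llama_index.core import Settings, VectorStoreIndex

    Settings.chunk_size = 1024
    Settings.chunk_overlap = 50
    # This step calls the embedding API for every chunk.
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
    index.storage_context.persist(persist_dir=persist_dir)
    return index


# A 2048-unit text does not fit in two windows once they overlap by 50 units.
print(chunk_windows(2048))  # [(0, 1024), (974, 1998), (1948, 2048)]
```

The overlap shortens the step between windows from 1024 to 974 units, which is why the 2048-unit example ends up with a short third window.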
11. Creating the RAG query engine tool: Hi and welcome back. Now we are going to start building the agent itself. Let's create another file called agent.ipynb. Again you have to select the kernel. Remember, select .venv.
First of all, we're going to load our index from storage, that is, everything stored in these JSON files. LlamaIndex has a function called load_index_from_storage. So from llama_index.core, we import StorageContext and load_index_from_storage. We also need to import the embed model. The storage context is going to be StorageContext.from_defaults, with the persist directory set to index. Now we can load the index with load_index_from_storage: we pass the storage context and the embed model. Great, now we have our local index loaded.
What we're going to do now is build a query engine tool, and we're going to see how that tool works behind the scenes. From llama_index.core.tools, we import QueryEngineTool. We also import the LLM model from constants. The query engine comes from the index, which has a method called as_query_engine. We pass the LLM model, which in our case is GPT-4o mini, and another parameter called similarity_top_k, which we set to five. That means that when we submit a query, we retrieve at most the five most similar vectors.
Now we define the RAG tool as a query engine tool. Again the import disappeared, so we have to add it again: from llama_index.core.tools import QueryEngineTool. We call QueryEngineTool.from_defaults and pass the query engine. We also have to provide the name of the tool, which is going to be research_paper_query_engine_tool. It's also good practice to give it a description, which helps the agent know what the tool is for. I'm going to say that this is a RAG engine with recent research papers. So this is the tool the agent is going to use to fetch information from our existing database or repository.
Now I want to show you the prompts that this query engine uses behind the scenes. By default, LlamaIndex uses a refine prompt before returning an answer, and we're going to learn more about this in a moment. First of all, from IPython.display I'm going to import Markdown and display. These are just utility functions to render things a little more nicely on the screen. I'm going to define a display_prompt_dict function that receives a prompt dictionary. For each key and prompt in prompts_dict.items(), it displays the prompt key as Markdown and then prints prompt.get_template(). All of this is going to make sense in a moment, just bear with me. Now that we have defined this function, prompts_dict is going to be query_engine.get_prompts(), and we display the prompts. Okay, we hadn't executed this cell. Here it is.
So the query engine that we defined has two prompts. The first one is response_synthesizer:text_qa_template, and the second one is response_synthesizer:refine_template. When we retrieve the relevant information, it takes the most relevant chunk first and answers the user's query using that chunk and this template: Context information is below. Given the context information and not prior knowledge, answer the query. The query is whatever we as users typed, and the answer is the LLM's answer.
After it has an answer, it iterates over the other chunks, the other relevant pieces of information. Remember, we're going to have at most five of these, so with the other four it uses the second template: The original query is as follows. We have provided an existing answer (this is the answer from the first prompt). We have the opportunity to refine the existing answer (only if needed) with some more context below. This context is going to be another chunk, another relevant piece of information. Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
This is called a response mode in LlamaIndex, and we have this documentation here. Depending on your LlamaIndex version, the default response mode is refine or compact, and both use these two templates. Refine means: create and refine an answer by sequentially going through each retrieved text chunk. This makes a separate LLM call per node or retrieved chunk. In detail: the first chunk is used in a query using the text_qa_template prompt, so with the most relevant chunk the LLM produces an answer. Then the answer and the next chunk, as well as the original question, are used in another query with the refine_template prompt, and so on until all chunks have been parsed. So with the subsequent chunks it uses the second template and ends up with a refined answer. If a chunk is too large to fit within the window, considering the prompt size, it is split using a TokenTextSplitter, allowing some text overlap between chunks, and the new additional chunks are considered as chunks of the original chunks collection. This only happens if the chunk is too large.
There's another response mode called compact. Compact is similar to refine, but it compacts (concatenates) the chunks beforehand, resulting in fewer LLM calls. So instead of going through the chunks separately, it merges as many of them as fit in the prompt and runs the templates on the merged content.
It is very important that you understand what's happening behind the scenes. An agent has the ability to correct itself by using this create-and-refine technique. We're going to keep this approach and end the lesson here. I hope you liked it, and see you in the next lesson.
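To recap the create-and-refine flow from this lesson, here is a toy sketch in plain Python. It is not LlamaIndex's implementation: the templates are paraphrased from the prompts read out in the lesson, llm is any callable that takes a prompt string and returns a string, and max_chars is a stand-in assumption for the real token budget.

```python
TEXT_QA_TEMPLATE = (
    "Context information is below.\n{context}\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query}\nAnswer: "
)
REFINE_TEMPLATE = (
    "The original query is as follows: {query}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer (only if needed) "
    "with some more context below.\n{context}\n"
    "Given the new context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer.\nRefined Answer: "
)


def refine(query, chunks, llm):
    """One LLM call per chunk: QA prompt for the first, refine prompt for the rest."""
    if not chunks:
        raise ValueError("at least one retrieved chunk is required")
    answer = llm(TEXT_QA_TEMPLATE.format(context=chunks[0], query=query))
    for chunk in chunks[1:]:
        answer = llm(
            REFINE_TEMPLATE.format(query=query, existing_answer=answer, context=chunk)
        )
    return answer


def compact(query, chunks, llm, max_chars=4000):
    """Pack chunks into as few prompts as fit, then run the same refine loop."""
    packed, current = [], ""
    for chunk in chunks:
        candidate = f"{current}\n\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)  # current pack is full, start a new one
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return refine(query, packed, llm)
```

With five small retrieved chunks, refine makes five LLM calls while compact makes one, which is the trade-off between the two response modes.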
12. Building and interacting with the agent: Hi, and welcome back. Now we're going to define the other two tools that the agent is going to use. Let's import download_pdf and fetch_arxiv_papers from tools. To define these tools, from llama_index.core.tools we import FunctionTool. We define download_pdf_tool as FunctionTool.from_defaults, and we pass the function itself, which is download_pdf. Again, we have to give it a name, so I'm going to call it download_pdf_file_tool. It is also a best practice to give it a description, so I'm going to say: Python function that downloads a PDF file by link. That's our PDF download tool.
We also define another tool, fetch_arxiv_tool, in the same way. We pass the fetch_arxiv_papers function and give it the name fetch_from_arxiv. For the description we say: download the {max_results} recent papers regarding the topic from arXiv. We can put that placeholder in there. And we have to close the parenthesis.
Now that we have defined these two tools, we create a new cell and create the agent. From llama_index.core.agent we import ReActAgent. There are multiple agent types here; we're going to create an instance of a ReAct agent. Why ReAct? Because this agent operates in two main stages. The first stage is reasoning: it receives a query, and the agent evaluates whether it has enough information to answer directly or whether it needs a tool. The second stage is acting: if the agent decides to use a tool, it executes the tool and then returns to the reasoning stage to determine whether it can now answer the query or needs more tools.
It's as easy as calling ReActAgent.from_tools and passing a list of tools: download_pdf_tool, rag_tool, and fetch_arxiv_tool. We have to provide the LLM that it is going to use, so we pass our LLM model, GPT-4o mini. Last but not least, we set verbose to True so that we know what's happening behind the scenes through the logs the agent prints. And that's it, we have created an agent.
Now we can start chatting with our agent. For this we need a query template. Let's create one, and we're probably going to refine it in future lessons. It says: I am interested in {topic}. Find papers in your knowledge database related to this topic. Use the following template to query research_paper_query_engine_tool: Provide title, summary, authors and link to download for papers related to {topic}. If there are not, could you fetch the recent one from arXiv?
So that's the query template I have written. Let's see if it works. Let's create a new cell and say answer equals agent.chat. We pass the query template formatted with the topic, which is going to be multimodal models. From this list of papers I expect that, for example, the one with multimodal models in the title is going to be retrieved, and probably something else. I don't know which other papers will be retrieved, because what we see here is just the title, and remember that the search also covers the summary, the categories, and all that stuff. So it's probably going to fetch other papers as well.
Let's execute this. You can see the output says Running step with an ID, and the step input is the query template with the topic, multimodal models. The agent's thought is: The current language of the user is English. I need to use a tool to help me answer the question. The action it takes is to use research_paper_query_engine_tool, and the input is: Provide title, summary, authors and link to download for papers related to multimodal models. What this does is use the query engine. Remember, the query engine uses GPT-4o mini, so it can provide an answer following this template, giving the title, summary, authors, and link to download each paper. The observation is what the tool returns. All of this is generated by GPT-4o mini: it gives the title, the summary, the authors, and the PDF URL.
We can visualize the response better with the Markdown class: Markdown(answer.response). This is the response from GPT-4o mini. It just gave me a list of four papers. This is great. We're going to end the lesson here. In the next lesson, we're going to see if it can download all of these papers. I hope you liked it. See you in the next lesson.
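The two stages of a ReAct agent described in this lesson can be sketched as a small loop. This is a toy illustration, not LlamaIndex's ReActAgent: react_loop, scripted_reason, and the dictionary format for decisions are all assumptions, and the scripted function stands in for the LLM's reasoning.

```python
def react_loop(query, reason, tools, max_steps=5):
    """Alternate between reasoning and acting until the reasoner returns an answer."""
    observations = []
    for _ in range(max_steps):
        # Reasoning stage: decide whether to answer directly or to call a tool.
        decision = reason(query, observations)
        if "answer" in decision:
            return decision["answer"]
        # Acting stage: run the chosen tool and feed its output back as an observation.
        tool = tools[decision["tool"]]
        observations.append(tool(**decision["input"]))
    return "Stopped: step limit reached."


def scripted_reason(query, observations):
    """Hypothetical stand-in for the LLM: use the RAG tool once, then answer."""
    if not observations:
        return {"tool": "research_paper_query_engine_tool", "input": {"query": query}}
    return {"answer": f"Found: {observations[-1]}"}


tools = {
    # Fake tool that only echoes the query; the real one would query the index.
    "research_paper_query_engine_tool": lambda query: f"papers about {query}",
}

print(react_loop("multimodal models", scripted_reason, tools))
# Found: papers about multimodal models
```

The thought, action, and observation lines in the verbose agent logs map onto the decision, the tool call, and the appended observation in this loop.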
13. Downloading the papers and fetching new papers: Hi and welcome back. In this lesson, we are going to download all of these papers. Remember, one of the features of the agent is that it retains memory of the tasks that were already completed. Since the agent retains this chat history, we can ask it to download the papers without mentioning them explicitly. So in a new cell let's type answer equals agent.chat and tell the agent: Download all the papers you mentioned. Let's see what happens.
It is running the step, and the step input is: Download all the papers you mentioned. The thought, first of all, is: I need to download multiple PDF files based on the provided URLs for the papers related to multimodal models. The action is download_pdf_file_tool. The action input is this PDF URL, and the output file is this one. This is only for one paper, and you can see that in this folder it only downloaded one paper. Then it moves on to the next paper, Cross-Lingual Text-Rich Visual Comprehension, which is the second one. But look at this: now the thought says that it can answer without using any more tools, and it does not go through an action. The answer just contains the text Action: download_pdf_file_tool. It is not downloading the second file, nor the third or the fourth one. We will fix this by doing some prompt engineering on this chat message. For now, let's put the answer in Markdown so that we can see it better. You can see that the answer says to use the download PDF tool, and it is telling us the last thing it did. This is not accurate.
We're going to fix this. But for now, let's see what happens if we ask about a topic that is not available in the list of papers that we fetched. Let's interact with the agent once again: agent.chat with the query template, formatted with a different topic, like quantum computing. None of these papers talk about quantum computing, I think. We're going to render the answer nicely afterwards, but first let's look at the reasoning process.
Now the template is about quantum computing. The thought is: The current language of the user is English. I need to use a tool to help me answer the question. The action is research_paper_query_engine_tool. The input is the template, but it looks like there's nothing related to quantum computing. So the thought is: It seems there are no papers available in the knowledge database related to quantum computing. I will fetch recent papers from arXiv. Now it is using the third tool, fetch_from_arxiv, and the action input is the title quantum computing and a papers count of five. So now it found these five papers. You can see Probing entanglement scaling across a quantum phase transition on a quantum computer, something about uniform additivity, all this complicated stuff. But it definitely found new papers without us intervening in the process. This is cool. In the next lesson, we're going to actually fix the problem with downloading the papers. Bye.
14. Enhancing the prompt to download files: Hi, and welcome back. First of all, let's do a commit so we don't lose our changes. I'm going to delete this paper so that the next time we run the download, you can see that everything is fixed. So let's add everything, add a commit message saying build first version of agent, and do a push.
What we're going to do next is modify the download prompt, because right now it is very simple, and maybe that's not the best way to do it. Of course, you can use ChatGPT or Claude to enhance this prompt; that's what I did. In fact, getting to the prompt that fixed this issue was an iterative process. I tried multiple prompts, and this is the one that fixed the issue. I also tried other approaches, and I invite you to try other approaches too. We're going to discuss that in a moment.
For now, I'm going to tell the agent: Download the following papers, and for each paper: first, process one paper at a time. Second, state which paper number you are processing out of the total. Third, complete the full download cycle before moving to the next paper. Fourth, explicitly state when moving to the next paper. And fifth, provide a final summary only after all papers are downloaded. So these are the new instructions for the download step.
Let's see what happens now. I'm going to run everything again just to be clear: clear all outputs and run all. It's fetching again the four papers related to multimodal models.
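The download instructions dictated above can be written down as a prompt string along these lines. The wording is reconstructed from the spoken version, so the exact text used in the course may differ, and the names DOWNLOAD_STEPS and build_download_prompt are assumptions made for this sketch.

```python
# The five instructions for the download step, as dictated in the lesson.
DOWNLOAD_STEPS = [
    "Process one paper at a time",
    "State which paper number you are processing out of the total",
    "Complete the full download cycle before moving to the next paper",
    "Explicitly state when moving to the next paper",
    "Provide a final summary only after all papers are downloaded",
]


def build_download_prompt(steps=DOWNLOAD_STEPS):
    """Join the instructions into one numbered prompt for the agent."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"Download the following papers and for each paper:\n{numbered}"


print(build_download_prompt())
# The prompt would then be sent with something like: answer = agent.chat(build_download_prompt())
```

Keeping the steps in a list makes it easy to add, remove, or reorder instructions while iterating on the prompt.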
Now it's here. The thought says: I will start downloading the papers one by one, processing each paper in the order they were listed. The first action is download_pdf_file_tool, and the second action is again download_pdf_file_tool, but now for the Cross-Lingual Text-Rich Visual paper. The third action is the same tool, but with the Comprehensive Multimodal Prototypes paper, and the fourth action is the same, but with the ChatGarment paper. Great. Now it downloaded the four papers.
Now let's see what happens when it fetches the quantum computing ones. Oops, now it's downloading the quantum computing papers as well. So we fixed part of the problem. We managed to download the four papers that we explicitly asked for, but now it is also downloading the other ones. So we have to do something else to avoid this situation, and you can see that this last step also failed. Let me delete all of the papers again. For some reason, it is not explicit here that it doesn't have to download the papers. Again, I went through an iterative process of trial and error, and this fix is simpler. You just have to say: Do not download papers unless the user asks for it explicitly. So just by telling the AI not to download the papers unless the user says so, let's see what happens. I'm going to clear all outputs again and run all. This time I had to modify the query template, not the download template.
Now it is again looking in the index for the multimodal models papers. It found the four papers, and let's see what happens. All of this is prompt engineering: just as you do prompt engineering for ChatGPT, you can also do prompt engineering for agents. Let's see what's happening in the papers folder. It downloaded the four multimodal papers successfully. It took 23 seconds. Now it is fetching new papers related to quantum computing. What I expect now is that it doesn't download the quantum computing papers. Here it is. You can see that this time it only found one paper, but that's good. At least it didn't download all of the papers.
That's a good sign. Let's do a commit here, git commit with a build second version message, and let's do a push, this time with the papers included. That's it for this video. I hope you liked it, and see you in the next lesson.
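For reference, the query template with the extra restriction from this lesson could look roughly like this. The wording is reconstructed from the spoken version, and where exactly the restriction sits inside the template is an assumption; QUERY_TEMPLATE is a name chosen for this sketch.

```python
# Reconstructed query template; {topic} is filled in before sending it to the agent.
QUERY_TEMPLATE = (
    "I am interested in {topic}.\n"
    "Find papers in your knowledge database related to this topic.\n"
    "Use the following template to query research_paper_query_engine_tool: "
    "'Provide title, summary, authors and link to download for papers related to {topic}'.\n"
    "If there are not, could you fetch the recent one from arXiv?\n"
    # The line added in this lesson to stop unrequested downloads.
    "Do not download papers unless the user asks for it explicitly."
)

message = QUERY_TEMPLATE.format(topic="quantum computing")
print(message)
# The message would then be sent with something like: answer = agent.chat(message)
```

Because the restriction is part of the template, it applies to every topic query, while the separate download prompt remains the only place where downloads are explicitly requested.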
15. Building a class to manage the index: Hi, and welcome back. What we're going to do now is create two classes. One is going to be called the index manager, and the second is going to be the agent class. Their purpose is to capture all of the logic that we wrote in these Jupyter notebooks so that we can reuse it, and we're going to build a Streamlit app that uses these two classes so that things get a little easier to manage.
First, we create the index manager class. We define a class called IndexManager and create a constructor. It just takes the embed model as a parameter, and self.embed_model is going to be the one we pass in. We also define an empty list of papers.
Now we define a fetch_papers method, which receives as parameters the topic we want to fetch and the papers count, which defaults to ten as in the Jupyter notebooks. So self.papers, instead of being an empty list, is now the result of calling fetch_arxiv_papers with the topic and the papers count. And we need a colon here.
Now that we have fetch_papers, we also create a method called create_documents_from_papers. It is going to be exactly the same as the notebook function, so let me just copy and paste everything and indent it. But this time it doesn't receive papers as a parameter, because papers is already initialized in the instance. I just say for paper in self.papers. I'm also going to get rid of the local documents list here; we're going to initialize documents somewhere else and call self.documents.append with the Document. We need to import Document from llama_index.core. We can return the documents or not, that's up to you; I'm just going to return them. Now it says documents doesn't exist. That's because we haven't defined self.documents, so let's do it here. Doesn't harm anyone.
Let's do it here: self.documents is an empty list. Now we create a create_index method, def create_index, which calls create_documents_from_papers and also gathers the index-building logic from the notebook. We indent it, and we have to import Settings and VectorStoreIndex. Here we assign the result to an instance variable called index. The embed model is in the constructor, so we use self.embed_model, and the documents are self.documents. I prefer initializing those in the constructor. This is just a matter of organization and doesn't affect the result.
Now we have this class with a method to fetch the papers, which populates the papers list. After executing that method you can create the index, and then we need another method to retrieve the index. This method does the same thing we did in the agent notebook, so we put that logic here, and obviously we have to import things like StorageContext and load_index_from_storage. For some reason the VectorStoreIndex import got duplicated. The embed model is going to be self.embed_model. And instead of assigning the result we return it, right?
I think this is everything we need for this class. We can also define another method just to print the titles of the papers. I'm going to call it list_papers and copy the logic from the notebook: print paper title for paper in self.papers, to show things if you want. That's it for this class. In the next lesson, we're going to create the agent class. Don't forget to make a commit, adds index manager class, and push. That's it for this lesson. I hope you liked it. See you in the next lesson.
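Putting the lesson together, the class could look roughly like the sketch below. One deliberate difference is labeled as an assumption: the fetch function is passed in as fetch_papers_fn instead of being imported from the course's tools module, so the sketch runs on its own. The persist_dir parameter, the persist call inside create_index, and the shortened document text are also assumptions.

```python
class IndexManager:
    """Fetches papers, turns them into documents, and builds or loads the index."""

    def __init__(self, embed_model, fetch_papers_fn, persist_dir="index"):
        self.embed_model = embed_model
        # Assumption: stands in for tools.fetch_arxiv_papers(title, papers_count).
        self.fetch_papers_fn = fetch_papers_fn
        self.persist_dir = persist_dir
        self.papers = []
        self.documents = []
        self.index = None

    def fetch_papers(self, topic, papers_count=10):
        self.papers = self.fetch_papers_fn(topic, papers_count)

    def create_documents_from_papers(self):
        from llama_index.core import Document  # lazy import, needs LlamaIndex installed

        self.documents = []
        for paper in self.papers:
            # Shortened here; the lesson also includes DOI, categories, URLs, and so on.
            content = f"Title: {paper['title']}\nSummary: {paper['summary']}"
            self.documents.append(Document(text=content))
        return self.documents

    def create_index(self):
        from llama_index.core import Settings, VectorStoreIndex

        self.create_documents_from_papers()
        Settings.chunk_size = 1024
        Settings.chunk_overlap = 50
        self.index = VectorStoreIndex.from_documents(
            self.documents, embed_model=self.embed_model
        )
        # Assumption: the index is saved right after it is built.
        self.index.storage_context.persist(persist_dir=self.persist_dir)

    def retrieve_index(self):
        from llama_index.core import StorageContext, load_index_from_storage

        storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
        return load_index_from_storage(storage_context, embed_model=self.embed_model)

    def list_papers(self):
        for paper in self.papers:
            print(paper["title"])
```

A typical flow would be fetch_papers, then create_index once, and from then on only retrieve_index when the app starts.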
16. Building a class to interact with the agent: Great. Now we're going to build another class, the agent class. Let's create another file called agent.py. Here we define a class called Agent. In the constructor we receive the index and the LLM model: self.index is going to be index and self.llm_model is going to be llm_model. In the constructor we also build the query engine, the RAG tool, the PDF download tool, and the fetch arXiv tool, and call the build_agent method to build the agent. So basically we move the logic from the Jupyter notebook into a class.
First, build_query_engine. It basically takes this line of code from the notebook and puts it here: self.query_engine is going to be self.index.as_query_engine with self.llm_model and similarity_top_k equal to five. This could also be a parameter of the constructor, but let's hard-code it to five.
The second method is build_rag_tool. self.rag_tool is going to be the QueryEngineTool, so let's import that in this file, and the query engine is going to be self.query_engine.
The next method is build_pdf_download_tool. We copy the notebook code into self.download_pdf_tool. We have to import FunctionTool, and we have to import download_pdf from the tools file.
Then build_fetch_arxiv_tool. Again, let's just copy and paste the notebook code into self.fetch_arxiv_tool and import fetch_arxiv_papers from the tools file.
Now we define another method, build_agent. It is just the ReActAgent.from_tools call, so let's copy it, paste it here, and import ReActAgent at the top. The tools are going to be self.download_pdf_tool, self.rag_tool, and self.fetch_arxiv_tool, and the LLM is self.llm_model, right? Again, all of these parameters, like verbose set to True, can be set in the initializer if you want. I'm just going to hard-code them and keep verbose always True.
Now in the initializer I call all of these methods in order: first build_query_engine, then build_rag_tool, then build_pdf_download_tool, then build_fetch_arxiv_tool, and finally build_agent.
We're missing just one method. When we instantiate this Agent class, the agent is initialized automatically, but I want a chat method that receives a message and returns self.agent.chat(message). It is basically this interaction from the notebook, but we're not going to pass the query templates anymore. We just pass any message, which can be any string.
That's it for this class. Again, git add, git commit with adds agent class, and push. That's it for this video. I hope you liked it, and see you in the next lesson, where we're going to build a Streamlit app using these two classes.
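As a recap, the agent class could be sketched as below. The two tool functions are passed into the constructor as an assumption, so the sketch does not depend on the course's tools module, and the tool names and descriptions are reconstructed from the spoken lesson. ReActAgent.from_tools is used as in the lesson; depending on the installed LlamaIndex version, the agent constructor may differ.

```python
class Agent:
    """Wraps the query engine, the three tools, and the ReAct agent."""

    def __init__(self, index, llm_model, download_pdf, fetch_arxiv_papers):
        self.index = index
        self.llm_model = llm_model
        # Assumption: in the lesson these two functions are imported from tools.
        self.download_pdf = download_pdf
        self.fetch_arxiv_papers = fetch_arxiv_papers
        # Build everything in order, as described in the lesson.
        self.build_query_engine()
        self.build_rag_tool()
        self.build_pdf_download_tool()
        self.build_fetch_arxiv_tool()
        self.build_agent()

    def build_query_engine(self):
        self.query_engine = self.index.as_query_engine(
            llm=self.llm_model, similarity_top_k=5
        )

    def build_rag_tool(self):
        from llama_index.core.tools import QueryEngineTool  # lazy import

        self.rag_tool = QueryEngineTool.from_defaults(
            self.query_engine,
            name="research_paper_query_engine_tool",
            description="A RAG engine with recent research papers.",
        )

    def build_pdf_download_tool(self):
        from llama_index.core.tools import FunctionTool

        self.download_pdf_tool = FunctionTool.from_defaults(
            self.download_pdf,
            name="download_pdf_file_tool",
            description="Python function that downloads a PDF file by link.",
        )

    def build_fetch_arxiv_tool(self):
        from llama_index.core.tools import FunctionTool

        self.fetch_arxiv_tool = FunctionTool.from_defaults(
            self.fetch_arxiv_papers,
            name="fetch_from_arxiv",
            description="Download the most recent papers regarding the topic from arXiv.",
        )

    def build_agent(self):
        from llama_index.core.agent import ReActAgent

        self.agent = ReActAgent.from_tools(
            [self.download_pdf_tool, self.rag_tool, self.fetch_arxiv_tool],
            llm=self.llm_model,
            verbose=True,  # hard-coded, as in the lesson
        )

    def chat(self, message):
        """Send any string to the agent and return its response."""
        return self.agent.chat(message)
```

With this in place, the app only needs to create one Agent and call chat with whatever the user types.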
17. Building a chat UI with Streamlit: Hi, and welcome back.
What we're going to do now is build a Streamlit app. Streamlit is a Python
framework for building apps, especially data apps, as they say here: it turns
data scripts into shareable web apps in minutes, all in pure Python, no front-end
experience required. If you go to the gallery section, you will see a lot of
examples that people have built with Streamlit. For example, for LLMs, they have
built chatbots, ChatGPT with memory, a lot of things. You can see the trending
ones: MathGPT, portfolios, whatever. And it has a lot of tools, I would say, for
building LLM apps. Okay. So we're going to start building this chatbot to
interact with the agent. First of all, we need to install Streamlit as a
dependency, so run pdm add streamlit and wait for the dependency to be installed.
In the meantime, we're going to create a file called app.py. Okay. So here, we're
going to import our agent class and our index manager class. We're also going to
import, from the constants, the embed model and the LLM model. Okay. And we're
going to import streamlit as st. That's how people in the Python community
usually import this library. It's still not recognized because it's still
being installed, so we're going to start building the app while
this gets installed. First of all, we need to understand a concept in Streamlit,
which is session state. In Streamlit we write a script, and this script is run as
if it were in a while loop: it reruns from top to bottom on every interaction.
So everything we write here will be recreated unless we put those variables in
what's called the session state. There are multiple ways to cache these
variables, and one of them is a decorator. This decorator is called st.cache.
Here we can define a function, and I'm going to call it initialize agent,
because we want the agent to be initialized just once. That's why we're
caching this resource. We're going to say index manager equals IndexManager,
and remember, we have to pass the embed model; then we can get the index by
calling the retrieve index method that we built for this class. Finally, we
return an agent with the index and the LLM model. Okay. And now we're going to
initialize the agent in the session state. So how do we do this? We say: if agent
is not in st.session_state, then st.session_state.agent equals initialize agent.
The first time this script runs, agent is not going to be in the session state,
so it will be initialized. We also need to initialize the chat messages in the
session state: if messages is not in st.session_state, then messages is going
to be an empty list. So Streamlit has a
way to build chats, and we're going to see how it uses the concept of roles.
One side of the chat is the user, or the human, and the other side is the
assistant, or the AI. We're going to see how that works in a moment. What we're
going to do now is display the chat messages. Remember, all of this is run like
a while loop, so every time the script runs, we need to print the messages.
For message in st.session_state.messages: this is the syntax we use to write
the messages, with st.chat_message, which takes a name. And what do we put as
the name? It can be user, assistant, ai, or human. User and human are the same
thing; assistant and ai are the same thing. But the message is going to contain
the role, so we're going to pass the message's role, which is going to be
either user or assistant.
How do we know that? Because we're going to define that in a moment.
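The rerun behaviour described above can be modelled without Streamlit at all. In this small sketch a plain dict plays the role of st.session_state and run_script plays the role of one execution of app.py; both names, and the make_agent placeholder, are assumptions made for illustration.

```python
def run_script(session_state: dict, make_agent) -> None:
    """One top-to-bottom run of the script; `session_state` survives between runs."""
    if "agent" not in session_state:
        session_state["agent"] = make_agent()  # expensive, so only done once
    if "messages" not in session_state:
        session_state["messages"] = []  # chat history starts empty


calls = {"count": 0}


def make_agent():
    calls["count"] += 1
    return object()  # placeholder for the real agent


state = {}
for _ in range(3):  # simulate three reruns of the same script
    run_script(state, make_agent)

print(calls["count"])  # the agent was built a single time
```

Anything assigned to an ordinary variable inside run_script would be lost between runs; only what is stored in the dict carries over, which is why the agent and the message list live there.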
Just bear with me. To write something in the chat, we use the markdown method,
and we print the message content. Okay. You could say that messages is a list
of dictionaries, and each dictionary has a role and a content. The role is
either user or assistant, and the content is the message itself, which can be
in Markdown. Now we're going to build the core functionality. We say if prompt,
colon equals, st.chat_input, with the text "Ask me anything about research
papers". Okay. So what does this mean? This syntax assigns whatever chat_input
returns to the prompt variable and then checks whether it has a value, so the
block only runs once the user has typed something. The text is just a
placeholder, as you're going to see in a moment. What we type here is assigned
to prompt. Okay. Now we append a dictionary to the session state messages list.
Because I am the one typing, I am the user, so the role is user and the content
is whatever was assigned to prompt. Great. We appended that, but now we have to
show it on the screen. So with st.chat_message, and this time it's the user,
I'm going to markdown the prompt; basically, I display what I typed. Then I
display what the assistant responds. With st.chat_message for the assistant,
I get the answer from the agent: remember that we built the chat method, so we
pass the prompt, and it returns something that has a response attribute. Then
we print the answer, and finally we append the answer to the session state
messages, but with the role of the assistant. So that's everything. We just
built this app in
33 lines of code. To run this, close the terminal and open it again, and make
sure it shows this prefix. That means you are inside the virtual environment.
If you don't see it, you can also type pdm venv activate. That command prints
the command that activates the virtual environment, so you basically have to
copy what it prints and paste it. Okay. So now I am inside the virtual
environment, and you should see this. If you aren't on zsh or a Mac, just type
this, obviously with the correct command for your shell. Then you type
streamlit run app.py. This opens the chat here, but something is wrong. Let me
see what it is: it's cache_resource, not cache. So let's try again. Okay, let's
add something that I missed, a title, just to make this look a little nicer.
So st.title, arXiv Papers Chatbot. Yeah, that's a good name. So if you refresh,
you see this title. Okay, so as I told you, this is a placeholder for the chat
input, but I can type anything here. I'm going to say: can you fetch papers
related to quantum mechanics? And remember, quantum mechanics is not in the
knowledge base. This little icon indicates that I am the user, and this one is
the assistant. I'm going to add something else here to make it a little better.
I think it returned the answer, but I want it to be more user friendly. So in
this line, I'm going to say with st.spinner, with the text "Thinking". While the
agent is thinking, we show a spinner so that it's more user friendly. Let me
just copy this and do it again, now thinking with a spinner, because remember
that this takes some time. This is a more user-friendly way to wait for the
result. Let's see what the logs say. So yeah, I think it finished, or I think it
errored for some reason. Let's stop this, run it again, and ask the same
question: can you fetch papers related to quantum mechanics? It is using the
fetch from arXiv tool, but it's still thinking. Let's wait a moment, and there
you go. This is the answer. Okay, so now we have a user interface to see the
results of our hard work. I hope you liked this video, and see you in the next lesson.
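Putting the pieces of this lesson together, the app could look roughly like the sketch below. It uses only the Streamlit calls named in the lesson (st.title, st.session_state, st.chat_message, st.chat_input, st.markdown, st.spinner). The make_message helper, the main function, and its build_agent parameter are assumptions introduced here so the sketch is self-contained; the lesson's own file is a flat script that also wraps agent creation in st.cache_resource.

```python
def make_message(role: str, content: str) -> dict:
    """Build one chat-history entry: a dict with a role and a content."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return {"role": role, "content": content}


def main(build_agent) -> None:
    # Imported here so make_message can be used without Streamlit installed.
    import streamlit as st

    st.title("arXiv Papers Chatbot")

    # Streamlit reruns the script on every interaction; keep state across reruns.
    if "agent" not in st.session_state:
        st.session_state.agent = build_agent()
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Redraw the whole conversation on each rerun.
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # chat_input returns None until the user submits some text.
    if prompt := st.chat_input("Ask me anything about research papers"):
        st.session_state.messages.append(make_message("user", prompt))
        with st.chat_message("user"):
            st.markdown(prompt)

        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                answer = str(st.session_state.agent.chat(prompt).response)
            st.markdown(answer)
        st.session_state.messages.append(make_message("assistant", answer))
```

In app.py you would end the file with a call such as main(initialize_agent), where initialize_agent is the function from the lesson that builds the index manager and returns the agent, and then start it with streamlit run app.py.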
18. Getting an API key from Pinecone: Hi and welcome back.
We are now going to store our index in Pinecone. Pinecone is a cloud service
for storing indexes; that is, it's a vector database in the cloud. As it says
here, you can build knowledgeable AI with its vector database at the core:
Pinecone is the leading knowledge platform for building accurate, secure, and
scalable AI applications. You can create a free account, since it has a free
tier. You can choose the Starter plan for trying it out and for small
applications; it's free. You have Pinecone Serverless, and you have Pinecone
Inference and Assistant, which are some products they are building, and you
have to use the us-east-1 region from AWS. You can start for free: create an
account, then log in. I'm going to do that myself. Then what we are going to do
here is create an index. Click on this button, create an index, and I'm going
to call it arXiv research. Here you can configure the dimensions of the
vectors. Remember, vectors have dimensions; those are the embeddings. Or you
can choose a preset. Remember, we're using text-embedding-3-large, so we can
just choose that directly. And look at this: the dimensions field is
automatically populated with 3,072. That's the dimension of the embeddings when
we use this embed model. For serverless, I'm going to choose AWS. I can choose
other cloud providers and other regions because I am on the paid plan, but if
you're using the free tier, you are only going to be able to use AWS us-east-1.
You can also enable deletion protection to prevent any user from accidentally
deleting this index. As I'm going to delete this either way, it doesn't matter.
Okay. Now you have created your index in the cloud, and you need an API key.
I created an API key. You can give it a name. You can give it custom
permissions or, if you don't want to complicate things for now, just give it
all permissions, and then you will have to copy it and store it in a safe
place. In the next lesson, what we're going to do is create another class
called index manager Pinecone that inherits from the index manager, with some
tweaks in order to save the data into Pinecone.
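The reason the preset matters is that the index dimension has to match the length of the vectors the embed model produces. The sketch below spells that out with a small lookup table and a check; the helper name check_vector and the idea of validating by hand are assumptions for illustration, since Pinecone performs this check itself when vectors are upserted.

```python
# Output sizes of OpenAI embedding models at their default settings.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def check_vector(vector, model: str) -> None:
    """Raise if `vector` does not have the length the index expects for `model`."""
    expected = EMBEDDING_DIMENSIONS[model]
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected} dimensions, got {len(vector)}"
        )


# A vector of the right length passes silently.
check_vector([0.0] * 3072, "text-embedding-3-large")
```

If you later switch to a smaller embed model, the index would need to be recreated with the matching dimension.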
19. Creating an index manager for Pinecone: Hi, and welcome back.
So first of all, let's do a commit to save our work. So git add, git commit,
and I forgot what we did. Oh, yeah, we built the Streamlit app.
That's what we did.
Then we do a git push. Okay. What we're going to do now is install the two
dependencies that we need in order to use Pinecone. First, we need the Pinecone
client, and we are also going to need the LlamaIndex vector stores Pinecone
package. So let's let PDM do the heavy work here and install those
dependencies. The other thing is we're going to create an index manager
Pinecone .py file, and in it another class called index manager Pinecone.
We're going to use inheritance here and inherit from index manager, so we get
some methods from the index manager class. In the constructor we're going to
need, again, the embed model, plus the name of the Pinecone index that we just
created. We call the parent constructor, which only needs the embed model, and
we create an instance of Pinecone. Right now it's not showing up because it's
still installing, but let's import it anyway: from pinecone, we import
Pinecone. Well, it finished installing, so now it's recognized. We're going to
need the API key, and we're going to get it from the environment variables.
We didn't set that up yet. Let's put PINECONE_API_KEY in the .env.example file,
and in the .env file we put the same thing but with the actual value. I already
have my API key copied, so you should copy yours as well and paste it here.
Obviously, I'm going to delete mine later. Okay. In order to load that API key,
we need load_dotenv. So from dotenv we import load_dotenv, and we call it to
get the environment variables from the .env file. Great. Now what
we're going to do is say self dot pinecone index equals pc.Index, and we pass
the index name. Then we initialize the vector store as a PineconeVectorStore,
which we need to import: from llama_index.vector_stores.pinecone import
PineconeVectorStore. Okay. To this vector store we need to pass the Pinecone
index, which is self dot pinecone index, what we defined here. We also need the
storage context. I'm going to say self dot storage context equals
StorageContext.from_defaults, importing StorageContext from llama_index.core,
and this time we pass the vector store. Okay. Great. What we have done so far
is call the constructor of the index manager and assign some other attributes
here. Now we need to override the create index method from the index manager.
Remember, we have this create index; we have to rewrite it because now we're
going to store things in Pinecone. So we do the same: we initialize an empty
list, we call create documents from papers, and we set the Settings as before.
We have to import Settings as well, so let's get Settings from here too. Last
but not least, we upsert the vectors with VectorStoreIndex, which I think we
also need to import from here. What we do is call from_documents, as we did
here. We assign it to self dot index, like that, and we pass self dot
documents, the storage context, which is self dot storage context, and the
embed model, which is self dot embed model. Okay. There you go. In order to
retrieve the index, we just return VectorStoreIndex.from_vector_store, and we
pass the vector store, which is self dot vector store, and the embed model,
which is self dot embed model. That's it. This is everything
we need to do in order to upsert data and retrieve the index from Pinecone.
If we go to the database, arXiv research, there are no records yet. So what
we're going to do is create a Jupyter notebook, which is going to be called
pinecone, and in this Jupyter notebook we're going to upsert the vectors. From
constants, let's import the embed model. Let's select the kernel first. We're
also going to import index manager Pinecone from the index manager pinecone
module. We create an instance of that class, passing the embed model and the
index name, which in our case is arXiv research. And now we fetch the papers:
fetch papers, language models. We're going to fetch the last ten. We fetched
the papers, and now we are ready to create the index. Let's create the index.
Oops, we got an error, and I think that's because this needs to be embed model,
not embed. Let's try again and see if that solves the issue. Yeah, so for some
reason the parameter is embed model, but autocompletion didn't work for me.
So embed model, not embed, and now it upserts the vectors. You can see we have
upserted those vectors into the default namespace. And you can see here the
vector. These are basically just values: there are 3,072 of these numbers, and
each record has some metadata: node content, node type, doc ID, document ID,
et cetera. So basically, these are vectors. We have ten of them, ten vectors.
These are embeddings. That's it; there's no more magic to it. We can also
retrieve the index by calling index manager retrieve index, and voilà, we
retrieve the index. We can also print the list of the papers that we fetched.
You can see it's different now. You can see Video Panda. Actually, we're going
to make a change here because this is difficult to read. Let's go to the index
manager, and here we say for paper in papers, just print the paper title.
Okay. So that's it. That's everything we need to do. Sadly, this is not going
to be reflected until we restart the kernel. So let's restart and create the
index manager again. And we're not going to... well, yes, let's fetch the
papers, but let's not create the index again. I'm just going to list the
papers, and here they are. These are the papers that we have right now. Now
we're going to make a commit to save our progress: add index manager Pinecone,
and git push. That's it for this video. I hope you liked it, and let's
continue in the next lesson.
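As a recap of this lesson, here is a hedged sketch of what such a subclass could look like. The library calls (Pinecone, pc.Index, PineconeVectorStore, StorageContext.from_defaults, VectorStoreIndex.from_documents and from_vector_store, load_dotenv) are the ones named in the lesson. Everything else is an assumption: the stand-in IndexManager base class, the read_api_key helper, the PINECONE_API_KEY variable name, and the simplification that create_index receives ready-made documents instead of building them from papers. The third-party imports are deferred into the methods only so the sketch can be loaded without those packages installed.

```python
import os


class IndexManager:
    """Stand-in for the course's base class; only the constructor is sketched."""

    def __init__(self, embed_model):
        self.embed_model = embed_model
        self.documents = []


def read_api_key(env=os.environ, name="PINECONE_API_KEY"):
    """Return the API key from the environment, failing loudly if it is missing."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


class IndexManagerPinecone(IndexManager):
    def __init__(self, embed_model, index_name):
        from dotenv import load_dotenv
        from llama_index.core import StorageContext
        from llama_index.vector_stores.pinecone import PineconeVectorStore
        from pinecone import Pinecone

        super().__init__(embed_model)  # the parent only needs the embed model
        load_dotenv()  # read variables from the .env file
        pc = Pinecone(api_key=read_api_key())
        self.pinecone_index = pc.Index(index_name)
        self.vector_store = PineconeVectorStore(pinecone_index=self.pinecone_index)
        self.storage_context = StorageContext.from_defaults(
            vector_store=self.vector_store
        )

    def create_index(self, documents):
        """Embed the documents and upsert the vectors into Pinecone."""
        from llama_index.core import VectorStoreIndex

        self.documents = documents
        self.index = VectorStoreIndex.from_documents(
            self.documents,
            storage_context=self.storage_context,
            embed_model=self.embed_model,  # the keyword is embed_model
        )
        return self.index

    def retrieve_index(self):
        """Rebuild an index object on top of vectors already stored in Pinecone."""
        from llama_index.core import VectorStoreIndex

        return VectorStoreIndex.from_vector_store(
            self.vector_store, embed_model=self.embed_model
        )
```

Because retrieve_index reads from the existing vector store, it can be called in a fresh process without fetching or embedding the papers again.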
20. Using the Pinecone index in the Streamlit app: Great. Now we have our
vectors in Pinecone. We have this data, these titles, in Pinecone. Let's test
whether this is actually working. In our Streamlit app, when initializing our
agent, instead of using the index manager, we're going to use the index manager
Pinecone. Okay. So we have to pass the index name, which is arXiv research.
That's the only change we need to make in order to get the data from Pinecone.
Let's test this: streamlit run app.py. Let's see what the result is now. This
time I'm going to use the query template that we used last time, which is here.
Let's replace this: I am interested in multimodal models. Let's see if the
agent is capable of retrieving information from this Pinecone index and
returning the papers that are related to multimodal models. If we look here...
where is that? These are the language model papers. It says: here are some
recent papers related to multimodal models, Zero-Resource Speech Translation
and Recognition with LLMs, and Long-Form Speech Generation with Spoken Language
Models. We have one of them. Where is that? This one, Long-Form Speech
Generation with Spoken Language Models, and the other one is... what's the
other name? Zero-Resource Speech Translation and Recognition with LLMs. We
could successfully retrieve the data from Pinecone. Isn't that exciting?
That's it. Now we have built an agent that stores the index in the cloud,
retrieves it, and works perfectly. I hope you liked this lesson,
and see you in the next one.
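The swap is a one-line change because both managers expose the same retrieve index method, so the code that builds the agent does not care which one it receives. The sketch below shows that idea with stand-in classes; the class bodies, the dictionary returned by initialize_agent, and the index name arxiv-research are hypothetical placeholders, not the course's code.

```python
class IndexManager:
    """Stand-in for the local index manager."""

    def __init__(self, embed_model):
        self.embed_model = embed_model

    def retrieve_index(self):
        return "local index"


class IndexManagerPinecone(IndexManager):
    """Stand-in for the Pinecone-backed manager with the same interface."""

    def __init__(self, embed_model, index_name):
        super().__init__(embed_model)
        self.index_name = index_name

    def retrieve_index(self):
        return f"pinecone index {self.index_name}"


def initialize_agent(index_manager):
    # Only retrieve_index() is used, so either manager works here.
    return {"index": index_manager.retrieve_index()}


before = initialize_agent(IndexManager("embed-model"))
after = initialize_agent(IndexManagerPinecone("embed-model", "arxiv-research"))
print(before["index"], "->", after["index"])
```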
21. Deploying the app to Streamlit Community Cloud: Hi, and welcome back.
What we're going to do now is deploy our app to Streamlit Community Cloud.
As you can see on the landing page, deploying to the Community Cloud is free.
You only have to sign up, create an account, and connect your GitHub account,
and then you will be able to find the repo, which has to be in your GitHub
account, and deploy that app. It's a very, very easy process. What I'm going to
do before that is create a requirements.txt file. Why? Because Streamlit
Community Cloud expects a requirements.txt file to install all the
dependencies. It doesn't recognize this pdm.lock file; maybe in the future it
will, but for now it needs the requirements.txt file. PDM has an export
command. If you type it, it prints the requirements in the terminal, but we
want them in a file, so we use the greater-than symbol to redirect the output
into requirements.txt. If you open this file, it has all the dependencies that
are in the PDM lock file, but in requirements.txt format. Let's do a commit,
add requirements.txt file, and push. And now let's go to the Streamlit Community
Cloud dashboard. Once you sign up and connect your GitHub account, you can
click on this button that says Create app. I'm going to choose this option,
which is to deploy a public app from GitHub. I'm going to search for the arXiv
researcher repo. Remember, this is my repo. I'm going to say that the main file
path is app.py, and this is a randomly chosen subdomain. Your domain is going
to be this, dot streamlit.app. You can click on Deploy, and that's going to
deploy your app. But this is not going to work right now because we have to set
the environment variables. Remember that we need the OpenAI API key and the
Pinecone API key for this to work. If you go back to share.streamlit.io, click
on the app settings, and go to Secrets, you're going to be able to set the
secrets, or environment variables, here. I'm going to copy and paste mine, but
you have to put the values in quotes; otherwise, it's going to complain. Save
changes, and now the app is going to deploy with those environment variables.
If you click on Manage app, you're going to see that it installed the
dependencies from the requirements file: Python dependencies were installed
from requirements.txt, and that's it. Now you can say hello to the chat. This
is an LLM, so it knows that it doesn't need any knowledge base in order to
respond to hello. Now we're going to use the template that we had here. Just
keep in mind that you don't need this; you can experiment with other templates,
but this is the one that works right now, so I'm going to use it. I am
interested in multimodal models. Let's see if it can process this. I expected
the logs to show here, but I'm not sure why they're not showing. Let me just
refresh this. I'm going to copy this and refresh, just to see if there's
something wrong. If that doesn't work, I'm going to reboot the app to see if
that works. Well, it worked this time, but I don't see the logs here. Usually
the logs are shown here, but I'm not sure why they are not showing right now.
Anyway, it responded with one of the papers from the knowledge base. Remember,
Long-Form Speech Generation with Spoken Language Models was in the knowledge
base. So that's it. Now you can share this link with your friends and let
them test your app. I hope you liked this video, and see you in the next lesson.
22. Conclusion: Congratulations on finishing this course. Thank you for joining
this journey to learn how to create AI agents. You now have the tools to build,
enhance, and deploy AI solutions with real-world applications. Keep
experimenting, stay curious, and remember that the possibilities with AI are
endless. What are the next steps? Apply your knowledge to real-world projects;
that's the best way to learn. Share your achievements, connect with the
community, keep learning, and stay updated on advancements in AI. Your feedback
matters. Please take a moment to leave a review or share your thoughts. Your
feedback helps improve this course and future content. And don't forget to stay
connected: reach out with questions, project ideas, or just to share your
progress. Together, we can make AI accessible and impactful.