Transcripts
1. Intro to GenAI: There are a lot of courses
out there on generative AI. I spent a lot of time
going over many of them because I wanted to make
sure that in this course, I'm giving you all the
fundamentals that you need to completely understand
what generative AI is. And on top of that, I'm going to give you some
practical examples, some hands on demos
on different tools that use generative AI
that can help you today. I'm Professor Reza, and I teach undergraduate and
graduate students topics on computer science and
artificial intelligence. I also have thousands
of online students. I have done research on AI, and I collaborated with prestigious institutes
like MIT Media Lab, Carnegie Mellon University,
Harvard University, and University of
California San Diego. And the results of those
works have been published in venues like Springer
Nature and ACL. I'm going to use all of that experience and everything
else that I've learned in all of these times to
let you know how you can understand the
transformation from traditional AI to generative AI. This course is divided into
five different sections. In the first section,
we will cover traditional artificial
intelligence. We will give a definition of what artificial intelligence is. We also cover what
machine learning is and discuss different
types of machine learning, including unsupervised
learning, supervised learning, and reinforcement
learning, and we will also discuss
deep learning and the difference between
discriminative deep learning and generative deep learning. In the second section,
we will discuss how to distinguish between
generative AI and traditional
machine learning. Then we will talk
about generative AI and we will provide some
examples of generative AI. We will discuss what
transformers are and how they change the game in
artificial intelligence. We will also cover
topics such as prompt engineering and
foundation models. Then we will discuss different
types of generative AI, and we will end this section
with some examples of code generation using AI. In section three, we will
discuss large language models. We will provide an
introduction to them and also provide a comparison between
LLMs and generative AI, and we will also discuss
the benefits of LLMs. In Section four, we will talk about different types of
tools that Google Cloud provides us so we can use generative AI for
our own projects. In the last section, I will
provide a demo on how to build an app using
generative AI without writing a
single line of code. So if you're excited about
learning what generative AI is and how you can use it in your daily life, let's dive in.
2. L1V1 - Traditional AI: In this video, we provide an introduction to traditional
artificial intelligence. Artificial intelligence
is a discipline like physics or chemistry. It is a branch of
computer science that deals with the creation
of intelligent agents, which are systems
that can reason, learn, and act autonomously. In a more formal way, AI is the theory
and development of computer systems able to perform tasks normally requiring
human intelligence. One of the subfields of
AI is machine learning. Machine learning is a program or system that trains the
model from input data. That trained model can make useful predictions
from new or never before seen data drawn from the same distribution as the one used
to train the model. Machine learning gives computers the ability to learn without
explicit programming. Another subfield of
AI is deep learning. Deep learning is a type of machine learning that uses
artificial neural networks. Artificial neural networks are inspired by the structure
of the human brain, and they can be used to process complex patterns that traditional machine
learning algorithms cannot. We will discuss
machine learning and deep learning in more details
later on in this section. But before that, let's
give an overview of AI. The rest of this video is
structured as follows. First, we will provide a real life example of using
artificial intelligence. Then we will provide a
brief history of AI. Next, we'll try to understand what artificial
intelligence is, and then we cover
different types of AI, different applications of AI, and also we discuss what the
future of AI will look like. When we talk about AI
making our lives easier, smart homes are a
great place to start. Here's how it works.
In a smart house, we've got home appliances
and voice activated sensors. They're like your own
personal assistant tweaking the light and air conditioning to match the weather outside. Then there's the
security system. It's always on the
lookout detecting any unusual movement outside and alerting you immediately. Here's the really cool part. All these appliances
talk to each other. They're connected and can even
communicate with your car. For example, opening the
garage door when you enter your driveway
and to top it off, you can manage all
these appliances from your phone
wherever you are. There's so much that
AI can do for us. But before getting distracted
by the applications, let's take a step
back now and dive into how artificial
intelligence came to be. Here's a timeline of
artificial intelligence. In 1950, Alan Turing came
up with the Turing test, a test of a machine's
ability to exhibit intelligent behavior
equivalent to or distinguishable
from that of a human. In 1956, John McCarthy coined the term artificial
intelligence and organized the Dartmouth Summer
Research Project on Artificial Intelligence, the first conference on AI. In 1969, Shakey the
Robot was built, the first general
purpose mobile robot. Although simple by
today's standards, Shakey marked a milestone
in AI development by demonstrating the
ability to process data and perform
tasks with a purpose. In 1997, Deep Blue defeated world Chess
champion Garry Kasparov, the first time a computer had beaten a human at
a complex game. Deep Blue's victory was a
major breakthrough for AI, demonstrating the ability of computers to learn and adapt. In 2002, the first commercially successful robotic vacuum
cleaner was introduced. And for the decade 2005-2015, we saw the development of a number of new AI technologies, including speech recognition, robotic process
automation or RPAs, dancing robots, smart homes, and self driving cars. In 2016, AlphaGo, a computer program developed
by Google DeepMind, defeated world Go
champion Lee Sedol. AlphaGo's victory was a
major milestone for AI, demonstrating the
ability of computers to master complex
strategic games. In 2017, transformer
technology was introduced in a paper called
Attention Is All You Need. Transformer technology
is now widely used in natural language
processing tasks such as machine translation and
text summarization. In 2020, GPT-3, a large language model developed
by OpenAI, was released. GPT-3 is capable of
generating human-quality text, translating languages, and writing different
kinds of creative content. And finally, in 2023, Google Cloud Gen AI
tools were released, providing a suite of tools for developers to build and
deploy AI applications. In the same year, Bard, a large language model developed by Google AI, was released. Bard is capable of answering your questions in
an informative way, even if they are open ended,
challenging, or strange. This is just a brief overview
of the history of AI. AI is a rapidly
developing field, and new advances are
being made all the time. It will be interesting to see
what the future of AI holds. We've been on quite a journey
with artificial intelligence. Starting from the 1950s with Turing's
groundbreaking test, the term artificial intelligence
was coined in 1956, and a new era began. Over the years, we witnessed milestones
like the creation of our first general purpose
mobile robot Shakey in 1969. By 1997, computers were
defeating chess champions, and now here we are in 2023, witnessing highly
sophisticated language models like GPT-3 and Bard. It's a bit like the computer
revolution of the 80s, but this time,
it's all about AI. Mastering these new and
powerful tools is becoming more crucial as the pace of advancements in AI
is accelerating. The potential is immense, much like it was for those
computer enthusiasts in the 80s. So buckle up for this
exciting journey ahead. It's not just about watching what the future
of AI will bring, but being a part of shaping
that future ourselves. Understanding artificial
intelligence. AI is a branch of computer
science that creates smart machines
capable of human like tasks like
speech recognition, object identification, learning, planning,
and problem solving. Remember the computer
that outsmarted the chess champion or the one that controls
your house lights? That's AI solving
problems, just like we do. Our current understanding
of AI is largely based on how it interacts with us and how it compares to
human capabilities. Things like speech
recognition and object detection are
big in AI today. It's about the ability
to absorb information, learn from it, and use it to plan and tackle future tasks, all very human-like activities. This, in a sense, is the magic of AI. To understand AI correctly, we need to understand
three concepts, different types of AI, different applications of AI, and different possibilities
for the future of AI. Now, let's start with
different types of AI. There are many
different types of AI, but they can be broadly classified into four
different categories. The first one is reactive AI. Reactive AI systems can only respond to the current
state of the world. They do not have any
memory of past events, and they cannot plan
for the future. An example of that is a chess playing robot that
only follows a set of logical instructions and reacts properly based on
the opponent's move. The second type of AI
is limited memory AI. These systems can remember past events and use this
information to make decisions. However, they
cannot reason about the future or understand the
intentions of other agents. An example of that could
be a maps application that suggests places to eat based on your
previous visits. The third type of
artificial intelligence is theory of mind AI. These systems can understand the thoughts and intentions
of other agents. This allows them to cooperate
with other agents and to achieve goals that would be impossible for
a single agent. One way that theory of mind AI could revolutionize the
way we interact with machines is by creating robots that are able to
provide companionship and support to people who are lonely or isolated, or through virtual assistants that
are able to understand our needs and provide us with the information and
assistance that we need. The fourth type of
AI is self aware AI. Self aware AI is a
hypothetical type of AI that would be conscious and have its own subjective experiences. Suppose we have a personal
robot assistant named Eve. If Eve were a self aware AI, she wouldn't just follow preprogrammed instructions
or react to our commands. Instead, she'd understand
her own existence and have her own
feelings and thoughts. For instance, if we ask Eve to fetch a book
from the library, a regular AI would
just calculate the shortest path
and go get the book. However, a self aware AI like Eve might think about
whether it's a nice day for a walk or ponder
if she's fetched too many books lately and
suggest an e book instead. It's important to note
that this kind of self aware AI is purely
hypothetical at this point. Some researchers believe it's a future possibility
worth exploring. Now, let's dive into
applications of AI. Artificial intelligence is not just about imitating
human capabilities, but also about
augmenting our abilities and enhancing efficiency
in many different areas. From transportation
to healthcare, financial services
to customer support, and education to entertainment, AI's potential seems
to be limitless. Here are some examples of
how AI is being used today. Self driving cars. AI is revolutionizing
the way we travel. It is powering self
driving cars that can navigate roads and avoid
obstacles autonomously, leading to safer roadways. Medical diagnosis.
In healthcare, AI is making a strides
in diagnosing diseases, often outpacing human
doctors in accuracy. By analyzing vast
amounts of medical data, it helps identify
patterns and trends supporting doctors in making
more informed decisions. Banking and fraud detection. AI has matured significantly in the banking sector over
the past half decade, from predicting potentially
fraudulent transactions to determining loan eligibility
based on various factors, AI plays a pivotal role in
today's financial sector. Customer service
and online support. Imagine a company
like HP managing 70,000 plus help pages
across 17 languages. AI steps in here, automating customer support and providing round
the clock service, significantly reducing the cost and enhancing the efficiency. Education. In the
realm of education, AI is enabling personalized
learning experiences, adapting to each student's
individual needs, and fostering a more
effective learning process. Entertainment. Our virtual
assistants like Siri, Cortana, Alexa, and Google Assistant
are all powered by AI. With their voice
recognition capabilities, they're like having a
personal secretary at your command and cybersecurity. In the digital domain, AI is our watchman. With its machine
learning algorithms and vast data analysis, it detts anomalies and
responds to threats, strengthening our
cybersecurity measures. As you can see, AI has intertwined itself into every
facet of our lives, enhancing our capabilities and reshaping both our commercial
and business landscapes. The possibilities are immense, and they continue to expand
with each passing day. Now let's talk about
the future of AI. When we think about
the future of AI, it's really quite fascinating. We're on the verge
of a time when self driving cars
could be the norm. Imagine having robots
at home helping us with tasks from making coffee
to more complicated stuff. We're also looking at the
rise of smart cities where AI runs everything from our phones to
household appliances. Plus, robots are
stepping up to do high risk jobs like bomb
defusal, for example. As AI continues to develop, it is likely to have an even
greater impact on our world. So based on all of that, some possible applications
of AI in the future may include automated
transportation. Imagine a world where AI
does all the driving for us. We're getting closer
to a reality where self driving cars are a
standard way to get around. This isn't just
about cars, though. We're talking automated drones, delivering our packages,
AI driven trains, ensuring precise,
timely transport, and even autonomous
boats and planes. All this aims to
make our journey safer and more efficient
by reducing human error. It's a massive shift that could redefine how we think
about transportation. Personalized medicine. AI can be used to analyze large
amounts of medical data to identify patterns and
trends that can help doctors diagnose and treat
diseases more effectively. For example, AI can
be used to develop personalized cancer
treatments that are tailored to the specific
genetic makeup of each patient. Virtual assistants. AI
powered virtual assistants can help us with a variety of tasks such as scheduling
appointments, making travel arrangements,
and managing our finances. Virtual assistants can also provide us with information
and entertainment, and they can even be used to control our smart home devices. And speaking of smart
homes themselves, AI can be used to make our homes more comfortable,
efficient, and secure. For example, AI can be used
to control our thermostats, lights, and other appliances. And it can also be
used to monitor our homes for security threats. And last but not least artificial general
intelligence or AGI. AGI is a hypothetical type of AI that would be as
intelligent as a human being. AGI could potentially
solve some of the world's most
pressing problems such as climate
change and poverty. However, AGI also raises
some ethical concerns, such as the potential
for AI to become self-aware and to develop its
own goals and desires. AI is a powerful technology with the potential to revolutionize
many aspects of our lives. It is important to be aware of the potential
benefits and risks of AI and to use it responsibly. I hope you enjoyed this
brief explanation on AI. In the next video,
we will have a more in depth look
at machine learning.
3. L1V2 - Machine Learning: What is machine learning? In this video, we're going to cover the basics of
machine learning. Specifically, we're going to start with a definition of
what machine learning is. Then we provide a comparison between artificial intelligence, machine learning,
and deep learning. Then we discuss how
machine learning works. We talk about different
types of machine learning, supervised learning,
unsupervised learning, and reinforcement learning. We talk about the prerequisites
for machine learning, and at the end, we provide some examples of applications
of machine learning. So what is machine learning? Machine learning works
on the development of computer programs
that can access data and use it to automatically learn and
improve from experience. This enables machine learning to help us do complex tasks, such as 3D printing
of entire houses. By using algorithms
and large datasets, machine learning can automate
design and planning, helping to address
construction challenges such as structural integrity
and material efficiency. Can also customize designs based on environmental
conditions. It can help reduce cost
and time while increasing precision and has the potential to transform the
construction industry. As another example, consider our personal assistance like Siri or Google Assistant
or Amazon Echo. They all use the power
of machine learning to help us with our
everyday tasks like playing our favorite music or ordering
food or voice controlling our home appliances or requesting rights from
Uber and much more. As we said before,
artificial intelligence is a technique which enables machines to mimic
human behavior. This is key because it's how we figure out if our
calculations and work are on the right track by seeing if they can
imitate human behavior. We're using this
approach to take over some of the
work humans do with the aim of making things more efficient, streamlined,
and accurate. AI is a broad field that covers many
different technologies. Some examples of artificial
intelligence include IBM Deep Blue chess, electronic
game characters, and self driving cars. These are just a few
examples of many ways that artificial intelligence
is being used today. Machine learning is a
technique that uses statistical methods to allow machines to learn
from their past data. This means that machines
can use past inputs and answers to help them make better guesses in
future attempts. Google search algorithm and email spam filters are examples of applications
of machine learning. And then we have deep learning, which is a subset of
machine learning. It uses algorithms to allow models to train themselves
and perform tasks. AlphaGo and natural
speech recognition are two examples
of deep learning. Deep learning is
often associated with neural networks which are
a type of black box model. As a black box model, it's difficult for
humans to track how deep learning models
make their predictions. However, deep
learning models can still be very effective
at performing tasks. We will have a deeper dive into the world of deep
learning later on. Now let's see how
machine learning works. To understand how
machine learning works, let's have a look at
the following diagram. In the first step, we start
with our training data. Then we feed the training data into a machine learning
algorithm for processing. The processed data then goes through another machine
learning algorithm. And now it's time
to test our work. We bring in some new data and run it through
the same algorithm. In the next step, we check the predictions
and the results. If we have any reserved
training data, now is the time to use them. In the next step, if the
prediction doesn't look right, let's say it gets
the thumbs down, it's time to circle back
and retrain the algorithm. Remember, it's not always about getting the right
answer right away. The goal is to keep trying
for a better answer. You might find the initial
result isn't what you wanted. That's okay. It's
part of the process. And it can depend on the
field you're working in, whether it's
healthcare, economics, business, stock market,
or something else. The results can be
very different. So we need to try out the model, and if it's not giving
us the result we need or if we think we can
achieve better outcomes, we retrain our model. And in the final step, we keep refining and retraining until we get the best
possible answer. That's how machine
learning works.
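If you are curious what that loop looks like in practice, here is a rough sketch in Python with scikit-learn. The dataset, the accuracy target, and the choice of model are all made up for illustration; the point is the train, test, and retrain cycle we just walked through.

# A minimal sketch of the train / test / retrain cycle (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for whatever problem we are solving
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)                  # step 1: train on the training data

accuracy = model.score(X_test, y_test)       # step 2: test on new, unseen data
print(f"accuracy: {accuracy:.2f}")

# step 3: if the predictions get the thumbs down, circle back and retrain
if accuracy < 0.90:                          # 0.90 is an arbitrary target
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print(f"retrained accuracy: {model.score(X_test, y_test):.2f}")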
Now let's look at different types of machine learning. We can see that we
have supervised, unsupervised, and
reinforcement learning. We'll go into each one, get a good idea of when and where to use them and
what they're all about. In machine learning,
we use lots of different algorithms to
deal with tough problems. Each one fits into
a certain type. So we've got three main types of machine learning algorithms, supervised learning,
unsupervised learning, and reinforcement learning. Now let's get into what each of these learning
methods really mean. Supervised learning uses labeled data to train machine
learning models. Labeled data means that the output is
already known to us. The model just needs to map
the inputs to the outputs. An example of supervised
learning can be to train a machine that identifies
the images of animals. Here, we can see a trained model that identifies
the picture of a cat.
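As a small hands-on sketch, this is roughly what supervised learning looks like in Python with scikit-learn. The animal images are replaced here by made-up numeric features, since the key idea is simply that every training example comes with a known label.

from sklearn.linear_model import LogisticRegression

# Hypothetical labeled data: each row is [weight_kg, ear_length_cm], label 1 = cat, 0 = dog
X_train = [[4.0, 6.5], [4.5, 7.0], [25.0, 10.0], [30.0, 12.0]]
y_train = [1, 1, 0, 0]             # the outputs are already known to us

clf = LogisticRegression()
clf.fit(X_train, y_train)          # the model learns to map inputs to outputs

print(clf.predict([[5.0, 6.8]]))   # predict the label of a new, unseen animal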
Unsupervised learning uses unlabeled data to train machines. Unlabeled data means that there is no fixed
output variable. The model learns from the data, discovers patterns and
features in the data, and returns the output. In this example, our
unsupervised model uses the images of vehicles to classify if it's
a bus or a truck. So the model learns by identifying the
parts of a vehicle, such as the length and
width of the vehicle, the front and rear
end covers, roof, hoods, the types of wheels used, and many other features. Based on these features, the model classifies if the
vehicle is a bus or a truck.
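Here is a comparable sketch of unsupervised learning in Python, using scikit-learn's k-means clustering on some invented vehicle measurements. Notice that no labels are provided; the algorithm only groups similar rows together, and it is up to us to interpret the groups.

from sklearn.cluster import KMeans

# Hypothetical unlabeled data: each row is [length_m, width_m, wheel_count]
vehicles = [[12.0, 2.5, 6], [11.5, 2.5, 6],    # bus-like measurements
            [7.5, 2.4, 6],  [8.0, 2.5, 10]]    # truck-like measurements

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(vehicles)    # the model discovers two groups on its own
print(groups)                            # e.g. [0 0 1 1]; we decide what each group means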
And we also have reinforcement learning. Reinforcement learning
trains the machine to take suitable actions and maximize reward in a
particular situation. It uses an agent and an environment to produce
actions and rewards. The agent has a start
and an end state, but there might be
different parts for reaching the end
state like a maze. In this learning technique, there is no predefined
target variable. An example of reinforcement
learning is to train a machine that can identify
the shape of an object, given a list of different
objects such as square, triangle, rectangle,
or a circle. In this example, the model tries to predict the shape of the
object, which is a square.
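To make the agent, environment, and reward idea more concrete, here is a very small Q-learning sketch in Python. The five-state corridor, the reward value, and the learning settings are all invented for illustration; the agent simply learns, by trial and error, that moving toward the goal state pays off.

import random

# A toy environment: 5 states in a row, start at state 0, the reward is at state 4
n_states, actions = 5, [-1, +1]         # the agent can move left or right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

for episode in range(200):
    state = 0
    while state != 4:
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == 4 else 0.0           # reward only at the goal
        # Q-learning update: learn from the reward plus the best future value
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print(Q)   # after training, "move right" carries the higher value in every state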
Now let's look at different machine learning algorithms that come under these
learning techniques. Some of the commonly used
supervised learning algorithms are polynomial regression,
random forests, linear regression,
logistic regression, k-nearest
neighbors, naive Bayes, support vector machines,
and decision trees. And these are just
some examples of algorithms used for
supervised learning. There are so many
other algorithms that are used in
machine learning. For unsupervised
learning, some of the widely used algorithms are k-means clustering, singular
value decomposition, fuzzy c-means, partial least squares, Apriori, hierarchical
clustering, principal component
analysis, and DBSCAN. Similarly, there are so
many other algorithms that can be used for
unsupervised learning. And some of the important reinforcement
learning algorithms are Q learning, SARSA, Monte Carlo,
and deep Q networks. So as we said, there are so many different
algorithms available to us, and choosing the right
algorithm depends on the type of problems
we're trying to solve. Now let's look at the approach in which these machine
learning techniques work. So supervised
learning methods need external supervision to train
machine learning models, and therefore, the
name supervised comes from. They need guidance and additional information
to return the result. It takes labeled inputs and
maps it to known outputs, which means you already
know the target variable. Unsupervised learning
techniques do not need any supervision
to train any models. They learn on their own
and predict the output. They find patterns
and understand the trends in the data
to discover the output. So the model tries to label the data based on the
features of the input data. And similarly, reinforcement
learning methods do not need any supervision to train machine
learning models. Reinforcement learning
follows trial and error method to get
the desired solution. After accomplishing a task, the agent receives a reward. An example could be to train
a dog to catch the ball. If the dog learns
to catch a ball, you'll give it a
reward, like a treat. And with that, let's focus
on applications and types of problems that can
be solved using these three types of machine
learning techniques. So supervised
learning is generally used for classification
and regression problems. For example, you can
predict the weather for a particular day
based on humidity, precipitation, wind speed,
and pressure values. As in another example, you can use supervised
learning algorithms to forecast sales for the next month or next quarter for
different products. Similarly, you can use it
for stock price analysis or identifying if a cancer
cell is malignant or benign. Unsupervised
learning is used for clustering and
association problems. For example, it can do
customer segmentation, which is segmenting
and clustering similar customers into groups
based on their behavior, likes, dislikes and interest. Another example for
the applications of unsupervised learning is
customer churn analysis, which is a process
of evaluating and understanding why and when customers stop doing
business with a company. With the aim of developing strategies to improve
customer retention. And finally, we have
reinforcement learning. Reinforcement learning
is reward based. So for every task or every
step completed correctly, there will be a reward
received by the agent. And if the task is not
achieved correctly, there will be some
kind of penalty. Now let's look at some examples. Reinforcement learning
algorithms are widely used in gaming
industry to build games. It is also used to train
robots to perform human tasks. Multipurpose AI chatbots like
ChatGPT or Google Bard use reinforcement learning to
learn from the user input and adjust their output based
on previous conversations. And with that, we have come
to the end of this section on supervised versus unsupervised versus
reinforcement learning. Now, let's see what are the prerequisites of
machine learning. So the first one is computer science fundamentals
and programming. Many machine learning
applications today require a solid foundation in basic
scripting or programming. It is not just about
writing complex algorithms, but being able to understand and manipulate the
underlying structures. Without a good grasp of
these fundamental skills, it will be challenging
to make the most of the machine
learning tools available. So if you're seriously considering diving
into machine learning, it's advisable to brush up
your programming skills. Intermediate
statistical knowledge. A fundamental understanding of probabilities is needed in the
world of machine learning. You'll often find yourself
asking questions like, if A is happening, what's the likelihood
that B will occur? Or if there are clouds overhead, what are the chances
it will rain? These type of questions, rooted in probability, are at the heart of many machine
learning algorithms. It's all about predicting outcomes based on
given conditions. So if you're keen to make significant strides
in machine learning, it's definitely worth
familiarizing yourself with the basics of
statistics and probability. Linear algebra and
intermediate calculus. Linear algebra is key as it requires you to
grasp the concept of plotting a line through your data points and
understanding what it represents. This is the primary idea behind linear regression
models where you draw a line through your data and use this line to compute new values. Regarding intermediate
calculus, it involves having a basic understanding
of differential equations. No need to be a master at it as the computer handles most
of the heavy calculations, but it's beneficial to recognize the terminology when it pops up, especially if you're diving deeper into model programming. And data wrangling and cleaning. Perhaps one of the most
significant aspects in this field is mastering the
art of tidying up your data. It's often said if
you input bad data, you'll get bad data out. But if you have good data in, it's more likely to
have good data out. The quality of your
data can greatly influence the outcome of your
machine learning models. Therefore, understanding how to effectively clean and organize your data becomes a
critical skill in ensuring the accuracy and
reliability of your results. Now let's look at some examples of applications
of machine learning. We have object detection
and instance segmentation. Object detection and
instance segmentation are two different but related
tasks in machine learning. Object detection is about
recognizing and finding items within a picture like telling different
cats apart, for example. On the other hand, instance
segmentation is the next step that separates these identified objects
from the rest of the image. These techniques are put
to use in various ways, including identifying different
elements in an image. Moreover, segmentation
can further isolate or cut out
specific components. One popular application
of object detection and segmentation is the Google Pixel phones
quick snapshot feature. This feature uses machine
learning to identify objects in a user's current view
and then overlay animated stickers or filters
on top of those objects. This can be a fun
and creative way to add a personal
touch to photos. We also have license
plate detection. This is a pretty cool
use of machine learning. Imagine a car driving into
view and the system is able to spot and identify the
license plate on that car. This application of machine
learning can be particularly useful in various situations
like security checkpoints, parking lots, traffic control, or even for charging
tolls without making the car stop in
the middle of a highway. It showcases how machine
learning can extract specific information from a
larger context with accuracy. And we also have
automatic translation. Automatic translation, powered
by machine learning has been a game changer in breaking
down language barriers. It is the driving force behind the instant translation that
you see on foreign websites, making the content accessible in your preferred language
with just a click. It is also the technology that enables tools
like Google Lens to provide real
time translations of signs when you point
your camera at them. Whether it is browsing the Internet or navigating
a foreign city, machine learning
has revolutionized our ability to understand
and interact with the world. Truly, automatic translation is an impressive testament to how machine learning can bring
the world closer together. Thank you for
watching this video. The world of machine
learning applications is vast and ever evolving. It is one of the fastest
growing sectors in technology, and the possibilities
are endless. What I have shown you here are just a few of
the highlights, a small sample of
what is possible. But there is so
much more to come. In the next video, we will shift gears and
explore deep learning, a subfield of machine
learning that made many of the machine learning advances possible. See you
in the next one.
4. L1V3 - Deep Learning: In this video, we'll talk about deep learning
and neural networks. Have you ever wondered
how Google can translate an entire webpage in no time from almost any
language to another, or how Google Photos
magically sorts your pictures based on the faces of people and pets
it recognizes? Or how about when Google Lens fills you in on the
details of a plant, object, or animal when you
scan it with your phone? That's deep learning working
its magic right there. In this video, let's try to answer the
question of what is deep learning and how it makes all these incredible
things possible. In this video, we'll be
discussing the following topics. We'll start with an
understanding of deep learning and then move on to artificial
neural networks, which are a type of
machine learning algorithm that are used in deep learning. We will then explore some
of the practical uses of deep learning and introduce some of the most popular deep
learning platforms. And finally, we'll discuss
some of the limitations of deep learning and how quantum computers can
tackle those limitations. It's going to be an exciting
session on deep learning. So as we said earlier, deep learning is a subset
of machine learning, and both are part of the bigger concept called
artificial intelligence. Imagine artificial
intelligence as the whole realm of making
machines act like humans. Machine learning is a
part of that realm, and it's all about
giving machines the ability to learn and make
decisions based on data, kind of like how we
learn from experience. Now, deep learning is a more specific part
of machine learning. It's like teaching a machine to think a bit
like a human brain with a structure called an
artificial neural network. Artificial neural networks or
ANNs are a specific type of machine learning
algorithm that try to loosely mimic the neural
networks in the human brain. When we say deep learning, we're usually
talking about using really big neural networks to train a model on loads of data. It's not different
from machine learning, just a fancier term we use when things get
pretty big scale. So what are artificial
neural networks? Let's have a closer look at the construction of
a neural network. Each layer consists of nodes
or what we call neurons. The neurons in one
layer connect with neurons in the next
layer through channels. Each channel is
assigned a weight, which plays a significant role
in the network's learning. Every neuron has
an associated bias and an activation function. The activation function is used to transform
the weighted sum of the inputs and the bias into an output that is sent
to the next layer. As we said before, ANNs sit
at the core of deep learning. These algorithms are crafted in a way that mirrors the
working of the human brain. They absorb data, learn to
identify patterns in the data, and then make educated predictions
for a new set of data. Let's explore the
process by building a neural network capable of distinguishing between
a cube and a pyramid. Consider an image of
a cube as an example. This image is comprised
of 28 by 28 pixels, resulting in a total
of 784 pixels. Each pixel is then provided as input to individual neurons
within the first layer. Neurons in one layer
are connected to neurons in subsequent
layers through channels. The inputs are multiplied by
their corresponding weights, and then the bias
will be added to it. This combined value then undergoes evaluation through
a threshold function, known as the
activation function. The result is transmitted as input to the neuron
within the hidden layer. Then the output of the activation
function determines whether a neuron becomes
activated or not. Activated neurons
transmit data to the neurons in the next
layer via the channel. This iterative process known as forward propagation
enables the data to propagate through
the network. Within the output layer, the neuron with
the highest value becomes activated and
determines the final output. These values are
essentially probabilities. In this particular scenario, the neuron associated with the pyramid has the
highest probability, indicating that
the neural network predicts the output
as a pyramid.
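Here is a bare-bones numpy sketch of the forward propagation we just described. The layer sizes, weights, and biases are random placeholders rather than learned values, which is exactly why an untrained network like this one can confidently give the wrong answer.

import numpy as np

rng = np.random.default_rng(0)

# The 784 pixel values of the 28 x 28 image, flattened into one input vector
x = rng.random(784)

# One hidden layer of 16 neurons and an output layer of 2 neurons (cube, pyramid)
W1, b1 = rng.normal(size=(16, 784)), rng.normal(size=16)
W2, b2 = rng.normal(size=(2, 16)),  rng.normal(size=2)

def relu(z):                      # activation function for the hidden layer
    return np.maximum(0, z)

def softmax(z):                   # turns the output layer into probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

hidden = relu(W1 @ x + b1)        # weighted sum plus bias, passed through the activation
probs = softmax(W2 @ hidden + b2)
print(probs)                      # e.g. [0.3, 0.7]: the untrained network "predicts" pyramid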
Well, obviously, our neural network has made an incorrect prediction. It's important to note
that at this stage, our network has not
undergone training yet. So let's have a look at the steps for training
a neural network. During the training process, the network receives
both the input and the expected output. By comparing the predicted
output to the actual output, the network identifies the
error in its prediction. The magnitude of the error
indicates how wrong we are, and the sign suggests if our predicted values are
higher or lower than expected. This information is then propagated backward
through the network, a technique referred to
as back propagation. Through back propagation, the network adjusts its
internal parameters, such as the weights
and biases to minimize the error and improve
its future predictions. The iterative cycle of
forward propagation and back propagation is repeated with multiple inputs during
the training process. This cycle continues until the weights within the
network are adjusted in a way that allows the network to accurately predict the shapes
in the majority of cases. This marks the completion
of our training process, where the network has learned to make correct predictions.
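If you would like to see what this training loop looks like with one of the popular frameworks, here is a rough TensorFlow Keras sketch. The randomly generated images and labels are placeholders, so this particular network will not learn anything meaningful; the point is that compile() chooses the loss that measures the error, and fit() repeats forward propagation and back propagation over the data for several epochs.

import tensorflow as tf

# Placeholder data: 1,000 flattened 28 x 28 images, labeled 0 = cube, 1 = pyramid
x_train = tf.random.uniform((1000, 784))
y_train = tf.random.uniform((1000,), maxval=2, dtype=tf.int32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# The optimizer runs back propagation; the loss measures how wrong each prediction is
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# fit() repeats the forward and backward passes over the data, epoch after epoch
model.fit(x_train, y_train, epochs=5, batch_size=32)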
Although training neural networks can be a time consuming process,
hours or even months, the investment of time is justified given the immense
possibilities they offer. The intricate nature of
training involves fine tuning numerous parameters and optimizing the
network's performance, requiring significant computational resources
and patience. However, the benefits gained from a well trained
neural network, such as improved accuracy, advanced pattern
recognition, and sophisticated decision
making outweigh the time spent in training. It's a reasonable
trade off considering the remarkable potential and capabilities the neural
networks bring to the table. Now let's have a look at some of the applications
of deep learning. As we said earlier,
it is the power of neural networks that makes
deep learning possible. Let's explore some of the key applications where
neural networks shine. One notable example is facial recognition
technology on smartphones, which utilizes neural
networks to estimate a person's age based on
their facial features. By distinguishing the face from the background and
analyzing lines and spots, these networks correlate
the visual cues to approximate the person's age. Neural networks also play a
crucial role in forecasting, enabling accurate predictions in various domains such as weather forecasting or
stock price analysis. These networks perform very
well in recognizing patterns, making them able to identify
signals that indicate the likelihood of rainfall or fluctuations in stock prices. Neural networks can even
compose original music. They can learn intricate
patterns in music and refine their understanding to
compose original melodies, showcasing their
creative potential. And another area where neural networks excel
is customer support. Many individuals engage
in conversations with customer support
agents without even realizing they are actually
interacting with a bot. These sophisticated
networks simulate realistic dialogue and
provide assistance, enhancing customer
service experience. Also in the field
of medical care, neural networks have made
significant strides. They have the ability to detect cancer cells and
analyze MRI images, providing detailed
and accurate results that aid in diagnosis
and treatment decisions. And obviously, we also
have self driving cars. Once only a possibility
in science fiction, now they become a
tangible reality. These autonomous
vehicles rely on neural networks to perceive and interpret the environment, enabling them to navigate roads, make decisions, and
ensure passenger safety. And with that, let's have a look at some popular deep
learning frameworks. So some of these frameworks
are TensorFlow, PyTorch, Keras, Deeplearning4j, Caffe, and Microsoft
Cognitive Toolkit. These frameworks have gained widespread recognition and play a significant role in advancing the field
of deep learning. And now let's discuss some of the limitations
of deep learning. While deep learning holds
tremendous promise, it is important to acknowledge
its limitations as well. So firstly, although
deep learning is highly effective in
handling unstructured data, it needs a
substantial amount of data for training purposes. The second problem is that even assuming that we have access
to the required data, processing it can be challenging due to
computational power. Training neural networks demands the use of graphical
processing units or GPUs which have thousands of cores compared to central
processing units or CPUs. And at the same time, GPUs are way more
expensive than CPUs. And finally, training
is time consuming. Deep neural networks can require hours or even months
to train with the duration increasing
as the volume of data and the number of
network layers increase. That said, it is
worth mentioning that quantum computers
developed by companies such as Google and IBM offer a potential solution to
overcome these limitations. Quantum computers have
the ability to perform complex computations at an exponentially faster rate
than classical computers. With their unique
architecture and quantum processing units or QPUs, they have the potential
to significantly accelerate the training
process of neural networks. In addition, quantum
computers can handle larger scale
datasets more effectively. Reducing the data
requirements and mitigating the challenges associated with processing such vast
amounts of information. While quantum computing is
still in its earlier stages, ongoing research and
development hold the promise of overcoming the limitations faced by traditional deep
learning approaches. Thank you for exploring the world of deep
learning with me. It is crucial to
acknowledge that we are still in
the earlier stages of exploring what deep learning and neural networks
can do for us. However, big names like Google, IBM, and Nvidia have recognized
this growth trajectory, investing in the development of libraries, predictive models, and powerful GPUs to support the implementation
of neural networks. We are almost at the end of this section on traditional
artificial intelligence. It's important to note that
we have merely scratched the surface when it comes to the potential of deep
learning and AI. Exciting possibilities
lie ahead. As we push the boundaries
of what is possible, the line between
science fiction and reality becomes
increasingly blurred. The future holds an
overload of surprises, and deep learning is at the forefront of these
groundbreaking advancements. In the next video, which is the last video
of this section, we will learn the
difference between discriminative and generative
machine learning models, which will prepare us for the
next section of this course on generative artificial
intelligence. See you in the next one.
5. L1V4 - Discriminative vs Generative: In this video, we're
going to talk about discriminative and
generative algorithms. These are two important types
of machine learning models. To make it easy for
you to understand, we'll start with the story and then discuss how
these two types of machine learning work in detail using popular
algorithms as examples. So let's jump right in. All right. Let's
dive into our story. Let's imagine we have
two alien visitors who have never seen apples
and bananas before. We want to observe
how they learn to distinguish between
these two fruits. The first alien decides to understand these fruits
by drawing them. It carefully observes
the shape, color, and texture of each fruit, and then recreates
them on paper. This way, it creates a visual representation or a model of what each
fruit looks like. Whenever it sees a new fruit, it refers to these drawings
to identify that fruit. This is similar to what we call a generative algorithm
in machine learning. The second alien,
on the other hand, goes about it differently. Instead of drawing, it starts comparing the features
of the fruits. It notices that apples
are usually round and red while bananas
are long and yellow. When it's given a new fruit, it doesn't look for
a perfect match. Instead, it checks which
fruit's features are closer to the new fruit and guesses that it
is the same one. This approach is
more like what we call a discriminative
algorithm in machine learning. So that's the basic idea. These two different
approaches will help us understand discriminative
and generative algorithms. Moving on, let's formally define our two types of algorithms
based on our aliens' approach. The first alien's method is a prime example of what we call generative
classification. This is where a model learns to generate a representation
of each class. It's like learning how
an apple or a banana looks and using that knowledge to identify the future instance. In contrast, the
second alien's method represents discriminative
classification. This model learns to distinguish between classes based
on their features. Instead of learning what an
apple or banana looks like, it learns the differences
between them. It then uses these differences to decide what a
new fruit might be. Each approach has its own
strengths and weaknesses, and they're used in
different scenarios. Now that we've
introduced the concepts, let's explore them
in more depth. To better grasp these concepts, we're going to discuss
specific algorithms that employ these two
types of classifications. For discriminative
classification, we're looking at logistic
regression as an example. And for generative
classification, our example will be the
naive Bayes algorithm. So in the realm of
discriminative classification, logistic regression creates
a decision boundary based on the features
of the input. For our fruit example, these features could be
color, length, or weight. The algorithm learns
patterns from these features and then uses
them to classify new fruits. Conversely, the naive
Bayes algorithm, which is a generative
classification model, tries to understand
the distribution of each class in
the feature space. Rather than just identifying the differences between classes, it learns how each class
distributes in the data. Now, let's go deeper
and understand how these algorithms use the
strategies to classify new data. With the logistic
regression model, we're dealing with
features like color, length, and weight
of the fruits. The model uses these features to learn patterns
and make decisions. For example, it
could learn that if a fruit has a yellow color
and is longer than 5 inches, there's a high chance
it's a banana. This method of creating a
decision boundary based on features of examples is the essence of
discriminative learning. On the flip side,
generative learning used by the naive Bayes model
attempts to understand the distribution of each class in a multidimensional plane like a three dimensional space for our three fruit features. The model tries to visualize where apples and
bananas are likely to appear in this space based on their color, shape, and weight.
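As a quick sketch in Python with scikit-learn, here is how the two approaches look side by side on some invented fruit measurements. Both models are trained and queried the same way; the difference under the hood is that logistic regression learns a boundary between the classes, while Gaussian naive Bayes models how each class is distributed.

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Hypothetical fruit data: each row is [color_score, length_cm, weight_g]; 0 = apple, 1 = banana
X = [[0.9, 7.5, 150], [0.8, 8.0, 160],      # apples: red-ish, short, round
     [0.2, 18.0, 120], [0.3, 20.0, 130]]    # bananas: yellow-ish, long
y = [0, 0, 1, 1]

discriminative = LogisticRegression(max_iter=1000).fit(X, y)  # learns a decision boundary
generative = GaussianNB().fit(X, y)                           # models each class's distribution

new_fruit = [[0.25, 19.0, 125]]
print(discriminative.predict(new_fruit), generative.predict(new_fruit))  # both should say banana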
Now, let's think about some important questions about generative and discriminative models. Questions like which model
needs more data for training? Which one gets affected
by missing data? Which model gets
impacted by outliers, which requires more math, and which one tends to overfit. It's important to think about these questions because they affect how you might choose
to use these models. For example, a generative
model doesn't need a lot of data because it's just trying to understand the basic
characteristics of each class. However, a discriminative
model needs more data because it's trying to learn the intricate differences
between the classes. Thinking about these questions
can help you understand these models more deeply and
use them more effectively. But don't worry if you're
not sure about the answers. We're going to discuss
them in detail. So let's look at
the first question. Which model needs more
data for training? Discriminative models
like logistic regression, generally need more
data for training. They learn by identifying
differences between classes. So they need a rich
and diverse set of examples to do
that effectively. The second question is, which one gets affected
by missing data? The fact is that both types of models can be affected
by missing data. But generative models
might be more sensitive because they're
trying to capture the overall distribution
of the data. Any missing
information could skew their understanding
of that distribution. Next question is, which model
gets impacted by outliers? Again, outliers can
affect both models, but discriminative models
might be more susceptible. These models focus on
boundaries between classes, and an outlier could significantly shift
those boundaries. Next question is, which
requires more math? In terms of mathematics, generative models like
naive Bayes often require more calculations because they involve estimating the
distribution of data, which can be
computationally intensive. And the last question is, which one tends to overfit? Overfitting can occur
in both models, but discriminative models are
generally more prone to it. This is because they can become too tuned to the training data, learning even its
noise and errors. And now that we know
the difference between discriminative and generative machine learning algorithms, let's break down some common
examples of each type. So some of the discriminative algorithms
are logistic regression, support vector machines,
decision trees, random forests, and
gradient boosting machines. Some of the generative
algorithms are naive Bayes, Gaussian
mixture models, hidden Markov models,
latent Dirichlet allocation, and generative
adversarial networks. So to wrap it up, in this video, we've unpacked the world of discriminative and
generative models using a simple and
engaging story. We've seen how
logistic regression, as a discriminative model, uses distinct features to create
decision boundaries, while the generative model, naive Bayes, tries to understand the overall
distribution of the data. Understanding the
difference between discriminative and
generative models gives us valuable insight into how
generative AI systems operate. Generative models like the
ones used in generative AI, learn the underlying distribution
of the training data. This knowledge is
then used to generate new data that mirrors
the training data. This is why generative
AI is so powerful. It can generate new realistic
outputs such as images, text, and even music because it understands the world
of its training data. In contrast,
discriminative models simply learn the boundaries between classes and are primarily used for
classification tasks. They can't generate new
data because they don't try to understand the underlying
distribution of the data, only the differences between classes. By understanding
these differences, you can better appreciate
the capability and flexibility of
generative AI systems. This comprehension could guide you when deciding
on which type of AI system would be best suited to your particular
project or use case. And by that, we reached
the end of Section one of this course, traditional
Artificial Intelligence. I will see you in Section two, where we discuss generative
artificial intelligence.
6. L2V1 - Transformers: Now let's discuss
transformers and their leading role in powering generative
artificial intelligence. Transformers are a type of
neural network that are able to learn long range
dependencies in sequences. This makes them well suited for tasks such as
text generation, where the model needs to
understand the context of the previous words in order
to generate the next word. Transformers produced
a 2018 revolution in natural language processing. Now let's see how
transformers work. Transformers are made
up of two main parts, an encoder and a decoder. Encoders are
responsible for taking an input sequence and converting it into a
sequence of hidden states. The encoder is made up of a stack of self
attention layers. Self attention is a mechanism
that allows the encoder to attend to different parts of the input sequence when
generating the hidden states. This allows the encoder to learn long range dependencies
in the input sequence, which is essential for tasks
such as text generation. Decoders are
responsible for taking a sequence of hidden states and generating an
output sequence. The decoder is also made up of a stack of self
attention layers. However, the decoder also has a special
attention layer that allows it to attend to the input sequence when
generating the output sequence. This allows the decoder
to learn how to generate output that is consistent with the
input sequence. So the encoder and decoder work together to generate
an output sequence. The encoder first converts the input sequence into a
sequence of hidden states. The decoder then takes these hidden states and
generates an output sequence. The decoders attention
layer allows it to attend to the input sequence when
generating the output. This allows the decoder
to learn how to generate output that is consistent with the
input sequence.
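To give a feel for the self attention mechanism that these encoder and decoder layers are built from, here is a minimal numpy sketch of scaled dot product attention. Real transformers add learned projection matrices, multiple attention heads, positional information, and many stacked layers, so treat this only as the core idea.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each position scores every other position, so long range dependencies are visible
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                         # mix the values by attention weight

# A toy "sentence" of 4 tokens, each represented by an 8-dimensional vector
tokens = np.random.default_rng(0).random((4, 8))

# In self attention, the queries, keys, and values all come from the same sequence
hidden_states = scaled_dot_product_attention(tokens, tokens, tokens)
print(hidden_states.shape)    # (4, 8): one updated hidden state per input token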
There are several benefits to using transformers for generative AI. First, transformers are able to learn long range
dependencies in sequences. This allows them to generate more realistic and
coherent output. Second, transformers are able to be trained on very
large datasets. This allows them to learn more complex patterns and
relationships in the data. And third, multiple
parallel transformers are able to work together. This allows them to be trained more quickly
and efficiently. As a result of these benefits, transformers have
become the state of the art approach for a wide variety of
generative AI tasks, such as text generation, image generation, and
music generation. Something to be aware of
when using transformers is that it is possible for them
to create hallucinations. In transformers, hallucinations
are words or phrases that are generated by
the model that are often nonsensical or
grammatically incorrect. But why do hallucinations happen? Hallucinations can be caused
by a number of factors, including when the model is not
trained on enough data, or the model is trained
on noisy or dirty data, or the model is not
given enough context, or the model is not given
enough constraints. Hallucinations can be a
problem for transformers because they can make the output text
difficult to understand. They can also make the
model more likely to generate incorrect or
misleading information. So how can we mitigate
hallucinations? There are a number of ways to mitigate hallucinations
in transformers. One way is to train the
model on more data. Another way is to use a
technique called beam search, which allows the
model to explore a wider range of
possible outputs. And finally, it is important to give the model
enough context and constraints so that it does not generate nonsensical or
grammatically incorrect output.
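As a rough illustration of the beam search idea, here is a sketch using the Hugging Face transformers library with the small GPT-2 model. The prompt and the generation settings are arbitrary choices; num_beams keeps several candidate continuations alive instead of greedily committing to one word at a time, and no_repeat_ngram_size is a simple constraint on the output.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the mat and", return_tensors="pt")

# Beam search explores several candidate continuations and keeps the most likely one
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    num_beams=5,                  # number of candidate sequences kept at each step
    no_repeat_ngram_size=2,       # a simple constraint that reduces degenerate repetition
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))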
Here are some examples of hallucinations that have been generated
by transformers. The cat sat on the mat
and the dog ate the moon. The boy went to the store
and bought a gallon of air. The woman drove to the bank
and withdrew $1 million. As you can see,
these examples are all nonsensical or
grammatically incorrect. This is because the transformers have generated these words or phrases without any
context or constraints. It is important to note that hallucinations are not
always a bad thing. In some cases, they
can be used to generate creative and
interesting texts. However, it is important to
be aware of the potential for hallucinations when
using transformers and to take steps
to mitigate them. Transformers are being
used to generate a wide variety of
creative content, including text, image,
music, and even video. Some of the most common
applications of transformers in generative AI include
text generation. Transformers can be used to generate texts such
as news articles, blog posts, and
creative writing. For example, the
transformer model GPT-3 has been used to generate realistic-looking
fake news articles, and it can even write
poetry and stories. Image generation.
Transformers can be used to generate
images such as paintings, photographs, and digital art. For example, the
transformer model, Imagine has been
used to generate realistic looking images of
people, animals, and objects. Music generation.
Transformers can be used to generate music, such as songs,
melodies, and beats. For instance, the transformer model MuseNet has been able to generate original
music that sounds like it was composed
by a human musician. And we have video generation. Transformers can be
used to generate video, such as movies, TV shows
and animated cartoons. For example, video generation models from DeepMind have been used to generate
video that looks like it was filmed by a
human camera operator. As the technology
continues to develop, we can expect to see even more amazing applications of transformers
in generative AI. Transformers have
the potential to revolutionize the way we
create and consume content, and they are already
being used to create some truly
incredible things.
7. L2V2 - Gen AI: Welcome to generative
Artificial Intelligence. We start this video
by explaining how to distinguish between
generative AI and traditional
machine learning. Then we provide a
formal definition for generative artificial
intelligence and end the video with some
examples of generative AI. Here, we're showing two key approaches in
artificial intelligence, traditional machine learning and generative
artificial intelligence. The top image shows
traditional machine learning. Here, the model learns from data with labels
attached to it. What it does is it
figures out the link between the features of the data and their
corresponding labels. This understanding
is then used to make educated guesses on new
data it hasn't seen before. Now the bottom part of the image shows something
a bit different. The generative AI model. Instead of just figuring out
the relationship between inputs and outputs,
it digs deeper. It focuses on the complex
pattern in the content. This understanding of
pattern is what gives it the power to create new and
realistic content on its own. This could be anything, a poem, a news article, a picture, or even a music composition. So you see generative AI brings a new creative angle to
the huge world of AI. The nature of the output
plays a crucial role in differentiating between generative AI and other models. Traditional models
typically produce categorical or
numerical outputs, such as whether an
email is spam or not, or predicting sales figures. On the other hand,
generative AI can produce outputs like
written or spoken language, images, or even audio, reflecting its ability to generate content
that mimics reality. We can imagine it like
this mathematically. If this equation isn't
something you've seen recently, here's a quick reminder. The equation Y equals F of X computes the outcome
based on varying inputs. Y symbolizes the
result from the model. F stands for the function
we use in the calculation. And what about X?
That represents the input or inputs used in the equation. So in simple terms, the model's output is a function of all the inputs. The key here is understanding the nature of the output Y as a function of the inputs X. The toy sketch below contrasts the two kinds of output.
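Here is a deliberately tiny, hypothetical illustration of Y equals F of X for the two families of models. Both functions below are invented for this summary; no real model works this simply.

```python
# Toy illustration of y = f(x): a traditional model returns a label or number,
# while a generative model returns newly produced content.
# Both functions are hypothetical stand-ins, not real models.

def discriminative_model(email_text: str) -> str:
    """Maps input features to a categorical output (a label)."""
    return "spam" if "win a prize" in email_text.lower() else "not spam"

def generative_model(prompt: str) -> str:
    """Maps a prompt to new content (here, a canned sentence as a stand-in)."""
    return f"{prompt} water, loud noises, and the smell of citrus."

print(discriminative_model("Win a prize now!"))  # -> "spam" (a category)
print(generative_model("Cats hate"))             # -> a full, newly formed sentence
```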
Traditional models generally produce numerical outcomes, while generative AI models can map those numerical values to different forms of information, making them able to generate complex responses such as natural language sentences, images, and videos. To summarize at a high level, the traditional
classical supervised and unsupervised
learning processes take training code and labeled
data to build a model. Depending on the use
case or problem, the model can give
you a prediction. It can classify something
or cluster something. The distinction lies
in the application. Traditional models
make predictions, classify or cluster data, while generative AI models
are more versatile, creating a wide
range of content. The generative AI method can work with training
code, labeled data, and unlabeled data of all kinds to construct what we
call a foundation model. This foundation model can
then produce new content, such as text, code, images, audio, video, and so on. Generative AI's power lies in its ability to ingest
diverse data types, including unlabeled data to build models that
generate fresh content, which extends beyond traditional
models' capabilities. We've come a long
way moving from traditional programming
to neural networks and now to generative models. In the old days of
traditional programming, we had to manually input the rules to
differentiate the cat. We had to embed specific
rules into the program. It was something like if it's an animal with four
legs, two ears, is furry, and shows a liking
for yarn and catnip, then it's probably a cat. And we had to write
all of that in a programming language
and not natural language. In the wave of neural networks, we could show the network
images of cats and dogs, then ask, is this a cat? The network would likely respond with a
prediction, it is a cat. So we can see that neural
networks allow for more nuanced decision making
by training on examples, which is an evolution
from hard coding rules. In the generative wave, we can produce our own content, such as text, images, audio, video, et cetera. Models like Palm or
Pathways Language Model, Lambda language model for
dialog applications, and GPT, generative pre
trained transformer, consume vast amounts of
data from diverse sources, including the Internet to construct foundation
language models, which can be utilized simply
by asking a question, whether typing it into a prompt or verbally talking
into the prompt itself. So if we ask what a cat is, it can give us everything it has learned about cats. Generative AI boosts
user interaction, turning users from mere
spectators to active creators. Models like PaLM, LaMDA, and GPT stand out. They're trained
on large datasets and provide smart
context aware answers. This focus on the user makes generative AI attractive for a range of different
applications. Now let's provide our
formal definition. What is generative AI? Generative AI is a type of
artificial intelligence that creates new content based on what it has learned
from existing content. The process of learning from
existing content is called training, and it results in the creation of a statistical model. When given a prompt, the AI uses the model to predict what an expected response might be, and this generates new content. The emphasis here is on the inherent ability of generative AI to learn and create. Unlike traditional models, which predict based on pre
established relationships, generative AI focuses on understanding the underlying
structure of the input data. After training, the model can generate unique
responses or content, which significantly
broadens the applications and capabilities of AI systems. Essentially, it learns
the underlying structure of the data and can then generate new samples that are similar to the
data it was trained on. So let's see what
is the difference between language models
and image models. Generative language
models learn about patterns in language
through training data. Then given some texts, they predict what comes next. Generative image models produce new images using
techniques like diffusion. Then, given a prompt or related imagery, they transform random noise into images or generate images from prompts. A minimal sketch of prompting such a diffusion model appears below.
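The following hedged sketch shows what prompting a text to image diffusion model can look like in code. It assumes the open source diffusers library and the public CompVis/stable-diffusion-v1-4 checkpoint, which are stand-ins chosen for illustration; they are not the image models discussed in this course.

```python
# Minimal sketch: turning a text prompt into an image with a diffusion model.
# Assumptions: the "diffusers" library is installed and the public
# "CompVis/stable-diffusion-v1-4" checkpoint is used as a stand-in model.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")  # assumes a GPU; drop this line to run (slowly) on CPU

prompt = "a gray and white cat sitting on a window sill watching pigeons outside"
# The pipeline starts from random noise and denoises it step by step into an image.
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```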
Let's dig a little deeper into each of them. As previously mentioned,
generative language models focus on grasping the
inherent structure of patterns within the data. They then leverage these learned patterns to generate novel responses or content, which often closely resembles the original data. These characteristics make
large language models an exceptional example of
generative AI's potential. A generative language model
takes text as input and can output more text, an image,
audio, or decisions. For example, under
the output text, question answering is generated, and under output image,
video is generated. So large language
models are a type of generative AI
because they generate novel combinations of texts in the form of natural
sounding language. We also have generative
image models, which take an image as
input and can output text, another image or video. For example, under
the output text, you can get visual
question answering, which is a task in
computer vision that involves answering
questions about an image, while under output image, an image completion
is generated. And under output video,
animation is generated. Like we mentioned before, generative language
models learn about patterns and language structures through their training data. And then when given some text, they try to predict
what comes next. So in a sense, generative
language models can be seen as pattern
matching systems, honing their ability to discern patterns from the data
presented to them. Now that we provided the formal definition of generative artificial
intelligence, let's end this video with some
examples of generative AI. Here is an example of Google search auto
complete feature. Based on things it learned
from its training data, it offers predictions of how
to complete this sentence: cats hate. Some of the suggestions are cats hate the smell of, cats hate water, and cats hate cucumbers. Here is the same example using Bard, which is a language
model that is trained on a massive amount of text
data and is able to communicate and generate
human like text in response to a wide range
of prompts and questions. So when I use the prompt cats hate, it answers: Cats hate a lot of things, but some of the most common include, and then a list of things that
it thinks cats would hate. And the same prompt
using GPT four produces this
response. Cats hate. Cats can express dislike
or discomfort in response to a variety of situations, objects,
or behaviors. Below are some of the
things that cats typically dislike and list some of the things that it
thinks cats would hate. Similar to Bard,
GPT four is also a language model trained on a massive amount
of text data and is able to communicate
and generate human like text in response to a wide range of
prompts and questions. Now let's look at some
examples of image generation. We use the same prompt on
three different AI tools. The prompt is a cat surrounded
by things cats hate. If we try this prompt on DALL-E, which is an AI image generator that is built by OpenAI, the same company that built GPT, we get this result. We can also try it on Adobe Firefly's text to image generator. And these are some
of the results that we get from Firefly. We can also try Canva's text to image application, and it provides some
examples of what it thinks is appropriate in
response to our prompts. Please keep in mind that here we used a very
minimalistic prompt, just to show that even without
providing much context, we can still produce results that are more
or less relevant. We would get a much better
result if our prompt included more detail and
followed a solid structure. This points to
the importance of prompt design and
prompt engineering, which we cover later
on in this section. In the next video, we will
talk about transformers, a technology that made all of this possible. See
you in the next one.
8. L2V3 - Gen AI Applications: Let's have a look at
the type of tasks that different AI
models can perform. These tasks can generally be classified based on the type of input data they accept and the type of output
data they generate. Here are some examples. Text to text. This is typically used in
machine translation, text summarization, and chatbots
like Bard and ChatGPT. For example, if you ask GPT-4, what is a cat? It would tell you, a cat is a small carnivorous mammal
that is often kept as a pet. The term generally refers to, and then it keeps generating more information
related to cats. Another model is text to image. This is used to generate
images from text descriptions. An example of this would be Canva's text to
image application, which creates images
from text inputs. In this example, we
can use the prompt a gray and white cat sitting on a window sill watching
pigeons outside. And it will generate this
image for us. Text to video. AI can also be used to generate videos from
text descriptions, although this is a more
complex and less explored task than text to image generation. For example, using
tools like InVideo, we can create a video with just a text prompt. The AI model uses our prompt to write a
script for the video and then picks images and video clips that are relevant to the
content of the script. It can even apply filters and
transitions to the video. Some of these tools
can also pick music that is relevant to the content of the video. For example, the AI model can associate pets with playfulness and then pick a playful piece of music to add to the video. Next is text to 3D:
these models generate three dimensional objects that correspond to a user's
text description. For example, if you ask Shap-E, a conditional generative model for 3D assets, to make an airplane that looks like a banana, it creates the 3D object you can see here. We also have text to code. These models are able to take a natural language description and then create code
based on that. It is useful for tasks like
automated code generation, error detection, and
code translation. For example, ChatGPT, Bard, and GitHub Copilot share the capability to generate code. Models such as ChatGPT, due to their training on a wide range of Internet text, including code, have the ability to generate code when provided with a suitable prompt. Bard, too, works along the same lines, but its training is specifically focused on programming related text. We also have GitHub Copilot, which uses OpenAI's Codex model, trained on
publicly available code, enabling it to suggest
code completion and generate code from comments
or function signatures. Text to task. Text to task models are trained to perform a defined task or action based on text input. This task can be a wide range of actions, such as answering a question, performing a search, making a prediction, or taking some sort of action. For example, a text to task model could be trained to navigate a web UI or make changes to a
doc through the GUI. We also have image to text. This is used in tasks like image captioning, where the AI describes an image in words. For example, BLIP, which is an AI model capable of both captioning and generating images, can take an image as input and provide a description of that image in text as output. A small sketch of calling such a captioning model is shown below.
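For readers who like to see code, here is a minimal, hedged sketch of image to text captioning. It assumes the Hugging Face transformers library and the public Salesforce/blip-image-captioning-base checkpoint; the exact tooling is an assumption for illustration, not something prescribed by this course.

```python
# Minimal sketch: image-to-text captioning with a BLIP checkpoint.
# Assumptions: the Hugging Face "transformers" library is installed and
# "cat.jpg" is a placeholder path to any local image (a URL also works).
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

result = captioner("cat.jpg")
print(result[0]["generated_text"])  # a short textual description of the image
```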
We also have image to image. These models perform tasks like image translation, for example, converting day images to night, colorizing black
and white images, or enhancing image resolution. For example, NightCafe is
an AI image generator that can take an image as input to initialize the image
creation process. It then produces a
stylized image as output based on user's prompt and other settings
that can be adjusted. Depending on the
chosen algorithm, artistic or coherent, the start image serves
different purposes. For the artistic algorithm, the shapes and structures in the image are
more significant. But in the coherent algorithm, more attention is paid to resolution, colors and textures. As another example, let's look at Canva's
magic edit feature. Using this feature, we can
select a specific part of the image and replace
it with a different image. For example, here, I
select the basket and then tell the model that I want to replace it with a
mountain range, hoping that the result
will make it look like the cat is
sitting on a rock. And we can see that the model creates some suggestions for me. If I'm not happy
with the result, I can ask the model
to regenerate the results until I
find something I like. And here is the final result. I can even go further and
select the ceiling in the background and ask the model to replace it with
the sky with clouds. And this is what the result looks like. Video to text. This
involves generating a text description or a
transcription from a video. For example, RS AI can create transcripts or
captions from a video input, even translating them from
one language to another. While what we see
in this example is captioning only the speech
element of the video, some of the more advanced
captioning tools are able to caption non
speech elements as well. Non speech elements can include sound
effects, for example, a bee buzzing, keys jangling, or a doorbell
ringing, music, either in the background
or as part of a scene, audience reactions, for example, laughing, groaning, or booing, manner of speaking, for example, whispering, shouting,
emphasizing a word, or talking with an accent
and speaker identification for an off screen narrator or speaker or for
multiple speakers. Similarly, we have
audio to text. This is typically used in speech recognition systems to transcribe spoken language into written text. Whisper by OpenAI is an example of an audio to text model; a minimal sketch of using it is shown below.
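As a hedged illustration only, this sketch uses OpenAI's open source whisper Python package to transcribe an audio file. The package and the file name interview.mp3 are assumptions made for the example, not tools required by this course.

```python
# Minimal sketch: audio-to-text transcription with the open-source Whisper package.
# Assumptions: the "openai-whisper" package is installed and "interview.mp3"
# is a placeholder for any local audio file.
import whisper

model = whisper.load_model("base")          # downloads the small "base" checkpoint
result = model.transcribe("interview.mp3")  # runs speech recognition on the file
print(result["text"])                       # the transcribed text
```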
AI can also turn text into synthesized speech. Text to speech systems convert text into
a spoken language. Many AI tools such
as Play.ht, can take a text input and
generate human like speech with synthesized voices that can resemble different genders, ages, accents, or
even different tones, such as happy, sad,
angry, et cetera. Image to video. This task
involves generating a sequence of images or a video from a
single or a set of images. For example, cinematic photos, a feature in Google Photos, utilize machine
learning to estimate an image's depth and construct a three D
representation of the scene, regardless of whether
the original image contains depth information
from the camera. Following this estimation,
the system animates a virtual camera to create a smooth panning effect similar
to a cinematic sequence. This intricate process uses
artificial intelligence to transform a static image into a dynamic three
dimensional scene, giving it a video like quality. What we discussed so far
are just a few examples, and the list is continually
growing as the field of AI progresses and as researchers invent new applications
for these technologies. Furthermore, in many
real world applications, these tasks are combined. For example, an AI
system may need to convert speech to text using
a speech to text model, and then process the text
using a text to text model, and after that, generate an appropriate spoken
language response, which uses a text
to audio model. But wait, there's more. There are many other fields that GNAI can have groundbreaking
applications in. For example, just consider
the world of music. Music related tasks are an active area of research
and development in AI. Here are some common tasks. We have text to music. These models can generate
music based on text inputs. For example, they can create
a melody or a composition described by a phrase or a
piece of text. Music to text. On the other hand, AI can
also convert music to text, such as creating sheet
music for a song or generating descriptive
or emotional text based on a piece of music. We have audio to audio, which can convert one type of sound or music into another, like changing the
genre of a song, turning a humming into
a composed piece, or even removing
vocals from tracks. There's also music
recommendation. AI is heavily used in recommending music based
on users listening habits, preferences, and even mood. We also have music generation. Mus Net by Open AI is one
example of music generation. Models like OpenAI's MuseNet can generate four minute musical compositions with ten different instruments and can combine styles from country to Mozart to the Beatles. There's also music enhancement. For example, Audio Studio can be used to enhance or
alter existing music. It can do it by things like
upscaling audio quality, changing the tempo
or adding effects. And there is also music
source separation. We can also use AI models to separate individual
instruments, vocals, or other components
from a mixed or master track. Meta's Demucs is an example of a music source
separation tool. So in conclusion, the realm
of generative AI is diverse, fascinating, and
full of potential. The range of tasks
it can perform, from text to text, text to image, audio to text, and even
intricate tasks like music enhancement and
generation is truly remarkable. It's a field that's
continuously evolving, pushing the boundaries of
what we thought was possible. As we continue to
explore and innovate, the list of applications
is only going to expand. Generative AI holds the key to many breakthroughs
and advancements that could revolutionize
numerous sectors and the way we
interact with technology. As we continue to explore
this exciting era of AI, who knows what astonishing possibilities we might uncover. The key is to stay curious, keep exploring and start
imagining a future in which we can interact with
AI systems in a trustworthy, responsible and ethical manner.
9. L2V4 - Prompt Engineering: Let's talk about an
intriguing topic in the realm of generative AI,
prompt engineering. As the name suggests, generative AI is all about
systems that generate output, whether it's text, images, or any other type of content. As you will see in the next
section of this course, large language models or LLMs, which are the powerhouse
of generative AI are designed to generate human like text based on input prompts. In addition to generating
human like texts, LLMs also help translate our prompts into outputs
of other types of content, such as images and videos. This means that the better
our input prompts are, the likelier we are to achieve higher quality output from
any generative AI tool. Today, we'll be exploring several key aspects related
to these input prompts. We'll clarify what
exactly a prompt is and its role in shaping
the model's output. We'll distinguish between
prompt design and prompt engineering
and then move on to introduce various methods
of prompt engineering. And lastly, we'll discuss the limitations of prompt
engineering to give you a realistic understanding and expectation of
this exciting process. So let's get started. So what is a prompt? A prompt is essentially
a piece of text that is given to a generative
AI model as input. But it's not just any text. It serves a fundamental purpose. These prompts are your
communication link to the model. They direct the AI model and
steer its output generation. The model takes in your
prompt, processes it, and delivers an output that aligns with the
prompt's instructions. In other words,
these prompts are your tool for controlling
the model's output. Think of a prompt as your guiding instruction to
the generative AI model, like a director
guiding an actor. The more precise and
clear your direction, the better the performance you
can expect from the actor. Similarly, well
designed prompts enable the model to produce higher quality and
more specific outputs. Remember, the key lies in the quality and design
of your prompts, and this is where
the concepts of prompt design and prompt
engineering come into play, which we are going
to discuss now. As we mentioned earlier, the quality of the prompt
plays a crucial role in determining the quality of the output from a
generative AI model. Here, two concepts come into the picture prompt design
and prompt engineering. Prompt design refers to
the crafting of prompts that are specific to the task that the model is
asked to perform. For instance, if you
want the model to translate a piece of text
from English to French, the prompt would be
written in English and specify that the desired
output should be in French. In essence, it's
all about creating prompts that will generate
the desired output. On the other hand, we
have prompt engineering. This process is more about enhancing the performance
of the model. It involves strategies like leveraging domain
specific knowledge, providing examples of
the desired output, or incorporating
keywords known to be effective for any particular
generative AI model. So you see, while both concepts revolve
around crafting prompts, they serve different purposes. Prompt design is about
tailoring prompts to tasks while prompt engineering
aims to boost performance. However, they aren't
mutually exclusive. In practice, creating an effective prompt
often involves both designing it for the task and engineering it for
better performance. Now let's look at some of
the techniques employed in prompt engineering to maximize the output quality of our
generative AI models. One such method is using
domain specific knowledge. When you know the
task area well, you can leverage that
expertise to design prompts that guide the
model more effectively. For instance, if you're
working in medical AI, you might use medical
terminology and structures in your prompts
to increase accuracy. Another method is to use keywords known to be effective
for a specific model. Just as in search
engine optimization, where specific keywords
help rank pages higher, certain keywords can direct
the model more effectively. The choice of keywords
would be based on the model's training data
and its learned patterns. With models like
Bard or ChatGPT, you can directly
ask the model about these keywords and how to use them to
optimize your prompt. We should also consider advanced strategies
such as role prompting, shot prompting, and chain
of thought prompting. Role prompting is a technique where we instruct the GenAI model to take on a certain role
generating its output. For instance, you could instruct
the model to respond as if it's a historian explaining the causes
of the first World War. The model then uses
its training data to generate a response that
aligns with this persona. Shot prompting, on
the other hand, involves giving a shot of context before the
actual instruction. You can provide examples
of the desired output. This helps guide the model
by providing it with a reference or blueprint
of what's expected. For example, if you want
a summary of a document, you might provide
a few examples of summaries along with
the original text. Or if you're looking
for a review of a film, instead of simply saying, write a review of the
film X, you could say, Imagine you have just
finished watching the thrilling film X
in a crowded cinema. Write a review of the film. This added context can
guide the model to produce more emotionally charged
and context aware output. There are different
types of shot prompting, zero, one, and few
shot prompting. Zero shot prompting tasks the Gen AI model
without prior examples. For example, translate this
English sentence into French: The cat is sleeping. Here, we are providing
a task without a specific example of
how it should be done. One shot prompting provides a single example for guidance. For example, continue
the following story. Once upon a time in
a land far away, there was a brave knight. And then we provide an example
of a story continuation. The model then tries
to continue the story following the style of
the example we provided. And we have few shot prompting, which is also known as multi shot prompting. Here, we provide multiple examples to assist the model. An example would be asking the model to generate a product review, preceded by a series of example product reviews. The model will try to write a review similar to the examples we provided; a small sketch of assembling such a prompt is shown below.
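The sketch below shows one simple, assumed way a few shot prompt could be built before being sent to any text generation model or API. The example reviews are invented for illustration and are not tied to a specific product or vendor.

```python
# Minimal sketch: assembling a few-shot prompt as plain text.
# The example reviews are made up; the resulting string would be sent to
# whichever text-generation model or API you happen to use.

examples = [
    ("Wireless headphones", "Great sound, and the battery easily lasts a full day."),
    ("Travel mug", "Keeps coffee hot for hours, though the lid is fiddly to clean."),
]

def build_few_shot_prompt(product: str) -> str:
    """Prepend worked examples so the model can infer the expected style and format."""
    lines = ["Write a short product review in the style of the examples.", ""]
    for name, review in examples:
        lines += [f"Product: {name}", f"Review: {review}", ""]
    lines += [f"Product: {product}", "Review:"]
    return "\n".join(lines)

print(build_few_shot_prompt("Mechanical keyboard"))
```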
And last but not least, chain of thought prompting involves providing a line of reasoning or argument to the GenAI model. Instead of a direct
question or instruction, you give a series of thoughts
that lead to the question. For example, instead of asking, what are the causes
of global warming, you would prompt with we've seen an increase in
global temperatures over the last few decades. This change often referred to as global warming seems to be
influenced by various factors. What are these causes? These strategies can enhance the richness and relevance
of the model's output, further demonstrating the power of skillful prompt engineering. Remember, these are not
standalone methods, but can often be combined to
engineer a powerful prompt. Now that we have an understanding
of these techniques, let's move on to the limitations
of prompt engineering. While prompt
engineering opens up exciting opportunities
to fine tune the output of a
generative AI model, it's important to bear
in mind that it's not a magic wand that can always
guarantee perfect results. There are certain limitations and constraints we
need to be aware of. First, generative AI models, although powerful,
are not omnipotent. They're trained on a
diverse range of data, but this doesn't mean
they have the ability to accurately answer any question or fulfill any task you prompt. For example, the
model doesn't have the capability to
generate content outside its training cutoff date or accurately predict
future events. Second, the accuracy
and relevance of the model's output
highly depends on the quality and clarity
of your prompt. However, even a
perfectly crafted prompt may not always produce the expected result due to the inherent unpredictability
of the AI models. Third, even with meticulous
prompt engineering, models may sometimes generate outputs that are factually
incorrect or nonsensical. This is because these
models generate responses based on patterns
they learned during training, and they don't understand the
content in the human sense. And lastly, certain tasks
might require a level of specification or domain
specific knowledge that surpasses the
model's training. A general purpose
generative AI model may not be able to accurately generate highly specialized
content or response to highly technical prompts
in fields like law, advanced mathematics, or specific medical
sub disciplines. So while prompt engineering
is a powerful tool, it is essential to be aware of these limitations to maintain realistic expectations and use generative AI models
more effectively. Alright, let's wrap things up. Today, we've explored the world of prompts in
generative AI models. We've learned about
prompt design and engineering and discussed various methods like
shot prompting, roll prompting, and chain
of thought prompting. Remember, prompt engineering
is not a magic bullet. It's a tool, and like any tool, it has its limitations. So experiment with creating your own prompts and
explore the possibilities. Thanks for watching, and I'll
see you in the next one.
10. L3V1 - LLMs: Welcome to introduction
to large language models. Large language models
or LLMs for short, are a subset of deep learning. They intersect with
generative AI, which is also a part
of deep learning. We already explained that
generative AI is a type of artificial intelligence
that can produce new content, including text, images,
audio, and synthetic data. But what are large
language models? When we use the term
large language models, we refer to large general purpose language
models that we can pre train and then fine tune to meet our needs for
specific purposes. But what do we mean by pre
trained and fine tuned? Think about the process
of training a dog. Typically, you instruct your dog on basic commands like sit, calm down, and stay. These commands are usually
enough for day to day life, assisting your dog in becoming a well behaved dog
in the neighborhood. However, when you require a dog to fulfill a special role, such as a security
dog, a guide dog, or a police dog, additional specific
training becomes necessary. The same principle applies
to large language models, just like the
specialized training prepares dogs for
their unique roles, fine tuning a pre trained
large language model enables it to perform specific tasks efficiently
and accurately, whether it's sentiment analysis
or machine translation. The model can be honed to
Excel in the desired domain. These models undergo
training with a broad focus, preparing them to address the standard language
related tasks like text classification, a widely used natural language
processing task which involves categorizing text into organized groups
based on its content. Question answering, which is a significant task in
natural language processing, where the model is trained to understand and respond
to inquiries accurately, essentially simulating
the human ability to comprehend and
answer questions. Document summarization,
where the model is tasked with producing a concise and fluid summary of a large text, maintaining the essence and primary ideas and text generation across
multiple industries, creating human like text, which can be tailored to
a specific industries, whether it's drafting emails
in corporate communication, creating product
description in ecommerce, or generating patient
reports in healthcare. These models possess the
capability to be fine tuned in order to solve unique challenges
within various sectors, including retail, finance,
and entertainment, utilizing comparatively smaller
field specific datasets. For example, in retail, they can be used
for personalized product recommendations
based on text data. While in finance,
they can aid in predicting market trends
from financial reports. Or in the entertainment
industry, they might assist in script generation or
content recommendation, showcasing the flexibility and wide applicability of
large language models. Let's further break
down the concept into three major features of
large language models. Large language models are large, general purpose, and pre
trained and fine tuned. Let's discuss each
of them separately. The term large refers
to two things. Firstly, it points out to the massive size of
the training dataset, sometimes reaching the
scale of petabytes. Secondly, it points to the immense number of
parameters involved. In the realm of
machine learning, these parameters are often
called hyperparameters, although strictly speaking, hyperparameters are the settings chosen before training, while these learned values are the model's parameters or weights. Essentially, these
parameters act as the memory and knowledge that the machine gains
during model training. They often outline the
proficiency of a model in addressing a task
such as predicting text. By adjusting these parameters, we can fine tune the model's performance for
more precise prediction. General purpose means
that the models are powerful enough to solve
commonplace everyday problems. This concept is led
by two reasons. Firstly, human language
exhibits a universal nature, irrespective of the distinct
task it's applied to. Secondly, we have to consider
resource limitations. Only a limited number
of organizations have the ability to train these
massive language models, which require extensive datasets and a massive amount
of parameters. So why not let these
organizations construct foundational language
models that others can use. This brings us to
the final aspect of large language models, pre training and fine tuning. Essentially, this means
that a large language model is first pre trained
for broad use cases, using an extensive dataset, collecting a wide range of linguistic patterns
and knowledge. Following this pre
training stage, the model is then fine
tuned to cater to particular goals using a relatively smaller, more
specialized dataset. This two step process ensures the model maintains a broad
base of understanding while also being able to
deeply understand and generate predictions specific
to a given field or task. And with that, we wrap up our introduction to
large language models. In the next video,
we will discuss some of the benefits of using LLMs.
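Before moving on, here is a condensed, hedged sketch of the pre training and fine tuning idea described above: a general purpose pre trained model is adapted to a specific task, in this case sentiment analysis, using a comparatively small dataset. It assumes the Hugging Face transformers and datasets libraries, the public bert-base-uncased checkpoint, and the IMDB reviews dataset, none of which are prescribed by this course; a real project would add evaluation, more epochs, and careful data preparation.

```python
# Condensed sketch: fine-tuning a pre-trained model for sentiment analysis.
# Assumptions: "transformers" and "datasets" are installed; "bert-base-uncased"
# and the IMDB dataset are stand-ins for whichever model and data you use.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# A small, shuffled slice of a task-specific dataset, just to illustrate.
dataset = load_dataset("imdb", split="train").shuffle(seed=42).select(range(2000))
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Start from the broadly pre-trained model, then adapt it to two sentiment labels.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()  # the fine-tuning step on the smaller, field-specific dataset
```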
11. L3V2 - LMM Benefits: In this video, we're going to explore the various benefits of using large language
models or LLMs for short. We'll see how these
impressive AI models can be used for a
variety of tasks, how they function with
minimal field training data, and how they continue to improve as more data and
parameters are added. We'll also discuss how LLMs adapt to different
learning scenarios, even with the presence
of minimal prior data. So hopefully by discussing
these benefits, we can see why LLMs are a major leap forward in the realm of artificial
intelligence. There are many clear
and impactful benefits in employing LLMs. They aren't restricted
to a single task. One model by itself is a multitasking powerhouse fulfilling many different roles. These sophisticated LLMs, which are trained on an
enormous volume of data and develop billions of parameters have the capability to handle a variety of tasks. For instance, they excel
at answering questions. LLMs can sift through their
extensive training data to find the most appropriate
and accurate answers to a wide range of queries. They are capable of understanding
context, ambiguity, and even nonces in language, making them highly effective
at question answering tasks. The model generates a response matching the queries context, tone and complexity, providing precise and contextually
appropriate answers. In terms of text generation,
LLMs truly shine. They can create high quality
text that is coherent, contextually appropriate,
and remarkably human like. Whether it's generating a
piece of news, writing a poem, or even coming up with
an engaging story, LLMs are highly capable. They can also assist with tasks
such as content creation, writing assistance, and
even draft completion. By considering the given input and using their vast
knowledge base, they can generate text that is not only
grammatically correct, but also rich in content meeting the demands
of various use cases. Large language models are also highly capable in
language translation. Equipped with the knowledge of numerous languages from their
extensive training data, they can accurately translate texts from one
language to another, maintaining the semantic meaning and context of the
original text. Not only do they work with
commonly spoken languages, but LLMs can also handle
less widely used ones, making them an invaluable tool for cross cultural
communication. In addition, they can comprehend and adapt
to different dialects, slang and informal language, ensuring the translations
are accurate, readable, and natural sound. LLMs are also a powerful tool when it comes to brainstorming. They can generate ideas, suggest alternative
perspectives, and contribute to
creative problem solving. Whether you're looking
for a catchy headline, a unique marketing strategy
or a fresh plot for a novel, these models can generate numerous possibilities based on the context you provide them. By training on a
vast array of data, they have learned
to come up with diverse and innovative ideas which can spark
further inspiration and help move your
project forward. Not only that, but
they can also offer critique and suggestions for improvement on existing ideas, acting as an artificial
brainstorming partner available at any time. And there's much more. Beyond the task we've
discussed so far, LLMs have an abundance
of other capabilities. For instance, they can be
used in sentiment analysis, determining whether a piece
of text conveys a positive, negative or neutral sentiment. They can assist in summarizing
long pieces of text. LLMs can also be used
in tutoring systems, providing explanations
for complex topics in a variety of subjects. The possibilities are
endless and constantly expanding as these models
continue to evolve and improve. Another major benefit of large language models
is their ability to perform impressively with
minimal training data tailored to a specific problem. They can deliver quality
results even when provided with a small amount
of domain specific data. This quality makes them
highly adaptable to few shot or zero shot
learning scenarios. Now, let's not get
confused here. Let me explain what
is the difference between shot learning
and shot prompting. As we discussed in the
prompt engineering video, shot prompting
involves giving a shot of context before the
actual instruction. This added context can
guide the model to produce more emotionally charged and context aware output. We also said that there are different types of
shot prompting, zero, one, and few
shot prompting. Zero shot prompting tasks the GenAI model without
prior examples. One shot prompting
provides a single example, and few shot prompting provides multiple examples
to assist the model. In the context of
machine learning, few shot learning
refers to scenarios where a model is trained
on a limited set of data. This process is particularly
beneficial in situations where large amounts of training data are not
available or practical. On the other hand,
zero shot refers to an even more impressive
capability of models. It implies that a
model can identify and understand concepts or tasks that it has not been
explicitly trained on. It's like having an intelligent
system that can make logical assumptions and deliver solutions based on the
knowledge it has gained, even when faced with
completely new scenarios. So we can see that
LLMs can excel even in zero shot scenarios thanks to their training on vast datasets. They can handle new
situations by leveraging their extensive knowledge to
infer appropriate responses, even without having
directly encountered the specific scenario
in their training data. In essence, LLMs can be quickly adapted to a
wide range of tasks, even when those
tasks are outside of the specific domain that the model was
originally trained on. This adaptability opens up
a world of possibilities for using these models in diverse fields
and applications. A key benefit of LLMs is their consistent
improvement as we increase the amount of data and the number of parameters
involved in their training. For example, consider the
journey from GPT 3.5, with 175 billion parameters, to GPT four, whose exact parameter count is undisclosed but widely estimated to be significantly larger. The sharp increase in
the parameter count led to a notable advancement in the model's capabilities,
understanding, and precision. This growth trend suggests that LLMs can evolve even further as we continue to push
the boundaries of available data and
computational resources. GPT four significantly
outperforms GPT 3.5 due to its larger
number of parameters. It shows superior understanding
of contexts and nuances, delivers more accurate
responses and performs better in translation
and summarization tasks. In addition, GPT four is more
capable in understanding complex instructions
without having to break them down
into smaller steps, showcasing its
superior ability in adapting to zero shot and
few shot learning scenarios. And the fourth benefit of LLMs is that by interacting
in natural language, they improve accessibility to AI for anyone with
a basic computer, eliminating the need for
specialized technical skills. Whether you're a student
looking for help with homework, a writer needing inspiration or a business owner seeking
market trends analysis, LLMs are here to help. Their speech recognition and human voice synthesis abilities
opens up possibilities for those who might
struggle with typing or even individuals who
cannot read and write. In addition, their high quality
translation capabilities remove language barriers, making these powerful
tools usable to people from diverse
demographics and backgrounds. Essentially, LLMs
are transforming the way we interact
with technology, bringing complex AI abilities to a broad range of
users worldwide. In conclusion, large
language models are breaking down barriers, making AI accessible to all. With their versatile
capabilities and ever evolving potential, LLMs are revolutionizing the way we interact with technology. As we look to the future, we're sure to see these
models continue to enhance our lives and work in
unimaginable ways. In the next video, we
look a little deeper into three examples of LLMs: PaLM and LaMDA from Google, and GPT from OpenAI. See
you in the next one.
12. L3V3 - Examples of LLMs: In this video, we look at some examples of large
language models. We get into the details
of three state of the art LLMs, Palm, Lambda, and GPT, and we will also
discuss some other LLMs, which have shown promise in
the field of generative AI. So let's start with Palm, which stands for
pathways language model. Palm is a 540 billion
parameter language model developed by Google AI. It is trained on a massive
dataset of texts and code and can perform a
wide range of tasks, including question answering,
natural language inference, code generation, translation,
and summarization. It utilizes Google's
pathways system, which enables it
to be trained on a massive dataset
of text and code. With 540 billion parameters, Palm is one of the largest
language models in the world. It is a dense decoder
only transformer model, which means that
it is specifically designed for natural
language generation tasks. Palm can achieve a state of the art few shot
performance on most tasks, which means that it can learn to perform a new task with
only a few examples. This makes Palm a powerful tool for a variety
of applications. So what is pathways system? Pathway system is an AI
architecture that stays highly effective
while generating across different
domains and tasks. It is able to effectively train a single model across
multiple TPU V four pods, which are Google's custom design machine
learning accelerators. This allows Pathway system to
handle many tasks at once, reflect a better
understanding of the world, and learn new tasks quickly. Pathway systems
achieves this by using a number of techniques,
including model parallelism. This technique allows
multiple models to be trained on the same
data simultaneously. This can improve
the training speed and efficiency of
pathway system. Data parallelism. This technique allows
multiple copies of the same model to be trained
on different datasets. This can improve the accuracy of the Pathways system by allowing it to learn from a wider variety of data. And automated machine learning. This technique allows the Pathways system to automatically optimize its training parameters. This can improve the performance of the Pathways system by preventing it from overfitting to the training data. As a generic illustration of the data parallel idea only, separate from Google's actual Pathways implementation, see the toy sketch below.
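The toy below illustrates only the general data parallel idea mentioned above: several copies of the same model each process a shard of the batch and their gradients are averaged. It is plain NumPy, invented for this illustration, and does not represent how Google's Pathways system is actually implemented.

```python
# Toy illustration of data parallelism (pure NumPy; not Google's Pathways).
# Each "replica" sees a different shard of the batch; their gradients are
# averaged before the single shared set of weights is updated.
import numpy as np

weights = np.array([0.0, 0.0])             # weights shared by all replicas
batch_x = np.random.randn(8, 2)            # a batch of 8 examples, 2 features each
batch_y = batch_x @ np.array([2.0, -3.0])  # targets from a simple linear rule

def replica_gradient(w, x, y):
    """Mean-squared-error gradient computed on one shard of the batch."""
    pred = x @ w
    return 2 * x.T @ (pred - y) / len(y)

# Split the batch into 4 shards, as if each went to a different accelerator.
shards = zip(np.array_split(batch_x, 4), np.array_split(batch_y, 4))
grads = [replica_gradient(weights, xs, ys) for xs, ys in shards]

weights -= 0.1 * np.mean(grads, axis=0)    # average the gradients, update once
print(weights)
```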
The Pathways system is still under development, but it has the potential to revolutionize the way we
build and deploy AI models. By enabling models
to orchestrate distributed computation
for accelerators, Pathway systems can make it
easier to build and train large complex AI models that can handle a wide
variety of tasks. Next, let's discuss Google's
another LLM, Lambda. Lambda stands for language
model for dialog applications. Lambda is a family of neural language models
developed by Google AI. It is trained on
dialogue and has up to 130 billion parameters, pre trained on a dataset
of 1.56 trillion words. Lambda has three key objectives of quality, safety,
and groundness. These objectives are measured by metrics such as sensibness, specificity, interestingness,
and informativeness. Lambda is designed to
be informative and comprehensive while also
being safe and grounded. It is able to generate different creative text
formats like poems, code, scripts, musical pieces, email, letters, and much more. It will try its best to
fulfill all your requirements. Lambda has the potential to revolutionize the way we
interact with computers. It can be used to create more natural and engaging
dialogue experiences and to provide users with more helpful and
informative assistance. As we said earlier, there are three key benefits in Lambda. One, natural and
engaging dialogue. Lambda can engage in natural and engaging
dialogue with humans. It can understand the
context of a conversation, and it can respond in a way that is both informative
and interesting. Two, helpful and
informative assistance. Lambda can provide users with helpful and
informative assistance. It can answer questions, generate creative text formats,
and follow instructions. And three, safe and grounded. Lambda is designed to
be safe and grounded. It is trained on a massive
dataset of text and code and is able to distinguish between safe and unsafe content. And now let's move
on to OpenAI's GPT, which is short for generative
pre trained transformer. GPT is a type of deep learning model used to
generate human like text. It was developed by OpenAI, a nonprofit research company
and is funded by Microsoft. GPT utilizes a
transformer architecture, which is a type of
neural network that is well suited for natural
language processing tasks. Parameters for the
latest version of GPT, which is GPT four
are undisclosed, but it is likely
much larger than GPT 3.5's 175 billion parameters. This means that GPT four has a greater capacity to learn
and understand language. GPT four has also
shown to be proficient at skills assessments
such as the bar exam. In a recent study,
GPT four scored in the 90th percentile
on the bar exam, which is a standardized
test that is required for admission to the
bar in many jurisdictions. GPT is a powerful tool that can be used for a
variety of tasks, including generating text,
answering questions, translating languages,
and much more. In addition to what
we discussed so far, there are other LLMs that
are transforming how we look at AI and are helping shape
the future of the field. Let's briefly review
some of them. The first one is touring
NLG by Microsoft, a larger scale language model trained on diverse Internet text that is capable of writing coherent paragraphs and
even whole articles. Burt by Google, a revolutionary
transformer based model that is pre trained on a large corpus of texts and fine tuned for various natural
language processing tasks, providing a high level of understanding of context
and semantic meanings. Transformer XL, a language
model developed by Google's brain team
that innovatively handles long term
dependencies in sequences, significantly enhancing
the performance of tasks like text generation
and translation. There's also Excel Net, which is an extension
of transformer xl, developed by Google Brain and
Carnegie Melon University. It uses a permutation based
training method to overcome some limitations of BIRT and outperform it on
several benchmarks. Electra is a highly efficient
pre training approach developed by Google research that uses less compute power for similar or even better
performance than models like BRT. We have Megatron transformer, a transformer based model
developed by Nvidia, designed to train very
large language models with billions of parameters. It leverages parallel
processing capabilities of modern GPUs. And we have ama
introduced by meta, Lama is a foundational, smaller yet performant
large language model. It's designed to broaden
access to AI research, requiring less computational
power and resources for testing new approaches and
validating existing work. Lama can also be available
in different sizes, ranging 7000000000-65
billion parameters. So in this video, we dived into the
exciting advances in large language models, focusing on Google
AI's Palm and Lambda, as well as OpenAI's GPT. We also highlighted other
noteworthy models in the field, such as touring Energy, BRT, XLNt and mega
tron Transformer. We discussed how these LLMs with parameters that can reach
hundreds of billions redefine multitask
learning and revolutionize our interaction with computers
through engaging dialogue. These models have already demonstrated their
exceptional abilities in tasks like text generation and even practical assessments
like bar exam, and they only keep
getting better.
13. L3V4 - Foundation Models: In this video, you will learn
about foundation models, which as the name suggests, provide a foundation for
generative AI models. Specifically, we
start by providing a definition for
foundation models and explaining what they are. Then we will talk
about platforms that provide different kinds
of foundation models with a focus on vertex
AI's model garden and end the video by discussing different types
of foundation models. Now, what are foundation models? Let's ask Bart to help
us answer the question. It provides three
different drafts. Foundation models are large pre trained
neural networks that can be fine tuned
for a variety of tasks such as natural
language processing, computer vision, and
speech recognition. Foundation models are large
language models trained on massive datasets that
can be fine tuned for a variety of downstream
tasks such as translation, question answering,
and summarization. Foundation models are large pre trained machine
learning models that can be adapted to a wide range of tasks such as natural
language processing, computer vision, and robotics. So foundation models are
large AI models that can be adapted to a wide range of tasks and can generate
high quality output. Even though AI models
aren't brand new to us, there's something
quite different about these foundation models. They come equipped with
several key characteristics that set them apart, marking a significant shift from the AI models we've seen
in earlier generations. Foundation models aren't
limited to a single task. They're multitask. A single foundation
model can tackle a wide range of tasks
right out of the box, such as summarization, question answering
or classification. They can handle various
modalities of data types, including images, text,
code, and much more. With minimal or no
training at all, foundation models can
work well out of the box. They can also be tailored for specific use cases using only
a handful of example data. Because they are
typically trained on vast amounts of diverse data, these models can learn general patterns and
representations, which can then be applied across various
domains and tasks. Before now, foundation models
were difficult to access. They required specialized
machine learning skills and compute resources
to use in production. But with the recent wave of advancements in generative AI, things are changing
dramatically. For example, take Vertex AI, a fully managed machine
learning platform available on Google Cloud. If you are already familiar
with Google Cloud's tools, you already know that Vertex
AI enables you to access, build, experiment, deploy, and manage different
machine learning models. Things like traditional data
science, machine learning, MLPs, or simply creating
an AI driven application. Vertex AI is equipped to
support all such workloads. That's pretty cool and all, but this is where things start
to get really interesting. Recently, Google Cloud announced two major tools which enable us to do even more model garden
and generative AI studio. These tools make
foundation models available to a much
broader audience, even without much experience with coding and ML development. The last section of this
course is dedicated to introducing Google
Cloud Gen AI tools, and we will talk more about generative AI studio
in that section. So in this video, let's
only focus on Model Garden. What exactly is Vertex AI's Model Garden? It's a single place to explore and interact with both Google's industry leading models as well as popular open source models, all with Google Cloud's enterprise MLOps tooling support built in. It houses both traditional
machine learning models and foundational models for
generative AI applications. Inside Model Garden, you'll find a range of models
from Google Cloud, Google research, and
various external sources, accommodating a variety
of data formats. So this is what inside Vertex AI's Model
Garden looks like. With many different enterprise ready models at your disposal, Model Garden enables you to select the most
suitable model, depending on your use case, your expertise in ML, and your available budget. Please keep in
mind, we are using Vertex AI's Model Garden as
an example of a platform that Google Cloud provides for different generative AI and other machine learning
tools and APIs. There are other
companies which have their own versions of
Model Garden as well. For example, Amazon Sagemaker, IBM Watson assistant, and
Vida Clara, data IKAI, Open AI chat GPT API, Microsoft Azores
Machine Learning, data Robot AI and Databricks
LakehousePlatform, all provide tools and APIs for both traditional
machine learning models and generative AI
foundation models. There are different types
of foundation models, including text generation
and summarization, chat and dialog, code
generation and completion, image generation and
modification, and embeddings. Now, let's have a deeper look into each of them. Text models. These models help you perform natural language tasks with
zero or few shot prompting. They can do tasks
like summarization, entity and information
extraction, idea generation, and much more. For instance, a
journalist could use text models for summarization of large articles or reports. An academic researcher
could extract specific entities of information from a massive corpus of papers, or in a brainstorming session, an entrepreneur
might use the model to generate new ideas
or perspectives. As mentioned earlier,
these models work effectively
right out of the box. However, if you desire the model to follow certain
specifications, you can provide
structured examples to guide its responses. This allows for a
tailored experience that aligns with your specific
needs and objectives. Next, let's focus on dialogue. These models are
also text based, but they've been fine tuned to hold a natural conversation. Dialogue models allow you to engage in multiple
turn conversations, keeping the context
throughout the interaction. Consider a scenario in a
customer support center, an AI chatbot powered by these dialogue models
can assist customers remembering the previous turn of the conversation and providing
context aware responses. It can answer questions,
summarize information, or even guide users through
complex procedures, all while being fine tuned
to your specific domain. These models can help you build powerful tools that greatly
enhance the user experience, whether deployed on a browser, a mobile app, or other
digital interfaces.
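As a rough sketch of this multi turn behavior, assuming the Vertex AI Python SDK and the PaLM chat model, a conversation with context and example turns could look like the following; the project ID, context, and store scenario are placeholders.

    import vertexai
    from vertexai.language_models import ChatModel, InputOutputTextPair

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    chat_model = ChatModel.from_pretrained("chat-bison")

    # Context and example turns steer the bot; the chat object keeps the history.
    chat = chat_model.start_chat(
        context="You are a support agent for an online store. Be brief and polite.",
        examples=[
            InputOutputTextPair(
                input_text="Where is my order?",
                output_text="Happy to help. Could you share your order number?",
            )
        ],
    )

    print(chat.send_message("I want to return a jacket I bought last week.").text)
    # The follow-up can say "the jacket" because the previous turn is remembered.
    print(chat.send_message("How long will the refund take?").text)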
Moving on to code completion and generation. These models act as your
supercharged coding assistant. You can give a natural
language prompt to describe a piece of
code you want written, or you can use the model to auto
complete a piece of code. There are even extensions
for IDEs that can take a partial code snippet as input and then provide
the likely continuation. Imagine you're working on a
complex software project. It can help eliminate the
tedious aspects of coding, even provide help with
debugging your code, allowing you as a
developer to focus more on the creative problem solving and less on syntax
or routine code.
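As a minimal sketch, assuming the Vertex AI Python SDK and the PaLM code generation model, a natural language request for code could look like this; the project ID and the task description are placeholders.

    import vertexai
    from vertexai.language_models import CodeGenerationModel

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    codegen = CodeGenerationModel.from_pretrained("code-bison")

    # Describe the code you want in natural language.
    response = codegen.predict(
        prefix="Write a Python function that checks whether a string is a "
               "palindrome, ignoring case and spaces.",
        temperature=0.2,
        max_output_tokens=256,
    )
    print(response.text)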
And now let's dive into image generation. These models allow
you to generate and edit images according
to your specifications. In addition, you can utilize these models for
media related tasks, such as classification,
object detection, and more. Plus, such models
typically incorporate content moderation mechanisms to ensure responsible
AI safety practices. Imagine you're building
an e commerce platform. A model for object detection can automatically tag items
in the product images, while an image
generation model can create new product images
based on descriptions. The content moderation
feature would ensure all user generated content aligns with your
platform's policy, enhancing the user experience.
Last but definitely not least, let's talk about embeddings, which might sound
a little complex, but it's actually a
really cool concept. So let me explain it for you. Imagine you have a huge basket of fruits and you want
to sort them out. You could sort them by color, size, weight, or even taste. Similarly, in the world of data, we often need to sort
or categorize things. But the things
we're dealing with are words or
phrases, not fruits. That's where the
embeddings come in. They are like a unique ID card
for every word or phrase, but instead of a card, it's a list of numbers, which we call a vector. This list of numbers captures the essence of that
word or phrase: its meaning, its context, and its relationships
with other words. With embeddings, we can make
sense of unstructured data, like a long book or a Twitter feed and use
this understanding to do things like powering
recommendation engines or targeting advertisements
more effectively. For example, consider
the realm of e commerce. An embedding model can be used to power
recommendation engines, matching users with the
products they are most likely to be interested in based
on their browsing history. Or in digital marketing, these models can enhance
ad targeting systems, enabling highly
personalized advertising. They can also be used for
complex classification tasks, search functionality, and
many other applications.
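To make this concrete, here is a minimal recommendation sketch, assuming the Vertex AI Python SDK and its text embedding model; the product names and query are made up for illustration.

    import numpy as np
    import vertexai
    from vertexai.language_models import TextEmbeddingModel

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko")

    query = "waterproof hiking boots"
    products = ["trail running shoes", "leather office shoes", "insulated winter hiking boots"]

    # Each text becomes a vector; nearby vectors mean similar meaning.
    vectors = [np.array(e.values) for e in embedder.get_embeddings([query] + products)]
    query_vec, product_vecs = vectors[0], vectors[1:]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    for name, vec in zip(products, product_vecs):
        print(name, round(cosine(query_vec, vec), 3))
    # The winter hiking boots should score highest, so they get recommended first.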
So in conclusion, foundation models represent a significant leap
forward in AI technology. They offer a powerful
adaptable base that can be used for a wide range of tasks
right out of the box. With platforms like
Vertex AI's Model Garden, these tools are more
accessible than ever before, putting advanced AI
capabilities into the hands of a much wider
population of users. From natural language tasks to multi turn dialog,
code generation and completion, image generation
and modification, and semantic
information extraction, the potential applications
of these models are vast. Whether it's enhancing customer
service with AI chatbots, assisting developers with
auto generated code, or powering
recommendation engines, foundation models are
shaping the future of AI. With foundation models and
the power of generative AI, we're not just
predicting the future. We're building it.
In the next video, we see some of the
amazing applications that different types of
generative AI models offer.
14. L3V5 - LLM Development: Let's talk about how large
language models are developed. In this video, we start
by providing a comparison between LLM development and traditional machine
learning development. Then we will talk about three main kinds
of LLMs, and at the end, discuss a concept called chain
of thought reasoning and how that can help with designing
better prompts for LLMs. Let's kick things off by comparing the
development of LLMs using pre existing models with the traditional approach of
machine learning development. In the LLM world, there's no prerequisite
for technical expertise or extensive training
examples, and guess what? You can forget about
model training, too. It's all about the
art of prompt design, clear, concise, and full
of useful information. On the other hand, traditional machine learning requires you to roll up your sleeves and dig
deep into training examples, model training,
and even sometimes needing a basic knowledge of hardware and computing power. There are three
main types of LLMs, generic, instruction
tuned, and dialog tuned. Each of these models requires its own unique style of prompting. Generic language models work like your phone's autocomplete, predicting the next word based on the training data's
linguistic patterns. Instruction tuned models,
on the other hand, are responsive to
specific directives, whether it's summarizing a text, generating a poem in the
style of a famous poet, or offering a sentiment
analysis of a statement. These models follow
the instructions embedded in the input. Lastly, we have
dialog tuned models. These are a subset of instruction tuned
models specifically designed for
interactive context, much like a chat with a bot. So let's dive into examples of these three kinds and
see them in action. Before jumping into examples
of different kinds of LLMs, let's provide a
definition for tokens. A token is a unit of data
that the model processes. It can be a word
or part of a word. For example, a tokenizer might split the word unbelievable into pieces like un, believ, and able. We'll begin with the
generic language models. They're pretty straightforward. Their primary task is to predict the subsequent word based on the context provided
by the training data. Let's take a simple example: "The cat sat on..." Now
we want to know what the most probable next word is, and the model tells us
that "the" is the answer, just like how your phone's autocomplete feature
would suggest. It's a fascinating
glimpse into how AI can mimic the way we
naturally communicate. Moving on to instruction
tuned models, these models shine when it
comes to generating responses. They take their cue from the instructions
given in the input, whether it's a request
to summarize a text, generate a poem in
a particular style, or even classify
a text sentiment. It's like having your very
own digital assistant, always on standby to carry out your
instructions precisely. And lastly, we have
dialog tuned models, which are a specialized type
of instruction tuned models. However, they aren't just
waiting for instructions. They're trained to engage in a back and forth
conversation. You might typically encounter them in the form of chatbots. If you've ever asked a
virtual assistant a question, it's likely you've interacted
with this kind of model. It's all about enabling a natural conversational
interaction. Now it's time to explore
an interesting concept, the chain of thought reasoning. This is an observation that models are more
accurate in producing correct answers when
they first generate a reasoning pathway or chain
leading to the answer. Let's consider a simple example. Roger has five tennis balls
and buys two more cans, each with three balls. How many balls does
Roger have now? Initially, the model might struggle to provide
the correct answer. However, when the prompt first walks the model through
the reasoning (five balls plus two cans of three balls each is five plus six, which is eleven), it becomes much more likely to conclude with
the correct answer. The chain of thought reasoning
assists in enhancing the understanding and
response capabilities of large language models. In conclusion, the
development and deployment of large language models open up exciting new avenues in the
world of machine learning. As we continue to improve and
refine these technologies, we anticipate a future where advanced language
comprehension by AI drastically changes our interaction with
digital platforms. Now that we understand
how LLMs are developed, it's time to see why tuning
them for specific tasks is important and how we can tune
LLMs in an efficient way. See you in the next video.
15. L3V6 - Tuning LLMs: In this video, we will talk about the importance of tuning LLMs for specific tasks and
how to do it efficiently. It's an interesting
thought to have a model that can
handle everything. But in practice, LLMs come with their fair
share of limitations. To increase their
reliability and efficiency, LLMs need to be fine tuned for specific tasks and on
specific domain knowledge. Just like a professional athlete specializing in their sport, these models need to refine their skills to master
their performance. Let's start with a
simple task example. Question answering. This is a subdomain of natural language
processing that's about automatically answering questions posed in
everyday language. These Q&A systems are powerhouses capable of
tackling a range of questions from
factual to opinion based thanks to their
extensive training on text and code. However, the secret
ingredient for this model's success
is domain knowledge. Consider this. When you're developing a QA model
for customer support, healthcare, or supply chain, domain knowledge becomes
a critical requirement. In customer support, a
domain tuned LLM could provide insightful information on subscriptions and services, ensuring your clients receive efficient AI assisted service. In the realm of education, these models can offer detailed information
about courses, tuition fees, or
academic policies. For healthcare, they could serve as self management
tools for patients, providing critical health
related information. Retail businesses
could benefit from better AI chatbots and
product visualization, elevating the
customer experience. And in the realm of
supply chain management, LLMs could offer valuable logistics information
and inventory insights. And let's not forget about
the big tech companies. They could use these models to provide superior tech
support to customers. Each sector has its own
unique requirements, and tuning an LLM, according to these
specifications, can drastically enhance
the model's effectiveness. While generative
Q&A models can use their trained knowledge
base to answer questions without needing
specific domain knowledge, fine tuning these models on
domain specific knowledge significantly boosts their
accuracy and reliability. It's like providing the model a detailed map of the terrain
it's supposed to navigate. Take Vertex AI as an example. It provides task specific
foundation models that are already tuned for
a variety of use cases. Let's say you want to understand your customer's
sentiments toward your product or services better. Vertex AI has a sentiment
analysis task model that is just right for the job. Perhaps you're in the retail
or real estate sector, and you need to perform
occupancy analytics. There's a task specific model
designed for that as well. These models honed
for specific tasks, demonstrate the value of tuning. They're more
efficient, targeted, and effective at their
respective jobs. The ability to select and
use a model that aligns with your specific needs
can dramatically enhance the overall effectiveness
of your AI solutions. Okay, now it's time to formally define what
we mean by tuning. Tuning refers to the process of adapting a pre
trained model to a more specific task
such as a set of custom use cases or new domain by training
it on new data. Tuning is achieved by
training the model on new data that's relevant
to the task at hand. For example, if we're working within the legal
or medical sector, we would collect
training data from these domains to tune
our model accordingly.
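As a minimal sketch of what tuning on new, domain specific data can look like programmatically, assuming the Vertex AI Python SDK's tuning interface (exact parameter names may differ across SDK versions); the bucket, dataset, and project are placeholders.

    import vertexai
    from vertexai.language_models import TextGenerationModel

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    base_model = TextGenerationModel.from_pretrained("text-bison@001")

    # Each line of the JSONL file holds one example from the new domain, e.g.
    # {"input_text": "Summarize this clinical note: ...", "output_text": "..."}
    base_model.tune_model(
        training_data="gs://my-bucket/medical_examples.jsonl",  # hypothetical dataset
        train_steps=100,
        tuning_job_location="europe-west4",
        tuned_model_location="us-central1",
    )
    # Once the tuning job finishes, the tuned model can be called like the base model.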
But what is fine tuning? Think of fine tuning as a high precision adjustment to the model. You bring your own dataset
and retrain the model, affecting every
weight in the LLM. It can be a labor and
resource intensive job and requires hosting your
own fine tuned model. And that can make it
impractical for many use cases. But it's important to know
that a fine tuned model is equipped with a high level
of accuracy and specificity. Let's take a real
world example to illustrate the power
of fine tuning. Picture a healthcare
foundation model that has been extensively trained on a wide
range of healthcare data. It can perform various
tasks seamlessly, answering medical questions,
analyzing medical images, finding patients with similar
conditions, and much more. The reason behind
this success is fine tuning with domain
specific knowledge. By doing so, it becomes a specialist rather
than a generalist, delivering precise
and reliable results within the healthcare context. This process underscores
the immense potential and versatility of tuning, transforming a one size
fits all model into a highly specialized tool to navigate complex
healthcare scenarios. Fine tuning is a great way to boost a model's performance, but similar to renovating
a whole house, it can be expensive and
not always practical. So if we're looking for
a more efficient way to tune large language
models, what can we do? One approach to follow is parameter efficient tuning
methods or PETM for short. Think of PETM as
giving your model a makeover instead of
a complete renovation. Normally, with fine tuning, we adjust all of the
parameters of the model, which is complicated
and time consuming. But with PETM, we
focus on changing just a small subset of these parameters or even
adding a few new ones. Maybe we add on some
extra layers to the model or throw in an
extra piece of information. Figuring out the
best way to do this is still a hot topic
among researchers. The key takeaway here is that
PETM is like a shortcut. It helps us avoid the need
to retrain the entire model, saving us time,
effort, and resources. Plus, it even simplifies the process of
using these models later on as we just use the base model and add
on our extra bits.
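To make the idea concrete, here is a conceptual sketch in PyTorch of parameter efficient tuning in general, not a specific Google API: the pretrained weights stay frozen and only a tiny added adapter is trained.

    import torch
    import torch.nn as nn

    # Stand-in for one block of a large pretrained model.
    base_block = nn.TransformerEncoderLayer(d_model=512, nhead=8)
    for param in base_block.parameters():
        param.requires_grad = False  # the base weights stay untouched

    # The adapter is the only new, trainable piece: a small bottleneck layer.
    adapter = nn.Sequential(nn.Linear(512, 16), nn.ReLU(), nn.Linear(16, 512))
    optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)  # ~17k parameters, not billions

    x = torch.randn(10, 4, 512)        # a dummy batch of token embeddings
    hidden = base_block(x)             # frozen computation
    output = hidden + adapter(hidden)  # residual adapter adds the task-specific tweak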
We've now reached the conclusion of our exploration into the world
of large language models. From this section, we've gained valuable insights into LLMs, starting with an introduction to their structure and function. We discussed the numerous
benefits of using LLMs and provided some
examples of them, including PaLM, LaMDA, and GPT. We also talked about the
process of LLM development, highlighting how it's different from traditional machine
learning development. And most importantly, we underscored the significance
of tuning LLMs. Diving into the ways it enhances their
reliability and accuracy. We've seen how domain
specific knowledge can significantly improve their performance
and learn about efficient tuning methods like parameter efficient
tuning methods. In the next section,
we will familiarize ourselves with four major
tools on Google Cloud that enable us to access
and fine tune generative AI models and build our own generative
AI applications.
16. L4V1 - App Sheet: Let's talk about AppSheet, an innovative no code
platform from Google that is leveraging the power of generative AI to transform
app development globally. Imagine a world where anyone, irrespective of their
coding skills can quickly create data centric apps
for Google Workspace. That's exactly what
AppSheet is designed to do. Remember the tedious process of traditional app development, from conceptualization to drafting project
specifications, and from team
collaboration to coding, it was a long and
exhausting journey. But with AppSheet, the app
development life cycle has been streamlined
drastically. What used to take months can now be accomplished in
days or even hours, freeing up your time for
more valuable tasks. The beauty of AppSheet
is its versatility. It enables you to create
apps for multiple platforms, including desktop, mobile
and chat applications. AppSheet provides an array of applications that are as
varied as your specific needs. It might be that you're
managing a warehouse and need a streamlined solution
for inventory tracking. Perhaps you're organizing
a major corporate event and need a detailed
event planning tool, or maybe you're running a massive multifaceted
marketing campaign and need an app to coordinate
the many moving parts. From customer
relationship management to supply chain coordination, employee scheduling to
project management, AppSheet is adaptable enough to handle your specific needs. Its flexibility and versatility open the doors to
countless scenarios, making it a go to platform for developing customized
data centric apps. As long as you have a
clear idea about what you need and you can explain
it in natural language, AppSheet can help you turn that
idea into an application. So let's dive deeper. Recently, AppSheet introduced
new capabilities all powered by generative AI. Now, you can turn your idea into a fully functional
app within minutes, and you can do this
using natural language. For instance, you want to create an app for tracking
travel expenses. All you need to do is to
describe your process to AppSheet. AppSheet then takes over, asking follow up questions to better understand the
requirements of your app. Once AppSheet has gathered
enough information, it presents a preview
of the tables for your app and even provides sample data
to help you test it. Then AppSheet proceeds to build
the starter app for you. As soon as the app is ready, you can launch it, try it out and make any necessary
adjustments. Interestingly, you
can continue using natural language to specify
the changes you want, and AppSheet will assist
you in refining your app. Creating an app through natural language
with zero coding, that's the magic of AppSheet. It empowers anyone to
develop applications for their organization
rapidly and efficiently. So let's see how AppSheet leverages generative AI
to make this possible. When the user
interacts with AppSheet, Dialogflow and generative AI, aided by a custom trained LLM, work together to provide the necessary information
to create the app. Dialogflow gathers
essential information about the user's business problem that AppSheet will use to
construct a starter app. AppSheet tries to make
this app align as closely as possible with
the user's ideal solution. After Dialogflow has gathered
the required information, AppSheet sends a
request to the LLM to assist in generating
the data model and views needed for the app. When the LLM provides
the right schema, AppSheet utilizes all the
collected information to build a starter app
in just a few minutes. The delivered app includes
a comprehensive database, an intuitive app interface, and any specific configurations that were expressed
during the interaction, such as notification
preferences. Once the starter app is ready, users can continue
collaborating with AppSheet to fine tune and
improve the app further. Depending on the
complexity of the request, AppSheet may utilize
both Dialogflow and the LLM during
this interaction. The combination of
Dialogflow and the LLM enhances
the capabilities of AppSheet, allowing it to handle even the most complex app
development requests. You can even customize
these two technologies. For Dialogflow, you can
customize it to help you create conversational
chat interfaces. Here's how you do it. First, you create a
Dialogflow agent. Then you define your customized
intents and entities. Once that's done, the
Dialogflow API is there to help you integrate this agent into
your application.
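As a minimal sketch of that last integration step, assuming the google-cloud-dialogflow client library and an agent you have already created, sending user text to the agent and reading its reply could look like this; the project and session IDs are hypothetical.

    from google.cloud import dialogflow

    def ask_agent(project_id: str, session_id: str, text: str) -> str:
        session_client = dialogflow.SessionsClient()
        session = session_client.session_path(project_id, session_id)
        query_input = dialogflow.QueryInput(
            text=dialogflow.TextInput(text=text, language_code="en-US")
        )
        response = session_client.detect_intent(
            request={"session": session, "query_input": query_input}
        )
        # The matched intent's fulfillment text is the agent's reply.
        return response.query_result.fulfillment_text

    # Hypothetical project and session IDs.
    print(ask_agent("my-project", "session-123", "I need an app to track travel requests"))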
In essence, you're tailoring a piece of sophisticated technology
to your specific needs. For the LLM, you can
design a model to cater to your application's
unique demands with the Vertex AI
generative AI Studio. This platform presents a
range of foundational models from Google Cloud that you can refine according
to your needs. You can achieve this by
formulating and adjusting prompts as necessary and honing the
models using your own data. So in conclusion, by benefiting from generative
AI technologies, specifically Google's
Dialogflow and a custom trained LLM, AppSheet allows any
individual to develop data driven apps with no coding experience and
in a very short time. It has become a
powerful platform that allows users to generate different apps using
natural language. And that concludes
this section on four powerful
generative AI tools available on Google Cloud. In the next and last
section of this course, we will try to use
these tools to build our own application using
the power of generative AI.
17. L4V2 - Gen App Builder: Gen App Builder, available through Google
Cloud's Vertex AI masterfully blends
foundation models with the strength of search
and conversational AI, empowering a new range
of users to craft innovative generative
AI applications in no time and with no
coding skills required. The human like and engaging nature of online
interactions presents an opportunity for
users to enhance their connections with
their potential audience. For enterprises, this means better communication
with customers, employees, and partners. With Gen App Builder, creating these powerful
gen AI applications does not require
any coding at all. Think about crafting your
personalized digital assistant, custom made search engines, knowledge bases, educational
applications, and much more. With Gen App Builder, you hold the power to bring
such visions to life. Gen App Builder features a user friendly drag and
drop interface, making the process of app design and development
much smoother. It has a visual
editor which lets you easily create and modify
your app content. The built in search
engine lets users find information
within the app, while the conversational
AI engine allows interactions
in natural language. So Gen App Builder provides the flexibility to create an enterprise search experience, a conversation or chat
experience, or even both. The process is simple. You start by building
a content source, which could be a
word document or a spreadsheet containing
information about your business. Next, you select the features you'd like to incorporate
into your app. This could be search,
chat, or both. And once you're done,
simply hit Create. But wait, there's more. This app also allows
you to control and customize generated responses or create default responses. If that isn't enough, you can always access
granular control over the responses with options to
control the response type, set forbidden terms, and disable generated
responses when needed. But don't worry. Even with
generated responses disabled, your gen AI powered app
can still answer complex questions thanks to
Google's search technology. Gen App Builder also has the capability to complete transactions on
behalf of the user. With the integration of
pre structured flows for common use cases
such as checking order status or
explaining bills, you can effortlessly add these functions to your app
with just a single click, but it's not limited to only providing predefined
functionalities. It lets you create your unique transaction
flow with the help of a simple graph based interface to outline high level
business logic. If you prefer, you
can even make use of prompt based flow
creation to explain your logic using a
straightforward natural language. Once you're happy with
your app configurations, it's ready for testing. And if everything looks good, Gen App Builder's built in
integrations facilitate a seamless launch of your app on your website or popular
messaging platforms. It also offers connectivity
with telephony partners. To deploy your new app, you just need to get the
widget deployment code. It's as simple as that. So you can see that Gen App Builder allows
you to easily publish your conversational or search bot to a website or connect to
popular messaging apps. Gen App Builder leverages the
power of AI enabling you to create chatbots that can handle tasks like answering
domain specific questions, processing multimedia inputs, and delivering
multimodal responses. These chat bots can guide
users to relevant content and deliver generative AI responses even without specific
domain knowledge. And they can complete
transactions, summarize information using AI, and have the
flexibility to pause and resume conversations
whenever needed. With Gen App Builder,
you're crafting digital assistants that redefine the standards of
online interactions. In conclusion, Gen
App Builder is where the strengths of Google's
current foundation models, enterprise search, and
conversational AI come together. It empowers you to
effortlessly build advanced applications that
redefine online experiences. Its user friendly interface and visually
appealing editor pave the way for creation and modification of app content
with minimal effort. With capabilities ranging from
built in search engines to a conversational AI engine and a user friendly and
intuitive interface, Gen App Builder offers an extensive toolkit for building responsive
dynamic apps. You can build chatbots
capable of handling multimedia inputs, delivering multimodal responses, and answering
domain specific questions. And with the ability to complete transactions and pause
and resume conversations, these chatbots are more
than just ordinary bots. They are designed to handle the most complex tasks
and interactions, all while being
easy to publish and connect to your website or
popular messaging apps. Now let's move on to the next interesting app builder available on Google
Cloud: AppSheet.
18. L4V3 - Maker Suite: MakerSuite is an intuitive
browser based tool designed to enable rapid and user
friendly prototyping with the PaLM 2 model. The integration of
MakerSuite with the PaLM API means that now we are able to access the API through a user friendly
graphical interface. The PaLM API is a gateway to Google's large language models
and generative AI tools, facilitating time efficient
and accessible prototyping. This platform lets
you test out models rapidly and experiment
with different prompts. You can use it to create
and fine tune your prompts, add synthetic data to
your custom dataset, generate cutting
edge embeddings, and adjust your custom
models with ease. And if you come up with
something you are happy with, MakerSuite offers you the capability to turn
it into Python code, making it possible to call
the model using the PaLM API. The PaLM API and MakerSuite are the perfect duo for
generative AI development. The PaLM API is your starting
point to access Google's LLMs, giving developers
the freedom to use models optimized
for various tasks. On the other hand,
Maker Suite provides an intuitive interface to start prototyping and creating
your unique apps. Now, let's have an inside look into Maker Suite and see how we can start prototyping with large language models
in just minutes. So this is what the inside of
MakerSuite looks like. Let's check out this
menu on the left. Here, we can create new prompts. As we can see, there are
three different types of prompts that we can
create: text prompts, data prompts, and chat prompts. We also have access
to our library, which is the current page
that's open right now. If we go to Get API key, we can see that here, we have the option to create an API key for a new project. And there are also other
quick links available. There is a guide for
getting you started. There is a prompt gallery that helps you explore different
kinds of prompts. There is API documentation and some more information about privacy policy and
terms of service. Now let's get back to our library and try
different prompts. The first one is text
prompt. Let's try it. So here, there are interesting
things to explore. The first thing to notice
are these sample prompts. There are some examples
to help us get a better idea of how these
prompts could look like. Also, if you pay
attention to the text box, we can see that there are some
examples provided for us. Let's read some of them. Categorize an apple as
fruit or vegetable. Write a JavaScript function
and explain it to me. Paraphrase "it looks
like it's about to rain," and many
more examples. It just shows what kind
of prompts you can use as examples
of a text prompt. Now, let's explore one of the samples that
are provided here. Let's check out
casual ponderings. So the prompt would be rewrite
this into a casual email, and then you provide
a text for an email. I can click Run and now I can see that
the language model created a response to my prompt. Let's explore the other kind
of prompt, data prompt. So here, we can see that there
are two different parts. The first one is a table for
writing our prompt examples. And the second part is to
help us test our prompt. So let's look at an example
and see how it would look. Let's try opposites. In the examples, we see
that we are providing four different examples
of what each of these inputs should
receive as an output. So if our prompt is find a word or phrase
with opposite meaning, then we can provide examples
like if the input is strong, the output should be weak. If the input is thick, the output should
be thin and so on. After providing these examples, we can test our prompt. Now, we ask the language model. If the input is wrong, what would be the output? And if the input is fast, what should the output be? And now if we run, we see that in
response to wrong, the language model
is creating right, and for the input fast, the language model creates slow. We can see that for every input, the language model creates
the opposite as the output. Let's explore the third type
of prompt, chat prompt. Here, we can also see
that there are two parts. There is a part for writing our prompt examples and another part for
testing our prompt. So let's look at some
of these samples. Let's try chat with an alien. So in the example, we provide some context. Be an alien that lives on one of Jupiter's moons and provide
an example conversation. If the user says,
how's it going, the model should say, I
am doing well and so on. If you want to add
more examples, we have the option down here. And now we can test our model. So in response, we say, I'd like to visit.
What should I do? But the model provides an answer which is relevant and is continuing
the conversation. We can keep interacting with the model by writing
more prompts. We also have some options
for tuning the model below. The first one is a text preview of the same
prompt we are working on. Whether it's a table
prompt or a chat prompt, we can always have access to the text version
of the same prompt. Through the other one, we
can fine tune our model. We can choose what kind
of model we want to use. We can set the temperature
that defines the level of randomness or creativity
of the model, and we can also customize the number of outputs the
model should produce. There are also some more
advanced settings available. So to recap, first, we select our prompt
type and enter a prompt, including any examples
and instructions. Whatever type of prompt you use, you always have the option
to see it in text form. If you need to test
the model's output, MakerSuite makes it simple for you to
reuse prompts in different ways by using test
inputs in your prompts. We also have the flexibility to play around with the
model parameters. For instance, there's an option to tweak the
temperature setting, which influences the element of randomness in the
models responses. A higher value here
often leads to more unexpected or
even creative outputs. We can also make
additional adjustments to parameters such
as stop sequences, number of outputs, and so forth. And finally, after you are
happy with your prompt, you can save, share and even export it to different
developer environments. For saving your prompts, Maker Suite offers a
prompt library feature, acting as a secure storage
space for all your prompts, making them easily retrievable
for future references. You can also save your
prompts on your Google Drive. Sharing your prompt is as simple as clicking
the Share button. And if you're looking to export your work to a
developer environment, just hit the Get code button. You can export your prompts
in the format that suits you: Python or JavaScript code, JSON objects, or even
as a CURL command. Your work in Maker Suite,
including the settings, instructions, and test examples are all stored in
this code snippet.
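As a minimal sketch of the kind of Python code MakerSuite can export, assuming the google-generativeai client library and the PaLM API key you created earlier; the prompt text is just an example.

    import google.generativeai as palm

    palm.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key

    response = palm.generate_text(
        model="models/text-bison-001",
        prompt="Rewrite this into a casual email: The meeting is postponed to Friday at 3pm.",
        temperature=0.7,        # higher values give more unexpected, creative output
        max_output_tokens=256,
    )
    print(response.result)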
So in conclusion, the combination of the PaLM API and
MakerSuite offers an incredibly convenient
and user friendly approach to prototyping with
large language models. They place the power of
generative AI in your hands, providing the flexibility
to experiment, tweak, and refine until you've crafted the perfect AI
driven application. See you in the next one.
19. L4V4 Generative AI Studio: As the excitement around
generative AI is growing, we can see that its
power to speed up the application prototyping
process is a game changer. If you have access to the right tools such as
the Generative AI Studio and other gen AI
capabilities that are available now through
Vertex AI on Google Cloud, you can experiment, adapt, and perfect new ideas in a Snap. And by Snap, I mean minutes or hours instead
of weeks and months. Building an app is
as easy as opening the generative AI studio in the Vertex AI section of
the Google Cloud Console. Selecting the modality
you want to work with, choosing your preferred
format and inputting your prompt and adjusting model parameters for
additional control. With Generative AI Studio, you get the chance to
explore and tailor generative AI models that perfectly fit into your
Google Cloud applications. You can even embed these applications to your
website or mobile app. In this video, we
are going to explore generative AI Studio
available on Vertex AI. But before that,
let's briefly see what other tools are
available on Vertex AI. So this is what the inside of
Vertex AI looks like. If we expand the
menu on the left, we can see all the tools
that are available to us. We can see that
we have access to model garden, workbench,
and pipelines. We also have
generative AI studio, which we will talk
about shortly. In addition to that, we have
tools for data management, model development, and
model deployment and use. Generative AI Studio helps
developers create and deploy models by providing tools and resources that make it
easy to get started. Generative AI Studio lets you quickly test and
customize a variety of Google's foundation models
through prompting and tuning and allows you to easily
deploy your tuned models. Inside Generative AI Studio, you can access
Google's language, vision, and speech
foundation models. The availability of
some modalities varies. For example, you can see that at the time of
recording this video, I do not have access
to vision models. So let's focus on
language and speech. Let's focus on language for now. You can either click the
language from the menu on the left or open button at the bottom of
the language box. If you want to get a
better idea of how you can use Gen AI Studio
for different purposes, you should explore
the prompt gallery. So before exploring the
different types of prompts, let's have a look at
the prompt gallery. Here we can see a variety
of sample prompts that are predesigned to help
demonstrate model capabilities. The sample prompts are
categorized by task type, such as summarization,
classification, and extraction. Let's have a look at an example. When you open the sample prompt, you can see that the prompts are preconfigured with a specific
model and parameter values. So you can just click Submit and get the model
to generate a response. To directly work with
the language models, we have three options. Interact with the model in a free form or
structured prompting, interact with the
agent as a chatbot or create a tuned model that's better equipped
for our test cases. Let's explore the text or code prompt in a
free form format. So let's try design and
test your own prompts. Here, I can give the model a prompt and ask it to
produce a response. I just provided a long
article here and I'm asking the model to provide a brief summary for
the following article. For different kinds of prompts, I can also use my microphone and directly
speak to the model. On the right side, we can
also see that there are some settings that we can
use to configure the model. We can choose what type
of model we want to use. Here, we have two language
models and two code models. We can set the temperature
for the model, which controls the degree of
randomness or creativity. We can also set the token limit, which determines
the maximum amount of text output from one prompt. Top K restricts the model to choosing
among only the K most likely tokens at each step. Top P restricts it to the smallest set of likely
tokens whose combined probability exceeds P, and we can also set different
safety filter thresholds.
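The same parameters can be set programmatically. Here is a minimal sketch assuming the Vertex AI Python SDK; the project ID and article text are placeholders.

    import vertexai
    from vertexai.language_models import TextGenerationModel

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    model = TextGenerationModel.from_pretrained("text-bison")

    long_article = "..."  # paste the article text here
    response = model.predict(
        "Provide a brief summary for the following article:\n" + long_article,
        temperature=0.2,        # degree of randomness or creativity
        max_output_tokens=256,  # token limit for the response
        top_k=40,               # sample only from the 40 most likely tokens
        top_p=0.8,              # ...and only from those within 0.8 cumulative probability
    )
    print(response.text)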
So now we can ask the model to produce a response for our prompt. Let's click Submit. And we can see that
the model summarized the long article
into three lines. If you are doing a
few shot prompting, a structured prompt template
is available to make it easy by providing a form
for context and examples. For structured
prompts, let's get back to our wine
classification example. We can provide some
context to the model, which instructs how the
model should respond. We can also provide multiple
examples to the model. These examples help the model understand what an appropriate
model response looks like. We also have our settings
on the right side. We also have the option to add more columns for more
complex examples. And to test the model, we provide an input, whether by writing it in the input section or directly
talking to the model. And when I click Submit, the model generates
a response for me. We can easily convert any
structured prompt to free form. And that's what it looks like. You can choose to
initiate a text or code chat to start a
conversation with the model. You can provide context
and examples of interactions to further
direct the conversation. All the settings for model configuration are
available here as well. Now, let's try a chat prompt. In chat prompt, we have the option to provide some
context to the model, which instructs it on
how it should respond. We can also provide
examples to help the model understand what an appropriate response
would look like. For example, if the
user says this, the model should say this. We also have the option to provide more examples
to the model. After providing enough
context and examples, we can start chatting
with the agent. So if you ask how many planets are there in the solar system, the model provides an
appropriate response. Similarly, we can ask
other questions and the model keeps providing appropriate and
accurate responses, consistent with the
examples we provided. Now, let's see how we can create a tuned model using
our own database. We have the option
to tune a model so it's better equipped
for our use cases. Let's check it out. So here, we can choose our
JSON dataset and set a location to store
the dataset on the Cloud. After providing the dataset, we can set the tuned model's details, and after that, we
can tune the model based on our dataset
and our settings. To decide which model
would be the best fit for our specific use cases, we can check out Google's
library of foundation models, which is available
in Model Garden. In Model Garden, you can
explore models by modality, task, and other features. With many different enterprise ready models at your disposal, Model Garden enables
you to select the most suitable model
depending on your use case, your expertise in
machine learning, and your available budget. Okay, now it's time to explore
the speech models. We can choose the speech
either from the menu on the left or by clicking on the
Open button under the speech box. Here, we have two
different options: text to speech
or speech to text. Let's go to text to speech. Here, we can either provide the text or directly
talk to the model. After providing the text, we have some options to choose different languages or set
the speed of the speech. If everything looks good, we can click Submit. And now we have a synthesized AI voice that
can read that text for us.
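For developers, the same synthesis is available outside the Studio UI. Here is a minimal sketch assuming the google-cloud-texttospeech client library; the sentence and output file name are placeholders.

    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text="Welcome to the generative AI course."),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=1.0,  # adjust to speed up or slow down the voice
        ),
    )
    with open("welcome.mp3", "wb") as out:
        out.write(response.audio_content)  # the synthesized AI voice reading the text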
For more advanced features like support for longer audio, we can use Speech Studio, and that's how the
environment looks like. We also have speech to text. Here, we can either upload an audio file or
record our own voice. And after providing the
speech to the model, we can see that it
turns it into text. So now I recorded my
voice and I click Submit And here's my
speech turned into text. We can also use the Speech Studio for speech to
text applications. Both features, speech
to text and text to speech are available
in the speech studio. After you've customized your model, you
have a few options. You can save the prompt
to the prompt gallery. You can also deploy to Vertex AI's Machine
learning platform for production and management. Or you can implement your newly tuned models directly into your website
and applications. In conclusion, through Vertex
AI's Generative AI studio, we can access language, vision, and speech models. Through language models,
we are able to test, tune, and deploy generative
AI language models. We can also access the PaLM or Codey API
for content generation, chat, summarization,
code, and more. With vision models,
we are able to write text prompts to generate new images using
the Imagen API. We can also generate new
areas for an existing image. And with speech models, we can convert speech into
text using the Chirp API. We can also synthesize
speech from text using Google's Universal
speech model, or USM, and that concludes this video on Generative AI Studio
on Vertex AI.
20. Project Demo- App Sheet No-code App Builder: In our introduction
video on AppSheet, we saw that through
AppSheet we can create custom apps
without writing code. Recently, Google added
gen AI capabilities, which make us able to directly explain the
type of app we need to AppSheet, and it builds us a starter app based
on that explanation. We can then further
modify and customize the starter app only through
chatting with AppSheet. Let's consider the
following example. Anne Gray is a manager
at a company and one of her responsibilities is to oversee the travel
requests of her coworkers. These requests can
come from emails, chat or in meetings, which can get pretty
overwhelming. She wonders if the
generative AI feature in AppSheet can help
her streamline operations by
facilitating a solution for approving and
tracking the requests. To try it, she
decides to explore the AppSheet chat app
available on Google Chat. Let's see how it works. In order to access
this chat feature, we go to chat.google.com. Then we select Explore apps
and find the AppSheet chat app. Now let's see what the process would look like from
Anne's perspective. On the first page, we can see that AppSheet welcomes
the user and invites them to
submit a description of an app or business
problem they wish to solve. For example, by
describing a workflow. Anne does this by briefly describing what she
needs: something to simplify the process of managers receiving and approving
travel requests. She adds to the description by noting the kinds
of data she will also need to keep track of.
After entering the prompt, AppSheet responds with a
general app schema. From Anne's first prompt, AppSheet has recognized that the app should have
an approval flow and asks her to choose how the notifications for approval
requests should be sent. As we can see, there
are different options available here. Anne selects
only email for now. Next, AppSheet suggests
a few screens she may want to include, a form for users to
submit new requests, a travel summary list, upcoming travel, and
a few other views. These screens are basically the backbone of
Anne's app schema. It describes what her
app is all about. She doesn't want a My
Travel screen in her app, so she deselects it to remove it from the app
and then clicks App. Now that AppSheet
knows what to put together for Anne's
travel request app, it confirms the tables that could be created in the
AppSheet database. These datasets are
created based on the screens or app views
that she just selected. AppSheet creates two tables to support the schema: travel and team. We haven't entered any
data in our app yet, so all these tables
will be empty. If we don't have any
data in the app, then how can we test to
see if everything works? AppSheet thought of that too. After creating the tables, AppSheet provides the option to include sample data in the app. Anne is ready to test the
app, so she picks yes. And finally, AppSheet prompts Anne to choose a
name for her app. Anne calls this Symbol
Travel, and that's it. AppSheet's next response
is a link to a fully functional preview of the app that
was just created. Let's take a break and look back at what
we've done so far. Through just a few question
and answer exchanges, AppSheet was able to
take Anne's request, which was written in
natural language, and recommended
several solutions, including the screens that her app users will need to see, the things they'll need to do, and the place for the
data to be stored. It even set up the email
notification to users. Creating an app through
natural language with no coding is a magic that is
now a reality in AppSheet. It enables many new
users to develop applications rapidly
and efficiently. Moving on, Anne is presented with the
option to either preview the app that has
been created for her or dive into the AppSheet
editor for customization. She chooses to take a quick
look at the preview first. As she navigates through the
app emulator on her desktop, she explores the views
that AppSheet generated, starting from new travel, proceeding to travel by user, and finally upcoming travel. This last view displays
both a map and a list of future trips all filled with the sample data that she
decided to include previously. Everything seems to
be in order so far. But Anne notices that a view she had in mind
is missing from the app. She has a particular
addition in mind, a screen that compiles all the travel requests into
a comprehensive dashboard, providing the finance team with an answer to a question
they frequently ask. What is the total cost for
each employee's travel? In the editor, Anne notices the generative AI feature she used before is
available here too. She types in her request
for a new dashboard, and in no time AppSheet
takes her request, dissects it, and suggests the necessary components
for this new addition. It proposes a new calculated
column for her team table and gives a preview of the chart that will represent
the aggregated data. Just like she did
before, Anne wants to scrutinize every part of
the suggested changes. So she checks out the preview
chart. And it looks fine. Then she inspects the new column in the database to ensure
all is well there. She uses the link
provided to see the proposed change in the approval table in
the AppSheet database. With a quick look at the
numbers, Anne confirms that the new view and the data changes align with
her expectations. She approves the changes in the AppSheet editor,
and that's it. Her app is now live and ready to use, and Anne feels that she
has got what she needs. Her confidence in the tables and columns AppSheet has
created for her is high. Since she is happy with the
functionality of her app, she gets rid of the sample data, deploys the app, and
shares it with her team. Now her team can see
this refined version of the app and start submitting
their travel requests. Fast forward a few weeks
while Anne is going through her company's intranet for a specific form, an
idea strikes her. Anne goes back to the editor. Knowing how often her
team uses Google Chat, she considers leveraging AppSheet's
no code chat app feature. This would allow
her team to fetch the required form simply by
chatting with Symbol Travel. Anne goes back to the
editor and she enables Symbol Travel as a chat app for her domain's internal spaces. This step makes it feasible
for Anne's colleagues to add Symbol Travel to
their Google chat spaces, group chats, or even in
private conversations. Now it's time to go
over the settings. By default, the Symbol
Travel chat app would display a list of all accessible
app views to the users. But Anne is building this chat version specifically
for end users. The employees that
primarily want to use the app to
submit travel requests. She chooses only the necessary
app views for her users, which means she has to delete everything except request forms. Next, Anne adds a welcome
message for her users, providing some context on how to interact
with the chat app. She decides to include
a slash command. By adding this command, whenever a user types the new trip slash command, the chat app promptly brings up the
travel request form. AppSheet also provides a
smart search command. This command would enable
her teammates to use AppSheet's natural language
processing pipeline to search her app
for data or views. But she decides to keep things simple and disables the
smart search command. Her last task involves
setting up an automation to notify users whenever their travel approval status changes. On this page, Anne can create the right flow working with
a graphical interface. This way, she can build the groundwork for the
needed automation. Once it's done, she
names her automation, tweaks a few details
about when it should run and how responses should be threaded and returns to the chat app builder
to wrap things up. Thanks to AppSheet's no code
chat app deployment, Anne doesn't need to deal with
any further configuration for her app or its
automation to work in chat. AppSheet takes care of all the Google Cloud
platform configuration behind the scenes, all
with a single click. Now Anne is ready
to share her chat app with the team.
And there it goes. The chat app is now
live and is ready to be installed and used by
her entire organization. Now let's say Jeffrey Clark, a member of Anne's team
decides to use the app. Jeffrey needs approval for his travel plans to visit
the customer's premises. He has already installed
the Symbol Travel chat app, so he writes the slash
command for a new trip to bring up the
travel request form. Jeffrey inputs all the
necessary details about his upcoming trip into
the form and hits Submit. From Anne's end, she can see Jeffrey's request showing
up almost instantaneously. The new approval
request triggers an email notification to
Marcus, Jeffrey's manager. Marcus receives an email detailing Jeffrey's
travel request. After examining the
details of the submission, Marcus goes ahead and approves the form directly
from his Gmail. In a matter of seconds, Jeffrey notices a chat
notification from Symbol Travel. The message is a travel
approval confirmation: congrats, Jeffrey,
and safe travels. So wrapping up, we witnessed the power of AppSheet's
generative AI feature. It helped Anne to build and customize a solution
for managing her team's travel requests using natural
language and no code. Anne efficiently solved
a business challenge, creating a travel
request app fine tuned to her team's needs. The seamless integration with Google Chat and the smooth
operation as shown in Jeffrey's travel request and Marcus's prompt
approval underlines the platform's accessibility
and efficiency. This is the power of
no code development. AppSheet's generative AI is revolutionizing
no code development, making it accessible,
efficient, and intuitive.