Transcripts
1. S01L01 Welcome: Hi, and welcome to this
training about generative AI. My name is Ian and I
will be your teacher. I'm super excited that
you are willing to explore with me
the evolving topic of artificial intelligence and specifically the exploding
subtopic of generative AI. Generative AI is a disruptive technology
that is reshaping the landscape of many industries while changing how individuals, teams, and companies
perform a variety of tasks. It is being rapidly adopted
as a set of tools, services, and components that can help to enhance efficiency,
creativity and innovation. The impact is huge, and we are just at
the early stage. This training is designed for anyone who would
like to better understand the key principles of this technology from a
theoretical perspective. No previous knowledge is needed. We are planning to explore the main pieces
of the AI puzzle, the key terms of
machine learning and deep learning, and then
zoom in on generative AI, revealing the secrets of this
technology step by step. My main target is to spark your imagination and willingness to explore those
interesting topics. I will do my best to keep it
simple, fun and interesting. Thanks for watching and I
hope to see you inside.
2. S02L01 Introduction: Hi and welcome. Thanks for joining this training
about generative AI. I'm excited to start
this journey with you. Generative AI is a super
interesting topic, gaining momentum everywhere. As the name suggests, it's part of a larger topic called AI, artificial
intelligence. In this introduction section, I plan to identify and
map the main pieces of the AI puzzle so we can see the big picture before
drilling down to other topics. We'll start by better
defining the concept of AI. What is AI? How did we get here and where are we
going moving forward? Those are interesting questions. Then we will add the flavor of machine
learning and deep learning to AI and discuss what is the big deal
about those technologies. The last step will be to define the next AI evolution wave with the introduction
of generative AI. It's going to be an
interesting and fun story. I will keep it at a
high level so we can better understand the big
pieces of the AI puzzle. That's our starting point. See you in the next lecture.
3. S02L02 AI: Hi and welcome. Let's start
with the most basic question. What is AI, artificial
intelligence? Think about it for
a second or two. The answer is not
straightforward, and if we ask five
different people, we may get five
different answers. AI five years ago was
not the AI of today, and it's not going to be the
same five years from now. Things are constantly
changing in the AI landscape. In addition, AI is a
general purpose technology. It means that AI is
useful for many things. It can be used to optimize the process of discovering
a new medicine. Teach a robot to play a game, make virtual conversation,
brainstorm ideas, write content like a blog, generate new pictures,
identify objects in pictures, predict a stock price, enhance military equipment,
and the list is long. It's a general purpose
technology like electricity that powers
endless types of machines. In that case, we need a more high level definition that can survive
a longer period. Let's try to define AI. Do you know what is the most sophisticated machine on the planet? I guess you know. It's you. I'm talking about your brain. The human brain is a
complex machine that can digest data from
multiple data sources, store and then retrieve it
while making fast decisions. Your brain can learn, adapt, and create new things. It is an amazing complex
organic machine, highly efficient and very fast. And to be completely
humble before Mother Nature, humans are still
trying to figure out how the brain works.
It's still a mystery. AI has always been compared
to the human brain. The human brain is
still considered to be a good benchmark for intelligence until some smarter alien
race takes over, hopefully not very soon. In that case, it
makes sense to try building machines that can somehow mimic human
intelligence. Your mind can scan a picture in a few seconds and quickly identify objects
in that picture. It's a complex cognitive task. Trying to mimic such
complex cognitive functions as identifying an
object in a picture, recognizing a human voice, or understanding the
meaning of text, and many other very complex tasks, is commonly described as
artificial intelligence. Those tasks are complex. That's the definition of
artificial intelligence. The practice of
getting machines to mimic human intelligence to
perform different tasks. And if I summarize
that in a simple sentence, AI is the human desire to
create a digital brain, the simulation of
human intelligence in machines that
can think, learn, and perform tasks
almost like humans, and in some cases maybe
better than humans. Are we there yet?
Meaning, do we have machines that can think
and learn like humans? No. Are we making progress
in that direction? Yes, we are. AI is
already embedded in our daily lives as companies are using AI for
many products and services. AI is the technology
layer for automation while handling a
growing number of tasks that were previously
performed only by humans. Are we going to reach
the scary breaking point that AI will be better
than a human brain? Maybe; I don't know. There are many tasks where AI is already better
than the average person. For example, it will be
impossible for me to learn 20 different languages
in a short period of time. Or to summarize a complete
book in a few seconds, something that AI can do. However, my set of generic capabilities
to handle a variety of very different tasks is still very hard to achieve
using a more generic AI. I can drive my car in a
variety of road conditions, I cook different types of meals, clean my house,
play with my kids, and I have three
kids, scuba dive with friends, and
many more things. I can easily learn new
tasks and adapt as needed. Can we reach that
point, meaning, develop a generic AI machine that can do multiple
tasks like a human being? Maybe it's hard to predict. I assume that as part of
the ongoing progress, we will be able to
see AI solutions that can handle a group of tasks that are part of the
same domain or expertise. Think about the analogy of you standing 100 meters
from your friend. You are allowed to walk half the remaining
distance on every step. Okay? So step number one, you will go 50 meters. Step number two, 25 meters, number three, 12.5
meters, and so on. Are you going to
reach your friend? Well, no. You will get very close but not be able
to reach your friend. Maybe AI is like a frontier
that we can approach but never fully reach. At this point, no one knows, but there is a
constant progress in the industry. The race is on. Until a few years ago, AI was not able to digest complex languages and
generate new content. It was part of a good
science fiction movie. All that changed in 2022 when ChatGPT was introduced by
OpenAI and changed everything. We'll talk about it
in the next sections. Another important
question, is that good or bad? What do you think? Feel free to share your thoughts
in the course comments. Some people will say that AI is a very
dangerous technology, and they are right. Like any technology, it
will be used for bad things like breaking into
our bank accounts while faking our identity. On the other end, it can be
used for great goals like speeding up the development of a new medicine and saving lives. It will be utilized
in different ways. One thing to consider is that automation is a
core use case of AI. It's a general
purpose technology that can power many
different use cases, and therefore, it will have a dramatic impact
on many industries. A growing number of
tasks and processes that were performed by humans
are going to be automated. We're in a major
evolution period, and it's hard to
predict what will happen five or ten
years from now. I can assume that many jobs will disappear or the demand
for them will be reduced, and many new jobs
will be created. That's part of the game, and
we should be able to adapt. All right, I think we
have a more solid, high level definition of AI. In our next lecture, let's zoom in and talk about the next piece of the AI puzzle, which is machine learning. Thanks for watching
and see you next.
4. S02L03 ML: Hi and welcome back. In
the previous lecture, we talked about
AI and the desire of humans to create
machines that are smarter, better, more powerful
using the benchmark of the human brain, creating
artificial intelligence. That's a perfect scary story for a good science
fiction movie. Right? And you know what, humans have been doing
that from day one. Every breakthrough
over the years in computer hardware
to make them smaller, faster and cheaper is another
step in that direction. Computers can run
more efficiently, digest more data,
store more data, and process data faster. But it is not just hardware. It's also the evolving progress
in software engineering. Think about a group of
NASA developers that created a complex
software program that can run a spaceship. Calculate the precise
location in space, traveling at an amazing speed, automatically perform
the required adjustments to the engine and other system, and then land it
on another planet. I'm still amazed by
those space projects. The level of
automation is amazing. As you can imagine, it is a highly complex set of tasks that can be
automated by software. Those developers need to
put all the knowledge and all the rules inside the spaceship program to
handle different situations. So if a software program
is highly sophisticated, can do many things, can
manage a spaceship, can monitor millions of sensors, is that good enough to call it AI, artificial intelligence? Maybe; it's a subjective,
very broad definition. However, any
sophisticated software created before the AI wave
was missing something. It was missing the basic
capability to learn and improve, which is a foundation
of human intelligence. We can try things and
learn from our experience. Our brain is
constantly changing. Our knowledge is not
fixed, it is evolving. That brings us to machine learning. Machine learning is
a subfield of AI. It is the missing component that helped AI to improve
almost exponentially, making significant
progress while trying to match
human intelligence. The main concept of
machine learning is to provide machines
with the ability to learn things without
specifically being programmed
about those things. As you can imagine, it was a major mind shift in
software programming. Instead of building
software that is preprogrammed with a huge
number of tools and knowledge, let's create a system
that can digest and learn pattern from the data and make decisions based
on those patterns. Machine learning is based on many scary things
like algorithms, statistical analysis,
training models and consuming huge amounts of data using complex computing
infrastructure. It is a combination of multiple components working
together to digest data, learn patterns, get better, and eventually do
something useful. Later as part of the training, we will open the
machine learning box and explore the main
component running inside. As a quick summary, AI is the
umbrella term for trying to create or mimic
human intelligence for performing a
variety of tasks. Machine learning
is a subfield of AI that brings the ability
to learn patterns from data. Machine learning is a
very broad field with many different methods that are used to handle
different scenarios. Some of those methods are highly focused to
solve specific tasks like predicting the
future price of real estate property or to classify an
object in a picture. As part of the evolution
of machine learning, some of those machine
learning methods are using highly
sophisticated algorithms, leading to deep learning. That's the next piece
of the AI puzzle. See you in the next lecture.
5. S02L04 DL: Welcome back. We
just talked about machine learning as
the subfield of AI. Machine learning is also a broad branch of
different methods. Those methods developed over the years by a
variety of scientists and engineers helping to push the boundaries
of machine learning. A key dimension of
machine learning is related to the complexity of
patterns inside the data. The complexity of
patterns related to different tasks is sitting on a scale: some tasks are simple, and some of them
are very complex. For example, predicting
the price of a real estate apartment
based on historical data and using around 100
different parameters like the apartment size, number of rooms,
location, and so on, does not require learning complex patterns using
machine learning. We can use simple algorithms
to learn those patterns. On the other end, teaching a
machine learning solution the human language will require
more complex algorithms, more time to train a system, and the ability to process and store very complex patterns. That's the job of deep learning. Deep learning is a subset
of machine learning. It is called deep learning because it can be used to learn very complex and deep patterns from a large amount of data. It uses something that is called artificial neural networks that are inspired by the human brain. Information in our
brain is processed by a complex network of interconnected nodes that are
placed in different layers. The same concept is
used in deep learning, meaning the ability to create many layers in a
complex network. The depth is directly related
to the number of layers. More layers means
that it can digest more complex data and identify
more complex patterns. As part of the training, we'll talk about the structure of artificial neural networks at a high level to better
understand this approach. Let's review the
AI puzzle so far. AI is the umbrella
term for all methods and technologies that
enable machines to mimic, simulate or replicate
human intelligence. Machine learning is
a subfield of AI, focusing on different
algorithms that enable machines to learn
patterns from data. Deep learning is a sub
field of machine learning, adding the ability to
handle large amounts of data and learn more
complex deep patterns. Now we are ready to talk about another piece of the AI puzzle, meaning generative
AI. See you next.
6. S02L05 Gen AI: Hi, and welcome. What
is generative AI? Finally, we reached the main
topic of this training. I will use some perspective
based on my experience. Between 2020 and 2021, I created
a training course about machine learning and it was a great opportunity to explore
the main concept of AI. Machine learning was evolving in multiple directions with many interesting
market use cases. During that time, generative
AI was not a major topic. The famous ChatGPT tool that kicked off
this new branch of AI was only introduced
in November 2022. Now, why am I sharing
that with you? Well, it was an amazing
major pivot point in AI, taking us in a new direction. Before that time, most of the market solutions
and use cases of AI were highly focused
on specific tasks, one task per one AI solution. On the other end, think
about the human brain. AI is constantly compared
to the human brain. We can digest information using our eyes while scanning light. We can hear things by
analyzing sound waves, smell things, or taste food. The input is highly complex and our brain can
handle many different tasks. Another dimension that humans developed over the years is
the ability to communicate. The text we read or hear
can describe many things from simple questions to highly complex
description of processes, solutions, knowledge,
insight, and more. We constantly exchange
information using our language. Generative AI added the
important capability to analyze text as a language, and that's a major shift
in the AI industry, breaking the
communication barrier between humans and machines. Now, machines can
digest, analyze, and understand complex text as input like direct questions, requirements,
instructions, and more. Another super interesting
thing that generative AI added is the amazing capability
to create new content. It's called generative because it can be used to
generate new content. New content like text, an image, a video, a sound wave, and more. All of that is based on
patterns learned from data. It's taking us a
step further as we are trying to mimic human
creativity and intelligence. As you can imagine, it opens a new frontier of a growing number of
business use cases and a huge opportunity
for companies and also for individuals to harness
this evolving technology. One of the highest potentials
and low-hanging fruits of generative AI is the ability to uplift and boost
productivity. Generative AI can be
used to automate, augment and accelerate
work in many directions. We'll talk about that
during the training. Suddenly, creativity, which is a fundamental part of human
intelligence, is also possible with machine
learning using generative AI. The
sky is the limit. Like AI, generative AI is also a general purpose technology that can be used across
many different domains. We are still at the early stage of the evolution of
this technology. It has the potential to impact a wide range of industries
and applications. All right, that's a quick
overview of generative AI. Let's summarize everything
in the next lecture.
7. S02L06 Summary: Welcome back. This section was a high level introduction
to the AI landscape. We talked about the
definition of each term and how those pieces are
aligned with each other, starting with the
umbrella term AI, moving to machine learning, deep learning, and
finally, generative AI. We started by defining AI, artificial intelligence, as the human desire to
create a digital brain, a brain that can mimic
human intelligence. So machines can perform more
and more complex tasks. AI is a general
purpose technology that can be used
almost anywhere. Machine learning algorithms
were able to boost AI into new frontiers by adding the capabilities to learn
patterns from data. Machine learning
is a collection of different algorithms
and methods. One of them is deep learning. Deep learning managed to take us much deeper into the ocean
to explore new things, handle more complex
patterns, and as a result, improve the ability of machine learning solutions to
handle more complex tasks. Then in 2022,
generative AI added the important capability to
analyze text as a language, breaking the communication
barrier between humans and machines and providing
the capability to generate new content. That's our story so far, and we just started. In the next section, we will start to overview the main building blocks in market terms that are
related to machine learning. Machine learning is the
foundation of generative AI, and we must create some
basic understanding around a couple of key terms. You are welcome to test your knowledge and
understanding by answering the quiz at the end of this section just
after this lecture. Thanks again for watching, and I hope to see you
in the next section.
8. S03L01 Introduction: Hi, and welcome back. I would like to share with
you something. I like to watch science
fiction movies. I mean high quality science
fiction movies that explore new interesting ideas
and topics that are beyond the reach of human
knowledge and capabilities. One of them is related to the interesting balance point between humans and machines. Humans have been using machines
for thousands of years, making them better,
faster and smarter. We can't even imagine our lives without using machines as
they are embedded everywhere. Assuming this
constant improvement will continue in the future, the balance point
between humans and machines will eventually
start to favor machines. Machines will be
more powerful to perform tasks that require
human intelligence. That's an interesting idea
explored by many movies. Now, which technology is the foundation to
make those machines mimic human intelligence? Machine learning.
That's the scut engine inside any AI based solution. For people who just started to explore the
concept of machine learning, it will be like climbing a very high mountain without
a map and without equipment. It's hard to figure out
where and how to start. It is an intimidating topic. And we should do something
about it, right? That's the objective
of this section. It is called soft introduction
to machine learning. We are going to talk about
the main building blocks, technologies, and market terms related to machine learning. It is an important step before moving to the main topic,
meaning generative AI. Overall, just to set
the expectation, machine learning is a
very complex topic, and I'm putting a lot of emphasise on simplifying
some of the terms. Maybe you already have
some background and knowledge about some of those
topics, which is great. I still suggest reviewing
the complete section. It will help to establish a unified background around
machine learning terminology. All right, let's slowly open the magical machine
learning box step by step. See you next.
9. S03L02 The ML Box: At the most basic level, we can take any machine
learning solution out there and simplify
its main function using the analogy of a box. This simplified
illustration enables us to slowly dive
into this topic. This machine learning box will have two sides,
input and output. Input is a collection of
data to be analyzed by the machine learning box to create the output
on the other side. The box is closed
because at this point, we don't care what's
going inside. It is doing something,
hopefully something useful. Now, what kind of data the ML box should get and
what will be the output? Well, it completely depends
on the task or use case. Let's take a few examples. Our first ML box, number one, is used to classify if a product is defective or not during the
production process. At some point as part
of the production line, a camera takes a picture of the product and fits that picture into our
machine learning box. The box will process
that picture, identify all kinds of
patterns related to the product, product
size, shape, all kinds of indicators, and the output will be a perfect product or
a defective product. It is a typical
classification job of a machine learning solution. Classify if the input
data is X or Y. Next, machine learning box number
two is used to classify if a product review provided by a customer using a website
is positive or negative. This time, the input for
our machine learning box is a text with sentences
written by the customer, while he or she are
writing the review, and the output of the ML box will be
positive or negative. Again, this is a
classification exercise. Another machine learning
box, number three, is used to generate an animated
video based on a story. The input this time
is a text that describes a scenario or a story line and the
output is a video, like an MP4
format video file. As you can imagine, there are many different machine
learning boxes that are used to handle
a variety of tasks. Those are just a
couple of examples. Now, this illustration using the ML box is not just
useful for this training. In many cases, an ML solution is embedded as a small component in a much larger
software application. It is part of a process. Data is flowing in from another component
and the output of the machine learning
component is going to another component as an
input, like a chain. Another very popular
example is related to consuming machine learning as a service from another company. Instead of building and maintaining the
machine learning box, I'm paying another company
to provide me with the options to feed an
input and get an output. In the software space or in
the software terminology, it is called APIs. There are a growing number
of companies that offer a variety of APIs to consume different machine
learning services. Someone else is building the machine learning
box and then provide pipes to
interact with that box. What kind of tasks machine
learning boxes can perform? That's the topic of the next
lecture. See you again.
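To make the box analogy concrete, here is a minimal Python sketch of ML box number two, the review classifier. The function name and the word list are illustrative placeholders, not from the lecture; a real box would run a trained model behind the same input/output contract.

```python
# A machine learning "box": input goes in, output comes out.
# The internals stay hidden behind a single function call.

def sentiment_box(review_text: str) -> str:
    """Toy stand-in for ML box number two: classify a product review.

    A real box would run a trained model; this placeholder only
    illustrates the input/output contract of the box.
    """
    negative_words = {"bad", "broken", "terrible", "refund"}
    words = set(review_text.lower().split())
    return "negative" if words & negative_words else "positive"

print(sentiment_box("Great product, works as expected"))  # positive
print(sentiment_box("Arrived broken, I want a refund"))   # negative
```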
10. S03L03 Typical ML Tasks: Hi, and welcome. We have a simple illustration of a
machine learning solution. Using a box, the box has input and output,
nothing scary, right? I also provided a couple of examples using those
machine learning boxes. Now, I would like to make it a little bit more generic and talk about the typical tasks that machine learning
solutions are being used for. We can divide those
typical tasks into the following
four categories: prediction, classification, clustering and
content generation. Let's review them one by one. Starting with prediction,
predicting something is a key and practical use
case of machine learning. It refers to the process where a machine learning
solution is trained to predict the value of a target variable
based on input data. Let's mention a
couple of examples. Predicting the future
prices of stocks based on historical prices or economic
indicators or recent news, estimating future sales of a product by analyzing
previous sales data, marketing efforts
and seasonal trends. Using historical
weather data to predict future conditions like
temperature or wind speed, predicting which customers
are likely to leave a service based on their past
behavior and interactions. Those are examples
related to predictions. The next category of typical tasks while using machine learning
is classification. Classification is a
fundamental use case of machine learning
where the goal is to assign input data into predefined
categories or classes. A simple example
is spam detection. Classifying emails as
either spam or not spam, based on features like
the content of the email, sender information
and subject line, all kinds of pieces
related to the email. This type is called binary classification as there are only two classes,
spam or not spam. It can be used for
sentiment analysis, analyzing texts like
customer reviews or social media posts to classify if the sentiment
is positive or negative. Another popular example
is image classification, categorizing images based on
objects inside the picture, such as: is that a dog, cat, house, or a car? As we have more
than two classes, it is called multiclass
classification. Moving next to clustering. Clustering is a bit
of a different approach compared to prediction
and classification. The task is to discover hidden structures or
patterns within the data. The output is not a
predefined number like in prediction or some category label,
like in classification. We don't know what is
the expected output. The goal is to group data
points together into clusters based on internal
characteristics or patterns. A cluster is a group of
similar data points that are closer to each other compared to points in other clusters. Let's mention a couple of examples of how clustering
is being used. For example, in e commerce, clustering can be used to group similar products based
on user preferences, allowing for personalized
product recommendation. Or grouping customers based on their behavior, preferences, or maybe demographics to target marketing efforts
more effectively. In social networks, clustering
can reveal groups of users who interact frequently or share
common interests. Recommendation system can use clustering for
knowledge clustering, like grouping
documents, articles or stories that cover
similar topics. And the final most
recent category of machine learning task
is content generation. Generation in machine learning
refers to the creation of new content that did not exist before based on patterns
learned from data, producing outputs such as text, images, music, code, and more. We'll see many examples
during the training. Those are the four categories of typical tasks performed by
machine learning boxes, prediction, classification, clustering, and
content generation. I would like to emphasize that those categories are not
competing with each other. Each category has a
tremendous unique value in a variety of use cases. All right, we have an ML
box with input and output, and we now understand what
kind of task such box can do. It is time to talk about how those boxes are trained
to perform their job. That's the topic in
the next lecture.
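As a rough sketch of how three of these task categories look in code, here is a minimal example using scikit-learn, a library the lecture does not mention; all data and names are invented for illustration. Content generation needs a generative model, so it is only noted in a comment.

```python
# Minimal sketches of three of the four task categories (scikit-learn).
# Content generation would need a generative model, so it is omitted here.
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Prediction: estimate a numeric target from input features.
sizes, prices = [[50], [80], [120]], [100_000, 160_000, 235_000]
predictor = LinearRegression().fit(sizes, prices)
print(predictor.predict([[100]]))  # predicted price for a 100 m^2 apartment

# Classification: assign input data to predefined classes (spam / not spam).
emails, labels = [[0, 1], [8, 0], [1, 1], [9, 1]], [0, 1, 0, 1]
classifier = LogisticRegression().fit(emails, labels)
print(classifier.predict([[7, 0]]))  # 1 = spam

# Clustering: group similar points with no predefined labels.
customers = [[1, 2], [1, 1], [9, 9], [8, 9]]
print(KMeans(n_clusters=2, n_init=10).fit_predict(customers))
```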
11. S03L04 Training Phase: Hi and welcome back. Any successful Olympic
player has thousands of hard and sweaty training hours
before reaching the level of expertise to compete with the top talented
people worldwide. We can summarize that
with a sentence, no pain, no gain. Without a big surprise, an ML box is like
an Olympic player. It must be trained. Before using an ML box to
perform something useful, we need to make sure that it has the required knowledge
to digest the data and identify the
relevant patterns for our specific use case. This phase is called
the training phase, and it is a fundamental
concept in machine learning. The training phase is used to train the machine learning box as much as possible with
enough amount of data, so it will be able to
accurately predict, classify, cluster, or generate
content based on the input. And like a good Olympic player, this training process can take substantial
time and resources. The knowledge inside the
machine learning box created during the
training phase is called a trained model. A model is also a fundamental term
that we will use a lot when talking
about machine learning. That's the final output
of the training phase. Then the machine
learning box can use this trained model
to do something useful. This training phase is probably the most challenging step when building a
machine learning box. We need to collect
enough amount of data. In some cases, we need
to prepare the data, like cleaning errors, before
feeding it into the system. In other cases, we
need to manually decide which data elements
are more relevant. Then we need to select the most suitable algorithms that will be used to
analyze the data. There is a long list of very scary math algorithms
that can be used. The selection of the most relevant algorithm will be based on the data characteristics to process and the
required job to handle. For example, to
perform the job of classifying an
object in a picture, the best matching algorithm is the planning with
a neural network. It is also important to measure somehow the level of accuracy of the trained model using performance metrics to make sure we're not getting
stupid results. It is usually an
iterative process, meaning we adjust something, re train a new model, measure performance,
and do it all over again until reaching the
required performance benchmark. As you may guess,
it is a complex, sensitive process performed by a skilled team of AI engineers. Those AI teams are
using different tools, frameworks, and
computing resources to build and fine tune
those trained models. It can take hours, days, weeks, and even months. It's all about the
complexity of the model. The final model
will be copied as a snapshot into the machine learning box
running in production. This step is called inference. It is like taking our Olympic players and
letting them compete. That's the knowledge used by the machine learning box
to digest the input, put some magic, and
generate the output. Now, let me ask you something. Do you think that after
the Olympic competition, our Olympic players will stop their training?
Of course not. That's also true for a trained model running in production. It must be re trained at some repeated intervals
to make it more optimized to recent
data and recent events. It is like a continuous cycle. All right, we covered
a couple of terms. The training phase, which is generating a trained
model using algorithms. The model must be
validated using performance metrics to make sure it is operating as expected. Finally, the trained
model is used by the machine learning box in production to do
something useful. But you may ask yourself, what is this trained model created during the
training phase? How can the trained model
be illustrated? Let's talk about it in the
next lecture, see you next.
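Here is a minimal sketch of that iterative loop, assuming scikit-learn and its built-in iris dataset (both are illustrative choices, not from the lecture): train a model, measure its accuracy, adjust something, and keep the best snapshot for inference.

```python
# A sketch of the iterative training loop described above:
# train a model, measure its accuracy, adjust, and repeat.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

best_model, best_score = None, 0.0
for depth in (1, 2, 3, 4):               # adjust something...
    model = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    score = model.score(X_test, y_test)  # ...measure performance...
    print(f"depth={depth} accuracy={score:.2f}")
    if score > best_score:               # ...keep the best trained model
        best_model, best_score = model, score

# "Inference": the final snapshot of the model is used on new input.
print(best_model.predict(X_test[:1]))
```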
12. S03L05 Y=F(X): I will start this lecture
by saying the obvious. Oh, my God. Most of us are running away from any formula like birds flying away from a
fire in a forest. That's fine and understandable. Math is something
we have learned a long time ago and we are
trying to forget it. However, a little bit
of very simple math can sometimes be useful to organize and simplify
complex things. I know it may sound strange, but stay with me for
a couple of minutes. You remember that we
managed to squeeze any machine learning
solution in a box, which is great because
at this point of time, I just want to know how to
interact with that box. I have two pipes,
input to insert something and output
to get something. An ML box with an input and output is
basically some kind of data transformation that can be presented as a generic
simple math formula: Y = F(X). X is the input data. It can be text, an image, audio, numbers, et cetera. Y is the output generated by the
machine learning model. Like the input X, the output can be also
different types of content. F is the machine
learning trained model. It is the formula or
function used to take the input data X and map it into the output of
the model, which is Y. It is a data
transformation process. This simple formula shows that any machine learning
model is basically a mathematical transformation
function between input and output discovered
by the algorithm during the training process, taking input X and applying some function F to
produce output Y. If I didn't scare you
yet, that's great. Let's review an example. If our ML box is about predicting the price of
a real estate apartment, most probably the
formula will look like something similar to
this simple structure. This type of function is
generated by an algorithm called linear regression
in that specific use case, and the function is
a linear function. By the way, I invented this formula and those
specific numbers, it is just for demonstration. But what is the meaning
of those X1, X2, et cetera? X one can be the number of
rooms in the apartment. X two, the apartment size, X three, distance from a nearby school, X four, distance from a
nearby hospital, et cetera. Those are the list
of features that are provided as input X. The absolute numbers,
meaning 0.1, 3, 0.4, are the model weights, also called parameters, that were estimated by the algorithm
during the training phase. During training, the model was
exposed to many examples of apartments and learned what are the parameters and their
impact on the price. For example, we can see that the number of rooms,
meaning X one, has a dramatic impact
on the price of the real estate apartment compared to the
distance from school. It has a bigger weight. So using this formula, which represents the model, I can take any combination of input parameters and
predict the output Y, which is the price of
a specific apartment. That's a simple example
of machine learning, even if it's not a
complex use case. Let's take a look at a
more complex example, like an ML box that can be used to identify an
object in a picture. It's common to use
deep learning to train a model that will be based on a few layers
of neural networks. How can I use simple math to present
a deep learning model? Well, very simple. The following math formula represents a four
layer neural network: Y = F4(F3(F2(F1(X)))). F one is the first layer. It is getting X as input, like the raw image
file to analyze. The output of F one is propagated as input to F two,
which is the next layer, and so on until reaching
the last layer, F four, and then we get the
final output Y, which can be the label of the identified object
in that picture. As you can see, any
trained model is a mathematical
transformation taking input and mapping it
into some output. As part of the training phase, the job of the algorithm is to consume a lot of
data and use it to generate this mapping function F to be used as the
final trained model. It is an optimization
exercise optimizing the F mapping function step by step while using
the training data. The training phase
is all about data, and it's important to understand what kind of data types are available and how to handle them with the right
machine learning box. That's our next
topic. See you again.
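To tie the two examples together, here is a small Python sketch of Y = F(X). The weights in the price model are placeholders standing in for the invented demonstration numbers shown on screen, and the four toy layer functions only illustrate the composition idea, not a real network.

```python
# Y = F(X) as code. The weights below are placeholders, standing in for
# the invented demonstration numbers from the lecture slide.

def price_model(x1_rooms, x2_size, x3_school_dist):
    """A linear trained model F: a weighted sum of the input features."""
    return 0.3 * x1_rooms + 0.1 * x2_size + 0.04 * x3_school_dist

print(price_model(3, 90, 500))  # predicted price for one apartment

# A four-layer network is a composition of functions:
# Y = F4(F3(F2(F1(X)))), where each Fi is one layer.
def f1(x): return [v * 0.5 for v in x]      # toy layer transformations
def f2(x): return [v + 1.0 for v in x]
def f3(x): return [max(v, 0.0) for v in x]  # a ReLU-like step
def f4(x): return sum(x)                    # final output Y

x = [0.2, 0.7, 0.1]                         # raw input features
y = f4(f3(f2(f1(x))))
print(y)
```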
13. S03L06 Data Types: In previous lectures, I mentioned that the trained
model in machine learning is created by an algorithm or a group of algorithms
using data as input. The final ML box is using the trained model
for getting data as input and generating
data as output. It's all about data. I think it's important to
understand the concept of data, what kind of data those machine learning
boxes can handle. There are three
main types of data. The first one is
called structured data. Structured data is data
that has a defined format, and it is highly organized and arranged in a
predefined structure. For example, a list
of customers on a website will be handled and organized
in a structured format, including all kind of information about the
customer, like name, age, address, phone number, email, identification
number, and more. We can present that list of customers as a
simple tabular view, like a spreadsheet or
inside a database table. Each row represents
a single customer, and each column is a specific piece of information
about that customer, like the email or
name and so on. Structured data is
organized and consistent, making it easily searchable and accessible by
humans and computers. It's easy to open a spreadsheet and quickly find the
relevant customer, and then the relevant piece of information about that
customer. It's very organized. On the other end, structured
data has low flexibility. For example, if I want
to add another piece of data to the list of customers, another attribute that
describes a customer, it will require
substantial work to make that adaptation
across different models. On the other side
of the spectrum, we have unstructured data. Unstructured data is
information that does not have a predefined data structure or is not organized in
a predefined manner. It's the opposite
of structured data. Let's talk about a few examples related to unstructured data. Emails, which are text data, are an example of
unstructured data, social media posts,
a Word document, PDF, books, articles, blogs. It can be also multimedia data like images, video, audio files. Unstructured data is considered more complex and more
challenging to process. Think about analyzing
a text of an email. In that case, finding
patterns in unstructured data requires using more advanced
machine learning methods, like deep learning. We'll see it during
the training. And the last one
is sitting between them, semi structured data. Semi structured data is
a hybrid type of data located between structured
and unstructured data. It has a partial structure. An email is a simple example
of semi structured data. It has a combination of
structured and unstructured data. The structured parts include the fields in the
email like the sender, the date, and subject
of the email. On the other end, the content of the email can contain
unstructured data, like free text or attachments. Another example is about log
files created by systems when they measure things, like a
sensor measuring temperature, or error events when something is not
working in a system. Such log files may have
semi structured data where certain information is organized in a
consistent format. Like every event will
have a timestamp, severity level, event ID. But the content of the log
message can be free text, which is unstructured data. Perfect. We talked about the
three main types of data. Those data types can be the input or output of
a machine learning box. Later in this training, we will see how
different methods are used to handle
specific data types. Let's zoom in a little
bit more and talk about the concept of features
in machine learning.
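A short Python sketch of the three data types side by side; the customer record, review text, and log format are invented examples, not from the lecture.

```python
# The three data types side by side (examples are illustrative).

# Structured: fixed fields, like one row in a customer table.
customer = {"name": "Dana", "age": 34, "email": "dana@example.com"}

# Unstructured: free text with no predefined fields.
review = "The package arrived late but the product itself is wonderful."

# Semi-structured: a log line with a consistent prefix (timestamp,
# severity, event ID) followed by free-text message content.
log_line = "2024-05-01T12:30:00 ERROR 4021 Sensor reading out of range"
timestamp, severity, event_id, message = log_line.split(" ", 3)
print(severity, event_id, "->", message)
```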
14. S03L07 Features: Welcome back. We just talked about the
main types of data, structured, unstructured
and semi structured. I also mentioned
that the type of data we would like to feed
a machine learning box has a direct influence on the methods that will be used to train the model inside that box. When feeding data into
a machine learning box, the data will be divided
into more digestible pieces. Those pieces are
called features. Features are the input
variables used to train the model and later on also make prediction,
classification, clustering, or
content generation. It is like taking the data
stream and slicing that to more meaningful pieces.
So what is a feature? A feature is an individual
measurable property or characteristic of the
data being analyzed. A feature can be related to unstructured data
or structured data. Let's talk about a few examples. A numerical value is an example of structured data, like the age of a person.
It's a feature. Categorical data, like
the gender or color, date and time, text, like quotes, phrases, or topics. Again, these are examples of features that are
unstructured data. Images, meaning a pixel value, textures or patterns
extracted from the image. Audio, data like the spectrum, pitch, and other audio
characteristics. Let's assume that our machine
learning box is about predicting the likelihood or the risk of developing
a specific disease. In that case, the
features could be, for example, the
age of the person, which is a numerical,
geo location, which is categorical
feature, gender, again, categorical
feature, blood pressure, numerical, smoking
status is a smoker, non smoker, health condition, and those are just examples. These features provide
the model with the necessary input to learn and make predictions about
the target variable, such as whether a person is likely to develop a
specific disease. Features are crucial for the
model's learning process, as they represent the input data used to predict the
target variable. The quality and relevance of features can
significantly impact the performance of a
machine learning model. For example, if I
drop the age value as an input feature for
our machine learning box, it may not be able to accurately predict if that person will
develop a specific disease. In some cases, the engineers
responsible for training a machine learning model
may decide to remove, transform, encode, or combine
different features. It's called feature
engineering. Feature engineering is
the process of creating new features or
transforming existing ones, which is a crucial step in many machine
learning projects. For example, taking
the age feature and transforming it into
a specific age group. So instead of using the
raw age value like one, two, three, four, and so on, we can bin the ages into categories, like group
one will be 0-18. Group two will be
19-30 and so on. This can help to capture relationships between
age and disease risk, especially if the risk changes significantly at
certain age thresholds. By carefully selecting features, engineers can
improve the accuracy and effectiveness of
machine learning models. All right, we talked
about the ML box and typical ML tasks. Then we reviewed
the training phase, the training model, different data types,
and also features. It's time to talk about
the teacher who is supervising the training
process. See you next.
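Here is a minimal feature engineering sketch of the age binning idea described above; the first two group boundaries follow the lecture, while the function and data are illustrative.

```python
# Feature engineering sketch: bin the raw age value into age groups.
# The third group boundary is an assumption for illustration.

def age_group(age: int) -> str:
    if age <= 18:
        return "group 1 (0-18)"
    if age <= 30:
        return "group 2 (19-30)"
    return "group 3 (31+)"

# One record per patient: raw feature -> engineered feature.
patients = [{"age": 12}, {"age": 25}, {"age": 60}]
for p in patients:
    p["age_group"] = age_group(p["age"])  # the transformed feature
print(patients)
```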
15. S03L08 Supervisor: Hi and welcome back.
In our last lectures, we talked about the
training phase to create a trained model for an ML box that will use that trained
model to do something. But how is that training phase done, and which methods are
being used to train a model? Let's talk about
those questions. In machine learning, there are a couple of methods
that can be used. We have supervised learning, unsupervised learning,
semi supervised learning, and the last one is
reinforcement learning. Under each method, there are multiple machine learning
algorithms that can be used. The selection of the
most relevant method and the best
algorithms to perform the job will be based on the data type and
required objective. For the first three options, we can see that it is somehow related to the
supervision level, how much external human
intervention is needed to supervise and control the
process of training a model. The first and probably
most popular method to train machine learning models is called supervised learning. Supervised learning is called supervised because it is
guided by labeled data. Let me explain that
concept of labeled data. Using this method, an
algorithm is training a model using a specific
pre selected dataset. This dataset is called
the training dataset. The training dataset
is labeled data. Labeled data is basically a collection of
many data examples. Each example is a pair of the input and the
expected output. Now, because the expected
output is provided, it is called labeled data. We know what is the input and
we know what is the output. The labels provide the model with the correct
answer for each input, acting as a supervisor or teacher during the
learning process. Think of it like a student learning a new subject
with a teacher. The teacher provides the student with exercises and
their correct answers, guiding the student's
understanding. In the same manner in
supervised learning, the label data is used
to guide the model to better understand
the relationship between inputs and outputs. As a simple example, a
training dataset can be a collection of 10K images, and per each image, the label will be a text that describes the main object
in a specific image, like a house, dog, cow, cat, bike, et cetera. Each example is a pair
of input and output. The input is the image and
the output is the label. Looking at the
following diagram, we have a training dataset with many examples, all those images. Each example has input
and expected output. We fed the model with
the first example, the first image,
meaning input data X. The model predicts
the output Y and then compares the predicted output
X with the expected output. So for example, if I feed an image with a cat
as the main object, and the algorithm
identify that as a dog, then something is not working. If they are not the same, it
means that the model should be tuned a little bit because there is an arrow
with the prediction. The algorithm will adjust the model parameters
and then try again, predict again and
check the error again, trying to reduce the
error to a minimum. That's called optimization. It's a repeated process while digesting a large
number of examples. On the other side of the spectrum to train
machine learning model, we have unsupervised learning. With unsupervised
learning, the model is trained using unlabeled data. That's a major difference. While supervised learning focuses on training models
with labeled data, unsupervised learning trains
models on unlabeled data. The model learns
patterns directly from the data without
any guidance or targets. The main goal when using unsupervised learning
is pattern discovery. We can take a large amount
of raw data and feed it into a machine learning box
using unsupervised learning, and it will try to
discover hidden patterns, structures, or relationship
within the data itself. Those patterns can be used
to create useful insights. It's like exploring a new
territory without a map, aiming to uncover hidden places. You don't know what
you're going to find. A classic example using unsupervised learning
is clustering. We talked about clustering as a category of task
in machine learning. Checking if there
are data points that naturally falls into different clusters,
different groups. That's about
unsupervised learning. One of the biggest challenges
when training a model with supervised learning is
to get enough labeled data. Otherwise, the model will
not perform so well. Sometimes getting
enough labeled data is expensive or a time
consuming process, and we have easier access
to unlabeled data. This is the situation
in many projects. In that scenario, a third method was developed to
bridge that gap. It's called semi
supervised learning. Semi supervised learning
is a hybrid approach that combines elements of
unsupervised learning. It involves training a model on a dataset that contains both labeled and
unlabeled examples. It utilizes a small amount of labeled data and a large amount of unlabeled data
to train a model. And therefore, it's a very
cost-effective option to improve the accuracy
machine learning models. The last method to train machine learning models is
called reinforcement learning. It is a completely
different approach compared to the three
methods we covered so far. It is inspired by how humans and animals learn
through trial and error. Where actions that lead to positive outcomes
are reinforced, while those that lead to negative outcomes
are discouraged. Using this method, we train something that is
called an RL agent, a reinforcement learning
agent, by interacting with the environment to maximize something that
is called a reward signal. The agent is basically a
decision making machine. It is constantly making
small decisions or actions by trying things and
learning from experience. Imagine that you play
table tennis with a robot that is controlled
by such an RL agent. The RL agent gets a positive reward each time
it wins a ping pong session. It may play very
badly when starting a new game and then improve while learning
how to play better. It will learn which
actions and what kind of strategies can lead to
better positive outcomes. It is very similar to how
we learn from experience. Over time, the RL agent
develops a policy, also called a strategy, for selecting actions to maximize
the reward. That's the model created by RL. Reinforcement learning is
used in many fields: in transportation, while
developing self driving cars that can navigate roads
and make decisions; in robotics, for
teaching robots to perform tasks in
complex environments. Or in the gaming industry while developing agents that can
play a variety of games. By the way, some
use cases are using a combination of those four
options to train models. Let's summarize everything
we've covered so far in this section and
see the next lecture.
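To contrast the first two methods in miniature, here is a hedged scikit-learn sketch (an assumed library choice; the toy "images" and labels are invented). The same data is used twice: once with labels, supervised, and once without, unsupervised.

```python
# Supervised vs unsupervised learning in miniature (scikit-learn,
# illustrative data). Supervised: every example carries a label.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

images = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]  # toy "images"
labels = ["cat", "cat", "dog", "dog"]                       # expected outputs
model = KNeighborsClassifier(n_neighbors=1).fit(images, labels)
print(model.predict([[0.15, 0.85]]))  # -> ['cat']

# Unsupervised: same data, no labels; the algorithm discovers groups.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(images)
print(clusters)  # e.g. [0 0 1 1] - cluster IDs, not named categories
```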
16. S03L09 Summary: Hi and welcome to the last
lecture in this section. I would like to summarize all the things we
covered so far. We started by using the concept
of a machine learning box as a simplified representation of any machine
learning solution. This box takes input data, processes it, and
produces an output. The type of data and
the desired output depends on the specific use case you would
like to handle. Next, we talked about the four main categories of tasks performed by machine
learning solutions, prediction, classification, clustering and
content generation. Prediction involves
forecasting future values based on past data, while classification
is used to assign input data into predefined
categories, like labels. Clustering is used to group similar data points to find
hidden patterns inside, and content generation creates new content based on
learned patterns. How does the ML box know to
perform a specific task? Well, based on training. During training, the ML box
learns from data to acquire the knowledge needed for performing different tasks. This process involves
collecting data, preparing it, selecting algorithms, and then evaluating the model performance
while doing the training. The final trained
model is then used by the ML box in production
to make something useful. It is a complex
process that requires time, resources and expertise. We also saw how any machine learning solution
can be represented as a simple mathematical formula, Y = F(X).
X is the input data, Y is the output, and F is the trained model. During the training process, the algorithms tune
the mapping function F using a large amount of data. Can we push any type of data to a machine learning
box? Well, no. We need to understand the data types to better
match the right solution. In that context, we mentioned the three main types of
data in machine learning, structured, unstructured
and semi structured. Structured data has
a defined format. Unstructured data lacks
any defined format, and semi structured data
falls between the two. Examples of structured data
include lists and tables, while examples of
unstructured data include text,
images and videos. Semi-structured data has a partial structure, and we
mentioned emails and log files. Now, data is a very
high level definition. We need to make it
more granular and divide the data into
more manageable pieces. That's the concept of features. Features are the individual
pieces of data that are used to train a model and later on make prediction
or something else. They can be numerical, categorical, text,
images or audio. As part of improving
a model performance, it is a good practice to perform something that is
called feature engineering, meaning add, remove, or
transform specific features. Lastly, we covered the
different methods used to train machine
learning models, including supervised, unsupervised, semi-supervised, and reinforcement
learning. Supervised learning uses
labeled data to train models for prediction, classification,
and data generation, while unsupervised learning uses unlabeled data for
pattern discovery. Semi supervised
learning combines both labeled and
unlabeled data and can be used as a cost
effective option to handle the situation where there is
not enough labeled data. And the last one was about
reinforcement learning, which is based on
a trial-and-error learning method and
can be used to handle very complex scenarios where the box needs to interact
with the environment. The choice of method
or a combination of methods depends on the data type and the desired objective. There is an increasing
number of models that utilize a combination
of those learning methods. They can use supervised
learning together with unsupervised learning and even use reinforcement learning. And this is absolutely the
direction of many use cases. That's it for this section. As a soft introduction
to machine learning, you're more than welcome to test your understanding with a
quiz following this lecture. In our next section, we are planning to talk about the secrets of generative
AI. See you next.
17. S04L01 Introduction: Hi, and welcome. Thanks
for watching so far. Did you ever see a live
magic show on stage? I guess you managed to see
a couple of such shows. It's a great experience. You are watching every step that the magician is taking and saying to yourself that
you're going to get it. You're going to reveal how he or she is doing
that magic show. But unfortunately,
in most cases, the magician is doing
that performance so well that you are
just surprised, amazed with a big
smile on your face. That's the job of
a great magician. Going back to our topic, the capabilities of generative AI to understand human
language and to create content based on
text seems almost magical. Like our magician. It's quite amazing that we
reached a point where those systems can
understand complex text as input and generate
many types of content. It's a big step forward
in the AI landscape. There is a tremendous
growing list of tasks that can now be handled
by Gen AI-based systems. Maybe ten or 20 years from now, it will be so embedded in our daily lives that it will
not be so amazing anymore. We get used to it
like any technology. But how do those generative AI
systems do their job? What is the secret
engine running inside? It is a good and
important question, even if most of us will
not build those systems, we are going to use them
and for using them wisely, it will be useful to
better understand the key principles of
those technologies. That's the main objective
of this section. We will uncover the
secrets related to GAI. You don't need any
background in math, computing science
or programming. Just bring a nice coffee or tea, and let's start our journey. See you in the next lecture.
18. S04L02 Artificial Neural Networks: The first building block of Gen AI is artificial neural networks created using deep learning. Artificial neural networks
or in short, ANN, are computer based
models inspired by the biological neural
networks found in the human brain. Don't worry, we are not going
to make it too complex. Let's look at the
high level structure of an artificial neural network. We can see three layers, input, hidden and output. It's a very simple illustration. On the left side,
the input layer receives the input data, which can be numbers, images, text, or any other
form of information. The input data is divided
into features like X one, X two, X three, et cetera. A simple example,
if the task is to predict the price of an
apartment as we saw before, then the input can be a collection of features
like the overall size, number of rooms,
location, and more. If the task is to classify
an object in a picture, then the input will
be a collection of pixels of that image. Then we have the
hidden layers inside. These layers process
the input data and extract relevant
sub features. They can be based
on multiple layers, allowing the network to
process more complex patterns. That's why it's
called deep learning. The depth is correlated with a number of
those hidden layers. The last layer is
the output layer that produces the final output, which can be a classification, prediction or
generation of new data. We can also see many lines of connection between nodes
inside those layers. Each connection
between nodes has a certain weight that is adjusted during training to optimize the performance
of the network. For example, if the overall
size of an apartment is a critical factor in predicting
the price as output, then it will have a
stronger weight number. The algorithm used to train that model adjusted the
size of each connection. If the input feature
is important, then it will have a bigger
and stronger connection with greater influence
on the output. More important features
will be translated into stronger signals
propagating inside the network with a greater
influence on the output. That's a high level definition
and illustration of an artificial neural network as part of the evolution
of deep planning, multiple internal
architecture or deep planning architecture
were developed to handle different types of data and different
types of tasks. That's the topic of
the next lecture.
19. S04L03 Deep Learning Architectures: Part of the introduction
to machine learning, we talked about the concept
of training a model. The model represents
the knowledge of the machine learning box. When the patterns inside
the data are very complex, the typical method will
be to use deep learning, so the trained model
will be based on multiple layers inside
the neural network. Those layers will be able to
catch more complex patterns. That's the main concept
of deep learning. As part of the evolution of training a model
using deep learning, three main types of
architecture were developed. Let's quickly present
each one of them and what is the important
relation to generative AI? Using the first method, meaning recurrent neural networks, the machine learning system is processing the input
data sequentially, meaning one data
element at a time, each processed element is adding some tiny knowledge and changing the internal state
of the train model. It is slowly capturing connections and patterns
between data elements. As you can imagine, it will
take a lot of time to process all data elements because we are doing it
sequentially one by one. It's not a good solution
for huge amount of data. Still, it was one
of the main methods to train models with the planning for all kinds of tasks related to
processing languages. The next type of neural
network architecture that evolved is called
convolutional neural networks. It is specifically designed for image and video processing task like image classification, object detection, image
segmentation, and so on. In around 2017, 2018, a team at Google developed a new architecture to
process input data. It is called the
Transformer architecture, which was a key breakthrough
in the planning. It is now the de facto standard for training many models
with the planning. Using this architecture,
an ML system can process input data in
parallel instead of processing data
sequentially like in gene. The parallel processing is done during the training phase, as well as later on when the train model is ready and
being used in production. This approach is more efficient helping to speed up
the training phase. By the way, this training
phase is based on a famous hardware
component called GPU. GPU stand for graphics
processing unit, and for many years,
they have been a great solution for
running amazing graphics. Every computer is
equipped with a GPU chip. Another key use case
that evolved in the market is for training
deep learning models. Those GPUs are the
power host for parallel processing using the
transformer architecture. All Cloud providers like Amazon
AWS and Google Cloud are buying those GPUs to provide the cloud infrastructure for
training models at scale. Parallel processing is
the first key advantage when using the
transformer architecture. Let's talk about the
second key advantage. Do you remember using markers in your notebook while
analyzing some text? I use that a lot during
my engineering studies. My strong hand is my left hand, and I was always slower than the rest of the
class when writing. Writing fast and clear
is not my strong side. At some point in time,
during my studies, I decided to change my
learning strategy and listen without writing
anything, even a single word. At the end of the class
after a couple of lectures, I just use the copy machine
to copy from someone else with much better
handwriting capabilities. As you can imagine, my
best tools were markers. I used the sophisticated at color method for
marking keywords. Many people are using
that simple method because it helps to emphasize in a visual way which words or group of words in the
sentence are more important. Marking key terms to remember. Your mind can focus on those key terms and map the
connection to other words. The transformer
architecture uses a similar approach when
analyzing input data to help the system better
understand which data elements as part of the input are more
important than others. As a big surprise, it is called
the attention mechanism, helping the system to pay more attention to
specific elements as part of the data stream, like a marker putting more emphasis on a
specific keyword or a pair of words while using the red color and another
marker with a green color. Let's take a simple example. If I will provide the
following sentence to a GNAI system, how to create a Python code that can calculate the
sum of two numbers. The AI system will break down the sentence into
individual words. This process is called Tchanization, creating
small tokens. We'll talk about tokens
in the next elections. Then it calculates
something that is called attention scores for
each pair of words, trying to determine
how much attention should be paid to one word
when processing another word. This helps the AI system to
better understand context. For example, the atension
mechanism might determine that calculate as
a word is closely related to sum because
calculating a sum is a co action. Or PyTon will have a
high attention to code. Okay? Since PyTon described the language used to
write the code and so on. So as a quick summary, using the transformer
architecture, the parallel processing, coupled with the
attention mechanism enables GNAI system
to digest more data, process it faster and catch
more complex patterns. It is the backbone
of generative AI. Now, why it is important? Well, the power of
generative AI is based on the capabilities to digest and understand the
human language. In machine learning terminology, it is called natural
language processing. Before using the
transformer architecture, it was very difficult
to generate a good machine learning model that could handle
the human language. Human language is
highly complex. It is unstructured data. The text is context. One word inside a sentence
or a sentence inside a paragraph influence the meaning of the next word
or the next sentence. This architecture
is helping to train very complex models that can
handle the human language. They are called LLMs. Large language models. Those models are now the foundation of
many Genea use cases. Now, as you can imagine, training an LLM model requires huge amount of data and huge amount of
computing resources. Who can build and train
those large language models. So to the concept of
foundation models, our topic in the next lecture.
20. S04L04 Foundation Models: Hi and welcome back. In
the previous lecture, we talked about the evolution
of deep learning methods. With the introduction of the
transformer architecture, before using the
transformer architecture, most of the models
were more simple, specific purpose AI solutions, meaning the train model was great for handling
a specific task. This architecture created
the required environment to train models with much more data and make
them more generic. It was a huge step
moving forward. That's the concept of
foundation models. Foundation models are large scale more generic
models trained on massive amounts of
data and can be adapted to perform a
wide range of tasks. Therefore, they provide
a strong foundation that can be adapted to a
large number of use cases. It's a great building block. Think about a library of
thousands of different signs, books, or historical books. Each book is used as an input to train a
specific foundation model. The result will be a
foundation model that has knowledge on a variety of
topics, millions of topics. Training a foundation model is an expensive resource
intensive project. We need the ability to collect, store, and process
huge amounts of data. We need the hardware and
software infrastructure and a team with the
relevant skill set. That's where the big players
can leverage their power. Big players like Google, Microsoft, Amazon, and others. They can train large
foundation models using huge amounts of data. One of the most popular
foundation model is GPT, running inside the
famous HAGPT service introduced by Open AI. GPT stands for generative
pre trained transformer. Let's break down those words because we already
cover those terms. The first one is generative, it means that the model can generate content
based on the input. This is the main task of that
model, generating content. Next word is pre trained. This model was trained on a large amount of data
from diverse sources, such as website,
books and articles. It is a foundation model. And the last one transformer. That's the internal
architecture of the model, which is becoming the
de facto architecture for creating foundation models. ChaGPT is a great
example of using a foundation model designed
to be accessed by anyone. User can directly interact with the model using a
simple text as a pump, ask a question and
get an answer. Given this satility of
a foundation model, smaller players like
medium sized companies or startups can leverage those foundation models developed and provided
by the big players. Instead of investing
millions of dollars in training such
models from scratch, they can adapt an existing
foundation model for a fraction of that
amount and introduce new AI based products
more quickly. As I mentioned, a foundation
model is a building block. Now there are many types
of foundation models. So are focused on handling a
natural language processing. Some of them are focused on computer vision tasks like
image and video generation, speech recognition and more. One of the co
foundation model types is for natural
language processing. They are called LLMs, and that's a topic
of the next lecture.
21. S04L05 Large Language Models (LLMs): Hi and welcome. In
the previous lecture, we talked about
foundation models. A foundation model
is a model that was trained with a large amount of data coming from
different data sources. It can be used as the
building block of foundation for other
more tailored models. One of the most popular types of foundation models is the
large language model. Large language models are
the core capabilities of generative AI to handle
text as input and output. That's one of the main engines running inside in
any GAI solution. They are widely known for their amazing
ability to analyze, understand, and generate high quality written
text as a response. Using those models,
machines can understand, and respond in a
native human language. And they are getting
better and better. The introduction of
LLMs is a huge step in the AI industry with tremendous potential to
impact almost any domain. In the previous lecture, I mentioned the famous HGPT in the context of a
foundation model. CHEPT can serve as a base for many types of
application and tasks. The type of model that
HAGPT is based on is LLM. Another important
aspect to mention is that not all LMS
are created equal. The word large can be misleading and we need to
explore that a little bit. LLM is a generic term for
training large language models. But what is the
meaning of large? How one LLM is larger
than another LLM. One way to estimate the power of an LM model is to look
at the model size. A model size typically
refers to the number of parameters in the model. It's like the number
of brain cells. Parameters are the small
elements or variables adjusted during training so the model can process input
data and generate output. That's the knowledge
of the new model. More parameters are directly correlated to more capacity
for stowing knowledge. The number of parameters in a typical LM is measured
today in billions, like 1 billion, 10 billion, 100 billion, and more. Today, a model size
of around 100,200 billion is considered to be a large model with
great capabilities. The model size is not always information
that is published by the company that train that model because those
players are competing. And they may decide not to
share that information. This is something to consider. In addition, we need to keep in mind that this industry is
in a constant evolution. 100 billion of parameters today may seem small five
years from now when the models
will be measured, I don't know, maybe with 100
trillions of parameters. Those numbers are not
written in stone. They will eventually change and the benchmark to be considered large is going to be higher. Let's talk about
the famous TAGPT that is based on GPT model. GPT has evolving version, one, two, three,
four, et cetera. As you can imagine,
each version is bigger. They are using more data
to train the model. There are more parameters
in that model. However, it's not a free meal. A bigger, more complex
model will require more computing resources
to train and deploy. The complexity of the model
will influence many factors like the required data set and volumes to
train the model, the number of computing
resources that will impact the cost of cloud resources that's
running that model. Also the skill set
to keep it running. Secondly, a bigger model isn't always better for
a specific use case. As an example, if I
would like to build or use an off the shelf
model that can recommend the best restaurant at a given location
based on an input data. The user can write some text about the
required restaurant. It will provide some
recommendation. In that case, I don't need a model that was trained on
the complete human history. That knowledge is nice but less relevant to
that specific task. I need a smaller, more
cost effective model that is more tuned to
handle that specific task. As I mentioned before, training foundation models is
the job of big players. They have the
resources to create such models with
billions of parameters. Other market players
will focus on what kind of tasks they
would like to automate in their workflow using
AI and then check which models in the market are cost effective
for those tasks. In the next lecture, I
would like to talk about the main categories of
LLMs. See you next.
22. S04L06 Model Types: I welcome. There are many
types of foundation models, and if I will be more specific, there are many types of LLMs. And it will be
useful to categorize them based on
several dimensions. The first dimension is general purpose LLMs versus
domain specific LLMs. As the name suggests, general purpose LMS handle
a wide range of tasks. They will be trained by
taking massive amount of data from the Internet
and other data sources. The outcome will be a
large language model that can handle a
variety of topics. Chachi PT and Google Gemini
are general purpose LLMs. The second one is
domain specific LLMs, also called specialized LLMs. Those LLMs are trained to handle tasks related to a specific
domain like finance, legal, cybersecurity,
gaming, medical, and countless other
type of domains. All the data for training a domain specific LLM will
be related to that domain. Those LLMs can be
further fine tuned to handle more a niche area
of a specific domain. I assume we will see more and more companies that are training domain specific LLMs. The next important dimension is open source versus
closed source LLMs. Open source models are
models that are available to the public and can be used
without any commercial cost. It's like an open source code. Anyone can download a model, change and customize it. Like any open source project, it is not always
tuned for production, but it can be used as a
development framework, a starting point for proof of
concept project, and more. One big advantage is that we will have full control
over the model. We can inspect the code and
underlying architecture, customize it as needed, and most importantly, there
is no license to use it. Another advantage is
related to data privacy. In some cases, an
organization cannot upload sensitive data to a
third party service so by using an
open source model, which is used internally, sensitive data will be used with a low risk of data leakage. On the other end, closed
source cell lens are proprieties models owned by companies and are not available to the
public as source code. They are available
to the public as web based services
or using APIs. It is a model which is
encapsulated in a box. All we need is to provide the input and then
get the output. The company that developed
a closed source LLMs will provide some level
of free access with limited capabilities, and on top of that, more premium options based
on a pricing model. They are monetizing
the trained LLMs. CAGPT and Google
Jiminy are examples of services based on a
closed source LLM. A closed so CLLM will be more optimized for
production because someone else is investing resources to ensure it
is working as expected. Secondly, it will be
much faster to deploy. This approach simplifies
the integration of AI into many applications. With a few lines of code, developers can integrate
advanced AI capabilities into their application and
without dealing with the whole concept of training models and
keep them updated. Now we are reaching the
interesting question about LLM. How does it work? What are the magical elements
inside that LLM model? Well, you may be
surprised that it's not as complicated
as you may think. See you in the next lecture.
23. S04L07 Prompt and Tokens: Hi, and welcome back. We
covered in a high level the concept of LLMs,
large language models. The main capability of a large language
model is analyzing, as well as generating a text. In addition, we divided LLMs
into several categories like general purpose
versus domain purpose and open source
versus closed source. In this lecture, I would
like to take a step back and talk about
the input of an LLM, meaning prompts, and how a prompt is bloken
into little chunks. It's another building
block while we keep revealing the
secrets of generative AI. Let's start with the questions. What is a prompt? A prompt is a piece of text that is given to
the LLM as input. It can be used to control the output of the model
in a variety of ways. The output of the model is known as completion or response. When we provide a
prompt to the model, it generates a response
based on that prompt. The prompt is a group of words,
sentences and paragraphs. Let's consider the English
language as an example. It has more than
1 million words, including different
forms of the same word, technical terms, slang,
compound words and more. It is a complex language that
is evolving all the time. A typical natural language will have huge number
of possible words, creating a complex combination
of words and sentences. As a result, the complex
text format is not the most efficient
way to process and store data for a machine
learning system. It should be simplified somehow. The solution for that challenge is to convert parts of words, complete words,
and combination of words into numbers,
numerical data. Those numbers are called tokens. In that case, the
model handles tokens which are numbers instead
of dealing with words. So what is a token? A token is a fundamental
term in generative VI. Tokens are essentially
numerical representations of characters, words or phrases. Tokens refers to
units of text that the model process tokens can be a single character
like the letter B, or a complete word
like a flower, or a combination of
words like ice cream. By representing words as
numerical numbers, using tokens, the model can perform operations on them more
quickly and efficiently. The set of full tokens used by the model is called
the vocabulary of the model and the process of splitting text into token
is called tokenization. The component that is performing that process is
called a tokenizer. In essence, an LLM is getting a sequence of tokens
created by a tokenizer, breaking the input
text to tokens. Let's see that in action. I will Open AI
tokenizer website. I hope they will keep
the link available to the public so you can
play with that as well. This tool can be used to see how a specific model will break
any prompt into tokens. I will paste the show
text as a prompt inside and see how it is
converted into tokens. This counter shows the
total amount of tokens. Each token is coloured
with a different color, so we'll be able to see the flow of identified
tokens in the text. Just keep in mind that
the list of tokens for a specific given pump text may be divided differently
depending on the model. More advanced models may
use a different tokenizer. You can say that tokens are part of an internal process
inside the AI system, so why should we
care about them? That's a good
question. Let's answer that one in the next lecture.
24. S04L08 Total Tokens and Context Window: Hi and welcome. We just
talked about the concept of breaking and translating text
input prompts into tokens. Tokens are numerical
representation of characters, words or phrases. It is an internal process
inside a generative AI system, and the question is
why should we care about it? Let's
tackle that question. Tokens are a fundamental metric for measuring usage
in GAI system. The total number of
tokens for a given input, plus the total number of tokens generated as part
of the output is a measurement used
by companies to track and limit the usage
of generative AI services. As an example, assuming
a company developed an application that is powered by a third party GNAI service, in that case, they will optimize the input prompts
to minimize cost. Usage is eventually
translated into cost and cost optimization
is important. Another key issue is related
to the model limitations. Different models have
different token limits, which can affect the length and complexity of the
prompts we can use. Let's assume that you are discussing with your
friends about some issue. It's a long ping pong session. You are saying something
and based on that, your friend is saying something. Both of you are
trying to consider the context of the
discussion to keep the flow. But if it's a very long session, you or your friend
will have some trouble remembering all the things each and every one of
you mentioned over, I don't know, the last 3 hours. We have limited capacity in
our short term memory, right? Going back to GAI systems, it is called the context window. A context window refers to the maximum number of
tokens that a model can process and consider at once as a group when
generating a new token. This window is a crucial factor related to the model
ability to understand and respond to complex prompts or generate long term as output. A larger context window
enables the model to generate more sophisticated and contextually
relevant outputs. Imagine that you are feeding a GEI system that has a ten K, a context window and you are feeding an article which
is broken into 15 tokens. In that case, the
GAI will not be able to digest and
process the full article, leading to unexpected
behavior of the model. This model can consider looking back up to those ten K tokens. To make an effective use
of the context window, it is important to manage how text is presented
to the model. For example, long
documents may need to be chunked to fit within
the context window. In our example, I can
break the article, which is 15 tokens to
several chapters where each chapter is not larger than the size
of the context window. Let's take another example. If we are building applications
that interact with generative EI systems
through APIs, understanding tokens can help us better design APIs
more effectively, like setting appropriate
limits on token usage, providing feedback on
token consumption, break queries into
smaller prompts before we are using those into
the actual GAI system, and overall, we can use that to optimize the API
performance on our side. As a quick summary, by
understanding tokens, we can better utilize
a specific GEI system, optimize our usage, and get the most out of
those powerful tools.
25. S04L09 Next Token Please!: Another important aspect of tokens is related to the
output of the model. We may assume that an LLM can think and understand
the meaning of words, but this is not really the case. LLMs may seem very
sophisticated, but on a practical level, those are machines that just see a pattern and easy
to predict the next token. Let's explain that concept. An LLM model is operating
in a sequential mode, each time it will predict
one single token as output. Then this predicted
token will be used as input to generate the next
token in a sentence and so on. It looks like a sentence, but from the model perspective, it is just a sequential list of related tokens that
are still under the range limit of the
model context window. Using this approach, LLMs
can continue to predict a sequence of words that
looks like a complete answer. That's the magic behind
this technology. An LLM model is a sophisticated machines to generate a sequence of tokens. Another interesting
question is about the way a token is generated as
output by the model. Well, based on statistics, and to be more precise, it is based on
probability distribution. Let's open that concept
at a very high level. LLMs are trained on
massive datasets, so they have tune parameters on the statistical
relationship between tokens. They learned which tokens have stronger statistical
correlation to other tokens. By using this information, an LLM can make that prediction with some level of
statistical confidence. The model process
the sequence of tokens as part of the input and calculates a probability
distribution for the next possible token. Let's take a simple example. Assuming the input tokens
are the following. The cat set on and
a missing token. This is the predicted
next token. The model will take
the provided tokens like cat and set on, and it will internally
generate a list of possible next tokens with a
probability distribution, like the next token
can be fence, sofa, roof, et cetera. Then it will select the
best matching next token, maybe so far with the
highest probability. The model will predict
the next token based on calculating which list of tokens have a stronger
statistical correlation. To make it less deterministic, it will not always select the token with
the highest probability. It may add some level
of randomness to make sure we do not get the same
output for the same input. If you use the solution like
Cha GPT or Google Gemini, you may notice that
for the same question, you may get slightly
different answers while running the same
prompt several times. Now, what is the logic to make the model less deterministic? Well, it is useful
to add flavor of creativity and to allow
for more diverse outputs, simulating some level
of creative thinking. That's about how those LLMs are generating text as output. It is a sequential process to generate the next token
in some feedback loop. However, I haven't mentioned yet how those LLMs are trained. That's the topic of
the next lecture.
26. S04L10 Self Supervised Learning: Hi, and welcome. We covered
many topics related to the underlying
technologies of generating I and specifically
Zoomed on LLMs. We talked about the prompt input and how it is
divided into tokens, considering limitation
like the context window. We also mentioned that
after the training phase, an LLM holds the patterns represented as statistical relationship
between characters, between words and
between sentences. It's basically using tokens. Using those patterns, the model can predict
what comes next. What is the next token
or word in a sentence? Then it will use the
previous predicted word to predict the next word, one by one
sequentially to create more complex structure like
sentences and paragraph. It is a sequential process. In this lecture, I
would like to uncover another small secret
about generative AI. I mentioned that LLMs
are created or better say they are trained using massive amounts of
unstructured data. The question I would like to
tackle is how that possible? What is the method being used? As you can see from the
title of this lecture, it is based on a process known as self supervised learning. Let's explain that concept. In supervised learning, the training data
is labeled data, meaning each data point
example as input and output. The output is the
label data and someone should provide those
examples with labeled data. In many cases,
providing label data is a costly process
with many limitation. It's not always possible to get enough amount
of labeled data. It is becoming even
harder when talking about training models
for handling languages. Text is unstructured data. Self supervised learning is an interesting method in machine learning
where a model is trained using the data itself in a supervised manner without
using external labels. The idea is quite simple. Let's assume we are
feeding a model, a single page from a book. This page is based on multiple paragraphs
and each paragraph has multiple sentences, right? Each sentence from that page can be used for
training that model. How is the model achieving
that capability well, based on several methods, one of them is called
masked language modeling. It involves the following step. Step number one is
called masking. It will take a sentence and
then in a random way replace some words in the input text with special tokens called mask. This creates a masked sentence. Step number two, it's
called prediction. The next step will be to try
predicting that masked part. The model is trained to predict the missing words based on
the context provided by the surrounding words
in that sentence and then compare the prediction
to the actual masked part. Step number three is
basically learning by repeatedly performing these
poses on a large dataset, the model will learn to understand the
relationship between words, their meanings, and how
they are used in a context. Let's take a simple example. The input sentence will be, I love eating cheese pizza. The model can take that complete sentence
and mask a word inside, like masking the topping. The task will be to
predict the missing word. Possible prediction
can be pepperoni, cheese, margarita, and so on. So how this is
working end to end. The model analyze the
surrounding words. I love eating and
at the end pizza. Based on the context, the model predicts that the missing word is likely
a type of pizza topping. The model might consider
the frequency of different pizza topping
in its training data, as well as in the context
provided by the word eating. Then the model will assign probabilities to different
possible topping, such as paparoni cheese, margarita, and so on, then
select the most likely topping based on its
probability distribution. Finally, it will calculate
the error level between the actual masked error
and the predicted value. So this error will
be considered as a feedback loop to keep optimizing the internal
parameters of the model. If I will expose 1,000
similar sentences, and 70% will be with the
pizza topping cheese, 20% with margarita and
around 10% paperoni, that's going to influence the probability distribution
stored in the model. Now, this is just a
single sentence, right? Imagine if we trained an LLM model with
millions of sentences, billions or trillions of words by repeatedly going
through these steps, the model learns to predict missing word with
increasing accuracy. All that is done without
human supervision. It's a self supervised
learning, completely automated. As a result, it has
massive scalability. The model can leverage
vast amounts of unstructured data
and unlabeled data that is available everywhere. This process helps
the model develop a deep understanding
of the language, including relationship
between words, their meanings, and how
they are used in a context. In that case, we will get a large model that will be good at generating
a text response. That's how an LLM
model is becoming so good while generating
content based on text input. This pre training step based on self supervised learning
and other methods is super important for
creating foundation models. But it is not the
end of the story. Those models can be tuned
when generating responses, and that's the topic of the
next lecture. See you again.
27. S04L11 Improving and Adapting LLMs: Hi, and welcome back. In
the previous lectures, we covered many topics
related to LLMs. A LLM is a foundation model that is pretrained on a
large amount of data. In this lecture, I would
like to talk about the options to improve and adapt an LLM so it can be tailored and used as a building block for
more specific use cases. Let's review them one by one. The first one is called
contextual prompting. It is a method as part
of pumped engineering. Pumped engineering
is the process of designing and
crafting prompts to effectively
communicate and guide large language models to
generate desired outputs. The idea is quite simple. When we ask the AI
system to do something, we should articulate
as clearly as possible the required task
and in some cases provide context, any
required background. That's the easiest and most cost effective way to tune the responses we
get from the model. In the next section, we'll see how to use contextual prompting. Let's move to the next option. The second method to
tailor a model to a specific requirements
is called retrieval, argumented generation
or in short Rag. It combines the power of pre trained large
language models with the ability to retrieve additional relevant
information from external knowledge sources like external databases or documents. Why this method is useful and important this method addresses some common problems associated with using public GAI system. The first issue is
related to private data. Many companies are holding private data that
cannot be exposed to the public and
cannot be used by other companies to train
foundation models. There is a gap between what a generic public
LLM was trained on and specific useful private
data owned by a company. The second issue
is about limited knowledge about facts and events that took place after
the model was trained. This method is based on several steps as
part of the process. Get and analyze
the input prompt, based on the requirement, extract useful
relevant internal data from the organization databases
or any other knowledge source and then feed
the original prompt together with that
extra information to a pre trained model. It's like an enrichment process. For example, a company
that has a support chatbot can use the Rug method to enhance the pumped with information about
products and services. When a visitor is
asking something about a product by submitting a
query in a chat session, the chatbot will search internal databases and
internal documents for information that
will be useful as extra knowledge to the
Backend LLL model. It will take the visitor
query, original query, add extra information like
maybe the product user manual, which is a private data and set it via API to the LLL model. The LLM will be able
to generate content based on the original query
coupled with the extra data. In our case, the
LLL model may find the answer inside the
product user manual using this extra information. I would like to emphasize that this method is not
changing the LLM. It simply provides additional
context and information to help the LLM generate more accurate and
informative responses. By using this method,
companies can leverage internal private data as additional information
to enhance and tailor the
generated content. Secondly, it can be used
to bring the model up to date with recent event or
any domain specific content. One of the significant
drawbacks of RAG is the limited size of the context window in
most language models. This means that the model
can only process and understand a certain
amount of text at a time. So the system which is searching
the extra data must be careful with the
amount of data it is pushing to the LLM as
an extra information. Another issue is
related to latency. Retrieving information from external sources can
introduce latency, delay into the overall end
to end generation process, potentially slowing
down the response time. Lastly, is the cost of using a large number of tokens as
part of the input prompt. We need to submit a
substantial amount of extra data for each query. It may not be cost effective
for use cases that require a large amount of extra data in a higher frequency
of request. Let's talk about the next option to consider. Fine tuning. Did you notice that
the training of a large foundation model like
LLM is called pre training? It is called that
way because it is common to have a two
step training process, pre training and fine tuning. The first step
called pre training, is to train a foundation model with a massive amount of data. This step is done
by the owner of the model or the company
operating that service. Then as step number two, other companies or
individuals can take that pre trained
foundation model and retrain it with
specific data. The result will be a new
fine tuned model that is more optimized to handle specific task or several tasks. It is called transfer learning, training a smaller
task specific model on top of the pre
trained propriety model. It's important to emphasize
that fine tuning is based on small amount of data
compared to the data used for pre training
the foundation model. Therefore, it's a very cost effective option in some cases. Why not retraining a
model from scratch? Well, I mentioned that this is the job for the big players, in most cases, it's a more
cost effective option. Fin tuning is not limited
to open source models. There are many models
providers which offer dedicated APIs that allow developers to interact
with their models. These APIs are used
to feed the model with custom data
for fine tuning. The result will be
a new sub model with updated parameters. There are a couple of benefits when using fine
tuning compared to pumped engineering with
context or using Rag. First is the latency. Unlike RAG, there is no process time for each query to find the
relevant enrichment data. Secondly, the prompt size will be much smaller because
we don't need to provide extra data for every prompt,
reducing token usage. On the other end, fine tuning is an advanced method that must be done carefully to
get the right solution, and it will require some level of expertise and
knowledge in that field. Otherwise, the output will be less useful than
the original LM. That's about the
three main options to tailor and existing LLMs. Thanks for watching so far. Let's summarize the
complete section.
28. S04L12 Summary: Hi, and welcome back.
Thanks for watching so far. I think we covered a lot
of topics in this section. It was quite a
comprehensive overview of many key terms
in generative AI. I hope you feel
that we managed to uncover many secrets
related to this technology. Now, I would like to
quickly summarize this section and create a
connection between the topics. Generative AI is a type of artificial intelligence
that focus on creating new content. It added the important
capability to analyze text as a language and use it as an input to generate
different types of content. And it's not limited
just for a text. It can be used for generating
other types of content. We started by talking about
the main building blocks of any GEI solution, the artificial neural networks created using deep learning. That's the internal
structure to hold the brain and knowledge
of the model. Next, we talked about the
evolution of a couple of deep learning architectures used to train those neural networks. The latest and
greatest development is the transformer architecture. This architecture created
the environment to train and handle large models and
make them more generic. Those models are called
foundation models. They are large scale
generic models trained on massive amounts of
data and can be adapted to perform a
wide range of tasks. Therefore, they provide
a strong foundation as a building block
for many use cases. One of the most popular types of foundation models is the
LLM, large language model. LLMs are the core
capabilities of generative AI to handle
text as input and output, they are used to
analyze, understand, and generate high quality
written text as a response. Now there are many
types of LLMs. We mentioned categories like general purpose LLMs versus
domain specific LLMs, general purpose LLMs handle
a wide range of tasks as they are trained by taking massive amounts
of public data. On the other end, domain specific LLMs are trained to handle tasks related
to a specific domain, like finance, legal, et cetera. All the data used for training a domain specific LMS will
be related to that domain. Another important dimension is open source versus
closed source LLMs. Open source models are models that are
available to the public. Anyone can download the model, change and customize it. On the other end, source
LLMs are proprietary models owned by companies and are not available to the
public as source code. They are available to the public as web based
services or via API. Those are closed boxes. Then we covered
several key terms that are important when
using GAI models. The input size is called
the prompt as text, and the output is called
a completion or response. When we provide the
prompt to the model, it generates a response
based on that prompt. Next, we talked about
the concept of tokens. Tokens are numbers
that are generated by a component called tokenizer, which is used to map the text
into a numerical format. It is more efficient way
to process data by models. The number of used tokens is measured as a metric for
service consumption. In addition, each model
will have a limitation on the maximum number
of tokens that can be handled as a group
under the same context. It is called the context window. Using this knowledge,
we managed to reveal how those LLMs are generating
a complex text response. It is all about predicting
the next token in a sequence. The sequence of
generated tokens creates more complex patterns like words, sentences and paragraphs. Each predicted token
is selected by looking at the
statistical distribution of possible prediction. How those LLMs digest massive amount of
unstructured data as part of the pre
training phase? It is based on a sophisticated self supervised learning method. Using this method, a model
is trained using the data itself in a supervised manner without using external labels. For example, it is done by automatically
taking sentences, masking specific
words, and trying to guess the right answer and
learn from that experience. The last topic was about
the three main options to adapt and tailor
an existing LLM. The first one is called
contextual pumpting. It is a method, the spout of pumped engineering where
we provide context, a spout of the pump to better guide the model to
generate the output. The second method to
tailor a model to specific requirements is called retrieval argumented generation. Or in short Rag, the concept is to extract useful relevant
internal data from external databases and fit in the spout of a prompt
to a pre trained model. By using this method, companies leverage
internal private data as additional information to enhance and tailor the
generated content. And the third method is
fine tuning where we take a pre trained model and
retrain it with new data. It is a practical
method to shape an existing LLMs according
to specific requirement, optimize to specific
domains and use cases, the output will be
a fine tuned model. That's a quick summary of the topics we covered
in this section. Please use the quiz to test your understanding and feel
free to share questions. As you can imagine,
generative AI is not a perfect technology, and it has some
limitations and challenges that we must consider while leveraging this
evolving technology. That's the main topic in the
next section. See you again.
29. S05L01 Introduction: Hi, and welcome back.
Thanks for watching so far. I hope you enjoy the training. AI and specifically
generative AI is an exciting
innovative technology. The market is gaining momentum, and we can see more and more
companies and individuals trying to leverage that
technology in many use cases. That's a direction
for the upcoming e, and it's a great opportunity for anyone to join this
huge market wave. Now, I don't want to lower your excitement
and expectations. Nevertheless, we must be fully aware that NAI is not perfect. It is a new technology with
limited market experience, and like any new technology, it has multiple challenges
and limitations. Those limitations can create substantial risks in many
practical market applications. As a simple example,
the output generated by a GAI model may seem smart, sophisticated, and
very convincing, but sometimes it is
just full of mistakes. Imagine that a company from the finance industry implemented generative AI in their
support channel, and it is generating very nice answers,
but with mistakes. That's a major
issue to consider. Therefore, it is essential for individuals as well as
for companies to be aware of those
limitations and act with more responsibility when
using those technologies. That's the objective
of this section. Let's start our
exploration. See you next.
30. S05L02 Prompt Sensitivity: Did you ever try to record
yourself with a microphone? When tuning the sound amplifier
for high sensitivity, it will amplify
even a small sound. We need to carefully balance the sensitivity to avoid picking up all kinds of
background noises. Secondly, we need to record in a quiet environment to
avoid getting echos. The input we feed, the
microphone is important. Any content creator knows
about those challenges. In GNAI the prompt is the
main input to the model, like the microphone when
recording our voice. That input is used to understand the requirements
and the overall context. Therefore, those models are highly sensitive to the
prompts they receive, like an amplifier tuned
with high sensitivity. There is a famous sentence
related to processing data. I imagine that you
heard about it. Garbage in, garbage out. In our context, the quality of the prompt has a major
influence on the output. If we fill it with garbage,
meaning low quality, less organized input, we
can expect to get garbage, meaning low quality output. It is as simple as that.
The bottom line means that the model was not able to fully understand
our requirements. That's the first challenge
when using generative AI. We must pay attention
to the quality of the input to maximize the
quality of the output. Silo to the concept of
pumped engineering. We mentioned that in
several lectures, pumped engineering is the
practice of crafting and optimizing prompts
to effectively interact and guide
generative AI models. It's something that
when using GEI system, we are learning how to apply all kinds of things to make
our pumps much better. This involves crafting
the right was a context, instruction to the prompt to achieve specific
type of outputs. As part of the unleashed the
power of generative section, we will talk about a couple of useful tips related to
prompt engineering.
31. S05L03 Knowledge Cutoff: Our next challenge of a GII model is called
knowledge cutoff. As the name implies, many models are trained
on data that was available up to a
certain point in time. This is the date after which the training
data used to develop the model does not include new information. It makes sense. We collect the required
training data, train a model, and that's it. Any new information
will not be part of the model that we
already trained. Any data produced after the training phase is not
part of the model knowledge. If I train the model
on date up to 2024, any data created later
on will not be included. As you can assume, this knowledge cutoff will limit the model's ability to provide
up to date information, up to date answers. As a simple example, a developer may ask a GEI
system to generate a piece of code to solve a specific task in a particular
programming language. It will generate that code
based on the trained model. However, if a new version of that programming language was released after the
model was trained, the code may be less accurate while using things that are not relevant anymore. For use cases that are based
on up to date knowledge, this limitation can be a
significant issue to consider. Think about the
financing industry where up to date information is critical for making
the right decisions. So what are the options to
handle a knowledge cutoff? There are two main strategies
to handle knowledge cutoff. The first one is to
update and retrain the model in repeated intervals using the most up to date data. Companies that retrain a
large foundation models will just release a new version
like Chachi PT version one, two, three, four, et cetera. Still, even by
using this option, there will be always
a knowledge cutoff between the updated intervals. If I update the model
every six months, still under that six months, it will have missing information until it will be retrained. For some application,
it may be good enough as a way to
mitigate that issue, but in other cases, it
will not be good enough. A company that decides to
train or tune its customer that has the flexibility to optimize the time
interval between updates. On the other end, if we are
using a third party solution, it is vital to be aware of the cut off date and what
are the updates intervals. The second option to handle knowledge cutoff is to
connect the model with external online tools like search engines and database that will close
the knowledge gap. With the most recent events
and up to date data. As you can imagine, such
GAI system are more complex to maintain and
will cost more resources. The big players like Google and Microsoft are already
using this approach because more and more
users are starting to use GNAI as the main door
to search online. It's an interesting trend. I assume that somewhere
in the future, many models will have a very short update frequency and they will be connected
to online tools. It is part of the
market competition to deliver better models. It means that the challenge of knowledge cutoff will be slowly mitigated
by the industry. Let's move to the next
challenge. See you next.
32. S05L04 It is not Deterministic: As part of the introduction
section about GenEI, we talked about the
concept of making the answers of a GEI model
a bit more creative. Less deterministic, trying
to simulate human thinking. It's like adding some
spice to the answer. It means that the
same prompt can produce different
responses depending on the model's internal state, and level of randomness configured in that
specific model. We may assume that it is a
limitation, but it's not. This less deterministic output is achieved by design,
not by mistake. Those systems use statistical
models to calculate probabilities for a range of possible next
word of phrases. As an example, if
the input prompt is, what are the best ten
ice cream flavors. The model will try to predict it by calculating
the probability of popular flavors like
vanilla, 95%, Chocol 80%. Chocolate chip 62, cookies
and creams 57, and so on. Those probability
numbers are not real. I created them to
explain that concept. As a vanilla is,
for example, 95%, there is a high chance
that the model will always select it as
the first flavor. Then it will select the next
flavor based on probability. In our case, chocolate
that has 80%. If we run that same input
prompt several times, we may get slightly
different list of flavors because the
model may choose, for example, strawberry
over salted caramel. Even if the probability
is a bit less, it is adding some
level of randomness to simulate less
deterministic answers. In some cases, this less
deterministic behavior creates new challenges. Think about the GAI system
that is supposed to answer questions about
legal and tax issues. Those are very sensitive stuff, and user will expect to see highly professional
and consistent answers to the questions. If one day a user will get a specific tax recommendation,
and in the next day, he or she will get a different recommendation
for the same question, they may not trust
the system anymore. On the other hand, if I'm
using the AI system to generate in brainstorm IDs
for a marketing campaign, it may be useful
to get different perspective on the same topic. What is the solution
for that challenge? Well, there is a way to
control and influence the creativity and
randomness of some models. It is called the temperature
hyperparameters, measured as a simple
float number 0-1. When integrating with a
GII system using APIs, the API request can include that specific
parameters as a number. If the number is low like 0.2, it is low temperature, meaning the model will generate more deterministic and
focused responses. Outputs are more likely to be
predictable and consistent. If the number is high like 0.8, it is a high temperature. The model generates more diverse
than creative responses. The output is less predictable. Adjusting these parameters,
the temperatures allows to influence the level
of randomness and creativity in the
models responses, helping to tailor the output to better fit
specific use cases.
33. S05L05 Structured Data: Hi and welcome. Our next
interesting challenge is related to structure data. Structure data is one of the most popular methods
to organize information. Think about a simple
spreadsheet that aggregates product
reviews on a website. There are multiple columns
like the product name, category date and time when
the review was provided, maybe information
about the person that provided that rating like
age, gender location. And finally, they
provided review score. It can be like a
number 1-5 stars. If I want to feed all
that information into a GAI system and ask to predict the review score of a specific product based on a person details, it
will be a challenge. You will be surprised
to hear that a typical GAI model is not the best choice for handling structured data like tabular
data in a spreadsheet. The main reason for
that limitation is because generative AI
models are primarily designed for handling
unstructured text data such as sentences
and paragraphs. They are trained to
capture patterns, context, and semantics in text. On the other end, tabular data
is structured in rows and columns with specific schema and relationship between
columns and rows. This structure is
fundamentally different from the linear sequential
nature of text data. Therefore, it is less intuitive
for GeneI models that are mainly trained on text
which is unstructured. Another thing to take into
account is the context window. Generative models have a
limited context window, meaning they can only consider a certain amount
of text as input. Feeding a large table
with many rows and columns may cross over
the context window, and in that case, can lead to incomplete or
inaccurate responses. The model is not able to
digest a complete table. One approach to handle that limitation is to
use a hybrid solution, meaning use a specialized
tabular model as a pre processing
step that will convert the tabular data into a format that generative
models can better understand. This is just one example. As a quick summary, we
should be very mindful when trying to feed the GAI
system with tabular data. In many cases, it makes
more sense to use different AI methods or
solutions instead of using GAI. Remember, GAI is optimized
to handle unstructured data.
34. S05L06 Hallucinations: Our next topic may sound
a little bit strange. Generative AI models may generate information that
is incorrect or misleading. This strange limitation
is called hallucination. The model is making things
up using fabricated facts. The problem is that it is making things up in a very confident, organized and convincing way, like a great politician doing
a cross country campaign. This can mislead users into thinking that this
is a true baseline. I experienced that issue
several times while using GAI for different use cases. The question is why such
strange behavior is happening? Well, it may happen
for many reasons. Insufficient training data
if the model hasn't been exposed to a wide
enough variety of data, data quality, Data that may includes incorrect or
misleading information. Lack of up to date data. That's the knowledge cut
off we discussed earlier. The model's knowledge
is based on training data up to a certain point in time. It cannot access real-time information or recent developments, leading to outdated or incorrect responses. And more. This strange behavior can impact the reputation of a GAI
system and it will be harder to trust the output of such models and leverage them safely in a
production environment. It's a significant challenge. The good news is
that large companies like the big players are improving their foundation
models to minimize and mitigate such issues, making those models more reliable and safer to use in production environments. Still, they are not perfect, and users need to be
aware of such limitations. Not every piece of
information we get from a GAI system should be
considered as a true baseline.
35. S05L07 Lack of Common Sense: Another interesting dimension to consider regarding GAI is to understand that those solutions are very sophisticated pattern
recognition systems. That's it. In many cases, they will lack the common sense that is expected from
the average human being. Humans are using
common sense that is based on personal
experience and knowledge. AI models generate
responses based on statistical patterns rather
than intuitive understanding. I think that's an
important marking point between humans and machines. If you ask a person to help
you to break and open a car, he or she will try to understand the overall context and decide if it makes sense
to help you do it. Maybe you saw a child that
was forgotten in that car. It is a complex scenario
and a human being can analyze that situation
for many directions. On the other end, AI models possess language based on
statistical associations. They generate text by predicting what comes next in a sequence, which is not optimized for a complex, deeper understanding of real-life scenarios while considering many things in parallel. Training GAI systems to consider such complex scenarios is a huge challenge. Adding common sense to machines is hard. Companies are making progress by putting all kinds of safety protocols and logic in place, but it's still a major challenge. Maybe in the future, your AI-based washing machine will have common sense. Until then, try to separate the colors of clothes by yourself.
36. S05L08 Bias and Fairness: Let me ask you something. Do
you think that everything that is written in Wikipedia is true and accurate? Well, that's an
interesting question. Wikipedia is widely used
as a valuable resource. I'm using that and many
people are lending on Wikipedia page while searching for some terms in
search engines. It's a very popular
organic result. Still, we need to
approach the content of any website with
a critical mindset. Wikipedia is the
aggregate result of many people who created
and adjusted the content. Wikipedia allows
updating content by a wide range of contributors. Each contributor has a unique perspective that may be biased in a
specific direction. A group of people may have an agenda about something that they would like to promote, even if it's not fully aligned
with the real situation. There is also a risk of vandalism or intentional
misinformation. That's just Wikipedia. What about the rest
of the Internet? With billions of
articles and websites that reflect many biases
that exist in society. There are many people and many different opinions
on many subjects. Now, when we train a GAI
system with such content, the model's knowledge will include a variety of biases on different topics, which may cause the model to generate outputs that may be unfair, unethical, or even misleading. That's a huge challenge
for companies that develop GAI models because they need to minimize it as
much as possible. Secondly, it's a
huge challenge for companies using
those GAI models. It can expose them to compliance challenges, as they are obligated by law to fulfill certain legal
and ethical standards. Again, the good news is that large language model providers are investing huge resources to make sure these models are becoming safer and less biased. They are doing that by using diverse and more representative data for training. In some cases, they employ all kinds of tools to identify biases within the training data as well as in the output of the models, using all kinds of algorithms that can reduce such issues. They also try to monitor the model performance on an ongoing basis to identify and address those issues. And of course, based on that input, they perform all kinds of ongoing updates to improve the model and to reduce those biases over time.
work in progress.
37. S05L09 Data Privacy, Security, and Misuse: Data privacy and security is another major concern to consider in the context
of GAI systems. Let's talk about a few examples. The first issue is
related to data leakage. Think about the situation in
which your company is using a variety of third-party GAI solutions like ChatGPT, Google Gemini, or Microsoft Copilot, running in the cloud,
running in the Cloud, so people working
in that company can leverage those tools
for their daily work. That's a typical
use case, right? What if some users are using a third-party tool and providing as input a prompt with very sensitive information, like a list of sales deals
and revenue or a piece of highly sophisticated
code that was developed by
the R&D department. There is a huge potential for data leakage while users are trying to
leverage those tools. To mitigate those
security risks, companies should develop
a holistic end to end strategy like using more safer
data handling practices, educating end users about the usage of those
systems and more. Another important
dimension of GAI is the growing risk that it will be used by bad players. That's the reality, and it's going to become more challenging in the future. Those models can be used to create disinformation, generate deepfake identities, generate sophisticated
cyber attacks, and more. Just think that it is becoming harder to verify the credibility of the images and videos we see. Everything can be generated by AI in a very convincing way. That's going to
dramatically change how people rely on
digital information. It's part of the game, and we should be aware
of those evolving risks.
38. S05L10 Summary: Hi, and welcome back. I hope
that I didn't scare you too much about the challenges and limitations of generative AI. That's part of the game when the market is starting to use a new technology. As more market
experience is gained, those challenges will
be better mitigated. I assume that five to ten years from now,
some of them will disappear and maybe
new ones will emerge. Let's quickly review
them one by one. We started with
prompt sensitivity, meaning the input has a
direct impact on the output, which makes a lot of sense. We need to be mindful
when crafting prompts. The next challenge is
the knowledge cutoff. Most of the models are trained
up to a specific date. An event or data created after that date is not part
of the model knowledge. We need to be aware of that date and the capabilities
of the model to handle it. Some of them are updated
at repeated intervals, and some are closing the
gap by using online tools. Then we talked about the
less deterministic nature of GAI models. It is embedded in their design to better
simulate human creativity, which is useful for
many use cases. On the other hand, some use cases will require more
deterministic behavior. In that case, there
is an option in some models to tune that
level of randomness. It's called setting
the temperature. I also mentioned that GAI models are trained to
handle unstructured data. Therefore, we need to
be very mindful when trying to feed the GAI
solution with tabular data. It may cause strange
unpredictable behavior. Moving next, we talked about the challenge that GAI models may generate information that
is incorrect or fabricated. It is called hallucination, when the model is making things up. We should always remember that not every piece
of information we get from a GAI system should be considered
as the true baseline. Another challenge is the
lack of common sense. Common sense is the typical
capability of human beings, and it's very hard to
simulate it with machines. Machines can be manipulated
by sophisticated prompts, and they have a big challenge understanding complex real-life situations. The next challenge is a big
headache for many companies. I'm talking about the
ethical and bias issues. When training a GAI system with a variety of data sources, the model's knowledge will include biases on
different topics, which may cause the
model to generate outputs that are unfair,
unethical, or misleading. And the last one was about data privacy,
security and misuse. I mainly want to emphasize
the risk of data leakage. When using third-party GAI solutions, we need to be more mindful of the data we're using as prompts, reducing the risk of exposing
sensitive information. If the model is controlled by your company, that's
a different story. It's a case by case situation. That's a quick summary
of this section. Moving next, we will dive into practical use cases of GAI. See you next.
39. S06L01 Introduction: Hi and welcome back. We covered a substantial number of topics and terms about machine learning and generative AI. I'm excited to
start this section, where we'll build
on that knowledge to explore the practical
application of generative AI and
market use cases that are shaping
many industries. We are going to see how
it's possible to leverage those capabilities to boost efficiency, creativity
and innovation. To set the stage and
manage the expectations, this section is not
about presenting a long list of AI tools and
how to use those tools. The AI landscape is
constantly evolving with hundreds and thousands of tools available for different
use cases and industries. Instead, we'll concentrate on the key use cases where this
technology can be used, ensuring that the
insights will be applicable across a
wide range of AI tools. Eventually, you or your
company will select the best AI tools to fulfill
specific requirements. All right, let's
start to explore the power of generative AI, see you in the next lecture.
40. S06L02 Text Image Video Audio Generation: Generative AI is optimized to digest text as input and analyze the structure, patterns, and context in natural language. It's a very powerful capability. That's the input. What
about the output? Well, there are a
couple of content types that can be generated
by GAI systems. The first and most
obvious one is text. The model type will be called a text-to-text model. Text is a very broad format that can hold a variety of different structures. Such models can generate many things, like answers to questions, a list of ideas, emails, articles, stories, reports, scripts, programming code, and more. One of the most
popular applications of generative AI is the option to synthesize and generate images based on text as input: text-to-image models. Describe, using a prompt, the required image, which objects
should be included, color patterns,
texture, et cetera. Using this text prompt, the AI system will
synthesize a new image. Such new capabilities can
empower many people that are lacking the skills to
create artistic content. It's part of
simplifying access to creative tools for
a larger audience. For content creators like me, such text-to-image models enable the rapid generation of visual elements for many day-to-day use cases. For example, if I need
a specific picture for a new blog that I'm
planning to publish, I have another tool
in my toolbox. I can use such GAI
tools and generate an image based on my
specific requirements. I can articulate the required
image based on text related to my blog content without having extensive
design skills. What about creating
a short video clip? That's another interesting
direction for applying GAI. Generating a video from text is much more complex than
generating images. As a result, it is not as developed as image generation, but things are
progressing very rapidly. We can use such
tools to describe a more complex
scenario and it will generate a group of images grouped as a sequence
to create movement, creating a video clip. Those are text-to-video models. Big players like
Google, OpenAI, and many other companies are exploring this
interesting space, and we can assume that production ready solutions
will be available soon. Maybe by the time you
watch this training, it will be more mature. It's hard to estimate, but I think it's going
to happen very soon. As an example, think about a short marketing
video on some product that will cost a
fraction of the cost and time compared to
manually creating that clip. It may help many small businesses with less heavy pockets than the big players to generate marketing content. And the last one is
about generating audio, meaning using text-to-audio models. It involves creating audio content such as speech or sound effects from textual descriptions. We can divide them into several subdomains. We have text-to-speech. That's the technology to convert written text into spoken audio. This is one of the most developed applications in text-to-audio, with a focus on producing natural-sounding human speech. It is getting better. However, it is still a huge
challenge to synthesize speech with specific
characteristics such as different accents, emotions, and speaking styles. Some content creators
on YouTube, for example, are using text-to-speech technologies in their content. I'm personally not so keen to watch such content. It feels unnatural and unreal, like someone is cheating while using
this technology. I guess in a couple of years, it may be more acceptable
and get into the mainstream. Another popular example
is the gaming industry, using text to speech to enhance player experience and
streamline development, like a real time
generation of voiceovers based on player interaction
or game events. By the way, it's a big headache in the entertainment industry where it is now possible to synthesize the voice
of any popular actor. It is creating challenges related to intellectual property. Next is text-to-sound-effects, meaning generating sound
effects based on descriptions. This is useful for applications like video games, movies, and virtual reality. The last one is text-to-music, creating music compositions based on textual descriptions. Such models can generate
complete songs, melodies and all
kinds of things. All those options are
rapidly evolving, expanding the possibilities
for creating and interacting with audio
content based on text input. As a quick summary, we talked about text-to-text, text-to-image, text-to-video, and text-to-audio. There are also sub use cases, like maybe text-to-animation, which is a subset of text-to-video, or maybe text-to-code, which is related to text-to-text, and more. Let's move on and talk about the two main methods to consume and work with GAI systems. See you next.
41. S06L03 Web Based vs Application Based: Hi and welcome. In
the previous lecture, we talked about the main
types of content that can be generated using
generative AI, like text, image, video, and audio. As a result, there is a growing amount of off-the-shelf GAI solutions for different use cases. The question is how
they can be used. There are two main options to consume GAI solutions. The first one is called a web-based application. It means that a company encapsulates a GAI solution in a simple web tool. A great example is a GAI chatbot. A web-based chatbot is an automated software
application designed to simulate human conversation and interact with users through a web interface. The most popular examples are ChatGPT, Google Gemini, Microsoft Copilot, and maybe new tools that will be available after
recording this training. Those tools are simple to use. They can handle a wide variety of tasks and therefore are
becoming very popular. This is the first option
to use GAI models. It's a perfect solution
for consumers. The second option is called
software based applications. It is related to
organizations that would like to improve their software
applications by using GAI. Many GAI capabilities can be embedded as small modules in a larger software application. For example, a user just added a product review on
the Amazon website. The Amazon website is a combination of many software modules that are connected to create an end-to-end shopping experience. In that context, a GAI model can take the product review provided as text and classify it as a positive or negative review. That's the first step. Next, it can extract
the key takeaway from the text to use it as feedback. It can decide how to route this review to the relevant department, like product, marketing, or sales. This GAI module is embedded inside a larger
software application, meaning the Amazon website. Now, how do software modules communicate with each other? Well, using APIs. API stands for application programming interface. It is a set of rules and protocols that allows different software applications to communicate with each other. A GAI model will have one or more APIs that are used as the interface
to exchange data. It will have an interface to
get a prompt as input and maybe a few
additional parameters and also an interface
for getting the output. That's how developers will integrate such capabilities into their software applications.
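As a hedged illustration of that integration pattern, here is a minimal Python sketch of a module that sends a product review to an OpenAI-style chat API and gets back a classification; the client, model name, and instruction wording are illustrative assumptions.

    # Minimal sketch: a GAI call embedded inside a larger application module.
    from openai import OpenAI

    client = OpenAI()  # assumes an API key is configured in the environment

    def classify_review(review_text: str) -> str:
        """Return 'positive' or 'negative' for a product review."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[
                {"role": "system",
                 "content": "Classify the product review as positive or "
                            "negative. Answer with a single word."},
                {"role": "user", "content": review_text},
            ],
            temperature=0.0,  # deterministic output suits classification
        )
        return response.choices[0].message.content.strip().lower()

    # The surrounding application can now route the review accordingly.
    label = classify_review("The mouse stopped working after two days.")
    print(label)  # expected: negative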
As a quick summary, I mentioned two main options to consume GAI solutions. The first one is a
simple web interface that can be used by anyone, and the second one is
related to using GAI as a model in a larger
application based on APIs. Let's start to review the key
use cases of generative AI. See you in the next lecture.
42. S06L04 Use Case Brainstorm Assistant: Hi, and welcome back. One of the most useful use cases that
I experienced while using GAI is the ability to have a personal brainstorming partner to generate ideas. At any given time, I can open one or more popular chatbots and ask some questions that can help me to brainstorm additional ideas, additional directions, and additional perspectives to consider. I'm using that capability for a variety of
brainstorming sessions. It's not perfect,
but in many cases, it is a great starting point
to give me some direction. Just keep in mind that it's not a replacement for brainstorming
with real humans. It is just another
tool in your toolbox. As a simple example,
let's assume that I'm planning to create a
new blog on a website. As a first step, I will ask a GAI system to generate ten to fifteen ideas for a blog title. I will create an input prompt that describes the main blog objectives and the preferred title type or structure that I would like to use. The first tool I
would like to use is the famous ChatGPT, created by OpenAI. Let's insert this input as a prompt: Generate 15 ideas for a blog title. The blog is about methods to optimize a knowledge base on websites. The title should be engaging
and up to seven words. Here we go. After a few
seconds, I'm getting that list. Now, even if none of the
titles match my mindset, I can still pick up all kinds of useful words from here, like supercharge, transform, et cetera. Another option, if the output does not match my expectations: I may fine-tune the requirements in multiple iterations. Like a ping-pong session, I can fine-tune the output by providing a new prompt, like narrowing the list to the five best ideas that can be more useful for search engine optimization. And I'm getting a new result. Finally, I will take those ideas, select the best
matching one or two, and in some cases, adjust
and improve one of them manually, adding my own personal touch. That's just a small example, but it shows how powerful this brainstorming
assistant can be. Product managers can use
it to brainstorm ideas for product features based on market trends. A marketing department can use it to brainstorm ideas for marketing campaigns, including slogans, taglines, and promotional strategies. There are so many use cases where it can be used. As a small tip, when
writing a prompt, try to provide enough context, relevant background, or any specific requirements. Don't assume that the GAI system has that information. If you don't provide it, the system will use
generic assumptions, and then the output
will be more generic. The second thing to remember is that it is an iterative process. You can keep refining and tuning your prompt until getting the required output. We can find endless opportunities for that use case. It is one of the low-hanging fruits of GAI. Nothing special is
needed to use it, and I encourage you to add
it to your daily toolbox. Great. Let's move to
the next use case.
43. S06L05 Use Case Summarization: The next useful use case of GAI is the ability to summarize text. In that case, the LLM model is used as a logical reasoning engine instead of creating content. We can provide a long text from some article as an example, and ask it to summarize
that text in a specific way, like defining the number of paragraphs or pages, bullet points, and more. And to be honest, it can do a pretty good and impressive job. It can understand context, identify key points in the text, and then produce a concise, structured summary that can nicely capture the essence
of the original text. Is that a replacement for
reading a complete article? No. We need to remember
that it is not perfect. It can miss some complex nuances in the text, or maybe drop important parts, or maybe create too generic an output
while omitting specific critical
information that is essential for a complete
understanding of the text. The quality and accuracy of the model output are based on the specific model
capabilities being used. A more powerful
model means it has better logical reasoning
capabilities to handle a text. As a small tip, I'm using more than one GNAI tool in
parallel for some tasks, so I will be able
to compare between them and sometimes
combine the output. Different models will
generate different summaries. One major limitation to consider is related to
the number of tokens. A GAI model will have a maximum token limit for the
combined input and output. I cannot provide 100 pages of a complete book as input. If the input text is too long, the model will not process the entire text, leading to incomplete or inaccurate summaries. In that case, we can consider breaking the original text into smaller elements that fit within
the token limits. If this is a book,
we can consider using the chapters
to break it into smaller chunks and then
summarize it per chapter. Let's move on to the next
use case. See you next.
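Before the next lecture, here is a minimal Python sketch of that chunk-and-summarize idea; the model name and the naive character-based splitting are illustrative assumptions, and a real solution would count tokens and split on chapter boundaries.

    # Minimal sketch: split a long text into pieces that fit the model's
    # limits, summarize each piece, then summarize the partial summaries.
    from openai import OpenAI

    client = OpenAI()  # assumes an API key is configured in the environment

    def summarize(text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user",
                       "content": "Summarize the following text in three "
                                  "bullet points:\n\n" + text}],
        )
        return response.choices[0].message.content

    def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
        # Naive chunking by character count as a rough proxy for tokens.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    def summarize_long_text(text: str) -> str:
        partial_summaries = [summarize(chunk) for chunk in chunk_text(text)]
        return summarize(" ".join(partial_summaries))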
44. S06L06 Use Case – Text Enhancement: As you probably know, many great software
tools can be used to correct grammar mistakes and enhance the text structure. Those tools can be embedded as extensions in our
operating system, web browser, or text editor. Those tools use GAI as the engine to analyze the text and provide
suggestions and corrections. The downside is that
those tools cost money. We need to pay for monthly or
maybe yearly subscriptions. If I'm creating content
on a regular basis, that's probably a
good investment. However, if I need it for a one-time project or to use it with a lower frequency, it may be less attractive. A nice alternative is to use those popular chatbot GAI tools. Many offer the
options to use them for free with some
limited capabilities. I hope it will not
change in the future. We can basically copy-paste the text we want to enhance as an input prompt and explain what we need, like fixing
grammar mistakes, making the content of
the text more exciting, changing the flow of
topics and so on. Those are very powerful
and versatile tools that we can control by
providing the relevant prompt. I'm using this option
for a variety of content that I'm creating like
writing content for lectures, blogs, articles, product
description, and more. My method is to use text enhancement
at a later stage when I have almost
a final draft, but it's completely up to you. The reason is that I want to maximize my personal
human touch, making sure that my mindset influences most of the content. Otherwise, I may
get content that has a high portion
created by AI. Secondly, remember the sentence: garbage in, garbage out. If you provide as input a very early draft of your text, then most probably the
output will be too generic. All right, let's move
to the next use case.
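As a tiny illustration, here is a sketch of wrapping a draft in an enhancement prompt; the instruction wording and the draft text are, of course, just examples.

    # Minimal sketch: build a text-enhancement prompt around a rough draft.
    draft = "Our product help teams to colaborate and share knowledges easy."

    prompt = (
        "Fix grammar and spelling mistakes in the text below, and make the "
        "wording slightly more engaging. Keep the original meaning.\n\n"
        f"Text: {draft}"
    )
    print(prompt)  # paste this into a chatbot, or send it through an API call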
45. S06L07 Use Case Code Generation: The next interesting and popular use case for a GAI system is code generation. I would like to emphasize that it's not just for professional developers. Let me explain that concept. There is a growing number of high-tech jobs that require basic programming capabilities in different programming languages. It is becoming useful, even if it's not the mainstream objective of a specific job. Some level of
flexibility is required. The capability to generate code using GAI is making programming a more accessible and tangible option for people who are not professional developers. They don't have the time
or resources to spend two years to learn a specific
programming language. Sometimes they just
want to get the job done without drilling down
to every line of code. As an example, maybe someone
is a data analyst and SQL will be useful for extracting data from
different data sources. If you're not familiar
with databases, SQL is the most
popular language to store and extract
structured data. If this person is
not using SQL daily, it will be hard to maintain knowledge and experience
to quickly use it. It may take a substantial
setup time to remember the syntax and find the right solution while
considering several options. That's a sweet spot for a GAI system. This data analyst can use it to provide the structure and syntax of the best matching SQL query for a specific job. Maybe the provided answer
will not be perfect, but it's a great starting point to manually tune it. It is a great framework to quickly come up with a
couple of solutions. I would like to
share my personal experience as an example. Don't tell anyone, let's
keep it between us. I'm not a developer, but I have some background in
computer science due to my experience in practical
data science projects and all kinds of other side projects. Now, I don't know if
you are aware of it, but WordPress is one of the biggest open-source
frameworks for creating websites. Millions of websites
are based on WordPress. As part of my
entrepreneurial spirit, I decided to create
a software program, which is a WordPress plugin. A WordPress plugin is
a piece of software that extends the WordPress core capabilities. There are many plugins
for different use cases. Now, it was a side project, and I decided to build
it from scratch, building my knowledge
step by step. The challenge was that I did
not have enough knowledge or experience in web based
development languages. I guess it sounds a little bit
crazy and I totally agree. Still, I decided to
go with that project. As part of the preparation, I learned the basic syntax
for a couple of languages, and then I divided
this project into smaller, more manageable pieces. Each time, I handled a specific piece of the puzzle. As you may guess, I used GAI to help me come up with ideas, required code syntax, examples of best
practices and more. It was very useful, and I can share with you that
I'm not sure I could handle that project effectively without leveraging GAI. It helped me to speed up
the development stage. Now, is that a perfect
tool for any job? No. It is just another
tool in our toolbox. For example, during
the development phase, a new version of WordPress was introduced with
new capabilities. That's the basic
nature of software. Those capabilities can
impact the way you develop a plugin as part of
the WordPress ecosystem. However, the GAI solution that I used had a knowledge cutoff. It was trained until a certain point in time, and it was missing up-to-date capabilities. It took me some time to
understand those limitations. My prompt about the question that I had was clear like the blue sky on a nice day, and still it was not working. I was not sure why I was not getting the required output and why the GAI system was generating unrealistic solutions based on my prompt. It had a knowledge cutoff, and I could not wait a couple of months until this model was updated. I closed that gap by using
the good old Internet, meaning using search engines, blogs, articles, specific
forums, books, and more. Another thing to consider is the complexity of the code
you would like to generate. A typical software application is based on multiple modules, multiple layers
that interact with each other to create an
end to end solution. It is a complex architecture
with many moving parts. If I try to explain to a GAI model the requirements of a complex application, I will need hundreds and even thousands of lines as an input prompt to explain all the features and functionalities of that application. Now, a typical GAI tool will not be able to process such a level of complexity and generate an end-to-end software tool. That's not going to happen, at least in the near future. We need to break the
required application into little pieces and then use GAI to handle one piece at a time. When using this approach, we increase the
probability to get more focused and useful
output from a GAI tool. The last thing I would like to share is a pure golden tip. When handling more complex projects, sometimes it is much better to ask a real developer or real expert, directly or indirectly, using a forum. Don't try to rely too much on GAI for code generation. It can take you forward
up to some point. One important thing I would
like to mention is that the ability to generate code is becoming an integrated
capability in software applications that are used for software development. They are called IDEs, integrated development environments. Those tools use GAI and other machine learning capabilities to suggest code snippets and code completion, generate boilerplate code as an example, or maybe write a complete function or module about something. Those AI-driven
capabilities have a dramatic impact
on the speed of developing and testing
software applications. Great. That's about code generation. Let's move to the next
use case. See you next.
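To make the data analyst example above more tangible, here is a sketch of the kind of prompt that works well, together with the shape of query a model might return; the table and column names are invented, and any generated SQL should be verified against the real schema.

    # Minimal sketch: a code-generation prompt for a SQL task. The 'orders'
    # table and its columns are illustrative assumptions, not a real schema.
    prompt = """You are helping a data analyst write SQL.

    Context: a PostgreSQL table named orders has the columns
    order_id, customer_id, order_date, and total_amount.
    Task: write a query that returns the top 10 customers by total
    spending during 2023, highest first.
    Format: return only the SQL query."""

    # A plausible response shape (verify before using it):
    # SELECT customer_id, SUM(total_amount) AS total_spent
    # FROM orders
    # WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
    # GROUP BY customer_id
    # ORDER BY total_spent DESC
    # LIMIT 10;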
46. S06L08 Use Case – Content as a Framework: Hi and welcome. We covered some key use cases of GAI, like brainstorming ideas, creating a summary of a long text, and generating a piece of code. That's great. But GAI can do a little more than that. We can use GAI to write
more comprehensive content, like a blog on some
topic, a story, an article, a script, a support answer to a customer, a press release, a post, a product description, and more. Let me show you a quick example. I will ask Google Gemini to generate a blog about best practices to optimize performance
for websites. The blog will be divided
into six main topics. Each topic will be
around half a page long. That's the input. Here's the generated blog. It's a very impressive output. Now, it is very tempting to copy, paste, and post that synthesized blog. It looks professional, nicely structured, with a good selection of words. Is that my blog? No. Nothing related to my writing style or thinking process. Can anyone else generate the
same content using the tool? Yes, of course. It's not unique. It is generic content. I don't think it is professional or ethical to publish content that was purely generated by GAI. Okay, listen, from my perspective, we may assume that search engines are not smart enough or the
end user will not notice what is the
source of the content. But eventually, they
will figure out that this blog is completely
generated by an AI solution. There is no soul
to that content. It is missing the human touch. Therefore, I would like to
emphasize an important point. In the context of generating more complex text like
a blog or an article, I suggest using GAI as a starting framework for a draft version. We should avoid using
it as the final output. In our example, I will take
that blog content as a draft, read every sentence and create a new version with
many adaptations, remove things that
are less relevant and add things that
are more important. I will also change
the writing style to match my personal
perspective, remove too many fancy words that I will probably never use, and more. Try to make the
content your content. I think you got my point. Let's move to the next lecture.
47. S06L09 Use Case – Images on Demand: Hi and welcome back.
Until this point, we talked about the
ability to generate text for many useful tasks. However, GAI is much more than that. We talked about the
capabilities to generate other types of
content like images, video clips, and audio. It is quite an
amazing direction, and as you can imagine, that's going to reshape the
art and design industries. It is now possible to generate interesting visual and audio
elements by providing text. It simplifies the usage of tools and the speed of getting visual or audio content. I'm heavily using images for different content that I'm creating. It can be inside a presentation, a course landing page, a published blog, a report
about something, and more. Usually, I will go to some
free or paid websites that are providing high quality images and start to search
using keywords. Sometimes it's a
quick process and sometimes it's very
long and slow. This may happen because it
is taking me time to find the specific image that can fully articulate the message
I would like to deliver. You remember the sentence: a picture is worth a thousand words. That's another interesting sweet spot of GAI. Instead of searching hundreds
of images using keywords, let's describe what is needed and the GAI
system will generate it. We can call it generating
images on demand. Let's see a simple example. I will provide a prompt, generate an image of
a racing car from the future based on a color
scale between blue and black. The car should take around 30% of the picture. Here we go. Let's tune it a little bit. Reduce the car size by X percent and change the color scale to between red and black. Amazing. I recently started
to use image generation, and it's making my life easier. The process of creating
images based on my specific requirements is just mind-blowing, and it saved me a lot of time. As always, it is just
another tool in my toolbox, and I'm still heavily
using regular images. I still like to use
realistic images. It is not a replacement. It is another option.
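For completeness, here is a minimal Python sketch of generating an image through an OpenAI-style images API; the client, model name, and size are illustrative assumptions, and other providers expose similar calls.

    # Minimal sketch: text-to-image generation via an OpenAI-style API.
    from openai import OpenAI

    client = OpenAI()  # assumes an API key is configured in the environment

    response = client.images.generate(
        model="dall-e-3",  # illustrative model name
        prompt="A racing car from the future, color scale between blue and "
               "black, with the car taking around 30% of the frame.",
        size="1024x1024",
        n=1,
    )
    print(response.data[0].url)  # URL of the generated image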
48. S06L10 Use Case – Boosting AI Based Apps: Hi and welcome.
All the use cases we covered so far seem like practical options for individuals like you and me and millions of people around the globe. Each one of us can access the powerhouse of a GAI engine using a simple web-based interface. Just type a prompt and
get the required output. However, this is a
small fraction of the possibilities of using GAI. Let's zoom out and talk
about the business world. There are a variety of business domains, finance,
transportation, healthcare, technology, manufacturing,
retail, energy, education, construction, telecommunication,
entertainment, and more. Each business domain
has a long list of processes and workflows being used to run the
business functions: marketing, sales, operations, finance, human resources, and more. For example, in
the retail domain, let's take the
process of selecting a product online and
performing a purchase. It is based on a workflow, a chain of steps that are handled by a variety
of software tools. As you can imagine, each
business workflow, each process, and each step are candidates for integrating
generative AI modules as part of a larger application. GAI can be used to boost many AI-based applications. Back to our example, when a customer selects a product, a GAI module will
try to recommend additional products
and services for that specific customer
based on the selection. It is embedded and integrated as part of
that end-to-end workflow. That's just a small example
in the retail domain. The amount of possible
use cases for those integrated applications
with generative AI is huge, and we will see a growing
number of companies and businesses that are trying to implement GAI in
different places. That's going to be the main
direction for generative AI. There is a huge potential
for business innovation, and as we know, nobody
would like to stay behind. The race is on. I assume that in the upcoming five to ten years, many companies will develop different strategies to integrate and leverage GAI.
49. S06L11 Best Practices for Prompts: Hi and welcome. We covered many use cases of GAI. All of them are based on using a text prompt as input. The prompt is the main entry point or interface for getting the required content from GAI systems. As a reminder, the prompt can be used directly in web-based GAI solutions like ChatGPT or Google Gemini, or as an API request when integrated as part of a larger application. It is still based on providing a prompt. If the prompt is so
important and it has a direct impact on the
quality of the output, we better invest a little bit to design and optimize our prompts. We already mentioned
that. It's called prompt engineering: engineering better prompts. In this lecture, let's
review a couple of simple tips related to
prompt engineering. Number one, be
specific and clear. The first one is to clearly define what you want
the AI to generate. Try to be specific and clear as much as possible and
state your expectations. It is important to construct a prompt that describes the required task in detail while providing relevant background, specific assumptions, and requirements. Number two, use
contextual information. The second tip is about how important it is to provide context that helps the model understand the scenario
or background. We need to consider what will be sufficient background information
to complete the task. If I'm asking a model to create a short story
about a topic, then contextual information will be to define the
main characters, explain the situation, and set the overall theme or
mood of that story. Another example is specifying the target audience for the content you would
like to generate. For example, write an article
about data privacy for IT professionals, or explain the concept of saving money to a 10-year-old child. By providing the context, the GAI can better
tune the response. Moving to the next tip, define the scope and
boundaries of a task. Let's assume I would
like to get a list of questions for a quiz
related to some text. I will define the scope of the task, like generating ten questions, and for each question, generating up to four or five possible answers. One question should be based on yes-or-no answers. Those are examples of defining the scope. Another thing to consider:
ask for multiple options. If you're exploring
different possibilities, ask the AI to generate multiple versions or options. For example: provide three different slogans for a new eco-friendly food product. Specify whether you need a short, concise answer or a longer,
more detailed response. For instance: write a two-paragraph summary, or give a one-sentence answer. Okay? Those are all kinds of examples. Our next tip is related to the prompt and response
structure and format. Try to organize your
input prompt in a logical structure, like explaining something to a human being. Try to specify the desired format. What do you want to get? Would you like to get text, code, or an image? If you need the output in a particular format, like bullet points, make sure to include that in your prompt. It is also useful to provide examples to illustrate
the desired output. This helps the model
understand the format, tone, and level of detail you're expecting to get. In many cases, it is an even quicker way of providing extra context than writing a comprehensive explanation of the desired result. Just give it an
example. Number five, avoid confidential information. I already mentioned that as part of the section about challenges, but it's a good time to emphasize it again. When using a third-party GAI model, we should always be careful and mindful about what kind of data we provide as part of the prompt. Avoid providing any confidential information as part of the prompt. Number six, request for
simplification and clarification. If you need the GAI system to explain a complex concept, ask it to simplify or
clarify the information. For example, explain
quantum computing in simple terms for a beginner. Number seven, ask the
model to consider different viewpoints or alternatives to generate a more comprehensive and diverse response. For example: describe
the benefits and drawbacks of
working from home. Those are two different perspectives on the same topic. Number eight is
highly important. Break down complex tasks. Most GAI models will
not be able to digest, understand, and
generate good responses for a very complex task. As you recall, when
I talked about generating code for
a complex project, it is more practical
and useful to break a complex task down
into smaller tasks. This approach allows us to better zoom in on a specific task, generate a good prompt to explain what is needed, and increase the
quality of the output. Anytime you have a
complex task to handle, try to break it down first
and then utilize AI tools. Next one is the last
one, iterate and refine. In a typical conversation
between two people, where one person asks something with a group
of sequential questions, each question is used to refine
the original question so the other person can fully understand the requirements
and overall context. The same concept should be used while using a GAI solution, especially when you are using a solution like ChatGPT: we can ask something with a group of sequential iterations. Sometimes our first prompt will not be perfect to get the required response. It is based on experience while
working with those tools. In most cases, we will need a multi-step process. In each step, we'll try to guide the model to provide
the required output. This is tip number
nine. All right. Those are the key tips
I wanted to share. Thanks for watching so far. Let's summarize this section.
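As a compact illustration that combines several of these tips (context, scope, format, and an example), here is a sketch of a structured prompt; every detail in it is invented for demonstration.

    # Minimal sketch: a prompt template applying several of the tips above.
    prompt = """You are helping a marketing team brainstorm.

    Context: we sell eco-friendly reusable water bottles to students.
    Task: generate 5 slogan ideas for a social media campaign.
    Scope: each slogan must be under 8 words and mention sustainability.
    Format: return a numbered list, one slogan per line.
    Example of the desired style: "Refill. Reuse. Rethink your bottle."
    """
    print(prompt)  # send this as the input prompt to a GAI system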
50. S06L12 Summary: Hi and welcome. Let's quickly summarize the key takeaways
from this section. We started by reviewing
the main types of content that GAI models can generate, like text, image, video, and audio. Each content type is
a complete category that can be broken down
into a variety of formats. We also talked about
the two main options to consume and use GAI models. The first option is web-based, like ChatGPT. That's the most popular option for consumers like you and me. The second option is application-based, meaning using GAI models inside other applications. That's going to be a huge innovation wave for businesses and organizations. Moving next, we reviewed the most typical use cases for leveraging GAI in our daily work. Those are the low-hanging fruits of that technology that
can be used by anyone. The first one is to use it as a personal brainstorming
assistant, meaning generating ideas, thinking directions, new perspectives, et cetera. The next use case is the
ability to summarize text. It can understand context, identify key points in a text, and then produce a concise summary. It's a great way to speed up the work of analyzing
some long articles. We can also use GAI models
to enhance an existing text, rather than generating new content. Just copy-paste the
text we would like to enhance as an input prompt
and explain what we need, like making the content
of the text more exciting or changing
the flow of topics. Okay? There are many ways we can ask the GAI to enhance the text. The next one was about
code generation, which is becoming a very
popular use case by making programming a more
accessible option for a wider audience. Sometimes we just
want to get the job done without drilling down
to every line of code, and we need some examples
to quickly remember the syntax of a particular
programming language. The next one is a heavier usage of GAI, using it to generate full content about something. I decided to use the name content as a framework to emphasize that it's not supposed to be a replacement for real human creativity. It should be used as a draft, a framework that will be used to develop
something unique, useful, and less generic. I also mentioned the
interesting use case to generate images from
a text description. I decided to call it generating images on demand. That's something
that can be very useful for content creators. Instead of searching for a specific image
based on keywords, we can describe our requirements and get a unique
synthesized image. The last use case is
probably the biggest one. I'm talking about boosting AI-based applications with GAI. It means that the GAI model is embedded in a larger enterprise-level application. The market potential to integrate GAI in almost any business domain is huge, as there are millions of applications that can leverage GAI. The last lecture was used to review some best practices for crafting more effective prompts when working with GAI models. We should be specific and
clear with the required task, provide any relevant background information. If applicable, try
to scope the task, define the expected structure
and format of the response. Avoid sharing any
confidential information. Try to break down a complex task into several simple sub tasks. Finally, iterate and refine our requirements with
sequential prompts. That's all for this
section. See you next.
51. S07L01 Let's Recap: Hi and welcome to
our last section. Our training is almost
at the final stage. Thanks for watching so far. I hope it was interesting, as well as useful for you. At this point, I would like
to recap the key terms and topics we covered
while trying to make it more of an
end to end story, so it will be
easier to remember. We started by defining AI, artificial intelligence
as the human desire to create a digital brain and
mimic human intelligence, so machines can perform
more and more complex tasks. As a result, AI is not limited to a specific
group of domains. It is a general
purpose technology that can be used
almost anywhere. Over the evolution of
different technologies and their impact on
the AI landscape, machines have become increasingly sophisticated. However, something was missing. Machines were missing
the basic capability to learn and improve, which is a foundation capability
of human intelligence. That's where machine
learning algorithms were able to boost AI
into new frontiers. Instead of building a
software program that is preprogrammed with fixed
rules and knowledge, it is possible to create
a system that can dynamically digest and
learn patterns from data. Now, to be able to digest and handle more complex patterns, deep learning was
introduced, using artificial neural networks that are inspired by the human brain, with layers of interconnected neurons. For a long period, those machine learning methods were focused on specific tasks like prediction, classification, and clustering of data. While they were doing a pretty good job, generative AI added the important capability to analyze text as a language and to
generate creative content. That's a breakthrough
in the AI industry, making us a step further while trying to
mimic human intelligence. After building the main
pieces of the AI puzzle, we moved to the next section as a soft introduction to the key terms in machine learning. Machine learning
is the foundation of all those amazing
technologies. We talked about using the input and output boxes illustration to describe ML solutions. Those solutions can be divided into four categories: prediction, classification, clustering, and
content generation. Any machine learning box must be trained to perform
a specific job, and that's part of
the training phase. During that phase, an algorithm will consume training
data and use it to optimize the trained
model parameters to better map the input of the box to the
output of the box. The data going inside of an ML box can be divided into
several main data types: structured, unstructured, and semi-structured data. When zooming in on the data
input of an ML box, we will find features. Features are the small
elements of the data input. The selection and
transformation of features is a critical step to make sure the model is getting
the right data. We also mentioned the main methods used for training models: supervised learning,
unsupervised learning, and reinforcement learning. All of them are useful
for different use cases. As soon as we managed to create a good knowledge of machine learning, we moved to breaking down the generative AI concept into small pieces. We started with the main building block of any GAI solution, the artificial neural network, created using deep learning. It is the internal structure to hold the knowledge
of the model. As part of the evolution of different deep
learning architectures to build neural networks, the transformer architecture
was introduced, adding the capability
to process data in parallel instead
of sequentially, speeding up the
training process. The transformer architecture
was a perfect solution to train complex models based
on huge amounts of data. However, building complex models requires substantial computing resources with high price tags. It is an expensive, resource-intensive project. As a result, it is a playground for the big
players, not small companies. Those players can leverage
the resources to train large models and make
them available to the public based on
different services. And that's the concept
of foundation models. A foundation model is a
generic model that can be adapted and tuned to
a wide range of tasks. One of the most popular types of foundation models is the LLM, the large language model. LLMs are the core capability of generative AI to handle text as input and output. The input of the LLM is called the prompt, and it is broken down into small
elements called tokens. Tokens are numbers used to map the text into a
numerical format. The number of used tokens is measured as a metric
for service consumption. In addition, the context window is a limit on the maximum number
of tokens that can be handled as a group
under the same context. Using this knowledge, we
managed to reveal how those LLMs are generating
a complex text response. It's all about predicting the next token in a
sequence, one by one, in a loop to create complex patterns like sentences
and paragraphs and so on. How does the model
predict the next token? Based on a statistical distribution of possible predictions, and selecting one token. How does the model know how to calculate the statistical distribution of possible predictions? By consuming massive amounts of text data and using the self-supervised method. What are the options
to influence the output of a foundation
model like LLM? We mentioned three options. One, contextual prompting by providing more context as
part of the input prompt. Number two, retrieval-augmented generation, a more complex solution that leverages internal databases to enrich the input prompt. Number three, fine-tuning, where we take a pre-trained
model and retrain it with new data to create
a new fine tuned model. The next phase as part of
our learning journey was to review some of
the key challenges and limitations of generative AI. We talked about
prompt sensitivity. Any noise we put inside will be amplified by the GAI system, so we should be mindful of the quality of the input prompt. Knowledge cutoff
is the last date that the model was trained on. Any event or data created after that date is not part
of the model knowledge. That's something to
consider while using a specific model for
a specific use case. We should be aware that
many models are less deterministic by design to
make them more creative. And in some cases, it is
possible to tune the level of randomness to make them more suitable to
specific use cases. GAI models are trained on unstructured data, so they may have limitations handling structured data like tabular data. We also need to remember that GAI systems lack the common sense expected from a human being, especially while handling complex situations. When training a GAI system with a variety of data sources, the model's knowledge will include biases on
different topics which may cause the model to
generate outputs that are unfair, unethical,
or misleading. Those models are getting better at handling those challenges, but we should always
be more careful and mindful when using content
generated by a GAI system. The last section was
dedicated to reviewing the most typical use cases of using GAI: brainstorming ideas, summarizing and enhancing text, code generation, using it for content as a framework, images on demand, and the last one, boosting AI-based applications. And the last lecture was used to review some best practices for crafting more effective prompts when working with GAI models. That's our end-to-end story.
52. S07L02 Thank You!: Wow. You reached our last
lecture, and that's great. I want to thank you for
watching the complete training. You're more than
welcome to visit again and refresh your knowledge on specific topics and check if I released some
interesting updates. I hope that you
enjoyed the training and learned some interesting
things along the way. My main objective is to
trigger your curiosity about generative AI and hopefully help you to keep learning
and develop your skills. That's the future, and it is a great opportunity to break into new evolving domains. My last request is to get
your important feedback. It will be awesome and
useful if you can spend two to three minutes to rate the course inside the platform and share
your personal experience. Each review is important. Secondly, feel free to
share your experience and achievement on social
media like LinkedIn. Just tag my name, Idan Gabriell. That's it. Thanks again
for joining this training. I hope to see you again at other training courses that
I have released or am going to release. Bye bye, and good luck.