Transcripts
1. Introduction: You're busy, and I get it. AI can seem complicated, and you want to learn it as quickly as possible. With a packed schedule and a full-time job, you do not have the time to go through a 50-hour course. You just want practical AI skills to boost efficiency in graphics, text, emails, code, and more. If that's you, then this course is perfect. Imagine impressing everyone with AI knowledge right when it counts; you come across as a true pro. In this course, you get a clear understanding of AI, LLMs, and diffusion models; how to use LLMs like ChatGPT with prompt engineering; a look at multimodality and top-performing models; and prompting techniques for diffusion models like DALL-E, Adobe Firefly, Midjourney, Stable Diffusion, Flux, and more. Insights into AI-powered video, voice, and even music creation are covered as well. And by the way, if you ask yourself who I am: my name is Arnie, and I was teaching AI classes before ChatGPT was even a thing, so I have been in the game for a relatively long time. I also have a small German YouTube channel, and that's what I do.
2. What is AI?: Before we can dive deeper into the AI world, we need to define what AI actually is. AI is simply a term in computer science. The goal is to create machines with human-like intelligence: for example, pattern recognition, decision making based on data, and also task execution. And don't think of Terminator; these are simple tasks. It can also be writing some text, like ChatGPT does. What's the ultimate goal? The ultimate goal is AGI, artificial general intelligence. That simply means learning, understanding, problem solving, and creative work as well as or better than humans. So artificial general intelligence is smarter than most humans. That is a goal, and nobody knows exactly when it will be reached. And the final goal, which nobody knows will ever happen, is ASI: artificial superintelligence. This AI would be smarter than all humans combined. And like I said, don't think of Terminator right now. What is AI not? AI is not all-knowing, it is not self-aware, it has no emotions, and its current purpose is simply to achieve a set goal. You tell the AI, "Hey, write me some text" or "Make me a picture," and the AI will do that. That's it for right now. We also have robotics and so on, but that is not the main topic of this course. Let's just give some examples right here, and I will also write them down. First, we have voice assistants, and most of you know voice assistants: Siri and Google Assistant, but also GPT voice. They simply understand and respond to voice commands. GPT voice, or the Whisper API, is really cool; we will also dive into this later in the course. Then we have recommendation systems, and these are old. Just think about Netflix or Spotify or even YouTube. You look at a video, and based on your behavior, the algorithms will find similar videos. And then we also have autonomous driving. Self-driving cars use AI to understand where they are and then drive in that direction. And this is real AI. FSD from Tesla, for example, is real AI: the cars are not programmed to drive on a specific road. They look at the road and then adjust their behavior. And of course, we have LLMs and diffusion models, so large language models and diffusion models. Large language models make text, and diffusion models make pictures. This right here is the core. And because it's the core, we start with LLMs. So I will see you in the next video, where we will take a closer look at what LLMs are.
3. What are LLMs like ChatGPT, Claude, Gemini, etc: Most people know ChatGPT. ChatGPT is an LLM, and here you can do a lot of stuff; we will make a deep dive into ChatGPT. But let me tell you, we have a lot more LLMs. Basically, if you go on this website, the Chatbot Arena, you see that we have a lot of different LLMs. ChatGPT comes from OpenAI. Then we have Gemini; this comes from Google. We have Grok; this comes from xAI, so Elon Musk. We have Claude; Claude comes from Anthropic. I just want to tell you that we have a lot of different LLMs. And in this video, I want to show you how an LLM works, because you need to understand the concept of tokens and the structure of an LLM in order to use them correctly and as fast as possible. Basically, an LLM is just two files, and we will make a simple example with Llama 2.
For everybody who already knows exactly what an LLM is and how it works, you can of course skip this lecture. Basically, an LLM is just two files. We have one file, and this file is the parameter file; I simply label it here as P, which stands for parameters. And we have a second file, and the second file is just there to run these parameters. I just call it the run file. This run file is most of the time written in C, which is a programming language, or in Python; both of these work. So what we have right here is the parameter file and the run file. The run file is most of the time simply around 500 lines of code. So we use roughly 500 lines of code to run the other file. And that other file is where the magic happens, because it is gigantic. Let's make an example with an LLM that is open source, and this LLM is called Llama 2. Llama is, of course, the LLM from Meta, and they have different model sizes. The Llama 2 we use for this example is the 70B model. This simply means that it has 70 billion parameters, so you know this is a relatively big file that we have right here. This parameter file has 70 billion parameters, and how do we get all these parameters? We need to train this file, and we train it on a lot of text. We use about 10 terabytes of text in order to train this file. This is text from all over the Internet, so it could be Wikipedia articles, websites, and much, much more. And we can compress all of that down so that the resulting file is only about 140 gigabytes big. So this file is just 140 gigabytes big, and we train it on 10 terabytes of text. You see, we can compress it down a lot.
You can simply think of this parameter file as something like a ZIP file: it compresses all this data down. In order to compress this data, we need a lot of GPU power. And that's also why NVIDIA was such a great stock over the last years. If you look at the NVIDIA stock, you see a gigantic run, and this is basically because everybody needs GPUs. But this is not about stocks right now. So basically, we use a lot of GPU power. I am keeping this really simple; I also have more detailed explanations, but I don't think we need them in this course. So we simply compress ten terabytes of text into a roughly 140-gigabyte file, and then we have the second file. The second file is the run file. It's just a few lines of code. And if we have an open-source LLM like Llama 2, or even Llama 3, or whatever open-source model you want, we can download these files and run them locally on our PC. This brings us maximum data security, because nothing goes over the Internet. These two files are a little bit magical, because the transformer architecture works in the background. You can simply think of it as a neural network; we don't need to dive that deep. But basically, the neural network sees words and predicts which word most likely comes next. So it works like this: we train on all the text, and the LLM simply learns how text is structured. If we ask, for example, "What should I eat today?", the LLM will simply predict which words a human would most likely write next.
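To make next-word prediction a bit more concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small open GPT-2 model (neither of which is used in the video); it just asks a language model to continue a sentence with the words it finds most likely.

```python
# A minimal next-word-prediction sketch, assuming the Hugging Face "transformers"
# library and the small open GPT-2 model are installed (not part of the course).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model continues the text with the words it predicts as most likely.
result = generator("What should I eat today?", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```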
This stage is simply called the pre-training. With the pre-training alone, we basically just hallucinate stuff out of this file. But then comes the second thing. The second thing is the fine-tuning, and with the fine-tuning, we give the LLM a lot of examples of how humans want their responses to look. We would feed in, for example, a question: "What should I eat today?", and then we would feed in an answer that humans like, for example: "You could eat steak today." If we feed this in over and over and over, the LLM learns how humans want their responses. This is called the fine-tuning, and it is the second part needed to run LLMs. Then the last part is the so-called reinforcement learning, and we can break this down really simply. After the pre-training and the fine-tuning, we do this reinforcement learning. It basically means that we ask a question, we get an answer, and then we tell the LLM whether it was good or not. That is reinforcement learning. So we have three phases of training. In the pre-training, we simply use a lot of GPU power to compress a lot of text down into a smaller, so-called ZIP file, and we can hallucinate text out of it. In order to make these hallucinations better, we do the fine-tuning: we feed in a lot of questions with answers structured in a way that humans like, and in this phase, the LLM learns how humans want their responses. And lastly, in the reinforcement learning, we simply take a look: "Hey, does this make sense or not?" If yes, thumbs up; if no, thumbs down, and the LLM will learn further how we want our responses. Now, the next thing that is really, really important is something you have already learned: in this transformer architecture there are neural nets, and neural nets work with weights. Basically, they work with numbers. And in order to make sense to the neural net, we of course need numbers. So the first thing is: if we feed a question into an LLM, the LLM will turn this question into numbers, the so-called tokens. These tokens are numbers, and with these numbers, the neural net can make its calculations: which word will most likely come next? I want to show you how these tokens are structured.
If we go to this tokenizer, we can see it. We can simply type in "What can I eat today", for example, and now you see we have five tokens and 20 characters, and the tokens are structured in this way. If we press here on "Token IDs", this is basically what the LLM sees. The LLM sees numbers, and with these numbers, the neural net can make its calculations and give us a good response. If I press on "Clear" and then on "Show example", you see a bigger example. And here you also see that not every single word is one token; the text gets divided up a little differently. Here you see "invisible" is, for example, two tokens, and this period is also a separate token. So we have a lot of different tokens, and if we press on "Token IDs", you see: this is basically what the LLM sees, and the LLM makes its calculations out of these tokens.
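If you prefer to see this in code, here is a minimal sketch, assuming OpenAI's tiktoken library (not shown in the video), that turns the same sentence into token IDs and counts them.

```python
# A minimal tokenizer sketch, assuming OpenAI's "tiktoken" library is installed.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4-era models

text = "What can I eat today"
token_ids = encoding.encode(text)

print(token_ids)                    # the numbers the neural net actually sees
print(len(token_ids))               # how many tokens the prompt costs
print(encoding.decode(token_ids))   # back to the original text
```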
Why am I showing you this? Because it is important: we have a token limit. Every single LLM has a limit to how many tokens it can handle at once. If we go to this article right here, "What are tokens?", OpenAI tells us that a token is roughly four characters in English. That means 1,500 words are roughly 2,048 tokens.
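That rule of thumb is easy to turn into a quick sanity check. This tiny sketch just applies the roughly-four-characters-per-token estimate from the article, nothing more precise than that.

```python
# Rough token estimate using OpenAI's rule of thumb of about 4 characters per token.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

print(estimate_tokens("What can I eat today"))   # ~5 tokens for a 20-character prompt
print(estimate_tokens("word " * 1500))           # ~1,875, in the same ballpark as 2,048
```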
And this is important because every single LLM has a different token limit. You can see it down here. Right now, at this minute, GPT-4 Turbo and also GPT-4o, and a lot of other models, have roughly a 128,000-token limit. We also have models with a 2-million-token limit, and we also have smaller open-source models that have only a 4,000-token limit. The important thing to understand is that as soon as the token limit is reached, the LLM will no longer remember the things you talked about with it previously. I just want to show you an example in ChatGPT. I simply tell the LLM: write a story about a fox, and now our first tokens get generated. As soon as we get close to the token limit, so as soon as I talk, for example, about other stuff: let's just assume that I want to have different stories in this chat, for example, "tell me a story about a frog". Right now, of course, new tokens get generated. And as soon as we reach our token limit, the LLM will no longer know our previous question, and also not the answer, because the LLM always only knows the last few tokens. In the case of ChatGPT, the context window is relatively big: it knows 128,000 tokens. That is roughly 100,000 words, like I said. And beyond that, it will no longer get what we talked about previously. So please, please, please remember: only the last few tokens count, and everything beyond that will no longer be in the knowledge of the LLM, you can put it that way. Of course, we have a lot of techniques to increase this knowledge, for example RAG technology and so on; we will talk about this later. But for now, you need to understand that every LLM has a token limit. Eventually this may go away; eventually the token limit will be so big that we no longer need to think about it. But right now, at this minute, we have these limits, and we need to know this. Basically, if you ever wonder why the LLM no longer knows what you talked about previously, it's simply because the token limit has been reached.
learned how an LLM works. Basically, we have
just two files. We have a parameter
file and a run file. The run file is just some code
to run the parameter file, and the parameter file is simply a lot of texts
from the Internet, but it is compressed down into a small file similar
to a CIP file. We need a lot of GPU to do this. This was the pre training. After the pre training
comes divine tuning. Here we feed the
LLM questions and answers so that LLM can learn
how we want our responses. And after divine tuning, the final step is the
reinforcement learning. We simply ask questions, get answers, and rate the answers if they
are good or not. And with this last phase, the LLM will get
better at these tasks. You have also seen that in the background works, the
transformer architecture. These are neural nets and neural nets they make
calculations with numbers. That's why we need to
divide our words in tokens. With these tokens, we can
make the calculations and calculate what word comes most likely as the next
word what we want to have. You need to understand
these tokens because every LLM has a so
called token limit. As soon as the token
limit is reached, the LLM will no longer know about what things
you dogs previously. It always looks at
the last few tokens, and of course, the token
limits is the model dependent. Sometimes it's 4,000 tokens, but it can go up
until 2 million. And one last thing, of course, it's really important
what questions we ask LLM because with
good questions, we get good answers. This is called
prompt engineering, but more on that,
of course, later. I see you in the next video, I know we did this
a little bit fast, but I think this technical
detail is everybody should simply have a grasp
of this. So we did it fast. We did it not in
complete detail, but this is more than enough
to work with this model. You need these
technical details in order to understand
that you have not unlimited questions here before JGBT forgets the stuff, and you also need to
understand it because prompt engineering is really important to get good outputs, and you only get good output
if you give good input. It's called prompt engineering. I want to talk about
prompt engineering in the next section.
4. The Interfaces of LLMs: In this video, I want to show you some of the most important LLMs and, of course, their interfaces. You already saw that we have a lot of different LLMs, and we can find countless LLMs on these chatbot arenas. The most important ones are, at least how I see it, ChatGPT from OpenAI, Claude from Anthropic, Gemini from Google, and eventually also open-source models, which we can use either on Groq or with Ollama. We want to start with ChatGPT, because I think it is, at least right now, the best one. Yes, some people love Claude because Claude is also really good at coding; so basically, yes, they can all code. I want to show you the ChatGPT interface in detail, because if you understand ChatGPT, you also understand every single other interface. This right here is the bar where you can type your questions, and these questions we call prompts. And of course, prompt engineering is the art of writing the right questions. If you want to upload stuff in ChatGPT, you have this right here: you can attach files, you can upload pictures or PDFs and so on, and you can analyze them. This right here is the "Search the web" button. If you press on it, ChatGPT will search the web. Let's just test this out once. If we press on search, we can type in "Bitcoin price today". Here you can see we get the text back, and we also get some links we can click if we want. These are the sources, and if you press on them, you can see that ChatGPT searched the web; it used CoinMarketCap and so on. Now, if you start a new chat in the left corner, it's empty once again, and your old chats are right here.
The next thing you can do is, of course, press on the model selector and use different models. We have the normal GPT-4o, great for most tasks. We have GPT-4o with canvas. If you press on this, canvas is also really nice. Let's just say you want to generate some code: "Give me the Python code for a snake game." ChatGPT will open up this canvas, and in this canvas we can edit the code a little bit. This is really nice. Here on the right side you can click: you can review the code, you can port it to other languages like JavaScript or something else, you can fix bugs, you can add logs, and you can add comments if you want. For everybody who codes, I hope you get what I mean. If we generate normal text with this canvas, it's also nice, because we can edit our text in the canvas as well. We can suggest edits, we can adjust the length, so we can make it, for example, shorter if we want, and if we send it out, it will get rewritten but shorter. And there you see it: we have basically the same text, but a lot shorter. Then we can adjust the reading level, for example for graduate school or for kindergarten. The next thing is that we can add a final polish. If we press on this, ChatGPT will do it completely automatically; it will simply rewrite and restructure the text a little bit. Maybe something is wrong or a little too short, and you see you get better output. And the last thing: of course, we can also add images if we like, and there we have nice little images. Besides this canvas, we also have o1-preview. o1-preview is the model that thinks. If we give ChatGPT a hard task, it is able to think a little before it gives an answer. "Is this a good YouTube title: I like it on Mars? Think about keywords, click-through rate, and more." And ChatGPT will start to think. So you see ChatGPT is thinking; it generates itself some tokens. Here you can see the thinking process, and then it can come up with better answers, because it always gives itself new tokens to think things through, and there we have our output. Besides o1-preview, we also have o1-mini. This does basically the same thing, but it's faster. And if you press on more models, right now we have GPT-4o mini and the GPT-4 legacy model. If you just want to have temporary chats, you can use those too. If you go on this question mark, you see that you can report illegal content, you can use shortcuts, you have terms and policies, release notes, the help guide, and so on, and this area down here simply shows your name. In the left corner is your account. If you press on it, you can upgrade your plan. I pay 20 bucks a month right now, but you can also start for free. If you use the business plan, you need to pay 25 bucks a month. Basically, you get the same thing, but the most important part is that your data will be automatically excluded from training, so it is a little bit safer. On the left side, you can also close the sidebar and bring it back. You can press on "Search chats", and here you can search the chats that you already had with ChatGPT. And if you press on this right here, "New chat", you get a new chat. Then you have these things right here. These are called GPTs, and I want to show you more on GPTs later. But if you press on "Explore GPTs", what you can basically do is search for specific GPTs that other people have made. If you want to do, for example, programming, you can click on Programming and find specific GPTs that are tailored for programming. This is a GPT for Python, and if you press on "Start chat", you can simply chat with this GPT, and it is, like I said, specifically for Python. That's basically the ChatGPT interface.
If we go into Claude, you see that the interface is relatively similar. Here you can type in what you want to do. You can also upgrade Claude. This interface is a little bit simpler, but basically it does the same thing as ChatGPT, just simpler, like I told you. "Make snake code", and Claude will also give me snake code, and Claude will also open something like a canvas. This right here is Gemini. Right now, Gemini is in German for me. Gemini is also a normal LLM, and it can do basically the same things as ChatGPT and Claude. This right here is Groq, and on Groq you can basically use open-source LLMs. The interface is minimalistic: you can type in your stuff right here, or you can also talk to these things. And by the way, you can also install ChatGPT on your PC and have it as an app, and you can also install it on your smartphone and talk to ChatGPT. This right here is the ChatGPT app, and if we talk to the ChatGPT app, it will answer. "Hey, ChatGPT, tell me a small story about a fox." "Once upon a time, in a lush forest, there lived a clever fox named Fiona. Known for her quick wit, Fiona loved to explore and learn about everything around her. One day, she stumbled upon a trap set by hunters. Using her cunning..." That's basically the advanced voice mode. I think this is right now a paid feature. So if you pay for ChatGPT and simply install the application on your local PC, you can use this advanced voice mode. And the last thing I want to show you is Ollama. If you download Ollama, it will run locally on your PC. Don't worry if you don't want to do this; I just want to show you how it works. You simply press Download right here, then you can go on Models and search for the models that you want. The next thing you need to do is go into your terminal, and this thing will work locally. In your terminal, you can pull the models that you want to use. For example, for Llama 3.2, you can simply run "ollama run llama3.2". If you copy this and throw it into your terminal, you can download these Llama models or run them; if they are already installed, they will just run. I have this installed, so now you can also do this stuff right here: "Tell me a story about a rock", and then Llama will tell me a story about a rock.
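If you prefer to script this instead of typing into the terminal, here is a minimal sketch, assuming Ollama is installed and serving on its default local port; this Python call is my addition, not something shown in the video.

```python
# A minimal sketch of calling a locally running Ollama model from Python,
# assuming Ollama is installed and listening on its default port (11434).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",                        # the model pulled with "ollama run llama3.2"
        "prompt": "Tell me a story about a rock.",
        "stream": False,                            # return one complete answer, not a token stream
    },
    timeout=120,
)
print(response.json()["response"])
```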
This is especially cool for data privacy, but of course, there is no nice interface in Ollama itself. You can link it together with, for example, AnythingLLM, but that is too big for this course, because we need to learn this stuff fast. So basically, if you want to run these things locally, you can totally do that. But for the most part, if you are starting out, just use ChatGPT in the standard interface. So in this video, you saw all the interfaces that are important if you want to use LLMs as fast as possible. In the next video, I want to show you what LLMs can do.
5. What can LLMs do?: This video will give you a quick overview of what LLMs can do, and it does not matter which LLM you are in. Basically, most of the frontier models can do the same things, and the open-source models will also catch up over time. Every single LLM can make text or code bigger and make text or code smaller. So you can summarize text or expand text. Let's just make an example. You can type in a few words and get a lot of words: "Give me a marketing text for my website, AI with Arnie." Now, I do not really have this marketing text. Right now I am using the o1-preview model, just because it was active. ChatGPT thinks a little about what marketing text it should write, and then I get my answer. So here you see: we turn a little bit of text into a lot of text. I hope you get what I mean. Next, we can summarize text. This right here is an article on Medium about LLMs. You can simply copy a bit of text, throw it into ChatGPT, and say "summarize in bullets". So basically, you can summarize text. And there you have it: now we have some bullet points about this text. Same thing with code. You can generate code. We can do something like this and create a lot of code really fast: "Give me the code for an HTML web page that has three buttons. You can only turn on two of the buttons at the same time. It should illustrate that it is not possible to be broke, smart, and busy at the same time." Now it will generate some HTML code. There is the code; let's just see if it works. I copy the code, I make a new text file, I throw the code into the text file, and I save it as .html. Yes. And I open up the web page: broke, smart, busy. It does not work to have all three, because, think about it yourself: yes, you can be smart and busy, but then you are not broke, because you work on the right stuff. If you are broke, you cannot be smart and busy, because if you were smart and busy, you would not be broke. But you can, of course, be broke and busy but not smart. And if you are broke and smart, you are not busy, because you do nothing. It simply does not work that way. And of course, if you have a lot of code, for example on a web page, you can also try to make the code smaller. So yes, you can also summarize code if it works. You can also generate tables if you want. This, for example, is a table about the macros of a banana. So text can, of course, also be tables. And now comes the fun part, because LLMs can also use tools, like a calculator, a Python interpreter, or a diffusion model. A diffusion model makes pictures. I want to show you: what is three times 98 times 98? If we send this out, you see that it is analyzing. So basically, it uses tools. I think ChatGPT will simply write itself a small Python script to do this. If you press on "view analysis", you see that ChatGPT uses a Python interpreter to give us the result.
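Behind that "view analysis" button, the tool call boils down to something as small as this; the exact script ChatGPT writes may differ, this is just the arithmetic.

```python
# What the Python tool essentially computes for "three times 98 times 98".
result = 3 * 98 * 98
print(result)  # 28812
```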
"Make a picture of a banana", and ChatGPT will use a diffusion model like DALL-E to create this picture. And there we have the banana. Of course, we can also analyze stuff. Let me show you what is in a dataset. This is a dataset with some social media stuff, basically people's usage: where they are, whether they are on Snapchat, TikTok, Pinterest, and so on. And you see this is a really, really big table, and we can analyze this stuff. Here it gives me a table. Excuse me, right now it's in German, but we want to talk in English. And that's basically also the next thing I wanted to show you, because, of course, LLMs can also translate. So here is text in German; you can simply say to ChatGPT, "translate this into English", and you can also do it vice versa. The dataset contains 1,000 rows with the following columns: user ID, app, daily minutes spent, posts per day, likes per day, followers per day. And here you get everything. So you see Pinterest, Facebook, Instagram, TikTok, and LinkedIn; daily minutes spent, posts per day, likes per day, follows per day. "Make a chart out of this", because we can use tools, you know. ChatGPT will use a Python chart to create a nice graph for us. And here we have it: Facebook, Instagram, LinkedIn, Pinterest, and so on.
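To give you a feeling for what ChatGPT's data-analysis tool does behind the scenes, here is a minimal sketch of that kind of chart in Python; the file name and column names are my assumptions based on the example, not the actual file from the video.

```python
# A sketch of the kind of script ChatGPT's Python tool might write for this chart.
# "social_media_usage.csv" and the column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("social_media_usage.csv")
avg_minutes = df.groupby("App")["Daily_Minutes_Spent"].mean()

avg_minutes.plot(kind="bar", title="Average daily minutes spent per app")
plt.ylabel("Minutes per day")
plt.tight_layout()
plt.show()
```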
And of course, if you press on it, you can switch to an interactive chart. We can also use different colors, if you like other colors. And if you like it, you can first make it bigger, and, like I said, you can also download it with this button. And ChatGPT also understands the context of this chat. "Make a pic that illustrates the dataset." ChatGPT will simply understand that this is about social media, and most likely we will get some people using a phone or something like that. At least that's how I would guess it. And there we have it: this is a social media picture, and of course with some data, because this is a dataset. By the way, this is called function calling. We do not have enough time to dive that deep into these things. Just think about it this way: every time ChatGPT or an LLM is not smart enough on its own, it will use different tools to get the job done. Andrej Karpathy also likes to say that the LLM is our new operating system, like a computer that can use different tools. And on the topic of tool use, please also do not forget that they can use the Internet. They can use the Internet to search for live information; I already showed you this in the last video. And also important, before we talk about training our LLMs: of course, they are also multimodal. This means they can hear, speak, and see. Hearing and speaking you already saw in the last video; I just want to show you that they can also see. If you are in ChatGPT, you can upload pictures. For example, this: this is a picture from Hugging Face about reinforcement learning, and yes, it looks complicated. "What is in the pic? Explain it like I am five." And by the way, yes, the quality is awful. Let's just see if ChatGPT can get it. Yes, it gets it. "Start with a language model. Imagine the computer is like a child who already knows some words and sentences", and so on. "Then give it a reward, make it practice, these combined learning steps..." This is reinforcement learning from Hugging Face; it is basically this picture right here from Hugging Face. This is the good quality version; in ChatGPT, I purposely uploaded the bad quality, but even with the bad quality, ChatGPT can see it and can explain it like I am five. So LLMs can also see, speak, and hear. You can also train different LLMs. We can train LLMs with prompts; this is the so-called prompt engineering. We can also use RAG technology or fine-tuning. I want to dive deeper into prompt engineering in the next video, because prompt engineering is really important. In this video, you have learned that LLMs can do a lot of things. First, they can generate text. Second, they can summarize text. Third, they can create code and also make code smaller, and they can use a lot of different tools in order to analyze data, to create pictures, to use a calculator, and to do a lot of cool stuff. Just think for yourself about what is most important for you. You can do whole tasks with an LLM. Just think about it this way: you can write a story about a company that is doing well, for example. Then you can make some calculations about how it will do in the future. Then you can make some tables about how it is doing. And lastly, you can make a picture of a happy investor. That is a whole presentation. So ChatGPT and LLMs can really help you a lot.
6. Prompt Engineering: Let's talk about prompt engineering. This guide comes directly from OpenAI, the company behind ChatGPT. And yes, ChatGPT, or rather the OpenAI models, are also included in Microsoft Copilot. This right now is in German, but of course we can use Copilot in the English version too, and yes, we can also use it with a white background; this is simply the theme that I use. Later, we will use it with the white background. Let's come back to prompt engineering. Prompt engineering is important because if you don't give good input, you will not get good output. I want to show you prompt engineering in Microsoft Copilot, but this works exactly the same in ChatGPT and every single model under the sun, because these concepts are always the same. You can read this resource yourself if you like, but we want to do this as fast as possible. We do not have the time for every single prompt engineering technique, so we keep it fast. This right here is an example of a really, really bad prompt: "Give me an article about smartphones." Why is this prompt bad? This prompt is bad because we don't give any context. So if we send this out, and we use, for example, the balanced mode right here, we will most likely get an answer, but the answer is not specific, because we don't give specific input. And boom, there we have our output. So here is an article from The Guardian. We simply have an article, and we have a link we can click on. Now, this is a bad prompt, and we have to expect bad output. Why is this output bad? I wouldn't necessarily say that it is a truly bad output; it's just the output we asked for. We asked for an article, and we got an article that is not specific. Maybe you had something in mind that you wanted to post on your blog, but you can't do that with this article. This output is simply bad because we didn't give any context. Now, I will tell you right now: it is really, really easy to give context. And in order to give context, you only need to understand one key principle. This key principle is called semantic association. What does semantic association mean? Let's just assume that I tell you a word, or two words, or ten words. Let's just assume I say to you, for example, "Greek god". With these two words, you immediately have a hundred other words in your brain, maybe also a hundred other pictures in your brain. You have different Greek gods in your head. Maybe you also have different pictures of Greek gods in your head. Maybe you also have ancient Rome in your head. You have things like a good body in your head. You have a lot of different stuff in your head.
And that is basically the whole concept of prompt engineering. We need to give context. We need to use semantic association, because all these large language models, so Copilot, which uses ChatGPT, all of them, are associative. If we tell these LLMs just one or two words, they have all the other related words in the background; they have this in their knowledge. If we say, for example, "smartphone", they have a lot of different words that are related to smartphones. Why do they have this? Because they are trained on text, as you know. They simply draw on the texts where the word "smartphone" appears a lot of times. If we give them a few more words, all of this gets more precise. We can give them, for example, words like "Apple" or "Android", or "blog article" if you want to make a blog article, and much, much more. The key concept right here is that with a few words, you give a lot of context to LLMs, because they are associative. Let's just make one example. We press "New topic" and start from scratch. We use a balanced output, and I tell Copilot something like this; this would be a prompt that makes a lot of sense. We start with something like: "You are an expert for smartphones." Why do we do this? This right here is called role prompting. We give the large language model, in this case Copilot or ChatGPT, a role: it is an expert for smartphones. And then we give some more context: "You know the Google Pixel 8 Pro in detail." Why is this important? Because if we tell it that it is an expert for smartphones and that it knows the Google Pixel 8 Pro in detail, it will draw on texts where all of this is included. So we get really good expert output for smartphones, and the LLM will draw on articles about the Google Pixel 8 Pro. And then we tell the LLM exactly what we need: "We need a 600-word article on why the Pixel 8 Pro is good." We want to have a positive article; this is also key. And this right here is the semantic association that I talked about. Of course, all of this is related to semantic association, but this part especially: I just include three terms, "Gemini Nano", "LLM", and "on-device". These are simply three terms, and if we use them, the LLM will draw on articles where all of this is included, because for me this is important. This is one of the key features that makes the Google Pixel 8 Pro so good, at least in my mind: we have Gemini Nano, a small large language model that runs on-device. We can also include things like "no latency" if we want. So don't worry if you don't get it right now, because we will get an article. If you are an expert in something like this, you can simply tell the LLM that it is an expert too. We simply tell it that it is an expert for smartphones, that it knows the Google Pixel 8 Pro, and then we give it some words that we need or want to include in our article, and the LLM will find the right stuff for us. So we send this out, and I am relatively sure we will get output that is a lot better.
Of course, you can also include things like "write the article for a 10-year-old" if you want to make it really, really simple, because the semantic association can do that too; then it will draw on texts that are really easy to understand. But right now I don't want to do this. I simply send it out, and we will get a good article, which we could maybe also include on a website. And here we have our article, and I hope you see that the output is completely different from before. "As a smartphone expert, I can tell you that the Google Pixel 8 Pro is an excellent device that offers a range of features and capabilities that make it stand out from the crowd. Here are some reasons...", and so on: the design and build quality, the camera, the software, the Gemini Nano LLM. "The Pixel 8 Pro is powered by Google's Tensor G3 chip." Of course, you can also be more specific, like, for example, "make this article for my website" or "make this article as a Twitter thread" or something like that. "Make the article a Twitter thread. Readers are students of tech, so include details." And we will get every single detail, and the format will be okay for a Twitter thread. So now you see we have a lot of details: we talk about the software, we talk about how many megapixels and sensors our camera has, and much, much more. And we can also make it simpler. Let's just say you want this article for 12-year-olds: "Make the article for 12-year-olds." It will most likely drop the words that are a bit harsh for our younger readers. You see it immediately: "One of the best things about the Google is the camera", and so on. We don't use all of these harsh words, and we get easier output. And that's basically all you need to understand if you want to start writing your prompts immediately. You need to make structured prompts. This right here, for example, is a structured prompt, because we start with a role; this is also called role prompting. In the next video, I will give you some more quick examples. We start with the role, so "you are an expert in X, Y, and Z, and you know maybe some details". Then we use our structured prompt to tell the LLM exactly what we need: we want an article that is roughly 600 words long about the Pixel 8 Pro, and it should explain why it is good. And then we trigger the semantic association with just a few words. You don't have to use exactly these words; it's just important that you include some words like these.
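If you like to keep such prompts reusable, here is a minimal sketch of that role-task-keywords structure as a small Python template; the wording is just the Pixel 8 Pro example from above, not an official recipe.

```python
# A minimal sketch of the structured prompt pattern: role, exact task, then a few
# keywords that trigger semantic association. All strings are just the example above.
role = "You are an expert for smartphones. You know the Google Pixel 8 Pro in detail."
task = "Write a positive article of roughly 600 words about why the Pixel 8 Pro is good."
keywords = ["Gemini Nano", "LLM", "on-device"]

prompt = f"{role}\n{task}\nMake sure to mention: {', '.join(keywords)}."
print(prompt)
```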
So this video was about prompt engineering. I just want to tell you that LLMs are relatively simple to understand, because if we break it down to the key principles, they can only do two things: they can make text bigger and they can make text smaller, and we need to use good prompts in order to get good outputs. We need to trigger the semantic association. We can do this with structured prompts: we can give, for example, a role, we need to tell the LLM exactly what we want to have, and we need to make sure we use a few words that are related to the things we like. Of course, there are a zillion different prompting concepts; we have chain of thought, tree of thought, and much, much more. I have other courses that cover this in detail. But in this course, I want you to be able to work as fast and as efficiently as possible. In the next video, I will show you one or two more tricks that are important for prompt engineering, and then you are ready to rock as fast as possible. Just remember: give context in order to get good output.
7. More Prompt Engineering Tips: In this video, I want to give you a few more tips and tricks for making efficient prompts for ChatGPT, or in this example, of course, for Copilot. So let's just see what tricks I have for you to work fast. Of course, you already saw role prompting: just give the LLM a role, "You are an expert in XYZ." We covered this in the last video. But this right here is completely new: shot prompting. With shot prompting, you simply give examples. What does this mean? You can say, for example, "You are a copywriting expert, and here is a copy I like", then you simply paste the copy and tell the LLM to make a similar copy for X, Y, and Z. And these two phrases right here are really, really cool: "Take a deep breath" and "think step by step". Why do these two things work? I want to explain. "Take a deep breath and think step by step"; you can also throw them together. This works simply because the LLM will then actually work step by step. This is not only better for you, but also for the LLM. Let's make one quick example. Let's assume that you want to install Python, for example, but you know nothing about Python. If you simply type in "how to install Python", the probability is relatively high that you get an output that starts at a point you don't understand; maybe it starts with a step you don't understand yet. This is not only problematic for you, but maybe also for the large language model. If the LLM is not trained on the perfect text, it always makes sense to tell the LLM to think step by step, because the LLM will then start with things like "let's open up the Chrome web browser". So this is the first step. If you tell the LLM to think step by step, or maybe also to take a deep breath, the LLM will simply start at the first step, and the first step is most likely to open a web browser. After this, you need to type, for example, "Python" into Google. And if you see all of this, you get, first of all, better output, and the LLM can associate more things, because the LLM also has new words to work with: it starts to write things like "Google Chrome", "search for Python", and so on. And at that point, the LLM has more material in its own context window. This is really, really practical. So this is a tip I really cannot stress enough: take a deep breath and think step by step. And by the way, I don't make this up. There are studies out there that show that these two sentences make the output better. And here comes a funny one. Something like this also works really, really well: "I give you 20 bucks", for example. So we give ChatGPT, we give Copilot, we give the LLM a nice little tip; we give it some money, or at least we offer some money. This sentence also makes the LLM create better output, simply because we say we will give some money. Now, don't ask me exactly why this works. I just know that it works, and I know that there are studies out there that also tell you it works. So you simply need to understand: by adding sentences like "take a deep breath", "think step by step", and "I give you 20 bucks", you will get better output from Copilot. So write this down; this is important to me.
The role prompting you already understand. For the shot prompting, I want to give you an example right now. We open a new topic, and let's just assume that I really want a copy for something. We can start with something like this: "You are a copywriting expert. I like this copy." So we simply start with our role; we give it the role of a copywriting expert. "I like this copy." And now we include a copy that we like, and we do it this way. This stuff right here that I include is simply the copy, or at least a part of the copy, from my course "All of AI". So we have a copy that I really, really like, because I have written this copy myself, and then we can tell the LLM a lot of different things. I make this a little bit shorter, just to show you what this is all about. Right now, I also show you a nice little trick: "Answer only with OK." You can always do this to save some tokens. So we send this out and we get an "OK" back. And after the OK, we can simply tell the LLM more. So you see, we have the OK back, and now I can tell the LLM what we want next. The LLM has the copy, or at least a part of the copy. Remember, LLMs are associative, so they understand how the copy is structured. We get our OK back to save some tokens, and now we tell the LLM what we want right now: "Give me a similar copy, but for a course named Microsoft Copilot." This is important, because I use this a lot just to get more ideas for my copy. This is really, really practical. So first, you have written a copy yourself, or you found a copy on the Internet, or whatever. You give this as an example, and you tell the LLM to answer only with OK. You get your OK back, and now you can ask for the next task: for example, "Give me a similar copy, but for the course named Microsoft Copilot." And here we have a similar copy: "Welcome to the Introduction to Microsoft Copilot course, your journey into the world of AI-powered code completion." If we scroll up, this starts similarly to my original copy: "Welcome to All of AI: GPT, Midjourney, Stable Diffusion, and app development. Your journey into the world of artificial intelligence. This masterclass is perfect for anyone", and so on. And the same is true right here: "This course is perfect for anyone." So you see, we use a similar style, but not exactly the same words. Now, this is really, really cool, and this is the strongest feature of the shot prompting. Let's just go back to this nice little overview right here. So you already saw how shot prompting works: we simply give examples, and we will get similar output, but not the same output.
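If you ever move this workflow into code, the same shot-prompting trick maps naturally onto a chat message list. This is just a sketch with placeholder copy text, not the real copy from the course.

```python
# A minimal sketch of shot prompting as chat messages: example copy first,
# a cheap "OK" turn to save tokens, then the actual request.
# The copy text is a placeholder, not the real course copy.
messages = [
    {"role": "system", "content": "You are a copywriting expert."},
    {"role": "user", "content": "I like this copy:\n'Welcome to All of AI, your journey into AI...'\nAnswer only with OK."},
    {"role": "assistant", "content": "OK"},
    {"role": "user", "content": "Give me a similar copy, but for a course named Microsoft Copilot."},
]
```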
If you use shot prompting, you don't need "take a deep breath", you don't need "think step by step", and you also don't need to say that you will give money, because you have a nice example, and the LLM is associative enough to understand what you need. Those phrases matter more if you don't use examples. If you use normal role prompting, then it makes a lot of sense to include "take a deep breath", "think step by step", or "I give you 20 bucks" at the end of your text. The key concept is always: you need to give context. Right now I'm not sure how to write this in English; maybe this wording is a bit better. And you always need to understand that the tokens are not unlimited. Because of this, you already saw in this nice little example that we use something like the OK trick, so "answer only with OK". This is just to save some tokens. So you don't want to throw in endless examples and endless stuff that doesn't make a lot of sense. You always need to understand that these LLMs are associative, and you will get precise or short answers if you tell them "answer only with OK". Then you can ask your next question, and that's basically it. So in this video, you have learned a lot of cool tricks. You should include "let's think step by step", "let's take a deep breath", and you can also offer some money. You will get better output if you do it like this. If you have the chance to give examples of things you like, you should totally do it, and that is just called shot prompting. The key concept is always to trigger the semantic association. So you need to give context, but you need to keep in mind that your tokens are not unlimited, and for that reason, you also have the trick of just asking Copilot for a quick OK as the answer. Because remember, the token limit always counts against you: it counts what you put in, but also what the LLM spits out. All of this counts against your token limit, and sooner or later, your token limit will be reached and the LLM no longer understands what you are talking about. A lot of tips and tricks in one video, but I really, really recommend you try all of this out.
8. Customizing LLMs with System Prompts and RAG (Retrieval Augmented Generation): Let's talk about training LLMs. We have two options: we can train them either with prompts or with RAG technology. First, I want to show you what RAG technology is, then we start with prompts, and then we will use RAG technology. You already know we have ChatGPT, and we simply call it GPT. ChatGPT can answer questions. Sometimes it's not smart enough on its own, so it can go on and use different tools; you already know this, for example the Internet. It can go onto the Internet and search for different things. But let's just say you want to train a GPT on your own data, say on data from your own business, or on your own marketing text, or whatever. Now you have two options: you can either do this with prompts, or you can do it with a vector database. We will not explain the vector database in depth, because you just need to learn how to use this stuff quickly. Basically, what you can do is upload a lot of context in a file, and then ChatGPT will browse your file and have all this knowledge. I want to show you one or two tricks, first with prompts and then with a vector database. The easiest way to customize ChatGPT is the system prompt. If you press on this thing right here, you can go to "Customize ChatGPT", and here you have the system prompt. You can simply fill this out: "What would you like ChatGPT to know about you to provide better responses?" And if you press on it, OpenAI helps you: Where are you based? What do you do for work? What are your hobbies? What subjects can you talk about for hours, and what are some of your goals? So just type this in, and then ChatGPT will give you different, better outputs. Let's just make an example: "I live in Italy but speak German. I am an AI educator. My interests are LLMs and diffusion. I like to talk about AI. My goal is to make a good course." And the next thing is even more important: "How would you like ChatGPT to respond?" If you press on it: How formal or casual should ChatGPT be? How long or short should the responses be? How do you want to be addressed? Should ChatGPT have opinions on topics or remain neutral? "Remain neutral. Call me Arnie. Your answers are short and, if possible, bullet points." Now we press Save, and now our model is tuned to our specific data; the model simply reacts a little differently. So let's make a quick test: "ChatGPT, can you give me some info about the election?" We also use the web search, because we had the election right as I am recording this course. It is searching the web, and ChatGPT tells me that November 5 was the election. So you see, it's really short and concise, and we get some links. But ChatGPT does not call me Arnie. Why is this? I will show you. If we go into a new chat and do it without the search, let's make a different example, because this does not work that well if we use the web search: "Hey GPT, I want to market a course. Give me some examples of how to do it." I would guess that ChatGPT now tells me, "Hey Arnie, you can try this", and then some bullet points like "boost on social media" and so on. "Hey Arnie, all right. Let's dive into some powerful marketing", and so on: use engaging social media previews, run a free webinar, leverage email marketing, create a lead magnet, collaborate with influencers, and so on. So you see, it is short, it is concise, and ChatGPT calls me Arnie. This is basically the system prompt, and with the system prompt, you can customize ChatGPT. Of course, you can also use shot prompting, but I have already told you how shot prompting works: just give an example. Now I want to show you how the RAG technology works, because this is the most powerful tool if you want to train an LLM. In ChatGPT at this moment, I think this is a paid feature. You can press on "Explore GPTs" and search the GPTs.
You already know this. But you can also press "Create a GPT", or you can go to "My GPTs" if you already have GPTs. I just want to show you one GPT, for example this diffusion prompt GPT. It is specifically trained to write prompts for diffusion models; diffusion models make pictures. If I press here on "cat", I will get a prompt for a cat, and the prompt will be specifically tailored for Midjourney and also includes camera lenses and so on. So here you see, this is a perfect prompt, and I can use this perfect prompt to make good pictures in a diffusion model. Now I want to show you how this works, how we can train these things. If we go back once again to "Explore GPTs" and "My GPTs", we go to this diffusion prompts GPT and press "Edit GPT". You see that we can give it a name and a description, then the instructions, so how the GPT should behave. And lastly, we can also upload documents, documents where we give examples. We will now do this from scratch. Let's make an example. Let's say we are a company, and in this company we want a GPT that does the onboarding for us. So, Create. We don't go to Create, but to Configure. We call it "Onboarding". "Onboard new members." I want to keep this really simple: "You are the CEO of the company AI With Arnie. Your goal is to onboard people. If they have questions, you search your knowledge and give them info." So this is basically a really simple system prompt that we can give right here. Now we can also add some conversation starters if we want. All the people who start working at my company just ask me these two questions: Where is the toilet, and when is lunch? So these are some starter questions; come on, you can think about what you want to include yourself. Then the knowledge: now we can upload files. And now we make a simple file. This could be a PDF, this could be a text file, or something else. We simply do it with a simple text file that I am creating right now, and here I write some info, but this could also be a big PDF with 50 pages or something. And this is the info that the people need to know: "The toilet is not here. We do not pee at our company. We have lunch when work is done. We work seven days a week. We do not have holidays. If you want more info, go here", and here we can also give a link if we want. I just do it with my free Skool community, but this is in German. So let's just make an example; we include this right here. Now we save this,
we come back into Jet GPD and we upload
our knowledge. So upload files. This
is basically the file. Now we can also use other tools. We do not need the web
search and we do not need Dali as the image
generation for this GPD. But let's just
assume you want to have the data analysis included. But I think also this is
not really necessary. What you also can do if you are a programmer is to
create new actions, but I think this is not really the point of this
fast little course. If you press and
create new actions, you can basically put in peichm and include
the different URL. You can basically also call
different API and boots from. But like I said, this is
not the point right now. We press Create, we give it anyone with a link,
and we press safe. This is the link that
we can share with the people that work
at our company, and we press view GBD. And then we can simply ask, so where is the toilet? And if I ask, hat
GPD will say most likely that the company
does not have a toilet. So basically, you
can see it here. It appears that our company does not have designed toilets. I started, the
toilet is not here. We do not need to
pee at our company. And if you want more info, you can press on this link, and basically you are here. Then the next
question, let's just say when do we have holidays? We work seven days a week and always the
link to our company. Now, let's just say you do not want to have this link anymore. You can also go do
this right here. You can always
customize the GPT. Explore GPT, my GPT, then here on Edit GPT, and here on Configure, you only give the link if people ask about more
info and update. View GPD when we have holidays, we don't have holidays at our company and we work
seven days a week. This is basically how
you can train an LLM. You can use system
prompts and you can type in how ChtGBD
should behave. Then you can use
normal prompts in the interface with
the shot prompting. You already know this. And lastly, you can also use direct technology and
train your own GPD. And this GPD, you
can also share it with other people so you
can send them the link. This is the so called
direct technology. Here works a vector database. We don't need to do a
deep dive in these, but just make yourself clear. You can give instructions
and you can upload files, so the chat GPD can
browse these files and has specific infos about
you or your company. And yes, working at my
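This is only a minimal sketch of the retrieval idea, assuming you have the openai and numpy packages installed and an OPENAI_API_KEY set; the "database" here is just a Python list, while a real vector database does the same thing at much larger scale.

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Our "knowledge": the same onboarding facts we put into the text file.
docs = [
    "The toilet is not here. We do not need to pee at our company.",
    "We have lunch when work is done.",
    "We work seven days a week and we do not have holidays.",
]

def embed(text):
    # Turn text into a vector of numbers that captures its meaning.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

doc_vectors = [embed(d) for d in docs]

def retrieve(question):
    # Find the stored snippet whose vector is most similar to the question.
    q = embed(question)
    scores = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    return docs[int(np.argmax(scores))]

print(retrieve("Where is the toilet?"))  # returns the toilet snippet, which the GPT then answers from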
And yes, working at my company is not fun.
9. Perplexity and HuggingChat: If you want to explore more tools where you can use LLMs, you can take a closer look at HuggingChat. HuggingChat is really easy to use. Here you can pick which open source LLM you want to use, for example Llama 3.1, the 70B model, a Qwen model, some models from NVIDIA or some models from Microsoft. Just click on the model you want, type in a system prompt if you like, and press New Chat. And here you also have tools, so yes, these models can use different tools just like ChatGPT: they can use a diffusion model to generate pictures, you can include image editors, they can fetch URLs, and you have a document parser, a calculator, and a web search. So this is basically something like an open source ChatGPT, forever for free. And then we have Perplexity. Perplexity is similar to the ChatGPT search. You can play with it a little bit. I no longer use this tool a lot, because ChatGPT is by now also relatively good with its search tool, but you can try Perplexity if you want. You can start for free, and you do not even have to make an account. Just start for free, see what you like, and maybe you stick with something.
10. Developers Can Use LLMs via OpenAI API: If you are a developer, you can also include ChatGPT in your own apps. You can use it in the OpenAI Playground. This is maybe also interesting for you if you want to use the newest GPT models but do not want to pay 20 bucks a month: on the playground you simply pay as you go, per token. I want to show you how much you need to pay, how it works, and how you can make API calls to these models. The first thing is that you go to the platform, so platform.openai.com/playground, and here you can play with all their models. On Chat, you can play with the chat models. You can use their newest ones, so GPT-4o mini, GPT-4o and so on; you can select whatever you want. You can also add functions, so yes, you can do function calling if you are a coder. I just want to keep this quick, so below is only a tiny sketch of what such a function definition looks like.
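This is a rough sketch, not a full agent: the get_weather function is purely hypothetical, but the shape of the tools parameter follows OpenAI's documented chat completions format.

from openai import OpenAI

client = OpenAI()

# Describe a (hypothetical) function the model is allowed to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How warm is it in Berlin right now?"}],
    tools=tools,
)
# Instead of answering directly, the model asks us to call get_weather("Berlin").
print(response.choices[0].message.tool_calls)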
Then the response format: right now this is text, but you can also use JSON format and so on. Here we have temperature and maximum length; you can read this up for yourself. Basically, if you decrease the temperature, ChatGPT will be more accurate but can get a little bit repetitive; especially for math tasks this is good. And the maximum length simply limits the output, so how long the answer can be that ChatGPT gives you. These are the most important settings right here. Then here in the middle you see the system instructions, so this is basically the system prompt, just like the custom instructions that I showed you in the last video. "You are a helpful assistant", for example, and below it you type in your text just as you normally would: tell me a story about a turtle in the desert. You press Run, and then ChatGPT will talk to you, and you can always use the newest models without a limit, and you always pay as you go. In code, the same call looks roughly like the snippet below.
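A minimal sketch of the same request over the API, assuming pip install openai and an OPENAI_API_KEY environment variable; the model name and parameter values are just examples.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",            # any chat model you can also pick in the playground
    temperature=0.3,                # lower = more deterministic, can get repetitive
    max_tokens=300,                 # caps how long the answer may be
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a story about a turtle in the desert."},
    ],
)
print(response.choices[0].message.content)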
Now I want to show you how much this costs. If we go to the pricing section, we see that we can use GPT-4o, for example, and we pay $2.50 per 1 million input tokens and $10 per 1 million output tokens. Every model has its own pricing. If you scroll down, you can also see the other models. You can use the GPT-4o mini model, which is really, really cheap. You can use o1-preview, which gets a little bit more expensive. You can use the Realtime API, which is really expensive: there it can go up to $200 per 1 million output tokens. That is when ChatGPT talks to you, so in the audio format. And you can also generate pictures with DALL-E if you call those endpoints, and there you pay $0.04 per image. To get a feeling for these numbers, here is a quick back-of-the-envelope calculation.
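This is just the arithmetic, using the GPT-4o prices mentioned above; prices change, so always check the current pricing page.

# $2.50 per 1M input tokens, $10 per 1M output tokens (GPT-4o example prices)
input_tokens = 2_000       # a prompt of roughly 1,500 words
output_tokens = 800        # a fairly long answer
cost = input_tokens / 1_000_000 * 2.50 + output_tokens / 1_000_000 * 10.00
print(f"about ${cost:.4f} per request")   # about $0.013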
If we come back to the playground, I want to show you the Realtime option in the left corner. You can press on Realtime and also talk to these models here. "Give me a small little joke, I want to laugh." "Sure, here's a little joke for you: Why can't you give Elsa a balloon? Because she'll let it go." So that's basically it, and here we pay for the audio output. Then we have the Assistants. These Assistants are basically exactly the same thing as the GPTs, so we can include RAG and all these things, and we can also build our own applications with them. If we go into Text to Speech, you can type in text and you get speech back. So: "Hey ChatGPT, I like you", I generate it, and there you can hear it; the Alloy voice reads back whatever we type in here. From code, a call like the one below is enough.
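A minimal sketch of the text-to-speech endpoint with the documented tts-1 model and alloy voice; how you save the audio can differ a bit between SDK versions, here we just write the raw bytes to an MP3 file.

from openai import OpenAI

client = OpenAI()

audio = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hey ChatGPT, I like you.",
)
with open("hello.mp3", "wb") as f:
    f.write(audio.content)   # save the spoken audio to disk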
Then there is also the completions mode here. If you want to use any of this, you need to press on your account, go to Billing, and insert your credit card: simply go to Payment methods and add your card. Then you give your account a little bit of balance, and this thing will work for you. Of course, you can also set some limits. If you go on Limits, you can cap your spending; right now I have 500 bucks per month as my limit. If you press on Usage, you can always see how much it costs you per day. This was a day where I had to pay five bucks, because I also run some chatbots, and that day a chatbot talked a lot. And if we go to October, this is the usage from October, so right now it is 28 bucks. These are chatbots that I have included in some websites; people are using them, and that's why I need to pay a little bit. If you just play with this a little bit, I think you will only pay a few cents. Here you can see that with $0.13 you can already play with these models. If you come back to your dashboard, you see that you can do a lot more here. You can go on Fine-tuning and fine-tune your own model if you like; this is not really the point of this course. But if you go on API keys, you can also make calls to the API. You simply create a new secret key, give it a name, and then you can copy your API key.
With that key you can call the API in your own applications. If you are a developer, just go to the documentation. You can open the quickstart, and there they tell you what you need to do: create an API key, then call the endpoints, for example in Python, where pip install openai is the first thing. Then they show you, for example, how to generate text in your own application, how to generate an image (that would call DALL-E), and how to create vector embeddings. It's really easy with this quickstart. So if you are a developer, the OpenAI API is really easy to use, and you can call it with JavaScript, with Python, or with cURL. For example, an image generation call and an embeddings call look roughly like the snippet below.
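Two more minimal sketches in the spirit of the quickstart, using the publicly documented model names; treat the prompt, size, and input text as placeholders.

from openai import OpenAI

client = OpenAI()

# Generate a picture (this calls DALL-E 3 behind the scenes).
image = client.images.generate(
    model="dall-e-3",
    prompt="An illustration of a cat relaxing in a city at golden hour",
    size="1024x1024",
)
print(image.data[0].url)   # link to the generated picture

# Create a vector embedding, the same building block the RAG sketch used.
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Onboarding document for new employees",
)
print(len(embedding.data[0].embedding))   # length of the vector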
If you are not a developer, this platform is most likely not for you, but generally speaking it's relatively easy. I like, for example, Flowise, and I use the OpenAI API there to build AI agents. But like I said, this is not a complete deep dive. If you just want to learn this as quickly as possible, this platform is maybe an option for you if you do not want to pay the 20 bucks a month for the ChatGPT Plus interface, because here you can work with the newest models and you only pay for the tokens that you generate, and tokens are relatively cheap. So play around with this platform a little bit and see if it's for you or not. And of course, all the other LLMs have their own APIs too. Google has the API for the Gemini models, Anthropic has the API for the Claude models, and if you want to work with an open source LLM, you can use, for example, the Groq API, or you can run your own server with, for example, LM Studio or Ollama. Many of these local servers even speak the same API format as OpenAI, so you can often reuse the exact same Python code, as in the sketch below.
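A hedged sketch of pointing the same OpenAI client at a local, OpenAI-compatible server. Ollama, for example, exposes such an endpoint at http://localhost:11434/v1 once it is running; the model name assumes you have pulled a Llama 3.1 model locally.

from openai import OpenAI

# Same client, different base URL: no cloud, no real API key needed.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

reply = local.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(reply.choices[0].message.content)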
So you have endless options: you can either run the models locally on your own PC and use your own endpoints, or you can use the different cloud APIs. Like I said, this is more of a general guide for developers. If you want to develop with these things, great; if not, just skip this video.
11. Recap of LLMs: In this section you have learned a lot, and we did it as quickly as possible. We started with the interfaces of all these different LLMs, and you know there are a lot: ChatGPT, Claude, Gemini, you can also use Ollama, you can use Groq, you can use a lot of different interfaces, even HuggingChat and much, much more. All of them work relatively similarly: you always have a nice little chat interface. LLMs can basically do only two things: they can expand text or they can make text smaller. But this is big. You can use code, you can use normal text, you can make tables, and LLMs can also call tools. Tools can be, for example, a code interpreter, a diffusion model, or the Internet, and with them you can analyze data, make charts, and do a lot of cool stuff. Maybe in the future they become a complete new operating system. And by the way, LLMs can also talk to each other, and then we call them agents. You also learned that LLMs are multimodal: they can basically see, speak, and hear. You only get good output if you give good input, and I showed you the basics of prompt engineering. Please remember semantic association: you need to give context. You can do this via shot prompting or role prompting, you should structure your prompts, and there are some tips like, for example, "think step by step". Besides that, we also have chain of thought, tree of thought, reverse prompt engineering, and much, much more, but I think for most people this is overkill; it's not really needed. If you want to customize an LLM, you can totally do this. The easiest way is probably the system prompt: you simply give some instructions. Then we have RAG technology, so we can upload data, and then ChatGPT or any other LLM can browse this data and react in a specific way. Of course, if you are a developer, you can do all of this over the API as well. You can develop your own apps and do all of this in your own applications: function calling, complete agents with tools like Flowise, creating pictures inside your own applications, using vision inside your own applications; you can do all of it. You have learned the basics of these LLMs. They can do a lot of things, and I think you should simply start using them, because remember: you only learn if you change your behavior. Learning means same circumstances but different behavior. Maybe you did not know how to use LLMs; now you know, but you only truly learn it if you do it. If you want to be a smart cookie, you can share this course, because more people always know more than few people, so everybody can learn together. Thank you for that, and I'll see you in the next video, because this was it for LLMs; now we start to create pictures with diffusion models.
12. The Diffusion Model Explained: This section is about diffusion models, and there are a lot of diffusion models out there. We have DALL-E, we have Imagen, we have Stable Diffusion, we have Sora (Sora makes videos), we have Midjourney, and diffusion models can also make music and, of course, audio. So in this video I want to show you the diffusion process, and then we will dive deeper into some of the best diffusion models. First, how diffusion models work, and we do this really easy and fast. I found a really nice article on Medium; all I need is this picture right here. Let's assume we have a big, big computer and we train it on images like this. We give the computer images, for example of this beach, and we describe each one with text. We give the computer the image and we say maybe: a beach with the blue ocean, blue sky, some green on the mountains, and so on. We are really, really specific. After that, we add some noise to the picture, like you see here, but we still describe what's on the picture: a beach, blue ocean, blue sky, and so on. More noise, same text; more noise, same text; more noise, same text, until you get only noise. In this process, the computer learns what these pictures look like. It learns that the words we gave it lead to this picture. So we can reverse this: if we have only noise and we tell the computer "a beach, blue sky, blue ocean, some green on the mountains" and so on, the computer can reverse the process and turn the noise back into this picture. The little sketch below shows the idea of the forward noising step in code.
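This is a conceptual sketch only, not a real training loop: it just shows that the forward diffusion process mixes an image with more and more random noise, which is exactly the corruption the model later learns to undo, guided by the text description.

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))     # stand-in for "a beach, blue ocean, blue sky"

def add_noise(img, t, steps=10):
    # t = 0 is the clean image, t = steps is (almost) pure noise.
    noise = rng.standard_normal(img.shape)
    keep = 1.0 - t / steps          # how much of the original image survives at step t
    return keep * img + (1.0 - keep) * noise

noisy_versions = [add_noise(image, t) for t in range(11)]
# noisy_versions[0] is the beach, noisy_versions[10] is basically only noise.
# Generation runs this in reverse: start from noise and denoise step by step,
# guided by the prompt.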
Of course, we don't do this with just one picture; we try to give the computer every picture we can find. And there are, of course, different diffusion models. For example, there is also Adobe Firefly. Adobe Firefly is trained on pictures from Adobe Stock. Stable Diffusion is open source and free, everybody can use it, and Stable Diffusion was trained on pictures from the Internet. Because of this, we can create nearly everything that is on the Internet: we can create even celebrities, we can create NSFW (not safe for work) stuff, and so on. Stable Diffusion is not restricted. Nearly everything that is on the Internet we can create with Stable Diffusion if we give the right prompts. The prompts are the descriptions that we give the computer to make our picture, and that is exactly why it's really important to write good prompts: we want good pictures. If we are not specific, we get random pictures. If we simply say "a beach", we will get a random beach. If we say "a beach, blue ocean, blue sky" and so on, we will get exactly this picture. A quick illustration of this process, because some people like it and I use it a lot: imagine you lie down on the ground and look at the sky. Beside you is your girlfriend or your boyfriend or whoever you want, and she says: "Can you see this cloud? It looks a little bit like an apple." But you don't get it; you don't see the apple. Then she says: "Just look, here is the apple", and you start to understand. You see the cloud, and now your eyes see an apple, because your brain is trained on apples. Your brain most likely knows what an apple looks like, and then you see the apple in the cloud, even if there's no apple there. And if your girlfriend doesn't say it's a green apple, maybe you think of a red apple, and that's exactly why we need good prompt engineering. If we are not specific, we get random pictures. If you want a green apple, you need to tell the computer that you want a green apple, just like your girlfriend needs to tell you that the apple in the clouds is green. If she doesn't tell you, maybe you think of a red apple, maybe a green one, maybe even a yellow one; you don't know, so you need to be specific. So in this video we took a quick look at the diffusion model. The diffusion model works simply: it's trained on pictures and on text, then noise gets added, and in this process the computer learns what the picture looks like. If we give the computer text afterwards, it can create these pictures, because it selects, out of the noise, the pixels that are right for our picture. I hope this makes sense for you.
13. Prompt Engineering for Diffusion Models: Starting with DALL-E: In this video we start to use our first diffusion model, and we want to start with DALL-E because DALL-E is the easiest to use. DALL-E works inside of ChatGPT, so we already know the interface, and the prompts are really easy to write because ChatGPT helps you: the LLM will help you create better prompts. The first thing you can do is simply go into ChatGPT. You can work with the normal multimodal GPT-4o, or you can go to Explore GPTs and search for DALL-E. If you open the DALL-E GPT by ChatGPT, you can start the chat and create your pictures here. You can add things to your prompts, and you can also use different aspect ratios. Let's just use widescreen. Now I want to start with a really simple prompt: I just type in "cat". We leave the wide aspect ratio and send it out, and we get our first picture back. And there we have it, our first two pictures. Now, if you press on a picture, you can see exactly what prompt led to this result. So if you press right here, this is the prompt: a beautifully detailed widescreen image of a cat sitting by a window with soft sunlight, and so on. You see the prompt is really detailed, and I want to show you how we need to write prompts for this diffusion model. Remember, in DALL-E it's so easy because ChatGPT helps you write such beautiful prompts, and then it's really no magic to create good pictures. DALL-E is not the best diffusion model, but it is the easiest to use. If you want to write good prompts on your own, you should take a look at this: you need to include subject, medium, environment, lighting, color, mood, and composition. What does all of this mean? You can make pictures of persons, animals, characters, locations, objects, and so on. The medium could be a photo, an illustration, or something else. The environment could be outdoors, on the moon, or somewhere else. The lighting could be studio lights, neon lights, or something else. The colors can be vibrant, colorful, black and white, and so on. The mood: the cat could be, for example, calm or peaceful or something like that. And the composition could be, for example, a full body view. So make sure to include these things. You do not have to, but if you leave them out, the pictures will be more random: you might get a photo or an illustration, and if you do not say it specifically, anything can happen. If you like, a tiny helper like the one below can assemble such a prompt for you.
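This is a hypothetical little prompt builder; the field names simply mirror the checklist above, and nothing here is an official API of any image tool.

def build_prompt(subject, medium, environment, lighting, color, mood, composition,
                 aspect_ratio="16:9"):
    # Glue the checklist items into one specific prompt string.
    return (f"{medium} of {subject}, {environment}, {lighting}, "
            f"{color} colors, {mood} mood, {composition}, aspect ratio {aspect_ratio}")

print(build_prompt(
    subject="a cat relaxing in a city",
    medium="an illustration",
    environment="on a rooftop at golden hour",
    lighting="soft warm light",
    color="vibrant",
    mood="calm",
    composition="full body view",
))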
There are also bigger prompting guides, where you can include stuff like subject, actions, environment options, color, style, mood, lighting, perspective or viewpoint, textures, time period, cultural elements, emotions, medium, clothing, text, and so on. This is a gigantic prompting guide; I just want to leave you with it so you can read it yourself. But if you want to do it fast, just think about the checklist above, because those things matter most. An example that could work is something like this: an illustration of a cat relaxing in a city, in vibrant colors, full body view, at golden hour, with a 16 to 9 aspect ratio. If we simply copy this, we can throw it into DALL-E. So back into DALL-E, we paste it, and then we get a specific output. And even here, ChatGPT will help you to create even better prompts. But this is a prompt that works in every single diffusion model; the prompting techniques work the same every time. And here you see that right now we have a really specific picture; we have exactly the picture that we wanted to have. If you click on it and look at the prompt, you see that ChatGPT made your prompt even better. You can improve prompts further by including some magic words, for example: cinematic, film grain, ultra realistic, dramatic lighting. You can use different shots and camera lenses if you want, the point of view, the drone shot, and so on. You can use cameras with a cinematic look, you can use different filmmakers, you can use genres, you can use keywords for movement, for example "action scene". You can use different photographers, for example sports photographers, and cameras known for action shots, for example the Canon EOS-1D X Mark II. You can use all these different lighting keywords: bright lights, warm, cold, low-key lighting, and so on. You can use golden hour, and you can use all of these different emotions. So make sure to include what you want to see. This is the most important thing, because all of these diffusion models are trained on pictures with detailed descriptions, and if you write a detailed description, you also get back what you want. If you just type in "cat", the cat could be anything. And now I want to show you, once again, my diffusion prompts GPT. I hope you remember how we built this; it helps with prompt engineering. If we type in "steak" here, we get a detailed prompt for a steak, and you already know how this works. If I simply copy it, I can throw it into the DALL-E interface, and then I get back a picture that is really cool. So let's just throw it in here. The aspect ratio is right now one by one, this is the default setting, and this prompt will work really well, because we have trained such a GPT. You already know how to train such a GPT, and now I want to show you the training data. But first, let's take a look at the steak. The steak is really good, because we also include cameras, camera lenses, and so on. If we go to the diffusion prompts GPT, I simply tell it in the instructions that this GPT needs to write good prompts, and then I upload a document, and this document is a complete structure for how the LLM should build these prompts. My training data looks something like this: the prompt structure is a medium of subject with its characteristics and relation to the background, then the background, the details of the background, interactions with color and lighting, and then "taken on" or "drawn with" specific traits of style. I give some descriptions, then some examples that I like, and lastly, of course, I include all the nice little keywords that make these pictures better. You can just use my GPT if you do not have the time to train your own, and I will simply link this GPT for you, so you can make really good prompts really fast. So in this video you have learned how to use any diffusion model. It's important to write good prompts, and a good prompt should be specific with theme, medium, setting, lighting, color, mood, composition, and eventually also the aspect ratio. If you do not want to write these prompts yourself, you can use DALL-E, and ChatGPT will help you automatically. And if you want to write really good prompts for every other diffusion model as well, you can simply use my GPT and get better outputs. In the next video, I want to show you the basics of Midjourney. DALL-E is the easiest to use, and Midjourney can do a lot more. And I would strongly recommend you make your first picture in DALL-E right now, because you learn most by doing.
14. Midjourney Basics: In this video, I want to talk about Midjourney. In my mind, Midjourney is one of the best diffusion models, especially if you want to make realistic pictures. The first thing you need to do is go to their webpage. Right now, at this minute, you can try it out completely for free; I think you can make roughly 30 pictures for free on their webpage. You go to midjourney.com and create your account; you can simply log in with Google. As soon as you have created your 30 pictures, you most likely need to upgrade your plan; it costs, I think, nine bucks a month. If you are on Explore, you can see what other people are making, and you see the pictures look really good. You can also go to the search and look, for example, for dogs, and you will find pictures of dogs. You can also filter for hot, for top daily, and for likes, and then you can find for yourself what you like. If you want to create something, go over to Create. Here are the pictures that you have already created; most likely you have none. If you want to create new pictures, you type your prompt in right here; you simply type in what you want to see. I just want to run with this prompt: Christmas deer head with pink bow and Christmas wreath, pastel watercolor on white background, in the style of... and so on. The next thing you can do is press here, where you have some settings. You can pick the aspect ratio you like, let's say one by one, or 16 by nine because we can see it a little better in a course. Then you have the mode: you can use the Standard or the Raw mode, and the Raw mode is better for realistic stuff. You can use different versions; normally we always use the newest one, so for example 6.1 at this minute. Then there is Personalize: if you have already created and liked a lot of pictures, Midjourney can adapt to your style. Then you have Stylization, and if you do not know what something means, just hover over it with the mouse: Midjourney can add a specific Midjourney style, and if you increase the value, you get more of that style. Weirdness can give you unexpected results, and Variety controls how much the four pictures in your grid differ from each other. Then you have Fast and Turbo; just leave it at Fast, and then we create our first picture. If we send this out, it starts generating. And while this is creating, I want to show you the seed, because the seed is the starting point of every single picture. If we press on this and type in --seed, we can use a random seed, for example this one right here, and the next generation will not be completely the same as this one. But if I do this once again and use the same seed again, we recreate exactly the same pictures. Let me show you for a quick moment, because the seed is important if you want character consistency. If you go down here, these are the first four pictures; these Christmas deer are nice. Now here are the second four, and you see they are not completely the same as the first ones; they are similar, but not identical. But if we go up here, you see that we have exactly the same pictures as before, because we used the same seed. So if you want character consistency, you can work with seeds, maybe tweak the prompt just a tiny bit, and you always get really similar styles. So remember: the seed is important. That is basically the first thing you can do. And if you do not like one of these pictures, you can also edit them. If you press on a picture, you see that you have a lot of different options. You can make subtle or strong variations; by pressing on one, it runs automatically. Then you can do an upscale, either a subtle or a creative upscale, and the resolution gets bigger. So let's just press on Upscale. You can also remix it, and if you don't understand an option, just hover over it with the mouse. If you press Subtle or Strong, you can tweak your prompt and make it a little bit different, but right now I do not want to do that. The next things are Pan, Zoom, and More, but before I show you those, I want to show you the upscale. If I close this and go back to Create, you see that these right here are the first variations: we have this picture, and now we have four variations of it that are really, really similar, but a tiny bit different; sometimes a little more of these red elements, sometimes a little less. So you see these are just small variations. And this right here is the upscale: we turned a small picture into a bigger resolution. If you press on it, or if you download it, it simply has the higher resolution when you zoom in a lot. You see the resolution here is really good; compared to the first one it's a lot clearer. So it simply makes the resolution bigger. Then we have Pan and Zoom. I don't use these much anymore, because under More we now have the editor. If you press on this editor, you can edit the picture, and you can do the same things as with Pan and Zoom. You simply drag the frame out, for example, then press Submit, and Midjourney will do the outpainting and paint new pixels into the extended area. But you can also do more: you can edit with inpainting. Let's say you do not like this part right here: you simply erase it and then change your prompt a little. So we do not want the pink bow, we press Submit, and we get an inpainting without the pink bow. Let's go to Create and see what happens. Here are the first four generations; you see we have simply generated a few new pixels. This was also not perfect, but come on, at least the picture got bigger. By the way, I think I like this one; that one is not that great; yeah, they are okay. And here are the next ones, without the pink bow. So this is how you can edit your pictures. If you go on Organize, you can create different folders, just to keep things a little clearer. If you go on Personalize, like I said, you can like different pictures and then adapt your specific style. If you go on Edit, I think not everybody has this right now; I think you need to have been on this webpage for a while in order to get it, but maybe by the time you see this course, you have it too. There you can upload an image from your computer and do the inpainting in exactly the same way. Just press on it; I want to upload this picture right here, and let's say I want a green hat. If I erase this area, I can type into the prompt what I want to see: "guy with green hat". Then we send it out, and most likely we get the green hat. We will also recreate the background here, at least as I see it, because this picture had no background. So you can edit your own pictures really, really fast. And there we go: come on, this one is a mess, but maybe the next one is better. Yeah, this one is a lot better. This one also works. Come on, these things are cool. The first one is a little bit of a mess, but the second, the third, and the fourth are relatively okay. So you can also edit your own pictures, and here too you can do outpainting. Let's say you want a different resolution: you simply press Submit in the editor, you get your new picture, and the pixels down here get recreated. And boom, there we have it: four completely new pictures. Some of them are good, some of them are not that great. And by the way, if you do not like a picture that much, you can of course go in and fix it with inpainting. Let's say this part was not perfect, and maybe this part too; you can edit it. I think you get what I mean. The next thing you can do, as soon as you have created such a picture or edited it with the eraser or whatever, is Retexture. If you press on Retexture here (this is no longer Edit but Retexture), you can change the picture a little bit and make similar pictures. This works similarly to Stable Diffusion; Stable Diffusion calls this ControlNets. Midjourney also tells you what happens: Retexture will change the contents of the input image while trying to preserve the original structure. For good results, avoid prompts that are incompatible with the general structure of the image. So what we could do here, for example, is type in "guy with green hat", or just "guy with hat", and also type in "cyberpunk". Then we press Submit on the Retexture, and we get something that looks somehow similar: a similar pose, a similar composition, but in a cyberpunk style. I hope you can see how this works. This is a really cool feature. Until now this was only possible in Stable Diffusion with the so-called ControlNets, and now we can also do it in Midjourney. So remember: with Edit you can edit all your pictures, and with Retexture you can retexture them, so you can use what Stable Diffusion calls ControlNet inside Midjourney as well. Here you don't have that much control, but it is still a nice feature. That is basically everything you need to know inside Midjourney if you want to create really fast. Yes, the tool is a lot bigger, but if you just want to start as fast as possible, this is everything you need: you can create pictures, you can edit pictures, and you can use seeds to recreate the same style over and over again. Have fun in Midjourney; like I said, as fast as possible.
15. Ideogram and Adobe Firefly: In this video I want to give you an overview of two diffusion models: Ideogram and Adobe Firefly. These are two completely separate diffusion models. Adobe Firefly comes from Adobe, and it's also integrated into Photoshop and so on. I think Adobe is special in that regard, because you can create pictures and Adobe trains only on pictures from Adobe Stock, so you do not have to worry about copyright and so on. This matters because Midjourney and the others can create pictures of people or of company brands, and sometimes you can get copyright claims; if you use Adobe Firefly, this is not the case. And Ideogram is special because it's really good with text. As soon as you go to one of these webpages, this right here is Ideogram; I am on the free plan, so no, I do not pay for every single model under the sun. Here you have a really clean interface. You have Home, and here you type in what you want to see; the prompt engineering always works the same. Here you have All, Realistic, Design, 3D, and Anime, and you can look for yourself for the things you like. If you use Ideogram, I would strongly recommend creating pictures like these: pictures where text is included, because that is where Ideogram is really good. Let's just make a test: a fox that holds a sign with the letters "catch me if you can", and then we can make some adjustments. The Magic Prompt: do we want it on or off? If you leave it on, your prompt gets automatically enhanced. Then the aspect ratio, the visibility (you can only go private if you pay), then the model and the color palette if you want. But at this minute I just want to send this out. There we have our four pictures. If I press on them (yes, this took a little time, because generation is slow if you do not have a plan), you see the text is really good: "catch me if you can". The text is perfect, and the fox is somehow good. Let's see the next one. Where is it? This one right here: "catch me if you can", the fox is really nice, so I really like this picture. This one is also relatively good, but the sign is floating around a little bit, so I like the other one a bit more. And this is the last one, "catch me if you can"; also really good. So basically, just go into this program and play a little bit for yourself, especially if you want to render text: this is really great. Here is also something that I like; logos and so on come out completely perfect. There's a picture that I like, so play with this a little bit. If you go on Creations, you can see what you have created; these are some pictures that I have made. And if you go on Canvas, you can also edit your stuff, similar to Midjourney. This is basically everything you need to know about Ideogram; Ideogram is really, really easy to use. The next thing is Adobe Firefly. Adobe Firefly works similarly. Here you also have generative fill, text to image, generative expand, and generate video. Video does not work at this moment; you need to join the waitlist. But you can absolutely create and edit with Firefly. If you press right here, you are on their Firefly webpage, and if you go back once again, you see what you can do: text to image, generative fill, generate a template, generate a vector (so if you use Adobe Illustrator, you can also generate vectors), generative recolor, and text effects. You can play around with all of these. The interface is really easy. If you press on Text to image, you can simply try it out. You can also use the pictures that other people have made: let's say you like this one; if you press on it, the prompt gets automatically copied. Down here you can type in your prompt and try it, and on the left side you choose what you want. Let's use Firefly Image 3, I want the fast mode, and it should be, for example, four by three. Then, what is the content type, art or photo? For example, art. Then the composition: you can also upload reference pictures, and beyond that, reference styles. Let's say you want this reference picture; yeah, for this prompt it's really not a good fit, it would not work that great, so I put the strength down to zero. Then I want, for example, a style reference: let's say I want a little bit more neon, so I include the style reference. Then we can also include other popular effects, for example the hyperrealistic effect, then the color and tone (let's say warm), then the lighting (studio lights), the camera angle (let's say wide angle), and then you press Try prompt. And yes, this prompt is right now a complete mess, but I hope you get what I mean: these settings are really easy to use, and we still get impressive pictures. Come on, I really like this tiger here, so you can absolutely play around with these things a little bit. If you like your picture, you can of course download it. And the next thing is, of course, that you can also edit your pictures. You can either edit these pictures here, if you simply press Edit, or you can edit your own pictures. If we go back once again and press on Generative fill, you can upload your pictures here, or you can edit the pictures that are already included. Let's say you want to edit this picture: if you press on it, you can edit it however you want. You can either insert, remove, or expand. If you press on Expand, you can make the picture bigger; if you simply press Generate, Firefly will do the outpainting and fill in something here. Then you need to see what works for you; let's say I want this one, and I press Keep. Next, I want to remove something, for example: let's say I do not want this funny thing here, because I have no clue what it is. I simply remove it, and it should go away. And bam, there it is; I want to keep that, because I think it's nice. The next thing is Insert. Let's insert something here, let's say a tiger, for example. So: "tiger", we press Generate, and then we can insert different things here. This also works if you want to edit people, for example: you can change clothes, you can change hair colors, you can change whatever you want. Yes, this tiger is a mess; come on, let's just keep it. I want to show you one more thing with a human. Let's say I want to edit this picture right here: I want to do an insert, and I want them to wear, for example, different clothing. I simply select these clothes right here, and then I type in what I really want to see, let's say "jacket". And there we have it; I think this turned out somehow okay. Let's just keep the first one; none of this is completely perfect. Adobe Firefly is a tool that I don't use a lot, but some people really like it. It's especially powerful if you already work with Adobe Photoshop, because it is included there. If you work with Illustrator and Photoshop and so on, you should totally work with Adobe Firefly. So this was basically it: use Ideogram if you want to generate text inside of pictures. And Adobe Firefly, I would personally say, use it if you already use the Adobe products, so Illustrator and Photoshop, or if you want to be 100% certain that you never infringe copyright, because Firefly is trained on Adobe Stock. So try these two tools out. And, of course, the prompt engineering is always the same. See you in the next video.
16. Open Source Models: Let's talk about open source diffusion models. Mainly that means Stable Diffusion and Flux, but there are also other models like Recraft and OmniGen and much more. This topic is gigantic, and it gives you the most flexibility: you can either download these models and run them locally on your own machine, or you can run them in the cloud. The easiest and fastest way is to run them in the cloud. Nonetheless, I want to show you some free options, so that you can run them completely for free and not pay for every single feature under the sun. The first option would be ComfyUI. Since you do not have a lot of time in this course, it's maybe not the best option: the learning curve is really steep. This is ComfyUI; I have a course that covers it in detail, but ComfyUI is not the thing that works really fast. The second option is, for example, WebUI Forge. This runs relatively easily and relatively fast, but here too you have to download a lot of stuff, so it's also not that great. With Forge, you can also run Stable Diffusion, Flux, and much, much more. What I want to show you right now is Fooocus, because with Fooocus you can run Stable Diffusion, and Stable Diffusion is open source, so you can run it for free: either in a Colab notebook or installed locally. If you want to install it locally, you can simply do it via this link right here. But what I want to show you right now is the fastest way, and that is simply this Colab notebook. So open the Colab and run the notebook by simply pressing Play, and then we get a Gradio link with a nice interface, and there we can run Stable Diffusion. I want to show you how this works, then I want to show you Leonardo, and then I want to show you Flux. We do this fast. After a while you get this link, "running on public URL", and we press on it. Then a Gradio interface opens up, and here you have a lot of options. The first thing is that you can press on Advanced, and here you have a lot of settings. If you want to start fast, just leave the initial settings and use Speed. Number of images: let's just say one. Here we have the special sauce of Stable Diffusion: we also have a negative prompt. You can type in what you do not want to see, for example "ugly" and "blurry", or also colors, let's say "red": we do not want red in our picture. And then we type in what we do want to see, let's just say "Instagram model". If we press Generate, we create our first picture, and it will be an Instagram model, and it will not be an ugly picture (that refers to the picture quality, not the Instagram model we create), it will not be blurry, and red is most likely not included. And there we have it: normal brown hair, a nice picture, and the generation is also somehow okay. Come on, we use our free Colab notebook; we can use this forever for free, and I think that is cool. There we have our picture, and the quality is really good. The next thing: you can press on Styles. Here you can pick the styles that you want, for example "SAI 3D Model". If you select it and type in "cat", for example, you will create a cat that looks somehow like this. I also left "Fooocus Sharp" and "Fooocus V2" selected, so we mix in a little bit of photorealism. If we remove those and only use the SAI 3D Model style, the result leans a lot more in that direction. So I stop this, generate once again with only the SAI 3D Model style, and then it should work better. For the next pictures I can include the other styles again; I'll just stop this for now. The next thing is Models. You can also use different models and different LoRAs, but if you just want to use this fast, you don't need a deep dive into models and LoRAs, and you most likely do not need the advanced settings either. What you eventually do need is Enhance and the input image. If you press on Enhance, you can make small variations, and you can also do upscales, exactly the same as in Midjourney. What you can also do is press on Input Image. Here you can upload images, and here too you can make upscales. Let's make a realistic cat once again; let's just type in "cat". Yes, I am doing a really bad job with the prompt engineering here; I just want to make a cat and show you what we can do down here. And there we have our cat. If we drag it down into the input image, we can make variations, either subtle or strong, and with Subtle you can also type in, for example, "happy", and you get a happy cat. You simply press Generate, everything changes just a tiny little bit, and maybe the cat tries to smile. Let's see how this works out. Yeah, come on, maybe it looks a little bit happier. This works better if you do it with humans and type in "smile", for example, or with colors: with this cat you could change the colors just a tiny bit. So you can play with these variations. You can also do an upscale, so you can upscale to 2x the resolution: press on it and then press Generate. Let's see; yeah, come on, it looks a little bit happier, at least as I see it. Then you have Image Prompt, and this is especially cool, because you can press on Advanced, upload your pictures here, and use Image Prompt, PyraCanny, CPDS, and FaceSwap. Let me explain how this works. If you include this picture here and use Image Prompt, you can type in, for example, "dog", and if you press Generate, the result will closely follow this picture, so we reuse the style of this image. Just see for yourself: the style is really, really similar to the style of the previous generation, because we use the input image as the image prompt. I hope you can already see it; and there it is, a really similar style: the green background, similar lighting, similar colors, and so on. The next thing you can do is PyraCanny or CPDS. These two are ControlNets, similar to what we saw in the Midjourney video. If we type in, for example, "tiger" right now, we will use PyraCanny: a ControlNet that controls the edges and the pose of the image. Basically, we create a tiger that is in a similar pose to this kitten here. It will most likely sit, and it will be in a really, really similar pose to this right here; even the tail will be similar, also the ears, but we should get a tiger. Just see for yourself: we have the same composition, but you see we are creating a tiger right now. Yeah, this will get cute, I think: a small little tiger that sits just like our kitten, but rendered as a tiger. And after 50% of the steps, the prompt can take over a little bit more, so it changes a little. So right now it gets more and more like a tiger and less like our kitten. And if you want even more of the kitten in it, or an even more similar pose, you need to play a little bit with these ControlNets. You see, the pose was not perfect; it is similar, but not perfect. What you can do is increase the weight a little and the Stop At value. If we increase the Stop At value to 0.8, for example, 80% of the generation steps follow the kitten, so the result should be a lot more similar. You see it right now: it's really like the kitten, but the colors are a bit off for a tiger. This goes on until 80% of the steps, and only the last steps let the prompt take over a little more. Let's see if this works or not. Like I said, you need to play with these. I think this picture got messed up because we also added this other thing here; yeah, this is not perfect, we need to play with these settings. I tried it once again, and I think this one is a little bit better: we have a really similar pose right now. So these ControlNets allow you to reuse the pose. This is especially powerful if you have, for example, humans in a specific pose: if you have a ballerina doing something fancy, you can recreate something that looks really similar with PyraCanny. The next thing is FaceSwap: you can upload, for example, a picture of your face and simply swap it in. And you can also combine these things: you can use, for example, PyraCanny from a ballerina, the FaceSwap from another human, and maybe something else as the style reference. So you can play around with this a little bit. The next thing is the inpainting; you already know how this works. You simply drag the picture down, and let's say we do not want this tail here: we can simply inpaint it away. The inpainting in Fooocus with Stable Diffusion is really big; you can do a lot of things here. But generally speaking, if you just want to work fast, work just like in Midjourney. This is a gigantic tool; we cannot go over every single detail. The next thing is Describe. If you use Describe on this picture, for example, and press "Describe this image into prompt", we get the prompt back. You can also upload images from your computer and see what a prompt for them could look like. This is the prompt as the diffusion model sees it: an orange tiger stands on some rocks. So, come on, that fits. Then we have Enhance (you already know we can make upscales and so on) and the metadata. If you load a picture here, you can also read its metadata, and this is especially powerful if you include it, or if other people include it, because then you can reuse their settings. The next thing I want to show you is the logs. If you press on Settings, you can go to the history log, and there you can see what you have created previously: all your creations, with the resolution, the prompt, and the settings that got you to that result. This is basically the fastest way to explain Fooocus. Fooocus is a gigantic tool, Stable Diffusion works in the background, and you can use it forever for free. If you want a web interface for Stable Diffusion, you can use leonardo.ai. Leonardo.ai is also one of my favorite tools if you want to work in a web interface, and here you have basically the same things as in Fooocus. It is also a little bit easier to use, but don't worry about every single tool under the sun in Leonardo AI, and you also have to pay relatively quickly. Here you have, for example, Canvas, you have the real-time generations, you have Motion, you have image creation, you have upscalers, you can train your own models, and you have 3D texture generation. So there is a lot of control in Leonardo AI, and they also have some small tutorials on how to use all their tools. Take a look at those if you want to dive deeper, and let me know if I should include a separate lecture. But normally, since we want to do it as fast as possible, I think you should work with Fooocus if you want to use Stable Diffusion as quickly as possible. Now, if you want to use Flux and the different other diffusion models, you should go on Replicate. Replicate is not free; here you need to sign in with GitHub. So yes, these open source tools can get a little bit overwhelming at first glance, but as soon as you get it, they also work really fast. Here you can use the Flux models, you can use Recraft, you can use every single model under the sun, Stable Diffusion 3.5 Large; there are a lot of really good models. If you press on one of these models, they are really easy to use: you type in on the left what you want to see, and on the right side you get your output. This looks really realistic. Something that works really well in Flux is also text. Let's just say: a woman holding a sign with the letters "I am not real". Then we press Run, but attention: this costs, I think, $0.06 per generation, and you need to connect your GitHub profile. Here you can see some pictures that were created with this model, so it works really well; just wait for the output, because the text is rendered stunningly well: "I am not real", and this is a perfect picture. If you would rather call these models from code instead of the web interface, Replicate also has a small Python client; here is a rough sketch.
So in this video we took a look at the open source diffusion models. We have Stable Diffusion, we have Flux, we have Recraft, we have a lot of different options, and we can run them in a lot of different ways. We can download them and run them locally with, for example, ComfyUI or Forge. One of the easiest ways is Fooocus inside of Google Colab, because you can press Play on one button and use it for free, forever. And if you want to work over an API, use Replicate: there you can use every single open source diffusion model under the sun that has an API, but you need to pay a little bit. So play around with this for a bit; I would guess that you should stick to Fooocus if you want to create fast. See you in the next one.
17. Recap of Picture Generation with Diffusion Models: In this section, we have learned how to use the standard diffusion models to generate pictures. You have learned how they work: the computer is trained on text and pictures, in that process it learns how to generate the picture, and then it can recreate it; and you need good prompts for good outputs. You need to be specific. We have a lot of different diffusion models: DALL-E, Midjourney, Ideogram, Adobe Firefly, Stable Diffusion, Flux, Recraft, and much, much more. But all of them work relatively similarly. You always need good prompts, and you have learned how to write them, and you also saw that you can edit your pictures with inpainting and outpainting. Now I want to remind you: learning is same circumstances but new behavior. Until now you maybe did not know how to use these diffusion models; now you know, so you should totally use them. Make some pictures for your marketing, for YouTube thumbnails, for presentations, for ads, for whatever you want. Only then have you really learned it. Or just have some fun creating these pictures. I also want to tell you what good learners do: they learn together, because more people always know more than few people. So if you could share this course, it would really mean the world to me. Maybe it also means a lot to the other person, and if that person gets value out of it, they will credit that value to you, because you told them about it. So thank you for that. And I'll see you in the next section, because diffusion models can do a lot more: they can make audio, they can make entire songs, and they can make videos. So see you in the next section.
18. Ai Videos with Kling AI: Yes, AI can also make videos, and we have a gazillion
different tools. We have BCA labs, we have runway, we have hotshot. We have dream machine
from Lumaabs. We have SRA from Open AI. Yes, SRA does not
work right now, and we have Kling AI. Of course, there is a lot more, and all of these tools, they
work relatively similar. If you go on PCabs, they have something
special here, so you can also create these videos that you saw
going viral sometimes. These videos right here
where stuff is melting. So they got viral on social
media from time to time, and the BCA, you
can create them. In way, you have also
a lot of flexibility. You can simply log yourself in and create all
of these videos, and you can also see
their own tutorials. Hot Shot works really easy. You simply type in text
and you get video back. In the dream machine
from uma Labs, you have basically
the same thing. We always also start and end frame in
most of these tools. And I think right now at this minute Kling is also
one of the best things here. You have AI images, AI videos, video editor, and so on, and that's why I just
want to show you ling AI because like I said, right now at this
minute, King AI gives you really good results, and you can start
completely for free. That's at least in my mind, the coolest part of it all. Most of this stuff
works for free. Most of these AI
video generators, they work relatively similar, so I just want to
show you Kling AI, and if you really want, you can play with the other
The first thing you need to do is, of course, go to the Kling AI website. It is originally a Chinese page, but they also have an English version, and here you can do a lot. If you go on Home, you can see the overview with the best shots from the videos. Here they have generations where they have also included sound ("Am I dreaming? I am so tired."). So if you take your time, you can really make cool generations. These are all short films; you can simply look at them for yourself, they are stunning. Then you see the best creatives. These are just pictures, and you can see that they also make really nice pictures here; this one, for example, is something that I like. So you can create videos, you can make short films if you clip some things together, and you can work really nicely. You can make AI images and AI videos. If you press on AI images, you can simply create images. I have to tell you, I do not love this feature inside of Kling, because for AI images I think Midjourney, Stable Diffusion, and so on are a bit better than Kling. So don't waste your time with AI images inside of Kling. What you should do instead is press on AI videos, because with AI videos you can do a lot. You can type in a prompt. You can increase or decrease the creativity. Then you can choose the mode that you want to use; if you use the professional mode, you need to upgrade to the premium plan, and the quality simply gets a little bit better. I had the premium plan here, but right now I do not have it. Then you can choose five- or ten-second generations, different aspect ratios, and the number of generations. Lastly, you can also use camera controls and a negative prompt, just like in Stable Diffusion.
But the negative prompt is optional. So let's just try this out with one prompt. And of course, they have best practices if you want to dive deeper into prompt engineering specifically for Kling. Generally speaking, though, you should just use the same prompting techniques that you already know: the subject with its movements, the scene and the scene description, the camera language, and the lighting and atmosphere. Here they give you a detailed description of how you can write such a prompt, and they give you some examples. This is a basic prompt, this is a prompt that was made a lot better, and here they have a really, really descriptive prompt. Down here, you see how these videos change. If you press on them, you see that, generally speaking, you get a good video either way, but of course the better prompt yielded even better results. Let's just look at these: you get a few more effects, and I think the video is generally a little bit better. And with the really descriptive prompt, you see that it gets even a little bit more impressive. What you can do, of course, is simply copy such a prompt, throw it into your own application, and see for yourself how these things work. They show you a lot of different examples with a lot of different prompts; there is no point in me showing you every single one, so you can look at them for yourself. It's really easy to use. Then, back in Kling, you can of course use either Kling 1.0 or Kling 1.5. With 1.5 we have, generally speaking, a little bit better quality, but some features are not included yet (they will come). Let's just work with Kling 1.5: a good prompt, creativity at medium, the standard mode, 5 seconds, 16 by 9, one video. I don't want to include any
specific camera controls, but you can do it if you want to have horizontal or vertical movement, zoom, and so on. Come on, let's just use the zoom, and I just want a small zoom here. Then a negative prompt: let's just use "logo, watermark, blurry, ugly", and then we press generate and we pay ten credits here. All in all, we get, I think, something like 100 credits a day, and with that you can create this stuff. And while this is generating, you can also leave the page and do similar things in the meantime. So let's just do that. If you go on Kling 1.5, you can do basically the same things here, but some features are not there. If you scroll down, the camera movements are disabled in 1.5, but I am sure they will come back. If you go back to Kling 1.0, they are included once again, of course. Then, if you go on image to video: so far this was text to video. With image to video, you can upload your images and then mix them with a prompt, and you can also use the motion brush. I want to show you this motion brush right away. Here you also have creativity, standard mode, length, and so on, plus the camera movements (but they are disabled right now), and you have a negative prompt. If you use Kling 1.5, on the other hand, right now at this minute you do not have the camera movements included, and you also do not have the motion brush. So let's just use 1.0, and then we upload a picture. It does not matter what picture you use; let's just use something from my generations. I just want to upload this one right here. We can simply animate this guy, and I want to keep it really simple, just a short prompt for him. Then, of course, you can use draw motions with
the motion brush. If you do not use it, the movement will just be a random creation. But if you use the draw motion feature, you can tell the diffusion model exactly how it should behave, and they also give you some instructions. You can, for example, select area one, use the track feature, and then click on the specific things you want to move. You can either mark things yourself with a static area, or you can use the auto segmentation and press on the stuff that you want to animate. If you want to delete something, you can also delete it. So you can do this however you want; the important part is that you mark the stuff that you want to animate. What I want to do right now is add movement, and for that I do not use static, but area one with the auto segmentation, and I simply press on every single thing that should not be still this time. As soon as you have decided what you want to animate, let's just say I want to animate this whole guy, like you can see here. What we can do is press on track, and here we can now draw what this guy should do. Let's just say this guy should go in this direction and then maybe a little bit in that direction. So we simply draw something here, and then you see how this works. If you press confirm, it is locked in; if you don't confirm it, you just do it again a little bit differently. Let's just say you want to have him move this way. I think this is working now, so we press confirm. And then we will animate this guy, and he will simply walk in this direction as soon as we press
generate, of course. In the meantime, our other video is ready: the panda that is drinking coffee and reading a book, and it also has glasses, so you see you can make cool generations. This guy here is next, and he will be moving too. If you go down, you have your motion path included, and of course also the creativity and so on. Press generate, and then you will see that we can animate this picture with ease. By the way, there is also a motion brush user guide. If you press on it, they show you exactly how you can use this tool, and they also give you a lot of examples to look at. Here they have animated this ship. Let's take a closer look: there is the ship, and then it was marked where these things should move. They used the brush tool to move the ship in this direction and the water in that direction, and this was the resulting video. You see it works really, really great. The animation is awesome because the ship moves in a different direction than the water, and you get this cool effect that it is windy on the water: the water moves in one direction, but the ship can still move in the other direction. The same thing is true here for these dogs. They simply marked the dogs and then told the dogs in which direction they should look. And if you press play, you see that this also turned out to be perfect. Let's make it big: the dogs look exactly in the direction where you brushed. This thing with the apple is also great. They simply marked the apple, as you can see down here, and they used the brush tool to move the apple downwards. You can see the output here; it worked great, and we also have the water that is splashing. Let's make this big. If you look closely, it is not 100% accurate, not 100% perfect, but this is a nice video. You can even make commercials with these videos. And here they have
the cat, and the cat is jumping over this thing here. Let's take a look: you see that the cat is jumping, and this also turned out really nice. Yeah, the landing was not perfect, she's not quite on point, but that can happen to a cat from time to time. You have a lot of examples here that you can use; you can make really stunning animations and brush however you want. The next thing I want to show you is that you can do even more in the meantime. If you go on image to video, you can, for example, delete this guy here, and then you can also press "add end frame". So let's do something really cool right now. I want to upload this picture; this is a Midjourney picture. Then I press "add end frame" and upload the next picture. Let me just open up these two pictures: this here is a girl, and I have recreated the same girl with the same seed, just a little bit older. You already know how this game works. So this is her a little bit older, and this is her a little bit younger, and now we want to transform her with a video. These videos went viral from time to time. Here we can simply type in "a woman aging", for example; we have the start frame and the end frame. We cannot use the motion brush right now at this minute, but we leave every other setting at the defaults, and then we simply press generate once again and we will create something really, really cool. So you can make a lot of generations one
after another. In the meantime, I will show you some generations that I have made previously. Here you see one with a really simple prompt; I think the prompt was "a small dog is lying on a cat". Here you see a bear dancing in the jungle. Here I used a picture from Flux, for example, and I simply made her dance. You see, this works really, really nicely; there are a lot of posts on X that went viral doing something like this. Here I did the same thing, and the second generation turned out even better; these really look like real videos. The only thing that is messed up here is the hand, a little bit, and in the first generation the hand is also messed up a little bit. Here I made something with a landscape, and then we transition into another picture; this is start and end frame, so you see we can basically move around here. Then this is the panda that I generated. The panda is simply reading right now, and then we get our new generations, and I will show them to you as soon as they are done, because this will finish in a few seconds. One of the generations is done, and surprisingly it's this one, the one we started later. Here you can see how she is getting older. You see that this works really, really nicely: she starts out young and then transforms into the older version. These are the videos that sometimes went viral on Twitter, and you can recreate them right now if you want. Yes, sometimes it does not turn out perfect, but if you play a little bit with these, you can totally shoot for that. And that's basically everything you can create. I will blend the next result in as soon as it is generated. So basically, this is how
you can work with Kling AI. You can simply make an account and start for free, at least right now. You can type in text and get a video back, you have a lot of control, and they also tell you how to write your prompts. The next thing is, of course, that you can also turn images into videos: you simply upload an image, and you can also transform it with the motion brush. You mark it and simply tell the AI where things should go. And finally, you can also include a start and end frame, and with start and end frames, something like these transformations is really, really cool. So please just give this tool a shot. I am convinced that you will find it cool.
19. Text to speech with ElevenLabs & more: Yes, AI can also make voices, and I like that too. So this is Alloy; this is text to speech from the OpenAI Playground, and you already know this one. We have a lot of tools that can turn text into voices, and we can do a lot more. This is one of the easiest tools: in the OpenAI Playground, you simply type in what you want to hear, and OpenAI will create it for you. The same thing also works over the OpenAI API; a small sketch follows below.
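If you want that programmatically, here is a minimal sketch using the openai Python package and the Alloy voice. It follows OpenAI's documented text-to-speech example as I know it, but model names, voices, and methods can change over time, so check the current docs; the output file name is just a placeholder.

# pip install openai
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

# Generate speech with the Alloy voice and save it as an MP3 file.
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hi, this is a quick test of AI text to speech.",
)
response.stream_to_file("alloy_test.mp3")  # writes the generated audio to disk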
There are also open-source alternatives, for example F5-TTS. You can install it locally, and if you want to test it quickly, you can also run it on its Hugging Face Space completely for free: you simply upload an audio sample, type in the text that you want to generate, and it will clone your voice. But I think one of the most powerful tools is ElevenLabs, because in ElevenLabs you have a lot of flexibility. You can also start for free, and you have a lot of languages. Let me just show you this: "The ElevenLabs voice generator can deliver high-quality, human-like speech in 32 languages. Perfect for audiobooks, video voiceovers, commercials, and more." So you hear the voices are really, really good, and you can do a lot of stuff. That's why I want to show you as quickly as possible what you can do inside of ElevenLabs. I think if you want to start quickly, ElevenLabs is the way to go, because you can start for free, and later, if you want to create a lot, you need to pay. But it's fast. The first thing you do is go to their web page and press "Get started". Then you will be in the app, and of course you need to register; just make an account with Google or whatever you like. The interface is really easy. On the right side you have "Simple" and "Advanced"; first, we start with the simple interface. The first thing you see is that you can type in whatever you want, and then you can use different voices. This is a deep male
voice by Arnie; I have created this voice myself. If I press "Generate speech": "I think I like this tool." You see, we can generate this speech, and it goes really, really fast. If you like the output, you can download it by pressing this button. And if you go on History, you see the generations that you have made, and you can also simply download them. Yes, I have made a lot of stuff, so you see there are pages and pages and pages of generations, and you can go back and recreate these things really fast. If you go back on Generate, you most likely have no voice that you have generated yourself yet. If you scroll down a little bit, you see that I have a big voice library: I have cloned voices from Elon Musk, from myself, and also from Angela Merkel. We also have some generated voices here that I have made, and then we have the default voices. Right now, at this minute, you most likely have just these default voices, but of course I want to show you how you can clone voices, even your own. So this is a voice that sounds somewhat like me. "I think I like this tool"; let's just generate that with my voice: "I think I like this tool." Yes, you see, even the English is better than mine. Maybe I should replace myself with AI; I'm sure we will get to that point. That's the point of all of this. The next thing is that you can also press on Advanced, and there you can use different models. Under Settings you see Eleven Multilingual v2: "our most lifelike, emotionally rich model in 29 languages, best for voiceovers, audiobooks, post-production, or any other content creation needs." We have English, Japanese, Chinese, German is also there, and a lot of voices, so this works really great. Besides that, you can also use different models if you really want to; you can simply switch here, for example to the Turbo v2.5 model, or to older versions like v2 and v1, and so on.
The older models get worse and worse. The one thing you might eventually want are the Turbo voices, "our high-quality, low-latency model": these are a little bit faster, but I just work with the normal one. Then you have stability, similarity, and style exaggeration. You can play with these, but generally speaking the standard settings work really well. You can also enable speaker boost if you want. And if you mess with these too much, you can simply press "to default settings" and you will get your default settings back. I have to say, I normally don't mess a lot with these advanced settings, because the default settings work great. Then on the left side, you see that you can do more than text to speech. By the way, here you can throw in whatever you want; you can throw in nearly entire books and make audiobooks out of them. This should also work completely for free; it is really awesome. We will look at the pricing later, because you can start for free. The next thing you can do is go to the voice changer, and the voice changer is really awesome. Here you can upload speech and you get speech back, but in a different voice. You can use, for example, the deep male voice by Arnie. Now I can record myself or upload an audio, and it will be recreated in that voice. Let's just try this out; I want to record this audio. If I press here, I will start: "This will be a test if this tool from ElevenLabs is working in real time or not. I hope you don't let me down." Then we simply press generate speech: "This will be a test if this tool from ElevenLabs is working in real time or not. I hope you don't let me down." And you hear that even my silly accent gets duplicated. But you see, we have
a different voice. I can also use other voices here; Adam, for example, is one of the legacy voices that works really, really great. We could also make me talk like a woman, do silly stuff with this, and add other accents. The next thing we can do is press on Voices, and here we can do a lot. You can go on All, Personal, Community, and Default. At this minute, you will most likely have just the default ones, and you can always listen to how these voices sound if you press play: "Trust yourself, then you will know...", "...government of the people, by the people...", "The world is round...", "There is no greater harm..." So you hear, these are great voices. If you press on Community, you hear the voices that the community likes and voices that the community has created, for example this: "We have committed the golden rule to memory; let us now commit it to life." "To exist is to change, to change is to mature, to mature is to go on creating." "You can't blame gravity for falling in love." This is great stuff for you. Then you can go on Personal. Here are the voices that you have created, if you have created any. If you have not created voices yet, you can press on "Add new voice", and here you have either voice design, instant voice cloning, the voice library, or professional voice cloning. If you press on voice design, you can simply type in what you want. Let's just say: female, young, American accent, medium accent strength. Yes, this is okay. Then you get an example of how this would sound, and you can press either "Use voice" or first "Generate" to hear how she sounds: "First, we thought the PC was a calculator. Then we found out how to turn numbers into letters, and we thought
different accent. Let's say British and you
want to have a strong accent. First, we thought the
PC was a calculator. Then we found out how to
turn numbers into letters, and we thought it
was a typewriter. You see you can make this
work however you want. You can also do male old, Australian, low
accent, one last time. First, we thought the
PC was a calculator. Then we found out
how to turn numbers into letters and we thought
it was a typewriter. And if you like it,
you press use voices, and this will be in
your voice library. If you do not like these, you can press once again here and do Instant
voice cloning. If you press on this, you
can give it a name like me, for example, then you would
upload a few examples, and here they tell you
what you can upload. No items uploaded yet. Upload audio samples of the voice you would
like to clone. Sample quality is more
important than quantity. Noisy samples may
give you bad results. Providing more than 5 minutes of audio in total brings
little improvements. So what I tell most
of the people is to use roughly four to
8 minutes of really, really good and
high quality audio. You can spread this
over up to 25 samples. The only thing that
is important is that the samples are not
bigger than ten megabyte. So you can upload, for example, three tracks, every
track can have, for example, two or 3 minutes
with good audio quality, and then you get your voice. And then you can simply give
a few labels if you want, add a small description, and then you need, of course, to accept that you do not do any stupid stuff
with these voices. Then you press that
voice and you are done. I have done this with my voice E and mask and with a lot more. The next thing that
you can do is, of course, the voice library. You already know the library: here you simply find voices from other people. And the last thing you can do, if you press once again on "Add new voice", is professional voice cloning. For that, you need to pay a little bit more, and you basically work with ElevenLabs directly: you send in some voice samples, and they create a voice that sounds really, really crisp. Most people do this if they want to clone their own voice and make entire audiobooks with it. This works great. A friend of mine has done this, and he gets more streams with his cloned voice than with his original voice. So you can do cool stuff with this. Then, of course, there is also the library here, and you can find a lot of things in it. Let's just say you want to create stuff for social media: you can use a lot of different voices for AI videos, YouTube Shorts, ads, and so on, and of course also in different languages. You can make a lot of cool stuff here. Besides that, you also have sound effects, so you can create sound effects for whatever you want. Let's just make "dog barking". Here you get a few examples. Sounds great; my dog is not here right now (normally he's always around), but this sounds nearly like him. So you can simply type in whatever you want to create, press on it, and yes, you can use this stuff commercially. Then, if you go on Explore, you find sounds that other people have made, so you can find a lot of stuff here. You see the weekly topics; this one is something cool, for example. You can also search for what you want to hear, and they also have categories. If you press on Animals, you will find a lot of animal sounds: cat meow, birds singing, frog, and so on. You can always just use the prompt, or also download this stuff if you like. You can also use booms or braams or whatever you want. You can make really good sound effects with these, and like I said, you can use them commercially. The next thing that I
want to show you is Projects, because you can make entire projects. To explain this really fast, I want to show you this video, because this is a feature where you need to pay a little bit more. I have the basic plan, but if you want to do a lot of stuff inside this part of the tool, you need a stronger subscription; I will show you the subscriptions at the end of the video. "Introducing Projects, your end-to-end workflow for crafting audiobooks in minutes. Whether you're starting from scratch, pulling from a URL, or uploading EPUB, PDF, or TXT files, Projects has you covered. With your text in place, you can convert everything to audio with the click of a button. If you want to mix up voices in your audio, you can now easily assign particular speakers to different text fragments. Chapter one: the bus stop. 'Hey, do you know when the next bus is?' Matteo asked. 'I think it should be here now.' If you need to fix a section, Projects lets you seamlessly regenerate." So basically, you can make entire projects with different speakers and do a lot more. If you are interested, you can watch this video yourself, but then you need, of course, a better plan for it. I want to show you the plans right now, because I get questions about them from time to time. You have a lot of different plans. I am currently on the starter plan, and this is cheap: I pay, I think, about five bucks a month, but you can get more. With the free plan you can play a little bit; with the $5-a-month plan you can play a little bit more. Then there is the creator plan, the most popular one. You can start at 11 bucks a month, but then it goes up, I think, to 22, and I am sure these numbers will change a little bit over time. You can also see what you get here: for those 11 bucks a month at the start, you get professional voice cloning, you have Projects, you have Audio Native, and you have higher quality. With the pro plan, you get even a little bit more. So these are basically the plans, and you can also get two months for free if you use the annual subscription. You can play with this a little bit for yourself if you want. But the next thing
also really, really cool. Right now it's in better. And also here you need
to upgrade your plan. And this guy here
explains you every single thing what the
voiceover studio can do. Basically, also here, you
can make whole projects, you can upload videos and make voiceovers natively
with 11 labs. This works also really great. I have tested this
out a few times. You can generate speech and
sound effects in one editor. You can import video directly, layer your audio tracks, and you have precision
in editing these. So this is basically video
editing with audio that comes natively out of 11
labs. This works great. Then you have the
bugging studio. Here, they also have
some resources, so I don't want to spend
a long time with these. I have also generated
a few things here. If you simply press
Create NU Dup, you can simply give
your project a name. Then you give the
source language and the language that you
want to translate it in, and then you can upload
your track either from YouTube TikTok or other stuff
you can also do manually, and then you can
create these things. This will cost you
3,000 credits. I have right now at this minute 55,000 credits left
for this month, so I would be able to
do this a lot of times. This is also something
that I really, really like that I
really love because you can translate your
videos really fast. And of course, they can tell you a little bit in more
detail if you want to. Because I think there's no point that I show you
every single step, the same steps as they show you. Basically, create a new step, upload your stuff, and
you are ready to rock. You can recreate your
stuff in other languages. And the coolest
thing is here, yes, that you can do this also
in these basic plans, so you can translate
videos easily. Then you have audio native. And also audio native
is really cool, and here, too, you need a stronger plan. Basically, what you can do is take a code snippet, copy it onto your webpage, and then your webpage will have a little player bar, and this bar will read out your entire webpage. I myself do not have a webpage, but if I had one, I think I would include this. If you publish articles all the time, you can use this, and the people who come to your webpage can simply press the button and ElevenLabs will read the article out loud for them. "Logic will get you from A to B. Imagination will take..." Basically, they have this bar, and the bar reads your entire website to your visitors. Even the New York Times has included this, along with a lot of other websites. If you go to an article from the New York Times, you see it right here: "Listen to this article." You can simply press on it, and then ElevenLabs basically reads the article out loud for you. I am not sure if I can play it here, because, well, it's the New York Times. And the last thing down here is the voice isolator. If you press on the voice isolator, you can simply drag and drop an audio file that has poor quality and make it a lot better. The demo video shows you perfectly how this works, and these audio files can be big, up to 500 megabytes. "Need to remove background noise from your video? Use our new voice isolator model for crystal-clear audio every time." So you see, this works perfectly. If you have noisy voices, if you have a lot of background stuff going on, you can upload your audio and it will get a lot better. These files can be really big, up to 500 megabytes, and you will get crystal-clear output. Up here, you always see how much you can still create: in total I have 60,000 credits a month, and right now I have 55,000 credits left. Then you have some notifications, if there's something
special going on. The next thing you can do, of course, is press on your name, and you have a lot of other things there. You have your profile, and if you press on it you see some of your information. Then you can press on API keys: if you are a developer, you can generate API keys and build your own applications with ElevenLabs. A rough sketch of what such an API call can look like is shown below.
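As an illustration only (this is not something shown in the course video), here is a minimal sketch of calling the ElevenLabs text-to-speech REST API from Python with the requests library. The endpoint path, the "xi-api-key" header, and the JSON fields follow ElevenLabs' public API documentation as I remember it, and "YOUR_VOICE_ID" is a placeholder, so double-check the current docs before relying on it.

# pip install requests
import os

import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]      # your ElevenLabs API key
VOICE_ID = "YOUR_VOICE_ID"                      # placeholder: copy a voice ID from the app

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
payload = {
    "text": "I think I like this tool.",
    "model_id": "eleven_multilingual_v2",       # the multilingual model mentioned above
}
headers = {"xi-api-key": API_KEY, "Content-Type": "application/json"}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# The response body is the generated audio (MP3 by default).
with open("output.mp3", "wb") as f:
    f.write(response.content)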
Next is the subscription page, where you can manage your subscription. Then the payouts, if you are an affiliate; and if you are not, you can press on "Become an affiliate". Here you can get up to 22% in commissions, and I have to tell you: yes, I am an affiliate of this program, because I use it myself and I love it. I think I have made roughly 100 bucks with it, because I have published one or two videos about this. Then there is the usage analysis, and if you want to dive deeper, they have a whole documentation; if you are a developer, you can simply look for yourself. So: the documentation, then the changelog, the help center, the affiliate program (a little bit more about that program), and the AI speech classifier. And lastly, of course, the terms and privacy. Yes, you are able to use this commercially, but you may not be able to create voices of other people when you do not have their agreement to use their voice. And lastly, of course, you can sign out. If you want to become an affiliate (because people ask me this all the time): you just have to contact the affiliate team, you press here, you type in your information, and then you get a link that you can promote. You will get such a link; I think I did this over PartnerStack, so this would be my link. Maybe I will include it in the last lecture, and if you want to get a subscription to ElevenLabs, you can also use this link, and then you support me. And you can, of course, do the same thing: you can make yourself such an affiliate link, place it in videos, on social media, or wherever, and maybe you can even earn back the amount that you pay for the subscription, so it's basically free. So in this video, you learned how ElevenLabs works. Generally speaking, it is, at least in my mind, one of the best AI tools if you want to generate speech from text, and you should totally try it out.
20. Transcribing with Whisper: Let's talk about Whisper. Whisper is the free, open-source tool from OpenAI, and you can even run it locally. You can turn speech into text and make transcripts. If you scroll down, you see how the technology works, and you can dive deeper if you want. And here you get the whole setup: if you want to install it locally, you need to pip install openai-whisper, then install ffmpeg and the other bits they list, and then you can basically use it. A minimal local example looks like the sketch below.
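This is a minimal local sketch, assuming you have installed the openai-whisper package and ffmpeg. The file name "lecture.mp4" is just a placeholder, and the model size is up to you (smaller models are faster, larger ones are more accurate).

# pip install -U openai-whisper   (ffmpeg must also be installed on your system)
import whisper

model = whisper.load_model("large-v2")        # or "base" / "small" for faster runs
result = model.transcribe("lecture.mp4")      # placeholder file: any audio or video works

print(result["text"])                         # the full transcript as one string

# The result also contains timestamped segments, like the subtitle file shown later.
for segment in result["segments"]:
    print(f"{segment['start']:.1f}s - {segment['end']:.1f}s: {segment['text']}")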
Now, if you do not want to do this, you have a lot of other options. The easiest option is probably Pinokio. If you simply download it and unzip it on your PC, you get an interface that looks something like this. Here you can type in, for example, "whisper", and if you press on it, you can simply download it. Pinokio makes it really, really easy: if things are not installed, you simply press install, and everything works completely automatically, so you do not have to worry about anything. This thing will
work automatically. On the OpenAI platform, you can of course also use Whisper from Python and make API calls. It's really easy to use: you can simply use the snippet they show right here and make API calls to Whisper. So you can either run it locally for free, or you can integrate it into your own projects with Python over the API; a small sketch of such a call follows below.
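As a small illustration of the API route, assuming the current openai Python package; the file name "clip.mp3" is just a placeholder.

# pip install openai
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

# Upload an audio file and get the transcript back as text.
with open("clip.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)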
Whisper is also really cheap over the API. If we scroll down once again in this article, you see that Whisper costs $0.006 per minute. Oh yes, this is really cheap: a one-hour recording is roughly 36 cents, and if you upload just a few minutes, it's nearly free. In the meantime, Whisper has also finished installing locally, and here you get your Gradio web interface. In this web UI, you can simply use Whisper, and it's really, really easy. You can pick whatever model you want; normally the large-v2 version works fine. Then you use automatic language detection, or you set the language yourself; you can type in English or whatever it is. And then you simply drag and drop your file here. I just want to make an example with something from this course. So I uploaded my file, and then I press "Generate subtitle file". The model is initialized, and then we get the output. This is basically a video file, an MP4, and it should also work; if you use MP3, of course, it goes faster. And there we have it. You see, this took around 3 minutes; of course this was running locally, and this is a video, and the video is also relatively long. Now I can simply press here and download my file. I have opened up my text file, and you see I have the transcript and also the timestamps, so what I am saying and at which timestamp. This is completely awesome, and you can work with it. So in this video, you have seen how you can use Whisper. You can transcribe whatever you want in no time whatsoever, and it is really, really cheap. And if you want to run it locally, completely for free, you can do that too. It's really that easy.
21. Generating AI Music with Udio: The next thing is, of course, that we can even make music. You can make text, you can make sound effects, and you can also make music; I hope you see by now that diffusion models are a big deal. One of the best tools right now at this minute is Udio, and Udio has also introduced version 1.5. If you press on it, you can see how this works, and I can show you one or two generations that I have made. If you simply press play right here ("mosquitoes bustle around..."), you hear that this thing is working. You can also always listen to the staff picks, the music they think is cool. Let's just play this one for a brief moment: "...right from the east to the west, from the north to the south." You see, this sounds really, really good, at least right now; this thing works really well. Of course, you can upgrade your plan if you press on it, but you can also start for free, although then you are limited. If you want to use more, you need to pay a little bit, and you can save a little if you pay annually, the same stuff as always. But you can start out completely for free, and it's really easy to use if you simply press Create. Here you get an interface. This interface always changes a tiny bit, and you will keep getting new options and so on. Basically, you can type in what you want, you can get suggestions, you can make a single generation up to 130 seconds long, you can add your own lyrics; you can do a lot of stuff here. Now I want to show you the easiest way to create a song with this. We simply type in what we want, and of course we need to log in first; just log in with Google, with Discord, or with Twitter. I will go with Google. I have already made some songs in this tool. Now we simply type in what we want, for example, a song
lot of different stuff. We can use the manual mode. If you start out, just
use the default settings. I'm also no expert in music. So if you use the manual mode, of course, you can
do a lot of stuff. You can do different tags. So should it be a
rock, electronic, pop, chess or something, I think electronic would be cool
with our rabbit song. Then the lyrics, do you
want to have custom lyrics? So if you press
some custom lyrics, you can type them in or
they will be automatic. Of course, if you include
this manual stuff, you can always type in
the stuff that you like. Then the instrumental, how
should the instrumental be? Do you want to include
something or not? And then the auto generated, if you want to do
everything automatically. Just for now, I exclude
this right here, and we simply use here, for example, electronic
and Electro as our text. And we simply press Create, and then we will wait like one or 2 minutes and
we get our song. The song is 1 minute long, and after that, we can
also remix the song. Let's just wait until we have our song. And there we have it. We have our two songs. It took about 7 minutes
to create them, and let's just see how they are. We are midnight house.
Let's go. Let's go. Go. Moonlight glows. First leg. Here's back. Watch
the bunny flow. Hop skip, Acrobat. Watch the bunny flow,
then Bunny beads. Hello with those bunny feet. Round, hop, round, jump
h with those bunny feet. This is awesome, so you can play all day
long with this tool. Now we can do the
following three things. We can remix them. We can extend them or
we can publish them. If you press on mix, you can do here a lot
of different stuff. Of course, you can change
the text, for example, you can change the instrumental, you can change the stuff
that is out generated, and of course,
also the variants. You can make it more
different or less different. You can remix however you want. If you think it's cool, but you want to have it longer, you simply press on extend. If you press publish, you can share it with
everybody on this platform. If you press on these three dots, you can remix and extend, like you already know, you can view the track, add it to a playlist, share it, download it, delete it, or report the song if something is not okay. I think I will press extend, because I really like this one, but you don't have to listen to the whole song. The best thing you can do is play a little bit with this tool yourself. Udio is, right now, at least in my mind, hands down the best tool. Udio makes music that we can really listen to; we can create and listen to music within a few minutes. This was never, ever possible before. Just think about what you would need to do to create a song in this quality without AI: you need to learn to play instruments, you need to learn to sing, or you need to find the right people, you have to go to a studio, you have to record it, you have to edit it. That is enormous. Now we can make our own music with a few clicks, and the music, at least in my mind, is nearly as good as music from professionals. Remember, this is the worst version that you will ever play with. Udio will also get better and better, and maybe a new tool comes around the corner that is as good as the top artists on the planet. AI is just awesome. Just play with this tool and let me know if you love it. I know you will.
22. Recap and THANK YOU!: Congratulations, you did it. And first of all, thank you. You have learned AI as fast as possible. We started with the basics: what AI is, what LLMs are, how they are trained, and how they work. This was a little bit of theory, but you need to understand it, because to get good outputs you need good inputs, and you need to understand tokens. Then we looked at which LLMs are out there and how we can use them. We have a lot: closed-source LLMs like ChatGPT, Claude, Gemini, and many more, but basically these are the big three, and then we have open-source LLMs. The open-source LLMs we can use either with Ollama, in LM Studio, or also on HuggingChat. Then you learned what these LLMs can do: you can make small text bigger or big text smaller, and with that you can do a lot, because you can also write code, make text for marketing, write entire books, write emails, and much more. Then we talked about prompt engineering. We have role prompting, few-shot prompting, structured prompts, and some tips like "think step by step". The most important thing is semantic association, so you need to give context. You can also customize your LLM, either with the system prompt or with similar techniques. And of course, you can use all of these LLMs via an API and integrate them into your own projects if you are a developer. There is, of course, a lot more. There are endless AI tools like Perplexity, which works great for some things, and if you want to play around, HuggingChat is also cool. Then we talked about diffusion models. We started with
trained on text and pictures, and they can recreate
pictures if you type in text. Also here you need to be specific to get
specific outputs. So prompt engineering
is important, and it works in every single
diffusion model the same. Just think about what matters. You saw all the most
important things about mid journey, a Dogram, adobVaFly and even the
open source models like stable diffusion in focus or flux and recraft on replicate. Then you have learned
that di fusion models can do more because you can also create audio,
video, and voices. Some of the most
popular tools for videos are ling,
runway and Beca. If you want to generate text, 11 labs or a five DDS and
the OMI API is great, if you want to create songs, I think dio right now
is the best tool. Also sooner works and eventually also 11
labs in the future. Besides that, you can also use WispR open source
for transcriptions. Just install Binochio
and you can make transcriptions really
easy and for free. So basically, you
have learned a lot, and I want to tell you once
again what learning is. Learning is same circumstances,
but different behavior. Maybe you did not know that
AI can do so much things. Right now, you know it, so
you should totally do this. This is the most
important thing. Use AI tools only then
you have learned. And I want to tell you what
really good learners do. They learn together
because more people always know more than people. So if you could
share this course, this would really
mean the world to me. Maybe it also means the
word to the other person, and if the other person gets
value out of this course, they will describe the value to you because you have told them. Thank you for that, and
I'll see you, of course, once again in this course
or in another course. And one last time, thank you
from the bottom of my heart because you have gave me your most valuable
asset, your time. Everybody on this earth has just limited time and you decided to spend
your time with me. So thank you for
that, and you have learned AI as fast as possible.