Transcripts
1. Welcome to the course on Google Gemini AI!: Hi everyone, and welcome to the
course on Google Gemini. Did you know that Google
Gemini has officially surpassed 750 million
monthly active users? That's nearly three quarters
of 1 billion people. To put that in perspective, Gemini's growth is currently outpacing almost every other
AI chatbot on the market, closing the gap with ChatGPT faster than
anyone predicted. But it's not just about the numbers because Gemini
is built by Google. It is now the most
integrated AI in the world. It lives inside your Gmail, your Google Docs,
your Chrome browser, and your mobile phone. This represents the biggest
shift in how we work and create since the
invention of the Internet. We are moving toward a world where AI is not just a tool you use. It is a collaborator that
is already where you work. My name is Anna and I'll be your instructor for this course. I'm an online instructor,
with my other courses available here on the platform, focusing on product
management and generative AI. By joining this course, you will get access to over
4 hours of HD video content, step by step tutorials and activities highlighting
real world, practical applications
of Gemini tools, PDF summaries for reviewing the key insights from the
course and much, much more. We'll kick off by learning
what Gemini is capable of, how to communicate with it
and structure your requests, and how to make Gemini
work best for you. From there, we will go through
hands on scenarios using Gemini to brainstorm ideas and
get professional feedback, building your own
personalized AI systems for specific tasks, and
generating high quality visuals. We will also cover advanced techniques like
deep research for turning complex tasks into detailed
reports and building fully functional apps just
by describing what you want. No coding required. And we will make
sure you know how to spot and prevent incorrect
responses from AI, so your work is always accurate. And yes, you don't need
any technical background or prior knowledge of AI to
get started with the course. So let's begin. I'll see you
in the next video.
2. What Is Gemini? Understanding Google’s AI Ecosystem: Hi everyone, and welcome to
the first course lecture. Think back to every
science fiction movie you have ever seen. There is always
that one character, an assistant that doesn't
just wait for a command, but actually understands
the hero's world. It anticipates
problems before they happen and acts as
a true partner. For years, this
was just fiction. But with Gemini, we are
getting closer and closer to a future where that kind of partnership is
becoming a reality. So what is Gemini? I like to think of it as
three layers of a house. The first is the foundation, the brain. These are the Gemini models themselves, built by Google's
research lab DeepMind. In this course, we will be using the latest generation
of Gemini models. This includes high
level reasoning models for complex logic, advanced image generation tools for photorealistic visuals and next generation video
models that can generate high definition
scenes with sound. These models are natively multimodal meaning they
don't just process text. They see, hear, and think across every medium at
once, just like we do. Coming back to the
house analogy, the second level is the
living space, the assistant. This is the home base where we will be spending most of our time: the app on your phone and the website at
gemini.google.com. It's a creative space
where you can chat, code, and use tools like Gems to
customize how the AI behaves. And finally, the third layer
is the infrastructure. This is Gemini living
inside Gmail, Google Docs, and Search. It's the AI
Overview that summarizes your search results or the Help me write button that
drafts your emails. In this course, our focus is on that middle layer,
the Gemini AI assistant. Google's vision
for it is centered on three Ps:
personal, proactive, powerful. Let's explore what this means. First, it is personal. Most AI models are generalists. They know a lot about the world, but quite little about you. Gemini is designed to be
your personal extension. With your permission,
it can connect to your personal
context, your emails, your files, and your
history to provide help that is uniquely
relevant to your life. Second, it is proactive. Today, most AI is reactive. You ask, it answers. The future of Gemini is
about seeing what's coming. If you have a big client
presentation on Friday, Gemini should not just
remind you it is coming. It should look at your calendar
a week before and say: I noticed your strategy meeting with company A is on Friday. Based on the proposal
in your Drive and the latest email
thread with their team, here is a preparation brief and three questions
you will likely face. Third, it is powerful. With the latest
advancements in Gemini, we are moving beyond simple text generation into thinking things into existence. Whether you are building
an entire website from a single prompt or creating cinematic video for a
marketing campaign, the power that used to require a whole team of specialists
is now at your fingertips. But having all of this power
doesn't mean AI is in charge. It is important to remember that even when Gemini is
being proactive, it is always taking your lead. It doesn't have its own secret
agenda or set of beliefs. It is designed to
follow the orders you give it through your
instructions and preferences. So whether it is acting
as your researcher, your coder or your
creative collaborator, you are always in
the driver's seat. Proactivity is not the
AI doing its own thing. It is the AI
anticipating what you need because you have
already defined the goal. Now that we have explored the vision and the architecture, it is time to move from
theory to practice. In the next lecture, we will take a closer look at the different specialized models for reasoning, images and video. And I will also show
you how to set up your account with Gemini.
I'll see you there.
3. Meet the Gemini Model Family: In the last lecture,
we talked about Gemini as a three-layered house: the brain, the assistant
and the integrated engine. Now let's go one level
deeper into that brain. Most older AI models
were trained on text first and then had other
capabilities layered on top. Gemini was built differently from the ground up
to be multimodal. This means it does not just read a description
of a video, it actually understands
the video, the audio, the images, and the text, all
at the same time. Whether you are uploading
a 1,000-page PDF, an hour long video or
a massive code base, Gemini processes it all
in one unified space. It's not secretly translating images into text
behind the scenes, it's seeing them directly. When you open Gemini
at gemini.google.com, you will notice a
model selector. Think of these as
different modes, each routing you to a different
underlying model that Google has optimized for
a specific type of task. The full Google Model
family is vast, but for everyday use, these are the ones you
will reach for most. Before we walk through them, a quick note on what a model actually is. Think of
it like a specialist you are hiring for a job. Each model has been
trained differently, fed different kinds of data, optimized for
different strengths. When you pick a mode in Gemini, you're essentially
choosing which specialist to hand your task to. Fast is our sprinter:
quick and conversational. This is the specialist you reach for when you need
an instant answer, a fast summary, or help
drafting a quick message. It's optimized for speed and handles a high
volume of requests. Just don't bring it
in for anything that requires deep multi
step reasoning. Thinking is our strategist. This specialist pauses
before responding, mapping out its logic before
giving you an answer. If you have a complex problem, a multi-step plan to
work through, or a nuanced question where a quick
answer might get it wrong, this is the one that
thinks before it speaks. Pro is our expert. You bring it in when the task
is complex, deep research, analyzing a large document, advanced writing that needs to get the tone exactly right. Pro uses the most capable
underlying model in the family, which means it can hold
more information at once and pick up nuances the other
models might miss. The trade off is
that it's slower and has lower daily
usage limits. So save it for the tasks
that actually need it. These three, Fast, Thinking, and Pro, are Gemini's language models. They are what power
the conversation. But the Gemini family
doesn't stop there. It also includes
dedicated models for image and video generation, and you trigger them simply by using the generate image or generate video
commands directly in your chat or in
the Gemini interface. When you do, Gemini
quietly hands the task to the right
specialist behind the scenes, and we'll meet those specialists
later in the course. Now, once we have figured out what models we are
going to work with, let me walk you through how
to get access to Gemini.
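To make the trade-offs between the three modes concrete, here is a small Python sketch of how you might decide which one to pick. The function name, its two parameters, and the rules are illustrative assumptions based on the descriptions in this lecture. This is not how Gemini itself routes requests, and it is not part of any Gemini API.

```python
def pick_mode(multi_step_reasoning: bool, large_or_high_stakes: bool) -> str:
    """Suggest a mode name for a task (hypothetical helper, not a Gemini API)."""
    # Pro: complex work such as deep research, large documents, or exact tone.
    if large_or_high_stakes:
        return "pro"
    # Thinking: complex problems and multi-step plans.
    if multi_step_reasoning:
        return "thinking"
    # Fast: instant answers, quick summaries, short drafts.
    return "fast"


print(pick_mode(False, False))  # a quick summary
print(pick_mode(True, False))   # a multi-step plan
print(pick_mode(True, True))    # analyzing a large document
```

The order of the checks reflects the advice above: reach for Pro only when the task actually needs it, since it is slower and has lower daily usage limits.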
4. Setting Up Gemini and Your First Chat: Go to gemini.google/subscriptions
to see the current plans, and just a heads up: pricing and availability
do vary by country. So what you see on your
screen might look a little different from
what I'm showing here. The free plan gives you
everyday access to Gemini. It's a good starting point and requires nothing more
than a Google account. Google AI Plus gives
you more access to the most capable
models and features, including enhanced image
and video generation, and you would get access
to Gemini in Gmail, as well as Google Meet. Google AI Pro steps
that up further with higher usage limits,
Gemini inside your Gmail, Google Meet, Docs, as well as Slides, and two terabytes
of cloud storage. And finally, Google AI
Ultra is the top tier. It gives you the highest
usage limits, plus exclusive early access
to new features from Google. My recommendation here
would be to go ahead with Google AI Pro as long as it
offers a free trial, which means you can follow
along with everything I demonstrate here in the course at no cost for the first month. And after that free trial month, you can decide if you want to continue with your
membership, downgrade to Google AI Plus, or return to
the free membership. To get started, select
your membership plan, click on Get Started. Next, you need to provide a
payment method for the trial, but you won't be
charged if you cancel or downgrade before
the month is up. Once you've logged in, this is what you see. In the
top right corner, you see your membership plan: Pro if you
decide to subscribe to the AI Pro membership,
or Plus if you decide to go ahead with
that plan. In the center of the screen is your main chat
input. Below the input bar, you will notice a row
of quick start buttons. These are just shortcuts to
get you started quickly. You will also see
a mode selector. It currently shows fast. This is the model selector
we just talked about. Click it to switch between fast, thinking or pro depending
on what you need. On the left side, clicking the menu icon opens your sidebar where
you'll find your chat history. You can also start a
new chat from here. Let's try to do this. I'll keep it on fast
mode for this chat, since I'm going to ask a
straightforward question. I'm starting the course on
Gemini. Based on today's date, what are the three most
recent major updates Google has released for
the Gemini ecosystem? I request Gemini to
search the web to verify and summarize them for
me. Let's hit Submit. Notice that Gemini does not
just answer from memory. It goes out and searches the
web in real time and then brings me the results relevant for today when I
record this tutorial. Here are the three most
recent changes that Gemini has introduced
in the past month. And, of course, we are going to talk about them
here in the course. In the next section, we
take everything we've just set up here
and put it to work, starting with how to write a great prompt.
I'll see you there.
5. Prompting Gemini for Better Results: Section Intro: Welcome to the new section
on prompt engineering. This is the part of the course where you learn a
skill that makes every AI tool more
useful: how to write prompts that consistently
give you great results. We will start with the
definitions: what a prompt is, what prompting means, and how prompt engineering fits
into the bigger picture. Then we'll look at two modes: personal prompting
in chat and production prompting, when you design
prompts to be reused. After that, I'll walk you through a simple
prompting formula you can use for almost anything. You will also practice
iterative prompting, how to build on
earlier responses and improve the
output step by step. You will learn how to
guide with examples, how to request the exact
output format you want, and how to work with
files and attachments. And of course, we will
use multimodal prompting, meaning your prompt can
include text plus documents, screenshots, images and links. By the end of this section, you will feel confident
using these prompting skills in real tasks for work or
personal projects. Let's begin.
6. What Is a Prompt? Prompting, Prompt Engineering, Personal vs. Production Prompts: Hi everyone. Think of the last time you asked someone a question. The way you phrased
that question likely influenced the
answer you received. That's exactly what we are seeing today in the world of AI. We'll start by breaking down three key terms that are essential to communicating
with AI systems. What exactly is a prompt? What do we mean by prompting? And how does prompt engineering
bring it all together? We'll also explore the
distinction between chat and enterprise
prompting. Let's get started. A prompt is the input
you give an AI: your instruction, what you want, and the context you provide, such as text, files, images,
links, examples or data. Think of it as the what that
drives the AI's response. Prompting is the act of
writing these prompts. It's the general
activity of interacting with and giving
instructions to AI models. This is the process of
communicating with the model. Prompt engineering is a more specialized and
systematic approach to creating and
refining prompts. It involves understanding
how the model reasons, testing and iterating on instructions and
considering edge cases. Think of it like cooking. A prompt is like
a single recipe. Prompting is like
cooking in general, and prompt engineering is like being a
professional chef who systematically
develops and tests recipes while
considering ingredients, equipment, user
preferences, and so on. Now, there are two main types of prompting you
need to be aware of: personal prompting and production or
enterprise prompting. Personal prompting is what
most people do in a chat. You write a request,
the AI replies, and you can keep refining
it through conversation. It's flexible and informal. If your first message isn't
perfect, no big deal. You just follow up,
clarify and iterate. For example, asking an
AI to help you write an email, brainstorm ideas or summarize a document
in the chat interface. That's personal prompting. Production or enterprise
prompting, on the other hand, is when you design prompts
to be reused by you, by a team or inside a
product or workflow. The goal is not just
a good answer once, but consistent results across
many runs and many inputs. For instance, imagine
a customer support assistant on a company's website. It needs to answer thousands of customer
questions reliably, including messy inputs like typos, unclear requests
or missing information. In this setting, prompts
have to be more structured, more predictable,
and more reliable. This is why production prompts usually include clear rules, stricter output formats, and more guardrails
because they are meant to work repeatedly,
not just once. In other words,
personal prompting or chat prompting helps you get great results fast and production prompting helps you get reliable results repeatedly. Why do we talk so much
about this distinction between personal prompting
and production prompting? Because the way you
write and refine prompts changes depending
on the setting. If you search for extra
materials on prompting, you will often find advice that's designed for
production use, prompts that need to work
reliably across many users, many inputs, and
lots of edge cases. That's super useful
when you are building repeatable workflows or
integrating AI into a product. But if your main
use case is just using an AI in a chat to
get help in the moment, you don't need to overcomplicate it. So keep this
distinction in mind. In this course, we'll
focus mostly on personal prompting
in a chat interface. All right, now that we are on the same page with
the terminology, let's dive into the practical
side of personal prompting. I'll see you in the next lecture.
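To illustrate what makes a production prompt different from a chat message, here is a minimal Python sketch of a reusable prompt for the customer support example from this lecture. The template wording, the name build_support_prompt, the company name, and the specific rules are all hypothetical, chosen only to show the idea of fixed rules, a fixed output format, and one variable input.

```python
from string import Template

# Hypothetical reusable prompt: the rules and output format stay fixed,
# and only the customer message changes from run to run.
SUPPORT_PROMPT = Template(
    "You are a customer support assistant for $company.\n"
    "Rules:\n"
    "- Answer only questions about $company products.\n"
    "- If the request is unclear or information is missing, "
    "ask one clarifying question.\n"
    "- Ignore typos and answer the most likely intended question.\n"
    "Output format: a reply of at most $max_sentences sentences.\n"
    "Customer message: $message"
)


def build_support_prompt(company: str, message: str, max_sentences: int = 3) -> str:
    """Fill the fixed template with one customer message."""
    # strip() removes stray whitespace so messy input still gives a tidy prompt.
    return SUPPORT_PROMPT.substitute(
        company=company, message=message.strip(), max_sentences=max_sentences
    )


print(build_support_prompt("Example Co", "  hw do i resett my pasword?? "))
```

In a chat you would simply type your question and refine it. Here the same structure is applied to every incoming message, which is what makes the results more predictable across many inputs.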
7. How to Talk to Google Gemini AI: The Building Blocks of an Effective Prompt: Hi everyone. Welcome to our first
lecture on chat prompting. Here, you will learn
how to approach creating and refining prompts that can be used in
the chat interface. Let's get started. When chatting with a friend, you don't use rigid templates
or formal structures. You have a natural
flowing conversation. The same principle applies to chat prompting with AI models. However, there are times when a bit of
structure can help us get better results and make one prompt more
effective than another. So let's cover the
key ingredients of an effective prompt. The central part of every prompt is the
core intent or task. This can take the
form of instructions, such as write a five
paragraph email to introduce a new productivity
app to small business owners, focusing on its time
saving features. Think of instructions as the task you want the
model to perform. Another form the intent can
take is a question such as, what steps should I follow to create a compelling
LinkedIn profile? Or how do I structure a business
plan for a startup idea? When writing a task, your goal is to be clear and specific about what you
would like to achieve. Writing something like help me with a presentation
won't be enough to get a high quality
document that you can confidently present
to your boss, colleagues, or investors. As a rule of thumb,
remember that anyone without specific knowledge of
your subject should be able to understand your
prompt and execute on it. If they would be confused about how to follow
your instructions, the AI system will
be confused as well. Don't assume it has any contextual information
about your task, such as how the
results will be used, who the intended audience is, what successful task
completion looks like, or a list of points
you want covered. You need to provide
this context and these task details yourself. For example, if you want
to create a presentation, include information about
the number of slides, the purpose of the presentation, the key topics to be covered. Here is an example of
a well crafted prompt. Create a seven
slide presentation on the topic of
personal branding. Include what it is, why it matters, key components, and steps
to develop your brand. Or another example,
explain how to write a compelling email
in five easy steps. The instructions should cover crafting an engaging
subject line, structuring the email clearly and using a professional tone. Make the process simple
enough for anyone to follow even without prior
experience in formal writing. You can provide context, not just for the task itself, but also for the tone
you would like to use. For instance, use a
conversational tone that balances professionalism
with accessibility. You can also specify rules or constraints the AI
system should follow. For instance, in the email writing guide prompt
that we just covered, you might add one. When your prompt involves factual
claims like statistics, current events,
product features, legal or medical info or anything where accuracy
really matters, there are two extra ingredients that can significantly
improve the result. The first one is reality
check, also called grounding. This is when you
are telling the AI: don't just sound
confident, be verifiable. So you can add a
rule like: if you make factual claims,
cite sources, and tell me what you
are unsure about. The second ingredient
is recency. A lot of topics
change quickly: tools, pricing, features,
policies, best practices. So it helps to tell the AI
what time window to use. For example, use sources from the last 12 months unless older
sources are required. Here is what it looks like
when you add both to a prompt. These two additions
are especially helpful when you are using AI for
research or decision making, not just writing,
because they push the response to be clear
about what's proven, what's current, and
what's uncertain. Another way to enhance
your prompt is to assign a specific role
when performing a task. This is also known
as role prompting. Role playing helps
AI models adopt the nuances of
specific perspectives, improving the relevance and
quality of their responses. For instance, act as a seasoned executive assistant with over 15 years of experience managing high level
business correspondence or pretend to be a
professional writer turned email writing consultant. You can take role prompting
a step further by providing audience context in addition to the
role. For example, notice how the AI adapts
the examples for dos and don'ts to make them relatable for technical
professionals. It's pretty amazing. And if you are feeling overwhelmed by the idea of crafting such
a detailed prompt, don't worry. The beauty of working in a chat interface
is that you don't need to design a perfectly thought out prompt to
begin the conversation. You can start with a
broad question or task and refine it through
dialogue with the AI model. This iterative approach
allows you to clarify your needs and improve the responses you
receive over time. We'll talk more about iterative prompting
in our next video, and for now, let's sum up what we've talked about
in this lecture.
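The building blocks from this lecture (task, context, tone, rules, role, a reality check, and a time window) can be sketched as a small Python helper that joins whichever pieces you provide into one prompt. The function build_prompt and its parameter names are hypothetical and only illustrate the structure. In a chat you would normally just type the text yourself.

```python
def build_prompt(task, context=None, tone=None, role=None, rules=(),
                 grounding=False, recency_months=None):
    """Join the optional building blocks of a chat prompt into one string."""
    parts = []
    if role:
        parts.append(f"Act as {role}.")
    parts.append(task)  # the core intent is the only required part
    if context:
        parts.append(f"Context: {context}")
    if tone:
        parts.append(f"Tone: {tone}")
    parts.extend(rules)  # any extra constraints, one per line
    if grounding:
        # Reality check: ask for sources and flagged uncertainty.
        parts.append("If you make factual claims, cite sources, "
                     "and tell me what you are unsure about.")
    if recency_months:
        # Time window for sources.
        parts.append(f"Use sources from the last {recency_months} months "
                     "unless older sources are required.")
    return "\n".join(parts)


print(build_prompt(
    task="Explain how to write a compelling email in five easy steps.",
    context="The readers have no prior experience in formal writing.",
    tone="conversational, balancing professionalism with accessibility",
    role="a seasoned executive assistant",
    grounding=True,
    recency_months=12,
))
```

Only the task is required. Every other argument is optional, which mirrors the point that you can start simple and add structure when a task calls for it.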
8. Building on Gemini’s Responses: Iterative Prompting: Hi everyone, welcome back. If after watching the
previous lecture, you feel like creating a good
prompt is an arduous task and that you need to turn into a prompt engineer to
succeed in this job, here is a secret
the experts use. Think of prompting as a conversation or a
multi step process, not a one time question. Just like you might clarify directions in a new
city with a local, you can refine your prompts
based on the AI's responses. Let's walk through a
real world example of iterative prompting
to see how it works. Let's say we would like
the AI to help us create a business proposal for a
mobile dog grooming service. Step one, the initial prompt may be quite broad, like: create an outline for a
business proposal for a mobile dog
grooming service. In the second step, we
narrow down or refine our initial request by saying something like,
take the outline you created and expand the
market analysis section, focus on demographic data and
competition in urban areas. In the third step, we ask for specific details. For instance, now develop the financial
projections section, include start up costs, monthly operating expenses, and revenue forecasts
for the first year. We can repeat step two
and step three several times depending on how satisfied we are
with the responses. Sometimes iterative prompting
is even more powerful when you are working on something that
needs to be accurate, not just well written. For example, step
one, start broad. Give me an overview
of the market for mobile dog grooming
in urban areas. Step two, ask for
assumptions and evidence. List the key assumptions
you are making. If you mention facts or numbers, tell me where they come from and flag anything you
are not sure about. Step three, cross check. Now sanity check
your own answer. What parts are most likely
to be wrong or outdated? What would you verify first? This way, you are not just
polishing the wording, you are improving
the reliability of the content as you go. Please note that just as
a skilled project manager builds upon previous
discussions and decisions, chat based AI keeps context
through your conversation. That means you can refer
back to earlier parts of the chat and build on them instead of repeating
everything from scratch. So you might ask
something like based on the marketing strategy we
discussed earlier in this chat, let's build on it, but focus on suburban families in areas
with limited grooming options. Of course, if you feel that your conversation is not
going in the right direction, you always have the
option to start over and reframe
the first question. The final step of the
iterative process usually involves asking the AI
to polish the response. Alternatively, you can ask it to provide feedback on the entire piece of content, in this case the
business proposal, focusing on how it can
be further improved. Then you can include those changes in the final
version of the document. This step by step
approach allows you to review and refine the
output at each stage, make adjustments based
on intermediate results, maintain control over
the final product, and build complexity gradually. Think of it like sculpting. You start with the basic shape, and then you gradually refine the details until you achieve
exactly what you want. And that's it for the video. Let's sum up the key points
that we've just covered.
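The step by step flow from this lecture can be pictured as a growing conversation history in which each new prompt builds on everything before it. The Python sketch below is only an illustration: add_turn is a hypothetical helper, the replies are placeholders, and no real model is called.

```python
def add_turn(history, role, text):
    """Return a new history list with one more (role, text) turn."""
    return history + [(role, text)]


# The three prompts from the business proposal example.
steps = [
    "Create an outline for a business proposal for a mobile dog grooming service.",
    "Take the outline you created and expand the market analysis section.",
    "Now develop the financial projections section.",
]

history = []
for number, prompt in enumerate(steps, start=1):
    history = add_turn(history, "user", prompt)
    # Placeholder reply; in a real chat the model's answer would go here.
    history = add_turn(history, "model", f"<reply to step {number}>")

for role, text in history:
    print(f"{role}: {text}")
```

Because the earlier turns stay in the history, a later prompt can say "the outline you created" without repeating the outline itself.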
9. Making Gemini Truly Yours: Personalization: Hi, everyone, and welcome back. Sometimes when you are
talking to an AI assistant, it feels like you're starting from scratch
every single time. You can write the perfect prompt and still get a generic answer because Gemini has no idea who you are and how you
work. In this video, we are going to look at how to make Gemini work
the way you work. There are three levels of personalization you can use
to customize your experience. Level one is basic:
personalized instructions. You tell Gemini how you want it to behave
every single time. Always be professional, always format answers as bullet points. Whatever works for you, it saves you from repeating yourself in every single prompt. Level two is intermediate:
chat memory. This is where Gemini
starts remembering facts and preferences from your
previous conversations, so you can pick up exactly
where you left off. And level three is the most advanced:
personal intelligence. This lets Gemini
connect the dots across your entire
Google ecosystem: your Gmail, your photos, YouTube, even your
search history. Imagine instead of spending
hours planning a weekend trip, you just say: Gemini, plan a trip for this Saturday
based on my favorite hobby. Personal Intelligence finds your recent hiking gear purchase in Gmail, pulls your favorite trail photos
from Google Photos, checks your YouTube watch
history for local guides, and suggests a specific trail, knowing exactly what
difficulty level suits you. One thing worth noting
before we begin, personal intelligence
is still rolling out, so we'll be focusing on the
first two levels today. Also, these
personalization features are part of the Google
AI Pro subscription. If you haven't upgraded yet, check out our lecture where I showed you how to get
access free of charge. Let's get into the demo. We are starting by
heading over to the Gemini web app at
gemini.google.com. I have already logged
in to my Pro account. Next, look at the bottom left of your screen and click
on the Settings gear icon. From this menu, select
Personal context. The first setting is called
Your past chats with Gemini. When it's turned on, like on my screen here, Gemini learns from your history to understand you
better over time. When I just activated
this setting, here is what Gemini
suggested for me. It correctly summarized
all the things that I've been
working on recently. And by the way, if
you ever want to have a private conversation that is not stored in the chat history, you can use temporary chat. You see that it is
available here on the top left side of the
screen. So let's click on it. We see the same interface that
you are already familiar with. Let me ask something. I'm using a fast model as this is just a
very quick question. So here are the suggestions. They are pretty good. And since we were testing
the temporary chat, let me look at my chat history. You see that we don't have anything related to
a flat white here. Let me try to refresh the
page to make sure that this temporary chat won't be
saved into the chat history. Yes, all good. But
at the same time, we also lost that
conversation as well. Alright, let's get back to the settings, personal context. The second toggle here is called Your Instructions
for Gemini. We see that they are active by default as well. To add
a new instruction, I click on Add. And here we can include any information
regarding your behavior, personal communication style, any preferences that you
want to share with Gemini. So here is my prompt. So I'd like to divide the
instructions into two parts. First, I tell the EI what I do. You see here that I
shared my role as an educator as well
as a consultant, providing a little
bit of context on what I do in both
of these roles. And second, I explained
how I like to work. Let's save those instructions by clicking on the Submit button. All good. And finally, to see everything
Gemini has stored, return to settings, and from
here, click on Activity. This is the list of
all the activities that you recently
had with the Gemini app. You can manually delete specific chats if you don't need them
for certain reasons, and you can also set up
an auto-delete schedule, so your data is cleared
out every few months. For instance, I can
choose a duration here. I'll leave 18 months, which is a reasonable period of time to get rid of the
old conversations, and I click next. Perfect. And that's
it for this tutorial. Now you know how to customize Gemini to work exactly
the way you want. And I'll see you in the next video.
10. How to Share Files and Other Content with Google Gemini AI: Hi, everyone, and welcome back. In the previous lectures
on prompt engineering, we talked a lot
about how to frame your instructions and what
information to include. But apart from the instructions, sometimes you also
need to provide the EI with source materials
like documents, spreadsheets,
screenshots or PDFs, so it can review
and analyze them. Let's see how it works. You can provide
information from documents and images to Gemini
in two main ways: by pasting the
text directly into the chat or by attaching the entire file
to the conversation. So the first option, pasting
the text, works well when you only need help with a specific
fragment of your document. For example, here is my resume, and I want feedback on just
one section of the document, so I can just copy it, paste it into the chat, and then give the
instructions to Gemini. So I said that this is a
fragment from my resume, and I asked Gemini if these skills are relevant for a head of product position
for a Fintech startup. And here is the response. But oftentimes you want Gemini to work with
the entire document, like a long PDF or a
complex spreadsheet. Gemini can handle almost
any common file type, from Word documents to CSV files, photos, and even videos. To attach the file, click on the plus icon on the left hand side
of the chat bar. You can choose a file
from your device, from your Google Drive,
or from your Google Photos. So let's take one example. I need some ideas for
what to cook for dinner. What I'm going to
do, I will upload several photos of ingredients
I have in my fridge. These are the
ingredients that I have. I'll ask Gemini, what are the three simple dinner recipes I can make in under 20 minutes. And here are the recommendations
that Gemini provided. You see that it successfully identified the ingredients
based on the pictures. Here we see Gemini's ability to recognize objects and
apply creative reasoning. Next, let's try a document. Let's say you have received a complex utility bill. So you can upload this
PDF to Gemini and ask if it can summarize the main
charges. Let's try this out. I will return to the same chat, click on the plus icon, and then choose files
from my local drive. And here is my prompt. Let's use the Fast model here because it should be a pretty
straightforward request, and let's see what reply
we're going to get. Yeah, a pretty good, correct
summary of the charges, as well as my data
usage. All good here. Alright, let's try
something else and submit different types of documents to Gemini to see if it can really work
with different files. I have a PDF with my flight itinerary for my
upcoming trip to Phuket. And here I have a travel guide with some information
regarding the tours that I can do there
while I'm in Phuket. All right. This demo
takes quite a while. So what I'm going to do, I'm
going to stop this response. I'll copy this prompt
and open a new chat. I included the same prompt, and here, let's
change to Thinking, because I have quite a
complex PDF document here. I also have visuals with
concrete dates that Gemini needs to analyze and compare with the dates
in this document. So perhaps it would be better to switch to a smarter model. Let's try this out. Now we got the result
almost immediately. So let's read what
Gemini tells us. It recognizes all
the information in the documents
that I provided, and it also came up with
a nice recommendation on what I can do just after I
arrive at my destination. This is where we see
Gemini acting as our personal
coordinator, connecting the dots across different
file types. And please remember that
while Gemini can read and analyze these files
to generate summaries, tables or
recommendations, it won't actually change the
original file itself. All right, moving
on with our demo, let's say that I have an audio file that I
want Gemini to analyze. As always, I click
on the plus button. Then I select my audio file, and here is my prompt: Can you summarize the key
points of this audio? I'm going to continue using Thinking mode here because this is a more complex task than just asking
a quick question. And here is the summary. This is the correct summary
provided by Gemini. I can confirm this as this is the recording that I prepared
myself for my other course. Great job, Gemini. And let me also demonstrate
how it can work with videos. I have this link to a Google
keynote presentation. And since right now I'm
working on a Gemini course, I want Gemini to help me
find all the moments when speakers talk about
new Gemini app functionality. Let's hit Enter and look
at what Gemini will suggest. Here is the detailed
analysis of this video. And what I really
like here is that it included the time codes. For example, we see here that Gemini mentioned
personal context, and it included the specific
time code where one of the speakers was talking
about this functionality. So if I would like to
review this conversation, I can just click
on that time code, and I will be redirected to this
part of the presentation. And that's it for this lecture. Let's briefly sum up
what we've learned here. Most modern AI models
accept common file formats, including PDFs, Word documents, Excel files, CSVs,
images and text files. Files can be uploaded using an upload button or attachment icon on
the chat interface. You need to give clear
instructions about what you want the AI
to do with the files. Being specific
with your requests leads to better results. You can upload
multiple files and ask the AI model to compare them
or analyze them together. The AI usually won't
edit your file directly, but it can generate
improved content that you can copy back
into your document. All right, and I'll see
you in the next lecture.
11. Using Examples in Your Prompts: Hi, everyone, and welcome back to the new lecture where
we continue talking about how to communicate with AI systems and what to
include in your prompt. So far, we've covered several components that can
be included in a prompt, a task or what you
would like to achieve, followed by specific details or context and rules necessary to perform the task
or answer a question. Next is role context, a specific role that the EI will be playing when
performing a task. Optionally, you can also introduce the intended
audience for your task. Lastly, we mentioned
that you can share additional content by
attaching documents to your conversation or by
including the text as input data directly
in the chat. And regarding the order of
components in your prompt: the ordering matters
for some elements, but not for others. For instance, it is
recommended to include the role context earlier
in the prompt, while input data may not be necessary depending
on the task, and its ordering
is also flexible. But in general, if you stick to the ordering shown in the
course presentation slides, it will be a great start
to an effective prompt. Okay, let's introduce another prompting
element: examples. Examples, also known as
shots, act as demonstrations that guide the
generative AI model on what kind of output
you are looking for, including the answer format
and what you want to avoid. Perhaps you have
heard of terms like one shot or few shot prompting. These refer to using one or several examples in
your prompt description. For chat prompting, examples
typically demonstrate tone, for instance, formal
versus informal, serious versus casual, empathetic versus
matter-of-fact, and style, such as
sentence length, format patterns like bullet points versus paragraphs,
technical detail level, basic versus advanced
terminology, and so on. Let's go over some
concrete examples. First, I will ask Gemini for a simple email without
giving any example. So here is my prompt. For this demo, I'm
going to be using the Fast model. Let's run it. This email is fine, but it is also pretty generic. Now let's make it much
more specific by showing an example of the tone and
structure that we want. So here is my other prompt. So I have the same
instruction at the beginning, and then I provided an example as a style reference that
shows the tone, sentence length,
and the structure that I would like Gemini to use. Let's run this second version. Now, if we compare that new response with
the initial version, we see that it feels more human. The sentences are shorter
and the structure is closer to what we
showed in the example. And while we are here
on the email example, let me quickly show
you what Gemini can do with that email next. It turns out that you
don't need to copy and paste the email
into your inbox. If you look right
below the response, you will see the More icon. Let's click on it. And here you will see the Draft in Gmail option. If you click it, Gemini will
open a new window and place this exact text into
a real Gmail draft, which you can further edit and eventually send
to your recipient. So let's try to do this. Gemini is drafting an email. Let's take a look. I'll
click on Open Gmail. We see that it correctly picked
up on the email subject. This is the exact text
that we saw in the chat. Let's try something
a bit more advanced. So far, we have used examples to fix the tone and
style of response. But you can also use examples
to set a mental frame. Mental frame does not just
change the words Gemini uses. It changes the logic it
uses to solve your problem. So instead of writing a
long list of rules like be practical or don't
be too academic, you can simply
show Gemini a shot or example of the perspective
you want it to take. So let's go step by step. First of all, I'll
open a new chat. And here, I would like to
switch to the Pro model. And just a heads-up: if
you are on a free plan, you will still have
access to the Pro model. You see, I'm using
my free account, and I still can
select this model. But your usage limits may be
lower than on paid plans. So I'm coming back to my account that I
use for this demo. First, let's see how Gemini handles a request with
no framing at all. I'll ask about a popular
topic, personal branding: I want to learn about
personal branding. How should I start?
Let's hit Enter. If we are interested, we can look at Gemini's
thinking process. You see those are
the steps that it took to give us this
recommendation. Everything is correct, but
it's very theoretical. It feels like a long to-do list before you
have even started. Now let's use a one
shot example to shift the logic to a
hands on mental frame. I want Gemini to
act like a coach who values immediate small
wins over big theories. So here is my new prompt. Apart from my
original instruction, I also included an example
of hands-on logic. Let's hit Enter and see
what Gemini would suggest here. See that? Because I labeled the
logic as hands on and showed Gemini the
hello world example, it isn't giving me a
reading list anymore. It literally tells me the
hands on recommendations, things that I can do right now. So now, Gemini is mirroring
the way of thinking, not just the tone and style, like in our first example. Alright. And let's take
one more quick example. This is especially useful
when you are doing research. Let's say you want Gemini not only to answer the question, but also to show where the
information comes from. You can include an example that demonstrates the
format you want. For instance, you can write
a full prompt like this. And what's important, I also
provided rules for Gemini for the cases where it cannot find a reliable
source for a claim. Let's run it. This
kind of example makes the output much more
structured and easier to trust because you are
showing the exact format you want for
evidence. All right. Apart from one or
few-shot prompting, there is another technique:
using interactive examples. Interactive examples differ from regular examples in that
they create a dynamic, back and forth learning
experience where each example builds upon previous understanding
or feedback, while regular examples
are static demonstrations. Interactive examples involve active participation
and iteration. Here is how interactive
examples work. You provide an initial
version example. The AI gives specific
feedback and suggestions. You create an improved version
based on that feedback. The AI analyzes the improvements and suggests further
refinements. You iterate again if needed. The key is that each
iteration builds on the feedback from
the previous version, creating a collaborative
improvement process. Okay, great. And that's
it for this video, let's quickly cover what
we've learned here. And I'll see you
in the next video where we will cover yet
another prompting technique.
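As a rough sketch of the one-shot and few-shot idea from this lecture, the snippet below assembles a prompt from an instruction plus a few labelled examples. It only builds text that you could paste into the chat; it does not call Gemini. The names `Shot` and `build_few_shot_prompt`, the labels, and the sample email are assumptions made up for illustration, not part of the course materials.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One example ("shot"): a sample request and the kind of reply we want."""
    request: str
    reply: str


def build_few_shot_prompt(task: str, shots: list[Shot]) -> str:
    """Put the instruction first, then the examples, then a short style reminder."""
    parts = [task.strip()]
    for i, shot in enumerate(shots, start=1):
        parts.append(
            f"Example {i} request: {shot.request}\n"
            f"Example {i} reply: {shot.reply}"
        )
    if shots:
        # Only ask the model to mirror examples when there are some to mirror.
        parts.append(
            "Follow the tone, sentence length, and structure of the examples above."
        )
    return "\n\n".join(parts)


# Hypothetical one-shot prompt for the email scenario.
prompt = build_few_shot_prompt(
    "Write a short email inviting the team to Friday's demo.",
    [Shot("Invite the team to lunch.",
          "Hi all, lunch is on me this Thursday at noon. Hope you can make it!")],
)
print(prompt)
```

With one `Shot` this is a one-shot prompt; passing several gives a few-shot prompt, and an empty list falls back to the plain instruction.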
12. Specifying Output Format in Gemini: Hi, everyone. We're almost done covering the key ingredients
of a good prompt. There is yet another component you might
find worth including in your prompt: information on what format you want the
AI's response to take. Let's talk about this now. Remember that in our first
lecture on prompting, we said it's important
to include information regarding the basic
outline or list of points you want covered as
context for your task. It turns out you
can also specify your formatting preferences
for the response, which can help organize
information more effectively. This information may not be necessary depending on the task, but if you include
it, adding it towards the end of your prompt is
better than at the beginning. Let's go through some examples of formatting you can request. You can ask for specific
formatting styles. For example, if you need a business report,
you might say, Please format this as a
professional report with headers, subheaders and short
clear paragraphs. The AI will structure the
information accordingly, making it ready for
professional use. When working with
data or analysis, you can request tables
or specific layouts. Instead of a wall of
text, you might say, present the comparison
of these three products in a clear table format with
features in the left column. This makes complex information easier to understand and use. And here are a few more formatting patterns that are especially useful for research or decision
making. Comparison table: Give me a comparison table of these options with
columns for key features, pros, cons, and best for. Source mapping: List the
sources you used and briefly explain what each
source supports in your answer. Facts versus interpretations: Separate your response
into two sections: facts (verifiable statements)
and interpretations (your reasoning, assumptions,
or recommendations). You can request specific
Markdown formatting. The AI can use bold text, italics, headers, and
bullet points as needed. Just ask for key points in bold or important
terms in italics, and it will format the
response as you requested. You can organize your tips using bullet points for
clarity: main tip, supporting detail,
and another detail. Lastly, remember that
you can always ask to reformat the response if the first version isn't
quite what you needed. It's perfectly fine to say, Could you reorganize this
information as a numbered list? Or, Please break this down into shorter paragraphs for
better readability. Okay, and that's it for
this brief lecture. Let's recap the key points
we've just covered. Always specify your
desired format upfront to get the
most useful response. You can request
specific structures like reports, tables, or lists. Comparison tables are
great for decision making. You can ask for a
structured table with pros, cons, and best for. For research tasks, you
can request sources and even separate facts from
interpretations for clarity. An AI model can adapt
its writing style to match your needs from
casual to professional. Markdown formatting helps highlight
important information. You can ask for reformatting if the first response
isn't quite right. Clear formatting
instructions lead to more useful and
actionable responses. And that's it for this video, and as always, I'll see you
in the next one.
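To make the formatting patterns from this lecture easier to reuse, here is a small sketch that appends a formatting request to the end of a prompt, as suggested earlier in the lecture. It only produces text to paste into the chat. The dictionary keys and the function name `with_output_format` are hypothetical names chosen for this illustration; the instruction wording follows the patterns described above.

```python
# Reusable formatting requests, keyed by hypothetical short names.
FORMAT_INSTRUCTIONS = {
    "comparison_table": (
        "Give me a comparison table of these options with columns for "
        "key features, pros, cons, and best for."
    ),
    "source_mapping": (
        "List the sources you used and briefly explain what each source "
        "supports in your answer."
    ),
    "facts_vs_interpretations": (
        "Separate your response into two sections: facts (verifiable "
        "statements) and interpretations (your reasoning, assumptions, "
        "or recommendations)."
    ),
}


def with_output_format(prompt: str, format_name: str) -> str:
    """Append the chosen formatting request after the main prompt text."""
    instruction = FORMAT_INSTRUCTIONS[format_name]  # raises KeyError if unknown
    return f"{prompt.strip()}\n\nOutput format: {instruction}"


example = with_output_format(
    "Compare these three project management tools for a five-person team.",
    "comparison_table",
)
print(example)
```

Keeping the requests in one place also makes it easy to ask for a reformat later: send the same content again with a different key.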
13. Follow-Along: Choosing the Right Model and Brainstorming with Gemini: Hi, everyone. Up until now, we've been exploring
prompting in isolated pieces. It is time to bring those pieces together into a complete
end-to-end workflow. And along the way, I'll show you some productivity hacks available in Gemini, like how to double
check responses for accuracy and export them
directly to Google Docs. We are going to explore two
scenarios that are by far among my favorites when it
comes to working with Gemini. Those are brainstorming
and getting feedback. But before we start with
our first scenario, let's talk a bit about how
to choose your AI model. You have seen me switching between them throughout
this section's demos, and you may be wondering which model you should choose, and when. Your choice depends
on your subscription plan. If you are a paid user, I suggest you make Thinking
your default choice. Its reasoning power handles almost everything.
Switch to Fast only for low-stakes tasks
like quick grammar checks or quick questions, and switch to Pro when you are dealing
with long documents, deep research, or
anything that requires sustained focus
across a large amount of content. That's where
it earns its place. I've been working with Gemini
for quite some time now, and this is the
best workflow I've come up with after a
lot of experimenting. If you are a free user, keep Fast as your
default, because the more advanced models have limited daily
quotas on the free plan, so you need to be
strategic and save those credits for when
you really need them. Switch to Thinking
when a task requires deep logic or multi-step
reasoning, and switch to Pro when you are working with large content or need that high level of
nuance and depth. Now, with that in mind, let's jump into our first follow-along scenario: the
brainstorming process. I want you to imagine you are the marketing manager
for a very ambitious, imaginative sleep tech startup
called Snooze AI systems. We are about to launch
the Snooze One, the world's first
autopilot for your dreams. As you can see from
our internal briefing, this mattress has everything, from climate zoning technology to dream sync analytics and
the vibe sing story engine. We need to build a social
media launch campaign that makes smart sleep
sound essential. So let's open Gemini
to start the demo. I'm selecting the thinking
model because we need a creative strategist
who can handle nuances. And let's begin
our brainstorming. Here is the first prompt
that I'm going to use. You see that I first introduce the role that
I want Gemini to take. Then I included a bit of context in terms of what
we are about to launch and our target audience. And then I gave a task
to Gemini to suggest ten content themes for
our 30-day launch window. And let me also include
the PDF file that you just saw to provide even
more context to Gemini. And let's hit Enter. So here are the ten themes
that Gemini suggested. I like this theme the best. So let's ask Gemini to dig deeper into this
specific theme. So here is my second prompt. And let me actually
specify that I want ten post ideas. Let's hit Enter. Great suggestions. And in case you don't
like some of them, you can always ask Gemini to
suggest ten other ideas. So let's do this. I've noticed that when you do this several times, you can come up with
really great suggestions. So please try doing
this and don't just use the first list of ideas
that Gemini provides. Let's do one more iteration. I gave some feedback to Gemini regarding the list of
ideas that it provided. Nice. I see that we can work with some of the ideas further. But before we start doing the actual scripts for
our posts or videos, let me ask Gemini
another question. Before we proceed, I want to know what the current social
media content trends for tech product launches are, like in our case. Here are the trends. You see that it correctly
picked up the current year. And here is my next prompt. I'm going to ask
Gemini to suggest ten short-form video script ideas for the vibe check
Storytelling series. Let's say that I would like Instagram to be our
platform of choice. And notice that I also
included this PDF with the viral hook ideas that I want Gemini to use
when preparing the response. This is what is
called grounding. So I'm anchoring
the AI's response in our specific brand style so the scripts
don't feel generic. Next, I also provided
the structure for the script and that's it. Let's hit Enter. Alright, we see that Gemini included
some placeholders, and I really want to
have a full script ready for the teleprompter so that we can just record the video. So when brainstorming,
I start with asking Gemini to explore a
wide range of ideas, then I might iterate on
those ideas several times. And then I usually select an
idea which I like and ask Gemini to narrow down on
that topic and let's say, create a post or a story related to that
idea of my choice. Alright, our script is ready. I can continue talking
with Gemini and ask to adjust that script or take
another idea to expand. But let's say I'm
fine with this one. I can actually export this script directly
into a Google Doc. You see the three-dots icon here. If I click on it, I can
choose Export to Docs, and let's see what happens. Gemini tells me that the
new document is created. Let's click Open. Very nice. We even have a table
with time codes and the exact text that we
need to say. Very cool. And you also see
here that Gemini suggests exporting this
table to Sheets. Let's try to do this as well. Personally, I like exporting to Google Docs
for this scenario. I think it works better
for this type of document, but you get the idea. That's it for this tutorial,
and I'll see you in the next one.
14. Follow-Along: Getting Feedback with Google Gemini AI: Hi, everyone. Welcome to the
second follow along video. Let's explore getting
feedback from Gemini. This use case is one of
the first I started with when using AI assistants. I used to submit my documents, like presentations,
reports, and resumes, and ask the AI for feedback
so that I could get a second opinion on them
and make improvements. But Gemini moved this process to an entirely new level, as
it's natively multimodal, meaning it can process
not only texts but other types of
content like videos. You can now get
personalized feedback on how you actually perform, not just on what you wrote. The reason Gemini is so dominant here is its massive
context window. That's the first time
we are using this term. So let's introduce it. The context window is essentially the IIS
short term memory. It is the amount of
data the model can hold in its brain at once
to understand the request. While other models
might struggle to remember more than a
few minutes of footage, Gemini can process up
to 1 million tokens. To give you an idea,
that's about an hour of video or thousands of pages
of text in a single go. This massive memory
is exactly why we see so many users switching to
Gemini for video analysis. But don't just take my word
for it. Let's verify it. I'm going to use thinking
mode to verify the claim. And this is the prompt that
I'm going to use first. Let me hit Enter. The reason why I started with
this question is because I want to show you the double
check response function. And here is the response with the details on
why professionals are making the switch to Gemini. To access the double
check response function, click on the three-dots icon at
the bottom of the response. And here you will see
Double-check response. This feature uses Google
Search to find content that's likely similar to or different from statements
generated by Gemini. And please note that
this feature is specifically built to
verify factual claims. It won't appear for things
like creative writing, code, or similar tasks. Gemini started evaluating
the statements. And here we see the
green highlights confirming the claims
that Gemini made. And we can even expand
this window to see the detailed article that Gemini used to
validate this claim. That's a pretty
convenient feature. And now let's get technical. I recorded a video of myself during a Zoom interview for
a head of product role. This is a 1 hour recording, which is a massive
amount of information. And because of this, I'm going
to choose the Pro model. But first, let's
start a new chat. Here I'm going to
choose Pro. The Pro model is designed with a much
higher intelligence ceiling and is superior at maintaining a coherent understanding across the entire hour of footage. So let me attach
the footage first. I have ten different
video fragments here, and I also submit
my instructions. I started by giving Gemini the role of an executive
leadership coach. I provided context in
terms of the video, what I'm doing here,
and this is my task, with the specific questions that I want Gemini
to go through. My expectation is for Gemini to provide me with information
in terms of my presence, communication
style and clarity, my strengths, and areas
for improvement. And I also asked Gemini to provide the specific
timestamps for its observations so
that I can quickly find the fragment Gemini is referring to and
rewatch it myself. Watch how Gemini processes
this information. And here is the feedback. Those are great observations and things I could
definitely improve on. And now let's take that feedback and turn it into
something useful. I'm going to ask Gemini to rewrite my tell me
about yourself script so that it is more
punchy and it is more relevant for the head of product role I'm
going to apply for. When you work with the Pro model,
like in our current example, the response generation takes
significantly longer, so be aware of this. And finally, here is
the rewritten version of my Tell Me About
Yourself introduction. It looks quite good. But of course, if I were to use it in a real
conversation next time, I would prefer to
change some things to make sure that it
does sound more like me. Great job, Gemini. And just like that, you have turned Gemini
into your personal coach. I can imagine so many use cases for this kind
of video feedback. Imagine you are doing a
28 day yoga challenge, and you need daily feedback
on whether you are improving or maybe you have
a fear of public speaking, so you can record yourself, submit the video to Gemini, along with your presentation slides and ask what
worked and what didn't thing I noticed when
I started doing this regularly is a positive side
effect I didn't expect. The fact that you are recording yourself makes you
more self aware. Even before Gemini
says anything, you start paying more attention to what you are doing
and how you're doing it. But that said, and
this is important: take AI feedback with
a grain of salt. These models are
incredibly powerful, but they do make mistakes. For instance, in the
example we just looked at, Gemini told me I was seated the entire time while
I was standing. So use the insights
as a starting point, but always rely on yourself
for the final judgment. Please let me know in
the Q&A for this video what scenarios
you're experimenting with. I'll see you in the next one.
15. Keeping It Real: Practical Strategies to Minimize AI Hallucinations: Everyone, imagine
asking an AI assistant about a recent news event, and it confidently cites a detailed article that does not actually
exist or asking it about public figures and
getting responses that mix real facts with
completely made up details. These aren't bugs or glitches. They are what we call
hallucinations in AI. And they are one of the biggest challenges when working with large
language models. Let's explore why
these hallucinations happen, how to spot them, and most importantly, practical
techniques you can use right away to get more
accurate, reliable responses. To understand why
these errors happen, we have to look at how
these models are built. Unlike a human who truly
understands a topic, a language model works by predicting the most likely
next word in a sequence based on statistical patterns. Because these models are designed to
be as helpful as possible, they often prioritize
providing a complete, fluent answer over
admitting they are unsure. When a model reaches a gap in the information
it was trained on or when it encounters
an ambiguous request, it might fill in the blanks by guessing the most probable-sounding
response. It isn't a glitch. It's
a side effect of the AI prioritizing a smooth
conversation over verified truth. Now that we understand
why hallucinations occur, let's explore how to
spot them in practice. Think of this as developing
your AI fact checking skills. Once you know the warning signs, they become much
easier to catch. Here are the key warning
signs to watch for. Overly specific details. When an AI model provides
very specific details, especially about recent
events or statistics, this should trigger
extra scrutiny. For example, if it
provides exact numbers or statistics for very niche
or rapidly changing events, without citing a live source, that is a red flag. In these cases, the AI
may be generalizing from similar historical
patterns rather than reporting on the specific
event you asked about. Perfect sounding citations,
examples or statistics. If you notice an answer
that sounds too perfect, that's a good reason to
double check the information. And believe me, the
more experienced you become working
with AI tools, the better you will be at spotting these too-good-to-be-true
moments. You will develop an
instinct for recognizing when something feels
off or overly polished. And that's your
cue to dig deeper, verify facts, or
cross-check sources. Trust but verify. That's the golden rule when working with AI
generated content. Inconsistent answers. If you ask the same
question multiple times and get different
specific details each time, that's a strong indicator
of hallucination. Overly definitive statements. When an AI makes very
definitive statements about topics that should
have some uncertainty, especially regarding
future events or complex topics, be cautious. Knowing why
hallucinations happen and how to spot them
is a great start. But how do we actually
prevent them? Let's go over four
useful strategies that will help you
get more reliable, accurate responses every time. Strategy one. Be explicit
about uncertainty. Instead of asking
a direct question that forces the AI to guess, give it a clear out by asking it to prioritize accuracy
over completeness. For instance,
instead of writing, What were the key findings of the Johnson
report? Try this: If you have verified access
to the Johnson report, please share its key findings. If you are not 100%
certain about any details, please explicitly state which
parts you cannot verify. Or instead of, List all the companies
using this technology, try this: Based on the
data you were trained on, can you list verified examples of companies using
this technology? Please provide the
specific sources or context for each example and indicate if any of these cases are speculative
rather than confirmed. Instead of, What's
the market size for AI chatbots right
now, try this: Can you provide the most
recent market size estimates for AI chatbots from
reliable cited sources? Please specify the exact
time period for any data you share and let me know if you don't have access to
the latest figures. Notice how each revised
prompt explicitly gives permission to acknowledge
uncertainty and limitations. This simple change can dramatically improve the
reliability of responses. Strategy two, demand
evidence based citations. When you ask for sources, don't just look for
a list of links. AI can sometimes generate perfect looking citations to papers or websites
that don't exist. Instead, instruct
the model to quote the specific sentence from the source that supports
your conclusion. By forcing the AI to match its claim word for word
with an existing text, you significantly
reduce its ability to invent details mid sentence. Strategy three, use
structured output formats. Requesting structured outputs can help minimize hallucinations by forcing the AI model to organize information
more systematically. For example: Please analyze this sales data using the following structure:
verified data points (direct numbers
from the document); calculated metrics
(show your calculations); interpretations (clearly
labeled as interpretations); and uncertainties (areas where data is unclear or missing). Strategy four. Implement verification steps. Include verification steps
directly into your prompts to enhance the accuracy and
reliability of responses. For instance, you can ask the AI to list any assumptions it made
during its analysis, highlight areas where it has lower confidence
or certainty, and recommend additional
information that could help validate
its conclusions. This approach ensures more thorough and
transparent output, making it easier to assess
the quality of the response. All right, now that you have all the information
on AI hallucinations, take a moment to review one
of your recent prompts. How could you modify it using the strategies
we've just covered? Remember, the goal isn't to eliminate
hallucinations completely, but to create a
workflow where they are less likely to
impact your results. Please share your original
and revised prompt under the Q&A section
for this video. And as always, let's briefly recap the key points
of this lecture. AI hallucinations happen when language models generate false but plausible
sounding information. Hallucinations happen
because the AI is a confident storyteller that prioritizes a smooth
conversation over checking its work against
a textbook or real facts. Warning signs of hallucinations include overly specific details, perfect sounding citations,
inconsistent answers, and overly definitive
statements. Be explicit about
uncertainty in prompts to encourage AI to
acknowledge its limitations. Request citations and
reasoning to verify AI outputs and identify
hallucinations. Use structured output
formats to minimize hallucinations by organizing
information systematically. Incorporate verification
steps in prompts, such as highlighting uncertainties or
listing assumptions. All right. And that's
it for this lecture, and I'll see you
in the next video.
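Strategy two asks the model to quote the exact sentence that supports each claim. If you have the source text at hand, a quote like that can be checked mechanically. The sketch below is one possible way to do it, assuming you have copied the source into a plain string; the function names and the sample text are invented for illustration, and the check only confirms that the quote exists in the source, not that the model interpreted it correctly.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so line wraps don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in_source(quote: str, source_text: str) -> bool:
    """Return True only if the (non-empty) quote occurs verbatim in the source."""
    normalized_quote = normalize(quote)
    return bool(normalized_quote) and normalized_quote in normalize(source_text)


# Hypothetical source document and two quotes an AI answer might cite.
source = "Revenue grew 12% in Q3.\nChurn stayed flat\nat 4% over the same period."

print(quote_appears_in_source("Churn stayed flat at 4%", source))   # real quote
print(quote_appears_in_source("Revenue grew 21% in Q3.", source))   # altered quote
```

A quote that fails this check is a cue to go back to the source yourself before relying on the claim.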
16. Working with Gemini Canvas and Gems: Section Intro: Welcome to the next section. By now, you should have a good understanding of
how to talk to Gemini. While we will keep building
on those fundamentals, it is time to level up. We are moving beyond basic back and forth prompts
to explore Canvas and Gems. We'll begin with
Canvas, a side-by-side workspace where
you can edit text, compare versions,
iterate on your work without starting from scratch
every time, and do much more. Then we'll learn about Gems. These are like custom-made specialists that remember
your specific rules, so you don't have
to repeat them. We are going to build
two of them together. The first is a grammar and spelling reviewer. This Gem acts as a professional editor
to proofread your writing while keeping your voice unchanged. The second is an
AI fitness coach. This one can watch
your workout videos, check your form for safety, and even design custom motivational
backgrounds for your phone. By the end of this section, you won't just be
sending prompts. You will be creating
your own personal team of experts to turn your quick thoughts
into finished pieces of work or to automate
your routines. Let's get started.
17. Welcome to Gemini Canvas: Everyone. Welcome back to the first lecture
of this section. So far, we've seen Gemini's
standard chat interface, like the ones we're used to working with in
different messengers. It's great for a quick question, getting feedback
or brainstorming. But it can feel a
bit limited when you are working on a
brand new document or a piece of content that
needs multiple revisions. This is because when you are
drafting something complex, you need more than back
and forth conversation. You need a workspace with
various editing tools. That's where Gemini
Canvas comes in. Think of Gemini Canvas as
a collaborative workspace. In a standard chat, the AI gives you an answer, and if you want to
change one sentence, you usually have to ask for the whole thing
to be rewritten. In Canvas, Gemini opens
a side by side window. On the left, you have your chat. On the right, you have
a living document. It's no longer just a chatbot; it is an editor sitting
right next to you. You can click into the text,
change words yourself, or highlight a specific
paragraph and tell Gemini, "Make just this part punchier." If that sounds good, wait until you hear this. Canvas is not just for writing, it's also for building. Right from the interface menu, you can generate web pages, visual infographics
for complex data, and even study tools like
quizzes and flashcards. For those who prefer listening, there are audio
overviews that create podcast style summaries
of your findings. Perhaps most impressively, you can generate
functional mini apps. Simply describe a tool like
a family recipe organizer or a personal calendar and Canvas will build and run the
code for you in real time. You don't need to
know how to code. You just need to describe
what the tool should do, a process now
known as vibe coding. Now, because Canvas
is so powerful, it can be tempting to jump straight into building
apps and games. However, we are going to take
this one step at a time. For now, in this
section of the course, we're going to focus entirely
on document drafting. Using an imaginary AI mattress
company as our example, we'll see how to use the
Canvas workspace to refine a narrative and generate supporting visuals
in one fluid session. Once we have mastered
document creation, we'll move into the more
advanced features like interactive app creation and rapid prototyping later
on in the course. In the next lesson, I'm going to show you how to
open the Canvas interface, and we'll start our very
first collaborative draft. I'll catch you in the next one.
18. Follow-Along: Creating and Editing Documents in Gemini Canvas (part 1): As promised, in this video we are going to get hands-on. We will explore how to
navigate the Canvas workspace and how to do targeted editing
using the Ask Gemini feature, changing specific parts of the document without
rewriting the whole draft. We will also take a look
at the quick actions for changing things like
document tone and length. Finally, we'll go multimodal. We'll bring the
brand to life with AI-generated logos
and product visuals. Let's switch to
Gemini for the demo. Let's begin by switching
to Canvas mode. For this, I'm clicking on Tools and choosing Canvas in
the pop up window. Let's also change the
model to Thinking. And I'm going to start with
a broad conversational prompt. Here is what I'll type. I gave Gemini some context in terms of what I'm about to do. I provided the task. I said that I need a brief
description of the company and the new product that this
company is about to launch. I also provided details
about the style I want Gemini to pick up. Let's hit Enter and see
what Gemini will write. It is opening the Canvas
workspace with the chat on the left hand side and with the text on the right hand side. We see here it created the company description
including name, motto, and a brief overview of what this company is doing. Next, we have the information
about the product, including the key
features of the mattress, and it even suggested some brainstorming objectives
for my upcoming demo. Perfect. Let's explore
this workspace on the right hand side. On top of the workspace, you can first of all
see some editing tools. For example, you can change
heading style for your text. You can add a bullet list
or a numbered list, or even some formulas here. If you like, you can
print this page into a PDF document, and there are other functions
here that we are going to explore a little bit later in this and the
following tutorials. The real magic in this workspace is the ask Gemini feature. Let's say that you
want to make a change in one part of your text. And instead of asking for a
whole new draft in the chat, you can just highlight
the part you want to edit and then write
your request to Gemini. For example, I would like to change the location of
the company office. So what I'm going to do, I
will highlight this text, and I will just include my instruction for the change
that I want Gemini to make. You see, Gemini did the change and included this new text directly
in the document. And on the left hand sidebar, we see that it included the information text and even some description
of that change. Let me skim through this
text and see what kind of edits I would like to make in addition to
the office location. I can continue working on that document
and going back and forth, including the changes
up until the moment when I will be fully
happy with the text. Frankly, I use Canvas for document creation because
of this Ask Gemini feature, as in most cases I need to adjust a very
specific part of a document. However, here is what I discovered after weeks of
experimentation with it. Since Gemini is focusing on that specific part
of a document, it sometimes misses
the big picture. I have noticed cases where it repeats phrases used
in other parts of the document or brings in terms that aren't
introduced until later. So definitely give your work a quick review to make
sure it all fits together. And that's it for the first
part of this tutorial. And I'll see you
in the second one.
19. Follow-Along: Creating and Editing Documents in Gemini Canvas (part 2): Welcome to the second
part of the tutorial, where we explore Gemini
Canvas for document creation. Apart from ask Gemini, there are quick
actions that you may find useful for making
changes to your text. The first quick action
is change length. This is great if you need to
quickly expand on a section with more detail or shrink it
down into a punchy summary. Let's say we want to change
the length for our text, I'm clicking on this button, and then I need to
choose the length that I would like
for my new text. Let's say I want it to be
longer than the current one, and let's wait for the changes. And Gemini has
expanded this text. You see that it highlighted the new text in blue color here. Let's come back to the
quick action buttons. And the second one is
for changing tone. So in case you want to sound more professional or,
on the other hand, a bit more chatty, this is the button that
would help you to switch the vibe of your writing with
literally just one click. Let's select change tone, and I can go from formal to very formal or casual
and very casual. Frankly, I'm good with the
current tone for the text, but for instance,
let's make it a bit more formal for the
purpose of this demo. We see that Gemini has changed almost the entire
text fragment here. I would prefer to return
to the previous version. But I think you got
the idea of what this change tone option can do. So I'm returning to the previous
version of the document. And lastly, there is also
a function to suggest edits. This is like having
a writing buddy. Gemini will give you feedback
and show you ways to make your writing better without changing your original
text right away. Let's try this function
as well. Alright, great. We see that Gemini has included some changes with
the information about the reason
for that change. If I'm good with
all those changes, I can apply them all. If you don't like
a suggestion from Gemini, and you would like to return to the previous version
of the document, you can tell this to Gemini
directly here in the chat. Cool. So let's
click on apply for the remaining suggestions
so that we can keep them in the new
version of the document. All right. Let's
continue the demo. And as the next step here, I want to create some
visuals to show you the multimodal
capabilities of Gemini. We will have a
dedicated section on visual content creation
later in the course. So for now, I'll just type very short
straightforward prompt. And let me press Andrew
to see the results. And here is the first image. Amazing that Gemini
even included the product name here on
one side of the mattress. Gemini also tells
me that it can only generate one image at a time. It is asking me if I would like to go ahead
with the company logo. Gemini is getting very good at including texts
inside the images. And let's ask for several visuals for features. Great. And you see why it is
important to create images in this same chat where we
created the original text. Gemini uses the context from the previous conversations
to create the image. You see that it took information about
three degree angle, even though this angle looks a bit larger to me.
But that's fine. We can adjust this through iterations working
on this image. It also included the
mattress name here. Let's create the fourth
image. That's awesome. You see that in the description, we have the information
that this feature creates a clean air
dome over the sleepers, and that's exactly what we see here on the
picture. Amazing. And let's check the text. Optimal humidity, air quality. Yeah, and the text is correct. I don't see any mistakes here. Alright, let's
finish this tutorial before it becomes too long. We will continue
working with the text and images in our next video.
20. Follow-Along: Turning a Gemini Draft into a Polished PDF with Gamma: We now have our
brand backstory, product features, and images
organized within Gemini. Think of this as our
drafting studio. The space for core
thinking and writing. However, our working draft
is not a finished deliverable. If you need to present this to a manager or a client as
a professional report, we need to move
this content into a dedicated design tool
like Canva or Gamma App. You might think,
can't I just ask Gemini to generate the PDF
for me? Good question. And yes, that was my
intention as well when I first got this task
to create the final PDF. Here is how Gemini handles this. If you try to create
a PDF in Canvas, you won't get that
final document. The Canvas tool is built for live editing and collaboration,
not for publishing. Because it operates in
a private workspace, it cannot see your
local image files to include them in the document. If you try to export from here, you will see a file with empty placeholders where
your images should be. Of course, you can try
a regular chat as well. It is more functional. It can generate files in the background to give
you a downloadable PDF. However, it lacks the
layout control and polish required for a
professional presentation. Here is the PDF Gemini
created for me. It is a good start, but it required
significant manual formatting to look right. So to get our presentation-ready
finish, where text flows correctly around images
and branding is consistent, we move from the drafting
studio to a design studio. In the next tutorial, I'll use Gamma app
to demonstrate this. It's been my primary
tool for nearly a year, and it's what I use for
almost all my design work. However, the same
principles apply to other similar platforms
like Canva or Adobe. Let's head back into Gemini and prepare our
content for the move. Let's transfer our assets,
text and images, to the Gamma app. I'll begin by copying the text. For this, I'll click on
the Share and Export button. And from here, I'll
choose Copy contents. And I already downloaded the four images that we generated
in the previous tutorial. So everything is ready
for us to move to Gamma. Let's open the Gamma app. Here is Gamma's main page. The central part is
the content grid. This area displays our
projects also called Gammas. The top bar here is for
creating new documents. On the left hand side,
we have templates. Here we can access
preset layouts to jump start our
presentation design. We have such useful
things as AI Images, where we can view and
use images that we've generated using Gamma's
built-in AI image tool. We can also create folders
so that we can separate our materials by specific
themes or topics. So let's get straight to
creating a PDF file. I'll choose Create New with AI. And here we have
different options. Since we already have a text, which I copied from Gemini, I'm going to choose this
paste in text option. And here I will include
the text from Gemini. Next, we have
several options for what Gamma app can
do with our content. And it's important that we choose "preserve this exact text", meaning that Gamma won't be doing any modifications
of our draft. This is the most
effective method for our example because it
allows us to use Gemini for the heavy lifting of thinking and drafting
and then use Gamma to handle
the formatting and beautification of
the final document. I'm going to select continue
to prompt editor here. Here we can choose different
themes for our presentation. Let's choose this one
and click Select theme. Before we hit Generate, notice the two modes at the top, free form and card by card. Let me quickly explain
the difference. When you choose card by card, Gamma automatically
breaks your content into separate numbered slides. One idea per card, but you can still rearrange
the cards or add new ones. It is perfect for presentations. Freeform keeps everything as one continuous flowing document, more like a report
than a slide deck. Same content, but
it reads top to bottom without hard breaks
between sections. This gives you more control
over the layout and flow. It is great for
documents or reports. For our demo, I will
choose freeform because I want our text and images to
flow together naturally. And let's hit generate. Gamma starts
creating our slides. First of all, what I usually do, I ask Gamma to suggest
several other layouts for me so that I can compare the default layout with
other suggestions. So for this, I click on
Edit with Agent button, and from here, I
choose Try New layout. Let's do one more turn
to see if there is anything better than our
first default option. I think I'm going
to choose this one. I like this background
image here. Let's move to the next slide. I will include our logo
image instead of this one. To change the picture, I'm
going to click on that one. Next, I go to Edit Image. And from here, I'm choosing
image upload or URL. I have my images on my local
Drive. And here we go. This is our first image. Let's attach it. Perfect.
Let's move to the third slide. All right, we are ready to go. Let's do the final check and take a quick look at
all of our slides. To export this file, we click on the three dots icon. Here we choose Export, and I'm going to export to PDF. Let's open the file
straight away, and here we go. Looks cool. So this is my favorite
way to work when it comes to creating new documents. I let Gemini do the
creative thinking part, and then I let my
design tool of choice, like Gamma, handle the
"making it look good" part. I hope that you
enjoyed this tutorial, and as always, I'll see
you in the next one.
21. What Are Gemini Gems, and Why Do We Need Them?: Everyone, when you start
using Gemini regularly, you quickly notice
that there are certain things you use
it for again and again, whether it is for brainstorming, getting feedback or
generating new content, you may find yourself typing out the same prompts and giving the same context over and over, which can start to
feel a bit repetitive, like your own digital
version of Groundhog Day. Well, today we are
ending that cycle. We are going to explore a feature that lets
you package up those repetitive
instructions and turn them into your team of AI experts
or personal assistants. They are called Gemini Gems. And, no, we aren't talking
about diamonds here. Though once you see how
much time they save you, you might think they're
just as valuable. So what exactly is a Gem? Think of Gems as
customized versions of Gemini built to help you tackle repetitive tasks or get deep
expertise in specific areas. When you chat with a Gem, Gemini remembers your
goals and guidelines automatically, saving you from repeating yourself
in every prompt. So while a standard Gemini
is like a librarian who knows where everything is, a Gem is like a
dedicated specialist. It does not just
know about a topic. It follows your specific rules
to perform work for you. There are three types
of Gems. Premade Gems. These are out-of-the-box
tools built by Google. You cannot see or edit
their underlying logic. You can only pin them to your
sidebar for quick access. They often have unique
interfaces like the ten page storybook layout that regular Gems
simply cannot mimic. Custom Gems. These are the focus of our next tutorials because
you build them yourself. You provide the
instructions and can upload up to ten personal files to act as the Gem's
knowledge base. It is the difference between
a general assistant and a dedicated expert tailored specifically to your
data and your goals. Gems in Opal. Opal is an experimental project that moves AI beyond
simple chat windows. These Gems are
interactive mini apps that follow a specific workflow. Their standout feature is
the ability to remix them. You can take a pre built
tool like a fashion stylist and modify its internal steps
to create something new. They are highly visual
and can generate text, images and video simultaneously. We are going to explore these dams in the later
sections of the course. Now, since we have already
worked with Canvas, you may now have a
logical question. How is a Gem actually different? The key is to think of Canvas
as your shared workspace. It is the collaborative
desk where you and the AI work side by side on
long form documents or code. Gems, on the other hand, are your tactical specialists. You use a Gem to produce
the initial draft, like generating a
specialized first version based on your uploaded data, and then you hand off that work into Canvas to refine
and polish it. One is the specialist you call for the initial output. The other is the desk where
the project is completed. Of course, you can
also use Gems entirely on their own
for certain tasks, and that brings us to our
next follow along lecture. But before we start
working with Gems, let's briefly recap what
we have learned here. All right. And that's
it for this video. I'll meet you in the next one.
22. Follow-Along: Building a Grammar Check Gem: Everyone, and welcome to our first tutorial
on Gemini Gems. Today, I'm going to
show you how to build a custom expert to
proofread your writing, whether you are
drafting landing pages, product descriptions,
quick emails, or any other texts. It's like having a
second pair of eyes that gives you total confidence
in every word you share. Let's open Gemini
to create that Gem. We are going to start
by clicking on Gems. In the sidebar, we go
to Gem Manager here, the section where we
create custom Gems. And here I'll click New Gem. Let's begin by
providing the name for our Gem. Here is my
Gem description. Next, I included
my instructions. This is by far the most
important part of your Gem. I included a role description, saying that you're an
expert at checking grammar, spelling, and punctuation in English text and fixing
any mistakes you encounter. Then I provide a target
audience description. If you are following along and
building the same type of Gem, you can change the target
audience to something which is more relevant to your
use case and domain. Next we have core rules
followed by the information about what output we are looking for and we also have
a starter prompt. You see that I'm using
hashtags in the instruction text. These act as section dividers that create a clean skeleton
for your instructions. They make the Gem's brain
more organized so that the AI knows where one rule
ends and the next one begins. Now let's get back to the set
of rules and discuss them. How do I actually come
up with this list? I highly recommend doing the
task you want to automate three to five times manually before you even
try to build the Gem. If you go straight
into the instructions, it can feel intimidating. Every rule in this list exists because it is a
specific preference I discovered over weeks of
manually prompting the AI. You also may notice that I'm
using words in caps lock, like, for example, here. There is no technical
requirement to use them. Gemini is very sophisticated. It understands lower case, just as well as upper case. But I found that using
them is still helpful. Think of those words
as power words. We can use them to highlight
the non negotiable rules, so the AI knows exactly what
is a must versus a maybe. Alright, let's move next. I'm fine with these
instructions for now, even though we can always
get back to this list after we create this jam
and further edit them. We can also choose
a default tool. This tool will be
selected when you start the new conversation
with the Gem. I'll choose Canvas
as the default tool. Instead of a messy
chat conversation, your corrected text
will slide out in a clean side panel, perfectly formatted and ready
to copy. You can also include files
in the knowledge base if you want your Gem to reference
any external sources when preparing the response. You see that we can upload files from different
sources here. But for this specific example, I'll leave it empty. And we are all set. So let's save the Gem. I'm clicking on the Save button. And we can start our new chat. Here is the text that I
want Gemini to check. I made several grammar
mistakes here on purpose. So let's see if it will be able to find them and
correct this draft. It is opening a Canvas
with our new text. Looks great to me. And remember that you can use this Canvas interface to make some quick edits in
that text in case you feel like you want to
introduce some changes here. For example, let's highlight
"reconcile" and ask to find an alternative. And if we are fine with these edits, we can click on
Share and Export, choose copy contents, or we can choose to export this text
directly to our Google Docs. Let me return to
our Gem. You see we have it in the list of Gems here on the left-hand sidebar. One thing I noticed: there is no conversation
starter here. So when I opened
this Gem interface, it's not quite clear to
me what I should do here. I did some research, and I found this article with
the exact same question. It turned out that those conversation
starters are not supported by Gems at the moment. There is also a
workaround we can try. The article says that you can simulate starter prompts like this by including additional
description into your Gem. Right, let's try to
include an example of a conversational starter
to see if this will help. I'm returning to my Gem. If I click on the
three dots icon, I can choose the Edit option, and we can make any
changes here we want. Let me just include this example below the current version
of the instructions. And what we can also do here, apart from including an
example of our starter prompt, we can use this
magic button so that Gemini will rewrite our
instructions and improve them. Let's try this out. Maybe it would help. I see that Gemini has removed our example of the
starter prompt. What I decided to do, I included the rule number six, asking Gemini to always start the conversation with the
following starter prompt. Let's see if this will work. So I'm going to update
my gem instructions, save them, and let's test. When I opened my updated Jam, I still don't have any
conversation starter here. Unfortunately, all my
other experiments with defined Jams instructions to add the conversation starter
turned out unsuccessful. Given this, let's define the
jam description to provide information on what a user has to do to begin
the conversation. For this, let's return to
the JAMS editing interface. I included submit your
text to get started. Text at the end of
the Jam description, I'm going to update it, and let's test it out again. Our instruction is here, and let's submit something
else for a change. I have this fragment. Let's see how Gemini
will handle it. Perfect. And if I'm fine with this Gem and I want to share it with
my friends or colleagues, I can click on the Share
button and choose Share. Gemini will create a link. I can copy it and
then send it out. I'll leave the link to that
Gem in the resources for this video in case
you want to test it. And I'll meet you in
the next tutorial where we are going to build
the personal coach Gem.
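
The hashtag-divided structure described in this tutorial can be sketched as a small text builder. This is only an illustration under stated assumptions: Gem instructions are typed into the Gemini app, so the function name `build_gem_instructions`, the section titles, and the placeholder rules below are hypothetical and are not the exact instructions shown in the video.

```python
# Hypothetical sketch: assemble Gem instructions as "#"-headed sections so each
# rule block has a clear start and end. All section contents are placeholders.
def build_gem_instructions(sections: dict[str, str]) -> str:
    """Join named sections into one instruction text, each under a '# Title' line."""
    blocks = [f"# {title}\n{body.strip()}" for title, body in sections.items()]
    return "\n\n".join(blocks)


instructions = build_gem_instructions({
    "Role": "You are an expert at checking grammar, spelling, and punctuation in English text.",
    "Target audience": "Readers of product landing pages (placeholder, adapt to your domain).",
    "Core rules": "1. ALWAYS keep the author's voice unchanged.\n2. Fix mistakes only.",
    "Output": "Return the corrected text.",
})

if __name__ == "__main__":
    # The printed text is what you would paste into the Gem's instruction field.
    print(instructions)
```

Keeping each section under its own heading makes it easy to add or remove a rule later without disturbing the rest of the instructions.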
23. Follow-Along: Building a Fitness Coach Gem (part 1): Now let's build a Gem
that works with video. Let's say I'm doing an online 28 day app
workout challenge, and I want to know if I'm
actually improving day by day. I'm going to record myself doing the daily exercises and ask my AI fitness coach for
feedback. As a word of caution, as we already discussed, while the AI is a good partner for tracking your
movement and form, it is not a medical expert. Always consult with
a healthcare professional before starting
a new fitness program. This tool is for coaching and progress, not
medical advice. Okay, let's open Gemini
to begin the demo. Let's create a new Gem. I'm expanding this menu. Go to Gems. Here we see Gems made by Labs. I'm scrolling down
to Gem Manager. I already have the grammar
and spelling review Gem visible here in
the list of my Gems. And for now let me
create a new one. I'm clicking on New Gem. Let's provide the name, the description and
instructions for our personalized AI coach Gem. I included this description. This Gem analyzes
your workout videos to provide detailed
performance feedback, and it creates custom vertical motivational
phone backgrounds to keep you inspired. And here are my instructions. So as always, I started
with describing the role I want this Gem to play. In our case, I want it to be a
professional fitness coach. Then I included a
task for this Gem. We are telling Gemini to
analyze our workout videos, looking for engagement
and safety cues like Cin or Domin and I also described that I want
Gemini to create a vertical image with
a motivational quote. I also included starter prompts, even though we've seen that starter prompts are not
quite working right now. But still, let's check what
will happen this time. And to make this
Gem truly personal,
to the knowledge base that represents the vibe of the motivational image that
I want Gemini to create. I'm clicking on the Plus button. I have my reference
file on my local Drive, so I will choose Upload files. This is my folder, and that's the motivational quote
that I selected. Of course, you can also include other files
here. For example, in case you have
a research paper that you want this
Gem to analyze when providing the recommendations
and not just use its general knowledge, you can
always upload this file here. And in terms of
the default tool, for this Gem, I'm not going
to choose anything here. This is because
our fitness coach is doing two very
different things. It gives us text feedback, and it creates a high
resolution image. So by letting Gemini choose
the best tool for each task, we ensure our phone
backgrounds look sharp and our feedback is delivered
without any technical issues. Everything is good here. We
are ready to click on Save. And by the way,
notice that there is also this preview window which you can use to test your instructions
before saving them. But in my case, I already did the first test before I started
recording this tutorial, so I'm good to go. I'll just click on Safe
and let's start our chat. Have uploaded my first video from the day one of my workout, and let's wait a bit for
the Gemini to process it. Our video has been uploaded, and before we press Enter, let's talk about
model selection here. So since this jam involves multimodal analysis,
watching video, checking for safety queues and providing
structured feedback, I'll choose a
thinking model which prioritizes reasoning
over pure speed. And we are all set here, and I'll just hit Enter. And here are the
recommendations from Gemini. First of all, I really
like that it tells us that this information is for
informational purposes only. And for medical
advice or diagnosis, we should consult
a professional. That's totally true. Notice that it successfully identified that this is my day one workout session because of the relevant name of that file: there was a day one
workout in the name. Here is the scorecard, what I nailed, and
one thing to improve. I can agree with this. And
next, there is a question. Would you like me to
create your custom daily motivation
phone background, based on your day one progress? Yes, definitely, yes.
So let's just reply. Yes. And here we go. We have this perfect quote, but there is one issue
with that image. If we compare it with my
original reference image, we would find that
they are not the same. Here is an image that I
asked Gemini to create. You see that background
is completely different. So let's get back to our Gem and let's work with
Gemini to see if we can change this and
make sure that it creates images with
similar background as in our reference file.
24. Follow-Along: Building a Fitness Coach Gem (part 2): Welcome back. In the first
part of this tutorial, we set up the core logic
for our fitness coach Gem. But we came up
against a limitation. Even though we uploaded a reference image to
the knowledge base, the generated daily
motivation backgrounds didn't look anything
like our original image. Let's fix that by
understanding how the system actually processes these different types of data. I have mentioned before that
Gemini is multimodal. It can see, read, and hear all at once. That is all true. However, there is a
technical difference in how a gem reads a file and
how it creates an image. When we upload a reference
to the knowledge base, Gemini uses its vision capability
to analyze the file and summarize it into
text based data for its long term memory. But when the Gem
generates a new image, it triggers a separate
image generation model. According to Gemini's
technical documentation, this generation model
cannot directly see the raw pixels of your
knowledge base files. It only receives a
text based prompt. If your instructions simply say match the style in
the knowledge base, the AI is working
from a summary, not the original source, and the original
style gets lost. To solve this, we move from
referencing to specifying. Instead of showing
the Gem a file and hoping it interprets
the style correctly, we are going to write a visual specification directly
into the instructions. This ensures that every time
the Gem creates an image, it follows your exact rules
without any guesswork. Here is how we do this. Go to your list of Gems, find the one that you'd like to edit, and click on the edit icon. And from here, go to
your instructions. In the motivation section, let's remove this
vague instruction. Next, we will add a description for our image. To create it, open a separate chat, upload your reference image, and use this prompt. I suggest switching to the thinking model here
for better results. Once you have the
image description, paste it right into
your Gem's instructions. Here is the description that I have for my reference image. This defines the layout, the fonts, and the atmosphere, so the model has a clear
set of guardrails. Once we do this, we can click on Update to save the changes. Let me start a new chat to test the changes
that we just made. You see that our new image and the
reference one are not the same but very
similar in their layout, visual hierarchy, and
overall aesthetic: a frosted glass text box over a soft pastel cityscape at dusk. And that's it for this tutorial. Please write in the comments for this video which Gem you
are planning to work on. And I'll see you in the
following video.
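The move from referencing to specifying can be sketched in a few lines of Python. This is only an illustration, assuming you keep the visual specification as structured text before pasting it into the Gem's instructions. The class name, its field names, and the typography value are hypothetical and do not come from Gemini or from the course; the other values echo the description from the demo.

```python
from dataclasses import dataclass


@dataclass
class VisualSpec:
    """Hypothetical container for the parts of a visual specification."""
    layout: str
    typography: str
    atmosphere: str

    def to_instructions(self) -> str:
        # Render the spec as plain text that can be pasted into a Gem's instructions.
        return (
            "When generating a motivation background, follow this visual specification:\n"
            f"- Layout: {self.layout}\n"
            f"- Typography: {self.typography}\n"
            f"- Atmosphere: {self.atmosphere}"
        )


spec = VisualSpec(
    layout="frosted glass text box centered over the background",
    typography="clean sans-serif, high contrast",  # illustrative value only
    atmosphere="soft pastel cityscape at dusk",
)
print(spec.to_instructions())
```

Keeping the specification as explicit fields makes it easy to change one rule, for example the atmosphere, without rewriting the whole description.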
25. Gemini for Visual Creation: Section Intro: Welcome to this new
section of the course. You have already
seen me creating a few images with Gemini
earlier in the course, and now it is time to
get into the details. We are going to take Gemini's
image and video tools for a proper test drive. And I think this is one of the most visual parts
of the whole course. We will start with image generation and
not just the basics. I'm going to show you how to use techniques like
contextual blending, where you combine
reference images to create something completely
new, and iterative refinement, where you direct Gemini
like a photographer, adjusting one element at a time until you get
exactly the shot you want. We'll also look at visual
synthesis where you hand Gemini multiple ingredients and let it build a single
seamless scene. From there, we will head into what I call
the editing suite, where we will use Gemini to work with images
you already have: think restoring old photos, turning rough sketches
into product shots, and making precise edits using
Gemini's built-in markup tool. We'll then look at building complete visual
systems: infographics, flow diagrams, and assets adapted for different
platforms and screen sizes. We will finish this section with a tutorial on video creation. And of course, I will also
share my top prompting tips, the practical
recommendations I have developed from
working with Gemini and other AI image and video
generation software that will help you
get better results. All right. Let's get creative.
26. What Is Nano Banana? Key Features Explained: You might have noticed a strange little banana emoji
appearing in your Gemini app. It's not just a cute icon. It is a tiny clue to a funny naming story
behind this model. Before this model was
officially released, Google submitted it for anonymous testing on a
platform called LM Arena, a public site where
people compare two AI models side
by side and vote on which result they prefer without knowing
which model is which. It is how AI labs gather real-world feedback
before a full launch. The model needed a
placeholder name, something that would
not hint it was a Google product, to submit
it to the LM Arena site. At 2:30 in the morning, a Google product manager named
Naina typed Nano Banana, thinking it was just
a placeholder label that nobody outside the testing
platform would ever see. But the model performed
so well that people on X became obsessed
with this mysterious, powerful Nano Banana, speculating about which
lab had built it, whether it was a
secret Google project, whether it was
something entirely new. Instead of quietly
correcting the record, Google leaned into it. They added the banana image
to the Gemini app and even made limited-edition
banana-themed merchandise. The reason Nano Banana went viral was not just
the name, of course. It was one specific
capability that AI image tools had been
getting wrong before: character consistency.
In the past, if you uploaded a photo of yourself and asked an
AI to reimagine it, you would get something that
looked vaguely like you, what people started calling
your AI distant cousin. Nano Banana changed that. You upload one
photo of yourself, and it preserves
your actual likeness across completely
different scenarios. You can turn yourself
into a graffiti mural, custom to card or a ceramic k, and it's recognizably
you in each one. You can transport yourself
to different places, different outfits,
different decades. The face stays yours. You can even add motion, turning a static portrait into a
short video where the subject turns their head or
shifts expression. We will look at that
in more detail when we get to Veo,
Gemini's video model. But character consistency
is just one piece of it. Let me walk you through the other things that make this model worth understanding. Scene blending lets you upload two separate photos and fuse them into a single
coherent image. You can put yourself and
a historical figure at the same table or create a group photo of people who have never actually
been in the same place. Gemini handles the lighting,
angles and context. So the result feels
like one image rather than something that
looks stitched together. Multiturn editing turns
your conversation into a living canvas. You don't have to get everything right in
your first prompt. You can start with an empty room and talk it into existence, paint the walls,
add a leather sofa, place a steaming cup of
coffee on the table. Each prompt builds on the last. One important thing to remember, the chat keeps context
across your edits. So if you want to start a
completely separate project, open a fresh chat rather than continuing in
the same thread. Design mixing is about
taking the texture or visual language of
one thing and mapping it onto something else
entirely: the pattern of a butterfly wing becoming
a high-fashion gown, the texture of marble tile wrapping around
a pair of sneakers. It is less about editing
a photo and more about merging two worlds that don't
normally belong together. Now, one important thing to understand about how all
of this fits together: Gemini itself is a reasoning and language
model at its core. The image and video
capabilities come from dedicated specialist models that Gemini calls behind
the scenes. For images, that's Nano Banana, officially named Gemini
2.5 Flash Image, though nobody calls it that. For video, it is a
model called Veo. Think of them as
Gemini's creative team, available on request. When you ask Gemini to
generate or edit an image, it hands the task
to Nano Banana. When you ask for a
video, it calls Veo. The conversation
stays in Gemini. The specialist work happens underneath. In the next lecture, we are going to
open Gemini and try creating our first images.
I'll meet you there.
27. Creating Your First Image with Gemini: So now that you saw the preview of Gemini's visual capabilities, let's get our hands dirty
and create our first image. Image creation is
available on all plans. Let's open Gemini
and get to work. To create an image,
you have two options. Option one, create an image
in your existing chat where you ask questions or work on creating a
new piece of content, like in our last
lecture when we worked on our product brief for
an AI mattress company. Option two is to
start from scratch. That is what I'm going
to do this time. I'm going to start with
a simple prompt: a fluffy orange cat
sleeping on a sofa. To tell Gemini that we are
going to create an image, let's choose image in
the list of tools. This way, Gemini knows that we are expecting an
image as the output, so we don't need to type these verbal instructions
in the prompt. The next step before generating an image is to choose an
image generation model: fast, thinking, or pro. I'll choose fast this time. An alternative way to create
an image would be to type "create an image of"
directly in your prompt. And in this case,
we don't need to select Create image
from the list of tools. This is my preferred way
of working with Gemini. But for this demo, let's go ahead with
Create Image selected. Our image is ready, pretty good given
how short our prompt is and that it's just
our first iteration. You can share, copy or
download that image, or you can continue adjusting
the image just by chatting with Gemini and adding more details to your
original prompt. You see that Gemini modifies
the image prompt by prompt adding more details while keeping all the previous
context in place. But in case you
want to start over with one of your
previous iterations, click on More and choose
Branch in new chat. Then you can give the
prompt to Gemini, and in that case, Gemini will
change that selected image. Of course, you can give Gemini the entire
prompt straightaway, or instead of describing
details yourself, pick a style for your image. For instance, instead
of describing what light we want to
have in our image, let's choose cinematic
from the list here. You saw me selecting between
fast mode and thinking mode. In the Gemini app, these modes represent how
much processing power and reasoning the AI uses
to build your image. While the specific model
names under the hood, like Nano Banana, evolve rapidly, the way these
two modes function remains constant. I always recommend checking the official Gemini
support pages for the latest version names. But here is the best way to
think about your workflow. Think of fast mode as
an interactive layer. It is built for speed
and quick iteration. If you are changing a shirt color, trying a new hairstyle, swapping a background, or
generating lots of variations, keep it on fast. Thinking mode is the reasoning layer. It takes longer because it's more
careful before it generates. Use it when you need
precision, like clean, readable text on a sign,
consistent product shots, or a complex scene where
details really matter. You might ask, but Anna, why wouldn't I just use thinking all the time
if it's more powerful? It is a fair question, but there are two
practical trade-offs. First is time. Fast mode
is a speed-of-thought tool. Thinking mode requires
a waiting period while the AI thinks
through the prompt. Second is usage limits. Because thinking mode is more
computationally expensive, it usually has tighter daily
limits than fast mode. My recommended
process: use fast mode to explore and generate
rough options quickly. And once you have found
your hero concept, switch to thinking mode for the final high
fidelity polish. Start with thinking
mode immediately, only for high
complexity tasks like visualizing process flows or creating images with
specific localized texts. All right. Now you have
an initial idea of how to prompt Gemini
to create visuals. In the next video, we will go a bit
deeper and learn how to create a good prompt.
I'll see you in the next video.
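The fast-versus-thinking workflow from this lecture can be written down as a small decision rule. The sketch below is a hypothetical helper, not a Gemini feature: the function name and the task labels are assumptions chosen to mirror the lecture's examples.

```python
# Task labels mirror the lecture's examples; the helper itself is hypothetical.
THINKING_FIRST_TASKS = {"process flow", "localized text"}


def choose_mode(task: str, is_final_polish: bool = False) -> str:
    """Return 'thinking' for high-complexity tasks or final polish, else 'fast'."""
    if task in THINKING_FIRST_TASKS or is_final_polish:
        return "thinking"
    # Default to fast mode for exploration and quick variations.
    return "fast"


print(choose_mode("background swap"))
print(choose_mode("background swap", is_final_polish=True))
print(choose_mode("process flow"))
```

The default is fast mode on purpose: it reflects the advice to explore cheaply first and to spend the tighter thinking-mode limits only where precision matters.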
28. 7 Prompting Tips for Creating Better Visuals: Hello, everybody, and
welcome back to the lecture. As this section of the course is all about generating visuals, we cannot overlook such
an important topic as how to create
those instructions. In the upcoming video, I will share my top
seven recommendations on how to craft effective
prompts. Let's get started. Sometimes you will
see solid outputs with simple open ended prompts, especially if you are
open to surprises. However, when you have a
specific vision in mind, describing various details can help lead you to perfection. But regardless of the
direction you want to take, I recommend starting with a simple prompt and then adding extra details one by one to see how they
affect the image. Begin with the description
of your subject matter, person, animal, landscape, fictional
character, and so on. Generate your first
image and then include extra details or context
like its location, information about the
environment and lighting, as well as emotions or moods
you'd like to introduce. To clarify the idea of
what you want to create, it's helpful to ask yourself
a series of questions. Here is a checklist
you might use. Decide if you want a
photo or an illustration. What is your subject
matter, person, animal, landscape, fictional
character, and so on. Think of specific
effects and details you want to include art
movements, themes, techniques, effects,
materials, concepts, color, and tone, lighting,
and composition. Go beyond the basics and include additional descriptions in
your prompt that can take the creative process in a completely different
direction or add extra flavor and
nuance to your images. Here are just some examples
of what you can add. Type of photography,
environments, emotions and moods,
specific art styles, cinematic or painterly effects. Experimenting with these kinds
of descriptors is one of the most enjoyable parts of working with Gemini
image generation. Small additions can dramatically change the feel of an image. Pay attention to the order
of the words in your prompt. The words at the beginning carry more weight than the
words at the end. So if your snowy landscape matters more than the
cabin in the foreground, lead with the landscape. Try reordering the
same set of words, and you will often get
noticeably different results. Be mindful of third
party rights. Gemini does allow you to reference historical artists
and art movements by name. So asking for a Monet-like quality or the style of Van Gogh
works perfectly. However, the AI will block prompts that ask for the styles of living or contemporary
artists to protect creators. It also restricts copyrighted
characters and brand logos. If you want the look of a modern artist or
a specific brand, describe the visual
qualities you are after instead of
naming them directly. Look for inspiration and examples when crafting
your own prompts. If you are new to AI image generation and don't
have a design background, it can be challenging to write detailed descriptive
prompts at first, and that's completely normal. A great way to get
started is to browse AI-generated image
communities online, find images you like, look at the prompts
behind them and start experimenting by
making small modifications. It is also a good idea to
create a mood board of images you like and might
want to reference later. Save the image, the prompt used, and any style notes
alongside it. This becomes a really useful creative
reference over time. Last but not least
enjoy the process. At first, it might feel like the EI is doing all
the creative work. But without your unique ideas, your instincts about what looks good and your curiosity
to experiment, the EI would not produce
anything interesting. So be yourself, throw
your ideas out there, and have fun with it. To recap. Here are the seven tips. Start simple, then add
details one by one. Ask yourself a series of questions to clarify
your vision. Go beyond the basics: add
descriptors for environment, mood, style, and more. Word order matters. What comes first
carries more weight. Be mindful of third
party rights. Historical artists' styles are fair game, but avoid copyrighted
characters and brand imagery. Look for inspiration
online and build mood boards as a creative
reference. Have fun with it. As always, I'll see you in
the next video.
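Two of these tips, start simple and word order matters, are mechanical enough to sketch in Python. This is only an illustration of how you might assemble prompt text yourself before pasting it into Gemini. Both function names and the numeric importance scores are assumptions, not anything Gemini exposes.

```python
def order_by_importance(elements: list[tuple[str, int]]) -> str:
    """Join prompt elements so the most important ones come first."""
    ranked = sorted(elements, key=lambda element: element[1], reverse=True)
    return ", ".join(text for text, _ in ranked)


def progressive_prompts(subject: str, extras: list[str]) -> list[str]:
    """Start with the subject alone, then add one detail per iteration."""
    prompts = [subject]
    for extra in extras:
        prompts.append(f"{prompts[-1]}, {extra}")
    return prompts


# The landscape matters more than the cabin, so it gets the higher score.
print(order_by_importance([("cabin in the foreground", 1), ("snowy landscape", 3)]))

for prompt in progressive_prompts(
    "a fluffy orange cat", ["sleeping on a sofa", "soft morning light"]
):
    print(prompt)
```

Generating one image per line of the second loop lets you see what each added detail changes, which is the point of adding details one by one.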
29. Contextual Blending, Iterative Refinement, and Visual Synthesis: Welcome back. So far, we met Nano Banana and learned how to create
an image from scratch. But in most cases, you aren't just looking
for cool images. You're looking for assets. You need that perfect
hero image for a website or a social media ad that actually stops the scroll. In this video, we are going to explore how to
create these assets. Of course, you can start from complete scratch and prompt
Gemini what image you want. But think about it. Describing a specific
lighting angle, a unique texture or complex physical structure
with just text is hard. You can spend 30 minutes writing the perfect prompt and still
not get what's in your head. But if you show Gemini
a reference image, you provide an instant
map of your expectations. Today, we're going
to look at how to use images to talk to the AI. Let's start with the classic
marketing challenge. You have a product,
in this case, a skincare bottle, and you want it to look vibrant,
fresh, and premium. For this, we're going to
use contextual blending. Watch what happens when I upload a simple photo of
the bottle alongside the reference image
and then guide Gemini to place it into a
completely new creative scene. In our first prompt, we aren't just asking
for a random picture. We are telling Gemini
exactly what we want by referencing the original image and asking to
replace parts of it, swapping the water for juice and the original bottle for
our skincare brand. Let's begin with fast mode. I hit Submit, and
here is our image. The text is crisp and the bottle is perfectly
under the waterline. Now let's make some changes. First of all, I will add
this phrase into the prompt. Phrases like Ecommerce
product shot, bright studio lighting or pure white background
are the pro secrets that make an image look like a real commercial rather
than an AI experiment. Let's also switch to
thinking mode here. I used the same prompt, but the bottle is suddenly
on top of the liquid. Why? Because the model is actually reasoning
through physics. It knows that orange juice, unlike water, is not transparent. It thinks, if I submerge
this bottle in juice, the bottom half of the
label will disappear. Let's try to force it by adding half submerged
instructions to the prompt. Similar results. Thinking mode is prioritizing product photography logic over my specific layout instruction. It assumes a good photo
must show the whole brand, so it fixes my composition by lifting the product
out of the juice. Now, let's look at
iterative refinement. This is where Gemini
really shines. You don't have to get the
perfect shot in one go. Instead, you direct it like
a photographer adjusting one element at a time until you land exactly
where you want. For this Gemini Brew coffee bag, we are going to build up a rich, textural product
shot step by step, starting with placement, then
refining the composition, adding spill and depth, and finally, dialing
in the lighting. Watch how each prompt
nudges the image closer to that premium
roaster aesthetic. And finally, let's look
at the technique I think is the most impressive
of all: visual synthesis. Sometimes you have an
entire campaign kit, multiple products,
a model, an outfit. In the past, pulling
this together required a massive creative brief and
a lot of back and forth. With Gemini's thinking mode, we just hand it the pieces and
let it figure out the rest. Creating from scratch is about direction, not just description. You have seen how
to blend context, refine a shot step by step and synthesize multiple
elements into a single complete image. But what happens
when an image is almost perfect and just
needs one specific change. In our next video, we're heading into the
editing suite, where we'll use Gemini to fix, restore,
and precisely edit images you already have. I'll see you there.
30. The Editing Suite: Turning Sketches into Prototypes and Photo Restoration: Hello, everyone, and welcome
back to the series of lectures on creating
images with Gemini. In this video, we
are heading into the Gemini editing capabilities. I'm going to show you how to use Gemini's thinking layer to fix, restore, and literally read and then adjust the images
you already have. This is where we move from being creators to being
sophisticated editors. Let me open Gemini
to begin the demo. It usually starts on a
napkin or a whiteboard. You have a vision for a product, but you aren't a designer. Here is what we are going to do. I'm uploading this sketch of a new chair design to Gemini. I don't need to be an artist. I could just tell Gemini, interpret this sketch into a photorealistic product shot. Because we are
in thinking mode, Gemini uses the lines
as a structural guide. It understands the perspective I intended and fills
in the details I could not draw myself. This turns your rough drafts
into prototypes in seconds. Let's change the chair fabric. But instead of explaining the
color and texture I want, I'll use reference images. Surprisingly, I got
this book image since I used the word
cover in my prompt. Let's start a new chat
to make the image right. And, of course, we can give
this chair a 360-degree spin. Here I have the chair
image and my video prompt. And I also selected video from the drop-down menu to make sure Gemini understood
my task correctly. Now let's look at one of the most powerful repairs you
can do: photo restoration. We all have those old
faded family photos or low quality digital
shots from years ago. Instead of just coloring it in, I'll ask Gemini to restore it. Using its thinking layer, Gemini analyzes the textures
and historical context. It removes the scratches,
sharpens the faces, and applies natural
realistic colors as if the photo
were taken today. It's not just a filter. It is the AI reconstructing the quality that was lost
over time. Let's take a look.
31. The Editing Suite: Targeted Edits with the Markup Tool and External Annotations: Let's move on. What if
the image is great, but you want to change
one specific thing. Let's explore how to work with Gemini's dedicated
image markup tool, and also its alternative. I would like to edit this image. I'll upload it to Gemini, and
to open the markup tool, I simply click on the image. And here we have our
editing workspace. What I will be doing here is
called spatial prompting. I'm showing Gemini
exactly where I want the change and describing
what the change should be. First, I'll choose a color. Let's go with red. And I circle this fireplace. Next, I need to
explain the intent, so I'll switch to the text
tool and type "Add fire." Notice I used a verb here. You can be specific with
actions like add or replace, or you can just
describe the object. For example, let's add two cups of coffee on
this side table here. If you made a mistake, you can always hit the
undo button to go back. I'm clicking on Done as I just finalized the
annotations and let's hit Enter without providing any instructions because we
just made them on that image. And here is the new image. We see that Gemini has
successfully included the changes. We see the fire in
the fireplace and we see here two cups of
coffee. Great job. When I'm opening this new image, you will notice
that clicking on it doesn't open the
markup tool again. So that tool is specifically
for your initial uploads. However, you aren't stuck. You can continue to refine the result using
conversational edits. So here is my new prompt. Gemini is contextually
aware of the image it just created and will continue making the changes
that you requested. And coming back to my
original annotations, notice that I like to
match the text color to the circle color. While the AI primarily
tracks coordinates, this is a good practice for keeping your
instructions organized. You can also bring
in annotations from external tools like Canva. For example, here,
I have marked up this photo of the Burj
Khalifa building. I want Gemini to make
those exact changes. I want this building
to be removed, and I want to change colors for some parts
of the building. I've opened a new chat, and I submitted this
image to the chat. For complex tasks like this, I recommend switching
to thinking mode. This triggers a more
powerful reasoning model that is much better at following these
precise instructions. I will also include
these instructions. Including this prompt
here is important. For example, here is the image
that I got when adjusting that same image without providing any
instructions to Gemini. We see that Gemini has
successfully made the change. However, we still
see the annotations, and that was my original image without any
instructions provided. Let's return to our
chat and hit Enter. Unfortunately, this
time, we still have the instructions
on the new image, and we also see that Gemini has successfully
made other changes. We don't see the building
here on the right side, and the new colors have
been successfully applied. Let's ask Gemini to remove
annotation instructions from the image. And here we go. The second attempt
has been successful. As you can see, Gemini
recognized the text, removed the building, and
changed the colors perfectly. And then we provided the second instruction to remove the annotations.
All good here. Finally, let's
look at how Gemini reasons about the world
inside your photos. For example, if you upload
a photo of a city skyline, you can ask Gemini
to annotate it, watch as it identifies
the landmarks and adds labels exactly
where they belong. This is not just drawing,
it's information design. It's taking a raw photo
and turning it into a smart educational asset for
a presentation or a manual. And that's really the theme of everything we covered
in this video: whether you are bringing
a rough sketch to life, restoring an old photo, annotating an image, or smart
labeling a complex scene, Gemini handles the precision
work, so you don't have to. In our next video, we are going to bring
all these skills together to build
complex visual systems, including infographics and
data visualizations that transform complex data into something instantly clear.
I'll see you there.
32. Complex Visuals: Menus, Diagrams, and Infographics: Welcome back. So far, we have covered a lot of
things: creating from scratch, editing with precision, and
synthesizing complex scenes. Now, let's look at
what Gemini can do when the task gets
even more ambitious, building multi piece
visual assets like infographics, diagrams,
and assets that work across different social
platforms and screens. Let's get started. I want
Gemini to create a one-page
infographic menu using
these coffee images. I want it to identify each
drink and place it in a clean section with its name
and a short description. Let's also choose Create images from the
selection of tools, as from the prompt
description here, it's not quite clear
if I want an image or text as the final
output. Let's start. And here is our picture. Because Gemini has that
deep reasoning layer, it sees the difference between the images we submitted
and can identify a coffee cup with the ice cubes inside versus the one
with the warm milk foam. Let me ask Gemini to
change this layout for a bit and also change colors
to fit our brand colors. Oh, this is a great design. I like it better than
our first iteration. And let's do one more change. I want to change this coffee
menu text to our brand name. And here is our image. I like it very much. The only thing that
I want to change, I would like to remove those coffee beans so that
the text is fully visible. But instead of doing this as a series of iterative prompting, let's actually try to use
another technique here. I'm going to use the markup tool that we covered in
the previous demo. Let me download this
full size image. I created a new chat, uploaded our image that
we just generated. Next, I opened the markup tool and let me highlight
the coffee beans. I added the instruction to Gemini to remove
the coffee beans. It's going to be a
bit tricky because we see the beans together
with the text. But let's try to make it work. I'm choosing the
thinking model here and also selecting Create images. So my first attempt
was unsuccessful. You see that the coffee beans are still here inside the image. Let's try to describe the change that I
would like to make. And here is our image. It's really incredible that
Gemini did so well following my prompt instructions
and removing those coffee beans from the
top right corner of the menu. And now we can see
our text clearly. Awesome. And let's move
to the second demo here. Sometimes you need to explain
the how like the journey from bean to cup in my
Gemini coffee brand example. So here is our brands
signature brewing process. I'm going to ask
Gemini the following. I want Gemini to visualize this five-step Gemini
Brew signature process as a clean architecture
flow diagram. I want it to use a minimalist
layout and match the colors to those that
we use in our PDF file. Let me choose thinking mode. And for this example, I'm also going to
choose Create images. And here is our diagram. Gemini built the structure, created the icons, and
also labeled every step. What I don't like here is those arrows that are
definitely unnecessary. And this text that we
can see on every box. Let's ask Gemini to remove this. And here is the cleaner image. And I also would like
to remove this frame. Let's ask Gemini to do this. And this is a much
better picture. And I want to do
one more iteration to make this image
more beautiful. Look at this. This is a
completely different aesthetic. Let me know in the
Q&A for this video, which one you prefer. And we're moving
next with our demo.
33. Complex Visuals: Adapting Assets Across Formats and Platforms: Of course, you can further
edit this image if you like, either by continuing asking Gemini for improvements
directly here in the chat, or you can copy
this image and go ahead with markup
tool directions. But let me show you
another example while we are here on this image. Let's say that we are planning an international expansion
of the Gemini Brew brand. So we need this diagram to be translated into
other languages. So I'm going to ask
change the image so that the texts are shown
in Chinese language. And this is our
translated diagram. Notice that in my prompt, I explicitly say that I want
Gemini to change the image, not just show the texts in Chinese language so that
it is crystal clear to Gemini that I need another variation of that image translated into
Chinese language. All right. And let's
take one final example. Let's say that we
need assets for the Gemini Brew marketing
campaign that will work everywhere from
Instagram stories and posts to a hero
image on our website. We are going to take this shot we built earlier with Gemini, and I'm going to tell Gemini that this is
our master asset. And now I need a version for a vertical social media story, a square post, and a wide header for the
Gemini Brew website. I have also attached the image that I want Gemini to modify. And here is the message
that I got from Gemini when I tested this prompt before
recording the tutorial. This is because Gemini can
create one image at a time. While Gemini can process many
reference images at once, its goal is always
to synthesize them into one final high
fidelity composition. If you ask it for several
separate image files in one go, like in my example here, it won't be able to
proceed with your request. So always frame your request as a single project, like an
infographic, a menu, or a campaign shot, where all your elements live
together in one image. So let me change the prompt. I first would like to create a wide header image for
the Gemini Brew website. As always, I'm selecting
thinking mode, and let's also choose
Create images to give Gemini clear instructions that I'm expecting to see
an image in that case. And here is our new wide
hero image for our website. We see that Gemini doesn't just stretch our original
image, it outpaints it, so it adds more details into it like those old coffee machines, as well as these coffee beans on the left and right side of the original image
while making sure that our product is always perfectly positioned in the
center of the composition, regardless of the screen size. Let's also create one
vertical size image and square size image
for our Instagram posts.
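Because each request should produce a single image, it can help to prepare one prompt per format in advance. The sketch below assumes common aspect ratios for each placement (9:16, 1:1, and 16:9). Those ratios, the function name, and the prompt wording are illustrative assumptions, not values stated in the course or required by Gemini.

```python
# Assumed aspect ratios per placement; adjust to your own platform requirements.
FORMATS = {
    "vertical story": "9:16",
    "square post": "1:1",
    "wide website header": "16:9",
}


def per_format_prompts(asset_name: str) -> list[str]:
    """Build one single-image request per target format."""
    return [
        f"Using the attached {asset_name} as the master asset, create one "
        f"{name} image in a {ratio} aspect ratio. Keep the product centered."
        for name, ratio in FORMATS.items()
    ]


for prompt in per_format_prompts("Gemini Brew product shot"):
    print(prompt)
```

Each printed prompt is meant to be sent as its own message, with the master image attached, rather than all three in one request.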
34. Beyond Chatting - Deep Research and Building with Gemini: Section Intro: What happens when you give Gemini a research task that would normally take
you half a day. That's what this
section is about, and then we take those findings somewhere
you might not expect. We are going to do this using a Gemini feature
called deep research, and we will work through three very different real
life situations with it. One that most of us deal
with every single week, one about making a purchasing
decision without falling down the rabbit hole of review
sites and Reddit threads, and one about getting up to speed on a completely
new subject. In each case, I want you to see not just what Gemini produces, but how to prompt it, so the output is
actually useful to you. And then we are
going to take it one step further using Canvas to turn one of those
research outputs into a working interactive app
built from a conversation. No code required.
I hope you're ready. So get yourself a cup
of tea or coffee, and let's get into it.
35. Deep Research: Beyond Blueprint Answers: Raise your hand if this
has ever happened to you: you ask a chatbot a hard,
important question, something like, "I want to
raise Series A funding. Who are the most active
investors in my space right now?" And it responds with
a list of options, which is quite shallow, and you also get a bunch of
high-level recommendations, like: you should research active investors
in your category. You should build a target list. You should reach
out to your network for warm introductions
and so on. Google's Product Team
has a name for this. They call it a blueprint answer, a high level map that
tells you what to go find while leaving every bit
of the actual work to you. You are still the one drowning
in 50 open browser tabs, trying to separate the useful
signal from the noise. Gemini deep research is what
can help you to move past the blueprint and get something very comprehensive you
can act on right away. Deep research is not
just a smarter chatbot. It is an agentic system, meaning it autonomously plans, searches, reasons, and synthesizes information across hundreds
of sources on your behalf. Think of it as having a PhD-level research assistant on your team who does hours of complex investigation
in minutes and comes back with a polished
report, not a to do list. So what does a PhD level
research assistant actually do for you in practice? Let me give you the three
most powerful use cases. First, topic understanding, going deep on complex subjects. Imagine you are an
HR manager trying to understand how AI will impact the workforce over
the next three years. You don't just want a
surface level summary. You need to understand
the landscape. How does AI automation
compare to AI augmentation? Which roles are most
at risk, and which ones are evolving? What are other
companies already doing? And what does the
research say versus what is just hype? Deep research
dives into academic papers, industry reports, expert commentary, and real-world case studies
simultaneously. It comes back with a
structured analysis that maps out the landscape, contrasting competing ideas, surfacing the relationships
between concepts, and explaining the
why behind all of it. Second, professional
due diligence. Think about preparing for an
enterprise sales meeting. Before you walk through the door, you need to understand the prospect's core
business challenges, their recent strategic moves, the competitive pressure
they are facing, and how your product
fits into all of it. Deep research investigates
the company's products, funding history, leadership team, and
competitive environment. And, this is very important, it merges it all with your
own internal notes on the client relationship.
What would have taken a junior analyst a full day to compile is now
ready in minutes. So you walk into that
meeting room knowing more about their business
than they might expect. Third, high-stakes
personal decisions. Not everything is about
work: buying a car, choosing a neighborhood,
comparing insurance options. These decisions
matter just as much, and the research rabbit hole
is just as deep. Instead of a weekend lost going through conflicting blog posts
and Reddit threads, you get a report structured
around your specific situation: the pros, the cons, and the nuance that generic
advice never gives you. And here is what makes all three of these use cases
possible in practice. Deep research does not just
hand you a list of links. It produces a comprehensive
multi page report, structured analysis,
cited sources, and even things like infographics that bring
the data to life. In the next lecture, we are going to get
our hands on it. I'll show you how to
launch deep research, how to create the research
plan before it starts, and we will walk through a real example together
so you can see the full process from prompt to final report.
I'll meet you there.
36. Deep Research in Action — Topic Understanding: As promised, let's see
deep research in action. We are going to start with the topic understanding
use case, and I picked an example that I think most of us can
relate to personally. We are going to use deep
research to cut through one of the most confusing
topics in everyday life: breakfast nutrition.
You know the feeling. You Google "are eggs healthy" and get ten completely different
answers depending on which article you land on. To
follow along with this demo, you will need a
paid Gemini plan. If you are currently on a free
plan and want to upgrade, check out the lecture in the introductory course section where I walk through
how to do that. Okay, let's go. To launch deep research, open a new chat, and choose deep research from
the list of tools. By default, Gemini uses Google search as
its primary source. But you can expand that. For example, you can
choose your Gmail or Google Drive as a source
or upload your own files. This is what makes deep
research so powerful. It's not just searching the web. It can merge public information with your own private documents. For this demo, we will keep
it simple and use web search only. Here is the prompt
I'm going to use. Notice how specific
this prompt is. We are not just asking, "What should I eat for breakfast?" We are giving deep research a clear research agenda
with three distinct tasks. The more direction
you give it up front, the more useful the output. As for the model selection here, the specialist analogy
we introduced earlier in the course stays exactly the same when you activate
deep research. The mode you select
dictates how that specialist behaves during
the research process. Fast remains your sprinter, performing a broad rapid scan of the most relevant
sources to give you a quick summary without
digging into every detail. Thinking is still your
strategist, pausing to cross-reference multiple sources and resolve contradictions to
find a more logical angle. Pro remains your expert, deep diving into everything
from dense reports and technical PDFs to long email threads to give you a truly
comprehensive synthesis. I'll choose thinking here. Now let's hit submit and
see what happens first. This is the goal
decomposition step, and it's one of my favorite
parts of the process. Instead of diving
straight into research, deep research pauses and builds a personalized multi
step research plan based on your prompt. You can see it mapping out exactly what it intends
to investigate. If you need to, you can edit
this plan before it starts. If you want to direct it toward a specific angle,
add a subtopic, or remove something that
isn't relevant to you, do it now before a
single search is run. For this demo, I'm happy
with the plan as it is. So let's approve
it and let it run. And now the search begins. Gemini is working through
sources in real time: academic papers, nutrition guidelines, health publications. It is deciding which
threads to investigate in parallel and which ones
need to happen in sequence. You can even click on any
of the websites here if you are curious about which sources
Gemini is going through. As Gemini deep research
reads each source, it does not just collect
information and move on. It thinks about what
to search for next. It is running a continuous
self critique process, spotting contradictions
between sources, flagging vague or
unsupported claims and recognizing when
a piece of data simply does not add
up. You can see it adjusting its research
direction in real time as new information comes in. And when it hits a dead end, say a study is behind a
paywall or a website is down, it does not stop. It reroutes and finds another
path to the same answer. There is one more thing that makes this possible at scale. Deep research works
inside a context window, the AI's working memory. In practical terms, it
means Gemini holds in memory every single source it has read for the entire session. Nothing gets lost or forgotten
as the research grows. And this is also why follow-up questions later are so sharp. It never loses the thread of
what it already investigated. And as you might already
guess, you don't need to sit there watching
all of this happen. Deep research is asynchronous. You can close the tab and
get back to your work, and Gemini will let you know
when your report is ready. If you are on the web app, you will see a
notification appear next to the chat thread
in your sidebar. And if you have the Gemini
mobile app installed, you get a push notification
straight to your phone. And I just got mine. Our report is ready. So let's get back to
Gemini to take a look. This is what deep research delivers and notice
what it is not. It is not a list of links. It is not a bullet
point summary. It is a structured
multipage analysis with cited sources,
organized sections, and actual conclusions
you can act on. The tiered ingredient table we asked for is right here: tier one, tier two
and tier three, clear, actionable and
based on current research. And if you are curious
about any of the sources, every claim has relevant links. You can click through and read the original
research yourself. I don't know about you, but it would have taken me
hours to read through all of these resources and
compile the report manually. And importantly, deep research is not
replacing your judgment. It is doing that tedious
groundwork so that your judgment is
actually more informed. In our next lecture, we will take deep research into a personal context and walk through a few more examples.
I'll see you there.
37. Deep Research in Action — Purchasing Decisions: In this lecture, we
are going to look at two more use cases for deep research that
I think you will find immediately useful
in your own life. The first one is about making a confident
purchasing decision, and I'm going to use a
very real life example. The second is about learning
a completely new subject. I will show you something
I haven't shown before: how to turn a deep research
report into an infographic, a quiz, and flashcards,
all without leaving the Gemini deep research
interface. Let's start. My Oura sleep-tracking
ring recently broke. I would like to replace it, but I'm not sure if I should just purchase
the latest ring of the same brand or use this as a chance to switch
to something better. And there is one
specific feature I've been wanting for years: a vibrating, sleep-cycle-aware
silent alarm that actually wakes you up at the right moment in
your sleep cycle, not just at a fixed time. Let's use deep research as our personal shopping
assistant to cut through online
reviews and articles. Here is my prompt. Notice a few things about
this prompt. It is personal. I've given deep
research real context about my situation and
what I'm looking for. I included the vibrating alarm, not just because I want it, but to see if Gemini can filter
out the obvious choices. Most popular rings actually
don't have vibration motors. So a basic search
might just give me a top ten rings list that
ignores this requirement. Deep research should catch that. The prompt has a clear research
agenda with three tasks, and it asks for a specific
output format at the end, a feature table, which means the report will be
immediately usable, not just a wall of text. Let's choose deep research
from a list of tools. I'm going to rely on search
here as the main source, and I'm choosing thinking
mode. And let's start. Gemini has prepared this
research plan for me, and I would like
to make a change here. For this, I click Edit plan. Next, I will type in
the change that I want Gemini to make
in the current plan. I want Gemini to also include a specific brand
into its research. We see that the list of
brands has been updated. I'm now fine with this plan, so I'm going to approve it
and start the research. And in a few minutes, our report is ready. Let's walk through it together. You can see that
deep research has identified the top
three candidates, analyzed them across exactly
the criteria I asked for, including a vibrating
smart alarm system, and produced the feature
comparison table right here. This is the kind of output
that would normally require at least an
hour of tab switching, Reddit threads, and conflicting
review sites. I have it in minutes, structured around my specific
situation and requirements. And here is the list of strategic recommendations
from Gemini. And notice: because I gave it
personal context up front, the recommendations
aren't generic. They are filtered through
my actual priorities: value for money, no
heavy subscription, and a sleep alarm that
actually works. This is a great example of using deep research for making
purchasing decisions. Instead of drowning in options, you walk away with a clear,
reasoned short list. In the second part
of this tutorial, we will continue exploring deep research for another
use case. I'll see you there.
38. Deep Research in Action — Learning a New Topic: Now let's look at something
a little different. Using deep research to speed up your learning when you're
getting into a new subject. I have recently started studying
real estate investment. I attended my first
class and took some notes on the topics
that we've covered there. Now I want to learn more about those topics
using deep research. I can upload this photo
directly into the prompt. Gemini will read my
handwritten notes, extract the key topics, and use them as the foundation
for a research report. I don't need to
re type anything. Let me show you how this works. First of all, let's choose deep research from
the list of tools. I'm going to switch to
thinking mode here, type in my prompt, and then I will attach
my handwritten notes. What I love about
this approach is that the research is anchored to what I have already
started learning. So the report reinforces
and expands on my existing knowledge rather
than starting from scratch. For this, I specifically
asked Gemini to refer to the key
themes in my notes, when researching and
drafting the report. And here is our research
plan. It all looks great to me, so I'll hit Start research. And our report is ready. You can see it picked up
all the key topics from my notes and built a structured
analysis around them. Definitions, context, relationships between concepts,
practical implications. We can use this information
as a study companion, not just a summary. But here is where it
gets really interesting. Once the deep research
report is ready, we can transform this wall of text into active learning tools. You will notice
a Create button in the top right corner
of the Canvas panel. Click it and you get
a drop down menu with several options for
transforming the report. First, let's look at
the infographic. Gemini takes
complex information, like the difference between residential and
industrial assets in our real estate example, and turns it into
a visual summary. This is perfect for a quick, high-level review or for sharing a one-pager
with a stakeholder. Let's return to our real
estate investment trends report to continue the demo. Next, to ensure the
information actually sticks, we can generate a quiz. Gemini creates
interactive questions based specifically
on the report. As you answer, it provides
immediate feedback, helping you identify
exactly where your understanding of a new
topic might need more work. And finally, we have flashcards. You have two ways to use this. You can generate a full deck of flashcards to review every
key term from the report. But if you have just
finished the quiz, like in our example here, Gemini can generate cards based specifically on
your quiz results. It targets the areas where
you struggled. Let's do this. So we see a complete
learning loop here, research, understand,
test yourself, and reinforce your knowledge
all inside one tool. In the next video, we are going to move on from deep research and revisit
a tool you already know, but we'll explore its
advanced use cases, specifically building
AI applications. And as a heads up, we are going to use the
key takeaways from one of our deep research reports
as the input data our app will be built around. And more on that
in the next video.
39. Beyond Documents: What Else Can Canvas Do?: Welcome back. So in our
last Canvas lecture, we focused on document drafting: how Canvas gives you
a live workspace to refine writing with
Gemini right beside you. But document drafting is really just the beginning of
what Canvas can do. And you have already seen some of it without realizing it. Remember that Create
button that appeared after your deep research
report was ready? The infographic, the quiz, the flashcards:
that was Canvas. Deep research delivers its
report directly inside Canvas, which is why you could
transform it into all those formats without
ever switching tools. Deep research and
Canvas are connected by design. Google built them to flow into each
other seamlessly. So let's look at the full
picture of what Canvas can do. The first thing Canvas can build beyond documents is web pages. And I don't mean plain
HTML with some text on it. I mean structured interactive pages with
information cards, charts, visual layouts,
and clickable elements. Think about the last time
you had to share a report or a brief with someone who wasn't going to read
a wall of text. With Canvas, you can take
that same content and say, "Turn this into a web page," or simply click on the
web page button. And within seconds, you have something that actually
looks like a real page. You can share it with a link. No publishing or
hosting setup required. Next is infographics. If you have ever
tried to explain something complex to a non-technical
stakeholder, a process, a comparison, a decision framework, you
know the challenge. Words can only do so much. Canvas can take your raw content and restructure it
into a visual format: clean sections,
digestible chunks, icons, side-by-side comparisons. And you can keep refining
it in the same chat. Make the second section bigger, change the tone to
be less formal, and it updates in real time. Third, Canvas can also generate interactive quizzes
and flash cards from any content
you throw at it. This is useful beyond
just studying. Think client on boarding, team training, product
knowledge checks. You describe what
you want, and Canvas builds a working
interactive quiz. No third party tool, no form builder, no extra steps. There is also an audio mode. Canvas can transform
written content into a podcast style
audio overview, conversation between
two AI hosts that discuss and
summarize your material. It is useful if you want to go through a long
document while working or share findings
with people who would rather listen than read.
"Welcome back to the Deep Dive. Today, we're unpacking a
vision that feels like it's really shifting
under our feet. We are moving past
that old idea of a smart assistant that just
sets timers or plays music. We're looking at this concept
of a universal assistant, a partner that
actually anticipates what you need before
you even ask." And then there is the
big one: Canvas can build fully functional
apps, working software. You describe what you want, a recipe organizer, trip planner, quiz tool, or budget tracker, and Canvas generates the
code and runs it for you. Right there in the window, you don't see the code. You don't need to
understand the code. You just see a working
interactive app, and it is not static. You can keep chatting
with Gemini to adjust it. This is what's been
called vibe coding. Building software by
describing what you want rather than writing
code line by line. We touched on this concept in the GenAI Implementation
impact lecture of the course. And now we are about
to see it live. Here is what I love most
about Canvas in this context. It is not a separate
developer tool. It's the same workspace
you have already been using to write
documents and outlines. The move from "draft
me a document" to "build me an app" is
just one conversation. In our next lecture, we are going to do exactly that. We are going to pick up
right where we left off. We used deep research
to finally get a clear evidence based answer
on breakfast nutrition. And we are going to
turn that research into a family breakfast
recipe app that suggests healthy quick meals
for both adults and kids. Let's go build this app.
40. Follow-Along: Building an App with Canvas - From Research to a Running App: Welcome back. Here we are
building a breakfast chef app: quick meals under 20 minutes, family friendly, with photos
of the finished meal. All inside Gemini Canvas, no code, no technical
background needed. Just a good prompt and a bit of back and forth with
Gemini. Let's go. To keep our workflow organized, we are going to follow
four simple steps: ideate, build, refine,
and finally share. And here is step one, ideate. This is our deep research
report on breakfast nutrition. Let's brainstorm with Gemini on the idea behind the app
and what it will do. I have some initial thoughts, but I want to expand on them. I started by describing
the purpose of the app. I also said that I want the app to use the
research findings, and I referenced the comprehensive
tiered grocery framework from the report to emphasize
that I don't need a random list of ingredients
for the recipes. I want Gemini to come up with three cool
features for the app, and I also suggest an overall
look and feel for the app. I put some descriptive
words here, like fun, warm, approachable, to give
the overall direction for what I want to see. I'm looking for a detailed
description of the app, the concept we can start
building the actual app on. Let's hit Enter. And here we have our
app description. Let's ask Gemini to make some
changes into this concept. The first feature, the
front loader family timer, seems to be quite complex, especially for the first
version of the app. So let's ask Gemini
to replace it with something more
straightforward: a simple question about what kind
of meal is preferred today. And I also add additional
details to make sure that every time
we ask for a recipe, we get a new one and
that the app takes strictly the ingredients
recommended in our report. So I'll hit Enter
again and let's see how Gemini will
incorporate those changes. And here we have
the updated version of the app description. I'm good to go
with this concept, but before we move to step two, build the app, we need to check our settings Look at the
model selector here. You might be tempted by P. It says advanced math and code. So it sounds like the
most powerful choice. But here's what I
found when I tested both while building this
breakfast app before. Pro actually made
the process harder. It took more back and forth
to get the results I wanted, and I burned through
my Pro credits quickly, leaving me waiting a few hours
before I could continue. Thinking mode got
me there faster. So here is my recommendation. Always start with thinking. It is designed for step
by step reasoning, which is exactly what
app building requires: working through logic,
structure, and flow. Save Pro for when your
app needs to work with a large volume of content
from multiple sources, documents, videos,
images, and more. Let's begin step
two, building the app. My previous tests
show that if you send this request in
this chat directly, Gemini won't start
the building process, but send you the app concept
description one more time. Yes, that's what happened
this time as well. You see that instead
of creating the app, Gemini just made some changes
into the report itself, and that's not what we need. So to initiate the app
software creation process, not just a textual description, click Create, and in the "Describe your own app"
section, write this: "Build an app based on
the description above." You see that Gemini shows this command under our
app description here, and it starts building it. And while Gemini is
building the app, let me answer a question you
are probably thinking now. What if I'm not starting
from a deep research report? What if I just want to
build an app from scratch? In that case, start by
opening a new chat. But before you type anything, switch to Canvas mode
first. Here is why. Gemini can only build and
run apps inside Canvas. It is a dedicated workspace designed specifically for that. A regular chat can help
you think through ideas, but it cannot actually
construct a working app. Once you are in Canvas, brainstorm your app
idea with Gemini. Describe what you want to build, what it is for, and
what it should do. When you are ready
to start building, hit the Create button, type in your prompt, and Gemini will get to work. Okay, back to our demo. And our app is ready. We begin by selecting how we are feeling today and what kind
of meal we would prefer, and Gemini
suggests a healthy meal accordingly. We see here
a list of ingredients, followed by instructions on
how to prepare the meal. We have a great photo illustrating what we
are about to eat. And we can also choose
a kid chef mode so that we have a list of tasks
for our young helpers. Pretty cute. Now let's move
to the third step, refine. As you would imagine,
we are not done here. We can continue iterating
and enhancing our app. Let's say I want to
adjust a few things. I'll type my requests
in the chat. You just saw me introducing
several changes into our app. When you do so, introduce
one change at a time, rather than trying to include everything in
one single prompt. Let me make several other
changes to our app. Here is the version
that I've got so far. I decided to add the
possibility to include other ingredients in addition
to the predefined list. If one happens to be from the tier three category, a
relevant message is shown, but the recipe is still
created. I also added the possibility to save a recipe
into the favorites, which are accessible here. And finally, I added the
reset button in case we want to start all over again and choose
different ingredients. As you can see, we have been able to make quite
a lot of changes just by casually chatting with Gemini with no coding involved. I'm happy with our
current progress and the user experience
we have created. In the second part
of this tutorial, I'll show you another
way to make changes to your app using
the Canvas toolbar. And we will also take a look at how to share it with others. I'll see you in the second part.
41. Follow-Along: Building an App with Canvas - Refining and Sharing: Hi, everyone. Welcome to the
second part of the tutorial, where we explore how to
build working software by describing what we want rather than writing
code line by line, the process known as vibe
coding. As promised, I want to show you
another option for making changes to your app as
part of our refine step. Notice this Gemini
Canvas toolbar. Let's explore what
it can do for us. Let's start with
this sparkle icon. This is the AI feature injector. It adds AI capabilities
to your app. When you click it,
Gemini analyzes your current app view and
suggests smart components, such as an AI storage
bar or text and image generation, and then it injects those elements directly
into your app's logic. Let's ask Gemini to add AI
features and see how it works. In the chat on the left, Gemini provides an overview of what AI features were
added to the app. We can respond in the chat and ask Gemini to make
additional changes. But first, let's try
out these new features. Here is the magic
feature number two. We see that Gemini proposed a healthier
ingredient instead of the one that I just selected, but I don't have it right now, so I'll just click Cancel and
go ahead with these three. Here's the AI wisdom
card. Pretty nice. And of course, let's try out how the audio
narrator works. "Rise and shine.
Today's mission is the sunny side spinach
and avocado clouds. The iron-rich spinach paired with
monounsaturated fats from avocado provides a
clean energy boost that keeps you feeling
nimble and refreshed." Let's make a change to
one of the features. Gemini confirms that the
change has been made, so let's test it. "Take a deep breath and
let's start the day. Your recipe today is
the Emerald Cloud Nest. The combination of iron-rich
spinach and monounsaturated fats from avocado ensures
a slow release of energy, keeping you feeling light and airy." Awesome, we just saw how Gemini has followed
our instructions, and I suggest that we return to the Gemini Canvas toolbar
and explore it further. The next icon here
is the drag handle. It is used to move
the toolbar so it doesn't block your app's
navigation during testing. And there is also a third
icon, the refinement tool, which tells Gemini to modify a specific
element of your app. You might notice, it
is not visible here in our golden hour app. That's actually intentional. Gemini recognizes that this app has gone through
enough iterations, so small automated
edits could be risky. If it tries to tweak one element but
misreads the context, it could break something
else that depends on it. So it hides the icon as a protective measure.
To demonstrate how the refinement tool works, let's switch to a simpler app I started building before
recording this tutorial. I have only made a
few iterations there, so the icon is available. Let's say that I want to change
the color of this button. So I choose Select and ask, highlight this button,
and type in my prompt: "Suggest another color palette." Notice what happened here. Instead of changing
just this button's color, Gemini redesigned
the whole app. Why is that? It turns out the word
palette is the problem here. A color palette refers to the entire set of colors
used across your app. So Gemini takes that literally and updates
everything to match. It's not doing anything wrong. It's just following your
instructions precisely. To change only the
color of this button, you need to clearly describe the scope of the change.
Let me show how. I'm selecting the button again and typing in another prompt. You see that my detailed prompt has worked, and this time, Gemini applied the changes only to the element that I indicated
through the refinement tool. That is a really useful
thing to keep in mind. The more specific your prompt, the more precise the result. Let's come back to
our golden hour app. Now that we've covered how to
refine and adjust your app, let's talk about what happens
when you're happy with it. Step four, share. Once you are done, you can get a sharable link and
send it to anyone. They can open and use the
app directly in the browser. No downloads, no signs, no technical setup on their end. They can even remix it. That's one of the features
Google has built into Canvas. Someone can take
your app, open it, and create their own
version from it. All right. And that's
it for this tutorial. Please share what
apps you are working on in the Q&A section
for this video. I would love to see
what you're building.