Gemini Google AI: The Only AI That Handles It All (Images, Video & Text) | Anna Kolenkina | Skillshare

Playback Speed


1.0x


  • 0.5x
  • 0.75x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 1.75x
  • 2x

Gemini Google AI: The Only AI That Handles It All (Images, Video & Text)

teacher avatar Anna Kolenkina, Product Builder, Entrepreneur

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

    • 1.

      Welcome to the course on Google Gemini AI!

      3:06

    • 2.

      What Is Gemini? Understanding Google’s AI Ecosystem

      5:57

    • 3.

      Meet the Gemini Model Family

      4:38

    • 4.

      Setting Up Gemini and Your First Chat

      4:53

    • 5.

      Prompting Gemini for Better Results: Section Intro

      1:39

    • 6.

      What Is a Prompt? Prompting, Prompt Engineering, Personal vs. Production Prompts

      4:59

    • 7.

      How to Talk to Google Gemini AI The Building Blocks of an Eective Prompt

      7:57

    • 8.

      Building on Gemini’s Responses: Iterative Prompting

      5:54

    • 9.

      Making Gemini Truly Yours: Personalization

      7:14

    • 10.

      How to Share Files and Other Content with Google Gemini AI

      9:20

    • 11.

      Using Examples in Your Prompts

      10:37

    • 12.

      Specifying Output Format in Gemini

      4:46

    • 13.

      Follow-Along: Choosing the Right Model and Brainstorming with Gemini

      9:43

    • 14.

      Follow-Along: Getting Feedback with Google Gemini AI

      8:48

    • 15.

      Keeping It Real: Practical Strategies to Minimize AI Hallucinations

      10:12

    • 16.

      Working with Gemini Canvas and Gems: Section Intro

      1:38

    • 17.

      Welcome to Gemini Canvas

      3:39

    • 18.

      Follow-Along: Creating and Editing Documents in Gemini Canvas (part 1)

      5:38

    • 19.

      Follow-Along: Creating and Editing Documents in Gemini Canvas (part 2)

      5:42

    • 20.

      Follow-Along: Turning a Gemini Draft into a Polished PDF with Gamma

      9:29

    • 21.

      What Are Gemini Gems, and Why Do We Need Them?

      5:34

    • 22.

      Follow-Along: Building a Grammar Check Gem

      10:10

    • 23.

      Follow-Along: Building a Fitness Coach Gem (part 1)

      7:23

    • 24.

      Follow-Along: Building a Fitness Coach Gem (part 2)

      4:46

    • 25.

      Gemini for Visual Creation: Section Intro

      2:11

    • 26.

      What Is Nano Banana? Key Features Explained

      6:42

    • 27.

      Creating Your First Image with Gemini

      7:05

    • 28.

      7 Prompting Tips for Creating Better Visuals

      6:17

    • 29.

      Contextual Blending, Iterative Renement, and Visual Synthesis

      7:50

    • 30.

      The Editing Suite: Turning Sketches into Prototypes and Photo Restoration

      4:14

    • 31.

      The Editing Suite: Targeted Edits with the Markup Tool and External Annotations

      6:30

    • 32.

      Complex Visuals: Menus, Diagrams, and Infographics

      6:15

    • 33.

      Complex Visuals: Adapting Assets Across Formats and Platforms

      4:32

    • 34.

      Beyond Chatting - Deep Research and Building with Gemini: Section Intro

      1:25

    • 35.

      Deep Research: Beyond Blueprint Answers

      5:48

    • 36.

      Deep Research in Action — Topic Understanding

      8:54

    • 37.

      Deep Research in Action — Purchasing Decisions

      5:23

    • 38.

      Deep Research in Action — Learning a New Topic

      5:28

    • 39.

      Beyond Documents: What Else Can Canvas Do?

      6:08

    • 40.

      Follow-Along: Building an App with Canvas - From Research to a Running App

      9:30

    • 41.

      Follow-Along: Building an App with Canvas - Refining and Sharing

      6:57

  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

30

Students

--

Projects

About This Class

Are you tired of switching between multiple AI tools for different creative tasks? What if you could work with one AI that understands text, analyzes images, processes videos, and integrates seamlessly with the tools you already use every day?

Meet Google Gemini AI – the tool that's changing how creatives work with multiple content formats at once.

With over 750 million monthly users (and growing faster than ChatGPT in many markets), Gemini isn't just another AI chatbot – it's your creative partner that lives inside Gmail, Google Docs, Chrome, and your phone. It's AI that meets you where you already work.

What Makes This Class Different:

This isn't a technical AI course. It's a creative toolkit for anyone who wants to produce better content faster, generate stunning visuals, and turn ideas into reality – all without technical knowledge.

In this hands-on class, you'll discover how to:

Multimodal Content Creation:

  • Analyze images and get creative feedback on your visual work
  • Process videos to extract insights, summaries, and content ideas
  • Combine text, images, and context in ways ChatGPT simply can't
  • Generate AI visuals directly within your workflow

Creative Ideation & Brainstorming:

  • Generate endless creative concepts across multiple formats
  • Get professional-level feedback on your work instantly
  • Overcome creative blocks with multimodal inspiration

AI-Powered Productivity for Creatives:

  • Build personalized AI assistants for specific creative tasks (grammar checking, brand voice, fitness coaching)
  • Turn complex research into detailed creative briefs with Deep Research
  • Summarize long documents, videos, and visual content in seconds
  • Manage creative projects across Gmail, Docs, and Drive seamlessly

No-Code App & Prototype Creation:

  • Build functional apps and interactive prototypes just by describing what you want, without writing code

Why Gemini for Creatives?

Unlike other AI tools, Gemini excels at understanding visual and textual context together, maintaining creative direction across extended projects, and working inside the Google tools you use daily. It's like having a creative director, visual analyst, and content writer combined – available 24/7, wherever you work.

What You'll Learn:

Foundation (Perfect for AI Beginners):

  • How to communicate with Gemini using effective prompting techniques
  • How to structure your prompts for better results
  • How to work with text, images, and video in one conversation

Creative Applications:

  • Brainstorming with text, image, and video analysis combined
  • Creating marketing campaigns with visual and written content
  • Building personalized AI assistants for your specific creative needs
  • Using Deep Research to turn ideas into actionable creative strategies

Advanced Creative Techniques:

  • Combining visuals and words for better creative solutions
  • How to spot and prevent AI mistakes (hallucinations)
  • Building no-code apps and prototypes for your creative business
  • Integrating Gemini into your existing Google workspace workflow

You don't need to understand how AI works or have any programming knowledge. If you can use Gmail or Google Docs, you can use Gemini. This class is designed specifically for non-technical creatives who want powerful results without complexity.

Course Structure:

  • 4+ hours of step-by-step video tutorials
  • Real creative projects you'll build alongside me
  • Downloadable resources including prompting templates and guides
  • Community access to connect with fellow creatives and get support
  • Certificate of completion to showcase your new AI skills

Who Is This For?

Freelancers & Solopreneurs:

  • Content creators who work with multiple media formats (text, images, video)
  • Graphic designers needing AI assistance with concept development
  • Photographers wanting AI feedback and creative direction
  • Coaches and consultants creating educational materials

Marketing & Business Creatives:

  • Social media managers creating visual and written content
  • Email marketers crafting multimedia campaigns
  • Brand strategists developing comprehensive creative strategies
  • Small business owners managing content across platforms

Creative Professionals:

  • Writers combining visual research with content creation
  • Course creators developing multimedia learning materials
  • Presentation designers working across formats
  • Anyone juggling multiple creative tools and wanting one unified AI partner

Why Now?

AI is transforming creative work, but it's not replacing creatives – it's empowering them. The creatives who learn to collaborate with AI today will have a massive advantage tomorrow. This class gives you that edge.

Meet Your Teacher

Teacher Profile Image

Anna Kolenkina

Product Builder, Entrepreneur

Teacher

I help professionals and fresh graduates to learn digital skills, start new careers and advance in their roles.

I started my journey in the IT industry and software product management 15 years back from being an IT and management consultant and then transitioning to a full-on startup Product Manager and Product Director. I've built products from scratch for different industries - commodities trading, logistics, natural language processing, and e-learning - and also for different markets, from Europe to Asia. I have a Master's Degree in Applied Informatics and an MBA from the National University of Singapore.

Before joining online education, I shared my expertise and knowledge with only a limited number of people - my co-workers and mentees. With Skillshare, I'd like to s... See full profile

Level: Beginner

Class Ratings

Expectations Met?
    Exceeded!
  • 0%
  • Yes
  • 0%
  • Somewhat
  • 0%
  • Not really
  • 0%

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.

Transcripts

1. Welcome to the course on Google Gemini AI!: Everyone, and welcome to the course on Google Gemini. Did you know that Google Gemini has officially surpassed 750 million monthly active users? That's nearly three quarters of 1 billion people. To put that in perspective, Gemini's growth is currently outpacing almost every other AI chatbot on the market, closing the gap with ChatGPT faster than anyone predicted. But it's not just about the numbers because Gemini is built by Google. It is now the most integrated EI in the world. It lives inside your Gmail, your Google Docs, your Chrome browser, and your mobile phone. This represents the biggest shift in how we work and create since the invention of the Internet. We are moving toward a world where EI is not just to use it. It is a collaborator that is already where you work. My name is Anna and I'll be your instructor for this course. Online instructor with my other courses available here on the platform, focusing on product management and generative AI. By joining this course, you will get access to over 4 hours of HDVDo content, step by step tutorials and activities highlighting real world, practical applications of Gemini tools, PDF summaries for reviewing the key insights from the course and much, much more. We'll kick off by learning what Gemini is capable of, how to communicate with it and structure your requests, and how to make Gemini work best for you. From there, we will go through hands on scenarios using Gemini to brainstorm ideas and get professional feedback. Building your own personalized EI systems for specific tasks and generating high quality visuals. We will also cover advanced techniques like deep research for turning complex tasks into detailed reports and building fully functional apps just by describing what you want. No coding required. And we will make sure you know how to spot and prevent incorrect responses from AI, so your work is always accurate. And yes, you don't need any technical background or prior knowledge of AI to get started with the course. So let's begin Ilsa in the next video. 2. What Is Gemini? Understanding Google’s AI Ecosystem: Everyone, and welcome to the first course lecture. Think back to every science fiction movie you have ever seen. There is always that one character, an Assistant that doesn't just wait for a command, but actually understands the hero's world. It anticipates problems before they happen and acts as a true partner. For years, this was just fiction. But with Gemini, we are getting closer and closer to a future where that kind of partnership is becoming a reality. So what is Gemini? I like to think of it as three layers of a house, the foundation, the brain. These are the Gemini models themselves built by Google's Research Lab Deep Mind. In this course, we will be using the latest generation of Gemini models. This includes high level reasoning models for complex logic, advanced image generation tools for photorealistic visuals and next generation video models that can generate high definition scenes with sound. These models are natively multimodal meaning they don't just process text. They see here and think across every medium at once, just like we do. Coming back to the house analogy, the second level is the living space, the assistant. This is the home base we will be spending most of our time in the app on your phone and the website at gemini.google.com. It's a creative space where you can chat codes and use tools like Jams to customize how the EI behaves. And finally, the third layer is the infrastructure. This is Gemini living inside Gmail, Google Docs, and search it's the EI overview that summarizes your search results or the help me write button that drafts your emails. In this course, our focus is on that middle layer, that geminiEIsistet. The Google's vision regarding it is centered across the three piece, personal, proactive, powerful. Let's explore what this means. First, it is personal. Most AI models are generalists. They know a lot about the world, but quite little about you. Gemini is designed to be your personal extension. With your permission, it can connect to your personal context, your emails, your files, and your history to provide help that is uniquely relevant to your life. Second, it is proactive. Today, most AI is reactive. You ask it answers. The future of Gemini is about seeing what's coming. If you have a big client presentation on Friday, Gemini should not just remind you it is coming. It should look at your calendar a week before and say, I noticed your strategy meeting with company A is on Friday, based on the proposal in your drive and the latest email threat with their team. Here is the preparation brief and three questions you will likely face. Third, it is powerful. With the latest advancements in Gemini, we are moving beyond simple text generation into thinking things into existence, whether you are building an entire website from a single prompt or creating cinematic video for a marketing campaign. The power that used to require a whole team of specialists is now at your fingertips. But having all of this power doesn't mean I is in charge. It is important to remember that even when Gemini is being proactive, it is always taking your lead. It doesn't have its own secret agenda or set of beliefs. It is designed to follow the orders. You give it through your instructions and preferences. So whether it is acting as your researcher, your coder or your creative collaborator, you are always in the driver's seat. Productivity is not the EI doing its own thing. It is the EI anticipating what you need because you have already defined the goal. Now that we have explored the vision and the architecture, it is time to move from theory to practice. In the next lecture, we will take a closer look at the different specialized models for reasoning, images and video. And I will also show you how to set up your account with Gemini. I'll see you there. 3. Meet the Gemini Model Family: The last lecture, we talked about Gemini as three layered house, the brain, the assistant and the integrated engine. Now let's go one level deeper into that brain. Most older EI models were trained on text first and then had other capabilities layered on top. Gemini was built differently from the ground up to be multimodal. This means it does not just read a description of a video, I actually understands the video, the audio, the images, and the text, all at the same time. Whether you are uploading 1,000 page PDF, an hour long video or a massive code base, Gemini processes it all in one unified space. It's not secretly translating images into text behind the scenes, it's seeing them directly. When you open Gemini at geminiggle.com, you will notice a model selector. Think of these as different modes, each routing you to a different underlying model that Google has optimized for a specific type of task. The full Google Model family is vast, but for everyday use, these are the ones you will reach out for most. Before we walk through them, a quick note on what a model actually is Think of it like a specialist, you are hiring for a job. Each model has been trained differently, fed different kinds of data, optimized for different strengths. When you pick a mode in Gemini, you're essentially choosing which specialist to hand your task to. Fast is our sprinter quick and conversational. This is the specialist you reach for when you need an instant answer. A fast summary or help drafting a quick message. It's optimized for speed and handles a high volume of requests. Just don't bring it in for anything that requires deep multi step reasoning. Thinking is our strategist. This specialist pauses before responding, mapping out its logic before giving you an answer. If you have a complex problem, multi step plan to work through or a nuanced question where a quick answer might get it wrong. This is the one that thinks before it speaks. Pro is our expert. You bring it in when the task is complex, deep research, analyzing a large document, advanced writing that needs to get the tone exactly right. Pro uses the most capable underlying model in the family, which means it can hold more information at once and pick up more nuances the other models might miss. The trade off is that it's slower and has lower daily usage limits. So save it for the tasks that actually needed. These three fast thinking and pro are Gemini language models. They are what powers the conversation. But Gemini family doesn't stop there. It also includes dedicated models for image and video generation, and you trigger them simply by using the generate image or generate video commands directly in your chat or in Gemini interface. When you do, Gemini quietly hands the task to the right specialist behind the scenes, and we'll meet those specialists later in the course. Now, once we have figured out what models we are going to work with, let me walk you through how to get access to Gemini. 4. Setting Up Gemini and Your First Chat: Go to gemini dot Google forward slash subscriptions to see the current plans and just heads up pricing and availability do vary by country. So what you see on your screen might look a little different from what I'm showing here. The free plan gives you everyday access to Gemini. It's a good starting point and requires nothing more than Google account. Google AI plus gives you more access to the most capable models and features, including enhanced image and video generation, and you would get access to Gemini in Gmail, as well as Google MIT. Google AI Pro steps that up further with higher usage limits Gemini inside your Gmail, Google MIT Docs, as well as slides and two terabyte of Cloud storage. And finally, Google AI ultra is the top tier. It gives you highest usage limits, plus exclusive early access to new features from Google. My recommendation here would be to go ahead with Google AI as long as it offers a free trial, which means you can follow along with everything I demonstrate here in the course at no cost for the first month. And after that free trial month, you can decide if you want to continue with your membership or you would downgrade to the Google plus or return to the free membership. To get started, select your membership plan, click on Get Started. Next, you need to provide a payment method for the trial, but you won't be charged if you cancel or downgrade before the month is up. Once you logged in, this is what you see in the top right corner, you see your membership plan. Pro in case if you decide to subscribe for AI pro membership or plus if you decide to go ahead with that plan in the center of the screen is your main chat input below the input bar, you will notice a row of quick start buttons. These are just shortcuts to get you started quickly. You will also see a mode selector. It currently shows fast. This is the model selector we just talked about. Click it to switch between fast, thinking or pro depending on what you need. On the left side, clicking the menu icon, opens your sidebar where you'll find your chat history. You can also start a new chat from here. Let's try to do this. I keep it on fast mode for this chat, since I'm going to ask a straightforward question. I'm starting the course on Gemini based on today's date. What are the three most recent major updates Google has released for the Gemini ecosystem? I request Gemini to search the web to verify and summarize them for me. Let's hit Submit. Notice that Gemini does not just answer from memory. It goes out and searches the web in real time and then brings me the results relevant for today when I record this tutorial. Here are the three most recent changes that Gemini has introduced in the past month. And, of course, we are going to talk about them here in the course. In the next section, we take everything we've just set up here and put it to work, starting with how to write a great prompt. I'll see you there. 5. Prompting Gemini for Better Results: Section Intro: Welcome to the new section on prompt engineering. This is the part of the course where you learn a skill that makes every AI tool more useful how to write prompts that consistently give you great results. We will start with the definitions what a prompt is, what prompting means, and how prompt engineering fits into the bigger picture. Then we'll look at two modes. There is no prompting in chat and production prompting when you design prompts to be reused. After that, I'll walk you through a simple prompting formula. You can use for almost anything. You will also practice iterative prompting, how to build on earlier responses and improve the output step by step. You will learn how to guide with examples, how to request the exact output format you want, and how to work with files and attachments. And of course, we will use multimodal prompting. Man and your prompt can include text plus documents, screenshot images and links. By the end of this section, you will feel confident using these prompting skills in real tasks for work or personal projects. Let's begin 6. What Is a Prompt? Prompting, Prompt Engineering, Personal vs. Production Prompts: Everyone. Think of the last time you asked someone a question. The way you phrased that question likely influenced the answer you received. That's exactly what we are seeing today in the world of AI. We'll start by breaking down three key terms that are essential to communicating with AI systems. What exactly is a prompt? What do we mean by prompting? And how does prompt engineering bring it all together? We'll also explore that distinction between chat and enterprise prompting. Let's get started. A prompt is the input you give an AI, your instruction, what you want, and the context you provide. Text, files, images, links, examples or data. Think of it as the what that drives the EIs response. Prompting is the act of writing these prompts. It's the general activity of interacting with and giving instructions to AI models. This is the process of communicating with the model. Prompt engineering is more specialized and systematic approach to creating and refining prompts. It involves understanding how the model reasons, testing and iterating on instructions and considering He cases. Think of it like cooking. A prompt is like a single recipe. Promptin is like cooking in general, and prompt engineering is like being a professional chef who systematically develops and tests recipes while considering ingredients, equipment, user preferences, and so on. Now, there are two main types of prompting you need to be aware of personal prompting and production or enterprise prompting. Personal prompting is what most people do in a chat. You write request, the AI replies, and you can keep refining it through conversation. It's flexible and informal. If your first message isn't perfect, no big deal. You just follow up, clarify and iterate. For example, asking N AI to help you write an email brainstorm ideas or summarize a document in the chat interface. That's personal prompting. Production or enterprise prompting, on the other hand, is when you design prompts to be reused by you, by a team or inside a product or workflow. The goal is not just a good answer once, but consistent results across many runs and many inputs. For instance, imagine customer support assistant on a company's website. It needs to answer thousands of customer questions reliably, including MC inputs like typos unclear requests or missing information. In this setting, prompts have to be more structured, more predictable, and more reliable. This is why production prompts usually include clear rules, stricter output format, and more guard rails because they are meant to work repeatedly, not just once. In other words, personal prompting or chat prompting helps you get great results first and production prompting helps you get reliable results repeatedly. Why do we talk so much about this distinction between personal prompting and production prompting? Because the way you write and refine prompts changes depending on the setting. If you search for extra materials on prompting, you will often find advice that's designed for production use, prompts that need to work reliably across many users, many inputs, and lots of edge cases. That's super useful when you are building repeatable workflows or integrating EI into a product. But if your main use case is just using an AI in a chat to get help at the moment, you don't need to overcomplicate it so keep this distinction in mind. In this course, we'll focus mostly on personal prompting in a chat interface. Al right now that we are on the same page with the terminology, let's dive into the practical side of personal prompting. Allca in the next lecture. 7. How to Talk to Google Gemini AI The Building Blocks of an Eective Prompt: Everyone. Welcome to our first lecture on chat prompting. Here, you will learn how to approach creating and refining prompts that can be used in the chat interface. Let's get started. When chatting with a friend, you don't use rigid templates or formal structures. You have a natural flowing conversation. The same principle applies to chat prompting with AI models. However, there are times when a bit of structure can help us get better results and make one prompt more effective than another. So let's cover the key ingredients of an effective prompt. The central part of every prompt is the core intent or task. This can take the form of instructions, such as write a five paragraph email to introduce new productivity app to small business owners, focusing on its time saving features. Think of instructions as the task you want the model to perform. Another form the intent can take is a question such as, what steps should I follow to create a compelling Linkin profile? Or how do I structure a business plan for a startup idea? When writing a task, your goal is to be clear and specific about what you would like to achieve. Writing something like help me with presentation won't be enough to get a high quality document that you can confidently present to your boss, colleagues, or investors. As the rule of thumb, remember that anyone without specific knowledge of your subject should be able to understand your prompt and execute on it. If they would be confused about how to follow your instructions, the EI system will be confused as well. Don't assume it has any contextual information about your task, such as how the results will be used, who the intended audience is. What successful task completion looks like or a list of points you won't cover it. You need to provide these context or task details yourself. For example, if you want to create a presentation, include information about the number of slides, the purpose of the presentation, the key topics to be covered. Here is an example of a well crafted prompt. Create a seven slide presentation on the topic of personal branding. Include what it is, wide meters, key components, and steps to develop your brand. Or another example, explain how to write a compelling email in five easy steps. The instructions should cover crafting and engaging subject line, structuring the email clearly and using the professional tone. Make the process simple enough for anyone to follow even without prior experience in formal writing. You can provide context, not just for the task itself, but also for the tone you would like to use. For instance, use a conversational tone that balances professionalism with accessibility. You can also specify rules or constraints the EI system should follow. For instance, in the email writing guide prompt that we just covered, you might add When your prompt involves factual claims like statistics, current events, product features, legal or medical info or anything where accuracy really matters, there are two extra ingredients that can significantly improve the result. The first one is reality check, also called grounding. This is when you are telling the EI. Don't just sound confident, be verifiable. So you can add a rule like if you make factual claims, cite sources, and tell me what you are unsure about, the second ingredient is reasonsm. A lot of topics change quickly tools, pricing features, policies, best practices. So it helps to tell the EI what time window to use. For example, use sources from the last 12 months unless all the resources are required. Here is what it looks like when you add both to a prompt. These two additions are especially helpful when you are using AI for research or decision making, not just writing, because they push the response to be clear about what's proven, what's current, and what's uncertain. Another way to enhance your prompt is to assign a specific role when performing a task. This is also known as role prompting. Role playing helps AI models adopt the nuances of specific perspectives, improving the relevance and quality of their responses. For instance, act as a seasoned executive assistant with over 15 years of experience managing high level business correspondence or pretend to be a professional writer, turned email writing consultant. You can take role prompting a step further by providing audience context in addition to the role. For example, Notice how the EI adapts the examples for dos and don'ts to make them relatable for technical professionals. It's pretty amazing. And if you are feeling overwhelmed by the idea of crafting such detailed prompt, don't worry. The beauty of working in a chat interface is that you don't need to design a perfectly thought out prompt to begin the conversation. You can start with a broad question or task and refine it through dialogue with the EI model. This iterative approach allows you to clarify your needs and improve the responses you receive over time. We'll talk more about the interactive prompting in our next video, and for now, let's sum up what we've talked about in this lecture. 8. Building on Gemini’s Responses: Iterative Prompting: Everyone, welcome back. If after watching the previous lecture, you feel like creating a good prompt is an arduous task and that you need to turn into a prompt engineering to succeed in this job. Here is a secret the experts use. Think of prompting as a conversation or a multi step process, not a one time question, just like you might clarify directions in a new city with a local, you can refine your prompts based on the EI responses. Let's walk through a real world example of iterative prompting to see how it works. Let's say we would like the EI to help us create a business proposal for a mobile dog grooming service. Step one, the initial prompt may be quite broad like create an outline for a business proposal for a mobile dog grooming service. In the second step, we narrow down or refine our initial request by saying something like, take the outline, you create and expand the market analysis section, focus on demographic data and competition in urban areas. On the third step, we ask for specific details. For instance, now develop the financial projections section, include start up costs, monthly operating expenses, and revenue forecasts for the first year. We can repeat step two and step three several times depending on how satisfied we are with the responses. Sometimes iterative prompting is even more powerful when you are working on something that needs to be accurate, not just well written. For example, step one, start broad. Give me an overview of the market for mobile dog grooming in urban areas. Step two, ask for assumptions and evidence. List the key assumptions you are making. If you mention facts or numbers, tell me where they come from and flag anything you are not sure about. Step three, cross check. Now sanity check your own answer. What parts are most likely to be wrong or outdated? What would you verify first? This way, you are not just polishing the wording, you are improving the reliability of the content as you go. Please note that just as a skilled project manager builds upon previous discussions and decisions, chat based AI keeps context through your conversation. That means you can refer back to earlier parts of the chat and build on them instead of repeating everything from scratch. So you might ask something like based on the marketing strategy we discussed earlier in this chat, let's build on it, but focus on suburban families in areas with limited grooming options. Of course, if you feel that your conversation is not going in the right direction, you always have the option to start over and reframe the first question. The final step of the iterative process usually involves asking the AI to polish the response. I alternatively, you can ask to provide feedback on the entire piece of content. In this case, the business proposal, focusing on how it can be further improved. Then you can include those changes in the final version of the document. This step by step approach allows you to review and refine the output at each stage, make adjustments based on intermediate results, maintain control over the final product, and build complexity gradually. Think of it like sculpting. You start with the basic shape, and then you gradually refine the details until you achieve exactly what you want. And that's it for the video. Let's sum up the key points that we've just covered. 9. Making Gemini Truly Yours: Personalization: Hi, everyone, and welcome back. Sometimes when you are talking to an AI assistant, it feels like you're starting from scratch every single time. You can write the perfect prompt and still get a generic answer because Gemini has no idea who you are and how you work in this video, we are going to look at how to make Gemini work the way you work. There are three levels of personalization you can use to customize your experience. Level one is basic personalized instructions. You tell Gemini how you wanted to behave every single time. Always be professional, always format answers as bullet points. Whatever works for you, it saves you from repeating yourself in every single prompt. Level two is intermediate chat memory. This is where Gemini starts remembering facts and preferences from your previous conversations, so you can pick up exactly where you left off. And level three is the most advanced personal intelligence. This lets Gemini connect the dots across your entire Google ecosystem, your GML, your photos, YouTube, even your search history. Imagine instead of spending hours playing a weekend trip. You just say Gemini plan a trip for this Saturday based on my favorite hobby. Personal Intelligence finds your recent hiking gear purchase in Jimel pulls your favorite trail photos from Google Photos, checks your YouTube watch history for local guides, and suggests a specific trail, knowing exactly what difficulty level suits you. One thing worth noting before we begin, personal intelligence is still rolling out, so we'll be focusing on the first two levels today. Also, these personalization features are part of the Google AI Pro subscription. If you haven't upgraded yet, check out our lecture where I showed you how to get access free of charge. Let's get into the demo. We are starting by heading over to the Gemini web app at gemini.google.com. I have already logged in into my P account. Next, look at the bottom left of your screen and click on Settings gear icon. From this menu, select personal context. The first setting is called your best hats with Gemini. When it's turned on, like on my screen here, Gemini learns from your history to understand you better over time. When I just activated this setting, here is what Gemini suggested for me. It correctly summarized all the things that I've been working on recently. And by the way, if you ever want to have a private conversation that is not stored in the chat history, you can use temporary chat. You see that it is available here on the top left side of the screen. So let's click on it. We see the same interface that you already familiar with. Let me ask something. I'm using a fast model as this is just a very quick question. So here are the suggestions. They are pretty good. And since we were tasting the temporary chat, let me look at my chat history. You see that we don't have anything related to a flat white here. Let me try to refresh the page to make sure that this temporary chat won't be saved into the chat history. Yes, all good. But at the same time, we also lost that conversation as well. Alright, let's get back to the settings, personal context. The second Google here is called Your Instructions for Gemini. We see that they are active by default as well to add a new instruction, a click on AD. And here we can include any information regarding your behavior, personal communication style, any preferences that you want to share with Gemini. So here is my prompt. So I'd like to divide the instructions into two parts. First, I tell the EI what I do. You see here that I shared my role as an educator as well as as a consultant, providing a little bit of context on what I do in both of these roles. And second, I explained how I like to work. Let's save those instructions by clicking on Submit button. All good. And finally, to see everything Gemini has stored, return to settings, and from here, click on Activity. This is the list of all the activities that you recently had with Gemini app. You can manually delete specific chats in case if you don't need them for certain reasons, and you can also set up out a delete schedule. So your data is cleared out every few months. For instance, I can choose a duration here. I live 18 months, which is a reasonable period of time to get rid of the old conversations, and I click next. Perfect. And that's it for this tutorial. Now you know how to customize gemini to work exactly the way you want. And Alca in the next video. 10. How to Share Files and Other Content with Google Gemini AI: Hi, everyone, and welcome back. In the previous lectures on prompt engineering, we talked a lot about how to frame your instructions and what information to include. But apart from the instructions, sometimes you also need to provide the EI with source materials like documents, spreadsheets, screenshots or PDFs, so it can review and analyze them. Let's see how it works. You can provide information from documents and images to Gemini in two main ways by pasting the text directly into the chat or by attaching the entire file to the conversation. So the first option pasting the text works well when you only need help with a specific fragment of your document. For example, here is my resume, and I want feedback on just one section of the document, so I can just copy it, paste it into the chat, and then give the instructions to Gemini. So I said that this is a fragment from my resume, and I asked Gemini if these skills are relevant for a head of product position for a Fintech startup. And here is the response. But oftentimes you want Gemini to work with the entire document, like a long PDF or a complex spreadsheet. Gemini can handle almost any common file type from Word document to CSV files, photos, and even videos. To attach the file, click on the plus icon on the left hand side of the chat bar. You can choose a file from your device, from your Google Drive, your Google Photos. So let's take one example. I need some ideas for what to cook for dinner. What I'm going to do, I will upload several photos of ingredients I have in my fridge. These are the ingredients that I have. I'll ask Gemini, what are the three simple dinner recipes I can make in under 20 minutes. And here are the recommendations that Gemini provided. You see that it successfully identified the ingredients based on the pictures. Here we see Gemini's ability to recognize objects and apply creative Frisonin. Next, let's try document. Let's say you have received Complex utility bill document. So you can upload this PDF to Gemini and ask if it can summarize the main charges. Let's try this out. I will return to the same chat, click on the plus icon and then choose files from my local Drive. And here is my prompt. Let's use fast model here because it should be pretty straightforward request, and let's see what reply we're going to get. Yeah, pretty great correct summary of the charges, as well as my data usage. All good here. Alright, let's try something else and submit different types of documents to Gemini to see if it can really work with different files. I have a PDF with my flight itinerary for my upcoming trip to Phuket. And here I have a travel guide with some information regarding the tours. That I can do there while I'm in Phuket. All right. This demo takes quite a while. So what I'm going to do, I'm going to stop this response. I'll copy this prompt and open a new chat. I included the same prompt, and here, let's change to thinking. Because I have quite a complex PDF document here. I also have visuals with concrete dates that Gemini needs to analyze and compare with the dates in this document. So perhaps it would be better to switch to a smarter model. Let's try this out. Now we got the result almost immediately. So let's read what Gemini tells us. It recognizes all the information in the documents that I provided, and it also figured out nice recommendation on what I can do just after I arrived into my destination. This is where we see Gemini acting as our personal cardinator connecting dots across different file types. And please remember that while Gemini can read and analyze these files to generate summaries, tables or recommendations, it won't actually change the original file itself. All right, moving on with our demo, let's say that I have an audio file that I want Gemini to analyze, as always clicking on Plus button. Then I select in my audio file, and here is my prompt. Can you summarize the key points of this audio? I'm going to continue using thinking mode here because this is more complex task rather than just asking a quick question. And here is the summary. This is the correct summary provided by Gemini. I can confirm this as this is the recording that I prepared myself for my other course. Great job Gemini. And let me also demonstrate how it can work with videos. I have this link to Google keynote presentation. And since right now I'm working on Gemini course, I want Gemini help me find all the moments when speakers talk about Gemini app, new functionality. Let's hit Enter and look what Gemini will suggest. Here is the detailed analysis of this video. And what I really like here is that it included the time codes. For example, we see here that Gemini mentioned about personal context, and it included this specific time code where one of the speakers were talking about this functionality. So if I would like to review this conversation, I can just click on that time code. I will be redirected to this part of the presentation. And that's it for this lecture. Let's briefly sum up what we've learned here. Most modern AI models accept common file formats, including PDFs, Word documents, Excel files, CSVs, images and text files. Fils can be uploaded using an upload button or attachment icon on the chat interface. You need to give clear instructions about what you want the AI to do with the files. Being specific with your requests leads to better results. You can upload multiple files and ask the AI model to compare them or analyze them together. The AI usually won't edit your file directly, but it can generate improved content. You can copy back into your document. All right, and I'll see you in the next lecture. 11. Using Examples in Your Prompts: Everyone, and welcome back to the new lecture where we continue talking about how to communicate with EI systems and what to include in your prompt. So far, we've covered several components that can be included in a prompt, a task or what you would like to achieve, followed by specific details or context and rules necessary to perform the task or answer a question. Next is role context, a specific role that the EI will be playing when performing a task. Optionally, you can also introduce the intended audience for your task. Lastly, we mentioned that you can share additional content by attaching documents to your conversation or by including the text as input data directly in the chat and regarding the order of components in your prompt. The ordering matters for some elements, but not for others. For instance, it is recommended to include the RL context earlier in the prompt, while input data may not be necessary depending on the task, and its ordering is also flexible. But in general, if you stick to the ordering shown in the course presentation slides, it will be a great start to an effective prompt. Okay, let's introduce another prompting element. Examples. Examples also known as shots act as demonstrations that guide the generative AI model on what kind of output you are looking for, including the answer format and what you want to avoid. Perhaps you have heard of terms like one shot or few shot prompting. These refer to using one or several examples in your prompt description. For chat prompting, examples typically demonstrate tone. For instance, formal versus informal, serious, versus schedule, empathetic versus matter of fact, and style such as sentence length, format patterns, bullet points, versus paragraphs, technical detail level, basic versus advanced terminology, and so on. Let's go over some concrete examples. First, I will ask Gemini for a simple email without giving any example. So here is my prompt. For this demo, I'm going to be using Fest model. Let's run it. This email is fine, but it is also pretty generic. Now let's make it much more specific by showing an example of the tone and structure that we want. So here is my other prompt. So I have the same instruction at the beginning, and then I provided an example as a style reference that is mentioned the tone, sentence length, and the structure that I would like Gemini to use. Let's run this second version. Now, if we compare that new response with the initial version, we see that it feels more human. The sentences are shorter and the structure is closer to what we showed in the example. And while we are here on the email example, let me quickly show you what Gemini can do with that email next. It turned out that you don't need to copy and paste the email into your inbox. If you look right below the response, you will see more icon. Let's click on it. And here you will see draft in Gmail option. If you click it, Gemini will open a new Window and place this exact text into a real Gmail draft, which you can further edit and eventually send it to your recipient. So let's try to do this. Gemini is drafting an email. Let's take a look. I'll click on Open Gmail. We see that it correctly picked up on the email subject. This is the exact text that we saw in the chat. Let's try something a bit more advanced. So far, we have used examples to fix the tone and style of response. But you can also use examples to set a mental frame. Mental frame does not just change the words Gemini uses. It changes the logic it uses to solve your problem. So instead of writing a long list of rules like be practical or don't be too academic, you can simply show Gemini a shot or example of the perspective you wanted to take. So let's go step by step. First of all, I'll open a new chat. And here, I would like to switch to a pro model. And just a heads up if you are on a free plan, you will still have access to the pro model. You see, I'm using my free account, and I still can select this model. But your usage limits may be lower than on paid plans. So I'm coming back to my account that I use for this demo. First, let's see how Gemini handles request with no framing at all. I'll ask about a popular topic personal branding. I want to learn about personal branding. How should I start? Let's hit Enter. If we are interested, we can look at Gemini's thinking process. You see those are the steps that it took to give us this recommendation. Everything is correct, but it's very theoretical. It feels like a long to do list before you have even started. Now let's use a one shot example to shift the logic to a hands on mental frame. I want Gemini to act like a coach who values immediate small wins over big theories. So here is my new prompt, apart from my original instruction. I also included an example of hands on logic. Let's enter and see what Jimmy and I would suggest here. See that? Because I labeled the logic as hands on and showed Gemini the hello world example, it isn't giving me a reading list anymore. It literally tells me the hands on recommendations, things that I can do right now. So now, Gemini is mirroring the way of thinking, not just the tone and style, like in our first example. Alright. And let's take one more quick example. This is especially useful when you are doing research. Let's say you want Gemini not only to answer the question, but also to show where the information comes from, you can include an example that demonstrates the format you want. For instance, you can write a full prompt like this. And what's important, I also provided rules for Gemini. For the cases, it cannot find a reliable source for a claim. Let's run it. This kind of example makes the output much more structured and easier to trust because you are showing the exact format, you want for evidence. All right. Apart from one or few shot prompting, there is another technique using interactive examples. Interactive examples differ from regular examples in that they create a dynamic, back and forth learning experience where each example builds upon previous understanding or feedback, while regular examples are study demonstrations. Interactive examples involve active participation and iteration. Here is how interactive examples work. You provide an initial version example. The AI gives specific feedback and suggestions. You create an improved version based on that feedback. The AI analyzes the improvements and suggests further refinements. You iterate again if needed. The key is that each iteration builds on the feedback from the previous version, creating a collaborative improvement process. Okay, great. And that's it for this video, let's quickly cover what we've learned here. And I'll see you in the next video where we will cover yet another prompting technique. 12. Specifying Output Format in Gemini: Every one. We're almost done covering the key ingredients of a good prompt. There is yet another component you might find worth including in your prompt information on what format you want the AI's response to take. Let's talk about this now. Remember that in our first lecture on prompting, we said it's important to include information regarding the basic outline or list of points. You won't cover it as context for your task. It turns out you can also specify your formatting preferences for the response, which can help organize information more effectively. This information may not be necessary depending on the task, but if you include it, edging it towards the end of your prompt is better than at the beginning. Let's go through some examples of formatting you can request. You can ask for specific formatting styles. For example, if you need a business report, you might say, Please format this as a professional report with headers, subheaders and short clear paragraphs. AI will structure the information accordingly, making it ready for professional use. When working with data or analysis, you can request tables or specific layouts. Instead of a wall of text, you might say, present the comparison of these three products in a clear table format with features in the left column. This makes complex information easier to understand and use. And here are a few more format and patterns that are especially useful for research or decision making. Comparison table. Give me a comparison table of these options with columns for key features, pros, cons and best four. Source mapping, list the sources you used and briefly explain what each source supports in your answer. Facts versus interpretations. Separate your response into two sections, facts, verifiable statements, and interpretations, your reasoning, assumptions or recommendations. You can request a specific markdown formatting. The AI can use bold text, italics, headers, and bullet points as needed. Just ask for key points in bold or important terms in italics, and it will format the response as you requested. You can organize your tips using bullet points for claridm main tip, supporting detail, and another detail. Lastly, remember that you can always ask to reformat the response if the first version isn't quite what you needed. It's perfectly fine to say, Could you reorganize this information as a numbered list? Or please break down this into shorter paragraphs for weather readability. Okay, and that's it for this brief lecture. Let's recap the key points we've just covered. Always specify your desired format upfront to get the most useful response. You can request specific structures like reports, tables, or lists. Comparison tables are great for decision making. You can ask for a structured table with pros, cons and best form. For research tasks, you can request sources and even separate facts from interpretations for clarity. A AI model can adapt its writing style to match your needs from casual to professional. Markdown formatting helps highlight important information. You can ask for reformatting if the first response isn't quite right. Clear formatting instructions lead to more useful and actionable responses. And that's it for this video, and as always ALCa in the next one. 13. Follow-Along: Choosing the Right Model and Brainstorming with Gemini: Everyone. Up until now, we've been exploring Brampton in isolated pieces. It is time to bring those pieces together into a complete end to end workflow. And along the way, I'll show you some productivity i packs available in Gemini, like how to double check responses for accuracy and export them directly to Google Docs. We are going to explore two scenarios that are by far, one of my favorites when it comes to working with Gemini. Those are brainstorming and getting feedback. But before we start with our first scenario, let's talk a bit on how to choose your AI model. You have seen me switching between them throughout this section demos, and you may be wondering, so what model should you choose? And when your choice depends on your subscription plan. If you are a paid user, I suggest you make thinking your default choice. Its reasoning power handles almost everything switch to fast, only for low stakes tasks like quick grammar checks or quick questions and switch to pro when you are dealing with long documents, deep research or anything that requires sustained focus across a large amount of content that's where it earns its place. I've been working with Gemini for quite some time now, and this is the best workflow I've come up with after a lot of experimenting. If you are a free user, keep fast as your default because the more advanced models have limited daily quotas on the free plan, so you need to be strategic and save those credits for when you really need them. Switch to thinking when a task requires deep logic or multi step reasoning and switch to pro when you are working with large content or need that high level of nuance and depth. Now, with that in mind, let's jump into our first follow along scenario of brainstorming process. I want you to imagine you are the marketing manager for a very ambitious, imaginative sleep tech startup called Snooze AI systems. We are about to launch the Snooze One, the world's first autopilot for your dreams. As you can see from our internal briefing, This mattress has everything from climate zoning technology, dream sync analytics, and the vibe sing story engine. Need to build a social media launch campaign that makes smart sleep sound essential. So let's open Gemini to start the demo. I'm selecting the thinking model because we need a creative strategist who can handle nuances. And let's begin our brainstorming. Here is the first prompt that I'm going to use. You see that I first introduce the role that I want Gemini to take. Then I included a bit of context in terms of what we are about to launch. Our target audience. And then I gave a task to Gemini to suggest tent content themes for our 30 day launch window. And let me also include the PDF file that you just saw to provide even more context to Gemini. And let's hit Enter so here are the ten themes that Gemini suggested. I like this theme the best. So let's ask Gemini to dig deeper into this specific theme. So here is my second prompt. And let me actually specify that I want ten cost ideas. Let's hit Enter. Great suggestions. And in case if you don't like some of them, you can always ask Gemini to suggest you ten other ideas. So let's do this. I notice when you do this several times, you can come up with really great suggestions. So please try do this and don't just use the first list of ideas that Gemini provides. Let's do one more iteration. I gave some feedback to Gemini regarding the list of ideas that it provided. Nice. I see that we can work with some of the ideas further. But before we start doing the actual scripts for our post or videos, let me ask Gemini another question. Before we proceed next, I want to know what are the current social media content trends for tech product launches, like in our case. Here are the trends. You see that it correctly picked up the current year. And here is my next prompt. I'm going to ask Gemini to suggest ten short form video script IDs for the vibe check Storytelling series. Let's say that I would like Instagram to be our platform of choice. And notice that I also included this PDF with the viral hook ideas that I want Gemini to use when preparing the response. This is what is called grounding. So I'm anchoring the EIs response in our specific brand style so the scripts don't feel generic. Next, I also provided the structure for the script and that's it. Let's hit Enter. Alright, we see that Gemini included some placeholders, and I really want to have a full script ready for the teleprompter so that we can just record the video. So when brainstorming, I start with asking Gemini to explore a wide range of ideas, then I might iterate on those ideas several times. And then I usually select an idea which I like and ask Gemini to narrow down on that topic and let's say, create a post or a story related to that idea of my choice. Alright, our script is ready. I can continue talking with Gemini and ask to adjust that script or take another idea to expand. But let's say I'm fine with this one, I can actually export this script directly into the Google Doc. You see three dots I can hear. If I click on it, I can choose export two dogs, and let's see what happens. Gemini tells me that the new document is created. Let's click Open. Very nice. We even have a table with time codes and exact text that we need to say very cool. And you also see here Geminis jests to export this table into sheets. Let's try to do this as well. Personally, I like export into Google Docs for this scenario. I think it works better for this type of document, but you got the idea. That's it for this tutorial and Alca in the next one. 14. Follow-Along: Getting Feedback with Google Gemini AI: Everyone. Welcome to the second follow along video. Let's explore getting feedback from Gemini. This use case is one of the first I started with. When using EI assistant. I used to submit my documents like presentations, reports, resume, and ask EI for feedback so that I can get a second opinion on it and make improvements. But Gemini moved this process to the entirely new level as it's natively multimodal, meaning it can process not only texts but other types of content like videos. You can now get personalized feedback on how you actually perform, not just on what you wrote. The reason Gemini is so dominant here is its massive context window. That's the first time we are using this term. So let's introduce it. The context window is essentially the IIS short term memory. It is the amount of data the model can hold in its brain at once to understand the request. While other models might struggle to remember more than a few minutes of footage, Gemini can process up to 1 million tokens. To give you an idea that's about an hour of video or thousands of pages of text in a single go. This massive memory is exactly why we see so many users switching to Gemini for video analysis. But don't just take my word for it. Let's verify it. I'm going to use thinking mode to verify the claim. And this is the prompt that I'm going to use first. Let me hit Enter. The reason why I started with this question is because I want to show you the double check response function. And here is the response with the details on why professionals are making the switch to Gemini and to access the double check response function, click on three dots icon at the bottom of the response. And here you will see double check response. This feature uses Google search to find content that's slightly similar to or different from statements generated by Gemini. And please note that this feature is specifically built to verify factual claims. It won't appear for things like creative writing, code, or similar tasks. Gemini started evaluating the statements And here we see the green highlights confirming the claims that Gemini did. And we can even expand this window to see the detailed article that Gemini used to validate this claim. That's pretty convenient feature. And now let's get technical. I recorded a video of myself during a Zoom interview for a head of product role. This is a 1 hour recording, which is a massive amount of information. And because of this, I'm going to choose the pro model. But first, let's start a new chat. Here I'm going to choose P. Pro Model is designed with a much higher intelligence ceiling and is superior at maintaining a coherent understanding across the entire hour of footage. So let me attach the footage first. I have ten different video fragments here, and I also submit my instructions. I started with giving Gemini a role of executive leadership coach. I provided context in terms of the video, what I'm doing here, and this is my task. With the specific questions that I want Gemini to go through. My expectation from Gemini is to provide me with information in terms of my presence, communication, style, and clarity, my strength, and areas for improvement. And I also asked Gemini to provide the specific timestamps for its observations so that I can quickly find the fragment Gemini is referring to and rewatch it myself. Watch how Gemini processes this information. And here is the feedback. Those are great observations and things I could definitely improve on. And now let's take that feedback and turn it into something useful. I'm going to ask Gemini to rewrite my tell me about yourself script so that it is more punchy and it is more relevant for the head of product role I'm going to apply for. When you work with Pmdel like in our current example, the response generation takes significantly longer time, so be aware of this. And finally, here is the rewritten version of my Tell me About Yourself introduction, it looks quite good. But of course, if I would use it in a real conversation next time, I would prefer to change some things to make sure that it does sound more like me. Great job Gemini. And just like that, you have turned Gemini into your personal coach. I can imagine so many use cases for this kind of video feedback. Imagine you are doing a 28 day yoga challenge, and you need daily feedback on whether you are improving or maybe you have a fear of public speaking, so you can record yourself, submit the video to Gemini, along with your presentation slides and ask what worked and what didn't thing I noticed when I started doing this regularly is a positive side effect I didn't expect. The fact that you are recording yourself makes you more self aware. Even before Gemini says anything, you start paying more attention to what you are doing and how you're doing it. But that's it, and that is important. Take AI feedback with a grain of salt. These models are incredibly powerful, but they do make mistakes. For instance, in the example we just looked at, Gemini told me I was seated the entire time while I was standing. So use the insights as a starting point, but always rely on yourself for the final judgment. Please let me know in the Q&A for this video, what scenarios you're experimenting with Alcia in the next one. 15. Keeping It Real: Practical Strategies to Minimize AI Hallucinations: Everyone, imagine asking AI assistant about a recent news event, and it confidently cites a detailed article that does not actually exist or asking it about public figures and getting responses that mix real facts with completely made up details. These aren't bugs or glitches. They are what we call hallucinations in AI. And they are one of the biggest challenges when working with large language models. Let's explore why these hallucinations happen, how to spot them, and most importantly, practical techniques you can use right away to get more accurate, reliable responses. To understand why these errors happen, we have to look at how these models are built, unlike a human who truly understands a topic, language model works by predicting the most likely next word in a sequence based on statistical patterns because they are designed to be as helpful as possible, they often prioritize providing a complete, fluent answer over admitting they are unsure. When a model reaches a gap in the information it was trained on or when it encounters an ambiguous request, it might fill in the blanks by guessing the most probable sound in response. It isn't a glitch. It's a side effect of the AI prioritizing a smooth conversation over verified truth. Now that we understand why hallucinations occur, let's explore how to spot them in practice. Think of this as developing your AI fact checking skills. Once you know the warning signs, they become much easier to catch. Here are the key warning signs to watch for. Overly specific details. When AI model provides very specific detail, especially about recent events or statistics, this should trigger extra scrutiny. For example, if it provides exact numbers or statistics for very niche or rapidly changing events, without citing a live source, that is a red flag. In these cases, the AI may be generalizing from similar historical patterns rather than reporting on the specific event you asked about. Perfect sounding citations, examples or statistics. If you notice an answer that sounds too perfect, that's a good reason to double check the information. And believe me, the more experience you become working with EI tools, the better you will be exporting these two good to be true moments. You will develop an instinct for recognizing when something feels off or overly polished. And that's your cue to dig deeper, verify facts, or cross check sources. Trust but verify. That's the golden rule when working with EI generated content. Inconsistent answers. If you ask the same question multiple times and get different specific details each time, that's a strong indicator of hallucination. Overly definitive statements. When AI makes very definitive statements about topics that should have some uncertainty, especially regarding future events or complex topics, be cautious. Knowing why hallucinations happen and how to spot them is a great start. But how do we actually prevent them? Let's go over four useful strategies that will help you get more reliable, accurate responses every time. Strategy one. Be explicit about uncertainty. Instead of asking a direct question that forces the AI to guess, give it a clear out by asking it to prioritize accuracy over completeness. For instance, instead of writing, what were the key findings of the Johnson's report? Try this. If you have verified access to Johnson's report, please share its key findings. If you are not 100% certain about any details, please explicitly state which parts you cannot verify. Or instead of list all the companies using this technology, try this based on the data you were trained on. Can you list verified examples of companies using this technology? Please provide the specific sources or context for each example and indicate if any of these cases are speculative rather than confirmed. Instead of what's the market size for AIchatbds right now, try this. Can you provide the most recent market size estimates for AIchatbds from reliable cited sources? Please specify the exact time period for any data you share and let me know if you don't have access to the latest figures. Notice how each revised prompt explicitly gives permission to acknowledge uncertainty and limitations. This simple change can dramatically improve the reliability of responses. Strategy two, demand evidence based citations. When you ask for sources, don't just look for a list of links. AI can sometimes generate perfect looking citations to papers or websites that don't exist. Instead, instruct the model to quote the specific sentence from the source that supports your conclusion. By forcing the EI to match its claim word to word with an existing text, you significantly reduce its ability to invent details mid sentence. Strategy three, use structured output formats. Requesting structured outputs can help minimize hallucinations by forcing AI model to organize information more systematically. For example, please analyze these sales data using the following structure, verified data points, direct numbers from the document, calculated metrics, show your calculations, interpretations, clearly labeled as interpretations, and uncertainties, areas where data is unclear or missing. Strategy four. Implement verification steps. Include verification steps directly into your prompts to enhance the accuracy and reliability of responses. For instance, you can ask list any assumptions it made during its analysis, highlight areas where it has lower confidence or certainty. Recommend additional information that could help validate its conclusions. This approach ensures more thorough and transparent output, making it easier to assess the quality of the response. Right now that you have all the information on AI hallucinations, take a moment to review one of your recent prompts. How could you modify it using the strategies we've just covered? Remember, the goal isn't to eliminate hallucinations completely, but to create a workflow where they are less likely to impact your results. Please share your original and revised prompt under the Q&A section for this video. And as always, let's briefly recap the key points of this lecture. AI hallucinations happen when language models generate false but plausible sounding information. Hallucinations happen because the AI is confident storyteller that prioritizes a smooth conversation over checking its work against a textbook or real facts. Warning signs of hallucinations include overly specific details, perfect sounding citations, inconsistent answers, and overly definitive statements. Be explicit about uncertainty in prompts to encourage AI to acknowledge its limitations. Request citations and reasoning to verify AI outputs and identify hallucinations. Use structured output formats to minimize hallucinations by organizing information systematically. Incorporate verification steps in prompts, such as highlighting uncertainties or listing assumptions. All right. And that's it for this lecture, and I'll see you in the next video. 16. Working with Gemini Canvas and Gems: Section Intro: Welcome to the next section. By now, you should have a good understanding of how to talk to gemini. While we will keep building on those fundamentals, it is time to level up. We are moving beyond basic back and forth prompts to explore Canvas and jams. We'll begin with Canvas a side by side workspace where you can edit text, compare versions, and iterate on your work. Not starting from scratch every time and do much more. Then we'll learn jams. These are like custom made specialists that remember your specific rules, so you don't have to repeat them. We are going to build two of them together, grammar and spelling reviewer. This jam acts as a professional editor to profit your writing while keeping your voice unchanged and an AI fitness coach, this one can watch your workout videos, check your form for safety, and even design custom motivational backgrounds for your phone. By the end of this section, you won't just be sending prompts. You will be creating your own personal team of experts to turn your quick thoughts into finished pieces of work or to automate your routines. Let's get started. 17. Welcome to Gemini Canvas: Everyone. Welcome back to the first lecture of this section. So far, we've seen Gemini's standard chat interface, like the ones we're used to working with in different messengers. It's great for a quick question, getting feedback or brainstorming. But it can feel a bit limited when you are working on a brand new document. Or a piece of content that needs multiple revisions. This is because when you are drafting something complex, you need more than back and forth conversation. You need a workspace with various editing tools. That's where Gemini Canvas comes in. Think of Gemini Canvas as a collaborative workspace. In a standard chat, the EIS gives you an answer, and if you want to change one sentence, you usually have to ask for the whole thing to be rewritten. In Canvas, Gemini opens a side by side window. On the left, you have your chat. On the right, you have a living document. It's no longer just a chatbot it is an editor sitting right next to you. You can click into the text, change words yourself, or highlight a specific paragraph and tell Gemini. Make just this part puncture. If that sounds good, wait until you hear this. Canvas is not just for writing, it's also for building. Right from the interface menu, you can generate web pages, visual infographics for complex data, and even study tools like quizzes and flashcards. For those who prefer listening. There are audio overviews that create podcast style summaries of your findings. Perhaps most impressively, you can generate functional mini apps. Simply describe a tool like a family recipe organizer or a personal calendar and Canvas will build and run the code for you in real time. You don't need to know how to code. You just need to describe what the tool should do a process now known as vibe coding. Now, because Canvas is so powerful, it can be tempting to jump straight into building apps and games. However, we are going to take this one step at a time. For now, in this section of the course, we're going to focus entirely on document drafting. Using an imaginary AI mattress company as our example, we'll see how to use the Canvas workspace to refine a narrative and generate support and visuals in one fluid session. Once we have mastered document creation, we'll move into the more advanced features like interactive app creation and rapid prototyping later on in the course. In the next lesson, I'm going to show you how to open the Canvas interface, and we'll start our very first collaborative draft. I'll catch you in the next one. 18. Follow-Along: Creating and Editing Documents in Gemini Canvas (part 1): As promised in this video, we are going to get hands on. We will explore how to navigate the Canvas workspace, how to do targeted editing using the ask Gemini feature, we will change specific parts of the document without rewriting the whole draft. We will also take a look at the quick actions for changing things like document tone and length. Finally, we'll go multimodal. We'll bring the brand to life with EI generated logos and product visuals. Let's switch to Gemini for the demo. Let's begin by switching to Canvas mode. For this, I'm clicking on Tools and choose Canvas in the pop up window. Let's also change the model to thinking. And I'm going to start with broad conversational prompt. Here is what I'll type. I gave Gemini some context in terms of what I'm about to do. I provided the task. I said that I need a brief description of the company and the new product that this company is about to launch. I also provided details about the style. I want Gemini to pick up. Let's hit Enter and see what Gemini will write. It is opening the Canvas workspace with the chat on the left hand side and with the text on the right hand side. We see here it created the company description including name, motor, and a brief overview of what this company is doing. Next, we have the information about the product, including the key features of the mattress, and it even suggested some brainstorming objectives for my upcoming demo. Perfect. Let's explore this workspace on the right hand side. On top of the workspace, you can first of all see some editing tools. For example, you can change heading style for your text. You can add a bullet list or a number at list, or even some formulas here. If you like, you can print this page. Into a PDF document, and there are other functions here that we are going to explore a little bit later in this and the following tutorials. The real magic in this workspace is the ask Gemini feature. Let's say that you want to make a change in one part of your text. And instead of asking for a whole new draft in the chat, you can just highlight the part you want to edit and then write your request to Gemini. For example, I would like to change the location of the company office. So what I'm going to do, I will highlight this text, and I will just include my instruction for the change that I want Gemini to make. You see, Gemini did the change and included this new text directly in the document. And on the left hand sidebar, we see that it included the information text and even some description of that change. Let me skim through this text and see what kind of edits I would like to make in addition to the office location. M I can continue working on that document and going back and forth, including the changes up until the moment when I will be fully happy with the text. Frankly, I use Canvas for document creation because of this ask Gemini feature. As in most cases, I need to adjust a very specific part of a document. However, here is what I discovered after weeks of experimentation with it. Since Gemini is focusing on that specific part of a document, it sometimes misses the big picture. I have noticed cases where it repeats phrases used in other parts of the document or brings in terms that aren't introduced until later. So definitely give your work a quick review to make sure it all fits together. And that's it for the first part of this tutorial. And I'll see you in the second one. 19. Follow-Along: Creating and Editing Documents in Gemini Canvas (part 2): Welcome to the second part of the tutorial, where we explore Gemini Canvas for document creation. Apart from ask Gemini, there are quick actions that you may find useful for making changes to your text. The first quick action is change length. This is great if you need to quickly expand on a section with more detail or shrink it down into a punch summary. Let's say we want to change the length for our text, I'm clicking on this button, and then I need to choose the length that I would like for my new text. Let's say I want it to be longer than the current one, and let's wait for the changes. And Gemini has expanded this text. You see that it highlighted the new text in blue color here. Let's come back to the week action buttons. And the second one is for changing tone. So in case if you want to sound more professional or on the other hand, a bit more chatty, this is the button that would help you to switch the vibe of your writing with literally just one click. Let's select change tone, and I can go from formal to very formal or casual and very casual. Frankly, I'm good with the current tone for the text, but for instance, let's make it a bit more formal for the purpose of this demo. We see that Gemini has changed almost the entire text fragment here. I would prefer to return to the previous version. But I think you got the idea of what this change tone option can do. So I'm returning to the previous version of the document. And lastly, there is also function to suggest edits. This is like having a writing body. Gemini will give you feedback and show you ways to make your writing better without changing your original text right away. Let's try this function as well. Alright, great. We see that Gemini has included some changes with the information about the reason for that change. If I'm good with all those changes, I can apply them all. If you don't like suggestion from Gemini, and you would like to return to the previous version of the document, you can tell this to Gemini directly here in the chat. Cool. So let's click on apply for the remaining suggestions so that we can keep them in the new version of the document. All right. Let's continue the demo. And as the next step here, I want to create some visuals to show you the multimodal capabilities of Gemini. We will have a dedicated section on visual content creation later in the course. So for now, I'll just type very short straightforward prompt. And let me press Andrew to see the results. And here is the first image. Amazing that Gemini even included the product name here on one side of the mattress. Gemini also tells me that it can only generate one image at a time. It is asking me if I would like to go ahead with the company logo. Gemini is getting very good at including texts inside the images. And let's ask for several visuals for features. Great. And you see why it is important to create images in this same chat where we created the original text. Gemini uses the context from the previous conversations to create the image. You see that it took information about three degree angle, even though this angle looks a bit larger to me. But that's fine. We can adjust this through iterations working on this image. It also included the mattress name here. Let's create the fourth image. That's awesome. You see that in the description, we have the information that this feature creates a clean air dome over the sleepers, and that's exactly what we see here on the picture. Amazing. And let's check the text. Optimal humidity, air quality. Yeah, and the text is correct. I don't see any mistakes here. Alright, let's finish this tutorial before it becomes too long. We will continue working with the text and images in our next video. 20. Follow-Along: Turning a Gemini Draft into a Polished PDF with Gamma: We now have our brand back story, product features and images organized within Gemini. Think of this as our drafting studio. The space for core thinking and writing. However, our working draft is not finished deliverable. If you need to present this to a manager or a client as a professional report, we need to move this content into a dedicated design tool like Canva or Gamma App. You might think, cannot I just ask Gemini to generate the PDF for me? Good question. And yes, that was my intention as well when I first got this task to create the final PDF. Here is how Gemini handles this. If you try to create PDF in Canvas, you won't get that final document. Canvas tool is built for live editing and collaboration, not for publishing. Because it operates in a private workspace, it cannot see your local image files to include them in the document. If you try to export from here, you will see file with empty placeholders where your images should be. Of course, you can try a regular chat as well. It is more functional. It can generate files in the background to give you a downloadable PDF. However, it lacks the layout control and polish required for a professional presentation. Here is the PDF Gemini created for me. It is a good start, but it required significant manual formatting to look right. So to get our presentation ready finish where text flows correctly around images and branding is consistent, we move from the drafting studio to a design studio. In the next tutorial, I'll use Gamma app to demonstrate this. It's been my primary tool for nearly a year, and it's what I use for almost all my design work. However, the same principles apply to other similar platforms like Canva or Adobe. Let's head back into Gemini and prepare our content for the move. Let's transfer our assets text and images to Gamma app. I'll begin by copying the text. For this, I'll click on Share and Export button. And from here, I'll choose Copy contents. And I already downloaded the four images that we generated in the previous tutorl. So everything is ready for us to move to Gamma. Let's open Gamma app. Here is Gamma main page. The central part is the content grid. This area displays our projects also called Gammas. The top bar here is for creating new documents. On the left hand side, we have templates. Here we can access preset layouts to jump start our presentation design. We have such useful things as MAI images. Where we can view and use images that we've generated using Gammas built in EI image tool. We can also create folders so that we can separate our materials by specific themes or topics. So let's get straight to creating an PDF file. I'll choose Create New with AI. And here we have different options. Since we already have a text, which I copied from Gemini, I'm going to choose this paste in text option. And here I will include the text from Gemini. Next, we have several options for what Gamma app can do with our content. And it's important that we choose preserve this exact text. Meaning that Gamma won't be doing any modifications of our draft. This is the most effective method for our example because it allows us to use Gemini for the heavy lifting of thinking and drafting and then use Gamma to handle the formatting and beautification of the final document. I'm going to select continue to prompt editor here. Here we can choose different themes for our presentation. Let's choose this one and click Select theme. Before we hit Generate, notice the two modes at the top, free form and card by card. Let me quickly explain the difference. When you choose card by card, Gamma automatically breaks your content into separate numbered slides. One idea per card, but you can still rearrange the cards or add new ones. It is perfect for presentations. Reform keeps everything as one continuous flowing document, more like a report than a slide deck. Same content, but it reads top to bottom without heartbreaks between sections. This gives you more control over the layout and flow. It is great for documents or reports. For our demo, I will choose freeform because I want our text and images to flow together naturally. And let's hit generate. Gamma starts creating our slides. First of all, what I usually do, I ask Gamma to suggest several other layouts for me so that I can compare the default layout with other suggestions. So for this, I click on Edit with Agent button, and from here, I choose Try New layout. Let's do one more turn to see if there is anything better than our first default option. I think I'm going to choose this one. I like this background image here. Let's move to the next slide. I will include our logo image instead of this one. To change the picture, I'm going to click on that one. Next, I go to Edit Image. And from here, I'm choosing image upload or URL. I have my images on my local Drive. And here we go. This is our first image. Let's attach it. Perfect. Let's move to the third slide. All right, we are ready to go. Let's do the final check and take a quick look at all of our slides. To export this file, we click on the three dots icon. Here we choose Export, and I'm going to export to PDF. Let's open the file straight away, and here we go. Looks cool. So this is my favorite way to work when it comes to creating new documents. I let Gemini to do the creative thinking part, and then I let my design tool of choice like Gamma handle making it look good part. I hope that you enjoyed this tutorial, and as always, I'll see you in the next one. 21. What Are Gemini Gems, and Why Do We Need Them?: Everyone, when you start using Gemini regularly, you quickly notice that there are certain things you use it for again and again, whether it is for brainstorming, getting feedback or generating new content, you may find yourself typing out the same prompts and giving the same context over and over, which can start to feel a bit repetitive, like your own digital version of groundhog day. Well, today we are ending that cycle. We are going to explore a feature that lets you package up those repetitive instructions and turn them into your team of AI experts or personal assistants. They are called Gemini Gems. And, no, we aren't talking about diamonds here. Though once you see how much time the save you, you might think they're just as valuable. So what exactly is a jam Think of them as customized versions of Gemini built to help you tackle repetitive tasks or get deep expertise in specific areas. When you chat with Jam, Gemini remembers your goals and guidelines automatically saving you from repeating yourself in every prompt. So while a standard Gemini is like a librarian, who knows where everything is, a gem is like a dedicated specialist. It does not just know about a topic. It follows your specific rules to perform work for you. There are three types of jams, premade jams. These are out of the box tools built by Google. You cannot see or edit their underlying logic. You can only pin them to your sidebar for quick access. They often have unique interfaces like the ten page storybook layout that regular jams simply cannot mimic. Custom Jams. These are the focus of our next tutorials because you build them yourself. You provide the instructions and can upload up to ten personal files to act as the Jams knowledge base. It is the difference between a general assistant and a dedicated expert tailored specifically to your data and your goals. Jams in Opal. Ople is an experimental project that moves AI beyond simple chat windows. These drums are interactive mini apps that follow a specific workflow. Their standout feature is the ability to remix them. You can take a pre built tool like a fashion stylist and modify its internal steps to create something new. They are highly visual and can generate text, images and video simultaneously. We are going to explore these dams in the later sections of the course. Now, since we have already worked with Canvas, you may now have a logical question. How is a Jam actually different? The key is to think of Canvas as your shared workspace. It is the collaborative desk where you and the EI work side by side on long form documents or code. Gems, on the other hand, are your tactical specialists. You use a drum to produce the initial draft, like generating a specialized first version based on your uploaded data, and then you hand off that work into Canvas to refine and polish it. One is the specialist, you call for the initial output. The other is the desk where the project is completed. Of course, you can also use drums entirely on their own for certain tasks, and that brings us to our next follow along lecture. But before we start working with drums, let's briefly recap what we have learned here. All right. And that's it for this video. I'll meet you in the next one. 22. Follow-Along: Building a Grammar Check Gem: Everyone, and welcome to our first tutorial on Gemini Gems. Today, I'm going to show you how to build a custom expert to proofread your writing, whether you are drafting landing pages, product descriptions, quick emails, or any other texts. It's like having a second pair of eyes that gives you total confidence in every word you share. Let's open Gemini to create that Jam. We are going to start by clicking on Jams. In the sidebar, we go to Jam Manager here, the section where we create custom Gems. And here I'll click New Jam. Let's begin by providing the name for our Jam here is my gem description. Next, I included my instructions. This is by far the most important part of your gem. I included role description, saying that you're an expert at checking grammar, spelling and punctuation in English text and fixing them if you encounter any mistakes, then I provide target audience description if you follow along and building the same type of gem, you can change the target audience to something which is more relevant to your use case and domain. Next we have core rules followed by the information about what output we are looking for and we also have a starter prompt. You see that I'm using hash tags in the instruction text. These act as section dividers that create a clean skeleton for your instructions. They make Gems brain more organized so that the AI knows where one rule ends and the next one begins. Now let's get back to the set of rules and discuss them. How do I actually come up with this list? I highly recommend doing the task you want to automate three to five times manually before you even try to build the jam. If you go straight into the instructions, it can feel intimidating. Every rule in this list exists because it is a specific preference. I discovered over weeks of manually prompting the AI. You also may notice that I'm using words in cups log, like for example, here. There is no technical requirement to use them. Gemini is very sophisticated. It understands lower case, just as well as upper case. But I found that using them is still helpful. Think of those words as power words. We can use them to highlight the non negotiable rules, so the AI knows exactly what is a must versus a maybe. Alright, let's move next. I'm fine with these instructions for now, even though we can always get back to this list after we create this jam and further edit them. We can also choose a default tool. This tool will be selected when you start the new conversation with the Jam. I'll choose Canvas as the default tool. Instead of a messy chat conversation, your directed text will slide out in a clean side panel perfectly formatted and ready to copy them. You can also include files to the knowledge base if you want your jam to reference any external sources. When preparing the response, you see that we can upload files from different sources here. But for this specific example, I'll leave it empty. And we are all set. So let's save the am. I'm clicking on Save button. And we can start our new chat. Here is the text that I want Gemini to check. I made several grammar mistakes here on purpose. So let's see if it will be able to find them and correct this draft. It is opening a Canvas with our new text. Look great to me. And remember that you can use this Canvas interface to make some quick edits in that text in case if you feel like you want to introduce some changes here, for example, let's highlight reconcile and ask to find alternative And if we are fine with these edits, we can click on Share and Export, choose copy contents, or we can choose to export this text directly to our Google Docs. Let me return to our JAM you see we have it in the list of gems here on the left hand side bar. One thing I noticed, there is no conversation starter here. So when I opened this am interface, it's not quite clear to me what should I do here? I did some research, and I found this article with exact same question. It turned out that those conversation starters are not supported by gems at the moment. There is also a workaround we can try. The article says that you can simulate starter prompts like this by including additional description into your Jam. Right, let's try to include an example of a conversational starter to see if this will help. I'm returning to my Jam. If I click on the three dots, CN, I can choose Edit option, and we can make any changes here we want. Let me just include this example below the current version of the instructions. And what we can also do here, apart from including an example of our starter prompt, we can use this magic button so that Gemini will rewrite our instructions and improve them. Let's try this out. Maybe it would help. I see that Gemini has removed our example of the starter prompt. What I decided to do, I included the rule number six, asking Gemini to always start the conversation with the following starter prompt. Let's see if this will work. So I'm going to update my gem instructions, save them, and let's test. When I opened my updated Jam, I still don't have any conversation starter here. Unfortunately, all my other experiments with defined Jams instructions to add the conversation starter turned out unsuccessful. Given this, let's define the jam description to provide information on what a user has to do to begin the conversation. For this, let's return to the JAMS editing interface. I included submit your text to get started. Text at the end of the Jam description, I'm going to update it, and let's test it out again. Our instruction is here, and let's submit something else for a change. I have this fragment. Let's see how Gemini will handle it. Perfect. And if I'm fine with this jam and I want to share it with my friends or colleagues, I can click on Share button and choose Share. Jim and I will create a link. I can copy it and then send it out. I leave the link to that jam in the resources for this video in case you want to test it. And I'll meet you in the next tutorial where we are going to build the personal coach Jam 23. Follow-Along: Building a Fitness Coach Gem (part 1) : Now let's build a jam that works with video. Let's say I'm doing an online 28 day app workout challenge, and I want to know if I'm actually improving day by day. I'm going to record myself doing the daily exercises and ask my AI fitness coach for feedback as a word of caution, as we already discussed, while the AI is a good partner for tracking your movement and form, it is not a medical expert. Always consult with the healthcare professional before starting the new fitness program. This tool is for coaching and progress, not medical advice. Okay, let's open Gemini to begin the demo. Let's create a new Jem. I'm expanding this menu. Go to Jams. Here we see Jams made by labs. I'm scrolling down to Jam Manager. I already have grammar and spelling review Jam visible here in the list of my Gems. And for now let me create a new one. I'm clicking on New Gem. Let's provide the name, the description and instructions for our personalized AI coach Jam I included this description. This jam analyzes your workout videos to provide detailed performance feedback, and it creates custom vertical motivational phone backgrounds to keep you inspired. And here are my instructions. So as always, I started with describing the role. I want this jam to play. In our case, I wanted to be a professional fitness coach. Then I included a task for this jam. We are telling Gemini to analyze our workout videos, looking for engagement and safety cues like Cin or Domin and I also described that I want Gemini to create a vertical image with a motivational quote. I also included starter prompts, even though we've seen that starter prompts are not quite working right now. But still, let's check what will happen this time. And to make this drama truly personal, I'm going to upload an image to the knowledge base that represents the vibe of the motivational image that I want Gemini to create. I'm clicking on Plus button. I have my reference file on my local Drive, so I will choose Upload files. This is my folder, and that's the motivational quote that I selected. Of course, you can also include other files here, for example, in case if you have a research paper that you want this jam to analyze when providing the recommendations and not just use its general knowledge can always upload this file here. And in terms of the default tool, for this jam, I'm not going to choose anything here. This is because our fitness coach is doing two very different things. It gives us text feedback, and it creates a high resolution image. So by letting Gemini choose the best tool for each task, we ensure our phone backgrounds look sharp and our feedback is delivered without any technical issues. Everything is good here. We are ready to click on Safe. And by the way, notice that there is also this preview window which you can use to test your instructions before saving them. But in my case, I already did the first test before I started recording this tutorial, so I'm good to go. I'll just click on Safe and let's start our chat. Have uploaded my first video from the day one of my workout, and let's wait a bit for the Gemini to process it. Our video has been uploaded, and before we press Enter, let's talk about model selection here. So since this jam involves multimodal analysis, watching video, checking for safety queues and providing structured feedback, I'll choose a thinking model which prioritizes reasoning over pure speed. And we are all set here, and I'll just hit Enter. And here are the recommendations from Gemini. First of all, I really like that it tells us that this information is for informational purposes only. And for medical advice or diagnosis, we should consult the professional. That's totally true. Notice that it successfully identified that this is my day one workout session because of the relevant name of that file, there was a day one workout in the name. Here is the scorecard, what I nailed, and one thing to improve. I can agree with this. And next, there is a question. Would you like me to create your custom daily motivation phone background, based on your day one progress? Yes, definitely, yes. So let's just reply. Yes. And here we go. We have this perfect quote, but there is one issue with that image. If we compare it with my original reference image, we would find that they are not the same. Here is an image that I asked Gemini to create. You see that background is completely different. So let's get back to our jam and let's work with Gemini to see if we can change this and make sure that it creates images with similar background as in our reference file. 24. Follow-Along: Building a Fitness Coach Gem (part 2): Welcome back. In the first part of this tutorial, we set up the core logic for our fitness coach Jam. But we came up across a limitation. Even though we uploaded a reference image to the knowledge base that generated daily motivation backgrounds didn't look anything like our original image. Let's fix that by understanding how the system actually processes these different types of data. Have mentioned before that Gemini is multimodal. It can see, read, and hear all at once. That is all true. However, there is a technical difference in how a gem reads a file and how it creates an image. When we applaud a reference to the knowledge base, Gemini uses its vision capability to analyze the file and summarize it into text based data for its long term memory. But when the am generates a new image, it triggers a separate image generation model. According to Gemini's technical documentation, this generation model cannot directly see the raw pixels of your knowledge base files. It only receives a text based prompt. If your instructions simply say match the style in the knowledge base, the AI is working from a summary, not the original source, and the original style gets lost. To solve this, we move from referencing to specifying. Instead of showing the jam a file and hoping it interprets the style correctly, we are going to write a visual specification directly into the instructions. This ensures that every time the jam creates an image, it follows your exact rules without any guesswork. Here is how we do this. Go to your list of Jams, find the one that you'd like to edit and click on the edit icon. And from here, go to your instructions. In the motivation section, let's remove this vague instruction. Next, we will add a description for our image to create it, open a separate chat, applaud your reference image, and use this prompt. I suggest switching to thinking model here for by the results. Once you have the image description, paste it right into your Jams instructions. Here is the description that I have for my reference image. This defines the layout, the phones, and the atmosphere. So the model has a clear set of guard rails. Once we do this, we can click on Update to save the changes. Let me start a new chart to test the changes that we just made. A You see that our new image and the reference one are not the same but very similar in their layout, visual hierarchy, and overall aesthetic, a frosted glass textbook over a soft pastel cityscape at dusk. And that's it for this tutorial. Please write in the comments for this video what jam you are planning to work on. And I'll see in the following video. 25. Gemini for Visual Creation: Section Intro: Welcome to this new section of the course. You have already seen me creating a few images with Gemini earlier in the course, and now it is time to get into the details. We are going to take Gemini's image and video tools for a proper test drive. And I think this is one of the most visual parts of the whole course. We will start with image generation and not just the basics. I'm going to show you how to use techniques like contextual blending. Where you combine reference images to create something completely new and iterative refinement, where you direct gemini like a photographer adjusting one element at a time until you get exactly the shot you want. We'll also look at visual synthesis where you hand Gemini multiple ingredients and let it build a single seamless scene. From there, we will head into what I call the editing suite, where we will use Gemini to work with images you already have think restoring old photos, turning rough sketches into product shots and making precise edits using Geminis building markup tool. Well then look at building complete visual systems, infographics, flow diagrams, and assets adapted for different platforms and screen sizes. We will finish this section with the tutorial on video creation. And of course, I will also share my top prompting tips the practical recommendations I have developed from working with Gemini other AI image and video generation software that will help you get better results. All right. Let's get creative. 26. What Is Nano Banana? Key Features Explained: You might have noticed strange little banana moja appearing in your Gemini app. It's not just cute icon. It is a tiny clue to a funny naming story behind this model. Before this model was officially released, Google submitted it for anonymous testing on a platform called ALM Arena, a public site where people compare two AI models side by side and vote on which result they prefer without knowing which model is which it is how AI labs gather real world feedback before a full launch. The model needed a placeholder name, something that would not hint it was a Google product to submit it into the LM Arena site. At 2:30 in the morning, Google product manager named Nina typed Nano Banana. Thinking it was just a placeholder label that nobody outside the testing platform would ever see. But the model performed so well that people on X became obsessed with this mysterious, powerful Nano Banana, speculating about which lab had build it, whether it was a secret Google project, whether it was something entirely new. Instead of quietly correcting the record, Google leaned into it. They addit the banana image or the Gemini app and even made a limited edition banana themed merchandise. The reason the banana went viral was not just the name of horse. It was one specific capability that EI image tools had been getting wrong before, character consistency in the past, if you uploaded a photo of yourself and asked an AI to reimagine it, you would get something that looked vaguely like you. What people started calling your AI distant cousin, Nano Banana changed that you upload one photo of yourself, and it preserves your actual likeness across completely different scenarios, you can turn yourself into a graffiti mural. Custom to card or a ceramic k, and it's recognizably you in each one. You can transport yourself to different places, different outfits, different decades. The face stays yours. You can even add motion turning aesthetic portrait into a short video where the subject turns their head or shifts expression we will look at that in more detail when we get to view Gemini's video model. But character consistency is just one piece of it. Let me walk you through the other things that make this model worth understanding. Scene blending, lets you upload two separate photos and fuse them into a single coherent image. You can put yourself and historical figure at the same table or create a group photo of people who have never actually been in the same place. Gemini handles the lighting, angles and context. So the result feels like one image rather than something that looks stitched together. Multiturn editing turns your conversation into a living canvas. You don't have to get everything right in your first prompt. You can start with an empty room and talk it into existence, paint the walls, add a leather sofa, place a steaming cup of coffee on the table. Each prompt builds on the last. One important thing to remember, the chat keeps context across your edits. So if you want to start a completely separate project, open a fresh chat rather than continuing in the same thread. Design mixing is about taking the texture or visual language of one thing and mapping it into something else entirely the pattern of a butterfly wing becoming a high fashion gown. The texture of marble tile wrapping around a pair of sneakers, it is less about editing a photo and more about merging two worlds that don't normally belong together. Now, one important thing to understand about how all of this fits together, Gemini itself is a reasoning and language model at its core. The image and video capabilities come from dedicated specialist models that Gemini calls behind the scene for images. That's Nano Banana. Officially named Gemini 2.5 flash Image, though nobody calls it that. For video, it is a model called VO. Think of them as Gemini's creative team available on request. When you ask Gemini to generate or edit an image, it hands the task to Nano Banana. When you ask for a video, it calls VO. The conversation stays in Gemini. The specialist work happens underneath in the next lecture, we are going to open Gemini and try creating our first images. I'll meet you there. 27. Creating Your First Image with Gemini: So now that you saw the preview of Gemini's visual capabilities, let's get our hands dirty and create our first image. Image creation is available on all plans. Let's open Gemini and get to work. To create an image, you have two options. Option one, create an image in your existing chat where you ask questions or work on creating a new piece of content, like in our last lecture when we worked on our product brief for an AI mattress company. Option two is to start from scratch. That is what I'm going to do this time. I'm going to start with the simple prompt. A fluffy orange cat sleeping on a sofa. To tell Gemini that we are going to create an image, let's choose image in the list of tools. This way, Gemini knows that we are expecting an image as the output, so we don't need to type these verbal instructions in the prompt. The next step before generating an image is to choose an image generation model, either fast thinking or pro. I'll choose fast this time. An alternative way to create an image would be to type in create an image of directly in your prompt. And in this case, we don't need to select Create image from the list of tools. This is my preferred way of working with Gemini. But for this demo, let's go ahead with Create Image selected. Our image is ready, pretty good given how short our prompt is and that it's just our first iteration. You can share, copy or download that image, or you can continue adjusting the image just by chatting with Gemini and adding more details to your original prompt. You see that Gemini modifies the image prompt by prompt adding more details while keeping all the previous context in place. But in case you want to start over with one of your previous iterations, click on more and choose branch in New chat. Then you can give the prompt to Gemini, and in that case, Gemini will change that selected image Of course, you can give Gemini the entire prompt straightaway, or instead of describing details yourself, pick a style for your image. For instance, instead of describing what light we want to have in our image, let's choose cinematic from the list here. You saw me selecting between fast mode and thinking mode. In the Gemini app, these modes represent how much processing power and reasoning the AI uses to build your image, while the specific model names under the hood, like nana Banana evolve rapidly the way these two modes function. Remains constant. I always recommend checking the official Gemini support pages for the latest version names. But here is the best way to think about your workflow. Think of fast mode as an interactive layer. It is built for speed and quick iteration. If you are changing shirt color, trying a new hairstyle, swapping a background or generating lots of variations, keep it on fast Thinking mode, the reasoning layer, this takes longer because it's more careful before it generates. Use it when you need precision, like clean, readable text on assign consistent product shots or complex scene where details really matter. You can ask me, but Anna, why I wouldn't just use thinking all the time if it's more powerful. It is a fair question, but there are two practical trade offs. First is time. First mode is speed of thought tool. Thinking mode requires waiting period while the EI thinks through the prompt. Second, is usage limits because thinking mode is more computationally expensive. It usually has tighter daily limits than fast mode. My recommended process use fast mode to explore and generate rough options quickly. And once you have found your hero concept, switch to thinking mode for the final high fidelity polish. Start with thinking mode immediately, only for high complexity tasks like visualizing process flows or creating images with specific localized texts. All right. Now you have an initial idea of how to prompt Gemini to create visuals. In the next video, we are going a bit deeper and learn how to create a good prompt LCR in the next video. 28. 7 Prompting Tips for Creating Better Visuals: Hello, everybody, and welcome back to the lecture. As this section of the course is all about generating visuals, we cannot overlook such an important topic as how to create those instructions. In the upcoming video, I will share my top seven recommendations on how to craft effective prompts. Let's get started. Sometimes you will see solid outputs with simple open ended prompts, especially if you are open to surprises. However, when you have a specific vision in mind, describing various details can help lead you to perfection. But regardless of the direction you want to take I recommend starting with a simple prompt and then adding extra details one by one to see how they affect the image. Begin with the description of your subject matter, person, animal, landscape, fictional character, and so on. Generate your first image and then include extra details or context like its location, information about the environment and lighting, as well as emotions or moods you'd like to introduce. To clarify the idea of what you want to create, it's helpful to ask yourself a series of questions. Here is a checklist you might use. Decide if you want a photo or an illustration. What is your subject matter, person, animal, landscape, fictional character, and so on. Think of specific effects and details you want to include art movements, themes, techniques, effects, materials, concepts, color, and tone, lighting, and composition. Go beyond the basics and include additional descriptions in your prompt that can take the creative process in a completely different direction or add extra flavor and nuance to your images. Here are just some examples of what you can add. Type of photography, environments, emotions and moods, specific art styles, cinematic or painterly effects. Experimenting with these kinds of descriptors is one of the most enjoyable parts of working with Gemini image generation. Small additions can dramatically change the feel of an image. Pay attention to the order of the words in your prompt. The words at the beginning carry more weight than the words at the end. So if your snowy landscape matters more than the cabin in the foreground, lead with the landscape. Try reordering the same set of words, and you will often get noticeably different results. Be mindful of third party rights. Gemini does allow you to reference historical artists and art movements by name. So asking for a man like quality or a style of Vang works perfectly. However, the EI will block prompts that ask for the styles of living or contemporary artist to protect creators. It also restricts copyrighted characters and brand logos. If you want the look of modern artist or a specific brand, describe the visual qualities you are after instead of naming them directly. Look for inspiration and examples when crafting your own prompts. If you are new to AI image generation and don't have design background, it can be challenging to write detailed descriptive prompts at first, and that's completely normal. A great way to get started is to browse I generated image communities online, find images you like, look at the prompts behind them and start experimenting by making small modifications. It is also a good idea to create a mood board of images you like and might want to reference later. Save the image, the prompt used, and any style notes alongside it. This becomes a really useful creative reference over time. Last but not least enjoy the process. At first, it might feel like the EI is doing all the creative work. But without your unique ideas, your instincts about what looks good and your curiosity to experiment, the EI would not produce anything interesting. So be yourself, throw your ideas out there, and have fun with it. To recap. Here are the seven tips. Start simple, then add details one by one. Ask yourself a series of questions to clarify your vision. Go beyond the basics at descriptors for environment, mood, style, and more. Word order matters. What comes first carries more weight. Be mindful of third party rights. Artists styles are fair game, but avoid copyrighted characters and brand imagery. Look for inspiration online and build the mood boards as creative reference. Have fun with it. As always, Alca in the next video. 29. Contextual Blending, Iterative Renement, and Visual Synthesis: Welcome back. So far, we met the banana Banana and learned how to create an image from scratch. But in most cases, you aren't just looking for cool images. You're looking for assets. You need that perfect hero image for a website or a social media ad that actually stops the scroll. In this video, we are going to explore how to create these assets. Of course, you can start from complete scratch and prompt Gemini what image you want. But think about it. Describing a specific lighting angle, a unique texture or complex physical structure with just text is hard. You can spend 30 minutes writing the perfect prompt and still not get what's in your head. But if you show Gemini reference image, you provide an instant map of your expectations. Today, we're going to look at how to use images to talk to the AI. Let's start with the classic marketing challenge. You have a product, in this case, skincare bottle, and you want it to look vibrant, fresh, and premium. For this, we're going to use contextual blending. Watch what happens when I upload a simple photo of the bottle alongside the reference image and then guide Gemini to place it into a completely new creative scene. In our first prompt, we aren't just asking for a random picture. We are telling Gemini exactly what we want by referencing the original image and asking to replace parts of it, swapping the water for juice and the original bottle for our skincare brand. Let's begin with fast mode. I hit Submit, and here is our image. The text is crisp and the bottle is perfectly under the waterline. Now let's make some changes. First of all, I will add this phrase into the prompt. Phrases like Ecommerce product shot, bright studio lighting or pure white background are the pro secrets that make an image look like a real commercial rather than an AI experiment. Let's also switch to thinking mode here. I used the same prompt, but the bottle is suddenly on top of the liquid. Why? Because the model is actually reasoning through physics, it knows that orange juice, unlike water is non transparent. It thinks if I submerge this bottle in juice, the bottom half of the label will disappear. Let's try to force it by adding half submerged instructions to the prompt. Similar results. Thinking mode is prioritizing product photography logic over my specific layout instruction. It assumes a good photo must show the whole brand, so it fixes my composition by lifting the product out of the juice. Now, let's look at iterative refinement. This is where Gemini really shines. You don't have to get the perfect shot in one go. Instead, you direct it like a photographer adjusting one element at a time until you land exactly where you want. For this Gemini brew coffee bag, we are going to build up a rich textual product shot step by step, starting with placement, then refining the composition, adding spill and depth, and finally, dialing in the lighting. Watch how each prompt nudges the image closer to that premium roster aesthetic. And finally, let's look at the technique I think is the most impressive of all visual synthesis. Sometimes you have an entire campaign kit, multiple products, a model, an outfit. In the past, pulling this together required a massive creative brief and a lot of back and forth. With Gemini's thinking mode, we just handed the pieces and let it figure out the rest. Creating from scratch is about direction, not just description. You have seen how to blend context, refine a shot step by step and synthesize multiple elements into a single complete image. But what happens when an image is almost perfect and just needs one specific change. In our next video, we're heading into the editing suite where we'll use Gemini to fix restore and precisely edit images. You already have Alca there. 30. The Editing Suite: Turning Sketches into Prototypes and Photo Restoration: Everyone, and welcome back to the series of lectures on creating images with Gemini. In this video, we are heading into the Gemini editing capabilities. I'm going to show you how to use Gemini thinking layer to fix, restore and literally read and then adjust the images you already have. This is where we move from being creators to being sophisticated editors. Let me open Gemini to begin the demo. It usually starts on a napkin or a whiteboard. You have a vision for a product, but you aren't a designer. Here is what we are going to do. I'm uploading this sketch of a new chair design to Gemini. I don't need to be an artist. I could just tell Gemini, interpret this sketch into a photo realistic product shot because we are in thinking mode. Gemini uses the lines as a structural guide. It understands the perspective I intended and fills in the details, I could not draw myself. This turns your rough drafts into prototypes in seconds. Let's change the chair fabric. But instead of explaining the color and texture I want, I'll use reference images. Surprisingly, I got this book image since I used the word cover in my prompt. Let's start a new chat to make the image right. And, of course, we can give this share 360 degrees spin. Here I have the hair image and my video pmt. And I also selected video from the drop down menu to make sure Gemini understood my task correctly. Now let's look at one of the most powerful repairs you can do for the restoration. We all have those old faded family photos or low quality digital shots from years ago. Instead of just coloring it in, I'll ask Gemini to restore it. Using its thinking layer, Gemini analyzes the textures and historical context. It removes the scratches, sharpens the faces, and applies natural realistic colors as if the photo were taken today. It's not just the filter. It is the EI reconstructing the quality that was lost over time. Let's take a look. Mm. Oh, what feeling dancing on the pedal lost in the rhythm of the sunny 31. The Editing Suite: Targeted Edits with the Markup Tool and External Annotations: Let's move on. What if the image is great, but you want to change one specific thing. Let's explore how to work with Gemini's dedicated image markup tool, and also its alternative. I would like to edit this image. I'll upload it to Gemini and to open the markup tool. I simply click on the image. And here we have our editing workspace. What I will be doing here is called special prompting. I'm showing Gemini exactly where I want the change and describing what the change should be. First, I'll choose a color. Let's go with red. And I circle this fireplace. Next, I need to explain the intent, so I'll switch to the text tool and type Ed fire. Notice I used a verb here. You can be specific with actions like add or replace, or you can just describe the object. For example, let's add two cups of coffee on this side table here. If you made a mistake, you can always hit the undo button to go back. I'm clicking on Done as I just finalized the annotations and let's hit Enter without providing any instructions because we just made them on that image. And here is the new image. We see that Gemini has successfully included the changes. We see the fire in the fireplace and we see here two cups of coffee. Great job. When I'm opening this new image, you will notice that clicking on it doesn't open the markup tool again. So that tool is specifically for your initial uploads. However, you aren't stuck, you can continue to refine the result using conversational edits. So here is my new prompt. Gemini is contextually aware of the image. It just created and will continue making the changes that you requested. And coming back to my original annotations, Notice that I like to match the text color to the circle color while the AI primarily tracks coordinates. This is a great best practice for keeping your instructions organized. You can also bring in annotations from external tools like Canva. For example, here, I have marked up this photo of the Bursch Khalifa building. I want Gemini to make those exact changes. I want this building to be removed, and I want to change colors for some parts of the building. I've opened a new chat, and I submitted this image to the chat. For complex tasks like this, I recommend switching to thinking mode. This triggers more powerful reasoning model, that is much better at following these precise instructions. I will also include these instructions, including this prompt here is important. For example, here is the image that I got when adjusting that same image without providing any instructions to Gemini. We see that Gemini has successfully made the change. However, we still see the annotations, and that was my original image without any instructions provided. Let's return to our chat and hit Enter. Unfortunately, this time, we still have the instructions on the new image, and we also see that Gemini has successfully made other changes. We don't see the building here on the right side, and the new colors has been successfully applied. Let's ask Gemini to remove annotation instructions from the image. And here we go. The second attempt has been successful. As you can see, Gemini recognize the text, remove the building, and change the colors perfectly. And then we provided the second instruction to remove the annotations. All good here. Finally, let's look at how Gemini reasons about the world inside your photos. For example, if you upload a photo of a city skyline, you can ask Gemini to annotate it, watch as it identifies the landmarks and adds labels exactly where they belong. This is not just drawing, it's information design. It's taking a raw pota and turning it into a smart educational asset for a presentation or a manual. And that's really the theme of everything we cover it in this video, whether you are bringing a rough sketch to life, restoring an old fora, annotating an image or smart labeling a complex scene, Gemini handles the precision work, so you don't have to. In our next video, we are going to bring all these skills together to build complex visual systems, including infographics and data visualizations that transform complex data into something instantly clear. I'll see you there. 32. Complex Visuals: Menus, Diagrams, and Infographics: Welcome back. So far, we have covered a lot of things creating from scratch, editing with precision, and synthesizing complex scenes. Now, let's look at what Gemini can do when the task gets even more ambitious, building multi piece visual assets like infographics diagrams and assets that work across different social platforms and screens. Let's get started. I want Gemini to create a one page. Infographics menu using these coffee images. I wanted to identify each drink and place it in a clean section with its name and a short description. Let's also choose create images from the selection of tools. As from the Pam description here, it's not quite clear if I want an image or a text as the final output. Let's start. And here is our picture. Because Gemini has that deep resonin layer, it sees the difference between the images we submitted and can identify a coffee cup with the ice cubes inside versus the one with the warm milk form. Let me ask Gemini to change this layout for a bit and also change colors to fit our brand colors. Oh, this is a great design. I like it better than our first iteration. And let's do one more change. I want to change this coffee menu text to our brand name. And here is our image. I like it very much. The only thing that I want to change, I would like to remove those coffee beans so that the text is fully visible. But instead of doing this as a series of iterative prompting, let's actually try to use another technique here. I'm going to use the markup tool that we covered in the previous demo. Let me download this full size image. I created a new chat, uploaded our image that we just generated. Next, I opened the markup tool and let me highlight the coffee beans. I added the instruction to Gemini to remove the coffee beans. It's going to be a bit tricky because we see the beans together with the text. But let's try to make it work. I'm choosing the thinking model here and also select and create images. So my first attempt was unsuccessful. You see that the OF images are still here inside the image. Let's try to describe the change that I would like to make. And here is our image. It's really incredible that Gemini did so well following my prompt instructions and removing those coffee beans from the top right corner of the menu. And now we can see our text clearly. Awesome. And let's move to the second demo here. Sometimes you need to explain the how like the journey from bean to cup in my Gemini coffee brand example. So here is our brands signature brewing process. I'm going to ask Gemini the following. I want Gemini to finalize this five step Gemini Brew signature process into a clean architecture flow diagram. I wanted to use minimalist layout and match the colors to those that we use in our PDF file. Let me choose thinking mode. And for this example, I'm also going to choose Create images. And here is our diagram. Gemini built the structure, created the icons, and also labeled every step. What I don't like here is those throws that are definitely unnecessary. And this text that we can see on every box. Let's ask Gemini to remove this. And here is the cleaner image. And I also would like to remove this frame. Let's ask Gemini to do this. And this is a much better picture. And I want to do one more iteration to make this image more beautiful. Look at this. This is a completely different aesthetic. Let me know in the Q&A for this video, which one you prefer. And we're moving next with our demo. 33. Complex Visuals: Adapting Assets Across Formats and Platforms: Of course, you can further edit this image if you like, either by continuing asking Gemini for improvements directly here in the chat, or you can copy this image and go ahead with markup tool directions. But let me show you another example while we are here on this image. Let's say that we are planning an international expansion of the Gemini Brew brand. So we need this diagram to be translated into other languages. So I'm going to ask change the image so that the texts are shown in Chinese language. And this is our translated diagram. Notice that in my prompt, I explicitly say that I want Gemini to change the image, not just show the texts in Chinese language so that it is crystal clear to Gemini that I need another variation of that image translated into Chinese language. All right. And let's take one final example. Let's say that we need assets for the Gemini Brew marketing campaign that will work everywhere from Instagram stories and posts to a hero image on our website. We are going to take this shot we built earlier with Gemini, and I'm going to tell Gemini that this is our master asset. And now I need a version for a vertical social media story, a square post, and a white header for the Gemini Brew website. Have also attached the image that I want Gemini to modify. And here is the message that I got from Gemini when I tested this prompt before recording the tutorial. This is because Gemini can create one image in time. While Gemini can process many reference images at once, its goal is always to synthesize them into one final high fidelity composition. If you ask it for several separate image files in one go, like in my example here, it won't be able to proceed with your request. So always frame your request as a single project like an infographics, a menu, or a campaign shot where all your elements live together in one image. So let me change the prompt. I first would like to create a white header image for the Gemini Brew website. As always, I'm selecting thinking mode, and let's also choose Create images to give Gemini clear instructions that I'm expecting to see image in that case. And here is our new white hero image for our website. We see that Gemini doesn't just stretch our original image, it outpaints it, so it adds more details into it like those old coffee machines, as well as these coffee beans on the left and right side of the original image while making sure that our product is always perfectly positioned in the center of the composition, regardless of the screen size. Let's also create one vertical size image and square size image for our Instagram posts. 34. Beyond Chatting - Deep Research and Building with Gemini: Section Intro: What happens when you give Gemini research task that would normally take you half a day. That's what this section is about, and then we take those findings somewhere you might not expect. We are going to do this using a Gemini feature called deep research, and we will work through three very different real life situations with it. One that most of us deal with every single week, one about making a purchasing decision without falling down the rabbit hole of review sites and raided threads, and one about getting up to speed on a completely new subject. In each case, I want you to see not just what Gemini produces, but how to prompt it, so the output is actually useful to you. And then we are going to take it one step further using Canvas to turn one of those research outputs into a working interactive app built from a conversation. No code require it. I hope you're ready. So get yourself a cup of tea or coffee, and let's get into it. 35. Deep Research: Beyond Blueprint Answers: Raise your hand if this has ever happened to you, you ask a chatbot a hard important question, something like, I want to raise the Series A funding. What are the most active investors in my space right now? And it responds with a list of options, which is quite shallow, and you also get a bunch of high level recommendations. Like you should research active investors in your category. You should build a target list. You should reach out to your network for warm introductions and so on. Google's Product Team has a name for this. They call it a blueprint answer, a high level map that tells you what to go find while leaving every bit of the actual work to you. You are still one drowning in 50 open browser tabs, trying to separate the useful signal from the noise. Gemini deep research is what can help you to move past the blueprint and get something very comprehensive you can act on right away. Deep research is not just a smarter chatbot. It is an agentic system, meaning it autonomously plans, searches reasons and synthesizes information across hundreds of sources on your behalf. Think of it as having a PhD level research assistant in your team who does hours of complex investigation in minutes and comes back with a polished report, not a to do list. So what does a PhD level research assistant actually do for you in practice? Let me give you the three most powerful use cases. First, topic understanding, going deep on complex subjects. Imagine you are an HR manager trying to understand how AI will impact the workforce over the next three years. You don't just want a surface level summary. You need to understand the landscape. How does AI automation compares to AI augmentation. Which roles are most at risk and which ones are evolving what are other companies already doing? And what does the research say versus what just hype deep research dives into academic papers, industry reports, expert commentary and real world case studies simultaneously. It comes back with a structured analysis that maps out the landscape, contrasting competing ideas, surfacing the relationships between concepts, and explaining the why behind all of it. Second, professional due diligence. Think about preparing for an enterprise sales meeting. Before you walk into the door, you need to understand the prospects core business challenges, the recent strategic moves, the competitive pressure they are facing, and how your product fits into all of it. Deep research investigates the company's products, finding history, leadership team and competitive environment. And this is very important, merges it all with your own internal notes on the client relationship what would have taken a junior analyst a full day to compile is now ready in minutes. So you walk into that meeting room knowing more about their business than they might expect. Third, high stakes, personal decisions. Not everything is about work, buying a car, choosing a neighborhood, comparing insurance options. These decisions matter just as much, and the research Rabbit Hole is just as deep instead of a weekend lost going through conflicting blog posts and raided threads, you get report structured around your specific situation, the pros, the cons, and the nuance that generic advice never gives you. And here is what makes all three of these use cases possible in practice. Deep research does not just hand you a list of links. It produces a comprehensive multi page report, structured analysis, cited sources, and even things like infographics that bring the data to life. In the next lecture, we are going to get our hands on it. I'll show you how to launch deep research, how to create the research plan before it starts, and we will walk through a real example together so you can see the full process from prompt to final report. I'll meet you there. 36. Deep Research in Action — Topic Understanding: As promised, let's see deep research in action. We are going to start with the topic understanding use case, and I picked an example that I think most of us can relate to personally. We are going to use deep research to cut through one of the most confusing topics in everyday life. Breakfast, nutrition, you know the feeling. You Google RX healthy and get ten completely different answers depending on which article you land on to follow along with this demo, you will need a paid Gemini plan. If you are currently on a free plan and want to upgrade, check out the lecture in the introductory course section where I walk through how to do that. Okay, let's go. To launch deep research, open a new chat, and choose deep research from the list of tools. By default, Gemini uses Google search as its primary source. But you can expand that. For example, you can choose your Gmail or Google Drive as a source or upload your own files. This is what makes deep research so powerful. It's not just searching the web. It can merge public information with your own private documents. For this demo, we will keep it simple and use web search only here is the prompt I'm going to use. Notice how specific this prompt is. We are not just asking, What should I eat for breakfast? We are giving deep research, a clear research agenda with three distinct tasks. The more direction you give it up front, the more useful the output. As for the model selection here, the specialist analogy we introduced earlier in the course stays exactly the same when you activate deep research. The mode you select dictates how that specialist behaves during the research process. Fast remains your sprinter, performing a broad rapid scan of the most relevant sources to give you a quick summary without digging into every detail. Thinking is still your strategist posing to cross reference multiple sources and resolving contradictions to find a more logical angle. Pro remains your expert deep diving into everything from dense reports and technical PDFs to long email threads to give you a truly comprehensive synthesis. I'll choose thinking here. Now let's hit submit and see what happens first. This is the goal decomposition step, and it's one of my favorite parts of the process. Instead of diving straight into research, deep research pauses and builds a personalized multi step research plan based on your prompt. You can see it mapping out exactly what it intends to investigate. If you need to, you can edit this plan before it starts. If you want to direct it toward a specific angle, add a subtopic, or remove something that isn't relevant to you, do it now before a single search is run. For this demo, I'm happy with the plan as it is. So let's approve it and let it run. And now the search begins. Gemini is working through sources in real time, academic papers, nutrition, guidelines, health publications. It is deciding which threats to investigate in parallel and which ones need to happen in sequence. You can even click on any of the websites here if you are curious on what sources Gemini is going through. As Gemini deep research reads each source, it does not just collect information and move on. It thinks about what to search for next. It is running a continuous self critique process, spotting contradictions between sources, flagging vague or unsupported claims and recognizing when a piece of data simply does not add up you can see it adjusting its research directions in real time, as new information comes in, and when it hits a dead end, say a study is behind a paywall or a website is down, it does not stop. It reroutes and finds another path to the same answer. There is one more thing that makes this possible at scale. Deep research works inside a context window, the IIs, working memory. In practical terms, it means Gemini holds in memory every single source it has read for the entire session. Nothing gets lost or forgotten as the research grows. And this is also why follow up questions later are so sharp. I never loses the threat of what it already investigated. And you might already guess you don't need to sit there watching all of this happen. Deep research is asynchronous. You can close the tab and get back to your work, and Gemini will let you know when your report is ready. If you are on the web app, you will see a notification appear next to the chat thread in your sidebar. And if you have the Gemini mobile app installed, you get a push notification straight to your phone. And I just got mine. Our report is ready. So let's get back to Gemini to take a look. This is what deep research delivers and notice what it is not. It is not a list of links. It is not a bullet point summary. It is a structured multipage analysis with cited sources, organized sections, and actual conclusions you can act on the tiered ingredient table we asked for is right here, tier one, tier two and tier three, clear, actionable and based on current research. And if you are curious about any of the sources, every claim has relevant links. You can click through and read the original research yourself. I don't know about you, but it would have taken me hours to read through all of these resources and compile the report manually. And it is important that deep research is not replacing your judgment. It is doing that tedious groundwork so that your judgment is actually more informed. In our next lecture, we will take deep research into a personal context and walk through a few more examples. I'll see you there. 37. Deep Research in Action — Purchasing Decisions: In this lecture, we are going to look at two more use cases for deep research that I think you will find immediately useful in your own life. The first one is about making a confident purchasing decision, and I'm going to use a very real life example. The second is about learning a completely new subject. I will show you something I haven't shown before. How to turn a deep research report into an infographics, a quiz and flash cards all without leaving the Gemini deep research interface. Let's start. My Aura slip tracking ring recently broke. I would like to replace it, but I'm not sure if I should just purchase the latest ring of the same brand or use this as a chance to switch to something better. And there is one specific feature I've been wanting for years. Vibrating sleep cycle aware silent alarm that actually wakes you up at the right moment in your sleep cycle, not just at a fixed time. Let's use deep research as our personal shopping assistant to cut through online reviews and articles. Here is my prompt. Notice a few things about this prompt. It is personal. I've given deep research real context about my situation and what I'm looking for. I included the vibrating alarm, not just because I want it, but to see if Gemini can filter out the obvious choices. Most popular rings actually don't have vibration models. So a basic search might just give me a top ten rings list that ignores this requirement. Deep research should catch that. The prompt has a clear research agenda with three tasks, and it asks for a specific output format at the end, a feature table, which means the report will be immediately usable, not just the wall of text. Let's choose deep research from a list of tools. I'm going to rely on search here as the main source, and I'm choosing thinking mode. And let's start. Gemini has prepared this research plan for me, and I would like to make a change here for this I click Edit plan. Next, I will type in the change that I want Gemini to make in the current plan. I want Gemini to also include a specific brand into its research. We see that the list of brands has been updated. I'm now fine with this plan, so I'm going to approve it and start the research. And in a few minutes, our report is ready. Let's walk through it together. You can see that deep research has identified the top three candidates, analyzed them across exactly the criteria I asked for, including vibrating Smart alarm system and produced the feature comparison table right here. This is the kind of output that would normally require at least an hour of tap switching, ready threads, and conflicting review sites analysis. I have it in minutes structured around my specific situation and requirements. And here is the list of strategic recommendations from Gemini. A notice because I gave it personal context upfront. The recommendations aren't generic. They filter it through my actual priorities. Value for money, no heavy subscription and slip alarm, that actually works. This is a great example of using deep research for making purchasing decisions. Instead of drowning in options, you walk away with a clear, reasoned short list. In the second part of this tutorial, we will continue exploring deep research for another use case, ACA there. 38. Deep Research in Action — Learning a New Topic: Now let's look at something a little different. Using deep research to speed up your learning when you're getting into a new subject. I have recently started studying real estate investment. I attended my first class and took some notes on the topics that we've covered there. Now I want to learn more about those topics using deep research. I can upload this photo directly into the prompt. Gemini I will read my handwritten notes, extract the key topics, and use them as the foundation for a research report. I don't need to re type anything. Let me show you how this works. First of all, let's choose deep research from the list of tools. I'm going to switch to thinking mode here, type in my prompt, and then I will attach my handwritten notes. What I love about this approach is that the research is anchored to what I have already started learning. So the report reinforces and expands on my existing knowledge rather than starting from scratch. For this, I specifically asked Gemini to refer to the key themes in my notes, when researching and drafting the report. And here is our research plan all look great to me, so I'll hit start the research. And our report is ready. You can see it picked up all the key topics from my notes and build a structured analysis around them. Definitions, context, relationships between concepts, practical implications. We can use this information as a study companion, not just a summary. But here is where it gets really interesting. Once the deep research report is ready, we can transform this wall of text into active learning tools. You will notice create a button in the top right corner of the Canvas panel. Click it and you get a drop down menu with several options for transforming the report. First, let's look at the infographics. Gemini takes the complex information like the difference between residential and industrial assets in our real estate example and turns them into a visual summary. This is perfect for a quick, high level review or for sharing one pager with a stakeholder. Let's return to our real estate investment trends report to continue the demo. Next, to ensure the information actually sticks, we can generate a quiz. Gemini creates interactive questions based specifically on the report. As you answer, it provides immediate feedback, helping you identify exactly where your understanding of a new topic might need more work. I And finally, we have flashcards. You have two ways to use this. You can generate a full deck of flashcards to review every keyterm from the report. But if you have just finished the quiz, like in our example here, Gemini can generate cards based specifically on your quiz results. It targets the areas where you struggled. Let's do this. So we see a complete learning loop here, research, understand, test yourself, and reinforce your knowledge all inside one tool, in the next video, we are going to move on from deep research and revisit it to you already know, but we'll explore its advanced use cases, specifically building AI applications. And as a heads up, we are going to use the key takeaways from one of our deep research reports as the input data, our app will be built around. And more on that in the next video. 39. Beyond Documents: What Else Can Canvas Do?: Welcome back. So in our last Canvas lecture, we focused on document drafting. How Canvas gives you a life work space to refine writing with gemini right beside you. But document drafting is really just the beginning of what Canvas can do. And you have already seen some of it without realizing it. Remember that create button that appeared after your deep research report was radium, the infographics, the quiz, the flash cards, that was Canvas. Deep research delivers its report directly inside Canvas, which is why you could transform it into all those formats without ever switching tools. Deep research and Canvas are connected by design Google built them to flow into each other seamlessly. So let's look at the full picture of what Canvas can do. The first thing Canvas can build beyond documents is web pages. And I don't mean plain HTML with some text on it. I mean structured interactive pages with information cards, charts, visual layouts, and clickable elements. Think about the last time you had to share a report or a brief with someone who wasn't going to read a wall of text. With Canvas, you can take that same content and say, turn this into a webpage or simply click on the web page button. And within seconds, you have something that actually looks like a real page. You can share it with the link. No publishing or hosting setup required. Next is infographics. If you have ever tried to explain something complex to a non technical stakeholder, a process, a comparison, decision framework, you know the challenge. Words can only do so much. Canvas can take your raw content and restructure it into a visual format. Clean sections, digestible chunks, icons, comparison side by side. And you can keep refining it in the same chart. Make the second section bigger, change the tone to be less formal, and it updates it in real time. Third, Canvas can also generate interactive quizzes and flash cards from any content you throw at it. This is useful beyond just studying. Think client on boarding, team training, product knowledge checks. You describe what you want and Canvas, build a working interactive quiz. No third party tool, no form builder, no extra steps. There is also an audio mode. Canvas can transform written content into a podcast style audio overview, conversation between two AI hosts that discuss and summarize your material. It is useful if you want to go through a long document while working or share findings with people who would rather listen than read welcome back to the Deep Dive. Today, we're unpacking a vision that feels like it's really shifting under our feet. We are moving past that old idea of a smart assistant that just sets timers or plays music. We're looking at this concept of a universal assistant. A partner that actually anticipates what you need before you even ask. And then there is the big one Canvas can build fully functional apps, working software. You describe what you want, recipe organizer, trip planner, or quiz tool, or budget tracker, and Canvas generates the code and runs it for you. Right there in the window, you don't see the code. You don't need to understand the code. You just see a working interactive app, and it is not static. You can keep chatting with Gemini to adjust it. This is what's been called vibe coding. Building software by describing what you want rather than writing code line by line. We touched on this concept in the Geni Implementation impact lecture of the course. And now we are about to see it life. Here is what I love most about Canvas in this context. It is not a separate developer tool. It's the same workspace you have already been using to write documents and outlines. The move from draft me a document to build me an app is just one conversation. In our next lecture, we are going to do exactly that. We are going to pick up right where we left off. We used deep research to finally get a clear evidence based answer on breakfast nutrition. And we are going to turn that research into a family breakfast recipe app that suggests healthy quick meals for both adults and kids. Let's go build this up. 40. Follow-Along: Building an App with Canvas - From Research to a Running App: Welcome back. Here we are building breakfast chef up, quick meals under 20 minutes, family friendly with photos of the finished meal. All inside Gemini Canvas, no code, no technical background needed. Just a good prompt and a bit of back and forth with Gemini. Let's go. To keep our workflow organized, we are going to follow four simple steps, ID Eight, build, refine, and finally share. And here is step one, IDed. This is our deep research report on breakfast nutrition. Let's brainstorm with Gemini on the idea behind the amp and what it will do. I have some initial thoughts, but I want to expand on them. I started by describing the purpose of the app. I also said that I want the app to use the research findings, and I referenced the comprehensive tireedGrocery framework from the report to emphasize that I don't need a random list of ingredients for the recipes. I want Gemini to come up with three cool features for the app, and I also suggest an Aviall look and feel for the app. I put some descriptive words here like fun, warm, approachable to give the aval direction for what I want to see. I'm looking for detailed description of the app. The concept, we can start building the actual app on. Let's hit Enter. And here we have our app description. Let's ask Gemini to make some changes into this concept. The first feature, the front loader family timer, seems to be quite complex, especially for the first version of the app. So let's ask Gemini to replace it with something more straightforward. Simple question on what kind of meal is preferred today. And I also add additional details to make sure that every time we ask for a recipe, we get a new one and that the app takes strictly the ingredients recommended in our report. So I'll hit Enter again and let's see how Gemini will incorporate those changes. And here we have the updated version of the app description. I'm good to go with this concept, but before we move to step two, build the app, we need to check our settings Look at the model selector here. You might be tempted by P. It says advanced math and code. So it sounds like the most powerful choice. But here's what I found when I tested both while building this breakfast app before. Pro actually made the process harder. It took more back and forth to get the results I wanted, and I burned through my P credits quickly, leaving me waiting a few hours before I could continue. Thinking mode got me there faster. So here is my recommendation. Always start with thinking. It is designed for step by step reasoning, which is exactly what app building requires. Working through logic, structure, and flow, save pro for when your app needs to work with a large volume of content from multiple sources, documents, videos, images, and more. Let's begin the step two, building the app. My previous tests show that if you send this request in this chat directly, Gemini won't start the building process, but send you the app concept description one more time. Yes, that's what happened this time as well. You see that instead of creating the app, Gemini just made some changes into the report itself, and that's not what we need. So to initiate the app software creation process, not just textual description, click on Create and describe your own app section, write this. Build an app based on the description above. You see that Gemini shows this command under our app description here, and it starts building it. And while Gemini is building the app, let me answer a question you are probably thinking now. What if I'm not starting from a deep research report? What if I just want to build an app from scratch? In that case, start by opening a new chat. But before you type anything, switch to Canvas mode first. Here is why. Gemini can only build and run apps inside Canvas. It is a dedicated workspace designed specifically for that. A regular chat can help you think through ideas, but it cannot actually construct a working app. Once you are in Canvas, brainstorm your app idea with Gemini. Describe what you want to build, what it is for, and what it should do. When you are ready to start building, hit the Create button, type in your prompt, and Gemini will get to work. Okay, back to our demo. And our app is ready. We begin by selecting how we are feeling today and what kind of meal we would prefer. And Gemini would suggest a healthy meal. Accordingly, we see here a list of ingredients, followed by instructions on how to prepare the meal. We have the great foa illustrating what we are about to eat. And we can also choose a kid chef mode so that we have a list of tasks for our young helpers. Pretty cute. Now let's move to the third step, refine. As you would imagine, we are not done here. We can continue iterating and enhancing our app. Let's say I want to adjust a few things. I'll type my requests in the chat. You just saw me introducing several changes into our app. When you do so, introduce one change at a time, rather than trying to include everything in one single prompt. Let me make several other changes to our app. Here is the version that I've got so far. I decided to add the possibility to include other ingredients in addition to the predefined list. And in case it happens to be from the tier three category, there will be a relevant message shown, but the recipe still will be created. I also added the possibility to save a recipe into the favorites, which are accessible here. And finally, I added the reset button in case we want to start all over again and choose different ingredients. As you can see, we have been able to make quite a lot of changes just by casually chatting with Gemini with no coding involved. I'm happy with our current progress and the user experience we have created. In the second part of this tutorial, I'll show you another way for how you can make changes in your app using the Canvas toolbar. And we will also take a look on how to share it with others. I'll see you in the second part. 41. Follow-Along: Building an App with Canvas - Refining and Sharing: Everyone. Welcome to the second part of the tutorial, where we explore how to build working software by describing what we want rather than writing code line by line, the process known as vibe coding, as promised, I want to show you another option for making changes to your app as part of our refined step. Notice this Gemini Canvas toolbar. Let's explore what it can do for us. Let's start with this sparkle icon. This is the AI feature injector. It adds EI capabilities to your app. When you click it, Gemini analyzes your current app view and suggests smart components, such as an AI storage bar or text and image generation and then it injects those elements directly into your app's logic. Let's ask Gemini to add AI features and see how it works. In the chat on the left, Gemini provides an overview of what AI features were added to the app. We can respond in the chat and ask Gemini to make additional changes. But first, let's try out these new features. Here is the magic feature number two. We see that Gemini I proposed more health ingredient instead of the one that I just selected, but I don't have it right now, so I'll just click Cancel and go ahead with these three. Here's the EI wisdom card pretty nice. And of course, let's try out how the audio narrator works. Rise and shine. Today's mission is the sunny side spinach and avocado clouds. The iron rich spinach paired with mono and saturated fats from avocado provides a clean energy boost that keep you feel nimble and refreshed. Let's make a change to one of the feature. Gemini confirms that the change has been made, so let's test it. Take a deep breath and let's start the day. Your recipe today is the Emerald Cloud Nest. The combination of iron rich spinach and mono unsaturated fats from avocado ensures a slow release of energy, keeping you feel light and airy. Wasson, we just saw how Gemini has followed our instructions, and I suggest that we return to the Gemini Canvas toolbar and explore it further. The next I can hear is the drag handle. It is used to move the atolbr so it doesn't block your app's navigation during tasting. And there is also a third icon, the refinement tool, which tells Gemini to modify a specific element of your app. You might notice, it is not visible here in our golden hour app. That's actually intentional. Gemini recognizes that this app has gone through enough iterations, so small automated edits could be risky. If it tries to tweak one element but misreads the context, it could break something else that depends on it. So it hides the icon as protective measure to demonstrate how the refinement tool works. Let's switch to a simpler app. I started building before recording this tutorial. I have only made a few iterations there, so the icon is available. Let's say that I want to change the color of this button. So I'm choosing select and ask, highlight this button, and type in my prompt, suggest another color palette. I Notice what happened here. Instead of changing just this button, color, Gemini redesigned the whole app. Why is that? It turns out the word palette is the problem here. A color palette refers to the entire set of colors used across your app. So Gemini takes that literally and updates everything to match. It's not doing anything wrong. It's just following your instructions precisely. To change only the color of this button, you need to clearly describe the scope of the change. Let me show how. I'm selecting the button again and typing in another prompt. You see that my detailed prompt has worked, and this time, Gemini I applied the changes to the element that I indicated through the refinement tool. That is a really useful thing to keep in mind. The more specific your prompt, the more precise the result. Let's come back to our golden hour app. Now that we've covered how to refine and adjust your app. Let's talk about what happens when you're happy with it. Step four, share. Once you are done, you can get a sharable link and send it to anyone. They can open and use the app directly in the browser. No downloads, no signs, no technical setup on their end. They can even remix it. That's one of the features Google has built into Canvas. Someone can take your app, open it, and create their own version from it. All right. And that's it for this tutorial, please share what apps you are working on in the Q&A section for this video. I would love to see what you're building.