Create AI Avatars: From Image to Video. Step-by-Step Guide

Bros Academy

Get unlimited access to every class

Taught by industry leaders & working professionals

Topics include illustration, design, photography, and more

Lessons in This Class

1. Introduction to AI Avatars (1:16)
2. Module 1: AI Avatars - Types, Use Cases & Choosing Your Direction (3:05)
3. Module 2: Visual Foundation - Creating Your Avatar in Practice (8:28)
4. Module 2.1: Creating Multiple Angles and Styles for Your AI Avatar (5:00)
5. Module 3: Script & Story - Writing Short Video Scenario (3:13)
6. Module 4: Turning Scripts into Speech (7:42)
7. Module 5: Comparing AI Tools (13:22)
8. Module 6: Creating Background Sound (6:48)
9. Module 7: Final Assembly in Practice - Editing the Video in CapCut (7:32)

About This Class

AI avatars are becoming a natural part of modern video content — but creating a good one takes more than simply choosing the right tool.

In this class, you’ll learn a practical, step-by-step workflow for creating AI avatars, starting from a single image and ending with a finished talking video.

We’ll focus on the core building blocks of the process:

creating consistent avatar visuals
writing short scripts that work well with AI voice and lip sync
generating natural-sounding voice
applying lip sync and subtle motion
assembling everything into a complete video

This class is based on real production experience and focuses on understanding the process, not chasing perfect results.

You’ll see how different decisions affect the final outcome and why a clear workflow matters more than any specific tool.

By the end of the class, you’ll have your own AI avatar video — and the confidence to keep experimenting and building new projects using the same approach.

AI Disclosure:

This course includes examples of content created with artificial intelligence.

Artificial intelligence tools are used to demonstrate how AI-generated voiceovers are created for short videos as part of the learning material.

The course itself is narrated by a human instructor. AI-generated voice is used only as an example within the projects shown in the course.

Meet Your Teacher

Bros Academy

Teacher

We are Bros Academy, a creative duo combining the worlds of cryptocurrency and AI content creation.

With six to eight years of experience in the dynamic world of crypto, we've invested in digital assets, explored Web3 games, and been active members of global crypto communities. Our passion for blockchain technology and decentralized finance (DeFi) has led us to create practical, beginner-friendly courses for those entering the space.

In addition to our crypto background, we are also AI content creators. We craft engaging AI-generated ads, animated shorts, and visual stories using the latest generative tools. From storytelling to marketing, we love experimenting with how artificial intelligence can be used creatively and commercially.

Through our courses, we share hands-...

Related Skills

ChatGPT, AI for Film & Video, AI & Innovation, AI for Marketing & Business

Level: Beginner

Hands-on Class Project

Project: Create Your First AI Avatar Video

In this class, your project is to create a short AI avatar video using the workflow shown throughout the lessons.

You don’t need to aim for a perfect or polished result.
The goal of this project is to understand the process and complete all key steps from start to finish.

What you’ll create

You’ll create:

a consistent AI avatar (realistic or stylized)
a short script (around 20–30 seconds)
an AI-generated voice-over
a lip-synced avatar clip
a short final video where everything comes together

Project steps

You can follow these steps at your own pace:

1. Choose your avatar direction: decide whether you want a realistic or stylized character.
2. Create your avatar image: generate a base image and check visual consistency.
3. Write a short script: keep it simple and suitable for AI voice and lip sync.
4. Generate the voice-over: create clean audio using an AI voice tool.
5. Apply lip sync: test one or more tools and choose the result you like best.
6. Assemble the final video: combine visuals, voice, and motion into a short video.
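For the script-writing step, a rough duration estimate can tell you whether a draft is in the 20–30 second range before you spend any voice credits. The sketch below is a hypothetical helper, not one of the tools used in the class. It assumes a speaking rate of about 150 words per minute, which is a common rule of thumb and not a figure from this course, so adjust `WORDS_PER_MINUTE` to match the voice you choose.

```python
# Rough check that a script fits the 20-30 second target from the project brief.
# ASSUMPTION: about 150 words per minute is a common rule of thumb for
# conversational speech; AI voices vary, so this is only an estimate.
WORDS_PER_MINUTE = 150


def estimate_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration in seconds from the word count."""
    words = len(script.split())
    return words * 60 / wpm


def fits_target(script: str, low: float = 20, high: float = 30) -> bool:
    """True if the estimated duration falls within [low, high] seconds."""
    return low <= estimate_seconds(script) <= high


hook = "In 2026, AI won't replace you. Someone using AI will."
print(estimate_seconds(hook))  # 10 words -> 4.0 seconds
print(fits_target(hook))       # False: a hook alone is far too short
```

Counting words rather than characters keeps the estimate independent of punctuation. Always confirm the real length by listening to the generated audio.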

What to share in the Project Gallery

You can share:

a screenshot of your avatar image
a short video clip or final export
or even just notes about what worked and what didn’t

Sharing is optional, but highly encouraged — seeing different approaches can be very helpful for other students.

Tips for this project

Don’t worry about perfection
Focus on understanding the workflow
Small experiments are more valuable than perfect results
It’s okay if your first result feels rough — that’s part of the process

Final note

This project is meant to help you feel more confident experimenting with AI avatars.

There’s no single “right” result — only progress and learning through practice.

Transcripts

1. Introduction to AI Avatars: If you have been curious about AI avatars but feel overwhelmed by tools, demos, and mixed results, you are in the right place. This course is created by Bros Academy based on real production experience from Bros AI Studio. In our studio, we don't just experiment with AI. We use it to create AI avatars, animated characters, fully AI-driven cartoons, music video clips, and commercial advertising videos for real use cases. AI avatars don't fail because of bad tools. They fail because of an unclear workflow. This course is not a collection of random features or demos. It's a practical end-to-end process that we actually use in our own projects, from choosing a character to creating a talking, moving AI avatar video. You will see how we design consistent characters, write short scripts that work for AI, generate natural voices, and apply lip sync. No hype, no one-click magic, just what works and why. This course will not promise perfect results, viral videos, or instant income. What it will give you is a clear structure, a realistic starting point, and the confidence to experiment without guessing. By the end, you won't just have a finished AI avatar video. You will understand how and why it was made. If you want a grounded, practical introduction to AI avatars used in real projects, this course is for you. Let's get started.

2. Module 1: AI Avatars - Types, Use Cases & Choosing Your Direction: Module one, AI avatars: types, use cases, and choosing your direction. Before we start working with tools and visuals, let's take a step back and talk about AI avatars in general. In this module, we'll look at what AI avatars are, where they are used, and how to choose the right direction without overthinking it. When we talk about AI avatars, we mean digital characters created with the help of AI tools. These avatars can represent real people, or they can be completely fictional characters.
The main purpose is communication: to explain something, tell a story, or deliver a message. AI avatars are already used in many different areas. You will often see them in social media content, educational videos, marketing and advertising, and even presentations or storytelling projects. They are flexible tools, and their role depends on how you want to use them. There are several main types of AI avatars. Some are realistic and based on real photos, others are stylized, cartoon-like, or semi-realistic, and some are fully fictional characters. To be honest, there is no single correct type. Each approach has its own strengths. Realistic avatars are usually created from real photos. They can feel very personal and engaging, especially when representing a real person. At the same time, they come with higher expectations for realism and require more control and consistency. Fictional or stylized avatars are not based on real people. They offer more creative freedom and are often easier to keep visually consistent. They are also more forgiving when it comes to motion, lip sync, and small imperfections. Before choosing an avatar, it's important to ask yourself a few simple questions. Why do you need this avatar? Where will it be used? Do you want it to represent you or a character? And how realistic does it really need to be? One important thing to remember is that you are not locked into one choice. You can always change your avatar later or create more than one. What matters most is not making a perfect decision, but gaining experience by actually trying. In this course, we will show you both approaches in practice. We will create a realistic avatar based on real photos, and we will also generate a fictional avatar from scratch. This way, you will understand how each approach works and which one fits your goals best. All right, that's the end of this module. Let's quickly summarize what we have learned.
In this module, we have learned what AI avatars are, where they are used, and the main types you will encounter. You have also seen how to think about choosing an avatar direction without pressure or fear of making the wrong choice. In the next module, we will move from concepts to hands-on practice. You will see how AI avatars are actually created using different tools and approaches. We will work with real photos, generate fictional characters, and focus on building a consistent visual foundation that you can reuse later for video and animation.

3. Module 2: Visual Foundation - Creating Your Avatar in Practice: Module two, visual foundation: creating your avatar in practice. Welcome to Module two. In this module, we finally move from theory to practice. I will walk you through how we create AI avatars step by step using different tools and setups. We are not aiming for perfection here. The goal is to understand the process and learn how to create avatars that are consistent and usable in real projects. We'll see a few common approaches, from realistic avatars based on real photos to more stylized, cartoon-like characters. We'll use different tools along the way, but don't get too attached to any specific one. The tools are just examples; the workflow is what really matters. As you watch, try to notice the small decisions: lighting, angles, prompt details. Those little things often make a bigger difference than you would expect. We'll also spend time on angles and consistency, creating multiple views of the same character and preparing images for video or animation. And don't stress about remembering everything. Focus on understanding why things are done a certain way. You can always come back later when you start building your own avatar. All right, let's start with a very powerful tool that I use a lot: Higgsfield AI.
Higgsfield is a generative AI platform that brings multiple image and video models together in one interface. Instead of being limited to a single model, it lets you experiment with different models, cinematic tools, and generation modes depending on what you are trying to create. You can generate images and videos, experiment with cinematic shots, control camera angles, create scene variations, and explore different visual styles. That's why it's a great tool for creative production, marketing, and visual storytelling. To start creating our avatar, the first thing we need is a good prompt. For that, I usually go straight to ChatGPT. You can also use other models, such as Gemini, to help generate the prompt. In this example, I want to create a super-realistic avatar of myself using my own photos as a reference. I want the avatar to be recognizable, so I'll keep some clear visual details, like wearing a blue hoodie. I also want a studio microphone in front of me and maybe some green plants in the background. I usually use photos like this. Remember that your reference image should show just you in the frame, preferably a portrait or a close-up, in good quality, and, most importantly, with your face looking straight at the camera. Once you've added your reference image, let the AI do its magic, and let's wait for the prompt. All right, let's check what ChatGPT gave us. Let's take a quick look at the prompt. If everything looks good, just copy it, and let's get back to Higgsfield AI. The first way to create your avatar is by training it on your own photos with Higgsfield Soul. This is actually my wife's favorite option right now. To get started, click Image in the top left corner and select the Higgsfield Soul model. My wife and I already have our Soul avatars trained, so I won't go through the full process from scratch here. But what you need to do is click Generate New and upload around 15 to 20 photos of yourself.
Keep in mind that it's best to include a mix of close-up shots and full-body photos. This helps the model learn your face and body in different positions and angles. For my avatar training, I used the same photos I showed you earlier. Try to use similar images of yourself: good lighting, a clear face, and no other people in the frame. The training process takes a bit of time. From what I remember, it usually takes about 10 or 20 minutes, so just be patient here. Once your avatar is ready, select it and paste the prompt we got from ChatGPT. Since in this course we are creating YouTube Shorts and TikTok-style videos, let's choose the 9:16 aspect ratio. For resolution, I usually go with the highest available option. Right now, it's 2K. Just a quick note about credits: generating four images costs two credits, and honestly, that's not much, especially compared to other tools. And remember when I said Higgsfield is an AI model aggregator? That's one of its biggest advantages. You can switch between different models while paying for one subscription instead of juggling five separate tools, which would be way more expensive. Next up is my personal favorite at the moment, Nano Banana Pro. Here we will use the same prompt and the same reference image and generate the avatar again. I will generate two images using Nano Banana Pro. At the moment, one image in 2K resolution costs two credits, so keep that in mind when you are testing things. After that, let's switch to Seedream 4.5. Same setup again: same prompt and same reference image. I think by now you are starting to see what we are doing here. The idea is to show you several different image generation models using exactly the same input. This gives you more options to choose from and makes it easier to compare results. You might like the look of one model more than another, and that's totally fine. The goal is to understand how to test, compare, and choose the model that works best for you. Next, I want to show you Kling AI.
You can actually use Kling models inside Higgsfield too. But when I'm specifically working with Kling, I usually go straight to the Kling website and generate images there. I will explain why I prefer that a bit later; it will make sense once you see the workflow. For now, let's generate our avatar using the O1 model, or whatever the latest version available on the site is. On the left side of the screen, click the O1 button, then switch to image generation. Paste the prompt and upload your reference photo. If you have a Kling subscription, you can generate images for free. I have a subscription, so I'm going to select the free generation option by clicking this button. I will set four outputs, but you can choose up to nine if you want more variations. For resolution, let's go with 2K. See, with a subscription, it shows zero credits. If you don't have a subscription, generating one image usually costs about one credit. All right, if everything looks good, let's hit Generate. While the images are generating, I want to explain why I often use Kling AI directly instead of running it through Higgsfield, even though having one subscription can be cheaper. The reason is pretty simple. At Bros Academy, we are active creators, and we generate a lot of images and videos, both for ourselves and for clients. That means we need a lot of credits. Kling AI has a really nice system that lets you earn credits for free. Let me show you our profile on Kling. We post our work here regularly and sometimes participate in different contests. Just posting your creations doesn't earn you credits by itself, but you do earn credits when someone recreates your images or videos. On top of that, you get a small commission from each recreation. People can also like your work and follow you if they enjoy your style, so if you want to, you can actually build an audience inside Kling as well. For example, yesterday I earned 160 free credits. Not my best result, but that's totally fine.
Some days are better than others. Over the past few months, I have earned more than 32,000 free credits this way. If I were to buy those credits, it would cost roughly 400 bucks. On top of that, I have earned about 50,000 more free credits from other activities Kling offered in the past. Kling also has a referral program. At the time I'm recording this, if you use my referral link, we both get 500 free credits for your first generations. It's a win-win for both of us. I will put the link in the description. All right, let's take a look at the results that Kling generated. I actually really like how this one turned out. Before downloading the image, I recommend upscaling it to get the best possible quality. It's a small step, but it makes a noticeable difference. Now let's get back to Higgsfield AI and check what the different models produced there. Personally, my favorite result here is from Seedream 4.5. But in your case, the best result might come from a different model, and that's totally fine. The goal here isn't to pick the correct model. My goal is simply to show you the options so you can choose what works best for you. Go ahead and download the image you like the most. And that's it for now. We just created our first realistic avatar based on a photo reference. We tested several models and picked the one we liked the most. In the next module, we will take this avatar further. We will generate more images of the same character, but from different angles, so our video feels more dynamic and natural. And if you'd prefer a more cartoon-style avatar, don't worry. I will also show you how to turn your realistic avatar into a Pixar-style character in just a few clicks. See you in the next module.

4. Module 2.1: Creating Multiple Angles and Styles for Your AI Avatar: Module 2.1, creating multiple angles and styles for your AI avatar. In this module, we will take your avatar a step further.
We will focus on creating multiple angles of the same character, so your video feels more natural and dynamic. You will also see how to turn your realistic avatar into a more stylized, Pixar-style version, if that's the direction you want to explore. All right, let's get back to Higgsfield and choose the Nano Banana Pro model. I have already picked the image I like the most and downloaded it. Now I want to generate different camera angles for this image while keeping the same character and the same environment as in the reference. Let's select our image and update the prompt. Make sure that the 9:16 aspect ratio is selected. Then click Generate. I will generate four variations, but feel free to experiment with fewer or more outputs and see what works best for you. All right, as you can see, we got a few different options here. They're not perfect, but that's all right. In some cases, it helps to be more specific in the prompt and clearly describe the camera angle you want. But for now, I'm mainly showing you the workflow and the options that are available. Next, I want to show you another tool inside Higgsfield AI that is based on the Nano Banana Pro model. It's called Shots. Go to the top of the screen, click Apps, and find the Shots app. This app generates nine different camera angles from a single uploaded image. We use it quite often, so let me quickly show you a few examples of work that was generated with it. It's a great tool for telling your story in a more cinematic way. A lot of people use avatars with just one image or one camera angle. If you want to stand out, it helps to do something a bit more interesting, or at least understand how it's done. All right, let's upload our image. Give it a moment, then double-check the aspect ratio. In our case, 9:16 works perfectly. The generation costs four credits. Let's click Generate and wait about one or two minutes for the results. All right, we have nine shots here. I like a few of them, so I'm going to pick four and upscale those.
Each image upscale costs two credits. You can also download the images without upscaling if you don't need it, or upscale them up to 4x, which costs more credits, of course. All right, let's start the upscaling. It usually takes about two or three minutes, so let's be patient and wait. Okay, now we can see all the images we got. Let's download them and save everything into a separate folder. Later, maybe in the next modules, I will generate a few more angles where the avatar is looking straight at the camera, which is missing from these shots. Now, remember I said I would show you how to turn your avatar into a cartoon-style character, just in case you want that look. You can generate a cartoon avatar from scratch using prompts from ChatGPT, of course, but today we will keep it simple. We will use Nano Banana Pro in Higgsfield with a short prompt. I'm going to use the same reference image and ask for a Pixar-style version while keeping the character features and environment as close as possible to the original. Let's generate it, and then we will jump into Kling AI to compare the results. That way, you will see how different tools handle the exact same task, and you can choose what you like best. All right, in Kling, select the O1 model and switch to image generation. Upload the reference image, paste the same prompt we used in Higgsfield, and set four outputs. Now let's hit Generate. Let's get back to Higgsfield AI and take a look at what we got. So what do you think, guys? Personally, I think this looks really good. At this stage, you can easily experiment by changing things like clothing, hairstyle, or small details just by tweaking the prompt and using the image as a reference. It's a very flexible setup. Now let's check what we got from Kling AI. Hmm. In my opinion, the results look better with the Nano Banana Pro model. The Kling version feels a bit too simple and cartoony for my taste. That said, you might feel differently, and that's totally fine.
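Because several steps in Modules 2 and 2.1 are billed in credits, a quick budget can help before you start generating. The sketch below uses only the prices quoted in the lessons at the time of recording: Higgsfield Soul at two credits for four images, Nano Banana Pro at two credits per 2K image, Shots at four credits per run, and two credits per upscale. It also uses the rough estimate of about 400 dollars for 32,000 Kling credits. Pricing changes, so treat the dictionary keys and numbers as hypothetical placeholders, not values from any platform API. Kling and Higgsfield credits are separate currencies, so the dollar figure applies to Kling only.

```python
# Rough credit budget for the image steps shown in Modules 2 and 2.1.
# Prices are the figures quoted in the lessons at recording time; they are
# placeholders that will go stale, not values read from any platform API.
PRICES = {
    "soul_4_images": 2,       # Higgsfield Soul: four images for two credits
    "nano_banana_pro_2k": 2,  # Nano Banana Pro: one 2K image for two credits
    "shots_run": 4,           # Shots app: nine camera angles for four credits
    "upscale": 2,             # one upscale at the default setting: two credits
}


def credits_needed(plan: dict[str, int], prices: dict[str, int] = PRICES) -> int:
    """Return the total credits for a plan mapping action -> quantity."""
    return sum(prices[action] * qty for action, qty in plan.items())


# Module 2.1 Shots workflow: one Shots run, then upscale the four favorite frames.
shots_plan = {"shots_run": 1, "upscale": 4}
print(credits_needed(shots_plan))  # 4 + 4 * 2 = 12

# Module 2 comparison on Higgsfield: one Soul batch plus two Nano Banana Pro images.
compare_plan = {"soul_4_images": 1, "nano_banana_pro_2k": 2}
print(credits_needed(compare_plan))  # 2 + 2 * 2 = 6

# Kling credits only (not Higgsfield): about $400 for 32,000 credits.
usd_per_kling_credit = 400 / 32_000
print(f"${usd_per_kling_credit:.4f} per Kling credit")
```

Keeping the prices in one dictionary means you only need to update a single place when a platform changes its rates.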
There is no single best result here. My goal is simply to show you different options so you can test them yourself and decide what works best for your style and your projects. All right, that's it for this module. Let's quickly summarize what we have learned. You learned how to expand a single avatar image into multiple views and styles. We explored how to generate different camera angles while keeping the same character and environment, and why this matters for creating more dynamic and natural-looking videos. You also saw how different AI models can produce very different results, even when using the same prompt and reference image, and why there is no single right choice. Finally, we looked at how a realistic avatar can be transformed into a more cartoon-style version and how to compare results across different tools. The key takeaway here is the workflow: testing options, comparing results, and choosing what fits your project and your taste. In the next module, we'll focus on story and structure. We'll use ChatGPT to come up with a simple script and turn our idea into a short video scenario. You will see how to go from a rough concept to a clear, usable script that works well for short-form video. See you in the next module.

5. Module 3: Script & Story - Writing Short Video Scenario: Module three, script and story: writing a short video scenario. Hey, everyone, and welcome to Module three. In this module, we are going to start playing with ideas for our video using the avatar we have already created. We will talk about AI, but don't worry, not in a boring or super technical way. The goal is to come up with ideas that are interesting even for people who aren't really into AI. We will use ChatGPT to help us brainstorm. We will tell it that we want to create a YouTube Shorts-style video and ask for ten fun, engaging ideas around AI. Let's see what it comes up with and pick something we like.
All right, let's take a look at what ChatGPT came up with. I read through all the ideas, and just keep in mind, you don't have to stick to AI as a topic. You can take any subject you like and use the same approach to create videos. The main thing is the workflow. Once you finish this course, you will have a clear way to go from an idea to a finished short video. After that, everything else depends on your imagination. Uh-huh. One idea really stood out to me: “AI won't replace you, someone using AI will.” It sounds a bit provocative, which is exactly what we need. Let's tweak it slightly and add “in 2026” to the idea. Now I will ask ChatGPT to write a 20–30-second script with a strong hook in the first three seconds, so people don't just scroll past the video. Let's see what ChatGPT gives us. All right, let's read what we got. ChatGPT actually generated a script broken down second by second, which is super helpful, especially for short videos. If this version already feels good to you, that's totally fine. You can stop right here and move forward with it. But here's a small trick when working with ChatGPT: you usually get better results if you give it a role. In our case, I want ChatGPT to imagine that it's a YouTuber with ten years of experience and a skilled public speaker. Then I will ask it to rewrite the script from that perspective. Now let's see what kind of results we get and compare them with the previous version. All right, let's read the result. Personally, I like this version more. I really like how the script starts with “In 2026, AI won't replace you.” At that moment, the viewer can relax a bit, because a lot of people are genuinely afraid that AI will replace them in the future. And then, just a few seconds later, the avatar says, “Someone using AI will.” That's where the feeling flips. The viewer might think, wait, what? What do you mean? And that curiosity makes them want to keep watching. The video also ends with a provocative question, which is great.
It can motivate people to leave a comment or react to the video, and that naturally helps with engagement and the algorithm. Let's copy the script and paste it into a separate Google Docs file to keep everything organized. We will come back to this document in the next module. And that's it for this module. Take a short break, grab a coffee, or do a few push-ups, reset a bit, then get ready for the next module, where we'll turn the script into a voiceover. I will show you how to do that in ElevenLabs in a way where most people won't even realize it's an AI-generated voice. See you in the next module.

6. Module 4: Turning Scripts into Speech: Module four, voice generation: turning scripts into speech. Hey, everyone, and welcome to Module four. In the previous module, we focused on writing a script that works well for AI avatars. Now it's time to give the script a voice. In this module, we will look at how to turn your text into natural-sounding speech using AI voice generation. This is a very important step, because voice plays a huge role in how believable and comfortable your avatar feels. We will be using ElevenLabs for this part of the workflow. ElevenLabs is an AI voice platform that lets you generate realistic speech from text, work with different voice styles, and control how the voice sounds and delivers your script. It supports things like text-to-speech generation, voice libraries with different tones and personalities, voice design, and tools for longer content like audiobooks or videos. You don't need to use every feature. We will focus on what's actually useful for AI avatars and video voiceovers. You can sign up for ElevenLabs for free using your Google account. Every month you get 10,000 free credits for voice generation. For most beginners, this is more than enough to get started.
In practice, that amount of credits is usually enough to create around five or six videos similar to the one we are building in this course, so you don't need to worry about paying for anything right away. You can follow along, test the workflow, and see how everything works using the free plan. In our past projects, we have used ElevenLabs in many different contexts. We have used it to record voiceovers for online courses, to create dialogues and narration for AI cartoons, to voice YouTube videos and short-form content, and to produce clean, consistent audio for different types of videos. We are not going to cover all of these use cases in detail here; I mention them to give you context and to show how flexible this tool can be in real projects. This module will focus only on what you need right now: using ElevenLabs to turn your script into clear, natural-sounding speech that works well with lip sync and animation. You will also find a link to ElevenLabs in the course resource document with useful links attached to this course. As always, don't worry about memorizing every setting. Focus on understanding the process: how to choose a voice and how small changes in text or delivery affect the final result. By the end of this module, you'll be able to confidently turn your script into spoken audio that's ready to be used with your AI avatar. Before we start generating the voiceover, let's take a quick look at the voice library in ElevenLabs. On the left side of the screen, click Voices. Here you can explore a large variety of voices that are already available. ElevenLabs also gives you the option to clone your own voice. You can upload an audio sample of your voice and generate voiceovers without recording every time. You will notice that voices are organized in different ways. You can browse by language or by style or use case, for example, narration, social media, or advertising. You can also filter voices by gender, age, and other characteristics.
For our case, we are looking for something closer to a social media or advertising-style voice. I have already chosen a voice for my avatar, but don't feel like you need to use the same one. Take a moment to explore the filters, listen to a few options, and choose the voice you like the most. At the end of the day, this part is very subjective; it's mostly a matter of taste. Now let's move on and start generating our voiceover. On the left side of the screen, click Text to Speech. As I said, I have already chosen a voice for this project. It's Alex, a young American male voice. Here you can also choose the voice generation model. All the available models are good, and each one is designed for slightly different purposes. For my avatar, I'm going to use the v3 model. At the time I'm recording this, it's the latest model available. If you are going through this course later and see newer models added, a good general rule is to try the latest one first. In most cases, newer models offer better quality, more natural delivery, or improved lip sync behavior. Now let's get back to Google Docs, where we saved our video script. From here, we can simply copy the text and paste it into ElevenLabs to generate the voiceover. In my case, this script is less than 900 characters long, and ElevenLabs allows you to generate up to 5,000 characters at once. Technically, we could generate the entire voiceover as a single audio file. However, there is an important thing to keep in mind. Later, we'll be using this audio for lip sync, and we don't want our avatar to be talking for the entire length of the video. We'll apply lip sync only to specific parts of the video, using the different camera angles we prepared earlier. In some sections, we'll also cover the avatar with stock footage or additional visuals. Another reason for this approach is cost. Generating long lip sync videos can consume quite a lot of credits. If you are making just one video, that might be totally fine.
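In the lesson, the script is split into parts by hand before pasting each one into ElevenLabs. As an illustration only, the hypothetical helper below groups sentences into short parts and checks each part against the 5,000-character per-generation limit mentioned in this lesson. The naive sentence regex, the default of three sentences per part, and the `part_01.wav` naming scheme are all assumptions. In practice, you would choose split points by hand to match your camera angles and stock-footage cutaways.

```python
import re

# ASSUMPTION: 5,000 characters is the per-generation limit quoted in the lesson;
# check the current limit for your ElevenLabs plan and model.
MAX_CHARS = 5_000


def split_script(script: str, max_sentences: int = 3) -> list[str]:
    """Group sentences into parts of at most `max_sentences` sentences each.

    Sentence boundaries are detected naively: '.', '!' or '?' followed by
    whitespace. Abbreviations like "e.g. " will be split incorrectly.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    parts = [
        " ".join(sentences[i : i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
    for part in parts:
        if len(part) > MAX_CHARS:
            raise ValueError(f"part is {len(part)} chars, over the {MAX_CHARS} limit")
    return parts


script = (
    "In 2026, AI won't replace you. That line gets repeated a lot. "
    "But here's the part people skip. Someone using AI will. "
    "Same job, same title. Very different results."
)

# Hypothetical naming scheme: one WAV file per generated part.
for n, part in enumerate(split_script(script), start=1):
    print(f"part_{n:02d}.wav <- {part}")
```

With these lines from the class script, the helper produces the same two parts that are generated separately in this lesson. Numbered file names also keep the clips in order for the editing stage.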
But if you plan to create many videos, splitting your voiceover into smaller parts can help you save a significant amount of credits. So for now, let's generate the first part of the voiceover and listen to the result. In 2026, AI won't replace you. That line gets repeated a lot. But here's the part people skip. In 2026, AI won't replace you. That line gets repeated a lot. But here's the part people skip. With the v3 model, ElevenLabs usually gives you two different variations to choose from. Don't worry about the sound quality right now. I'm recording my screen, so the audio you hear is compressed. Once you download the file, you will hear how good the final quality actually is. Listen to both options, choose the one you like most, and then download it. Now let's take the next part of our script, paste it into ElevenLabs, and hit Generate. As you can see, the generation process is very fast. Someone using AI will. Same job, same title. Very different results. Someone using AI will. Same job, same title, very different results. Once you've chosen the option you like most, download the audio file and move on to the next part of the script. You simply repeat the same process until all parts of your script are generated. I'm not going to record every single repetition here. The goal of this module is to show you the workflow and help you understand how to approach voice generation, not to waste your time watching the same steps over and over again. I will finish generating the remaining parts in the background, and then we will move on.

All right, now all the parts of our script have been turned into voiceovers. I have downloaded all the files into a separate folder just to keep everything organized and easy to work with later. One important thing to notice here: I downloaded all the files in WAV format. The reason is simple.
WAV is uncompressed, so it keeps the full quality of the generated audio, which is especially important when you use this audio for lip sync and animation. Starting with high-quality audio helps avoid problems later and gives better final results. All right, at this point, we have turned our script into voice. You have seen how to choose a voice, how to work with generation models, and how to prepare clean audio files that are ready for the next step. You also learned why it often makes sense to split a script into smaller parts and how this approach can save time and credits and give you more flexibility later. Most importantly, you now have high-quality voice files that work well for animation and lip sync. In the next module, we will take these voice files and move on to lip sync. You will see how to apply lip sync in practice using different AI tools and how the same audio can produce different results depending on the workflow. We'll compare two tools side by side and talk honestly about what works well, what doesn't, and what to pay attention to when choosing a lip sync solution. When you are ready, let's move on to the next module.

7. Module 5: Comparing AI Tools - Comparing AI Tools: Module five, lip sync in practice. Hey, everyone, and welcome to Module five. In this module, we will take the voiceovers from the previous module and make lip sync videos in practice. We will apply the same audio to the same character using two AI tools, Kling Avatar and HeyGen, so we can clearly compare how each one handles lip sync and movement. The goal here isn't to find a perfect tool, but to understand the differences and choose the lip sync result that works best for our video. By the end of this module, we will select the final version and use it in the final editing stage. Let's get back to Google Docs, where we saved our script. In the previous module, we generated the voiceover in ElevenLabs by working with the script in smaller parts.
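Before moving on to lip sync, it can help to check the downloaded voiceover parts, for example their durations, since each lip sync generation has to cover one part. A minimal sketch using only Python's standard `wave` module, assuming the files are uncompressed PCM WAV files in a single folder (the folder name `voiceover_parts` is hypothetical):

```python
import wave
from pathlib import Path


def describe_wav(path: Path) -> dict:
    """Read basic properties of an uncompressed PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": path.name,
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "seconds": frames / rate,
        }


def describe_folder(folder: str) -> list[dict]:
    """Describe every .wav file in a folder, sorted by file name."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return []
    return [describe_wav(p) for p in sorted(folder_path.glob("*.wav"))]


if __name__ == "__main__":
    # "voiceover_parts" is a hypothetical folder name; use your own.
    for info in describe_folder("voiceover_parts"):
        print(f'{info["file"]}: {info["seconds"]:.2f} s, '
              f'{info["sample_rate"]} Hz, {info["bit_depth"]}-bit, '
              f'{info["channels"]} channel(s)')
```

If a file uses a WAV variant the `wave` module cannot read, it raises `wave.Error`; in that case an audio editor or a dedicated audio library is the simpler way to inspect it.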
For each audio piece, I noted which avatar shot or stock footage goes with it, so we basically ended up with a simple written storyboard. I planned the order of the avatar shots to keep things moving and avoid staying on one angle for too long. I also generated a few extra angles and picked the ones I liked most. For this video, I will use three main angles: a front view, a slight side angle, and a slightly top-down shot. We will switch between them to keep the video feeling dynamic and engaging. Now let's move over to the Kling AI website. In the top left corner, click AI Tools. As I mentioned earlier, Kling offers a wide range of tools and models, but for this lesson, we're only interested in Avatar 2.0, or simply the latest Avatar model available if you are watching this course later. Kling releases updates quite often, so using the newest version is usually a safe choice. You'll also see that Kling offers a set of premade avatars. They can be useful for quick tests or short-term tasks, but for this project, we are taking a more professional approach and using our own custom avatar. Click Upload Image on the left and upload your avatar image. Once the image is loaded, upload your first voiceover by clicking Upload Audio. Before generating the video, you can choose the output resolution: HD or Full HD. I usually go with the highest option available. It costs more credits, of course, but it gives you the best possible quality, which is especially important for close-up talking avatars. One Full HD lip sync generation costs 48 credits, which is not cheap, but lip sync is generally slightly more expensive than regular video generation. Below, you will see an option to add a prompt if you want to describe the avatar's behavior in more detail. In many cases, Kling automatically suggests a prompt, and from my experience, it actually works quite well.
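The written storyboard described here maps naturally onto a small data structure. The sketch below is hypothetical (the file names and angle labels are made up for illustration): it lists which segments need a lip sync generation, flags places where the same angle is used twice in a row, and estimates credits using the 48-credit Full HD price quoted in this lesson, assuming exactly one generation per avatar segment and no retries.

```python
from dataclasses import dataclass

# Price quoted in this lesson for one Full HD Kling Avatar lip sync generation.
# Treat it as an assumption: credit prices change over time.
CREDITS_PER_LIPSYNC = 48


@dataclass
class Segment:
    audio_file: str  # hypothetical voiceover file name, e.g. "part_01.wav"
    visual: str      # avatar angle ("front", "side", "top_down") or "stock"


def lipsync_jobs(storyboard: list[Segment]) -> list[Segment]:
    """Segments that show the avatar and therefore need a lip sync generation."""
    return [seg for seg in storyboard if seg.visual != "stock"]


def repeated_angles(storyboard: list[Segment]) -> list[int]:
    """Indexes of avatar segments that reuse the previous segment's angle."""
    return [
        i
        for i in range(1, len(storyboard))
        if storyboard[i].visual != "stock"
        and storyboard[i].visual == storyboard[i - 1].visual
    ]


def estimate_credits(storyboard: list[Segment], per_job: int = CREDITS_PER_LIPSYNC) -> int:
    """Assumes exactly one generation per avatar segment (no retries)."""
    return len(lipsync_jobs(storyboard)) * per_job


if __name__ == "__main__":
    storyboard = [
        Segment("part_01.wav", "front"),
        Segment("part_02.wav", "stock"),
        Segment("part_03.wav", "side"),
        Segment("part_04.wav", "top_down"),
        Segment("part_05.wav", "stock"),
        Segment("part_06.wav", "front"),
    ]
    print("lip sync jobs:", [s.audio_file for s in lipsync_jobs(storyboard)])
    print("repeated angles at:", repeated_angles(storyboard))
    print("estimated credits:", estimate_credits(storyboard))
```

Counting only avatar segments reflects the cost argument made earlier in the course: stock-footage segments never need a lip sync generation, so every part you cover with other visuals lowers the total.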
To keep things simple, we will use the suggested prompt and see how the results turn out. All right, let's check which avatar angle comes next in our storyboard. Now we go back to Kling, upload the next part of our audio, click Upload Image, select that avatar angle, and upload it. Then we repeat the same process for the remaining audio parts. All right, everything is set. All the pieces of our puzzle are now being generated. All that's left to do is wait for the results and see how everything turned out. All right, our videos are generated, so let's take a look at what we got. In 2026, AI won't replace you. That line gets repeated a lot, but here's the part people skip. So what do you think? Compared to the previous version, Kling has clearly improved the realism of the avatar's emotions. It already feels much better, and I think it will only keep improving from here. Personally, I like the result. The dialogue feels alive, not stiff or plastic. Let's move on and check the next generation. Someone using AI will. Same job. Same title. Very different results. Someone using AI will. Same job, same title. Very different results. This one also looks really good. I don't see any noticeable defects or artifacts here, so I think we can safely keep it. Let's continue and see what we got next. A developer without AI writes everything from scratch. A developer with AI ships faster, fixes bugs earlier, and focuses on real problems. Hm. This one I like a bit less. During the head turn, it feels like the head becomes slightly smaller. It's not a critical issue, so for now I will keep it, but it's something to be aware of. Let's move on. So, in 2026, the question isn't will AI take my job. So in 2026, the question isn't will AI take my job. This generation looks great. I will definitely keep this one and move forward. The question is... The question is... All right, and here we have a quick shot that works well as a transition. That fits our video perfectly.
Will you be the one using it or the one competing against it? Which side are you on? All right. Now let's save all the clips into one folder. As you can see, applying lip sync to our avatar is a pretty straightforward process. Next, we will do the same thing in another application, HeyGen, which is currently one of the leading lip sync tools. We will compare the results and then choose the best shots for our final video. HeyGen is an AI platform focused on creating talking avatars and lip sync videos. It's widely used for educational content, marketing videos, and social media, and it's known for stable lip sync results and an easy-to-use workflow. The last time we used HeyGen was about four months ago. Back then, we used it to create short videos for YouTube and TikTok, as well as short films. Since then, HeyGen has released a new model, and that's exactly what we are going to test today. HeyGen limits how many avatars you can create depending on your subscription, and additional avatars require extra payment. I have already reached the limit on my account, so my wife registered a separate account. That way, we can properly show you the full avatar creation process inside HeyGen and walk through it step by step. HeyGen offers several subscription plans, including a free one. With the free plan, you can make a few generations and get a feel for how the service works before committing to anything. For this lesson, we are using the 25 euro plan, mainly to properly test the new model and show you the process, and also because three free videos wouldn't be enough for today's example. We also want to get the best possible quality. That said, your setup might be different. In some cases, a Kling subscription alone might already be enough. It really depends on your needs and workflow. All right, now let's move on to creating our avatar. On the left side of the screen, click Avatars. Here, you will see that HeyGen gives you two main options to choose from.
You can either clone a real person, for example, yourself, or create a virtual character from an image. In our case, we'll go with the second option, creating a virtual character from an image, since that fits our workflow best. Next, we upload our avatar image. As you can see, HeyGen shows examples of which images work best, but our avatars are perfectly suitable for lip sync, so there is nothing to worry about here. Click Upload and move on to the next step. Here we enter the basic information for our avatar. There is nothing special here, so you don't need to spend much time on this part. Our avatar is now created. To add a voice, click on the avatar, and then click Create Video. You will see many different options here. You can use voices from HeyGen's library; as far as I remember, they recently partnered with ElevenLabs, which we used for our voiceover. But since our audio is already ready, we'll upload our own file. Click Upload Audio in the top left corner. Upload the first audio file and check how it sounds. In 2026, AI won't replace you. That line gets repeated a lot, but here's the part people skip. If everything sounds good, click Add Audio. Once the audio is added, go to the top right corner, click Generate Video, make sure all the settings are set to maximum quality and that there is no watermark, and then click Submit. All right, while the video is generating, we can move on and create the next lip sync. Here I select the sixth audio file, which I marked in Google Docs as the one that should be used with this avatar. Just like before, we click Generate, make sure all the settings are set to the highest available quality, and then click Submit. While we are waiting for the next generation, let's take a look at the result of our first one. In 2026, AI won't replace you. That line gets repeated a lot, but here's the part people skip. To me, it looks very realistic. I'm pretty sure that most people who saw this avatar in their feed wouldn't even realize that it's AI.
Now, let's compare it with the same video created in Kling. In 2026, AI won't replace you. That line gets repeated a lot. But here's the part people skip. In the Kling version, the face feels a bit more plastic, and the emotions look slightly more expressive compared to the HeyGen result. What do you think? Now, let's take a look at the second generation. So, in 2026, the question isn't will AI take my job. Nice. I really like this one again. And let's also compare it with the version created in Kling. So, in 2026, the question isn't will AI take my job. So in 2026, the question isn't will AI take my job? In the Kling version, the avatar's teeth are not visible, and the overall quality feels a bit less detailed compared to HeyGen. HeyGen doesn't hide the teeth, and because of that, the result feels more natural to me. Overall, I think I prefer the HeyGen version here. All right, we have created two lip sync videos. Now we need to create the next one using a different avatar image. For that, we go back to the Avatars section to create a new avatar: click New Look, upload the next avatar image, and then click Create Look. Using the same process, let's also create our third and final avatar by uploading the next image and clicking Create Look. Now we take our second avatar and move on to creating lip sync. Let's check Google Docs to see which audio files should be used for this avatar. Okay, here we have two audio tracks that need to be applied. Before applying them, let's quickly double-check. AI doesn't replace... A developer without AI writes everything from scratch. A developer with AI ships faster, fixes bugs earlier, and focuses on real problems. Yes, that's exactly what we need. We upload the audio and add it. Next, we follow the same familiar process: click Generate, add a description if needed, and make sure all the settings are set to maximum quality.
This time, the generation costs three credits, since the audio file is a bit longer than the previous one. That's fine; in my case, I have enough credits. Then we click Generate. All right, almost everything is generated. Let's take a look at what we got. A developer without AI writes everything from scratch. A developer with AI ships faster, fixes bugs earlier, and focuses on real problems. To me, this looks really good. There is no head deformation like we saw in Kling. Let's compare the two versions. A developer without AI writes everything from scratch. A developer with AI ships faster, fixes bugs earlier, and focuses on real problems. Honestly, both options could be used with a bit of post-production work, but once again, HeyGen is my favorite here. Let's move on. The question is... The question is... Here, everything looks fine. This clip is too short to really compare, so let's move on. Someone using AI will. Same job, same title. Very different results. This one also turned out great, with no weird glitches or awkward gestures from the avatar. Let's compare this version with the one generated in Kling. Someone using AI will. Same job, same title, very different results. Someone using AI will. Same job, same title. Very different results. The Kling version is a bit more expressive. In some cases, that can work well, but because of this expressiveness, it becomes slightly more noticeable that it's AI. Now let's take a look at the final generation. Will you be the one using it or the one competing against it? Which side are you on? Will you be the one using it or the one competing against it? Which side are you on? This one turned out to be a great closing shot for this video, with a question that works as a call to action, encouraging viewers to leave a comment. In my opinion, HeyGen handles this really well. Even though I'm a big fan of Kling, which I personally use for about 80% of my tasks, when it comes to realistic avatars, HeyGen currently feels stronger to me.
That said, if you are creating a cartoon-style avatar, if you only have a Kling subscription, or if you need to apply lip sync to a shot in an animated project, Kling does a great job. I have used it many times for those cases, and I can definitely recommend it for that kind of work. Now let's download all the files into a separate folder. You have seen the full process and the results, and from here, you can decide what works best for your own situation. My goal was to show you the available options. All right, let's wrap up this module. We already have a finished talking avatar. There are just a few pieces left: creating background music for our video using AI, and then bringing everything together in post-production. We are almost at the finish line. If you have made it this far, there is no reason to stop now. See you in the next module.

8. Module 6: Creating Background Sound - Creating Background Sound: Module six, AI music in practice. Hi, and welcome to the next module. In this module, we will focus on generating background music for our video using AI. I will show you how to quickly create music that fits the mood and pacing of our avatar video without spending hours searching through stock music libraries. We will use two AI tools that I personally work with, and we will also use ChatGPT to help us write a clear prompt for the kind of music we need. The goal here isn't to create a perfect soundtrack, but to generate clean, usable background music that supports the video and works well in the final edit. All right, to understand what kind of music works best for this type of video, let's ask ChatGPT. When you're creating this kind of content for the first time, it's totally normal not to know what music is actually popular or works well for this format. So instead of guessing, we will use ChatGPT to help us figure it out and give us a few good directions to start from. Okay, let's take a moment and go through the options ChatGPT came up with.
Out of all the options, I like the second one, "subtle cinematic underscore," the most. ChatGPT mentioned that this style works really well for public-speaking-style delivery, and I agree: it supports the voice without distracting from it. So let's do the next step. I will ask ChatGPT to write a prompt for generating music in this exact style. Okay, here's our prompt. We will copy it and use it in two different AI tools to generate background music. The first app we are going to use is Suno. Suno is an AI tool mainly used for generating music from text prompts. You can create background music, full songs, instrumentals, or simple atmospheric tracks, all just by describing the mood and style. It's especially popular for background music for videos, social media content, demos, and experimenting with music ideas without needing music production skills. One thing I really like about Suno is that it's really easy to use. You don't need to understand music theory or mess with complex settings. You just describe what you want, and it gives you a result. For our case, we will use Suno to generate subtle cinematic background music that supports the voice and doesn't distract from the message. We will use the same prompt we prepared earlier, generate the music, and then compare it with another AI tool to see which result fits our video better. Let's quickly talk about subscriptions. Suno offers several plans, including a free one and paid options with more credits and features. For our goal in this course, generating short background music for a video, the free plan is totally enough. You can already create music, test prompts, and get a feel for how everything works. If you later decide to generate a lot of music and need commercial rights, you can always upgrade. But to follow along with this course, you don't need to pay anything. I will add links to Suno and Producer AI in the course resources file so you can easily find them later. The second app, as you may have guessed, is Producer AI.
It's not as popular as Suno, but I have been using it for quite a long time, and it has all the features I personally need for my workflow. When it comes to subscriptions, Producer AI is slightly cheaper than Suno, though the difference isn't huge. There is also a free plan, which is more than enough if you're generating music just for personal use or learning. I'm currently on the Startup plan for eight bucks because I use the music for commercial projects, but that's a separate topic. For this course, the free version is totally fine. The main point here is to compare the results and see which tool fits your style and workflow better. As you can see, I have generated quite a long list of tracks here. There are actually a lot of them. The last time I generated music in Producer AI was a few weeks ago, but everything is still saved and easy to access. Over time, this becomes really handy: you build your own small library of tracks that you can reuse, compare, or take inspiration from later. All right, let's start generating. First, in Producer AI, click New Session. In the chat window that appears, paste the prompt we got earlier from ChatGPT and click Submit message. While the music is generating there, let's switch to Suno. In Suno, click Create in the left corner. We only need background music without vocals, so make sure to select Instrumental. Now paste the same prompt and click Create. This way, you're generating music in two different tools at the same time using the exact same prompt, which makes the comparison much clearer. Okay, let's listen to what Producer AI gave us. Hmm. It's not bad, but it feels a bit too calm, maybe even a little boring. Let's try to make it more interesting and add more beats. While we're waiting for the new generation, let's listen to what Suno created for us. I actually really like the last part of the track. I think it fits our video pretty well, so let's go ahead and download it.
As you can see, with the free plan, you can download the track only in MP3 format, but for our video that's more than enough. We don't need anything more complex here. All right, let's wrap this module up. In this module, we used ChatGPT to write a prompt for our music, figured out what kind of background music works best for our format, and generated music using different AI tools. We compared the results, picked what we liked, and now we have all the main pieces ready. There is just one piece of the puzzle left: putting everything together. In the next module, we will move to post-production and assemble the final video in CapCut. That's where everything comes together. See you in the next module.

9. Module 7: Final Assembly in Practice - Editing the Video in CapCut: Module seven, editing the video in CapCut. Welcome to the final module. In this module, I will walk through my own video project and show you how everything comes together in CapCut. We will go step by step through the key features I used, from placing the visuals and voiceover to adding music and small finishing touches. I'm not aiming to show a perfect edit here. The goal is to share a clear, practical workflow so you can understand the logic and then experiment on your own. All right, let's move into CapCut. First, I will show you the final result I ended up with. After that, we will go through everything step by step and break it all down together. In 2026, AI won't replace you. Someone using AI will. Same job, same title, very different results. A designer without AI spends all day on one concept. A designer with AI explores ten directions before lunch and refines the best one. A developer without AI writes everything from scratch. A developer with AI ships faster, fixes bugs earlier, and focuses on real problems. AI doesn't replace professions. It replaces hesitation. It replaces resistance. It replaces people who wait. So in 2026, the question isn't will AI take my job?
The question is, will you be the one using it or the one competing against it? Which side are you on? All right, this is how my final version turned out. I don't know if you noticed, but while editing, I felt that the original script was a bit too long and slightly boring at the start, so I trimmed the opening and cut a small part of the script. Honestly, it feels much better now. Not everyone will watch a one-minute video to the end, and our goal is to deliver the main idea clearly, not to drag it out. So here's what I did next. I imported a folder with all the files I needed for this project. I also downloaded a few stock video clips. We could have generated those with AI as well, but in this case, downloading stock footage was simply faster. I'm using those clips to fill the parts of the script where the talking avatar is not visible. I split the screen into two parts: on top, I placed a stock clip, and below it, the avatar. The idea here is to avoid that first reaction of "This looks boring, skip." Instead, the viewer has something to look at right away, which helps keep their attention. I also added a bit of motion to the first shot of the talking avatar: a subtle zoom effect. To do this in CapCut, go to the very first frame of the clip and add a keyframe. Then move to the end of the clip, slightly increase the scale to the level you want, and add another keyframe there. This creates a smooth zoom that adds a bit of life and energy to the shot. Next, I switch to another shot of the avatar from a different angle. Right after that, I add a quick cut to a full-screen stock clip and then bring the layout back to the split screen again. All of these small transitions and changes help keep the viewer's attention, and that's really the most important part. If you don't catch someone's attention in the first one or two seconds, they probably won't stick around to see what the video is about or what comes next. So the goal here isn't to be fancy.
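The two-keyframe zoom described above amounts to interpolating the clip's scale between a start value and an end value. A minimal sketch assuming linear interpolation (CapCut may apply easing curves, so its exact motion can differ) and a hypothetical end scale of 110%:

```python
def scale_at(t: float, t_start: float, t_end: float,
             scale_start: float = 1.0, scale_end: float = 1.1) -> float:
    """Clip scale at time t (seconds) for a linear zoom between two keyframes."""
    if t <= t_start:
        return scale_start
    if t >= t_end:
        return scale_end
    progress = (t - t_start) / (t_end - t_start)
    return scale_start + (scale_end - scale_start) * progress


def zoom_curve(duration: float, fps: int = 30, scale_end: float = 1.1) -> list[float]:
    """Per-frame scale for a zoom keyed on the clip's first and last frames."""
    frames = max(1, round(duration * fps))
    last_frame_time = (frames - 1) / fps
    return [scale_at(i / fps, 0.0, last_frame_time, 1.0, scale_end)
            for i in range(frames)]


if __name__ == "__main__":
    curve = zoom_curve(3.0)  # a hypothetical 3-second shot at 30 fps
    print(f"first frame: {curve[0]:.3f}, middle: {curve[len(curve) // 2]:.3f}, "
          f"last frame: {curve[-1]:.3f}")
```

Because the change per frame is tiny, a small end scale already reads as gentle motion; a large one starts to feel like a deliberate push-in.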
It's to keep the video visually alive and give viewers a reason to keep watching. After that, I switch to a stock clip that visually supports the part about designers. This helps reinforce the message and makes the idea clearer without over-explaining. This is also where I added the first subtle hint to subscribe: a small sticker placed in a visible spot. The key here is to keep it light and unintrusive. You don't want to push too hard, because that can easily turn people off. Think of it as a gentle reminder, not a call to action shouted at the viewer. After that, I didn't overcomplicate things. I kept the layout simple: a split screen with a stock clip on top, captions, and the talking avatar shown from another angle. Then I decided to give the viewer a short break from seeing my avatar's face. For this part, I felt the words themselves were strong enough, so I used a very minimal visual approach: large text appearing on a black background. I selected the text, went to Animation, and chose a Zoom In animation with the longest possible duration. I repeated the same setup for all three words. This creates a clean pause in the visuals, lets the message land, and helps reset the viewer's attention before moving on. Next, we move into the final part of the video. Here, the avatar appears full screen, with different camera angles switching through the scene. I also alternate those shots with a subtle zoom effect on more static frames using a closer camera view. This helps keep the ending visually interesting and prevents it from feeling flat or repetitive. Another important element here is the background music. The track we chose has a fairly dramatic tone, so I lowered the volume quite a bit. The goal is for the music to stay in the background, supporting the mood without distracting from the voice. Honestly, this part isn't even mandatory. You could always add music later, directly in TikTok or YouTube.
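Lowering the music "quite a bit" is usually done in decibels, where a change of x dB multiplies the signal's amplitude by 10^(x/20), so -20 dB means one tenth of the original amplitude. A minimal sketch of mixing a voice track with background music at a reduced level; the -18 dB default is a hypothetical starting point rather than a value from this lesson, and the code assumes both tracks are mono float samples in [-1, 1] at the same sample rate:

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_under_voice(voice: list[float], music: list[float],
                    music_db: float = -18.0) -> list[float]:
    """Add attenuated music under the voice.

    The shorter track is padded with silence, and each summed sample is
    hard-clipped to [-1, 1] as a simple safety net against overload.
    """
    gain = db_to_gain(music_db)
    length = max(len(voice), len(music))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        m = music[i] if i < len(music) else 0.0
        mixed.append(max(-1.0, min(1.0, v + gain * m)))
    return mixed


if __name__ == "__main__":
    for db in (-6, -12, -18, -24):
        print(f"{db} dB -> amplitude x {db_to_gain(db):.3f}")
```

In an editor such as CapCut you would normally just drag the music clip's volume down and judge by ear; the numbers only show why a change that sounds moderate is already a large reduction in amplitude.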
I mainly wanted to show you the music generation process as part of the workflow. Whether you use it in your own project or not is completely up to you. The next very important detail is captions. To create them in CapCut, go to Text at the top of the screen, choose Auto captions, select English as the language, and click Generate. CapCut has a really large library of caption styles, with different fonts, animations, and layouts for all kinds of looks. If you're editing in CapCut, I highly recommend spending some time exploring them and choosing what feels right for your style. Another important element is transitions between clips. Just like with captions, CapCut offers a wide variety of transitions. You can find them by clicking Transitions at the top of the screen. Don't overuse them; a few simple transitions usually work best. Sometimes it's also helpful to gently remind the viewer to like or subscribe. For that, we can use stickers. You can find stickers right next to the Transitions section. There are tons of them, and new ones are added all the time: arrows, highlights, outlines, callouts, and more. They can be really useful when you want to point at something specific on the screen or guide the viewer's attention. Take some time to explore this section; it's more powerful than it might seem at first. The final step is audio mixing. AI voiceovers are already pretty good, but they aren't perfect yet. Sometimes you will hear longer pauses between phrases, so it's a good idea to trim the audio a bit to make it sound more natural. At the point where two audio clips meet, I usually add a short fade-out at the end of the first clip and a smooth fade-in at the beginning of the next one. In some cases, I even slightly overlap the audio clips. This helps avoid awkward pauses and keeps the flow going. Since we are working with short-form videos, pace and rhythm really matter. And that's basically all the techniques I used to create this video. So yeah, congratulations.
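Both audio fixes described above, shortening long pauses and overlapping clips with a fade-out and fade-in, can be sketched on raw samples. This is a simplified illustration, not how CapCut implements them: it assumes mono float samples in [-1, 1] at one sample rate, treats anything quieter than a fixed threshold as silence, and uses a linear crossfade for the overlap. The threshold, maximum pause, and overlap length are hypothetical defaults to tune by ear.

```python
def trim_long_pauses(samples: list[float], rate: int,
                     threshold: float = 0.01, max_pause_s: float = 0.4) -> list[float]:
    """Shorten every run of near-silence longer than max_pause_s to max_pause_s."""
    max_len = round(max_pause_s * rate)
    out: list[float] = []
    silent_run = 0
    for s in samples:
        if abs(s) < threshold:
            silent_run += 1
            if silent_run > max_len:
                continue  # drop silent samples beyond the allowed pause length
        else:
            silent_run = 0
        out.append(s)
    return out


def crossfade(a: list[float], b: list[float], rate: int,
              overlap_s: float = 0.05) -> list[float]:
    """Join clip a and clip b, overlapping their edges by overlap_s seconds.

    Inside the overlap, a fades out linearly while b fades in linearly,
    so the result is len(a) + len(b) - overlap samples long.
    """
    n = min(round(overlap_s * rate), len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    head, tail_a = list(a[:-n]), a[-n:]
    mixed = []
    for i in range(n):
        fade_in = (i + 1) / (n + 1)  # rises from just above 0 to just below 1
        mixed.append(tail_a[i] * (1 - fade_in) + b[i] * fade_in)
    return head + mixed + list(b[n:])


if __name__ == "__main__":
    rate = 100  # tiny hypothetical sample rate, just to keep the demo readable
    first = trim_long_pauses([0.5] * 20 + [0.0] * 100 + [0.5] * 20, rate)
    joined = crossfade(first, [0.3] * 30, rate)
    print(len(first), len(joined))
```

Overlapping the clips shortens the joined audio by the overlap length, which is exactly what tightens the pacing between two voiceover parts.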
You made it to the end of the course. And that's a wrap. In this course, you learned how to go from an idea to a finished short video using AI: creating an avatar, generating visuals, writing a script, adding voice and music, and finally putting everything together in post-production. Thanks for choosing our course. We truly tried to share our real experience: not theory, but a practical workflow you can actually use. If this course gave you new, useful skills, we would really appreciate it if you left a positive review. It helps us grow and continue creating practical, honest content like this. And if you'd like to keep learning, feel free to check out our other guides and continue developing your skills in this direction. Thanks again for being here. Yours, Bros Academy.
