Transcripts
1. Things You Will Learn!: Hello everyone. I'm Mark Mine. I am a portrait and boudoir photographer and a professional photo editor. Today I will be teaching you everything you need to know about AI art generation using stable diffusion. I am confident in saying that this is the most in-depth course on all things AI art available on the Internet today. Just as I do in my other courses, I'll start with the assumption that you have no prior experience with AI art generation, breaking down the
basics first and slowly increasing the complexity level
as the course progresses. Throughout the course,
I'll provide you with no-stone-unturned, comprehensive guidance, ensuring that by the end
you'll have grown from an absolute beginner to
an experienced user. By the end of this course, you will know how to set up your completely free
AI art software from installation to
various extensions available on the Internet. You will become familiar
with all the tools and techniques required for both basic and
advanced use cases. You will learn how
to communicate effectively with your computer, a process known as prompting, and how to generate images using both text-to-image and image-to-image methods. I will also show you various inpainting techniques used to fix and recreate parts of previously generated images. And finally, how to upscale your results using a variety of upscaling methods. What sets this course apart from most AI-related courses
on the Internet is the in-depth breakdown and overview of each and every setting and slider of Automatic1111, the most advanced AI generator available to date, including various examples, comparisons between
different settings and more. All with the goal
of helping you find your own style and
preferred method. This will provide you with a comprehensive understanding of the parameters that
you can use to guide the AI generation
according to your vision. Additionally, we will cover both the photorealistic and the animated stable diffusion models, where to find and install them, alongside textual inversions, LoRAs, and other files we use to teach our favorite models new concepts. I've made a great effort to simplify and help you navigate the sometimes confusing stable diffusion interface, including various terms and techniques used by the community, with the goal of presenting them in a way that is understandable even if you have had no prior contact with AI. This course is your one-stop destination for mastering AI art generation
with stable diffusion. I'd be delighted to have
you on board as my student embarking on this
creative journey into the world of
AI art generation. Let's unlock your creative
potential together. I am Mark and I will be
happy to be your teacher.
2. Why Stable Diffusion?: Welcome to the first
chapter of this tutorial. Before we move on,
I want to answer a question that is probably
in the minds of many of you: why not Midjourney, an already popular AI art generator? Why don't we use Adobe's Firefly? I can provide you with two kinds of answers. A short one: I would assume you're like me and you don't like being restricted when it comes to trying out different ideas, and having to pay for it on top, all while being severely limited when it comes to taking control over your image. A longer one: Midjourney is a subscription-based service, giving you a number of generations to use. You need an internet connection, and all your results are public. The costs of using Midjourney increase vastly in
case you want to keep your results private or get additional generations
outside of the basic plan. It's a similar story
with Adobe's Firefly and their generative fill requiring both the Internet connection
and a paid subscription. Adobe is likely going to charge extra for the generative
fill feature within Photoshop two that
is in development at the moment of
creation of this course. Contrary to Mid Journeys
and Adobe's solutions, stable diffusion is both free and running
locally on your PC. With a simple tweak, you can run it completely
offline for when you want to go off the grid or when on a trip with no
Internet connection. The second important reason why stable diffusion is a better
choice is the fact that results generated
by Midjourney or Adobe are based on their
own large trained models. While these large
models are flexible and capable of generating a
wide range of outputs, they are limited in terms of the quality of those outputs. Now let's delve into the
most crucial reason. Both Adobe and Mid Journey
operate as businesses. Which means they
need to adhere to strict standards and tend
to be very restrictive. In terms of the
prompts you can use, mid journey in particular, continuously adds to
its list of band words. As a boudoir photographer, you can imagine how much
of my typical prompts are already blacklisted or will likely be added in the future. Another significant advantage of stable diffusion is that you have the freedom to
train your models on the content you desire, or download pre trained models shared by various
Internet users. The possibilities are limitless in terms of what you can create, and there are no restrictions
on your creativity. Speaking of the advantages of stable diffusion interfaces over other generative AI solutions, here are some of them you can experience using software such as Automatic1111, which we will base this lesson on: a generous number of words
allowed in the prompt window. The ability to use
negative prompts. Reading prompts from
existing content or having the AI
search for prompts, generating original content
guided by other images. Extensive control over
the AI creation process with various parameters, precise control over
AI generation seeds. We will cover these in detail. Batch processing and creating
AI work efficiently. A wide selection of samplers, a variety of upscaling methods and upscalers to choose from. The ability to install
models from the Internet. Expanding models using files we will become familiar with, such as LoRAs, textual inversions, and others. The potential for modding your software to gain even more control. Training stable diffusion to reconstruct your own face or any other desired content. Merging various models to
achieve your desired results. Exploring different
types of inpainting, which we will also cover. Uploading precise masks created in Photoshop, and much more. The only drawback of
stable diffusion is that its generation speed depends on the graphics card in your PC, with newer graphics
cards offering higher speeds for a more
enjoyable experience. If you lack the
hardware requirements for stable diffusion, you can also use it
inexpensively by renting graphic processing time from
Google using Google Colab. In conclusion, while
stable diffusion may initially appear more
complex to get into, it is ultimately
worth the effort, as it does not inhibit
your creativity and allows you to create according to your preferences and vision.
3. Setting Up Your Free Software: In this chapter, we will
cover the following topics: PC specifications required to run AI art programs, or, as they are often referred to, user interfaces; user interfaces for creating AI art, including NMKD and others; and Automatic1111, my preferred user interface, and how to set it up for local and offline use on your PC, including the installation process. The PC specifications needed: 16 gigabytes of RAM; an NVIDIA GPU, GTX 700 series or newer, with at least 2 gigabytes of VRAM; Linux or Windows 7, 8, 10, or 11; and at least 10 gigabytes of disk space. As mentioned earlier,
I will be showing you how to run stable
diffusion for free. However, if your PC doesn't meet the required
specifications, you can still run
stable diffusion models using a Google Colab notebook for $10 a month, as of July 2023. I will add a link explaining how to set up stable diffusion through Google Colab in the text file in the course materials. User interfaces: the programs used to run stable diffusion
models and generate AI art can be either
standalone applications or user interfaces accessed through your computer's
internet browser. Here are some of the
most popular options. NMKD: a graphical user interface and offline standalone application, somewhat slow with updates but beginner friendly. Playground AI: user friendly, offering 1,000 free image generations per day; it is intuitive and fun for dipping your toes into AI art. DreamStudio: similar to Playground but with some features missing; also, it is not free. InvokeAI: stable, though not as feature rich; it provides a powerful user interface. Mage.space: offers a simple interface and allows limited free usage, with full functionality available through a paid plan. ComfyUI: a newer arrival featuring a node-based user interface, quite powerful but also quite complex. Diffusers: packed with advanced features; it has a clean user interface and is known for its speed and stability. For a list of more
free websites, you can check the course
materials text file. If you haven't experimented
with AI generated art before, you can start with
simpler options like Playground AI to get a feel for it. In that case, you can also skip to the second chapter of this tutorial, where I will teach you how to communicate with your PC. However, since this lesson covers everything from beginner to pro-level use, I strongly recommend diving into Automatic1111 with me. A few words on Automatic1111: Automatic1111 is open
source and it's the most powerful and feature rich
user interface out there, offering frequent updates, a continuous stream
of new features, and numerous extensions
for advanced users. You can generate
using text prompts or use other images to guide
the creation process. You can also generate a part of the image only instead
of a whole one. And upload masks created in
Photoshop and so much more. All of the new stuff in
the AI art world you can get to try first using Automatic1111. This is the reason we
will be covering and creating our AI art within
this user interface. In the course
materials text file, you will find a link that
goes to this web page. This is the Automatic1111 page on GitHub. You can also Google Automatic1111 and find the first link on GitHub. Don't let the somewhat technical installation process deter you. It's a straightforward
step by step procedure. Even if it involves entering
some command prompts. You're just going to
scroll down until you find the installation
and running section. Here you can see automatic
installation on windows. There are literally three
steps only to install stable diffusion, or Automatic1111, which is a user interface for stable diffusion. First you need to download Python 3.10.6. You need to press this link and scroll down here, where you're going to find the Windows installer (64-bit). Download that one, then we'll head back to the installation instructions. On step two, we
will download Git. Now, we haven't
installed Python yet, but we'll do that in a second. We're going to download Git, which will use the
standalone installer 64 bit Git for Windows. Set up, Download that file. Once you've downloaded
those files, you will see them in
your downloads folder. Start with double
clicking the Python, Make sure that you
check this box. Add Python to path, this is very important. Then press install. Now, Python is now installing. This shouldn't take more
than a couple of minutes. You can now close this. Now we need to install Git. Double-click here. You can leave everything here at default and just press Next. Press Next again; as we will not be using Git for anything more than one or maybe two simple commands, you don't need to care about the editor. Just leave this at default again and press Next. Let Git decide, then press Next again. Leave everything default
and just press next. After you press next with
all the default settings, Git is now installing
on your computer. Git is the application
that we will use to download
files from GitHub, and that is where Automatic1111 is stored, developed, and updated. Click View Release Notes and just press Finish. We have now installed the prerequisites for Automatic1111 and stable diffusion. We can now go back to the install instructions and we will copy-paste
this line here. Now, open an Explorer window and create a folder where you want your stable diffusion to be. I'm creating a folder here named A1111. I'm entering that folder. Now I'm going to click up here in the address bar and I'm going to type in CMD to open a command prompt. You can also open a command prompt from your Start menu, but then you won't be in this directory here. You're going to copy-paste the git clone text. Git is the program that we installed; clone will copy the files to your computer. Press Enter. Now your files are being copied to your computer. This should be fairly fast depending on your Internet connection. For me, it took about 5 seconds. Automatic1111 is now installed on your computer.
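For reference, the two commands from this step look roughly like this, assuming you created the A1111 folder on your C: drive like I did; adjust the path to wherever you made yours, and always copy the actual clone line from the GitHub page in case it changes:

cd C:\A1111
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

The first command just moves the command prompt into your folder, and the second downloads the Automatic1111 files into a stable-diffusion-webui subfolder.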
Now you can start your stable diffusion by using the webui-user.bat file. However, I recommend that we do some changes. First, we're going
to open Notepad and we're going to drag this file into Notepad. This will greatly improve your AI generation experience. We're going to add a space, two dashes, and write xformers. This will speed up your stable diffusion generations. We'll also type autolaunch. This will automatically launch a browser window when you start Automatic1111. Now, if you are on a GPU with, let's say, four to six, maybe 8 gigabytes of VRAM, you could add medvram. This will lower your VRAM usage and will make stable diffusion easier to use on your computer. I will use the xformers and autolaunch commands. All we need to do now is save the file.
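To give you a concrete picture of the edit, the line we change inside webui-user.bat ends up looking roughly like this; the flags are the standard Automatic1111 command-line arguments I mentioned, but double-check the spelling against the project's GitHub documentation for your version:

set COMMANDLINE_ARGS=--xformers --autolaunch

And on a lower-VRAM graphics card you could use:

set COMMANDLINE_ARGS=--xformers --autolaunch --medvram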
With the software installed and ready, it's time to move on to the first creative part
of this course.
4. The Art of Prompting: Welcome to the first creative
chapter of this course. Now that you have
successfully installed the user interface we'll be
using for creating AI art, it's time to dive into the fundamentals of
AI art creation. Whether you intend
to use Automatic1111 to enhance parts of your images, create assets, or craft entirely original AI art, it all begins with a prompt. You may have come
across the concept of prompts or have heard about
the art of prompting. In this chapter, I will provide a comprehensive
understanding of what prompts are and guide you on how to craft effective
prompts the right way. What exactly are prompts? If this is your first encounter with the term, let me explain. Prompts are the
words you give to the AI to tell it
what to generate. This is how we communicate our creative intentions in a way that the computer
can comprehend. As the process relies on words instead of complex
programming languages. It's also intuitive for us
humans and in practice, much simpler than it may sound. This window here is where
we type in our prompts, that are our textual commands. And this area here is a
negative Prompts window. This is where you
tell stable diffusion what we want to see in
our generated image. Here is where we
write the elements we want to exclude
from the result. Think of prompts as the recipe for the image we want to create. This is the most crucial
aspect of AI image generation. When you're preparing
to craft a prompt, begin by asking yourself questions about the image
you wish to create. What's the subject
of your image? What are the characteristics
and details of your subject? What additional details do you want to add to the
subject of your art? What medium should your result try to recreate: an oil painting, an illustration, or a photo? Should it be a
close up portrait, full body portrait, or
a big landscape photo? What art style should your
image be inspired by? Which artist and aesthetic? Describe the surrounding
environment. How should the light and ambience of your image look? Describe the color
scheme of your shot, such as teal and orange. A lot of models respond
well to quality tags. Those are the words
and phrases in your prompt, such
as masterpiece, best quality, intricate details, high resolution, et cetera. Make sure that the model window is showing v1-5-pruned-emaonly; if not, consult the Word file I've provided with the lesson. Before we give our first
AI generation a go, it's important not to get
disappointed on your first run. We are using a base model that comes alongside Automatic1111, simply so you can get a feel for how prompting works. I promise you'll see your results getting way better as we progress through the course. One thing to mention is that you will probably get different results than me, even if using the same exact prompts. That depends on a lot of factors, such as the graphics card you have, the version of your software, and so on. With that out of the way, let's try out prompting together. As mentioned before, let's answer those questions
laid out earlier. Subject, subject description, and details: let's say a golden retriever dog with big black eyes and big ears. Medium of our generated image: I will go with an illustration in the style of a cartoon. Shot type or angle: let it be a close-up shot. Style: children's cartoon, maybe. Surrounding elements: in a park. Color: vivid colors, colorful. Lighting: on a sunny day, morning light shining through the trees. Surrounding
environment, birds flying in the background. Let's hit the Generate button. This is my result. Of course, you can try your own prompts instead of the ones I've chosen. Keep generating a
few times until you get to something resembling
a result you like. You might get it on the
first run or you might not. It's a bit like lottery. The first time you do this, when you get to
something you like, lock the seed by typing one in the seed window in order to loosely lock the compositional
elements in the image. Don't worry, I will tell you everything about
seeds later on. This will serve us
well in order to compare the upcoming
results with our first one. Returning to the result
of my first generation, it was a good start, but
neither great nor terrible. Let's see how we can
further improve it. A good idea would
be to add a few of the quality prompts,
such as masterpiece, best quality, intricate details, high resolution that I
have mentioned earlier. Let's click Generate again. That is definitely better. Now, I'm not so sure about those pink or red trees
in the background. How are we going to take care of those and make it so that they don't appear in our future generations? It's time to learn about
the negative prompts. We use negative prompts to describe what we don't want
to appear in the image. We can also use them
to alter the style. For example, minimizing
animated results in case we're going for
realism in our work. Or to exclude certain features, such as facial hair
on people, et cetera. Using the positive prompts
from our earlier generation, let's test out a few
negative prompts: purple trees, red trees, color. Let's hit the Generate button again, and much better. As you can see, negative prompts can heavily impact the result. Remember the universal quality prompts? There are also some universal negative prompts that can affect the quality of your results, such as the ones shown here, and you can use them with most of your generations too. You can find all the prompts
in the word file provided in the course materials where I've typed all the prompts out
for your convenience. Now let's return to the image of our dog and try to further
improve our results. We will do this by
adding a bunch of these quality
negative prompts to those few negative prompts
we typed in earlier. Let's press the Generate button again. Considering we've just started, not bad at all. Some additional prompting tips: there's a clever
trick that can help us emphasize a particular
word within our prompt. Placing a word in round brackets increases the emphasis on
that specific keyword. The community calls this
putting weight on a keyword. You can see an example using
the image with our dog. I would emphasize
the birds flying in the background
part of our prompt by writing the sentence
in brackets like this. Let's generate again. It's not the best-looking bird, but there's one more in our image. Let's keep increasing the weight by adding a second set of brackets like this. There are significantly more bird-like animals in our current photo. Each set of brackets represents a 1.1 times increase in weight; in other words, two sets multiply the weight by 1.1 twice.
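To make the bracket syntax concrete, here is a small sketch using the keywords from our dog example as placeholders; the bracket forms themselves are the standard Automatic1111 attention syntax, so your exact keywords will differ:

(birds flying in the background) - weight of about 1.1
((birds flying in the background)) - weight of about 1.21, that is 1.1 times 1.1
(birds flying in the background:1.3) - an explicit weight of 1.3
[birds flying in the background] - reduced weight, roughly 0.9

The explicit number form is an alternative to stacking brackets, and the square brackets are the suppression we'll touch on in a moment.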
For now, don't obsess over the imperfections in our result, as the inpainting chapter deals with this. This is simply a demonstration of how prompting
impacts the result. You should be cautious when adding weight to your keywords, as adding too many could
lead to various artifacts. This usually happens when the generation process becomes confused about
what to emphasize. In such cases, it's better
to restructure the prompt. We can also restructure
our prompt and use a flock of birds instead of birds flying in
the background. You can see it works as well. Remember, you can do
the weight optimization in the negative prompt window. Or you can use a
different type of bracket to suppress the strength of objects in your prompt by
using these square brackets. Here's a time saving tip. Typing brackets by
hand can be tedious. There's a neat trick
you can employ here. If you want to increase
or decrease the weight of a keyword or a couple
of keywords at once, select the word or words with your mouse and press the control plus arrow up key combination
to increase the weight, or the Control plus arrow down key combination to decrease it. Reordering keywords: even if we decide not to change the keywords
in our prompt, their order in a prompt
plays a major role too. I will demonstrate this by
only moving the keyword, close up shot to the
beginning of the prompt. The close up shot that
I've used as part of the prompt also carries strong associations
with photography. Moving the keyword to the
beginning of my prompt, It seems to communicate
to the AI that my desire is to place a
stronger emphasis on it, even if my intention
wasn't to achieve photo realism that I've
ended up achieving. This demonstrates
how sensitive and susceptible to changes
the final result can be. In this case, if I wanted a really tight close-up frame, I could have achieved it by
placing more emphasis on the phrase close up or
omitting the keyword shot. Or restructuring
my prompt to say, macro perspective
of a dog's nose while leaving other parts
of the prompt unchanged. A lot about AI art
generation is about getting a feeling by simply playing with it
and experimenting. In our examples covered earlier, we have used a default
stable diffusion model that isn't used for much besides demonstration purposes, yet it has helped us get a better understanding of the process. You'll be amazed by
how much more you can achieve with a custom model
in the upcoming chapters. You can now unlock
the seed by typing minus one into the
seed prompt window. This will randomize
each generation again, in case you want to generate different looking images using the same prompt instead of sticking with the
composition we had before. Remember, we'll cover seeds in detail further on in the course. Also, there's another important aspect you should know about when building prompts: two different types of prompting you can experiment with. The main one is used by the majority of users, and the other is a bit less rigid and more reminiscent of natural language and how we speak. Taking our earlier
prompt example, you can try writing
a grammatically correct sentence in
the prompt window, such as an illustration of
a golden retriever dog with big black eyes and big ears
in a park on a sunny day, with morning light
shining through the trees and birds
flying in the background, drawn in a masterpiece
colorful style of a vivid children's cartoon, in best quality, with intricate details
and high resolution. As you can see, the method
works quite well too. So what is the
correct way to do it? Unfortunately, the answer is, it depends on the
model you're using. I would advise following the
fragmented style explained earlier because this is the prompting style
that more models are trained to understand. Blending two keywords: are you interested in combining two keywords, or combining faces, in your AI generation? To do this, use this
syntax in your prompt. The number allows
you to control how much of the blending is
supposed to be done. 0.1 reduces the strength
of the first word. 0.5 mixes the two words
in equal measures. 0.75 puts more emphasis on
the first word in the syntax. For example, you can use Emma Watson and Harry Potter followed by the number. Keyword swapping is a technique tailored for this purpose. Essentially, it serves as a valuable method to create fresh and unique looks by merging two existing ones.
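The syntax itself isn't spelled out in the narration, so here is my assumption of what is being shown, using the standard Automatic1111 prompt-editing notation and the example names from above:

[Emma Watson:Harry Potter:0.5]

The first name is rendered during the early sampling steps and the second takes over at the point given by the number, so 0.1 gives the first face very little influence, 0.5 blends the two roughly equally, and 0.75 leans the result towards the first name.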
Mentioning a celebrity's name in your prompt can have a significant impact on your result, as the
training data used for the model likely includes many
images of that celebrity. However, if you wish to have a consistent face across
a variety of generations, yet not easily recognizable, incorporating the names of well known actors and actresses, and blending them together
enables you to merge two distinct recognizable
faces to create a brand new one. Eye strain: this tip is not directly related to prompting, but it can help
prevent eye strain. Especially when using
a large monitor where the prompt
text becomes tiny. You can hold the
control key while using the scroll wheel of your mouse to zoom in on the interface. This allows you to see the text and type more comfortably. Saving prompts as styles by utilizing the pencil
icon located here. You can save a collection of positive and negative prompts to use alongside the ones you've entered in
the prompt window. For instance, if you've
crafted a set of photoralistic prompts that you'd like to apply to
various subjects, you can simply type the subject, importing the remaining prompt
from your saved template. One thing to keep in
mind is that we are all early adopters of
this technology and you should take
pride in that fact. The technology is still
in its infancy and can be somewhat complex for beginners with a lot of ground to cover. As fun as it is, the
whole process is prone to artifacts,
mistakes, and imperfections. However, this
shouldn't discourage you from delving deeper into it, as the community is working hard to find a variety
of ways to reduce those mistakes and train the models better. Finding inspiration: when it comes to prompting, there are several places on the Internet
where you can find inspiration and see how other
people craft their prompts. You can visit the CivitAI.com Explore page or Midjourney's Showcase page, where you can view images created by the community members using different
models and prompts used to generate those images. Clicking on an image will often display the prompt
and the model used. Speaking of inspiration, I personally enjoy
having fun with AI art generation because
it allows me to be creative in various fields in which I have no expertise, such as drawing or painting. It also lets me envision my photoshoots in advance, as photography is my main profession, or use AI-generated elements that I can incorporate into my own photography. Photo manipulation used to be something I had never been that good at, and it has never been as enjoyable as it is now. I have a deep passion
for technology and find it intriguing to witness how a computer thinks
and creates art. It's rewarding to utilize a machine primarily
composed of processors, wires, and calculations to produce something as
beautiful as art. As a boudoir photographer, I not only teach skin
retouching and color grading, but also offer courses on integrating AI imagery
with photography. With AI art generation, I have the capability to create concepts that have
never been seen before, explore fictional historical scenarios and art styles, and craft imagery inspired by the paintings of my favorite artists from the past, among other things. It's a pleasure to be alongside
you, my students, at the forefront
of something new where creativity
knows no bounds, allowing us to expand
our creative potential. I'm fully committed
to this journey and hope you enjoy the
upcoming chapters.
5. Stable Diffusion Models: In this chapter, we're
going to cover one of the most important elements
of AI image creation. We've mentioned stable diffusion models a few times before. What are they and what
do we use them for? User interfaces
like Automatic1111 are nothing more than powerful tools that allow us to run different stable diffusion models. To put it simply, or to find a real-life analogy, our graphical interface, Automatic1111, provides us only with a blank canvas. The model we use is our palette, and prompts represent what we're going to paint. Models, the most crucial part of image generation, contain
all the information needed to generate images. The subjects, style, and quality of the images we generate depend completely on the model we use, due to the data used to train that model. We won't be able to generate an image of a cat if there have never been images of cats in the model's training data. Likewise, if we only train or use a model with
images of cats, we won't be able to
generate images of cars. Soon after the release of
the first public model, the community started
to build on top of it, creating specialized models that perform way better
than the base one. These models are
usually focused on a specific style, subject, mood, et cetera, such as children's
animation, poster art, not safe for work imagery, photorealism, cars,
anime and more. Many of these models retain
a lot of flexibility on top. There is now a huge number of various models available on
the Internet, all for free, so you can never exhaust all the possibilities when it comes to your creative ideas. So far we have been using the model called Stable Diffusion version 1.5. It is a default base model that can be used to determine if our software works well with our hardware. It's flexible, but not as good when it comes to specific styles. You know the saying: a jack of all trades is a master of none. Now it's time to cover
the exciting part, custom stable diffusion
models created by the community that are far superior to what the
base model can do. Where do we find
all these models? I hear you ask. As
mentioned before, a website called CivitAI is a large repository of all things AI art related, where you can find models and photo examples, alongside
prompts for each model. Lots of new models are appearing daily with image examples, parameter descriptions,
prompts, and more. We will be focusing
on this platform for all our AI art
generation needs. Before using CivitAI, you should create your account and, if you wish, enable not-safe-for-work results, because even if you're not planning on using those capabilities, many good models might be filtered out from your search. Also, you can activate
dark mode right here, as browsing through a white page, looking at prompts and imagery, can become tiring for your eyes. When generating images, you can easily mitigate any
staying away from such keywords in your positive prompt and adding keywords, nude, nudity, nipples, naked, et cetera, in your
negative prompt as an additional safety measure. There are a few other places you can find models. Hugging Face: it is another large repository of various AI models, used for everything from science applications to the generative art which we are concerned with. The interface is rather dry, often with no photos. 4chan: a risky place to find models that can have viruses and ransomware packed within. I would advise against
looking for models here. The biggest benefit of
is that, unlike Midjourney and Adobe's Firefly, which are both very
restrictive in terms of what ideas you
can toy around with, there is no limit
to what anyone in the community can train a
stable diffusion model to do. Stable diffusion models come
in two different formats, KPT and Safe tensors. Download the safe tensor version of the model whenever
it is available. If not, make sure you download the CKPT files from a
trustworthy source. As safe tensor files can't be
packed with malicious code, you should be worry free
using models found on AI. As you will see, the majority
of models were trained on animated art with varying
levels of photorealism. However, some were trained or merged to be as photorealistic
as currently possible. Speaking of photorealism, a new model type currently in active development
is called SDXL, aiming to achieve even higher generation resolution, legible text, and photorealistic results. These are models trained on larger images than the 512 by 512 pixels and 768 by 768 pixels that most other models are trained with. Stable Diffusion XL models take significantly longer to produce an image, but the results aren't necessarily twice as good as the resolution makes it seem. Generating images this way requires a secondary refiner model that also takes additional time to get loaded during the generation process. For now, for practicality and generation speed purposes, let's stick with the regular checkpoint models. I will show you the SDXL models later on, when dealing with image size in the settings and parameters
chapter of this course. Sometimes a model
made by the creator whose work you like can
have multiple variants, be on the lookout for those. Usually, different variants of the models will be shown here. The same creator can
sometimes publish the same model in two
stylistic versions. Or it could be a model
used primarily for image generation or a model
with additional data, non pruned, suitable
for further training. As our plan here is to create art rather than
train our models. All you should be looking
for are the pruned models. They contain only the data
needed for image generation, saving you a lot of disc space. And trust me, with models
being 5 gigabytes on average, they can swallow a lot of your disk space fast. Speaking of disk space, the same goes for FP16 versus FP32 models. When given the choice, choose the FP16, as the FP32 models contain a lot of data you won't be needing
for image generation. A creator can update their models with a newer, additionally trained version in the meantime. If you like a specific model, check the model's page on CivitAI from time to time. Often, it is in the description section that you will find what makes a newer version unique and different from the previous one. Of course, not all models come
in a variety of versions. But some popular
models creators are updating and retraining their
models to perform better. And are often publishing the results within
the same page. Now let's take a
much needed break from all the tech
talk and test how a different custom
model performs in comparison to the default
model we were using before. I've developed a
photorealistic model that I've extensively
tested during the creation of this and my other AI photography
compositing course. I have found it to be
very capable of providing a wide variety of
photorealistic results. Still being perfectly
capable of delivering illustrations and other
non realistic results too. You will find this model in the course materials
where I will provide you with
a download link. All the downloaded models are installed the same way: by placing them in the Stable-diffusion folder found within the stable-diffusion-webui models folder. After placing the model, be sure to refresh the model dropdown menu found here by clicking on the refresh icon.
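To spell out the location, assuming you cloned into the A1111 folder from the installation chapter, the checkpoint files go into a path along these lines; the file name here is just an example, and the exact root depends on where you installed the software:

C:\A1111\stable-diffusion-webui\models\Stable-diffusion\my-model.safetensors

This models\Stable-diffusion folder is the standard Automatic1111 location for checkpoint models.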
For this generation, I will use my own model
provided with the lesson. Now let's return once
again to the prompts used earlier and our good old
friend, the golden retriever. If you have been using
different prompts than me, that's perfectly fine. You should re use those
again with this example. I just want to show
you how much a model, even when used with the same settings and prompts, changes the final look. Pay attention to this, as it can save you a lot of time. Instead of typing the whole prompt again or copy-pasting your prompt from a text file, you can reuse a prompt from an already created image. This is how you can quickly get to your generated images, by pressing the folder icon. Navigate to the PNG Info tab shown here, browse for or drop an image into the window, and then simply transfer the prompt and generation parameters by clicking Send to txt2img. This tool provides you with data including prompts, negative prompts, seeds, models used, extensions used, and more. It is here for our convenience, allowing us to see
the creative recipe that has led to the
image we are examining. Sometimes the creator will be using their own mixed model, or a model-expanding file such as a LoRA file that you might not have yourself, or they could be using another AI image-generating software. In these cases, you
won't be able to replicate the same exact result, but at times you can
get quite close to it. The PNG info can also help gain deeper insight into the
process of image generation. Or how a model is responding to various prompts
and parameters. If an image created
by someone else possesses data you're
still not familiar with, don't fret as we're
going to go over various extensions
and additional files in the upcoming chapter. With that out of the way,
let's load our custom model. Loading a model takes some time. Now that it is done, let's
re use the prompt as discussed earlier and hit
Generate button again. Pretty damn nice.
Now let's compare the result with the images we created using the default model. We're going somewhere
with all this. Let's try to bring our dog to
life by trying to generate a photo realistic result instead of the ones
inspired by cartoons. I've changed my prompt
to say: close up raw photograph of golden retriever dog with big black eyes and big ears, camera photography in the style of Annie Leibovitz, Getty Images, Canon 60D, 135mm f/3.5, in a park, vivid colors, colorful, on a sunny day, morning light shining through the trees, birds flying in the background, masterpiece, best quality, intricate details, high resolution. Let's press the Generate button again. Let's unlock the seed, as
it probably got locked when we transferred the image data from the PNG Info window, and try a few more generations. Four of my non-cherry-picked results are shown here, compared to what I was getting with the default model. The custom model is far superior, with
way fewer artifacts and still capable of delivering both animated and
realistic results. After covering some
additional tips on finding and
experimenting with models, I'll show you more
ways to further improve and enlarge
your results. Searching for models: the models you can find on CivitAI will either be trained by the creator of the model, or they will be a so-called merged model containing multiple other models, made using the method I will teach you about at the end of this chapter. Sometimes you will find them under the name checkpoint merge. You will find models on CivitAI under the classification
checkpoint. The size of the model
files, on average, is going to be 2-7 gigabytes. To look for models only, without the other AI content we'll be covering later, activate the search filter that is located here by clicking on the Checkpoint option. Keep in mind that
the filter location and look might change in the upcoming months as the website keeps
evolving monthly. When it comes to the overall
style or feel of the model, as you will see while
browsing through CivitAI, all the models can be roughly divided into two main categories: photorealistic and illustration oriented, also known as anime models. Most models, regardless of their stylistic leanings, are still trained on a wide variety of styles and are, to some extent, capable of delivering both photorealistic and animated styles, as you've seen with the model I've provided with the lesson. However, you will be able to easily spot the model's main style by
browsing through the images. A model can gravitate towards
a specific ethnicity too. However, you can use both
positive and negative prompts, such as Caucasian, Asian, white skin, black
skin, et cetera, to better navigate the AI
towards the desired result. Some models could have
their own special keywords that the model has been
trained to understand. Keywords are there to trigger the style a model
is specialized in. Most of them will be listed in the description once you click on the model. It would be good to pay attention to the words a model's creator is using in the prompt in the examples provided alongside the model; sometimes the trigger words are going to be shown on the side. The choice of model
depends on nothing more than your
aesthetical preferences, alongside prompts
given in the examples. A lot of models are
going to have notes on how the creator
uses their model, including parameters,
trigger words, and other tips that seem to
make the model work best. My best advice is to check both the preview images
and their prompts, alongside the author's
notes, if available, as they are going to give
you the best chances of obtaining great results
with a model you've chosen, or at least a similar look to the preview images the
author has provided. Sometimes you will
notice a sign; these are the LoRA additions, which are there to teach the model a new concept. They provide additional flexibility to the model, and we'll be covering them in the next chapter. Remember that no matter the model we go for, we can use the negative prompt window to suppress certain aspects using prompts such as illustration, anime, cartoon, photorealistic, et cetera. Photorealistic models: the model I've mixed
and provided you is capable of creating great
illustrated results. But where it excels
is at photorealism. However, it's by far
not the only one. In order to further enhance the photorealism in our results, we should be using photography-oriented trigger words in our prompt. I will provide you with all these prompts in
the course materials so you can copy them or save
them as styles using the pen icon that I've shown
in the prompting chapter. Remember that some
models could have their unique special keywords that the model has been trained to understand too. Anime models: as there is a huge number of artists drawing or painting in different styles, all of them significantly differing from one another, it would be hard coming up with universal prompts for animated models. What usually works would be the subject in the style of the artist's name. I will give you an example using a very popular anime style, that is, the style of Hayao Miyazaki, who runs a famous anime studio called Studio Ghibli. I will run an anime-style prompt using the model I've
provided you with. Once again an image of our dog. I will build off the prompt we used at the beginning
of our lesson, but adding some new and specific
anime-oriented prompts. This will be my first time running this prompt using a model I mixed specifically for photorealistic results, so I am not sure how well it will perform. Let's hit Generate. Not bad at all. This also goes to show how
flexible some models are. Instead of overloading
your hard drive with gigabytes of
various models, you should definitely
try out what your favorite model
is capable of. If you end up liking a certain model that seems incapable of delivering a result you're looking for, wait until you hear about LoRAs, textual inversions, and more that will allow you to quickly teach your model new things. Here are some anime-related prompts that you can draw inspiration from. Let's try something
completely different. A futuristic version
of a dog in a style of a currently popular
game, Cyberpunk 2077. You may want to increase the resolution a bit so you can see the more complicated elements of our prompt shine through. I will show you the
implication of resolution and other parameters in one of the upcoming chapters
of this tutorial. For now, let's set the
resolution to 840 by 840 pixels. I've used the
original dog prompt and changed some
of the keywords to better reflect the
futuristic neon style of Cyberpunk 2077. Let's hit Generate button again. These are some good results. If you're tired of
our good old friend, you can experiment more
using your own prompts, trying out everything that
you have learned so far. If in your experiments you have created an
image of a human, sometimes you might notice the further the face is
within the frame, the more it might get warped. In the next chapter, I
will be teaching you how to teach your model new concepts: how to create people's faces, add elements the model is struggling with,
and so much more. The models are fun,
no doubt about it. But what comes next is what makes stable
diffusion amazing.
6. Expanding Your Models: Welcome to another exciting
chapter of this course. Hope you're having fun so far learning about stable diffusion. This one is going to
be an exciting one as I will be showing you
many ways you can teach your preferred model some
new tricks, or help it better generate the idea you've had in mind. Before we move on to new kinds of files we haven't been dealing with, it's time to show you another cool trick you can do with checkpoint models: model merging. Another fantastic thing about the Automatic1111 user interface
is that using it, you can merge two or
even three models yourself into a new model. By merging multiple models, you're giving your merged
model the abilities of all the models you've
included in the process. Each stable diffusion model has its own strengths
and weaknesses. And merging them
can help mitigate their limitations and
enhance their strengths. Let's say you like
a model that can create cats in a very
interesting art style, but it has been trained to
create nothing but cats, and you would really
like to see a dog generated in a
similar art style. This is where model
merging is useful. By merging these two models, you'd create a new one
capable of generating both. Additionally, it's good to see what prompts are triggering the art style you enjoy so you can put more
emphasis on it. The new model isn't going to be delivering only the art
style of the first model, but the art style of
the second one as well, which you may want to suppress using negative prompts. To merge the models, navigate to the Checkpoint Merger tab, where you'll find drop-down menus that will allow you to choose up to three models, and the multiplier slider. The more to the left the slider is, the more the final model is weighted towards model A; the more to the right, the more towards model B. If you set the weighting to zero, then the final result will be identical to model A; if to one, then identical to model B.
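For those who like to see the math, my understanding of the Weighted Sum mode we're about to pick is a simple interpolation of the two models, with M standing for the multiplier slider:

merged = A x (1 - M) + B x M

So M = 0 gives you model A, M = 1 gives you model B, and M = 0.5 is an even blend, which matches the slider behavior just described.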
Once you've decided to mix the models, my advice is to pick Weighted Sum and set the multiplier value
according to your wishes. Hitting Merge will
take some time and a new model will be
added to the directory. To use it, you should refresh the models in the upper
left corner first. Now that we have
covered everything that is to be known about
checkpoint models, it's time to tell
you a bit more about the other kinds of files used for AI generation that you can find on CivitAI and other platforms. Besides stable diffusion models, or checkpoint models, which require no additional files in order to generate AI art, you can find a number of files that can expand and teach your model new concepts. They all must be used
alongside a model. Some of the new concepts a
model could be expanded with include subjects and characters, art styles, clothing items, facial expressions, props, poses, objects, photography styles, various interiors and exteriors, and many more. These additions to your checkpoint files can also be trained to affect not only the generated subject or style, but also sharpness, level of detail, contrast, how dark the black tones are, or any other such balance of color and light, the overall quality of your image generations, and skin detail or the level of skin imperfections, or to help you keep the level of detail the same across multiple image generations. It's hard explaining
these model additions in detail without
getting too technical. But to keep things simple, you can understand them as a sub model or a model infusion. There are a couple of file types of this kind and they are, on average, way smaller
than the model files, ranging from 14 kilobytes to
250 megabytes on average, and flexible enough to
be used with any model. They can be helpful when
trying to achieve a result the model itself isn't trained to understand and generate, and they are a quicker and often better solution than, let's say, the model merging we've covered earlier. Placed inside their corresponding folder inside the Automatic1111 installation directory, the file gets automatically installed. All you need for your Automatic1111 to recognize them and include them in generations is to hit Refresh. Then you need to refer to them by typing a trigger keyword related to the file itself in the prompt window, which will activate the effects of the model addition we just installed. Textual inversions, also called embeddings, are
the smallest of the bunch, typically ten to 100 kilobytes, and are very practical
due to their size. People often use them to introduce a new
character to the model, although they also can be used to teach a model
different concepts. A great thing about
a textual inversion is that you can create them yourself by using a training
process in Automatic1111. This process allows
you to create a textual inversion trained
on images of yourself, your friend, a family
member, et cetera. Most creators on CivitAI are uploading textual inversions trained on the faces of various public figures, actresses, Instagram models, et cetera. This is the installation method. Remember, you must use them with a checkpoint model.
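The installation method shown on screen is, as far as I can tell from the standard Automatic1111 layout, just dropping the file into the embeddings folder in the root of your install; assuming the folder layout from the installation chapter, that would be something like the following, with the file name being only a placeholder:

C:\A1111\stable-diffusion-webui\embeddings\my-face-embedding.pt

After that, hit the refresh button so it shows up in the list.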
All textual inversions and any of the future model infusions
we're going to be learning about are either trained on a base stable diffusion model
or using a specific model. They will, of course, provide somewhat different results based on the models they are used alongside. All your installed embeddings, the other name for textual inversions, are going to be shown here.
wish from the list, and it will be automatically
added to your prompt. Then you can use it like
any other keyword in your prompt and move its
position within the prompt. On CivitAI, I have found
a great textual inversion that can introduce
the concept of hazy light to my
image generation. Here is a generation result without the textual
inversion used. Here is a generation result with a textual inversion haze light used at the beginning
of my prompt while the rest of the
prompt remained unchanged. An interesting development
are the negative embeddings. And these files are trained
on bad quality images by placing their
corresponding activation keyword in your negative prompt. With some models, you'll get
better image generations. Certain negative
embeddings can help reduce low quality image artifacts or reduce the chance of poorly
rendered limbs or hands, which are generally
common issues with AI image generation. At this point in time, let's try generating an
image of a person, which is the primary use
of textual inversions. We will retire our golden retriever and try something new. I want to create an image of a person in a photo
realistic style. I will bump my resolution
to 512 by 768 pixels, which allows a bit more of the photorealistic
elements to come through. Keep in mind that we will
deal with resolution and all the other Automatic1111 parameters in the upcoming chapters. I will start with a prompt focused on photorealism, but without a textual
inversion first. Now I will include
a textual inversion trained on a specific face. Note that some of the minor
elements have changed too, but the most significant
difference is apparent in the face of
the lady we generated. I will now utilize an
Automatic1111 extension called After Detailer, which enables me to modify the face only. I will explain this in the extensions chapter of this tutorial. I will use a new textual inversion, this time trained on a different face. Even though it's so small, 14 KB only, the impact of a textual inversion on our image generation can be significant. Now that you've tried
and experimented a bit with textual inversions, it's time to show
you an even more powerful model
infusion called LoRA, abbreviated from low-rank adaptation. These are my favorite model infusion files. Everything that I've told you a model infusion can do, LoRA files are capable of. They are larger and more powerful than textual inversions and are typically between 10 and 200 megabytes in size. They can introduce virtually anything to your model. Some quality-improvement LoRAs are already popular in the AI community, such as detail tweaker, noise offset, film grain, age slider, et cetera, as they work with almost all the models. Don't forget, same as with textual inversions, you must use them with a checkpoint model. To install them, they
need to be put in their corresponding folder
in the web UI's Lora folder. Once placed there, all you've got to do is hit Refresh. LoRAs use a similar method of activation as textual inversions. All you need to do is navigate to the Lora tab and click on the one from the list you wish to use, and it will be automatically added to your prompt. Some LoRAs can stand alone in the prompt, requiring nothing more than selecting them from the LoRA list, while others perform better if you include a necessary activation keyword. You can inspect your LoRAs for the activation keywords and what specific words are used to trigger effects within a LoRA. Let's take an example: a LoRA inspired by the art style of the Polish painter Zdzisław Beksiński. Once selected from the list and added to the prompt, it
will look like this. These brackets are used to differentiate it from other words in your prompt and activate a LoRA. The word within is the LoRA's name, given by the creator, while the numerical value represents its strength. Normally it goes from 0.1 to 1, and exceeding these values isn't recommended.
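For reference, the LoRA tag that gets added to the prompt follows the standard Automatic1111 pattern below; the file name here is only a placeholder for whatever the creator actually named the file:

<lora:beksinski_style:0.8>

Angle brackets, the word lora, the file name, and the strength, separated by colons; selecting a LoRA from the list simply types this tag for you.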
Let's bring back our good old friend, the dog. I will use the model I've provided you along
with the lesson, the one I've been using for all our previous image generations, and I will reuse the prompt from the beginning of our lesson without the LoRA. First, let's read the data from the PNG Info tab and transfer it to the txt2img tab. Hitting the Generate button, we are greeted with
a familiar result. Now I will use these prompts again, adding a few of the Beksiński-related prompts you can see on the screen. I will do it without a Beksiński LoRA first, so I can check if my model has been trained on any of Beksiński's paintings at all. As you can see, this model hasn't been trained using any imagery by this artist. This is where LoRAs could be of great help. Now let's increase the image size by a bit, to 840 by 840 pixels, so we can allow the details characteristic of Beksiński's work to shine through, and include a LoRA file in our prompt while leaving the rest
of the prompt the same. I am 99% sure that the results we're
going to get this time are going to be a
drastic shift from the cute animated
style we started with. Even if there are no changes to the prompt, this is way closer to those apocalyptic scenes presented in Beksiński's work. Let's try to clean up our prompt and remove the children's-illustration-related keywords, replacing them with new keywords better suited to the imagery, color palette, and motifs found in Beksiński's art. I will not change the strength of the LoRA and will only focus on the keywords in the prompt. Much closer to the scenes in Beksiński's work. Now what if we want to use multiple LoRAs in our prompt? A general rule of thumb when it comes to using two or more LoRAs in your prompt is that the
combined strength amount should not exceed
a value of one. You may still go over that value and a model will
generate just fine, but in most cases it would get confused, producing results with various artifacts in case it gets lost over which LoRA it should give priority to. On CivitAI, you can usually see the recommended settings from the LoRA's author. Some LoRAs will produce the desired effect at a lower value than others, as there are so many methods of training and so many LoRAs out there, considering the variety of models and prompts they can be used alongside. The best way is to test them yourself using an
SD model you enjoy. Let's use our usual
prompt and try increasing the value of a LoRA way beyond one and see what happens. I will start with no LoRA, then a LoRA value of one, and then a LoRA value of three. As you can see, more isn't always better, with the value of three starting to introduce artifacts in our generation result and making the result stray further away from
the original prompt, Let's try adding two Lauras and exceeding the
recommended values. This is how our
usual illustration of a dog in a park prompt with broken mirror Laura added looks like this is the same prompt, a broken mirror Laura set to a strength of one alongside
a detailed tweaker. Laura from an earlier example
set to the same strength. You can already see some
strange things here. Loss of composition,
flying fairy dogs, duplication, artifacts and more. Now that you've
gained some insight into how Laura's work, it's time to cover chorus. These files belong to the
same family as Laura's. They are a newer development
but not necessarily better. Let's say Lechorus is somewhat more
expressive than Laura, but this doesn't
matter too much to an end user as that too
depends on a lot of factors. They are used in a
very similar way to Laura's and sometimes
require a trigger word for the generation
process to extract from a licorus everything
that it's capable of. I've tried testing them
without trigger words, and it's a hit or miss to look for them on to activate
the Licorice filter. Once you've found a Licorus, you'd want to try download
it as usual and put it in the Laura folder to even if they are called
Licorus and not Laura. For simplicity, you
can install them in the same folder as they
belong in the same family. To use them, just
select them from the Laurea list and once
added to the prompt, they will to look like a Laura. For any reason you want to separate your Licorus
files from Laura's, you can install an extension using the method I
will show you in the automatic 11 11
extensions chapter of this tutorial and place all
your Licorice files there. In that case, you'll
be selecting them from a Liqorus tab with no
difference in their actual use. As always, after installing one, hit refresh so that it
will show in the list. Before using a Licorus, you can inspect
the trigger words here by clicking
on the info icon. You can also look for
the trigger words here. Just as with Laura's,
you can pick from the list and
adjust the strength. Placing the trigger word closer or further from the beginning of the prompt can also affect
the result to a degree, which is a general
rule about prompting. Just for fun, I will use our
dog to show you both the use of licorus and the importance
of a keyword order at once. We'll use a Liicorus trained
to produce images of trucks. I've only added the Lichorus
set to the strength of one to the usual prompt
we've been using before. Let's do the same prompt again with the only
difference being the word order for a change. Here is a proper use
of Alchorus trained on fashion inspired by the
golden winged birds from Buddhist texts. Besides textual inversions,
Lauras and lycurus, you can find a couple
of additional files on Civet Doi used for similar
purposes. Hyper networks. Hyper networks represent
additional network modules added to checkpoint models. They are on average
around 80 megabytes to explain them on a
deeper technical level. After an image has been partially rendered
through the model, the hyper network will
skew all results from the model towards the hyper
network training data, effectively changing the model, in simpler words,
to an end user. The results are going to be
similar to what we could get using Laura's
hyper networks. Do not need trigger words. Just adding the hyper network
in your prompt is enough. With previously mentioned files, you must use hyper networks with a checkpoint model to browse through hyper
networks on Civet AI. Let's activate the filter first. The installation method
is similar to installing all the previously
mentioned files with hyper networks being installed
in their own folder. I will use a Hyper Network Louisa vintage train to produce colorful vintage style headshots with an image that
we've used before. In order to use them
in your prompt, pick from a list and set the strength just as
you do with a Laura. Interesting results but
definitely not something alike the examples
provided on Civet, This hyper network is trained
to produce headshots. Let's try something
that's probably closer to the way it
was imagined to work. Quite nice. One more file
type you can find on Civet are the aesthetic
gradients since they are more of an extension
than a file such as Laura. We're going to cover them in the extensions chapter
of this tutorial. Tell me how are you doing
if you're in the mood for a break or experimenting with different
prompts and models. Go ahead. In the next chapter, we'll delve into optimizing
our generations, upscaling them to larger sizes, maintaining the essence
of our generations while introducing variations
and much more. Following chapters
are going to take your generations from
nice to amazing. Now that you know the basics, I will show you how to merge your AI creations
with your photos. How to generate using images. How to blend images. And how to fix various
generation issues. How to bump
resolution in detail. And how to properly
upscale your images. Next chapter is going to give you the ultimate
understanding of image generation processes and give you the keys to creation. We still have a lot
of fun ahead of us, so get ready for the next
chapter of our adventure.
7. Settings and Sliders: Now that we have
covered prompts and various files needed
to create AI art, let's tackle the parameters that guide the process
of AI art creation. The things I'm
going to teach you in this chapter are just as important and capable of heavily affecting
our final results. Don't get intimidated
by the variety of sliders in Automatic 11, 11. With most of these you won't need to play
around too often, as either you won't be changing them much or you'd
be loading them automatically from
another image using the PNG Info method
I've shown you before. The lower portion of my
Automatic 11 11 interface might differ slightly
from the one you have. As I have added plenty
of extensions to mine, I will tell you all about them in the chapter dealing with extensions which also come in the shape of various
tabs and sliders. Two, let's begin with the
most important options and parameters that are
going to be common for any automatic 11 11 user. We will start with the most
intuitive one that has the biggest effect on
our result, image size. The image size parameter determines the size of
the generated image. The standard image size that
stable diffusion version 1.5 is trained on is
512 by 512 pixels, which is the models
native resolution. Some newer models are
trained on images with a 768 by 768 pixel resolution. And the newest SDX L
models are trained on 1024 by 1024 pixels. However, these larger models
take significantly longer to generate and require a refiner model in addition
to the general one. When using the higher
solution fixed method or various up scalers, the image size will represent only the initial step in
the generation process, not the final pixel dimension
of the generated result. In this case, one part of the process generates an
image at, for example, 512 by 512 pixels, while the rest of the process increases that
resolution further. However, let's not delve
too deeply into that. For now, we stick with the
basic use of image size. Even a slight change can
significantly alter the result. If you lock the seed to retain the compositional
elements of the image, changing the image size might completely disrupt the
intended composition. Generating results closer
to the native model. Resolution increases
the likelihood of successful image
generation and avoids issues such as two bodies or multiple heads
in the results. While 512 by 512 pixels
is a small resolution, it is often used as the
starting point before upscaling the results to the
desired larger resolution. Keep in mind that some
models are trained on higher resolutions or different aspect ratios than
a square image, and you can usually find that information in the notes left by the models author regarding
the aspect ratio. The little up and down
pointing arrows allow you to quickly swap height
and width dimensions, facilitating a quick
change between portrait and landscape
orientations. Naturally, if you're seeking human portrait oriented results an aspect ratio closer to the usual aspect
ratio of a portrait. Larger vertical dimensions than horizontal ones might provide
you with a better result. The same principle applies to landscape imagery where a longer horizontal
dimension might generate a much better scenery or landscape image without the
use of higher solution. Fix explained further control net and various upscale methods. Your image size
shouldn't deviate too far from the native
resolution of the model. You can determine
the resolutions at which the model
performs best. Different GPU's will generate
at different speeds. So instead of generating
everything at a larger result and risking plenty of poorly
looking generations, it's advisable to generate
them at a lower resolution. Will also be faster and
then upscale or repeat the generation using
his fix and up scalers. The model I've provided you with generates the best
looking results at a satisfactory speed at sizes
of around 85850 pixels. This is the image generated at normal aspect ratio and recommended native
resolution of the model. Now let me show you
what happens when we deviate too far from
the native resolution. This is an example
with a vertical side far exceeding the dimensions
the model was trained on. This generation
artifact is known as duplication or twinning
is happening due to our model suddenly
having to fill in a much larger space than the one it's
been trained to fill. Duplication and
twinning refer to unwanted duplication
or multiplication of features in your creations. For instance, this
might result in characters with two
faces or two heads, extra limbs, et cetera. This is what happens
when both sides are largely exceeding the dimension
the model was trained on. In summary, stick close
to the native resolution. Now that you have a grasp of
models and image dimensions, I will tell you a bit more
about the SDXl models. As mentioned earlier, SDXl is a newer development aiming to achieve a better
level of detail, much improved photo realism, and higher native
resolution SDXl models are trained on 1024
by 1024 pixels. And can be used with or
without a refiner model. The refiner model is another, often smaller model added to
the original SDXl model that refines the details When
downloading an SDXl model, make sure to download
a refiner if it's added or hinted
at alongside it. Refiner models are installed in the same folder as general
checkpoint models. You can pick them from
this drop down menu. The recommended value for the
switch at slider is between 0.7 to 0.8 and serves as the point at which the generation
process using a general DXL model stops and switches
to the refiner model. At the moment of
writing automatic 11 11 isn't very
efficient at running SDX L models quickly and switching between the model and refiner model can be slow. Plus SDXl models use a lot of computer memory to create
images right now using SDXL models and automatic 11 11 might not be
the best use of your time as the results may not always be worth the much
longer generation time. Probably the currently best
and time efficient way is to generate images
using the base model first without the refiner. After that is done, you
can collect a batch of images that you like
to use the refiner on, then do the refiner step through the image to image panel
that we're going to cover. Minimizing the time spent on model switching on
the bright side, most of the SDXcel models currently being
uploaded to Civet, I are trained to produce a great level of detail without
the use of a refiner that makes them somewhat faster
to use SD Xcel models at another layer of complexity and an additional loss of time
on image generations. Therefore, let's stick with
the regular SD models. Here are some of
the custom models compared to the base SD, Cel. Sdxcel models are
expected to become fantastic in the upcoming
future with further retraining, just as it was
done with regular, stable diffusion
models that were optimized into thousands of
models by the community. It's important to note that
in any case, the general use, prompting and other settings
are all the same between regular and SDXl models
sampling methods. Before intimidating you
with an explanation, it's important to
know that any of your sampling method choices
is going to work well. There are no bad or good sampling methods,
only different ones. The easiest way of understanding
sampling methods and samplers is to think of them as different artists creating
your commissioned art. They can all do it, they just have a different way
of going about it. Some methods guide
the AI towards meticulously crafting
every detail, while others prompt it to
quickly sketch out a concept. What's cool about
this is there's no one size fits
all best setting. Now for a more
technical description, sampling methods represent
the algorithmic strategy AI uses to translate a text
prompt into a unique image. If you really wish to go in depth and scientific on samples, I will provide you with a link inside the course
materials file. Here is where you can choose between different
sampling methods. They are all different methods of solving diffusion equations. There's no right choice here. At most times, what matters
is if the image looks good, Euler, which is a
default option, is a fast sampler, but you're
given other options too. You can download additional
samplers off the web. At the moment,
there are probably way too many samplers
available within Automatic 11, 11 that you'll
never have time to check and understand
exactly how they work. Some people prefer one
sampler over the other. For their models, you should try them out for yourself and change them from
time to time to see the effect they have
on your images. Here's a comparison using a prompt for an orange
tabby cat outdoors. Now if you look for this
variation in your images, intentionally look no further
than seeds and variation. Seeds explained further on samplers can also affect the
speed of your generation. Here's a chart showing
the generation speed using different samplers when
generating eight images. You will see in the next
part of the course that when it comes to samplers
and sampling steps, more time invested in generating an image doesn't directly
translate to quality. In fact, you can
already see this in the comparison
image that uses a cat to show how
different sampling choices affect the final result. My general advice is to
test out a few samplers. And if a few of them you like
produce the same result, then simply pick the one that
produces the result faster. Now let's see what
are sampling steps. Sampling steps are a slider on the interface that controls how many iterations or steps stable diffusion model takes
to craft your artwork. It's like the number of brush strokes artist decides
to put into their painting, contrary to what
one might think. Bigger isn't always better.
With sampling steps. Cranking up the
sampling steps number doesn't necessarily
result in a better image. It's all about finding
that balance between a high quality computation time. As the higher the number
of sampling steps, the longer time it takes
to generate a result. Typically 20 steps with the oiler sampler are enough to reach a high quality
sharp image. Although the image
will still change subtly when stepping
through to higher values, the result will be
somewhat different, but not necessarily
of higher quality. The fewer the number
of sampling steps, the faster the image
will be generated. Finding some middle
ground between speed and quality is advised. I usually stay 20-40 sampling
steps and adjust to higher. If you suspect quality is low, it takes three times
the time reaching 25-75 steps with no benefit
in terms of quality. Cfg scale, or the Classifier
Free Guidance Scale, CFG scale is a
parameter to control how much the generation process should stick with your prompt. You can imagine CFG as a
sliding scale that controls your guide's attentiveness to your instructions or as creativity versus
prompt literalness. Here is how the CFG
values are usually seen as one to three. Mostly ignore your prompt. Giving free rein to SD. Three to six, still
relatively free, but sticking a bit
more to the prompt. Six, playful and
creative setting, best suited for shorter
prompts. Seven to ten. A good balance between following the prompt and freedom 15. Adhere more to prompt optimal. When you're confident
your prompt is detailed, 20 values of 20 and more are rarely useful and tend to result in less satisfactory outcomes. The typical and default
value is seven. Here is an example comparing CFG scale values
ranging 5-30 and sampling steps 10-50 seed value. All AI generations begin with noise built from
a noise pattern. The value of the seed
determines the noise pattern. The generation process starts with greatly affecting
the final result. You can also think
of the seed as a unique identifier for
that particular image. This is how all AI
generation looks like, starting from noise and
resulting in your image. You don't need to come up with
the seed number yourself, because it is
generated randomly. However, controlling the
seed can help you generate reproducible images or images similar to
the one you like. Don't get too spooked out. With this vague description, the seed controls the
elements of your image determining where and how they are positioned in
relation to each other. The default value is
minus one and stands for the randomized value,
meaning Automatic 11. 11 will generate a
different image every time Generate button is pressed
using the specified prompt. You normally want this value to be minus one unless
you're trying to lock the composition and vary the prompt a little to see
what else you can get. Same prompt, random seed typing, one is going to lock the seed. So you can experiment a bit
with varying your prompts. Same prompt with one
keyword difference. However, pay attention to this same prompt
and the same seed. This can often
happen by mistake, results in the same
exact image every time clicking on the dice
icon randomizes the seed, unlocking them so you can get
entirely different images. Again, each generated
result will have information on the seed number saved in its data
that you can inspect. Using the PNG Info tab, you can reuse the seed number
of the image you like in case you want to change something little
within your prompt, but still keep the
general image similar. Note that if your
prompt changes a lot, the re used seed number isn't
going to be as effective. To sum it up, if you wish to explore and get a variety
of different images, use a value of minus one. If you want to fine
tune your generation, fix the seed to a
specific number and vary the prompt a little
until you're satisfied. Another option that allows you to fine tune your
generations and vary your result a little
while keeping the general seed locked is
this little extra checkbox. This reveals the
extra seed menu with even more options.
Variation Seed. This is an additional
seed you can play with. Think of it as a seed
within your seed. You'll use this when you're
fairly happy with your image, but still want to
change it slightly. Variation strength, you
can control how much of your original seed and variation seed you want in the mix. A setting of zero uses
your original seed only, while a setting of one
uses the variation seed. If you wish to vary
your results slightly, which is the idea
behind this option, lock the main seed, randomize
the variation seed, and set the variation
strength to 0.1 This produces similar results
to your main seed with minor variations between
different generations. Seed resize. We have covered earlier how
changing resolution, even when using the
same exact seed, produces entirely
different results. Seed resize function is here
to let us generate images at different resolutions
while preserving the general look of the image
we're trying to recreate. This function allows you
to generate images from known or fixed seeds at
different resolutions. Even on a fixed seed, the image changes entirely
once we change the resolution. As the resolution is a part
of the generation algorithm. If you really like the image obtained using a certain seed, but wanted a larger resolution, this is where seed
size becomes useful. You can see the general feel of a starting image
remain the same. The resolution is increased
from left to right, you will put the new image size in width and height sliders. And the width and height of the original image
you are trying to recreate here, batch count. Now this is the long
awaited moment where we can stop pressing the
Generate button repeatedly. If you set the batch count
to three and press Generate, the system will
generate three images, all using a prompt you set
and a different random seed. Unless you've
locked this option, advice is to always generate at least four to five images
with the prompt before changing it so you
can get an idea how close the prompt is to what you wished in
the first place. Or should you change
either your prompt, a certain parameter or simply
generate more batch size. Refers to the number of images to generate in one go within a single batch while increasing the batch size can significantly boost the generation
performance. Be mindful that it comes at the cost of higher
video Ram usage. I am keeping batch
size at one while using the batch count to
tell to Automatic 11, 11 how many images I want. You can increase this number
if you have a powerful GPU, the total number of
images generated equals the batch count times the
batch size face restoration. It's a fact that stable
diffusion is not fantastic at generating faces as the models are getting trained better and
with various automatic 11, 11 extensions coming out frequently aiming to
solve this problem. This isn't so much of a
problem as it used to be, however, there are
still situations where we can see those issues. One such example is when generating images where the
subject is far in the frame. Restore faces aims to solve this and many
similar problems by applying an additional
post processing model near the end of generation, trained for restoring
defects on faces. Turning on restore will try to render a natural looking face. Not every model will benefit from this process and frequently the face restoration style isn't coherent with the general style
of a model you are using. Moreover, with the emergence of some extensions such
as after detailer, the use of face
restoration has declined. And therefore, it has moved from the main panel into
the settings panel, one slider that deals with face restoration and is
still on a main page, though that might change in the future versions
of automatic 11. 11 is the GFP Gan
visibility slider. When set to zero GFP
gan face restoration is off, But in my tests, when higher than zero or one, it will activate
the GFP gan effects even if restore faces
is switched off. In the menu, there are two
face restoration models you can use in Automatic 11 11
found in the settings menu. By clicking on face
restoration on the side panel code former produces a
more realistic result at all strength levels. This can be either good
or bad depending on the context and frequently results in a totally
different phase. Gfp Gan retains much more of the original
structure of the face. It's soft in general and
sometimes almost painterly, which could be either
desired or undesired. If photo realism is
our only goal however, it retains the facial
features better. I've seen many commenters
recommend using code former specifically
to obtain the eyes, then blending the result with
the original in Photoshop. This is a workable solution, but it's time intensive too. As my second AI
course deals exactly with merging AI generated
art with photography. This is where the Photoshop
technique might come useful. You might like the way the
face restoration results look, and you should definitely
try both of the models out. You can even blend
them by selecting code former in the menu and
adjusting its weight. And doing the same
with the GFP Gan slider on the main page. I usually keep both of these settings off
as it additionally slows down generation
time to get better faces, I frequently use the after detailer extensions I will
soon tell you more about. Here are some examples,
original code, former GFP gan up scalers. As we've previously established, the default size used to train most models is 512
by 512 pixels. Some stable diffusion version two models have bumped
the resolution higher, while SDXl models
are going as high as 1024 by 1024 pixels as their
native pixel resolution. Of course, not everyone has the latest and greatest
graphics cards, and many people are stuck with models trained on
lower resolutions. This is, however,
no problem at all, because upscalers are here to help with the current
state of things. Upscalers are the go to tool for achieving high
resolution generations. Now let me tell you a thing
or two about upscaling. Ai upscaling works
differently than the upscaling methods
used in the past. Traditional upscaling
methods use just the pixels of
the original image, mixing these existing
pixels using mathematical operations
to enlarge the image. Here you can see
the two traditional upscaling methods at work. Traditional upscaling
always results in blurry outcomes
and is not much of an improvement over the
image we started with. In the case of an image that
is distorted or corrupted, in some ways, these
algorithms can't fill in the missing
information accurately. In contrast, AI upscaling works in an entirely
different way. Here's a little comparison
between a couple of AI upscaling methods and their
traditional counterparts. Ai upscalers are trained
on a massive amount of data to be able to
recreate information. These upscalers try to recognize patterns in
images and videos, and upscale by
guessing new details that would contextually
fit into the new pixels. The way the model
is trained is by degrading good quality images, and training a neural network to recover the original image. Using automatic 11 11, you can upscale your results in two ways as a part of
the generation process and later on by
sending images you like to the extra tab
where upscaling is done. You can also run
upscaling processes on a large batch of images, which I will show you
too in Automatic 11 11. You are also given a choice of working with two up
scalers at once. Up scales 1.2 Using the up scalar two
visibility slider allows you to blend
two upscale results. The default upscale
factor is four, but you can set it to a lower
value if you don't need the image to be four times as big as the
original resolution. You can set the upscale factor by dragging the scale by slider. A good general purpose AI up scalar is RS organ Four X Plus. When it comes to the
produced results that are most alike photography, my favorite one is Maker, A custom up scaler
not installed by default with automatic
11 11 installation, anime images require up scalers specifically
trained on such art. I will provide you with a link. As always in the course
materials text file. Don't let this confuse you. Even when the scale by
slider is set to a value, it won't have any effect unless
one or two upscalers are chosen in the drop down
menus below the number four, where the slider
is going to be at most times when you
run Automatic 11, 11, just the default value. Some of the popular up scalers used for different purposes are the first two up scalars in the list are the
traditional up scalars that are not able to generate
new details like the rest. Here you can choose from a
variety of automatic 11, 11 upscalers or download them from the Internet
and install them if you want to delve
deeper into it. I will provide you
with my test examples, alongside course materials in case the results aren't
clear from the video. Together with links
to the websites offering custom up scalars
that you can try out, Remember to refresh the
user interface after installing a custom up scalar
into your automatic 11, 11 time saving tip. The extra tab shown here is
your general upscaling hub. As mentioned before,
upscaling can be done as part of the generation process or separately on
images you like. If you prefer to save time on image generations and only upscale images,
you're happy with. The extras tab is the
place for you here you can input an image
set the upscale factor, choose the up scalar, and even add a
second one on top. You can adjust the visibility of the second up scalar using the visibility slider alongside face restoration module
settings shown below. Once you've generated an
image from other tabs, you can also swiftly send it to the extras tab for a
touch of upscaling magic. If you want to batch process a large number of files during your computer
break or lunch, you can use the batch
from directory tab. Set the input directory where
the original files are and the output directory where your results should
be generated. To copy the directory
destinations, open a folder, navigate to the directory, right click and copy, then passed it into
the input directory. Do the same for the
output directory. The rest of the up
scalar settings within the extra tab
should already be familiar to you.
Here's a little bonus. In addition to using
the up scalers found within Automatic 11 11. For my photography and
photography oriented results, I often use another
piece of software called Topaz Giga Pixel I. It has a very user friendly
and intuitive interface, allowing you to choose
the upscale factor. Just like Automatic 11, 11 alongside an image type. The standard option
is quite good, but you can try various
other image types depending on the images
you're upscaling. You can also leave it on auto, letting the app analyze your
image and suggest settings. I've tested it extensively for both my photographs
and AI generations, and it works really well. It is particularly
useful when trying to restore old family or
childhood photographs. You can batch process a variety of images that you can collect in one folder and
let it run while you're having your
morning coffee or lunch. Now that you grasp the
workings of upscalers, let's delve into the high
resolution fix option designed to use upscalers
in conjunction with additional post processing step to generate detailed images at resolutions higher than
the default 512 by 512. This process incorporates
an additional layer of detailing to enhance
the final result. The high resolution
fix procedure involves initially generating
an image at the smaller, closer to native resolution. Upscaling this initial image to the image resolution
you specified. And subsequently applying
extra post processing steps to increase details and
achieve the desired outcome. This supplementary step
significantly increases the level of detail compared to a straightforward upscale. And it also has the
potential to alter the visual appearance of the
generated image effectively. Utilizing the high
resolution fix function proves beneficial in mitigating issues
such as twinning or duplication previously
mentioned in relation to image size. And helps maintain the
composition integrity of your upscaled images. Let's consider
setting a resolution higher than the models
training resolution, both without and with
the high resolution fix as illustrated in this example. Working with NonSDXcel models can yield impressive results, producing images
that surpass the 512 by 512 pixels default size. This is achieved without an additional up skelling step that can be applied
in the extra tab. It's essential to note that working with the high resolution fixed does come with a trade off in terms of generation time. His fixed steps represents a
number of his fixed steps. Hagen, in addition to
the sampling steps used during the first pass
of the generation process, if set to zero, it employs the same number
of sampling steps as used for the original image. If set to a specific number, that designated number
will be utilized. I recommend 15 high
resolution steps as it strikes a good balance
between speed and quality. Similar to other aspects
in the AI realm, it involves a delicate
dance between achieving optimal quality and
minimizing processing time. Noising strength. You
can think of this slider as the strength of the up scaler during the upscaling steps, or how much freedom you're
giving to stable diffusion during this process
on the lower values. This slider lets us preserve the essence of our image during
the enhancement process, while with higher values, the process will
likely introduce additional changes
into your image. I will show examples using the same prompt and
the same settings, with the only difference
being the denoising strength. I will start with the
original image generated at 568 by 832 pixels
with no his fix. Now I will be regenerating
this image with two up scales at
denoising strength set at 0.250 point 5.0 0.8 For
the first set of examples, I will use the latent up scalar. With an upscale by
slider set to 1.5 x. That will produce an image
size of 880 by 12 88 pixels. The latent up scalers work slightly
different than others, upscaling at a different point
in the generating process. They usually need more steps at a higher denoise strength
such as 0.5 and higher. Observe the difference
between latent versus RSR. Gan up scalar at 0.25 strength, especially at level zero, your image will not change
at the value of one. The results are hardly
like how the image looked before the upscaling
process has started. The optimal denoising
strength will depend on the upscaler
you're using. You'll need values of around 0.5 for the latent upscalers, while other upscalers will
do just fine from 0.3 to 0.5 If you wish to give high resolution fix more freedom to reinterpret your idea, you can aim for higher values. Hope you're doing well there. We have now covered the basics
of AI image generations. Our following chapters
are going to bring it all together and are going to be way less intense than this one.
8. Image-To-Image Generation: Welcome to another exciting
chapter of this tutorial. I am pretty sure this is going to be the
one you'll enjoy. The image to image panel is the second most important
panel of Automatic 11 11. Now that you understand
how Hires Fix works, you'll have a better
understanding of image to image. As we have established
throughout the course, text to image is a default
way of AI image generation. However, besides creating
images from a text prompt, only another popular and
interesting way is generating, using another image
as a reference. We call this method
image to image. This allows us to transform
an existing image, your earlier AI
generation, your photo, or a sketch, or anything from the Internet into a new image. The process of
using another image as a reference is simple. All we need to do is
type in our prompt. As usual, place an image
into this window here, determine the dimensions
of the generated image. And finally, how much freedom we want to give automatic 11, 11 in reinterpreting
the source image. To do that, we use the
denoising strength slider, just as we did with his fix. The denoising strength
slider allows us to fine tune the extent of the transformation
applied to our images. Lower values retain more of the original images
characteristics, whereas higher values allow for more dramatic and
creative transformations. Keep in mind that the lower
denoising strength values will also make a generated image stay closer to the
reference image and often result in somewhat
blurry generations. While higher denoising
slider values allow the model to
express itself freely. Here are a few examples with the same prompts
and settings, with different denoising values. Think of the input image as
nothing more than a guide. The image also does
not need to be pretty or be high res
or have any details. The important part is the
color and the composition. So you can use a child's
drawing, for example, and see how stable
diffusion alongside your prompt and a model
interprets the input. One thing I've
noticed too is that the stronger the contrast and lines in your
original reference, the stronger these will imprint themselves
on your result. There is no good value when it comes to denoising strength. If all you want is a result
loosely based on a reference, you can increase
the values beyond 0.6 If you want to give some painterly quality
to a photographic image, you can get satisfactory results even with values as low as 0.15 How much the image will change in comparison
to the reference depends on the model used, various lores,
textual inversions, your prompt sampling
steps, et cetera. The image to image
panel provides us with many familiar options
we had in the text to image panel that we
were covering earlier. However, there are a
couple of additions. The first one being resize mode, that allows us to determine various image size
related parameters. Just resize, This will resize your image to meet the
width and the height set. If your height and width are different than those
of the original image, your image will be
stretched, crop, and resize. This will crop the
original image to the resolution values here first and then run
the image generation. This is similar to you
cropping the original image yourself before putting
it into automatic 11, 11. Resize and fill. Resizes the image to your specified resolution and
fills the empty space with colors present in the
image, just latent upscale. This option is very
similar to the first one, the only difference
being that it uses a different latent
upscaling method. The scale by and scale to options you can use to
either resize by a factor or resize to specific
dimensions by typing them in in case you
have chosen the up scalar. The image to image Prompt panel also understands instructions, so you can say things like
make the person wear a hat. And if your denoising
strength is high enough, the person in your generated image will be rendered wearing a hat alongside previously mentioned
settings and parameters. The image to image panel
provides us with a couple of new tabs such as sketch
in paint in Paint, sketch in paint,
upload and batch. I will show you the
sketch and batch tabs now and leave the in
paint related tabs for the next chapter that deals with in painting,
specifically sketch. Now that you're familiar
with image to image, it's time to cover
the sketch option. That introduces an
interesting addition to image to image generation. You can think of sketch as a creative and quite
useful coloring tool, merged with an image
to image module. At first glance, sketch and image to image look
completely the same. But if you look closer, once you drop an image
into this area here, you will notice a couple
of options you haven't seen on the basic
image to image panel. These tools are the
rudimentary paint tools. Brush and brush size undo
clear and color palette. On the left side,
hovering the mouse over the little info icon shows you some things that
can help you when drawing. The way sketch works
is that it will render the new image in a similar way to how image to image will do, but also paying close attention to colors that you've
painted over the image. Your final result
will be a new image that might be very close
to what you had initially. How close the result will
be to the reference image. Again, depends mostly
on a denoising slider. Let's try an example using the image of a girl
we used earlier. This is how my sketch
masks looked like. Here you can see the result. I've changed my prompt to contain less of red
related keywords and reduced some of the
weights on the word red that my initial prompt had. Let's hit Generate and
see what result I got. Now general rule is that
when you use Sketch, you want to use the same
prompt as you had initially. You can help the image
generation a bit by using words related
to your new color. Two, like I did here. If your prompt says red
studio background and you're trying to paint the background
yellow using sketch, there will be a
bit of a conflict between your intentions. One more thing I wanted to
show you is the batch tab. If you remember
the batch tab I've shown you when we were
discussing up scalers, this is pretty much the
same thing this time. The only difference is
that instead of batch upscaling the batch
tab within image to image allows you to process a large number of photos
automatically using, of course, the image
to image panel. Copying the directory
destination from your Explorer into the input
and output directories tells Automatic 11
11 where to take photos from and what folder
to generate the results in. Now that you've
understood the process of image generation, various
upscalers, parameters, additional functions and
image to image generation, it's time to show you in
painting a great way to fix your image generations and introduce new
elements to them. Let's move on to in painting.
9. Adding Elements Using Inpainting: Welcome to yet another fun
chapter of this course, How are you doing so far? I hope you're taking breaks and letting all the new
stuff settle in. We have covered quite
a lot together, but I still have some
cool tricks to show you. Actually many more new tricks. There are further techniques and total game changers awaiting us in the extensions
chapter of this course. But before we dive in, let's get familiar with in painting. Instead of generating
the whole image, which is what we were doing
until this point in painting, is a technique used when we want to generate just a
part of an image, fix a part of previously
generated image, or generate everything
around a certain area. You can use in painting
to regenerate part of an AI generated image or
a part of a real image. This is similar to Photoshop's new generative fill function, but unrestricted when
it comes to content. The content that
will be generated within the masked area depends on the model and
additional files that can expand our model, such as Laura, textual
inversions and more. Remember the way
we use sketching. Now imagine this, but
instead of colors, we're going to be adding
actual content into our image, regenerating parts of images, or removing undesired elements. The method works like this. We supply an image, then draw an area of the image we would
like to generate using stable diffusion type in the prompt for the redraw
and click Generate. After we click Generate, the area will be
generated based on our prompt in painting is a part of the image
to image panel and the area that we
draw is called a mask. Just as with the sketch
tab we were using before, you will find all the
familiar drawing tools and the info panel
on the left side. Some differences
between the sketch and in painting panels are the absence of the color
palette and some new options. I will explain mask blur. This slider affects the
softness of the painting brush. If set too low, the painted content might
look pasted into the picture. While increasing this
slider will result in better blending between the original and generated content. Padding affects how much of
the area surrounding the mask should be used as a
reference when it comes to generating the
content inside the mask. This slider depends on
what you're trying to do. I usually go with higher
values for this one as I'd want the generated result
to blend as best it could. Mask mode presents you with two options in paint mask that generates content
inside the mask and in paint mask does
exactly the opposite, changes everything about the
image except the drawn area. Masked content presents
us with various modes for how the content within the mask is going to be created. Again, your choice should depend on what you're
trying to achieve. And some modes are better or
worse for specific tasks. Phil uses the
neighboring colors as a base for painting original. Used when you don't want
huge changes and mostly when fixing stuff other
than adding new elements. Latent noise or
latent nothing are good when you're trying to
add something into an image. Unlike what the image
contains already, latent noise fills the
area with noise from which all AI image
generation starts basically generating
from your prompt without too much of the image used
as a reference latent, Nothing is comparable
with erasing the mask area with an eraser. Think of it as the
choice between filling with static or black. I would advise picking
latent noise in paint area. In paints only the masked area, whole picture might be good. Only when working on
already small results. It will still in
paint the mask area, but it might take into account
the rest of the picture. Better drawback of
this method is that it resizes an image based
on size parameters. So I'd stay away from
it when I want to retain the size of the image
I put into in painting. Just as with general
image generation, it could be challenging to get the result we want
on the first try. Therefore, we should set the
batch size to around five. According to the results,
we could switch up a few parameters,
such as denoising, strength resolution, et cetera, until we start getting
closer to what we want. Here are a few of my
results when fixing minor hand mutations or
similar elements using the original prompt for in painting works 90% of the time. However, if you're trying
to add something new, you can retain the stylistic
keywords of your prompt while describing
what it is that you want to add with in painting. Now it's time to cover
the two other in painting modes in paint upload. The painting tool is
powerful but lacks many of the fine tuning options
that some users might be accustomed to from
programs like Photoshop. Drawing masks over
subjects can be tedious, especially when dealing with
intricate details like hair. If you aren't satisfied
with the level of control over masking and
have more ambitious goals. Automatic 11 11 allows you
to create your mask in another software and import it using the paint
upload feature. The upper portion is where
you need to put your image, while the lower one is
intended for the mask. You can go with a
black and white mask. I will show you a couple
generations and the masks I've created in Photoshop to
aid in my AI generations. My second course
deals specifically with the topic of
AI and photography. So if this is something
that interests you, I will happily have you as
my student again in paint. Sketch. In paint sketch combines the functionality of in painting and color control
of the sketch panel. Unlike the original sketch, it will render only
the masked zone, not touching the
rest of the image. Contrary to the normal sketch. You can write a
unrelated prompt and the paint will try to
render your prompt in the masked area by using
the color of the mask as an additional element
in the generative process. Now that we have
covered image to image generation and in painting as one of
its integral parts, what awaits us is
an exciting chapter that will bring everything
we have learned so far together and unlock
some new options and ideas that you were maybe unaware you can do with Automatic 11, 11.
10. Amazing Extensions! : I have some amazing things
to show you in this chapter. Not much more left
before I leave you to use everything
you've learned so far. Extensions are my favorite
part of stable diffusion, as they allow us to take
further control over our image generations
and enhance everything we've learned earlier with some additional abilities. Some of these extensions
can be used to add an extra element of control
to your image generations, such as the super popular
Control Net extension. While others, like deforum, enable you to create
videos from your image. Generations developed
continuously by the global Internet community
and users worldwide. Automatic 11 11 is enriched daily by community
developed extensions setting it apart from
other AI generators and enhancing its
functionality and ease of use. Some of the popular
ones are Control net, xyz plot after detailer, Civet, AI helper canvas, zoom,
aesthetic gradient, interrogate clip, ultimate SD, upscale, open pose
editor and deforum. The installation method for all these extensions
is quite simple. All you need to do
is copy a link. Navigate to the extensions
tab found here, then click Install from URL. Paste the link right
here, and click Install. All you need to do next is click on the installed
extensions right here, and press the Apply
and restart UI button. Let's talk about the
first extension, the fantastic control net. This extension has changed
stable diffusion forever. You will see very soon why it is my favorite stable
diffusion extension. Among other things,
it lets you copy or specify human poses
from a reference image, copy composition from
another image by analyzing either edges
or depth, and so on. It can replicate the
color palette from a reference image or turn a scribble into a
great looking result. And more. It could be used in any of the image generation
panels alongside them. But when used in
tandem with image to image feature
becomes incredibly powerful as it gives you a granular level of control
over your creations, paving the way for
boundless creativity. You remember the way
image to image generation works using a reference image
to guide our generation. Now imagine that tool becoming ten times as powerful
and feature rich. This is what control net is. When activated by checking
this checkbox right here. Control net becomes
an additional step of control that your image
generation will adhere to. What you see here
are a plethora of various elements that could be extracted from the
reference image and used to guide your image
generation Control net can analyze the hard
contrast lines of the image and use those
to guide the generation. Analyze the depth of
the reference image and use that to guide
the generation process. Extract the pose from
the reference image, having it be the
only thing locked in while freely interpreting
everything else. Based on your prompt, convert a reference image into a
drawing by analyzing lines. Extract, for example,
only hard lines ignoring other elements present
in the reference image. Analyze the orientation of surfaces and use that
as a method of control. Use the shuffle option to transfer the color scheme
of the reference image. Allow even better control
of in painting and more. Make sure that you have
the various models installed needed for
control net to work. Here you can see some of the
ways I've used control net. Control net can also work with an open pose extension allowing a direct
transfer of the pose you've created using
a stick man figure to be transferred as a method
of control and control net. A couple of things
that would be good to know love Ram option is experimental and is for GPUs with less than 8
gigabytes of V Ram. Allow preview check this to enable a preview window next
to the reference image. I recommend you
select this option. Use the explosion icon
next to the preprocessor drop down menu to preview the effect of the pre processor. The explode icon allows you to see the preview of
the analyzed image. The upwards pointing
arrow transfers the dimensions of the
image you placed into control net to the
image size dimensions for the image that is
about to be generated. Here you can see how I have used image to image
alongside control net to completely lock in content and compositional
elements of the image. I extensively use
Control Net for all my photo manipulations
that I am teaching more about. In my second course on AI
plus Photoshop editing, make sure to click on
the enable checkbox before starting the AI
generation process. You incorporate
control net into it. It's something that I am
often forgetting about as the installation method
might change a bit over time. Pay attention to the
installation instructions on control nets web page. I will provide you with the link alongside some additional instructions in
course materials. Text file after detail after detailer is another
community favorite. It serves to help
generate better faces, body parts and hands. It's among my
favorite extensions, not only because it acts as an automatic paint feature that detects and fixes potentially
problematic areas, but also because it provides
high quality results. Often better than what
the AI generation does by default When
installed and activated. Once you press the
generated button, the image will
generate as usual. Then after detailer takes over, looking for faces and hands in the image and attempting to automatically paint those
areas using its custom model, specially trained to fix
such possible errors. It can also further enhance the quality of generated areas. After detailer contains both positive and
negative prompts, allowing you an additional step of control over the
in painting it is doing when using
textual inversions trained on people's faces. After detailer can
be used to increase the likelihood of a generated face looking like the person. As both your general prompt
and after detailer prompt can contain the textual inversion working on replicating
someone's likeness. Another amazing thing about
after detailer is that it allows custom prompts
for both hands faces, et cetera, all while letting you use both in painting
models in unison. I will show you a couple of generations with and
without after detailer. The results speak
for themselves. Below the after detailers
model and prompt selections, you can find three
drop down menus. Detection, mask, pre
processing, and in painting, allowing you so
much more control over how after detailer
should be applied. You can leave the first
two on default settings, however you should
pay attention to the last one that
allows you to run after detailer using
different denoising and mask blur settings. Or using another model than what the image has
been created with. You can also specify the sampler number of
steps and CFG scale. How crazy is that?
Civet AI helper. This is a very useful one. It's an extension helping you handle your models
much more easily. Here are some of the
things it can do. It can scan all
models and download model information and
preview images from Civet I. It can check all your local
models new version and automatically update them
with an info and a preview. It adds some new icons to the round globe icon
opens this models URL. In a new tab, you can use the bulb icon to add this model's trigger
words to prompt. While this one here the tag icon uses this models
preview images prompt. One thing to note
is that every time you install or update
this extension, you need to shut down
web Ui and re launch it. Just reload UI option from the. Settings won't work for this
extension, Canvas zoom. This extension allows
you to zoom into the sketch in paint and
in paint, sketch panels. It doesn't change anything
about image generation itself. But it makes it more
comfortable to do all the drawing
related things within the UI. Aesthetic gradient. Aesthetic gradient
is an extension somewhat similar in
functionality to Laura's. Basically, instead of using
the prompt weight only, it allows you some further control over the implementation of downloaded aesthetic
gradient file from Civet AI. Some say they are about to get phased out. Some
say they are good. I haven't used them
much personally, with Laura's being
powerful as they are, I don't see aesthetic gradient
as a part of my workflow. But it might be a great for, you should definitely
check them out. You can find them using the
filter options right here. Interrogate clip and
interrogate deep buru. It's a built in extension
to automatic 11, 11. Both clip and deep
buru are used to extract prompts from
images placed into the image to image tab
using a couple of gigabytes large model that will get automatically downloaded
once you run these options. Interrogate clip is used for general imagery and deep boru
should be used for anime. They use a lot of video Ram. They cannot be used
with a low spec GPU. Using these tools
is quite a hit and miss scenario and often funny. So my advice is that
you better explore the capabilities of your models by exercising your
Ultimate SD Upscale. A great upscaling module that allows you to upscale your images without introducing classic AI upscaling artifacts such as over-sharpening, over-polished skin tones, et cetera. The way Ultimate SD Upscale works is by breaking an image into smaller tiles, then working on and upscaling the image tile by tile, and finally merging all these tiles into one upscaled image, with superior results compared to the usual upscaling methods in Automatic 1111.
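To make the tile-by-tile idea concrete, here is a tiny conceptual sketch. The enhance_tile function is only a placeholder for the low-denoise img2img pass the real script performs on each tile, and the real script also blends tile seams, which this sketch leaves out.

```python
# Conceptual sketch of tiled upscaling: enlarge, split into tiles, process each
# tile, then paste the results back together. enhance_tile is a placeholder.
from PIL import Image

def enhance_tile(tile: Image.Image) -> Image.Image:
    return tile  # stand-in for a gentle img2img pass that adds detail to the tile

def upscale_in_tiles(path: str, scale: int = 2, tile: int = 512) -> Image.Image:
    src = Image.open(path).convert("RGB")
    big = src.resize((src.width * scale, src.height * scale), Image.LANCZOS)
    out = Image.new("RGB", big.size)
    for top in range(0, big.height, tile):
        for left in range(0, big.width, tile):
            box = (left, top, min(left + tile, big.width), min(top + tile, big.height))
            out.paste(enhance_tile(big.crop(box)), box)
    return out

upscale_in_tiles("portrait.png").save("portrait_upscaled.png")
```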
Openpose Editor. This is a small extension that allows you to add one or more people into the image and craft their poses using a simplified representation of the human body. You can then send your creations to the ControlNet extension to use as a guide during image generation. I will show you a simple example; it is a really interesting little extension.

X/Y/Z Plot. Not so much an extension as it is a script, but I have decided to cover it here since it is almost invisible when using the UI. X/Y/Z Plot is a script that creates grids of images with varying parameters, and it can be found in the script drop-down menu as shown below. I used it earlier to show how different CFG scale and sampling step values affect the result, but I will show you a few more examples.
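Conceptually, the script simply sweeps the chosen parameters while keeping everything else fixed. Here is a minimal sketch of that idea using the WebUI's txt2img endpoint; the grid assembly and axis labels the script adds are omitted, and the prompt and seed are just example values.

```python
# Minimal sketch of an X/Y sweep: same prompt and seed, varying steps and CFG.
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
prompt, seed = "portrait of a woman, soft window light", 1234

for steps in (20, 30, 40):        # X axis
    for cfg in (4, 7, 10):        # Y axis
        payload = {"prompt": prompt, "seed": seed, "steps": steps, "cfg_scale": cfg}
        image_b64 = requests.post(URL, json=payload).json()["images"][0]
        # collect or save image_b64 here to assemble your own comparison grid
```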
Deforum. You must have seen those trippy videos where one frame blends into another with the camera panning inside the video, and fractal-like animations changing shape and merging into one another. All of this is done in Deforum. Deforum is probably the extension with the largest number of options, allowing you to control numerous generation parameters, camera movements, and more. It has so many features that it would take a whole lesson to cover them all, so for the purpose of this chapter I will try to simplify it a bit.

The Run tab offers the classic choices of sampler, sampling steps, dimensions, and seed, things you should be familiar with by now. Below, you can see an option to restore faces, which will increase generation time but might result in nicer-looking faces. The Keyframes tab provides a multitude of parameters that deal with how the image changes over time, including camera movements and generation parameters like seeds. Keyframes also let you select the duration of the animation using the max frames value. In the Prompts tab, enter the prompts you wish to use. The difference compared to the usual prompting is that here you can set at which frame one set of prompts changes into another, as sketched below.
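As an illustration of that idea, here is a simplified picture of frame-keyed prompt scheduling. The dictionary layout mirrors how Deforum's animation prompts are written, keyed by the frame where each prompt takes over, but the exact field names and interpolation behaviour in your installed version may differ, and the helper function is just my own toy illustration.

```python
# Toy illustration of frame-keyed prompt scheduling, Deforum-style.
animation_prompts = {
    "0":   "a misty forest at dawn, volumetric light",
    "60":  "the forest dissolving into a neon city at night",
    "120": "an abstract fractal of glass and chrome",
}

def prompt_for_frame(frame: int) -> str:
    """Return the prompt of the most recent keyframe at or before `frame`."""
    keyframes = sorted(int(k) for k in animation_prompts)
    active = max(k for k in keyframes if k <= frame)
    return animation_prompts[str(active)]

print(prompt_for_frame(75))  # -> the prompt that starts at frame 60
```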
The ControlNet tab allows you to incorporate ControlNet, which we covered earlier, as a guide during frame generation. Hybrid Video, among other things, allows you to use another video as a guide for the camera movements of your Deforum generation. In the Output tab, you can select the export parameters and choose whether to combine the generated images into a video or just leave them as individual frames for your further manipulation. This way you can import them into Premiere Pro, add a soundtrack, various effects, and more.
With all that we have explored today, what remains is for you to set your creativity free. Congratulations on finishing the course. It's been a privilege being a part of your learning experience. Feel free to reach out any time, whether you have questions or want to showcase your unique creations; I am here to support you in all your AI and Photoshop endeavors. And speaking of Photoshop, if you sense that it's time to elevate your skills, consider joining my next course, delving into the fusion of AI art and photography. Or, if you're passionate about photography and interested in skin retouching, I will be happy to teach you my tips and secrets throughout a three-hour-long, in-depth portrait and boudoir retouching course. I'm looking forward to seeing you again, and I am wishing you endless inspiration and boundless success. My name is Mark, and see you again.