Crafting Your Own Music With Artificial Intelligence | David Armendariz | Skillshare



Teacher: David Armendariz


Lessons in This Class

    • 1. Introduction (0:59)
    • 2. What is MusicLM (15:21)
    • 3. Trying out MusicLM (7:12)
    • 4. Trying out TextFX (7:05)
    • 5. What is Stable Audio (4:31)
    • 6. Trying out Stable Audio (3:37)
    • 7. Conclusion (1:26)


Students: 860

Projects: 2

About This Class

Welcome to "Crafting Your Own Music With Artificial Intelligence", a cutting-edge course designed for both musicians and tech enthusiasts. Students will explore the revolutionary capabilities of Google's MusicLM, a state-of-the-art artificial intelligence model specialized in music generation. The course provides a comprehensive introduction to the fundamentals of AI in music, emphasizing hands-on experience with MusicLM.

Key Takeaways:

  • Understanding of AI's role in music creation.
  • Skills to use Google's MusicLM for creating music.

Meet Your Teacher

Hi! My name is David Armendariz. I am from Ecuador.

I studied mathematics at USFQ (Universidad San Francisco de Quito). However, I love coding, and that's why I transitioned to the software industry. I love sharing my knowledge here on Skillshare.

I hope you enjoy my courses as much as I enjoy making them, and remember: never stop learning!


Level: Beginner



Transcripts

1. Introduction: Hi, and welcome to this course, Music Generation with MusicLM. My name is David Armendariz. What is this class about? There is rapid growth in AI development, especially in generative AI, and music generation is part of generative AI. Google has a new model called MusicLM, announced in January 2023, and we're going to focus on exploring MusicLM's capabilities via AI Test Kitchen. What you will learn: what MusicLM is, what MusicLM is capable of, and how to test MusicLM yourself. About me: I'm a software engineer and mathematician, a data science student, an AI enthusiast and a music lover. I hope you enjoy this course. 2. What is MusicLM: In this lecture, we're going to learn what Google's MusicLM is. MusicLM is revolutionizing text-to-music generation. It was presented by Agostinelli et al. in a paper from 2023, so it's very recent. What it's capable of is generating high-fidelity music from text descriptions. As for the technical details, it's based on another model called AudioLM, and it's capable of producing several minutes of music at 24 kHz. There are other AI tools like ChatGPT, but as of now, December 2023, they are not able to generate music. Google also released a public dataset called MusicCaps. The purpose of releasing this dataset is to aid model development and extend research, so other people can help Google improve this kind of model. It was manually created by professional musicians. You can also use this dataset to train your own model, but we're not going to learn how to do that because it requires a lot of AI knowledge. They also focused a lot on responsible development, in particular on preventing the misuse of creative content. What does this mean? They adopted methods from a paper by Carlini et al. to check that the generated music is not a copy of the training data. That means the music MusicLM generates should not closely match the music it was trained on.
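To put "24 kHz" in perspective, here is a small back-of-the-envelope sketch in Python. It assumes mono audio, and the 5-minute figure is the upper length mentioned later in this lecture; it only does arithmetic and says nothing about how MusicLM itself works internally.

```python
# Back-of-the-envelope arithmetic for audio at MusicLM's stated 24 kHz output rate.
# Assumption (not from the lecture): mono audio unless `channels` says otherwise.
SAMPLE_RATE_HZ = 24_000


def num_samples(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ,
                channels: int = 1) -> int:
    """Number of audio samples needed for a clip of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_seconds * sample_rate_hz) * channels


if __name__ == "__main__":
    for label, seconds in [("30-second clip", 30), ("5-minute piece", 5 * 60)]:
        print(f"{label}: {num_samples(seconds):,} samples")
```

A 5-minute piece works out to 300 × 24,000 = 7,200,000 samples per channel, which hints at why long generations need more computing resources than 30-second clips.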
MusicLM has a website, which we're going to look at right now, with examples of what it's capable of. On that website, we can see the paper, which is available on arXiv, and the dataset I talked about, MusicCaps. You can also listen to all of the examples MusicLM is able to generate. First, we have audio generation from rich captions. The caption here is: the main soundtrack of an arcade game; it is fast-paced and upbeat, with a catchy electric guitar riff; the music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls. Let's listen to the example for this arcade game soundtrack. You can really feel like you are playing a game from the '90s. Here's another example: a fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound; it induces the experience of being lost in space, and the music is designed to evoke a sense of wonder and awe while being danceable. That's pretty interesting. Let's hear what it sounds like. Yeah, that prompt is very specific, and I think the model did a good job of transmitting that experience to the listener. Let's see some other examples. Long generation: the sounds so far were only 30 seconds long, but MusicLM can generate up to 5 minutes. For example, relaxing jazz: these are 5 minutes of relaxing jazz. As you can see, I was testing it at different points to check whether it just repeats the same thing over and over, and that's not the case. It's actually different at different times, so it's able to generate long pieces. This next one is my favorite feature out of all the examples here: story mode. The audio is generated by providing a sequence of text prompts, and these influence how the model continues the semantic tokens derived from the previous caption.
I don't know why I like this so much, but you can actually have a song generated from a story. For example: time to meditate, time to wake up, time to run, time to give 100%. Or: electronic song played in a video game, meditation song played next to a river, fire, fireworks. Let's listen. As you can see, the song sounds like a video game until about second 15. The page says 15, but when I checked it was more like 19, but that's okay. From there, it changes its tonality to something more relaxed, and it really sounds like meditation next to a river. After that, I didn't really feel the fire part; it sounded more like voices that the model tried to put into the song. That happens a lot. I've been experimenting with this, and sometimes the model tries to add voices. These voices don't actually say anything, so don't expect this to generate lyrics, but you can hear them in there. I think that was the case with this fire prompt. I don't know if you noticed it as well. I also like this other combination here because it reminds me of Bohemian Rhapsody, the song by Queen. Let's listen to it. Again, this is a clear example of the AI trying to put voices into the song. That will happen; I can't say how often, but I've seen it very frequently. These voices are not intelligible; they are just gibberish, because they don't say anything, but you can hear them. Next there's text and melody conditioning: you can provide a melody that stays fixed throughout the song, and then change the song itself while keeping this melody. For example, let's listen to Jingle Bells whistling with a guitar solo, or with a piano solo. As you can see, the melody stays the same, while the guitar solo or piano solo follows what the text prompt says. The melody is basically the constant.
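Story mode can be pictured as a timeline of prompts. The Python sketch below only models that timeline: given a hypothetical schedule of (start second, prompt) pairs, it looks up which prompt is active at a given time. The 15-second spacing is an assumption based on the 15-second mark mentioned above, and `prompt_at` is an illustrative helper, not part of any MusicLM API; the model itself conditions how it continues the semantic tokens on the active caption, which this sketch does not attempt.

```python
from bisect import bisect_right

# Hypothetical story-mode schedule: (start time in seconds, text prompt).
# The 15-second spacing is an assumption for illustration.
STORY = [
    (0, "electronic song played in a video game"),
    (15, "meditation song played next to a river"),
    (30, "fire"),
    (45, "fireworks"),
]


def prompt_at(schedule, t):
    """Return the prompt active at time t (seconds); schedule must be sorted by start."""
    starts = [start for start, _ in schedule]
    i = bisect_right(starts, t) - 1  # last segment whose start is <= t
    if i < 0:
        raise ValueError("t is before the first prompt starts")
    return schedule[i][1]


if __name__ == "__main__":
    for t in (0, 14.9, 15, 50):
        print(t, "->", prompt_at(STORY, t))
```

Using `bisect_right` means a time exactly on a boundary (for example 15) already belongs to the next prompt, which matches the idea of each caption taking over at its start time.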
Then we have another very interesting one: painting caption conditioning. We have the painting title and author, for example The Persistence of Memory by Salvador Dalí. The image is shown just as a reference, from Wikipedia, along with a description of the painting. Describing a painting is something models like ChatGPT can now do: you upload an image, it gives you a description of the painting, and then you can generate the audio from that description. Let's hear what The Scream sounds like. Okay, I'm going to be honest, I didn't expect this painting to sound like that. It sounds like, I don't know, a Pink Floyd song. Then we have audio generation from tags, with 10-second clips. Instruments: for example, the cello. Let's hear the flute. That sounded a little bit like the Titanic song. Genres: for example, British blues. Yeah, that sounds like blues. Musician experience level: I don't know why you would want a beginner piano player in a song, but let's hear how that sounds. Definitely sounds like me. And a crazy fast professional piano player: yeah, that sounds like a fast, professional piano player. Places: this is also one I like a lot. I'm going to play the gym example because it's a really good one. Yeah, that is definitely better music than what they play at my gym; maybe they could use this for their music. Epochs: for example, a club in the '80s. Let's hear it. Yeah, that definitely sounds like a club in the '80s. I was not born in that era, but I've heard songs from the '80s, and that sounds like something you'd hear in a club back then. Finally, MusicLM has a feature called generation diversity: for the same prompt, it can generate multiple examples, as we are also going to see in AI Test Kitchen.
For example, for the prompt "motivational music for sports", here's one example, and here's another. Yes, they are different examples for the same text prompt. Those are all of the examples of what MusicLM is capable of. I should say that not all of these features are available in AI Test Kitchen. In fact, as of now, we can only test audio generation from text. Let's test that in the next lecture. I hope you liked it. 3. Trying out MusicLM: Now we're going to actually test MusicLM. The only way to do that, as of now, December 2023, is through the AI Test Kitchen website (aitestkitchen.withgoogle.com). You can only sign in with a Google account. The website is also only available in certain countries (the US, Kenya, New Zealand, and Australia), but you can use a VPN, as I do, to access it. If you click on this dropdown and select MusicLM, you get a text box to type your prompt, and below it you get the generated songs. There is also a Settings button with three settings. The first one is the seed. This is a random number that is generated automatically for you after you type your prompt, but you can also put in your own number. You can click on this button here to lock the seed. That means that with the same prompt and the same seed, you will get basically the same output. Remember, generative AI can be very random, so if you want to avoid that randomness, use the same prompt with the same seed. Some tools also have a parameter called temperature that makes the output more consistent, but we don't have that parameter here. We also have the track length. Remember that MusicLM can generate up to 5 minutes, but this website only lets us generate up to 70 seconds. I guess that's because a lot of people might be using this tool, and generating a five-minute song takes more computing resources.
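Locking the seed works because a pseudo-random generator started from the same seed always produces the same sequence. The Python sketch below illustrates this with a hypothetical `fake_generate` function that stands in for a generative model (it is not AI Test Kitchen's or MusicLM's API): the same prompt and seed give identical output, while a different seed gives a different result.

```python
import random


def fake_generate(prompt: str, seed: int, length: int = 5) -> list[float]:
    """Hypothetical stand-in for a generative model.

    The output depends only on (prompt, seed), so repeating both reproduces it.
    """
    # random.Random accepts a string seed; combining prompt and seed means a
    # different prompt or a different seed changes the sampled values.
    rng = random.Random(f"{prompt}|{seed}")
    return [round(rng.uniform(-1.0, 1.0), 4) for _ in range(length)]


if __name__ == "__main__":
    a = fake_generate("modern bachata", seed=42)
    b = fake_generate("modern bachata", seed=42)
    c = fake_generate("modern bachata", seed=7)
    print("same prompt, same seed, identical:", a == b)
    print("same prompt, different seed, identical:", a == c)
```

Real models running on GPUs can still show tiny run-to-run differences, which is consistent with the lecture's wording that a locked seed gives "basically" the same output.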
They are offering this website for free, so it makes sense not to let us use all of their computing resources. We also have looping, a feature that stitches the beginning and end of your track together to make your music endless. Remember the arcade game example: that kind of music needs to be endless. With looping, when the song ends, it flows smoothly back into the beginning of the track. That's very useful for things like background music for video games. Those are the settings. We also have the I'm Feeling Lucky button. Let's see what happens if I click it: "ambient, soft-sounding music I can study to". This generates some music, and here is another example. As you can see, it generated two examples, just like the generation diversity we saw on the MusicLM website, where it produced multiple examples for the same prompt. In the text box, the prompt is split into chips, and we can go over these words and vary them to generate different things. I'm going to start over and generate my own track. I really like bachata, so I'm going to write: a modern bachata, it has to be slow first, then fast, and then slow again; it has to be danceable, a little romantic. Let's see what this generates. Again, it identifies which parts of the prompt I can change or vary. I like this, but I think the bachata beat is being overshadowed by the "romantic" part. Let's get rid of it; maybe we're putting too many constraints in this prompt. Let's generate again. I like this a lot more. Let's hear the other example it gave. Yeah, I like this one better; I think I can dance to it. Well, you now have a tool to generate your own songs from a prompt. I hope you liked this video. See you in the next lecture. 4. Trying out TextFX: We are again in AI Test Kitchen.
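How AI Test Kitchen implements looping isn't described in the lecture. One common, generic way to make a clip loop seamlessly is a crossfade: blend the last few samples into the first few so the jump from end to start is smooth. The Python sketch below shows that technique on a plain list of samples; `make_seamless_loop` and the fade length are illustrative assumptions, not the tool's actual method.

```python
def make_seamless_loop(samples: list[float], fade: int) -> list[float]:
    """Crossfade the last `fade` samples into the first `fade` samples.

    The returned clip is `fade` samples shorter than the input. Played on
    repeat, its end flows into its start without an abrupt jump.
    """
    if fade <= 0 or 2 * fade > len(samples):
        raise ValueError("fade must be positive and at most half the clip length")
    head = samples[:fade]   # opening of the track
    tail = samples[-fade:]  # ending of the track
    # Linear crossfade: start fully on the tail, move gradually toward the head.
    blended = [
        tail[i] * (1 - i / fade) + head[i] * (i / fade)
        for i in range(fade)
    ]
    # The blended section replaces both the original head and tail. The
    # untouched middle follows it, and the middle's last sample leads straight
    # back into blended[0] (equal to the original tail[0]) when the clip repeats.
    return blended + samples[fade:-fade]
```

For example, `make_seamless_loop([float(x) for x in range(10)], 2)` returns `[8.0, 5.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]`: the clip now ends on 7.0 and restarts on 8.0, the value that originally followed it, instead of jumping from 9.0 back to 0.0. In real audio the fade would span many milliseconds of samples rather than two.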
There's another tool here called TextFX, which supercharges your writing process with AI-powered language tools and was made in collaboration with Lupe Fiasco. If I launch it, we get ten tools. This is something that can also be done with ChatGPT, so it's not as innovative as MusicLM, but it can still be useful for people who want ideas from this AI tool. For example, Acronym creates a phrase using the letters of a given word. Let's type the word hamburger and see what it generates. We have a parameter called temperature, which I mentioned in the last lesson. If you set the temperature to zero, the output is less random: you get almost the same result every time. If you set it to one, the output varies much more every time you run it. 0.7 is a decent default, and many AI models use it. Let's run hamburger. It gives phrases like "happy animals made by great humans..." and "having a meal, being energized, getting rid of bad moods...". I think a restaurant that sells hamburgers could use something like this as a slogan. It's very creative. Alliteration finds words in a category that start with a chosen letter. For example, fast food that starts with H. I guess it will find hamburger. Hamburger, hard-shell tacos. Yeah, it was pretty obvious it was going to give me hamburger. Chain builds a sequence of words where each word relates to the last one. Let's use hamburger again and see what happens: hamburger, bread, sandwich, meat, steak, grill, fire, heat. It went from hamburger to heat through a sequence of words, each related to the previous one. Hamburger, bread, basket, grocery store, cashier, customer, bill: it went from hamburger to bill. Hamburger, bread, dough, flour, bakery, shop, store. These are all related words.
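The lecture doesn't explain how TextFX uses temperature internally, but the term usually refers to a standard technique: divide the model's raw scores (logits) by a temperature T before applying a softmax. The Python sketch below shows that generic technique with made-up logits: a low T sharpens the distribution toward the top choice, T = 1 leaves it unchanged, and T = 0 is treated as always picking the highest-scoring option. The function names and numbers are illustrative assumptions.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, scaled by a temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy choice for 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick an option index; temperature 0 means always take the best-scoring one."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate words
    for t in (0.5, 0.7, 1.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    print(sample_index(logits, 0, random.Random(0)))  # greedy: index 1
```

Printing the probabilities for T = 0.5, 0.7 and 1.0 shows the top option's share shrinking as T grows, which is why higher temperatures give more varied output from run to run.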
I guess this is very useful for rap lyrics. In the example they give, there's a video you can watch with Lupe Fiasco, a rapper and lyricist, who uses this tool a lot to generate lyrics, combined with his own human intelligence. Explode creates phrases that sound similar to a given word. Hamburger again; let's see what it does: "a fat pig who eats hamburger", "a sandwich with ham and a burger", "hamburger, a type of sandwich with ham and cheese". Okay. Fuse finds similarities between unrelated things. Let's try hamburger and the moon and see what the similarities between these two concepts are: "Both a hamburger and the moon are round and can be eaten with a fork and knife." The moon can be eaten? "Both a hamburger and the moon are round and often associated with food." "Both a hamburger and the moon can be associated with roundness and fullness: a hamburger with its round shape and the moon with its full face." Yeah, I guess you can be very poetic with this tool. POV evaluates a topic through different points of view. Let's talk about fast food: "Fast food is a cheap and convenient way to feed a family." "Fast food is a convenient way to get a quick meal." "Fast food is a delicious and convenient way to eat." Scene generates sensory details about a scene. Again: eating a hamburger in a hotel. I don't know what it's going to generate: "a dry, overcooked burger patty", "a hamburger so dry it cracks when you bite into it", "a sticky plastic bun". That's what the AI imagines when you eat a hamburger in a hotel. Simile creates a simile about a thing or a concept. Let's try hamburger: "A hamburger is like a pizza that lost its way in life." "A hamburger is like a pizza with a hat on." Well, you can think about it like that; yeah, it's like a pizza with a hat on. "A hamburger is like a pizza: it has a bun, meat and cheese, and it's delicious." I guess the AI is right. Unexpect makes a scene more creative.
Imagine a person eating a hamburger in a hotel. Let's see what the AI imagines: "a person eating a hamburger in a hotel that's floating in the middle of a lake", "a person eating a hamburger in a hotel located on the moon", "a person eating a hamburger in a hotel made out of gingerbread". These are fictional, unexpected scenes. Unfold identifies words and phrases that contain a given word. Hamburger again: it gives "back of the hand", "bowl of confusion", and so on. This one is a little more unexpected. Anyway, if you are a professional writer, TextFX can give you lyric ideas for the song you just made. You can also do this with ChatGPT, but TextFX gives you a nice UI for all of these things. 5. What is Stable Audio: There are some alternatives to MusicLM, and I'm going to talk about Stable Audio. First of all, from a technical point of view, generating music is not an easy task. Stable Audio was developed by Stability AI, the same company that created Stable Diffusion, so they have experience doing this kind of thing. It uses the AudioSparx 1.0 model, and they are working on a new version, 1.1, which I think will be more powerful. In the free version, you can generate up to 45 seconds of audio. Let's take a look at the website, stableaudio.com. You can create a free account and then go to the Generate section. As you can see, we have up to 20 tracks per month. On the pricing page, you can see the free plan: 20 track generations per month, up to 45 seconds each, with a non-commercial license. If you're a professional, you pay $12 a month and can generate up to 500 tracks, each up to 90 seconds long, and they can be used commercially. If you're an enterprise, you have to get in touch with them so they can set your price. That's the pricing section.
The user guide first shows some examples of what Stable Audio can do; as with the Google website, you can explore all of these examples yourself. You can use Stable Audio to generate full musical audio encompassing a range of instruments. Include as much detail as you can: the more details you put into the prompt, the better the result. You can also generate individual stems, sound effects, et cetera. I like that they are more explicit in the interface guide, which explains each control. For example, steps: the number of generation steps used to create your audio track. A higher step count means more processing, and this can increase the quality of your audio. They have found that 50 is the sweet spot. Number of results: you can generate a maximum of five at a time, but each result counts against your quota, so if you ask for four results, it will cost you four tracks. Be careful with that, because if you ask for five results per prompt on the free plan, with its 20 tracks per month, you will only be able to run four prompts. The seed: I already told you what the seed is. By default it is set to random, but you can put any number here; by using the same prompt and the same seed, you will get consistent outputs. The prompt strength controls how closely the model tries to follow your text prompt. They have a blog post about the model they are using, AudioSparx 1.0, if you are interested in the technical details. We also saw the licensing scheme: as a free user, you can use Stable Audio samples in your own non-commercial music, and as a paid user you can also use them commercially. You can't train AI models on the generated audio because that goes against their terms of service. They have, I'd say, a better user guide on how to use the tool. In the next lecture, we're going to test Stable Audio to see whether it generates better results. 6.
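The arithmetic behind the warning about the number of results is simple: every result counts as one track against the monthly quota. The Python sketch below uses the figures quoted in these lectures (20 tracks per month on the free plan, 500 on the $12 plan, 1 to 5 results per prompt), which may have changed since December 2023; `prompts_per_month` is a hypothetical helper, not part of any Stable Audio API.

```python
# Plan figures as quoted in the lecture (December 2023); they may have changed.
FREE_TRACKS_PER_MONTH = 20
PRO_TRACKS_PER_MONTH = 500
MAX_RESULTS_PER_PROMPT = 5


def prompts_per_month(monthly_tracks: int, results_per_prompt: int) -> int:
    """How many prompts you can run if every result costs one track."""
    if not 1 <= results_per_prompt <= MAX_RESULTS_PER_PROMPT:
        raise ValueError(
            f"results per prompt must be between 1 and {MAX_RESULTS_PER_PROMPT}"
        )
    return monthly_tracks // results_per_prompt


if __name__ == "__main__":
    for results in (1, 4, 5):
        print(f"free plan, {results} result(s) per prompt:",
              prompts_per_month(FREE_TRACKS_PER_MONTH, results), "prompts")
```

With five results per prompt, the free plan's 20 tracks cover 20 // 5 = 4 prompts, matching the warning above; asking for one result per prompt stretches the same quota to 20 prompts.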
Trying out Stable Audio: Okay, let's test Stable Audio. I'm going to use the same prompt: modern bachata, it has to be slow first, then fast, and slow again; it has to be danceable. I didn't copy and paste it, so I have to write it again. Let's generate a soundtrack with this description. You also have the user guide here if you want to use it. Let's see: modern bachata. It takes a little longer, I guess, so we have to wait. Okay, it's generated. No, this doesn't sound like bachata at all. Let's see what happens if I change "modern" to "sensual". This is not bachata either, which makes me think MusicLM is better. Maybe that's because Google has more training data; I don't know, but let's give it a chance. Maybe Stable Audio wasn't trained on this genre and was trained on, I don't know, rock, pop or other kinds of music. No, this doesn't sound like bachata at all. Let's see if modifying the prompt helps: "typical bachata bongo", and I'm going to set the prompt strength to 100%. Let's see if modifying the prompt like this generates a better result. No, no, no, no. We have seen that Stable Audio fails at generating bachata. But again, you can try it with other genres; maybe it generates better rock, I don't know. 7. Conclusion: What is the conclusion here? You can now create your own music with MusicLM, which was developed by Google Research and is designed to create music from text input. This model is capable of producing extended periods of high-quality music that adheres to the provided text instructions. To experiment with MusicLM, as of December 2023 you can register for AI Test Kitchen; if you are only interested in sample outputs, visiting the Google Research website is an alternative. We also tried Stable Audio, but we saw that MusicLM was better at generating bachata. I'm specifically saying bachata because that's the only genre we generated.
You should try other kinds of music, because maybe it's better at generating rock, I don't know, but I am a bachata lover. I love listening to bachata, so I was disappointed by Stable Audio's outputs. MusicLM was far superior to Stable Audio. Don't forget to follow me on social media: you can join my Discord channel, follow me on Skillshare, and subscribe to my channel. I hope you enjoyed this course. See you in the next course.