How to Install and Use AI Text to Music Engine, AudioCraft Plus (Free and Fast)

lordcaocao2025

Ready to unlock the power of local AI music generation? This comprehensive tutorial dives into the world of AudioCraft Plus and its user-friendly interface, Pinokio. Even if you're new to AI music creation, this video is your perfect starting point to confidently generate unique and captivating music based on your creative ideas.

What You'll Learn:
Seamless Installation: Step-by-step guide to installing and configuring AudioCraft Plus locally using Pinokio.
Music Generation: How to craft descriptive prompts, experiment with settings, and witness your musical vision come to life.
Fine-Tuning: Explore different parameters and techniques to personalize your AI-generated music to achieve the exact style and mood you want.

Video Details:
Original Publish Date: 7 June 2024
Focus: Local installation and generation workflow for the AudioCraft Plus text-to-music engine using the Pinokio launcher.
Test Setup: Local Windows Environment with an RTX 4060 Ti

Project Resources & Links:
Prerequisite Tutorial (How to Install Pinokio): https://youtu.be/3ywhr3eOQX0
AudioCraft Plus GitHub Source: https://github.com/GrandaddyShmax/audiocraft_plus/tree/plus
Official AudioCraft Models (Meta): https://huggingface.co/facebook (Includes MusicGen and AudioGen models used by the engine)

Follow lordcaocao2025 on Dailymotion for more storage optimization tips, advanced local UI configurations, and AI tutorial workflows!

Connect with me:
📺 YouTube: https://www.youtube.com/@CaoCao2025
📱 TikTok: https://www.tiktok.com/@caocao20250
💎 Patreon: https://www.patreon.com/cw/Caocao2025

#AIMusic #AudioCraftPlus #Pinokio #AIWorkflows #RTX4060Ti #TechTutorial #MusicGen #LocalAI #ArtificialIntelligence

Transcript

00:00Hello guys, welcome back with me, CaoCao2025. Today we are going to learn how to install and

00:07use AudioCraft Plus, a text-to-music AI engine.

00:12Before we start, here are some examples of music created with AudioCraft Plus.

00:18Old school punk with badass guitar riffs.

00:48Pop music that touches your heart and makes you cry.

01:17Ska music with catchy beat, groovy bass, and exciting trumpet.

01:49Retro 8-bit Nintendo music for epic role-playing game.

02:19Rock with saturated guitars, a heavy bass line, and crazy drum break and fills.

02:52AudioCraft is a PyTorch library

02:54for deep learning research on audio generation.

02:59AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio, AudioGen

03:09and MusicGen.
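As a rough sketch of what using the library directly can look like, outside the AudioCraft Plus web UI, the snippet below follows the usage pattern shown in the AudioCraft README. The function name `generate_clip`, the 8-second default duration, and the output file stem are illustrative choices that do not appear in the video, and running it requires the `audiocraft` package plus a one-time model download.

```python
def generate_clip(prompt, duration_seconds=8,
                  model_name="facebook/musicgen-small", out_stem="clip"):
    """Generate one clip from a text prompt and save it as a WAV file.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    # Imported lazily so this file can be loaded without audiocraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    # Downloads the weights on first use, then loads them from the local cache.
    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_seconds)

    # generate() takes a list of descriptions and returns a tensor
    # shaped [batch, channels, samples].
    wav = model.generate([prompt])

    # audio_write appends the ".wav" suffix to the stem by default.
    audio_write(out_stem, wav[0].cpu(), model.sample_rate, strategy="loudness")
    return out_stem + ".wav"
```

Calling `generate_clip("Ska music with catchy beat, groovy bass, and exciting trumpet")` would then write `clip.wav` in the current folder, assuming the library and a suitable GPU or CPU setup are available.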

03:11AudioCraft Plus is an all-in-one web UI for the original AudioCraft, adding many quality features on top, like the

03:19AudioGen model, multiband diffusion, custom model support, generation metadata and an audio info tab, mono to stereo, multi-prompt and prompt

03:30segmentation with structure prompts, video output customization, and music continuation.

03:35Open Pinokio, then go to the Discover page by clicking Visit Discover Page or the Discover icon at the top of the screen.

03:45Installing scripts from a verified publisher on the Discover page lowers the risk of running dangerous or malicious scripts.

03:52From here, simply search for AudioCraft Plus.

03:56Click the AudioCraft Plus installation script by cocktailpeanut.

04:02Then from the Download screen, click Download to start the installation process.

04:07A requirements window will show up on the screen.

04:11Here you need to install every single item on the list.

04:16And since Pinokio acts like a virtual computer, you need to install every item here, even if you already

04:23have it on your computer.

04:25To continue the process, click Install.

04:29Pinokio will then install those requirements one step at a time, so be patient.

04:35You can follow the progress of the installation by watching the prompt and the list of steps that have

04:41been completed at the bottom right of the Pinokio screen.

04:44Click Install.

05:02Later, you will be asked to install Visual Studio Tools 2019.

05:08Just let it run.

05:09The screen will go back to the original requirements list, since this installation is done on your own

05:16computer, outside Pinokio.

05:18Just let the installation finish.

05:24After that, simply click Install again.

05:28This time, Pinokio will install the Visual Studio Tools 2019 into itself.

05:42After that, you will be able to install AudioCraft Plus.

05:57Click AudioCraft Plus again in your AI engine list.

06:01This time, click Install on the left side of Pinokio.

06:06The installation starts with Pinokio downloading all the Python requirements.

06:22After installation is finished, you can now use AudioCraft Plus V2.01.

06:30To stop the AI engine, simply click Stop in the Pinokio screen or close Pinokio.

06:36Now open Pinokio again, and click AudioCraft Plus again.

06:41This time, the Install menu will have changed to Start.

06:46Click Start, and Pinokio will show a prompt that generates a Gradio link.

06:51Click it to open AudioCraft Plus.

06:56How to use AudioCraft Plus to generate music.

07:01AudioCraft Plus offers four models: small, medium, large, and melody.

07:07Go to the settings and select the model that you want to use.

07:12What you need to know about these four models: the model is not able to generate realistic vocals.

07:18It's been trained with English descriptions and will not perform as well in other languages.

07:24It does not perform equally well for all music styles and cultures.

07:29The model sometimes generates end of songs, collapsing to silence.

07:34It is sometimes difficult to assess what types of text descriptions provide the best generations.

07:41Prompt engineering may be required to obtain satisfying results.

07:45With seed 14-18-41, 21-60-99, 87-9 and the prompt "happy pop music with piano, drum, melody, and

07:56guitar",

07:57here are the results from those four models.

08:01Small model.

08:23Medium model.

08:45Large model.

09:08Melody model.

09:29Remember, the melody model is basically a medium model with melody-to-music capability,

09:35like an inpainting or image-to-image model in Stable Diffusion.

09:40If you have little VRAM, try using the small model.

09:44The large model takes about 10.1 GB of VRAM.

09:49So if you have 8 GB of VRAM, try using the small or medium model.
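The VRAM advice above can be written down as a small helper. This is only a sketch: the 10.1 GB figure is the one quoted in the video for the large model, while the function name `suggest_models` and the idea of a single cut-off are assumptions made for illustration. The melody model is left out because the video gives no VRAM figure for it.

```python
LARGE_MODEL_VRAM_GB = 10.1  # figure quoted in the video for the large model


def suggest_models(vram_gb):
    """Return model sizes worth trying, largest first (hypothetical helper)."""
    if vram_gb >= LARGE_MODEL_VRAM_GB:
        return ["large", "medium", "small"]
    # Below the large model's footprint, the video suggests medium or small.
    return ["medium", "small"]


print(suggest_models(8))    # an 8 GB card
print(suggest_models(16))   # a 16 GB card such as the one used in the video
```

If generation still runs out of memory with the medium model, the small model is the safest fallback.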

09:54Remember, every time you change the model in the settings,

09:57if this is the first time you use that model, it will be downloaded first.

10:02This only happens once.

10:05And every time you switch models, the new one needs to be loaded first,

10:09so the first generation will feel a little bit slower.

10:13You can also use a custom model, but we will not talk about it in this video.

10:18In the output audio channel setting, you can select Mono, Stereo, or Stereo effect.

10:25Mono is straightforward single-channel audio.

10:29Stereo is dual-channel audio, but it will sound more or less like Mono.

10:34Stereo effect is also dual-channel, but it uses tricks to simulate stereo audio.
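To see why plain Stereo can sound like Mono, here is a toy sketch using short lists of samples. Copying one channel into both sides gives two identical channels. The `delayed_stereo` function shows one well-known way to fake width, a small delay on one side. That is purely an assumption for illustration, since the video does not say which tricks AudioCraft Plus uses for Stereo effect.

```python
def mono_to_stereo(samples):
    """Duplicate a mono signal into identical left and right channels."""
    return [(s, s) for s in samples]


def delayed_stereo(samples, delay):
    """Hypothetical 'stereo effect': the right channel lags by `delay` samples."""
    padded = [0.0] * delay + list(samples)
    right = padded[:len(samples)]  # trim so both channels have equal length
    return list(zip(samples, right))


mono = [0.1, 0.2, 0.3]
plain = mono_to_stereo(mono)     # left and right are the same in every frame
wide = delayed_stereo(mono, 1)   # right channel is one sample behind the left
print(plain)
print(wide)
```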

10:41For the decoder, you can use the default or multiband diffusion.

10:46Multiband diffusion is a decoder that uses diffusion to generate the audio.

10:50Top K determines the number of most likely next tokens to consider at each step of the generation process.

10:58A smaller value of K results in a more focused and deterministic output,

11:04while a larger value of K allows for more diversity in the generated music.
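A minimal sketch of top-k sampling in plain Python may make this concrete. The four logits are made-up numbers standing in for a model's scores over four possible next tokens, and the function names are illustrative, not AudioCraft's own code.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0
            for i in range(len(probs))]


probs = softmax([2.0, 1.0, 0.5, -1.0])   # made-up scores for four tokens
filtered = top_k_filter(probs, k=2)      # only tokens 0 and 1 survive

rng = random.Random(0)                   # fixed seed for repeatable sampling
token = rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
print(filtered, token)
```

With `k=1` the sampler always picks the single most likely token, and raising `k` lets less likely tokens through.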

11:09Top P, also known as Nucleus Sampling or Probabilistic Sampling,

11:15is another method used for token selection during generation.

11:19It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold,

11:26usually denoted as P.

11:28This approach ensures that the generated output maintains a balance between diversity and coherence,

11:35as it allows for a varying number of tokens to be considered based on their probabilities.
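The nucleus idea can be sketched in a few lines. The probabilities below are made up, the function name is illustrative, and the sketch stops adding tokens as soon as the running total reaches the threshold, which is one common convention and not necessarily what AudioCraft does internally.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose total reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = []
    cumulative = 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = set(keep)
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0
            for i in range(len(probs))]


probs = [0.5, 0.3, 0.15, 0.05]               # made-up next-token probabilities
nucleus_small = top_p_filter(probs, p=0.7)   # keeps tokens 0 and 1
nucleus_large = top_p_filter(probs, p=0.9)   # keeps tokens 0, 1, and 2
print(nucleus_small)
print(nucleus_large)
```

Unlike Top K, the number of surviving tokens changes from step to step: a confident distribution keeps few tokens, a flat one keeps many.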

11:41Temperature is a parameter that controls the randomness of the generated output,

11:45where a higher temperature can introduce more variability and creativity into the generated music,

11:51but it may also lead to less coherent or structured compositions.

11:56On the other hand, a lower temperature can produce more repetitive and predictable music.
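Temperature is usually applied by dividing the model's scores by the temperature before turning them into probabilities. The sketch below shows that standard formulation with made-up scores. It is meant as an illustration of the idea, not a copy of AudioCraft's code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Scale scores by 1/temperature, then convert them to probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]                        # made-up scores for three tokens
cold = softmax_with_temperature(logits, 0.5)    # sharper, more predictable
hot = softmax_with_temperature(logits, 2.0)     # flatter, more varied
print(cold)
print(hot)
```

The top token gets a larger share of the probability at temperature 0.5 than at 2.0, which is why low temperatures sound more repetitive and high ones more surprising.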

12:03Classifier-free guidance, or CFG, works the same as in Stable Diffusion: it controls how faithfully the AI follows the

12:09prompt.
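In its common formulation, classifier-free guidance runs the model twice, once with the prompt and once without, and then pushes the scores away from the unprompted result. The sketch below shows that arithmetic with made-up numbers. The name `cfg_coef` mirrors the AudioCraft parameter name, but the function itself is an illustrative assumption.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef):
    """Blend prompted and unprompted scores: uncond + coef * (cond - uncond)."""
    return [u + cfg_coef * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 0.5, -1.0]     # made-up scores with the prompt
uncond = [1.0, 1.0, 0.0]    # made-up scores without the prompt

no_guidance = apply_cfg(cond, uncond, 1.0)   # same as the prompted scores
strong = apply_cfg(cond, uncond, 3.0)        # differences are amplified
print(no_guidance)
print(strong)
```

A coefficient of 1 leaves the prompted scores unchanged, and larger values exaggerate whatever the prompt changed, so the output follows the prompt more strictly.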

12:10For now, let's generate some music.

12:13Let's simply input a prompt.

12:15This time we try "pop music that touches your heart, piano, guitar, and melody".

12:22Change the duration to 20 seconds.

12:25Set the output audio channel to Stereo effect.

12:51The result is quite good.

12:53Remember that the processing time depends on the model that you use and the complexity of the prompt.

12:59It also depends on the other settings.

13:03For 20 seconds of music, it takes about 40 to 100 seconds of processing time with my RTX 4060 Ti 16 GB.

13:11You can also download the result by clicking the file size next to the generated file, for either the MP4, WAV, or

13:21JSON file.

13:22Let's try generating more music, this time with a 30-second duration.

13:26A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings creating a cinematic atmosphere fit for a

13:37heroic journey.

14:09Now, let's make a longer song with a bunch of segments.

14:14Click Structure Prompts and write the global prompt.

14:18The global prompt is where you write the prompt that you want applied to all prompt segments.

14:24In this case, we will write it as old school punk music.

14:31We will add five segments, and the duration will be 100 seconds.

14:35Multi-prompt allows us to control the music by adding variation to different time segments.

14:42You have up to 10 prompt segments.

14:46The first prompt will always be 30 seconds long, and the other prompts will be up to 30 seconds

14:50minus the overlap. For example,

14:52if the overlap is 4 seconds, each later prompt segment will be up to 26 seconds.
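The segment arithmetic above can be checked with a short calculation. The 30-second first segment and the 4-second overlap come from the video's own example, while the function name is just an illustrative choice.

```python
FIRST_SEGMENT_SECONDS = 30  # the first prompt always covers 30 seconds


def max_segment_lengths(num_segments, overlap_seconds):
    """Maximum length of each prompt segment, in seconds."""
    if num_segments < 1:
        raise ValueError("need at least one segment")
    later = FIRST_SEGMENT_SECONDS - overlap_seconds
    return [FIRST_SEGMENT_SECONDS] + [later] * (num_segments - 1)


lengths = max_segment_lengths(5, 4)   # five segments, 4-second overlap
max_total = sum(lengths)
print(lengths, max_total)
```

With five segments and a 4-second overlap, the segments can cover up to 30 + 4 × 26 = 134 seconds, so the 100-second song used here fits comfortably.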

14:58Now let's add each of the segments.

15:00The first prompt will be opening with a catchy guitar riff, followed by bass and drums.

15:06Second prompt will be a punchy double bass and a distorted guitar riff.

15:11Third prompt will be saturated guitars, a heavy bass line and crazy drum break and fills.

15:19Fourth prompt will be electronic guitar solo.

15:23The fifth prompt will be close with another catchy guitar riff.

15:30That's the result.

17:36Now, let's set the BPM to 215 and the key to G, and keep the scale as major.

19:50Well, that is the result if you set the BPM and key.

19:54Anyway, guys, I think that's it for today's video.
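One way to picture a structure prompt is as a list of full prompts, one per segment, each combining the segment text with the global prompt and the tempo and key settings. The sketch below builds such a list from the values used in the video. The exact text format that AudioCraft Plus sends to the model is not shown in the video, so the ordering and the "215 bpm, G major" wording are assumptions.

```python
def build_segment_prompts(global_prompt, segments, bpm=None, key=None, scale=None):
    """Combine each segment prompt with the global prompt and optional tags.

    Hypothetical helper: the output format is an assumption for illustration.
    """
    tags = []
    if bpm is not None:
        tags.append(f"{bpm} bpm")
    if key is not None and scale is not None:
        tags.append(f"{key} {scale}")
    return [", ".join([segment, global_prompt, *tags]) for segment in segments]


segments = [
    "opening with a catchy guitar riff, followed by bass and drums",
    "a punchy double bass and a distorted guitar riff",
    "saturated guitars, a heavy bass line and crazy drum break and fills",
    "electronic guitar solo",
    "close with another catchy guitar riff",
]
prompts = build_segment_prompts("old school punk music", segments,
                                bpm=215, key="G", scale="major")
for line in prompts:
    print(line)
```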

19:59Don't forget to like, share, and comment.

20:02See you again on the next episode, and have a nice day.
