How to Train Custom AI Voice Models: Step-by-Step RVC Training Guide

lordcaocao2025

Ever wanted to clone a specific voice for your AI projects? In this tutorial, I break down the process of training your own RVC (Retrieval-based Voice Conversion) voice models.

We walk through the training tab in Mangio-RVC, covering the entire workflow: data preparation, preprocessing, feature extraction, and the actual training cycles (epochs). I also share tips on managing VRAM usage on mid-range hardware like the GTX 1660 Super and on avoiding common path errors.

Original YouTube Tutorial: https://youtu.be/svU58S2JA94
How to install Mangio-RVC: https://www.dailymotion.com/video/xa71r62

Video Details:
* Original Publish Date: December 27, 2023
* Focus: AI Voice Training / RVC Model Creation
* Tested on a GTX 1660 Super

Follow lordcaocao2025 on Dailymotion for more technical AI research and generative workflow guides!

---
Connect with me:
📺 YouTube: https://www.youtube.com/@CaoCao2025
📱 TikTok: https://www.tiktok.com/@caocao20250
💎 Patreon: https://www.patreon.com/cw/Caocao2025

Transcript

00:00 Hi guys, welcome back with me, CaoCao2025. Today: how to train RVC models, that is,

00:07 voice models. First, you can use the Mangio-RVC fork here and click this Train

00:14 tab. Then you need to find the original voice, the voice you want

00:20 the model to learn. We want to make Yusuke Kitagawa's voice. You can

00:26 get his voice samples from the game itself, or you can search for them on

00:32 the internet, especially on YouTube. This is an example of Yusuke's voice:

00:37 "Could you come with me?" There are a lot of his lines here, and only his voice, so I don't

00:43 really need to do anything more. "My persona has gained new power." "What a skill!

00:50 I'd like to try this skill soon." "Our range of tactics has expanded." Okay, it's about an

00:54 eight-minute video, and we already have our source. So next, we're going to create the

01:02 experiment name. This is important: you can name it Yusuke Kitagawa or

01:09 whatever you want, but remember the name, because this is the name you

01:14 will need later: if you stop training and want to continue, you use

01:21 this name again. [unclear] Next is whether the model has pitch guidance:

01:28 let's use it, and select v2, because it's going to be compatible.

01:34 For the number of CPU processes, use whatever your CPU can handle; the more you use, the more processing power it takes.
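The CPU-process setting roughly controls how many worker processes share the data-processing work at the same time. As an illustration only, here is a minimal sketch that assumes each worker simply takes every n-th file (a round-robin split). The helper name `split_for_workers` and the file names are hypothetical and not part of Mangio-RVC.

```python
def split_for_workers(files, n_workers):
    """Hypothetical helper: worker i gets files i, i + n, i + 2n, ... (round-robin)."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    return [files[i::n_workers] for i in range(n_workers)]


files = [f"clip_{i:03d}.wav" for i in range(10)]  # made-up file names
for worker_id, chunk in enumerate(split_for_workers(files, 4)):
    # Each worker would process only its own chunk, in parallel with the others.
    print(worker_id, chunk)
```

More workers means smaller chunks and a faster finish, but also more CPU cores busy at once, which is the trade-off described in the video.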

01:41 Okay, and now enter the path of the training folder. Just copy and paste the path

01:49 here. This is the first step, so it's going to be pretty fast; it just

01:55 processes the data. And okay, I kind of forgot: this is going to create an error here, I'm sure of it.

02:04 And yes, it failed with an error. This is because the path you put here must not

02:12 contain any spaces, so I'm going to move the voice data

02:19 somewhere else that has no spaces. So yeah, now there's no space here, so we can just

02:25 process it again, and this time no error happens. You just wait until it

02:40 finishes.
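Because a space in the training-folder path causes the error shown above, a quick pre-flight check can catch it before you click the button. This is a minimal sketch. The helper `check_training_folder` and the list of audio extensions are assumptions for illustration, not part of Mangio-RVC.

```python
from pathlib import Path

# Assumption: the audio formats you are likely to use as training data.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def check_training_folder(folder):
    """Hypothetical check: reject paths containing whitespace, then list audio files."""
    path = Path(folder)
    if any(ch.isspace() for ch in str(path)):
        raise ValueError(f"path contains whitespace, move the data first: {path}")
    if not path.is_dir():
        raise FileNotFoundError(f"not a folder: {path}")
    return sorted(p for p in path.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES)


if __name__ == "__main__":
    try:
        check_training_folder(r"C:\Voice Data\yusuke")  # hypothetical path with a space
    except ValueError as err:
        print(err)
```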

02:41 It might take quite a while. Okay, the preprocessing is

02:46 done, and it was actually pretty fast. Next, you need to

02:53 do feature extraction. Just click it after you make sure

02:59 the preprocessing has finished here, and choose RMVPE, because it's better, so use this instead.

03:05 Don't use the other options here.
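For a pitch-guided model, feature extraction includes estimating the pitch (fundamental frequency, f0) of each short audio frame. RMVPE, the method recommended here, is a neural pitch estimator and cannot be reproduced in a few lines. The sketch below swaps in a much simpler technique, plain autocorrelation, only to show what pitch extraction produces: one frequency in Hz per frame. The 50 to 1100 Hz search range and the 16 kHz sample rate are assumptions for the example.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=1100.0):
    """Estimate one frame's pitch with plain autocorrelation (NOT RMVPE).

    Tries every lag (candidate period in samples) between sample_rate/f0_max and
    sample_rate/f0_min, and returns the frequency whose lag gives the largest
    autocorrelation sum.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sr = 16_000
tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(2048)]  # synthetic 220 Hz tone
print(round(estimate_f0(tone, sr)))  # should land close to 220
```

Real estimators such as RMVPE are far more robust to noise, breathiness, and octave errors, which is why the video prefers it.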

03:09 Feature extraction takes more time than processing the data, and it's going to use

03:19 your GPU, but I use a GTX 1660 Super and it's not really that heavy.

03:31 Okay, don't worry if it shows an error here; this is a pretty common problem.

03:37 Everything is fine as long as you see the console still working.

03:44 This is the same error that I also get in Stable Diffusion.

03:50 Feature extraction done? I think that's finished, right?

03:55 Yeah, I think it's all done; it says it's finished.

04:00 I'm not really sure, but yeah, let's try it. Next is the save frequency, save_every_epoch; this is about epochs.

04:09 Training means that after it trains the model once, it trains it

04:15 again, and then again, and then again. Every time it saves, it

04:20 creates a model that you can use, and then that model gets trained again and

04:25 again and again. So here it's going to save every five epochs.

04:31 You need this in case you have to exit, or you need to do something else

04:38 and close the program; then you don't need to start from the beginning again.
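With a save frequency of 5 and 20 total epochs, as in this video, checkpoints land at epochs 5, 10, 15, and 20, matching the four models that appear in Model Inference later on. This minimal sketch computes that schedule and picks the newest checkpoint to resume from. The `G_<number>.pth` naming pattern, the folder layout, and both helper names are assumptions for illustration, not a description of what Mangio-RVC writes.

```python
import re
from pathlib import Path


def checkpoint_epochs(total_epochs, save_every):
    """Epochs at which a checkpoint is written, assuming one save every `save_every` epochs."""
    if save_every < 1:
        raise ValueError("save_every must be >= 1")
    return list(range(save_every, total_epochs + 1, save_every))


CKPT_PATTERN = re.compile(r"^G_(\d+)\.pth$")  # assumed naming scheme


def latest_checkpoint(folder):
    """Return the G_<number>.pth file with the highest number in `folder`, or None."""
    found = []
    for path in Path(folder).glob("G_*.pth"):
        match = CKPT_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return max(found, key=lambda item: item[0])[1] if found else None


print(checkpoint_epochs(20, 5))  # [5, 10, 15, 20]
```

Resuming under the same experiment name only works if the program can find these saved files again, which is why the video stresses remembering the name.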

04:44 If you look at other models, you'll see that most of them are trained for around 600 to 1,000

04:52 epochs.

04:53 So more epochs is usually better, but, I don't know, 200 or

05:01 150 is also pretty good. But in this case, because I'm making a video, I'm going to

05:09 set it to 20 epochs for now and see how it goes. Okay.

05:18 This option, caching all training sets to GPU memory, is useful if you have less than 10 minutes of data. It can speed up training, but it's

05:25 going to take GPU memory,

05:26 so use it at your own discretion. If you want to use the PC for something else, don't

05:36 use this. I never use it, because

05:41 the 1660 Super has only 6 GB, and I don't want to waste it, because I'm usually doing something else while

05:51 I'm training.
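To see which side of the 10-minute rule of thumb your dataset falls on, you can add up the clip durations. This minimal sketch uses Python's standard `wave` module, so it only reads uncompressed `.wav` files. The helper names and the extra "GPU busy" flag are hypothetical; the 10-minute threshold is the guideline mentioned in the video.

```python
import wave
from pathlib import Path


def total_wav_minutes(folder):
    """Sum the duration of every .wav file directly inside `folder`, in minutes."""
    seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 60.0


def should_cache_to_gpu(folder, limit_minutes=10.0, gpu_busy_with_other_work=False):
    """Rule of thumb: cache only small datasets, and not if you need the GPU for other work."""
    return total_wav_minutes(folder) < limit_minutes and not gpu_busy_with_other_work
```

On a 6 GB card used for other tasks during training, the second condition alone already says no, which matches the choice made in the video.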

05:54 So yeah, that's that. Just train the model.

06:02 And now we just need to wait. It's going to take a while; training on a 1660 Super

06:11 takes some time, so we're going to skip ahead.

06:14 Okay guys, it took almost 40 minutes,

06:20 and it's finished training 20 epochs on a 7-minute dataset without using "cache all training sets."

06:28 So let's try it out.

06:30 Back to Model Inference.

06:32 Let's refresh the voice list,

06:35 and we're going to have

06:37 Yusuke Kitagawa here.

06:39 This is the 5-epoch one,

06:41 this is the 10-epoch one,

06:43 this is the 15,

06:43 this is the 20-epoch one,

06:45 and this is

06:46 the .pth here,

06:48 which is

06:48 actually

06:50 the same as the 20-epoch one;

06:51 it's just the latest checkpoint.

06:55 So we use this one,

06:57 and let's,

06:58 let's try our voice now.

07:00 We have a sample voice here;

07:01 that is:

07:04 "Hello guys, welcome back with me,

07:06 CaoCao2025,

07:07 in Let's..."

07:08 I haven't even tested this yet, so

07:12 you'll find out with me right now

07:18 what the result is.

07:21 Yeah, it's just this; we're not going to set up anything here.

07:24 Let's just convert.

07:25 Okay, how's the result?

07:28 "Hello guys, welcome back with me,

07:30 CaoCao2025,

07:32 in Let's Learn Excel.

07:33 Today we're going to learn about sparklines.

07:36 A sparkline is some kind of graph,

07:39 but instead of..."

07:40 Okay, the result is still bad.

07:43 It doesn't sound like Yusuke at all,

07:45 but it doesn't sound like my voice either;

07:49 it's a different voice altogether.

07:51 But

07:52 what do you expect from 20 epochs?

07:56 Okay, I tried lowering the transpose to minus eight.
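In RVC's inference tab, the transpose value is counted in semitones, where 12 equals one octave, so -8 is a bit less than an octave down. Assuming standard equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12). A minimal sketch (the function name is just for illustration):

```python
def transpose(freq_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones (12 per octave)."""
    return freq_hz * 2.0 ** (semitones / 12.0)


for f0 in (120.0, 220.0):
    print(f0, "->", round(transpose(f0, -8), 1))
```

With -8, every frequency ends up at about 63% of its original value, so a 220 Hz note comes out near 139 Hz.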

08:00 "...going to learn about spark..."

08:02 It's, it's, it's...

08:06 it's a different voice altogether.

08:07 "...Let's Learn Excel. Today we're going to learn about..."

08:10 It sounds a little like Yusuke now.

08:12 "...is some kind of graph..."

08:14 But it's still far off; it's only 20 epochs, you know.

08:18 The best results should take hundreds of epochs.

08:23 But that's it, guys. Thank you for watching.

08:25 Don't forget to like, share, and comment, and see you again in the next episode.

08:28 Have a nice day.

