Ever wanted to clone a specific voice for your AI projects? In this tutorial, I break down the process of training your own RVC (Retrieval-based Voice Conversion) voice models.

We walk through the training tab in Mangio-RVC, covering the entire workflow: data preparation, preprocessing, feature extraction, and the actual training cycles (epochs). I also share tips on managing VRAM usage on mid-range hardware like the GTX 1660 Super and how to avoid common pathing errors.

Original YouTube Tutorial: https://youtu.be/svU58S2JA94

How to install Mangio-RVC:
https://www.dailymotion.com/video/xa71r62

Video Details:
* Original Publish Date: December 27, 2023
* Focus: AI Voice Training / RVC Model Creation
* Video test using GTX 1660 Super

Follow lordcaocao2025 on Dailymotion for more technical AI research and generative workflow guides!

---
Connect with me:
📺 YouTube: https://www.youtube.com/@CaoCao2025
📱 TikTok: https://www.tiktok.com/@caocao20250
💎 Patreon: https://www.patreon.com/cw/Caocao2025
Transcript
00:00 Hi guys, welcome back with me, CaoCao2025, in how to train RVC models, that is,
00:07 voice models. First, you could use the Mangio-RVC fork here and click this Train
00:14 tab. Then you need to search for the original voice, the voice that you want
00:20 the model to learn. We want to make Yusuke Kitagawa's voice. You could
00:26 actually get his voice samples from the game itself, or you could search for them on
00:32 the internet, especially on YouTube. This is an example of Yusuke's voice:
00:37 "Could you come with me?" There's a lot of his voice here, and only his voice, so I don't
00:43 really need to do anything more. "My persona has gained new power." "What a skill.
00:50 I'd like to try this skill soon." "Our range of tactics has expanded." Okay, it's like
00:54 an eight-minute video that we already found. So next we're going to create
01:02 the experiment name. This is important. You should name it Yusuke Kitagawa or
01:09 whatever you want, but remember the name, because this is going to be the name that you
01:14 can use later: if you stop training and you want to continue, you can use
01:21 this name again. Sample rate, I don't change anything. Where it says whether the model has pitch guidance,
01:28 let's use it, and select v2 because it's going to be compatible.
01:34 For this, use whatever CPU you have: the more you use, the more processes it takes.
01:41 Okay, and now enter the path of the training folder. Just copy and paste the path
01:49 here. This is the first step, so it's going to be pretty fast. You're just going to
01:55 process the data and... okay, I kind of forgot, this is going to create an error here, I'm sure of it.
02:04 And yeah, it failed [unclear]. This is because the path you put here must not
02:12 contain any spaces, so I'm going to move the voice model, or rather the voice data, to
02:19 somewhere else that has no spaces. So yeah, there's now no space here. We can just
02:25 process that again, and this time the error won't happen, and you just wait until it
02:40 finishes.
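The space-in-path error described above can be caught before preprocessing. The following is a minimal Python sketch and not part of Mangio-RVC: the helper names `has_space` and `space_free_copy` and the example paths are hypothetical, and it assumes the dataset is a plain folder of audio files.

```python
import shutil
from pathlib import Path


def has_space(path: str) -> bool:
    """Return True if the path contains a space anywhere."""
    return " " in str(path)


def space_free_copy(src: str, dest_root: str) -> Path:
    """Copy the dataset folder into dest_root, replacing spaces
    in the folder name with underscores. Returns the new folder path."""
    if has_space(dest_root):
        raise ValueError("dest_root itself must not contain spaces")
    src_path = Path(src)
    target = Path(dest_root) / src_path.name.replace(" ", "_")
    # dirs_exist_ok requires Python 3.8 or newer
    shutil.copytree(src_path, target, dirs_exist_ok=True)
    return target


# Hypothetical example paths, used only to show the check itself
print(has_space(r"D:\AI Voices\yusuke"))   # True
print(has_space(r"D:\datasets\yusuke"))    # False
```

If `has_space` reports True for the training folder, `space_free_copy` can place a copy under a space-free root, and that new path is the one to paste into the training folder field.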
02:41 It's going to take a little while, this preparation. Okay, and the preprocessing,
02:46 it's pretty fast. Okay, next you need to
02:53 do feature extraction. Just click it after you make sure
02:59 the preprocessing has ended here. And select RMVPE because it's better, so use this instead;
03:05 don't use the other option here.
03:09 Feature extraction is going to take more time than processing the data, and it's going to use
03:19 your GPU power here. But you know, I use a GTX 1660 Super and it's not really that hard, you know.
03:31 Okay, don't worry if it creates an error here. This is a pretty common problem.
03:37 Everything is fine as long as you see the console still working.
03:44 This is the same error that I also get in Stable Diffusion.
03:50 Feature done? I think that's finished, right?
03:55 Yeah, I think "all-feature-done" is saying it's finished.
04:00 I'm not really sure, but yeah, let's try it. Next is save frequency, that's "save every epoch". An epoch means
04:09 this thing is going to train the model, and after it trains the model it's going to train it
04:15 again, and then again, and then again. Every time it completes an epoch, that
04:20 means it has created a model that you could use, and then that model is going to be trained again and
04:25 again and again. So it's going to save every, like, five epochs.
04:31 You need this if you somehow have to exit, or you need to do something else
04:38 and close the program: you don't need to start from the beginning again.
04:44 If you look at any models, you will see that most models are trained for like 600 to 1000
04:52 epochs.
04:53 So the more epochs, usually the better, but I don't know, 200 or
05:01 150 is also pretty good. In this case, because I'm making a video, I'm going to
05:09 wait for 20 epochs for now, and let's see how it goes, okay?
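To see which checkpoints a given save frequency produces, the arithmetic can be sketched as below. `checkpoint_epochs` is a hypothetical helper, not an RVC function, and it assumes a checkpoint is written whenever the epoch number is a multiple of the save frequency.

```python
from typing import List


def checkpoint_epochs(total_epochs: int, save_every: int) -> List[int]:
    """List the epochs at which a checkpoint would be saved,
    assuming one is written every `save_every` epochs."""
    if total_epochs < 1 or save_every < 1:
        raise ValueError("total_epochs and save_every must be positive")
    return list(range(save_every, total_epochs + 1, save_every))


# The settings used in the video: 20 epochs, saving every 5
print(checkpoint_epochs(20, 5))  # [5, 10, 15, 20]
```

With these settings, stopping the program at any point loses at most four epochs of work, since training can resume from the most recent saved checkpoint under the same experiment name.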
05:18 This one, cache all training sets, you could use it if you have less than 10 minutes of data. It can speed up training, but it's
05:25 going to take GPU memory,
05:26 so use it at your own discretion. If you want to use the PC to do something else, don't use
05:36 this. I don't know, I never use this.
05:41 The GTX 1660 Super has only six gigs. I don't want to waste it, because I'm usually doing something else while
05:51 I'm training.
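Whether a dataset falls under the 10-minute mark mentioned above can be estimated by summing the length of its WAV files. This is a rough sketch using only Python's standard `wave` module. The function names are illustrative, the 10-minute limit is passed as a parameter rather than read from RVC, and it assumes the folder holds uncompressed PCM `.wav` files (other formats would need a different reader).

```python
import wave
from pathlib import Path


def wav_seconds(path) -> float:
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def dataset_minutes(folder) -> float:
    """Total duration, in minutes, of all .wav files directly in folder."""
    total = sum(wav_seconds(p) for p in Path(folder).glob("*.wav"))
    return total / 60.0


def is_small_dataset(folder, limit_minutes: float = 10.0) -> bool:
    """True if the dataset is shorter than limit_minutes."""
    return dataset_minutes(folder) < limit_minutes
```

Calling `is_small_dataset` on the training folder gives a quick yes or no on the dataset size. Even when it returns True, caching still costs GPU memory, so on a 6 GB card the trade-off described above applies.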
05:54 So yeah, that's that. Just train the model.
06:02 And now we just need to wait. It's going to take a while. 30 epochs with the 1660 Super,
06:11 it's going to take some time, so we're going to skip this.
06:14 Okay guys, it took almost like 40 minutes or more,
06:20 and it's finished training 20 epochs for a 7-minute video without using the "cache all training sets" option.
06:28 So let's try it out.
06:30 Back to model inference.
06:32 Let's refresh the voice list,
06:35 and we're going to have
06:37 the Yusuke Kitagawa here.
06:39 This is the 5 epochs,
06:41 this is the 10 epochs,
06:43 this is the 15,
06:43 this is the 20 epochs.
06:45 This is
06:46 the .pth here.
06:48 It's
06:48 actually
06:50 the same as the 20 epochs;
06:51 it's just the latest checkpoint.
06:55 So we use this,
06:57 and let's
06:58 try our voice now.
07:00 We have a sample voice here,
07:01 that is:
07:04 "Hello guys, welcome back with me,
07:06 CaoCao2025.
07:07 In the last..."
07:08 I haven't even tested this out, so
07:12 you will just find out right now
07:18 about the result.
07:21 Yeah, it's just this. We're not going to set up anything here.
07:24 Let's just convert.
07:25 Okay, how's the result?
07:28 "Hello guys, welcome back with me,
07:30 CaoCao2025.
07:32 In the last Learning Excel...
07:33 Today we're going to learn about sparklines.
07:36 A sparkline is some kind of graph,
07:39 but instead of..."
07:40 Okay, the result's still bad.
07:43 It doesn't sound like Yusuke at all,
07:45 but it doesn't sound like my voice either.
07:49 It's a different voice altogether.
07:51 But
07:52 that's what you get for 20 epochs.
07:56 Okay, I tried to reduce the octave to minus eight.
08:00 "...going to learn about spark..."
08:02 It's
08:06 it's a different voice altogether.
08:07 "...Learning Excel. Today we're going to learn about..."
08:10 It's a little like Yusuke now.
08:12 "...is some kind of graph..."
08:14 But it's still far off for 20 epochs, you know.
08:18 The best results should take like hundreds of epochs.
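The effect of the minus eight setting can be put into numbers. This sketch assumes the value is a transpose in semitones (12 semitones per octave), which the transcript does not confirm since the speaker calls it "octave". The function names and the 200 Hz example pitch are illustrative only.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_hz(f0_hz: float, semitones: float) -> float:
    """Pitch in Hz after transposing f0_hz by the given semitones."""
    return f0_hz * semitone_ratio(semitones)


# A full octave down halves the frequency
print(semitone_ratio(-12))            # 0.5
# Minus eight semitones scales pitch to roughly 0.63 of the original
print(round(semitone_ratio(-8), 3))   # 0.63
# Example: a hypothetical 200 Hz source pitch lands near 126 Hz
print(round(shifted_hz(200.0, -8)))   # 126
```

Under that assumption, minus eight is a drop of two thirds of an octave, which fits the observation that the converted voice moved closer to a deeper target voice.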
08:23 But for now, that's it guys. Thank you for watching.
08:25 Don't forget to like, share, and comment, and see you again in the next episode.
08:28 Have a nice day.