00:01Hello, guys. Welcome back with me, Chow Chow 2025. Today, we're going to learn how to have
00:06a local, private ChatGPT/Bard on your own computer. Not only for answering
00:12questions, but we could also role play and pretend to have a relationship with the AI.
00:16We can use the Oobabooga Text Generation Web UI. This powerful tool is about to unlock
00:24a whole new level of creativity. So get ready to harness the power of AI for your writing
00:32projects. Oobabooga Text Generation Web UI lets you generate unique and engaging text content
00:38with just a few clicks. Imagine writing stories, poems, scripts, marketing copy, or even emails
00:45with the help of AI. Sounds pretty awesome, right? Before we jump into the installation,
00:51let's make sure you have everything you need. Here's a quick checklist of prerequisites. First,
00:56you need to have Python and Git. There is a link in the description on how to install Python and Git, or simply
01:01check the playlist. You also need a GPU with at least 6 GB of VRAM. 8 GB is recommended.
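The Python and Git prerequisite can be checked from a script before starting. This is a minimal sketch, assuming both tools are expected on PATH under the names `python` and `git`; the helper name `missing_tools` is hypothetical and not part of Text Generation Web UI.

```python
import shutil


def missing_tools(tools=("python", "git"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


def describe(missing):
    """Turn the list of missing tools into a short status message."""
    if not missing:
        return "All prerequisites found."
    return "Missing: " + ", ".join(missing)
```

Calling `print(describe(missing_tools()))` reports which of the two tools still need to be installed. It does not check the GPU or its VRAM.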
01:07Higher is better. I tried using my GTX 1660 Super. It worked, but the responses were too slow,
01:15so it was not a very fun experience. After you have all that, go to the repository at
01:21github.com/oobabooga/text-generation-webui. Simply clone this repository by opening the folder where you want
01:34to download it and launching CMD there. Then simply copy the repository link and type git clone followed by the link.
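The clone step can also be scripted. This is a rough sketch, assuming Git is already on PATH; the function names `build_clone_command` and `clone` are made up for illustration, and the URL is the repository address mentioned in the video with `https://` added.

```python
import subprocess

# Repository address from the video, written as a full HTTPS URL.
REPO_URL = "https://github.com/oobabooga/text-generation-webui"


def build_clone_command(repo_url=REPO_URL, dest=None):
    """Build the argument list for `git clone`, optionally into `dest`."""
    command = ["git", "clone", repo_url]
    if dest is not None:
        command.append(dest)
    return command


def clone(repo_url=REPO_URL, dest=None):
    """Run `git clone`; raises CalledProcessError if git reports a failure."""
    subprocess.run(build_clone_command(repo_url, dest), check=True)
```

Calling `clone()` from the folder where you want the download does the same thing as typing the command in CMD.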
01:44After you clone it, you can set it up for the first time by double-clicking start_windows.bat
01:50in the text-generation-webui folder.
01:59When asked about your GPU, select what you have. Since I use an RTX 4060 Ti here, I select NVIDIA,
02:07and I also select the newest CUDA since I use an RTX 3060 Ti. It will now start downloading a few
02:14large files,
02:15so I will skip ahead in the video. Fast forward.
02:24After everything is finished, you will be warned that you have not downloaded any model,
02:28and it will recommend that you download one. Also, the startup only takes this long the first time.
02:34After this, everything is going to be much faster. You will then be given a link. You can now
02:41access Text Generation Web UI using the link provided. As you can see in the Model tab, we don't
02:49have any models yet. We need to download one first. So let's go to Hugging Face first. On Hugging Face,
02:58let's search GPT. Click "show all results" in the Models section. Filter it to Text Generation only.
03:10Since TheBloke has many models, I open his profile and try to download one of his most downloaded models,
03:15which is Vicuna Uncensored.
03:22To download the model, simply copy the Hugging Face username and model name, and then go back to Text Generation Web
03:28UI.
03:30From the Model tab, put the username/model name into the "Download model or LoRA" text box, and then click Download.
03:38The model is going to be downloaded in the background. Just check the CMD for more detail.
03:45It's 4.5 gigs, so it's going to take a while. While we wait, let's go back to huggingface.co.
03:52You can see that each model has a number showing how many parameters it has, like 7 billion, 13
03:57billion, or more.
03:59The more parameters the model has, the more VRAM you're going to use.
04:03So make sure not to download something that your system can't handle.
04:07The size of the model file also follows the number of parameters.
04:12The more parameters the model has, the larger the model file will be.
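The link between model size and memory can be worked out with simple arithmetic. The 7 billion or 13 billion figure in a model name counts its parameters, and each parameter takes `bits / 8` bytes. This is a back-of-the-envelope sketch that counts the weights only; the function name `approx_weight_gb` is hypothetical, and real files and VRAM use come out larger because of quantization metadata and the memory needed while generating.

```python
def approx_weight_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Compare unquantized 16-bit weights with 4-bit quantized weights.
for params in (7e9, 13e9):
    for bits in (16, 4):
        size = approx_weight_gb(params, bits)
        print(f"{params / 1e9:.0f}B at {bits}-bit: ~{size:.1f} GB")
```

Under these assumptions a 7 billion parameter model needs about 14 GB at 16-bit but about 3.5 GB at 4-bit, which is why quantized models fit on 6 to 8 GB cards.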
04:16You can also decide which branch you want to use by checking the GPTQ parameters.
04:22Bits is the bit size of the quantized model.
04:27GS is the GPTQ group size.
04:29Higher numbers use less VRAM, but have lower quantization accuracy.
04:34None is the lowest possible value.
04:37Act Order is true or false.
04:39It is also known as desc_act.
04:40True results in better quantization accuracy.
04:43Some GPTQ clients have had issues with models that use act order plus group size, but this is
04:49generally resolved now.
04:51Damp % is a GPTQ parameter that affects how samples are processed for quantization.
04:560.01 is the default, but 0.1 results in slightly better accuracy.
05:05GPTQ data set is the data set used for quantization.
05:09Using a data set more appropriate to the model's training can improve quantization accuracy.
05:14Note that the GPTQ data set is not the same as the data set used to train the model.
05:20Sequence length is the length of the data set sequences used for quantization.
05:24Ideally, this is the same as the model sequence length.
05:27For some very long sequence models (16K and above), a lower sequence length may have to be used.
05:33Note that a lower sequence length does not limit the sequence length of the quantized model.
05:37ExLlama compatibility indicates whether this file can be loaded with ExLlama,
05:42which currently only supports Llama models in 4-bit.
05:46Here is the list of branches.
05:49If you want to download a branch, just add the branch name after the model name, separated by a colon.
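The group size trade-off mentioned above can be made concrete with a rough estimate. This sketch assumes each group of weights stores one 16-bit scale and one zero point of `bits` bits, which is a simplification of how GPTQ files are laid out; the function name `effective_bits` is hypothetical, and a group size of `None` is treated here as negligible overhead.

```python
def effective_bits(bits, group_size=None, scale_bits=16):
    """Estimate average storage per weight, including per-group metadata.

    Assumes one scale (`scale_bits` bits) and one zero point (`bits` bits)
    per group. With group_size=None the overhead is treated as negligible.
    """
    if group_size is None:
        return float(bits)
    return bits + (scale_bits + bits) / group_size


# Larger groups share metadata across more weights, so they cost less memory.
for group_size in (32, 128, None):
    print(f"4-bit, GS={group_size}: ~{effective_bits(4, group_size):.3f} bits per weight")
```

Under these assumptions, a group size of 32 costs about 4.6 bits per weight and a group size of 128 about 4.2, which matches the rule that higher group sizes use less VRAM at the cost of some accuracy.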
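The `username/model:branch` format is easy to take apart in code, which can help when keeping a list of models to download. This is an illustrative sketch only, not the web UI's own parser; the function name `parse_model_spec` and the example names are placeholders, and `main` is assumed as the default branch.

```python
def parse_model_spec(spec, default_branch="main"):
    """Split 'username/model[:branch]' into (repository, branch)."""
    repo, colon, branch = spec.strip().partition(":")
    parts = repo.split("/")
    # Expect exactly one slash with a non-empty name on each side.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'username/model', got {repo!r}")
    if colon and not branch:
        raise ValueError("a colon must be followed by a branch name")
    return repo, (branch if colon else default_branch)


# Placeholder names, not real repositories.
print(parse_model_spec("some-user/some-model"))
print(parse_model_spec("some-user/some-model:some-branch"))
```

Leaving out the colon and branch name selects the main branch, which is the one used in the rest of this video.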
06:07Okay, the model is downloaded.
06:11Time to use it.
06:12Refresh the list and then select the model from the drop-down list.
06:16It will auto-load the settings.
06:18Refer to the model's documentation
06:20for how to use the model. Since we use the main branch,
06:22the info says that it uses 4 bits and a 2048 sequence length. As for the loader,
06:27since it decided to use ExLlamav2_HF, we're going to use the ExLlama loader.
06:32It seems everything is already set up when we select the model.
06:35So just click Load.
06:40Check the CMD to make sure the model has loaded.
06:44And then click chat.
06:45Remember, this is the basic chat without me giving any personality to the model.
06:49And here's a demonstration.
06:50I will fast forward this.
06:52Feel free to pause and read it yourself.
09:47Don't forget to like, share, and comment.
09:50See you again on the next episode.
09:51Have a nice day.
09:54Please subscribe.