Tired of subscriptions or concerned about privacy? In this beginner-friendly tutorial, I show you how to set up your own local, private version of ChatGPT using the powerful Oobabooga Text-Generation-WebUI.

I walk through the entire installation process, from hardware requirements (6-8GB VRAM recommended) to cloning the library and selecting the right drivers for your GPU. You'll also learn how to navigate Hugging Face to find and download models like Vicuna Uncensored, understanding technical parameters like quantization and bit size so you can choose a model that fits your system's performance.

Original YouTube Tutorial: https://youtu.be/Vj3bRth6R4Y

Connected Videos in this Series:
* Python Installation Guide: https://www.dailymotion.com/video/xa71ury
* Git Installation Guide: https://www.dailymotion.com/video/xa71vgq
* Visual Studio C++ Libraries Tutorial: https://www.dailymotion.com/video/xa8ony2

Source/Model Resources:
* Oobabooga GitHub Repository: https://github.com/oobabooga/text-generation-webui
* Vicuna-7B-v1.5-GPTQ Model: https://huggingface.co/TheBloke/Vicuna-7B-v1.5-GPTQ

Video Details:
* Original Publish Date: April 17, 2024
* Focus: Local LLM / Text-Generation-WebUI / Private AI Chatbot
* Hardware Used: RTX 4060 Ti (16GB VRAM)

Follow lordcaocao2025 on Dailymotion for more technical AI research, local LLM guides, and generative workflow tutorials!

---
Connect with me:
📺 YouTube: https://www.youtube.com/@CaoCao2025
📱 TikTok: https://www.tiktok.com/@caocao20250
💎 Patreon: https://www.patreon.com/cw/Caocao2025

#ChatGPT #LocalAI #Oobabooga #LLM #AITutorial #Privacy #HuggingFace #lordcaocao2025
Transcript
00:01 Hello, guys. Welcome back with me, CaoCao2025. Today, we're going to learn how to have
00:06 a local, private ChatGPT/Bard on your own computer. Not only for answering
00:12 questions, but we could also role-play and pretend to have a relationship with the AI.
00:16 We can use the Oobabooga Text Generation Web UI for this. This powerful tool is about to unlock
00:24 a whole new level of creativity. So get ready to harness the power of AI for your writing
00:32 projects. The Oobabooga Text Generation Web UI lets you generate unique and engaging text content
00:38 with just a few clicks. Imagine writing stories, poems, scripts, marketing copy, or even emails
00:45 with the help of AI. Sounds pretty awesome, right? Before we jump into the installation,
00:51 let's make sure you have everything you need. Here's a quick checklist of prerequisites. First,
00:56 you need to have Python and Git. Links on how to install Python and Git are in the description, or simply
01:01 check the playlist. You also need a GPU with at least 6 GB of VRAM; 8 GB is recommended.
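One way to check a card against the 6 GB minimum and 8 GB recommendation is to read the total memory that `nvidia-smi` reports. The following is a minimal sketch, assuming an NVIDIA GPU with the driver's `nvidia-smi` tool on the PATH. The helper names and the sample output are hypothetical and only there so the example runs without a GPU.

```python
import subprocess

MIN_VRAM_GB = 6          # minimum mentioned in the video
RECOMMENDED_VRAM_GB = 8  # recommendation mentioned in the video


def parse_gpu_lines(csv_text):
    """Parse 'name, memory_mib' lines into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.strip())))
    return gpus


def rate_vram(memory_mib):
    """Classify a card by its nominal VRAM, rounded to whole GB."""
    nominal_gb = round(memory_mib / 1024)
    if nominal_gb >= RECOMMENDED_VRAM_GB:
        return "recommended"
    if nominal_gb >= MIN_VRAM_GB:
        return "minimum"
    return "below minimum"


def query_gpus():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_lines(result.stdout)


# Hypothetical sample output, used here instead of calling query_gpus().
sample = "NVIDIA GeForce RTX 4060 Ti, 16380\nNVIDIA GeForce GTX 1660 SUPER, 6144\n"
for name, mib in parse_gpu_lines(sample):
    print(f"{name}: {mib} MiB -> {rate_vram(mib)}")
```

On a real machine, replace the sample with `query_gpus()`. The thresholds only mirror the numbers given in the video and say nothing about how fast a given card will respond.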
01:07 Higher is better. I tried using my GTX 1660 Super. It worked, but the responses were too slow,
01:15 so it was not a very fun experience. After you have all that, access the repository at
01:21 github.com/oobabooga/text-generation-webui. Simply clone this repository by opening the folder where you want
01:34 to download it and opening CMD there. Then simply copy the link and run git clone followed by the link.
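The clone step can also be scripted. This is a small sketch, assuming Git is installed and on the PATH. The repository URL is the one from the video description. `build_clone_command` and `clone` are hypothetical helper names, not part of the web UI.

```python
import subprocess

# Repository URL from the video description.
REPO_URL = "https://github.com/oobabooga/text-generation-webui"


def build_clone_command(repo_url, dest=None):
    """Return the argument list for `git clone <repo_url> [dest]`."""
    command = ["git", "clone", repo_url]
    if dest is not None:
        command.append(dest)
    return command


def clone(repo_url, dest=None):
    """Run git clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(build_clone_command(repo_url, dest), check=True)


# Show the command that would be run, without touching the network.
print(" ".join(build_clone_command(REPO_URL)))
```

Calling `clone(REPO_URL)` from the target folder does the same thing as typing the printed command into CMD there.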
01:44 After you clone it, you can set it up for the first time by double-clicking start_windows.bat
01:50 in the text-generation-webui folder.
01:59 When asked about your GPU, select what you have. Since I use an RTX 4060 Ti here, I select NVIDIA,
02:07 and I also use the newest CUDA since I use an RTX 4060 Ti. It will now start downloading a few
02:14 large files,
02:15 so I will skip ahead in the video. Fast forward.
02:24 After everything is finished, you will be warned that you have not downloaded any model,
02:28 and it will recommend that you download one. Also, the startup only takes this long the first time.
02:34 After this, everything is going to be much faster. You will then be given a link. You can now
02:41 access the Text Generation Web UI using the link provided. As you can see in the Model tab, we don't
02:49 have any model yet. We need to download one first. So let's go to Hugging Face. On Hugging Face,
02:58 let's search GPT. Click to show all results in the Models section. Filter to Text Generation only.
03:10 Since TheBloke has many models, I open his profile and download one of his most-downloaded models,
03:15 which is Vicuna Uncensored.
03:22 To download the model, simply copy the Hugging Face username and model name, and then go back to the Text Generation Web
03:28 UI.
03:30 In the Model tab, put the username/model name into the "Download model or LoRA" text box, and then click Download.
03:38 The model is going to be downloaded in the background. Just check the CMD window for more detail.
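The text the download box expects is the `username/model` part of the model page address. A minimal sketch of pulling it out of a copied URL follows, using the model link from the video description. `repo_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def repo_id_from_url(url):
    """Extract 'username/model' from a Hugging Face model page URL."""
    parsed = urlparse(url)
    if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
        raise ValueError(f"not a Hugging Face URL: {url}")
    # Drop empty segments so leading or trailing slashes do not matter.
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a model: {url}")
    return f"{parts[0]}/{parts[1]}"


print(repo_id_from_url("https://huggingface.co/TheBloke/Vicuna-7B-v1.5-GPTQ"))
```

Anything after the model name in the path is ignored, so a URL copied from a sub-page of the model gives the same result.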
03:45 It's 4.5 GB, so it's going to take a while. While we wait, let's go back to huggingface.co.
03:52 You can see that each model has a number showing how many parameters it has, like 7 billion, 13
03:57 billion, or more.
03:59 The more parameters the model has, the more VRAM you're going to use.
04:03 So make sure not to download something that your system can't handle.
04:07 The file size of the model also follows its parameter count.
04:12 The more parameters it has, the larger the model file will be.
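The 7 billion or 13 billion figure in a model name is its parameter count, and a rough lower bound on size follows from it: parameters times bits per weight, divided by 8 to get bytes. The sketch below covers the weights only and is an approximation. Real downloads are somewhat larger because some tensors stay unquantized and extra metadata is stored, and running the model needs additional VRAM on top. `estimate_weight_gb` is a hypothetical helper name.

```python
def estimate_weight_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Compare two common parameter counts at a few bit widths.
for label, params in [("7B", 7e9), ("13B", 13e9)]:
    for bits in (4, 8, 16):
        size = estimate_weight_gb(params, bits)
        print(f"{label} at {bits}-bit: about {size:.1f} GB")
```

By this estimate a 7B model at 4 bits needs about 3.5 GB for weights, which is why it fits within the 6 to 8 GB of VRAM mentioned earlier, while the same model at 16 bits would not.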
04:16 You can also choose which branch you want to use by checking the GPTQ parameters.
04:22 Bits is the bit size of the quantized model.
04:27 GS is the GPTQ group size.
04:29 Higher numbers use less VRAM, but have lower quantization accuracy.
04:34 None is the lowest possible value.
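Larger group sizes use less VRAM because each group of weights carries its own metadata, so fewer groups mean less overhead. The following is a rough sketch, assuming one 16-bit scale and one zero point of `bits` width per group. Those storage details are an assumption for illustration, not taken from the video, and real files can differ.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16):
    """Rough bits per weight once per-group metadata is included.

    Assumes each group stores one scale (scale_bits wide) and one zero
    point (bits wide). group_size=None models the "None" setting, where
    per-group overhead is treated as negligible.
    """
    if group_size is None:
        return float(bits)
    return bits + (scale_bits + bits) / group_size


for group_size in (32, 64, 128, None):
    value = effective_bits_per_weight(4, group_size)
    print(f"4-bit, group size {group_size}: about {value:.3f} bits per weight")
```

Under these assumptions, a 4-bit model with group size 32 costs about 4.6 bits per weight, against about 4.2 with group size 128. The smaller group buys accuracy at the price of memory.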
04:37 Act Order: true or false.
04:39 Also known as desc_act.
04:40 True results in better quantization accuracy.
04:43 Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is
04:49 generally resolved now.
04:51 Damp % is a GPTQ parameter that affects how samples are processed for quantization.
04:56 0.01 is the default, but 0.1 results in slightly better accuracy.
05:05 GPTQ dataset is the dataset used for quantization.
05:09 Using a dataset more appropriate to the model's training can improve quantization accuracy.
05:14 Note that the GPTQ dataset is not the same as the dataset used to train the model.
05:20 Sequence Length is the length of the dataset sequences used for quantization.
05:24 Ideally, this is the same as the model's sequence length.
05:27 For some very long sequence models, like 16K and above, a lower sequence length may have to be used.
05:33 Note that a lower sequence length does not limit the sequence length of the quantized model.
05:37 ExLlama Compatibility indicates whether this file can be loaded with ExLlama,
05:42 which currently only supports Llama models in 4-bit.
05:46 Here is the list of branches.
05:49 If you want to download a branch, just add the branch name after the model name, separated by a colon.
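The `username/model:branch` form can be split mechanically, and branch names in these repositories often encode the parameters described above. This sketch assumes a naming pattern like `gptq-4bit-32g-actorder_True`, with `-1` as the group size standing for "None", and assumes `main` is used when no branch is given. Check the branch list on the model page for the names that actually exist. Both helper names are hypothetical.

```python
import re

# Assumed branch naming pattern: gptq-<bits>bit-<group size>g-actorder_<bool>
BRANCH_PATTERN = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def split_download_spec(spec):
    """Split 'username/model[:branch]' into (repo_id, branch)."""
    repo_id, _, branch = spec.partition(":")
    return repo_id, branch or "main"


def decode_branch(branch):
    """Decode a branch name into GPTQ settings, or None if it does not match."""
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        return None
    bits, group_size, act_order = match.groups()
    return {
        "bits": int(bits),
        "group_size": None if group_size == "-1" else int(group_size),
        "act_order": act_order == "True",
    }


spec = "TheBloke/Vicuna-7B-v1.5-GPTQ:gptq-4bit-32g-actorder_True"
repo_id, branch = split_download_spec(spec)
print(repo_id, branch, decode_branch(branch))
```

A name such as `main` does not match the pattern, so `decode_branch` returns `None` and the settings have to be read from the model page instead.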
06:07 Okay, the model is downloaded.
06:11 Time to use it.
06:12 Refresh the list and then select the model from the drop-down.
06:16 It will auto-load the settings.
06:18 Refer to the documentation
06:20 on how to use the model. Since we use the main branch,
06:22 the info says that it uses 4 bits and a 2048 sequence length. For the loader,
06:27 since it selected ExLlamav2_HF, we're going to use that ExLlama loader.
06:32 It seems everything is already set up when we select the model.
06:35 So just click Load.
06:40 Check the CMD window to make sure the model has loaded.
06:44 And then click Chat.
06:45 Remember, this is the basic chat without me giving any personality to the model.
06:49 And here's a demonstration.
06:50 I will fast forward this.
06:52 Feel free to pause and read it yourself.
09:47 Don't forget to like, share, and comment.
09:50 See you again on the next episode.
09:51 Have a nice day.
09:54 Please subscribe.