00:00 OpenAI, which is only really open about consuming all the world's energy,
00:04 just got rattled to its core. DeepSeek, a new AI startup run by a Chinese hedge fund,
00:09 created a new open weights model called R1 that allegedly beats OpenAI's best models
00:15 in most metrics. And they did it for $6 million, with GPUs that run at half the
00:21 memory bandwidth of OpenAI's. Tony Stark was able to build this in a cave with a bunch of scraps!
00:28 Besides the embarrassment of a Chinese startup beating OpenAI using 1% of the resources,
00:33 their model can be used to distill other models, making them run better on slower hardware.
00:38 Meaning, this Raspberry Pi can run one of the best local Qwen AI models even better now.
00:43 OpenAI's entire moat is predicated on people not having access to the insane energy and GPU
00:49 resources to train and run massive AI models. But that moat disappears if anyone can buy a GPU
00:56 and run a model that's good enough, for free, anytime they want.
01:00 But sensationalist headlines aren't telling you the full story. This Raspberry Pi can technically
01:05 run DeepSeek R1. But it's not the same thing as DeepSeek R1 671B, which is a 400GB model.
01:12 That model, the one that actually beats ChatGPT, still requires a massive amount of GPU compute.
01:18 But the big difference is, assuming you have a few 3090s, you could run it at home. You don't have to pay
01:24 OpenAI for the privilege of running one of their fancy models. You can just install Ollama,
01:29 download DeepSeek, and play with it to your heart's content. And even if you don't have
01:34 a bunch of GPUs, you could technically still run DeepSeek on any computer with enough RAM.
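If you'd rather script that than type at a chat prompt, here's a minimal sketch against Ollama's local HTTP API, assuming Ollama is running on its default port and you've pulled one of the distilled R1 tags first (e.g. `ollama pull deepseek-r1:14b`; that tag is a distilled variant, not the full 671B model):

```python
import json
import urllib.request

# Minimal sketch: ask a local Ollama server to run a distilled DeepSeek R1
# model, then compute tokens per second from the response metadata.
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

payload = {
    "model": "deepseek-r1:14b",  # a distilled tag, not the 671B model
    "prompt": "Explain memory bandwidth in one paragraph.",
    "stream": False,  # wait for the whole response instead of streaming tokens
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["response"])
# eval_count is generated tokens; eval_duration is nanoseconds spent generating
print(f'{result["eval_count"] / (result["eval_duration"] / 1e9):.1f} tokens/sec')
```

That eval_count over eval_duration division is the same tokens-per-second measure quoted through the rest of this video.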
01:39 Like here it's running on my 192-core AmpereOne server. It's running DeepSeek 671B at about 4 tokens
01:46 per second. Which isn't crazy fast, but this server won't set you back like 100,000 bucks either.
01:52 Even though it's only using a few hundred watts, which is honestly pretty amazing,
01:56 a noisy server like this isn't going to be in everyone's living room. A Raspberry Pi could be,
02:01 though. So let's look at how the smaller 14B model runs on it. It's definitely not going to win any
02:07 speed records. Testing a few different prompts, I got about 1.2 tokens per second. I mean, it runs,
02:13 but if you want a chatbot for, like, rubber duck debugging, or to give you a few ideas for your next
02:18 YouTube title, this isn't fun. But we can speed things up. A lot. All we need is an external
02:24 graphics card. Because GPUs and the VRAM on them are way faster than CPUs and system memory.
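Why does memory speed matter so much? Generating each token means reading essentially the entire model out of memory, so tokens per second is capped at roughly memory bandwidth divided by model size. Here's a back-of-the-envelope sketch; the bandwidth figures are approximate, and real-world numbers land below these ceilings:

```python
# Back-of-the-envelope ceiling on token generation speed: every generated
# token reads roughly the whole model from memory, so tokens/sec can't
# exceed memory bandwidth divided by model size. All figures approximate.

model_size_gb = 9.0    # ~14B params at 4-bit quantization

pi5_bw_gbps = 17.0     # Raspberry Pi 5 LPDDR4X, roughly
w7700_bw_gbps = 576.0  # AMD W7700's GDDR6, roughly

print(f"Pi 5 ceiling:  {pi5_bw_gbps / model_size_gb:.1f} tokens/sec")   # ~1.9
print(f"W7700 ceiling: {w7700_bw_gbps / model_size_gb:.1f} tokens/sec") # ~64
```

Ceilings of roughly 2 and 64 tokens per second track the measured 1.2 on the Pi's CPU and the 20 to 54 on the GPU below.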
02:30 I have this setup I've been testing with an AMD W7700 graphics card. It has 16 gigs of speedy VRAM,
02:37 and as long as it can fit the whole AI model in that, it should be way faster than any CPU. And it is,
02:43 like, 10 times faster. I can get between 20 and 50 tokens per second, depending on the type of work
02:49 I'm doing. Here's the raw output from an interactive session, and if I look at nvtop,
02:53 I can see all this processing is being done on the GPU. And if I run llama-bench, it's reporting 24 to 54
02:59 tokens per second. And this GPU isn't even targeted at LLMs. You can go a lot faster. If you're interested
03:06 in running GPUs on Raspberry Pis, or maybe even other ARM boards, well, you're in for a treat this year.
03:12 Not only do we have AMD GPUs working great, the new Intel open-source drivers are also working.
03:18 Somewhat. And NVIDIA might be in the cards, too. On top of that, I have an Orion O6,
03:24 a CM5 ITX board, and even the HiFive Premier P550, all of which have full-size x16 PCIe slots. So,
03:32 even if the year of the Linux desktop will never come, at least we'll get custom ARM and RISC-V PCs.
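If you want to script a llama-bench run like the one above, here's a rough sketch. It assumes llama.cpp was built with GPU support and llama-bench is on your PATH; the GGUF filename below is hypothetical, and llama-bench's JSON field names can shift between llama.cpp versions:

```python
import json
import subprocess

# Rough sketch: run llama.cpp's llama-bench and pull out tokens/sec.
# -ngl 99 offloads all model layers to the GPU. Filename is hypothetical.
MODEL = "DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf"

out = subprocess.run(
    ["llama-bench", "-m", MODEL, "-ngl", "99", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

for test in json.loads(out):
    # avg_ts is average tokens/sec; exact field names can vary by version
    print(test.get("n_prompt"), test.get("n_gen"), test.get("avg_ts"))
```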
03:37 AI is still in a massive bubble. NVIDIA just lost more than half a trillion dollars in value in one
03:43 day after DeepSeek was launched. But their stock price is still eight times higher today than it
03:49 was in 2023, and it's not like anyone's hyping up AI any less now. The one good takeaway, I think,
03:55 is people might realize we don't need to devote more than half the world's energy resources or set up a
04:00 Dyson sphere around the sun just to help computers solve trillions of multiplication problems to spit
04:06 out another thousand mediocre web apps. The other takeaway is that there's new confusion in AI models
04:12 over who precisely is Winnie the Pooh. Until next time, I'm Jeff Geerling.