🇨🇳 BREAKING: China’s Qwen 3 AI model is here — and it’s sending shockwaves through the entire AI industry! 💥🤖

In this AI Revolution episode, discover:
🔥 How Qwen 3 outperforms major models like GPT-4
🧠 Why its open-weight hybrid design is a big deal
🌍 What this means for global AI competition
💡 The future of open AI frameworks

Qwen 3 isn’t just powerful — it’s a game-changer. Don’t miss this deep dive into the AI that’s rewriting the rules!

📢 Subscribe now for weekly updates on the AI arms race & emerging tech!

#Qwen3
#AIRevolution
#ChinaAI
#OpenWeightAI
#ArtificialIntelligence
#AIUpdate
#TechNews
#GPT4vsQwen
#NextGenAI
#MachineLearning
#AIComparison
#EmergingTech
#AIModel2025
#HybridAI
#OpenSourceAI
#AIShock
#GlobalAI
#ChineseTech
#AIvsAI
#FutureOfAI

Category

🤖
Tech
Transcript
00:00 Alibaba just released Qwen 3, a full family of AI models ranging from
00:07 lightweight versions to a massive 235-billion-parameter giant. These models can
00:12 actually switch between deep thinking and fast answering depending on the task,
00:16 and they're open for anyone to download and use. With performance that matches or
00:21 beats some of the best models from OpenAI and Google, this launch is already
00:24 shaking up the AI world. Alright, so here's what actually dropped. It's not
00:28 just one model, it's a whole lineup. At the smallest end, you've got a lightweight
00:33 version with about 600 million parameters, basically a tiny model you could run on a
00:38 decent laptop. And at the top, there's an absolute giant with 235 billion
00:44 parameters called Qwen3-235B-A22B. Now, even though that sounds massive, the cool
00:53 part is it's actually smart about how it works. It doesn't fire up the whole thing
00:57 every time you ask it something. Instead, it picks just a handful of experts out of
01:01 128 possible ones. Only eight jump in to handle each question, so you get the power
01:06 without wasting a ton of compute. There's also a slightly smaller powerhouse
01:10 called Qwen3-30B-A3B, where only 3 billion of its 30 billion parameters are active, making it
01:17 even easier to run. And if you're not into the mixture-of-experts setup and just want
01:21 something simple, there are six dense versions too, from 32 billion parameters
01:26 down to a tiny 0.6 billion one, all available completely free under an open
01:32 license. You can already grab them on places like Hugging Face, GitHub, ModelScope,
01:37 or Kaggle. And if you want to start playing with them right away, there are simple
01:41 tools for that too. You can even run one with a single command in your terminal.
01:44 Some cloud providers like Fireworks AI and Hyperbolic also made them available almost
01:49 instantly after the launch, so no matter how you want to use it, it's ready.
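As a concrete starting point, here's a minimal sketch of loading the smallest checkpoint with Hugging Face transformers. The Hub ID "Qwen/Qwen3-0.6B" and the generation settings are assumptions to verify against the model card.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The Hub ID and generation settings are assumptions; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize mixture-of-experts routing in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```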
01:53 Now, numbers are cool, but why should you actually care? Two big reasons: hybrid
01:57 reasoning and sheer breadth. Hybrid reasoning means Qwen 3 can literally
02:01 switch brains mid-conversation. Out of the box, the model boots into what the team
02:06 calls thinking mode, where it marches step by step, chain of thought visible
02:10 under special <think> tags, perfect for nasty math or code puzzles. Yank the handbrake by
02:15 passing /no_think in your prompt or toggling
02:21 enable_thinking=False in the chat template, and Qwen races along in a
02:26 non-thinking fast path that drops the internal monologue and returns answers at
02:30 near GPT-3.5 latency. You can bounce back to /think whenever a query
02:35 suddenly looks hairy. The most recent instruction always wins, so multi-turn flows
02:40 stay sane.
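To make that switching concrete, here's a sketch of both toggles. The /think and /no_think soft switches and the enable_thinking flag come straight from the description above; the conversation content is just illustrative, and it reuses the tokenizer from the earlier sketch.

```python
# Soft switch: "/no_think" inside a user turn flips that turn to the fast path;
# "/think" flips it back, and the most recent instruction wins across turns.
messages = [
    {"role": "user", "content": "How many r's are in 'strawberries'? /no_think"},
]
fast_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Hard switch: enable_thinking=False drops the internal monologue at
# template-rendering time, regardless of what the messages say.
no_think_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

Either prompt then feeds the same generate call as before.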
02:45 Internally, Alibaba says the fusion happened in a four-stage post-training pipeline: a cold start with long chain-of-thought data, reinforcement learning
02:49 with rule-based rewards to push deeper reasoning, another RL round to graft
02:54 quick-answer behavior on top, and then a general RL sweep across more than 20
02:59 everyday tasks to iron out weirdness. And as AI technology explodes with
03:05 releases like Qwen 3, there's never been a better time to level up your skills and
03:09 turn these tools into real-world advantages, which is why I strongly
03:12 recommend Outskill's two-day AI mastermind. It's the world's first AI-focused
03:17 education platform backed by top artificial intelligence investors and founders.
03:22 This weekend, they're running a 16-hour live training from 11 a.m. to 7 p.m. on both
03:28 Saturday and Sunday. Normally priced at $895, it's free for my audience. In this
03:35 program, you'll gain in-depth knowledge on a wide range of topics, including
03:39 over 20 powerful artificial intelligence tools, prompt engineering for better
03:44 results, data analysis without coding, using AI in Excel, creating pro-level
03:49 presentations, building tools without writing code, creating stunning images and
03:54 videos with AI, developing AI agents, and automating tasks to save time and boost
04:00 productivity. More than 1 million people from 40 countries have joined this training, and it's
04:05 perfect for anyone from tech professionals to business owners and freelancers.
04:09 Slots are filling up fast, so click the link in the description to book your spot.
04:14 Don't forget to join their WhatsApp groups for updates, and there's an intro call
04:18 this Friday at 10 a.m. Eastern Standard Time. Make sure you don't miss it.
04:22 Now back to Qwen 3. That brings us to the training diet. Remember Qwen 2.5's 18 trillion tokens? Double it.
04:31 Qwen 3 chewed through roughly 36 trillion tokens, covering 119 languages and dialects. The engineers
04:40 didn't just scrape another slice of the open web. They harvested PDF-style documents, sucked out the text
04:46 with Qwen2.5-VL, cleaned it with the plain Qwen 2.5 model, and then generated synthetic math and code
04:54 with Qwen2.5-Math and Qwen2.5-Coder. Stage 1 of pre-training ran on over 30 trillion tokens
05:02 with a modest 4K context. Stage 2 pumped in a further 5 trillion, heavy on STEM and reasoning.
05:09 And Stage 3 stretched the context window to 32K and, crucially, added data whose sequence lengths
05:17 actually hit that ceiling. Out the other end came dense base models that match, and in STEM often beat,
05:24 Qwen 2.5 variants two to three times their size, while the MoE bases achieve the same accuracy
05:30 with a tenth of the active parameters. And if 32K isn't enough for your novel-length prompts,
05:36 YaRN can warp that to 128K on the fly.
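YaRN is a RoPE-scaling trick, and in the Hugging Face stack it's usually switched on through a rope_scaling override. A hypothetical sketch follows; the factor and field names track the common YaRN convention, so treat the exact values as something to confirm in Qwen 3's model card:

```python
# Hypothetical YaRN sketch: stretch the native 32K window toward 128K by
# overriding rope_scaling at load time. Verify field names and values against
# the official Qwen 3 model card before relying on them.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                              # 32K x 4 = 128K
        "original_max_position_embeddings": 32768,  # the pre-trained ceiling
    },
    max_position_embeddings=131072,
)
```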
05:40 Now, benchmarks. Alibaba dragged the internal scoreboards into the blog post, and yeah, they're proud.
05:47 The not-yet-public 235B MoE nudges past OpenAI's o3-mini and Google's Gemini 2.5 Pro on Codeforces,
05:55 edges them on the latest AIME math test and the BFCL function-calling suite, and basically lives in the
06:02 same zip code as Grok 3. As for the biggest model most of us can actually download right now,
06:08 Qwen3-32B still squeaks ahead of OpenAI's o1 on LiveCodeBench, slots just behind DeepSeek R1 on
06:16 aggregate math, and smashes Qwen2.5-72B-Instruct even though it's less than half the size. The tiny 4B
06:24 dense checkpoint straight-up rivals the 72-billion-parameter heavyweight from the previous generation,
06:30 which, if you've ever tried to fit Llama 70B on a local RTX card, is very good news.
06:36 Tool use and agentic behavior were a priority too. Out of the box, Qwen 3 knows how to follow the MCP
06:44 tool-calling schema. Alibaba even ships a Python wrapper called Qwen-Agent that hides the call
06:51 signatures, pipes JSON in and out, and comes with built-in utilities like a code interpreter,
06:57 a fetch tool, and a time-zone service. Fire up an Assistant object, point its model parameter at
07:03 Qwen3-30B-A3B, and point its model_server at your local vLLM endpoint. Yeah, they demo with
07:11 api_key="EMPTY". Very hacker chic. Then stream messages. Thoughts arrive wrapped between <think> tags,
07:18 which means you can store or discard them as you like.
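Pieced together from that description, the Qwen-Agent setup might look roughly like this; the endpoint URL, placeholder key, and tool list are illustrative assumptions, so check the Qwen-Agent README for the exact config keys:

```python
# Qwen-Agent sketch: an Assistant pointed at a locally served Qwen3-30B-A3B.
# The endpoint URL, api_key placeholder, and tool list are assumptions.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",  # your local vLLM endpoint
    "api_key": "EMPTY",                          # placeholder key for local serving
}

bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Use Python to count the r's in 'strawberries'."}]
for responses in bot.run(messages=messages):  # streams partial response lists
    pass
print(responses[-1])  # final assistant message; thoughts arrive in <think> tags
```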
07:23 The blog post even includes a bread-and-butter conversational script: ask "How many r's in strawberries?" with thinking on,
07:30 then follow up with "Then how many r's in blueberries? /no_think", and finally switch back with "Really? /think".
07:37 The model dutifully toggles modes each turn. Because everything's open-sourced, the community is
07:43 moving immediately. Baseten CEO Tuhin Srivastava told TechCrunch that Qwen 3 keeps the open-model curve
07:51 level with closed systems, and that American labs will have to sprint just to hold pace,
07:57 especially now that Washington keeps tightening export rules on H100 and Blackwell shipments to
08:03 Chinese buyers. Those rules were obviously written to slow exactly this sort of progress, yet here we
08:09 are staring at a Chinese MoE giant that outruns o3-mini on math while using fewer active parameters,
08:17 and anybody in the world can pip-install it. So yes, policy folks are going to have a busy week.
08:23 Competition inside China is ferocious, too. Baidu's Ernie 4.5 Turbo went live last Friday with a new
08:30 reasoning-heavy X1 Turbo variant promising leaner latency. DeepSeek is still riding hype from January,
08:36 when it bragged about training a frontier model cheaper than any Western lab. Alibaba's own response
08:42 back then, Qwen 2.5 Max, was a stopgap. Qwen 3 is the proper pièce de résistance, the one the
08:49 company claims merges conventional AI functions with advanced dynamic reasoning into an adaptable
08:55 platform for developers. And look, the messaging is pretty explicit. They say the model matches,
09:02 and in some cases outperforms, the best from Google and OpenAI. They mention direct head-to-head
09:08 wins over o3-mini, and they make sure the license is permissive enough that downstream startups can
09:14 wrap the weights into commercial products without legal nightmares. Let's talk hardware for a sec.
09:19 MoE routing helps, but moving 22 billion active parameters still wants at least eight decent GPUs
09:25 if you care about throughput. Alibaba's docs recommend the new SGLang server (they added a Qwen 3
09:32 reasoning-parser flag) or vLLM with reasoning enabled and the deepseek_r1 parser, just for giggles.
09:40 They demonstrate 128K-context inference on HF Spaces with sliding-window kernels. It works as long as
09:47 you mount fast swap. For local development, LM Studio, MLX on Apple Silicon, and, believe it or not,
09:56 plain llama.cpp and KTransformers all load the mid-tier weights. If you've got fewer GPUs,
10:03 the 14B dense variant comfortably fits in 24GB of VRAM at 8-bit. The 4B model fits on most gaming
10:12 laptops and still answers STEM questions like it's swallowed Wolfram Alpha.
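If you'd rather poke at a small checkpoint before committing to a serving stack, here's a rough offline-inference sketch using vLLM's Python API; the model ID and sampling values are illustrative, not official recommendations:

```python
# Offline-inference sketch with vLLM's Python API.
# Model ID and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-4B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```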
10:19 One cool nugget buried in the release: Qwen 3 keeps the embeddings untied on the bigger models but ties them on the baby
10:25 checkpoints, and it halves the key/value heads so compute scales better. Dense models cap
10:31 context at 128K only for the 8B-and-larger SKUs; the smaller ones max out at 32K unless you compile
10:38 custom kernels. Meanwhile, the MoE crew always speaks 128K.
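Those details are easy to check yourself, since they live in each checkpoint's config. A small sketch using standard transformers config attributes, with the same repo IDs assumed earlier:

```python
# Config-inspection sketch: compare embedding tying and KV-head counts
# across a small and a mid-size checkpoint.
from transformers import AutoConfig

for repo in ("Qwen/Qwen3-0.6B", "Qwen/Qwen3-8B"):
    cfg = AutoConfig.from_pretrained(repo)
    print(
        repo,
        "| tied embeddings:", cfg.tie_word_embeddings,
        "| attention heads:", cfg.num_attention_heads,
        "| KV heads:", cfg.num_key_value_heads,
        "| max positions:", cfg.max_position_embeddings,
    )
```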
10:46 And because thinking mode can easily balloon the generated sequence length, every hidden step is serialized. Alibaba lets you parse out the thoughts
10:51 after generation. In their sample code, they search for the special token ID 151668, slice the array,
11:00 decode the pre-thought chunk for logging, and then hand the post-thought chunk to the user.
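That slicing step looks roughly like this; 151668 is the token ID the release quotes for the closing think tag, and the output and inputs variables come from the transformers sketch earlier:

```python
# Thought-parsing sketch: split the generated tokens at the closing think tag.
# 151668 is the token ID quoted in the release for "</think>".
output_ids = output[0][inputs.input_ids.shape[1]:].tolist()

try:
    # Index just past the last occurrence of the closing think tag.
    split = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    split = 0  # no thinking block was emitted (e.g., a /no_think turn)

thoughts = tokenizer.decode(output_ids[:split], skip_special_tokens=True)
answer = tokenizer.decode(output_ids[split:], skip_special_tokens=True)

print("thoughts (log only):", thoughts)
print("answer:", answer)
```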
11:05 The language coverage in Qwen 3 is just insane. It speaks 119 languages and dialects,
11:11 from English and Spanish all the way to Tok Pisin and Faroese. No matter where your users are,
11:17 this model probably understands them. Another thing: you're not just getting raw power,
11:22 you're getting smart control too. Thanks to how it's built, you can decide when it thinks
11:27 deeply and when it answers fast, keeping things efficient and affordable, especially if you're
11:32 running lots of queries. Investors definitely noticed. Alibaba's stock jumped a bit after launch,
11:38 and now everyone's watching to see if other Chinese giants like Tencent and ByteDance will open their
11:43 models too. They're also watching Washington. Another chip-restriction tranche could land literally
11:49 any quarter now. Looking ahead, Alibaba's made it clear they're not stopping here. They write that
11:56 Qwen 3 is a significant milestone on the road to AGI, and they pledge to keep scaling everything,
12:02 parameters, context, modalities, reinforcement learning with environment feedback,
12:06 to transform today's models into tomorrow's agents. Anyway, that's the scoop. If you spin it up,
12:12 let me know how the tool calling works in your stack, or how many r's you find in strawberries
12:18 before the model corrects you. Smash the like button if this breakdown saved you a few hours
12:23 of reading specs, subscribe for more deep dives, and I'll catch you in the next one.