00:02Google just quietly dropped a mysterious AI app, then pulled it within hours.
00:07Gemini may be getting a new Omni video model.
00:10DeepMind is testing an AI co-clinician for doctors and telemedicine.
00:14OpenAI added animated pets and smarter workflow tools to Codex.
00:18Anthropic is red-teaming a new Claude build called Jupiter.
00:22And Mistral's new open-source model is getting criticized for its price and performance.
00:27So, let's talk about it.
00:30Alright, first, Google quietly pushed a new Android app to the Play Store called Cosmo.
00:35The listing described it as an experimental AI assistant application for Android devices.
00:41And the package name was com.google.research.air.cosmo,
00:47which already made it feel more like a research testbed than a finished consumer product.
00:52The pitch sounded familiar.
00:54Cosmo was supposed to bring artificial intelligence directly onto your device,
00:58help organize your day, answer complex questions,
01:01and work behind the scenes to simplify your life.
01:03So, initially, it sounded almost like another Gemini app.
01:07The interesting part is what people found inside.
01:10Cosmo can run with a local Gemini Nano model, a remote API server,
01:15or a hybrid mode that switches between local and server AI depending on what is available.
01:19That means Google may be testing a more flexible assistant setup,
01:23where some tasks happen directly on the phone while heavier tasks go through the cloud.
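That hybrid idea can be sketched as a simple router that prefers the on-device model and escalates heavy work to the server. This is a toy illustration of the pattern, not Cosmo's actual code; every name in it is a placeholder.

```python
# Hypothetical sketch of a local/remote hybrid router, loosely inspired by
# the three modes reportedly found in Cosmo. All names are placeholders.

def local_model(prompt: str) -> str:
    # Stand-in for an on-device model such as Gemini Nano.
    return f"[local] {prompt}"

def remote_model(prompt: str) -> str:
    # Stand-in for a server-side model call.
    return f"[remote] {prompt}"

def route(prompt: str, mode: str = "hybrid", online: bool = True,
          heavy_task: bool = False) -> str:
    """Pick local or remote execution depending on mode and availability."""
    if mode == "local":
        return local_model(prompt)
    if mode == "remote":
        return remote_model(prompt)
    # Hybrid mode: heavy tasks go to the server when a connection exists,
    # everything else stays on the device.
    if heavy_task and online:
        return remote_model(prompt)
    return local_model(prompt)

print(route("summarize my day"))                     # stays on device
print(route("analyze this video", heavy_task=True))  # goes to the server
```

The useful property is graceful degradation: if the network drops, the hybrid branch silently falls back to the local model instead of failing.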
01:28It also taps into Android's Accessibility Service API,
01:32which means it is designed to read or interact with what is happening on your screen.
01:36In theory, that is exactly the kind of access a serious phone assistant needs.
01:41It can see context, understand what app you are using, and potentially help across the whole device.
01:47In testing, though, this feature did not seem fully ready.
01:51And that basically sums up Cosmo.
01:53It had specific AI skills, some enabled and some disabled.
01:57The assistant worked, yet it felt rougher than the main Gemini app.
02:01Even the Play Store screenshots were stretched into the wrong aspect ratio,
02:05which made the whole thing look like it went public too early.
02:09Then Google pulled it.
02:10On May 1st, after the app had been spotted, the listing disappeared for most users.
02:16People who had already installed it could still view the page when logged into the same account,
02:20while everyone else got a not found message.
02:23So either Google accidentally published an internal experiment,
02:27or it released something early and immediately realized it was not ready for public attention.
02:32And while that was happening, another Google leak started pointing towards something much bigger.
02:37A possible new Gemini video generation tool called Omni.
02:42A screenshot from Gemini's video generation tab showed the line,
02:45Start with an idea or try a template, powered by Omni.
02:49That matters because Omni appears in the actual visible interface,
02:54not only buried in hidden code.
02:56Right now, Gemini's video generation flow is powered by Veo 3.1,
03:01while image generation is tied to Nano Banana 2 and Nano Banana Pro.
03:05Google describes Nano Banana Pro as built on Gemini 3,
03:09while Nano Banana 2 is tied to Gemini 3.1 Flash Image.
03:14So the big question is whether Omni is just a new wrapper around Veo,
03:17a new video model, or an early version of a larger Gemini Omni model
03:22that can handle images and video inside one system.
03:25That would be a major shift.
03:27Google currently has a split media strategy.
03:29Veo handles video.
03:31Gemini-based Nano Banana models handle images.
03:34Omni could bring those tracks closer together
03:36and make Gemini feel more like one unified creative system
03:39instead of a collection of separate models.
03:41The timing also makes sense.
03:43Google I/O 2026 runs May 19 and May 20,
03:47and Google already said Gemini and broader AI updates will be part of the event.
03:51So if Omni is real, I/O is the obvious stage.
03:56And this is coming during a very competitive moment in AI video.
04:00ByteDance's Seedance 2.0 has been sitting at the top of video generation benchmarks,
04:05so Google has a clear reason to push harder here.
04:08Then we have Google DeepMind's most serious announcement from this batch,
04:12AI co-clinician.
04:13This is basically Google's idea for how AI could help doctors without replacing them.
04:18The problem is simple.
04:19Hospitals need better care, lower costs, and faster support,
04:24while the world is heading toward a shortage of more than 10 million health workers by 2030.
04:28Google's idea is a three-part care system.
04:31The patient, the doctor, and an AI assistant working under the doctor's control.
04:36So the AI can help with research, medical notes, patient support, and live conversations,
04:42while the physician still makes the real decisions.
04:45And the early results are interesting.
04:47Google tested the system on 98 realistic primary care questions.
04:51In 97 of them, the AI made zero critical errors.
04:55Doctors also preferred its answers over leading medical evidence tools.
04:59Then Google pushed it further with medication questions from OpenFDA RXQA.
05:04These are tricky because medicine is full of details, side effects, interactions, and edge cases.
05:11On open-ended questions, the kind doctors actually ask in real life,
05:15Google says AI co-clinician beat available frontier models.
05:19The most futuristic part was telemedicine.
05:22Google built on Gemini and Project Astra to give this AI eyes, ears, and a voice.
05:28In simulated video calls, it could listen, watch, and guide patients through basic physical checks.
05:33It corrected someone's inhaler technique and helped guide shoulder movements to check for a rotator cuff injury.
05:39Still, Google is being careful here.
05:42Human doctors performed better overall, especially when spotting serious warning signs and guiding critical exams.
05:48The AI matched or beat primary care doctors in 68 out of 140 tested areas, which is impressive, but it
05:55also proves the point.
05:57This is a helper, not a replacement.
05:59To make it safer, Google uses a dual-agent setup.
06:03One AI talks to the patient, while another AI watches the conversation and checks that it stays within safe medical
06:09limits.
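That dual-agent pattern is easy to picture as code: one agent drafts the reply, a second agent gets a veto. What follows is a deliberately toy sketch of the idea, not Google's implementation; the phrase list and all function names are made up for illustration.

```python
# Illustrative dual-agent safety pattern: a "talker" drafts a reply and a
# "monitor" vetoes anything outside safe limits. Toy example only.

BLOCKED_PHRASES = ["you definitely have", "stop taking your medication"]

def talker(question: str) -> str:
    # Stand-in for the patient-facing model.
    return f"Here is some general information about: {question}"

def monitor(reply: str) -> bool:
    # Stand-in for the supervising model: approve only replies that avoid
    # diagnosis-like or dangerous phrasing.
    return not any(p in reply.lower() for p in BLOCKED_PHRASES)

def answer(question: str) -> str:
    reply = talker(question)
    if monitor(reply):
        return reply
    # The monitor rejected the draft, so defer to the human doctor.
    return "Please discuss this with your doctor."

print(answer("inhaler technique"))
```

In a real system the monitor would be a second model call with its own instructions, but the control flow is the same: nothing reaches the patient without a second opinion.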
06:09For now, this is research only.
06:12It is not meant to diagnose, treat, or give medical advice.
06:16Google is testing it with partners across the US, India, Australia, New Zealand, Singapore, and the UAE.
06:23Now, over at OpenAI, Codex got one of the weirdest updates we've seen from a serious coding product: animated pets.
06:30The Codex desktop app now has pixel art pets that sit as overlays on top of the screen, even when
06:37Codex is minimized.
06:38There are eight predefined pets, and they show little message bubbles about what Codex is doing in the background.
06:44If a pet speaks while a task is running, you can click it and reply back to the agent.
06:49So what looks like a cute status indicator also becomes a small interaction channel.
06:54Users can summon or hide them with the slash pet command.
06:57Then there is Hatch, a bundled skill that lets users upload any image and turn it into an animated pet.
07:03The pet gets saved inside the local Codex home folder so people can package and share them.
07:09Community directories like Petshare and Petdex already started appearing, and X filled up with custom creations almost immediately.
07:16The playful part is easy to mock, yet the same update also adds more practical features.
07:22Codex can now auto-detect configuration files left behind by other coding agents,
07:26including Claude Code's CLAUDE.md, and import them.
07:30That means project rules, plugins, and conventions can move across tools with less manual setup.
07:37For developers switching between agents because of weekly limits or different strengths, that lowers friction.
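The detection step itself is simple to picture: scan a project for the rule files other agents leave behind and collect their contents. A minimal sketch follows; CLAUDE.md comes from the story above, while the other filenames are illustrative guesses, not a confirmed list of what Codex looks for.

```python
# Rough sketch of config portability: find rule files left by other coding
# agents in a project directory. Filenames other than CLAUDE.md are guesses.

from pathlib import Path

KNOWN_CONFIGS = ["CLAUDE.md", "AGENTS.md", ".cursorrules"]

def import_agent_configs(project_dir: str) -> dict:
    """Return {filename: contents} for every known config file found."""
    found = {}
    root = Path(project_dir)
    for name in KNOWN_CONFIGS:
        path = root / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found
```

A real importer would also translate each file's conventions into its own rule format, but the discovery half is just a directory scan like this.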
07:43There is also a dictation dictionary in settings, where users can preload abbreviations and phrases that voice input usually gets
07:49wrong.
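Mechanically, a dictation dictionary is just a substitution pass over the transcript after speech recognition runs. Here is a toy version of that idea; the dictionary entries are invented examples, not OpenAI's defaults.

```python
# Toy dictation dictionary: expand user-defined spoken phrases into the
# written forms voice input usually mangles. Entries are made-up examples.

DICTIONARY = {
    "k eight s": "k8s",
    "pie torch": "PyTorch",
}

def apply_dictionary(transcript: str) -> str:
    # Replace each preloaded spoken phrase with its written form.
    for spoken, written in DICTIONARY.items():
        transcript = transcript.replace(spoken, written)
    return transcript

print(apply_dictionary("deploy to k eight s"))  # -> "deploy to k8s"
```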
07:50Put together, this makes Codex feel less like a simple coding assistant and more like a desktop layer for AI
07:57work.
07:57OpenAI is adding personality, portability, and voice polish around the agent experience.
08:03Raw coding performance still matters, obviously, yet stickiness and daily workflow are becoming part of the product race too.
08:10Anthropic may be preparing its own move.
08:13A new internal build, Claude Jupiter v1, has reportedly entered red teaming ahead of the Code with Claude
08:19developer conference in San Francisco on May 6.
08:22The name Jupiter is probably just an internal codename.
08:25Anthropic has used planet names before for pre-release safety testing.
08:28Last year, a similar process used the codename Neptune before the Claude 4 family launched.
08:34That pattern is why people are paying attention now.
08:37The current Claude lineup has Opus 4.7 as the flagship.
08:40Sonnet 4.7 and Haiku 4.7 are still missing, which leaves room for a mid-tier and small-tier
08:47refresh.
08:48There is also speculation that this could connect to the Mythos Foundation that surfaced in earlier reporting.
08:53The red team process itself fits Anthropic's responsible scaling policy, which includes jailbreak probes and constitutional classifier stress tests before
09:01frontier deployments.
09:02So the May 6 event could bring a full new generation, a Claude 4.7 expansion, or something else.
09:08Either way, Jupiter suggests Anthropic has something close enough to test seriously.
09:13And finally, Mistral AI dropped Mistral Medium 3.5 and the internet reaction was rough.
09:19Mistral announced the model on April 29.
09:22It is a dense 128 billion parameter model with agentic features.
09:26The release also included Mistral Vibe CLI, which runs remote coding agents in the cloud,
09:31pushes pull requests to GitHub, and can run tasks in parallel.
09:35Le Chat also got Work Mode, designed for multi-step autonomous tasks like email triage, research synthesis, and cross-tool
09:43workflows.
09:44On benchmarks, Medium 3.5 scores 77.6% on SWE Bench Verified, which tests whether a model can fix
09:52real GitHub issues with working patches.
09:54It also reaches 91.4% on Tau Cubed Telecom, a benchmark for agentic tool use in specialized environments.
10:02Mistral also merged Medium 3.1, Magistral, and Devstral 2 into one unified set of weights with configurable reasoning effort.
10:11That is genuinely useful engineering.
10:13The problem is price and competition.
10:15Mistral charges $1.50 per million input tokens and $7.50 per million output tokens.
10:21Meanwhile, Alibaba's Qwen 3.6 has 27 billion parameters, less than a quarter of Mistral's size,
10:29scores 72.4% on SWE Bench Verified, and ships under Apache 2.0, so developers can download it and
10:37run it freely.
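To see what those quoted prices mean in practice, here is the arithmetic for a single request at Mistral's stated rates of $1.50 per million input tokens and $7.50 per million output tokens. The request size below is an assumed example for illustration, not a benchmark figure.

```python
# Cost arithmetic using Mistral Medium 3.5's quoted API prices:
# $1.50 per million input tokens, $7.50 per million output tokens.

INPUT_RATE = 1.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 7.50 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical agentic coding request: 20k tokens in, 2k tokens out.
cost = request_cost(20_000, 2_000)
print(f"${cost:.3f} per request")  # -> $0.045 per request
```

At that rate, an agent making a thousand such calls a day would cost around $45 daily, which is the kind of math driving developers toward the free self-hosted alternatives.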
10:38Chinese open-source models like Qwen, GLM from Zhipu AI, and MiMo V2 from Xiaomi now dominate much of the
10:47open-source conversation.
10:48Mistral Medium 3.5 has not even ranked on major independent leaderboards yet since third-party evaluations are still pending.
10:56That is why the reaction was so mixed.
10:58Critics argued that Mistral is expensive, weaker than top rivals, and falling behind the Chinese open-source wave.
11:04Others pointed out the one thing that still makes Mistral matter.
11:07It is one of the only serious Western open-weight options.
11:10For European enterprises, that matters a lot.
11:13Banks, governments, and companies dealing with GDPR may avoid Chinese infrastructure and may also want alternatives to American AI labs.
11:22Mistral is EU-headquartered, auditable, self-hostable, and legally easier for many European buyers.
11:28HSBC already signed a multi-year deal with Mistral to self-host models on its own infrastructure.
11:34Alright, so the next few days could be very interesting, especially with Anthropic's event on May 6th and Google
11:41I/O coming later this month.
11:42Thanks for watching, and catch you in the next one.