Google quietly dropped a mysterious new AI app called COSMO, then pulled it within hours. The app looked like an experimental Android assistant that could run with Gemini Nano, connect to remote AI, and potentially interact with what is happening on your screen. At the same time, Gemini may be getting a new Omni video model, DeepMind is testing an AI co-clinician, OpenAI updated Codex with animated pets and workflow tools, Anthropic is testing Claude Jupiter, and Mistral’s new model is getting criticized.
Transcript
00:02Google just quietly dropped a mysterious AI app, then pulled it within hours.
00:07Gemini may be getting a new Omni video model.
00:10DeepMind is testing an AI co-clinician for doctors and telemedicine.
00:14OpenAI added animated pets and smarter workflow tools to Codex.
00:18Anthropic is red-teaming a new Claude build called Jupiter.
00:22And Mistral's new open-source model is getting criticized for its price and performance.
00:27So, let's talk about it.
00:30Alright, first, Google quietly pushed a new Android app to the Play Store called Cosmo.
00:35The listing described it as an experimental AI assistant application for Android devices.
00:41And the package name was com.google.research.air.cosmo,
00:47which already made it feel more like a research testbed than a finished consumer product.
00:52The pitch sounded familiar.
00:54Cosmo was supposed to bring artificial intelligence directly onto your device,
00:58help organize your day, answer complex questions,
01:01and work behind the scenes to simplify your life.
01:03So, initially, it sounded almost like another Gemini app.
01:07The interesting part is what people found inside.
01:10Cosmo can run with a local Gemini Nano model, a remote API server,
01:15or a hybrid mode that switches between local and server AI depending on what is available.
01:19That means Google may be testing a more flexible assistant setup,
01:23where some tasks happen directly on the phone while heavier tasks go through the cloud.
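That local-or-cloud split boils down to a routing rule. The sketch below is a hypothetical illustration of the pattern, not Cosmo's actual code; the function names and the availability flag are all invented.

```python
# Hypothetical sketch of a hybrid assistant router, assuming a simple rule:
# keep light tasks on-device, send heavy ones (or anything that cannot run
# locally) to a server-side model. Not based on Cosmo's real internals.

def run_local(prompt: str) -> str:
    # Stand-in for an on-device Gemini Nano call.
    return f"[on-device] {prompt}"

def run_remote(prompt: str) -> str:
    # Stand-in for a server-side model call.
    return f"[cloud] {prompt}"

def route(prompt: str, local_available: bool, heavy_task: bool) -> str:
    """Prefer the local model; fall back to the cloud when needed."""
    if local_available and not heavy_task:
        return run_local(prompt)
    return run_remote(prompt)
```

With this rule, `route("summarize my day", local_available=True, heavy_task=False)` stays on-device, while the same prompt with `heavy_task=True` goes to the server.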
01:28It also taps into Android's Accessibility Service API,
01:32which means it is designed to read or interact with what is happening on your screen.
01:36In theory, that is exactly the kind of access a serious phone assistant needs.
01:41It can see context, understand what app you are using, and potentially help across the whole device.
01:47In testing, though, this feature did not seem fully ready.
01:51And that basically sums up Cosmo.
01:53It had specific AI skills, some enabled and some disabled.
01:57The assistant worked, yet it felt rougher than the main Gemini app.
02:01Even the Play Store screenshots were stretched into the wrong aspect ratio,
02:05which made the whole thing look like it went public too early.
02:09Then Google pulled it.
02:10On May 1st, after the app had been spotted, the listing disappeared for most users.
02:16People who had already installed it could still view the page when logged into the same account,
02:20while everyone else got a not found message.
02:23So either Google accidentally published an internal experiment,
02:27or it released something early and immediately realized it was not ready for public attention.
02:32And while that was happening, another Google leak started pointing towards something much bigger.
02:37A possible new Gemini video generation tool called Omni.
02:42A screenshot from Gemini's video generation tab showed the line,
02:45Start with an idea or try a template, powered by Omni.
02:49That matters because Omni appears in the actual visible interface,
02:54not only buried in hidden code.
02:56Right now, Gemini's video generation flow is powered by Veo 3.1,
03:01while image generation is tied to Nano Banana 2 and Nano Banana Pro.
03:05Google describes Nano Banana Pro as built on Gemini 3,
03:09while Nano Banana 2 is tied to Gemini 3.1 Flash Image.
03:14So the big question is whether Omni is just a new wrapper around Veo,
03:17a new video model, or an early version of a larger Gemini Omni model
03:22that can handle images and video inside one system.
03:25That would be a major shift.
03:27Google currently has a split media strategy.
03:29Veo handles video.
03:31Gemini-based Nano Banana models handle images.
03:34Omni could bring those tracks closer together
03:36and make Gemini feel more like one unified creative system
03:39instead of a collection of separate models.
03:41The timing also makes sense.
03:43Google I/O 2026 runs May 19 and May 20,
03:47and Google already said Gemini and broader AI updates will be part of the event.
03:51So if Omni is real, I/O is the obvious stage.
03:56And this is coming during a very competitive moment in AI video.
04:00ByteDance's Seedance 2.0 has been sitting at the top of video generation benchmarks,
04:05so Google has a clear reason to push harder here.
04:08Then we have Google DeepMind's most serious announcement from this batch,
04:12AI co-clinician.
04:13This is basically Google's idea for how AI could help doctors without replacing them.
04:18The problem is simple.
04:19Hospitals need better care, lower costs, and faster support,
04:24while the world is heading toward a shortage of more than 10 million health workers by 2030.
04:28Google's idea is a three-part care system.
04:31The patient, the doctor, and an AI assistant working under the doctor's control.
04:36So the AI can help with research, medical notes, patient support, and live conversations,
04:42while the physician still makes the real decisions.
04:45And the early results are interesting.
04:47Google tested the system on 98 realistic primary care questions.
04:51In 97 of them, the AI made zero critical errors.
04:55Doctors also preferred its answers over leading medical evidence tools.
04:59Then Google pushed it further with medication questions from openFDA RxQA.
05:04These are tricky because medicine is full of details, side effects, interactions, and edge cases.
05:11On open-ended questions, the kind doctors actually ask in real life,
05:15Google says AI co-clinician beat available frontier models.
05:19The most futuristic part was telemedicine.
05:22Google built on Gemini and Project Astra to give this AI eyes, ears, and a voice.
05:28In simulated video calls, it could listen, watch, and guide patients through basic physical checks.
05:33It corrected someone's inhaler technique and helped guide shoulder movements to check for a rotator cuff injury.
05:39Still, Google is being careful here.
05:42Human doctors performed better overall, especially when spotting serious warning signs and guiding critical exams.
05:48The AI matched or beat primary care doctors in 68 out of 140 tested areas, which is impressive, but it
05:55also proves the point.
05:57This is a helper, not a replacement.
05:59To make it safer, Google uses a dual-agent setup.
06:03One AI talks to the patient, while another AI watches the conversation and checks that it stays within safe medical
06:09limits.
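That dual-agent guardrail is easy to picture in code. The sketch below is purely illustrative: the blocked-phrase list is a toy stand-in for a real safety check, and none of these names come from Google's system.

```python
# Hypothetical sketch of a dual-agent guardrail: one agent drafts a reply
# to the patient, a second agent screens it before it is shown.
# The phrase list is a toy stand-in for a real safety classifier.

BLOCKED_PHRASES = ["you definitely have", "stop taking", "no need to see a doctor"]

def draft_reply(question: str) -> str:
    # Stand-in for the patient-facing model.
    return f"Here is some general information about {question}."

def monitor(reply: str) -> bool:
    """Return True if the drafted reply stays within safe limits."""
    lowered = reply.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def respond(question: str) -> str:
    # Only release replies the monitor approves.
    reply = draft_reply(question)
    if monitor(reply):
        return reply
    return "Please discuss this with your physician."
```

The design point is the separation: the talking agent never decides on its own whether its answer was safe.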
06:09For now, this is research only.
06:12It is not meant to diagnose, treat, or give medical advice.
06:16Google is testing it with partners across the US, India, Australia, New Zealand, Singapore, and the UAE.
06:23Now, over at OpenAI, Codex got one of the weirdest updates we've seen from a serious coding product: animated pets.
06:30The Codex desktop app now has pixel art pets that sit as overlays on top of the screen, even when
06:37Codex is minimized.
06:38There are eight predefined pets, and they show little message bubbles about what Codex is doing in the background.
06:44If a pet speaks while a task is running, you can click it and reply back to the agent.
06:49So what looks like a cute status indicator also becomes a small interaction channel.
06:54Users can summon or hide them with the slash pet command.
06:57Then there is Hatch, a bundled skill that lets users upload any image and turn it into an animated pet.
07:03The pet gets saved inside the local Codex home folder so people can package and share them.
07:09Community directories like Petshare and Petdex already started appearing, and X filled up with custom creations almost immediately.
07:16The playful part is easy to mock, yet the same update also adds more practical features.
07:22Codex can now auto-detect configuration files left behind by other coding agents,
07:26including Claude Code's CLAUDE.md, and import them.
07:30That means project rules, plugins, and conventions can move across tools with less manual setup.
07:37For developers switching between agents because of weekly limits or different strengths, that lowers friction.
07:43There is also a dictation dictionary in settings, where users can preload abbreviations and phrases that voice input usually gets
07:49wrong.
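A dictation dictionary like that is essentially a lookup table applied after transcription. The entries below are invented examples, not OpenAI's defaults.

```python
# Hypothetical sketch of a dictation dictionary: abbreviations the user
# preloads get expanded in the transcribed text. Entries are invented.

DICTIONARY = {
    "k8s": "Kubernetes",
    "env var": "environment variable",
}

def expand(transcript: str) -> str:
    """Replace known abbreviations with their preferred spellings."""
    for short, full in DICTIONARY.items():
        transcript = transcript.replace(short, full)
    return transcript
```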
07:50Put together, this makes Codex feel less like a simple coding assistant and more like a desktop layer for AI
07:57work.
07:57OpenAI is adding personality, portability, and voice polish around the agent experience.
08:03Raw coding performance still matters, obviously, yet stickiness and daily workflow are becoming part of the product race too.
08:10Anthropic may be preparing its own move.
08:13A new internal build called Claude Jupiter v1 has reportedly entered red-teaming ahead of the Code with Claude
08:19developer conference in San Francisco on May 6.
08:22The name Jupiter is probably just an internal codename.
08:25Anthropic has used planet names before for pre-release safety testing.
08:28Last year, a similar process used the codename Neptune before the Claude 4 family launched.
08:34That pattern is why people are paying attention now.
08:37The current Claude lineup has Opus 4.7 as the flagship.
08:40Sonnet 4.7 and Haiku 4.7 are still missing, which leaves room for a mid-tier and small-tier
08:47refresh.
08:48There is also speculation that this could connect to the Mythos Foundation that surfaced in earlier reporting.
08:53The red team process itself fits Anthropic's responsible scaling policy, which includes jailbreak probes and constitutional classifier stress tests before
09:01frontier deployments.
09:02So the May 6 event could bring a full new generation, a Claude 4.7 expansion, or something else.
09:08Either way, Jupiter suggests Anthropic has something close enough to test seriously.
09:13And finally, Mistral AI dropped Mistral Medium 3.5 and the internet reaction was rough.
09:19Mistral announced the model on April 29.
09:22It is a dense 128 billion parameter model with agentic features.
09:26The release also included Mistral Vibe CLI, which runs remote coding agents in the cloud,
09:31pushes pull requests to GitHub, and can run tasks in parallel.
09:35The chat also got Work Mode, designed for multi-step autonomous tasks like email triage, research synthesis, and cross-tool
09:43workflows.
09:44On benchmarks, Medium 3.5 scores 77.6% on SWE-bench Verified, which tests whether a model can fix
09:52real GitHub issues with working patches.
09:54It also reaches 91.4% on Tau Cubed Telecom, a benchmark for agentic tool use in specialized environments.
10:02Mistral also merged Medium 3.1, Magistral, and Devstral 2 into one unified set of weights with configurable reasoning effort.
10:11That is genuinely useful engineering.
10:13The problem is price and competition.
10:15Mistral charges $1.50 per million input tokens and $7.50 per million output tokens.
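At those rates the cost math is straightforward; the sketch below is plain arithmetic from the prices quoted above, not an official pricing tool, and the example token counts are made up.

```python
# Cost per request at the quoted Mistral Medium 3.5 API prices:
# $1.50 per million input tokens, $7.50 per million output tokens.

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 1.50, out_rate: float = 7.50) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One illustrative request: 2,000 input tokens, 500 output tokens.
cost = request_cost(2_000, 500)  # about $0.00675
```

Scaled up, a million such requests comes to roughly $6,750 at those prices, which is the kind of number enterprise buyers compare against cheaper open-weight rivals.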
10:21Meanwhile, Alibaba's Qwen 3.6 has 27 billion parameters, less than a quarter of Mistral's size,
10:29scores 72.4% on SWE-bench Verified, and ships under Apache 2.0, so developers can download it and
10:37run it freely.
10:38Chinese open-source models like Qwen, GLM from Zhipu AI, and MiMo V2 from Xiaomi now dominate much of the
10:47open-source conversation.
10:48Mistral Medium 3.5 has not even ranked on major independent leaderboards yet since third-party evaluations are still pending.
10:56That is why the reaction was so mixed.
10:58Critics argued that Mistral is expensive, weaker than top rivals, and falling behind the Chinese open-source wave.
11:04Others pointed out the one thing that still makes Mistral matter.
11:07It is one of the only serious Western open-weight options.
11:10For European enterprises, that matters a lot.
11:13Banks, governments, and companies dealing with GDPR may avoid Chinese infrastructure and may also want alternatives to American AI labs.
11:22Mistral is EU-headquartered, auditable, self-hostable, and legally easier for many European buyers.
11:28HSBC already signed a multi-year deal with Mistral to self-host models on its own infrastructure.
11:34Alright, so the next few days could be very interesting, especially with Anthropic's event on May 6th and Google
11:41I/O coming later this month.
11:42Thanks for watching, and catch you in the next one.