AI Weekly Insights #82

Multi-Agent Models, Open-Source Delays, and Comet’s Quiet Revolution

Happy Sunday,

It’s edition 82 of ‘AI Weekly Insights’, and this week felt like someone yanked the wheel mid-race. OpenAI’s open-weight model hit the brakes again, Grok dropped a multi-agent beast of a model, and Windsurf bailed on OpenAI only to pop up on Google’s doorstep. Also, Perplexity launched a browser that lowkey wants to replace Google.

Ready? Let’s dive in.

The Insights

For the Week of 07/06/25 - 07/12/25 (P.S. Click the story’s title for more information 😊):

  • What’s New: xAI has released Grok 4 and Grok 4 Heavy, a new family of multimodal AI that combine long memory, real-time search, and advanced reasoning into one powerful system.

  • Muscle with a Big Brain: Grok 4 can handle text and image input, run web searches as it responds, and use up to a million tokens of context in the Heavy version. It also introduces “multi-agent” processing, which means it can think through complex problems by bouncing ideas between copies of itself. Early benchmarks put it ahead of models like Gemini Pro and even OpenAI’s o3 in reasoning tests. Users can subscribe for $30/month, or drop $300/month for Grok 4 Heavy, which unlocks all the high-end features. Voice and video capabilities are already in the pipeline. A study also found that Grok sometimes pulls from Elon Musk’s public posts on X when answering political or polarizing questions, which is weird but not entirely shocking.

  • Why It Matters: Grok 4 is genuinely impressive, at least based on the benchmarks we’ve seen so far. The biggest standout is its reported performance on Humanity’s Last Exam, which, if accurate, nearly doubles the previous best score from Gemini 2.5 Pro. That puts it in rare territory and suggests Grok isn’t just fast, but actually thinking in a more deliberate way. The introduction of multi-agent processing is another big shift. Instead of just using tools like code execution or browsing, Grok spins up internal agents that collaborate on answers. I can definitely see this becoming a standard for models like ChatGPT or Claude in the near future. The Elon-posts thing is weird, and if the model is pulling from Musk’s personal opinions on sensitive topics, that raises real questions about objectivity, especially for something that markets itself as “truth-seeking”. If Musk is wrong, so is Grok, and that’s not great. But bias aside, this feels like the first release from xAI that earns its spot in the big leagues. Not just hype, but real capability that could shift how these tools are built going forward.

Image Credits: xAi

  • What's New: Perplexity has unveiled Comet, a Chromium-based browser that bakes its Perplexity AI assistant into every tab and lets users search, summarize, and take actions without touching Google Search.

  • AI Web Browsing: Comet drops a floating assistant pane onto any page, so you can ask follow-up questions, auto-summarize articles, or fire off “shop,” “schedule,” and “send” commands straight from the sidebar. It keeps a running, searchable memory of your browsing history to cut down on tab overload and lost links. The browser can re-rank results in real time, pulling from multiple engines to give a single answer instead of ten blue links. Privacy gets a headline slot too: Comet claims all on-page analysis runs locally, with only anonymized queries hitting Perplexity’s cloud. Right now the browser is invite-only for Perplexity Max subscribers at two-hundred bucks a month, though the company says a free tier is coming once the early feedback loop tightens up. Comet ships on macOS and Windows today, with Linux and mobile builds in testing.

  • Why it Matters: Browsers have felt stale for a while, and Comet is a serious swing at turning the address bar into a conversational workspace. If it works, you could leave Google entirely while still getting fast answers, shopping help, and calendar automation from the same place you read news or check email. That chips away at Google’s ad moat, and you can bet the company is watching retention charts. I’m most excited about the persistent memory: being able to ask “where was that study I skimmed last week about micro-plastics” and have the browser just surface it is the kind of quality-of-life boost that keeps knowledge workers loyal. The invite wall and Max price tag feel steep, but Perplexity is clearly stress-testing before scaling to a wider crowd. The real question is whether AI answers inside the browser can stay factual and up to date without leaning on the same Google index they are trying to replace. For now, Comet plants a bold flag that browsing and searching are ripe for a rebuild.

  • What's New: OpenAI’s planned acquisition of Windsurf has fallen through, its CEO, co-founder, and other key engineers are now heading to Google DeepMind.

  • Big Talent, Bigger Shift: Windsurf built its name on “Cascade,” an agent-based AI coding platform that quickly attracted over a million developers. The company was set to become a foundational part of OpenAI’s strategy, but the deal reportedly stalled over intellectual property terms. Once the exclusivity window expired, Google made its move, locking in not just a license to the tech but also the founding team. Varun Mohan and co-founder Douglas Chen will now join DeepMind alongside other top engineers. Windsurf itself remains independent and has named Jeff Wang as interim CEO. Google will likely fold this tech into its Gemini development stack.

  • Why it Matters: This is a major shakeup in the race for serious AI-powered developer tools. Windsurf wasn’t just another coding assistant, it was one of the more interesting examples of agents that could reason across complex projects. OpenAI does already have agentic systems in place, like the Codex CLI and their new cloud-based coding agent, but nothing quite positioned for developers at this scale. I’d even say Deep Research counts as an agent too, though it leans more into reading and analysis than engineering. This deal felt like a play to close that gap and maybe chip away at Cursor, which has carved out a business by blending AI and IDE workflows. Anthropic probably saw the risk too, since they reportedly held back Sonnet 4 access from Windsurf for a bit while the deal was on the table. Now that Google owns the people and the license, they’ve got a big leg up in folding this into Gemini’s dev stack. For OpenAI, the question isn’t just what they lost, it’s whether they’re going to double down on their in-house tools or go shopping again.

  • What's New: OpenAI CEO Sam Altman announced the company is postponing the release of its open-weight model as they run extra safety tests and review high-risk areas before it hits public release.

  • Safety First… Again: Altman first pushed the release from June to “later this summer,” but this weekend the company confirmed a fresh delay, now indefinitely. The open-weight model is expected to have reasoning capabilities similar to the o3 series and will allow developers to run it locally or via cloud services like Azure or Hugging Face. This cautious rollout comes just as Moonshot AI launched Kimi K2, a trillion-parameter model that outperforms OpenAI in some benchmarks, raising pressure to move quickly. Internally, OpenAI has also tightened security and access controls around research, signaling a broader shift toward risk management. For now, the open-weight model remains locked while other players accelerate.

  • Why it Matters: This delay says a lot about how OpenAI views its responsibilities and risks. Releasing open weights isn't just a flex, it means anyone can fine-tune or modify the model without needing OpenAI’s infrastructure. That’s powerful, but it also invites concerns around misuse. I understand that hesitation. Once it’s out in the wild, there’s no patching it later. At the same time, OpenAI doesn’t have a true open-weight model in the field right now, while competitors like Meta and now Moonshot are building public trust and market share fast. This project was supposed to signal that OpenAI could support both proprietary tools and the open community, and the longer they wait, the harder that message is to sell. Personally, I’d rather they release it when it’s truly ready, but the timing risk is real. If GPT-5 drops before this model does, the window for impact might already be closed.

That’s a wrap on another week of AI twists and trade-offs. Whether you're team Grok, still waiting on OpenAI’s open model, or just wondering if your next browser tab should talk back, we’ve got you covered. The landscape’s shifting fast, but staying curious keeps you grounded.

Stay sharp, stay skeptical, and maybe try asking your browser to explain this newsletter back to you.

Catch you next Sunday!

Warm regards,
Kharee