Tech Insights 2025 Week 36 by Johan Sanneblad
Aaron Levie, CEO at Box, wrote this on X two days ago:
“We’re going to look back on the world that was pre-AI and be absolutely astonished by how slow everything was. Every week at Box as we got AI-first, we highlight a workflow where someone internally built an AI agent for automating some process. It can be things like an HR process, sales outreach, responding to RFPs, handling a compliance workflow, writing documentation, and so on. Usually my immediate reaction is “I can’t believe we used to have to do all this work manually”.
The amount of time that we have to spend on researching, writing, or moving data between steps just to get to the thing we actually are trying to do (like hire someone, close a deal, etc.) is insane. And interestingly, as a result of this exercise, I also actually see plenty of areas where I’d be willing to invest more in people as a result of the process getting more efficient. The ROI calculus all of a sudden changes on lots of types of work when agents can actually accelerate the process or grow the output.
Wild times ahead in the era of the AI-first work.”
Often when I hold my seminars, a typical question I receive is “show us actual examples, we hear a lot but don’t see a lot”. The main reason people still see so few examples is that most companies started building their agentic solutions only in the past 3-5 months. And while preliminary results (at least for our clients) have so far been outstandingly good, most companies are not yet at a stage where they want to show others what they have done. Companies like Box that are successfully implementing agentic AI have a real competitive edge, and asking them to show you exactly what they are doing makes little sense. What I can tell you, however, is that there are LOTS of companies building agentic AI solutions right now, and so far the ROI looks outstandingly good. If you wait 6-9 months I guarantee you will see hundreds of companies showing their agentic AI solutions and how much money and time they save with them. Or you can start developing your own agentic AI solutions now, and instead be one of the companies showing actual agentic solutions in production early next year, along with how much time and money you have saved by building them.
Thank you for being a Tech Insights subscriber!
Listen to Tech Insights on Spotify: Tech Insights 2025 Week 36 on Spotify
THIS WEEK’S NEWS:
- Anthropic Makes AI Training With Claude Chats Opt-Out by Default
- Anthropic Launches Claude for Chrome Extension
- Google DeepMind Launches AI Image Editor “Nano Banana”
- Stanford Study Finds AI Reduces Entry-Level Employment by 13%
- Tencent Hunyuan Releases HunyuanVideo-Foley Open-Source Video-to-Audio AI
- Microsoft Releases VibeVoice, Long-Form Multi-Speaker Text-to-Speech Model
- Microsoft Introduces Two In-House AI Models for Speech and Text Generation
- xAI Launches Grok Code Fast 1, Speed-Focused Model for Autonomous Coding Tasks
- OpenAI Makes Realtime Voice API Generally Available with New gpt-realtime Model
- OpenAI Codex Gets GPT-5 Update with IDE Integration
- Apple Adds Claude AI to Xcode Development Environment
- Alibaba Open-Sources Wan2.2-S2V Audio-to-Video AI Model
Anthropic Makes AI Training With Claude Chats Opt-Out by Default
https://www.anthropic.com/news/updates-to-our-consumer-terms
The News:
- Anthropic now requires all consumer Claude users to decide by September 28 whether their conversations can be used to train AI models.
- Previously the company automatically deleted all consumer chat data within 30 days and did not use any input for model training. This policy change now extends data retention to five years for users who do not opt out of data sharing.
- Affected users include Claude Free, Pro, and Max plans plus Claude Code sessions under those plans.
- Business customers using Claude Gov, Claude for Work, Claude for Education, or API access remain unaffected.
My take: IMPORTANT! If you are using Claude Code today with a Max subscription you must immediately go to https://claude.ai/settings/data-privacy-controls and disable “Help improve Claude. Allow the use of your chats and coding sessions to train and improve Anthropic AI models”. I strongly oppose having to opt out of AI training (I believe it should always be opt-in), and seeing Anthropic add this to their most expensive AI subscription at $200 per month was almost shocking.
I can see why they did it, though: this is the first step toward pushing companies into paying for the “Premium seat” at $150, which gives you around 3 hours of Claude Opus 4 every week, compared to over 30 hours with the $200 Max 20 subscription. The Premium seat license is where Anthropic will be making the big money, and having Max 20 subscriptions opted in to training by default will make organizations think twice before allowing them. With the recent improvements to OpenAI Codex this week, I do however think this was the wrong decision by Anthropic. The $150 Premium license is not very interesting for companies due to the limited Opus usage included, and now the $200 Max 20 license defaults to sharing data for training. I really hope Anthropic reverts this change for their Max subscriptions.
Anthropic Launches Claude for Chrome Extension
https://www.anthropic.com/news/claude-for-chrome
The News:
- Anthropic released Claude for Chrome in limited preview, a browser extension that enables Claude AI to interact directly with websites by clicking buttons, filling forms, and navigating web pages on users’ behalf.
- The initial launch is restricted to 1,000 Max plan subscribers (costing $100-200 monthly) with a waitlist available at claude.ai/chrome for broader access.
- The extension operates through a side panel that maintains context of browser activity, allowing users to authorize Claude to perform actions and complete tasks automatically.
- Internal testing revealed prompt injection vulnerabilities where hidden commands on websites could trick Claude into harmful actions, achieving a 23.6% success rate without safeguards.
- Anthropic implemented defensive measures including site-level permissions, action confirmations, and advanced classifiers that reduced attack success to 11.2%, though the company acknowledges risks remain.
- The company recommends avoiding use on sites containing financial, legal, or medical information during this research preview phase.
My take: It’s important to note that even though an 11% prompt-injection success rate sounds really bad, the main reason for it is that current models are not specifically trained to control browsers or other computer programs. We have just gotten the first models designed for agentic use, which perform really well with documents and text, but it will take until the next generation before we get models that are really good at controlling web browsers and office applications. I have no doubt it will happen, this is the next evolution of automation, but just as we need models specifically trained for long-running agentic tasks, models need to be specifically trained on how to use software applications before they become somewhat resilient to attacks. My guess is that we will start seeing such models next year.
Google DeepMind Launches AI Image Editor “Nano Banana”
https://blog.google/products/gemini/updated-image-editing-model
The News:
- Google DeepMind released Gemini 2.5 Flash Image, an AI photo editing model that maintains character likeness across edits. The model addresses the key challenge where AI edits often distort faces or alter subjects when making changes.
- Users can change backgrounds, combine multiple photos, and apply iterative edits while preserving the original subject’s appearance, whether editing photos of people or pets. The model allows blending elements from separate images, such as merging a solo photo with a pet to create composite scenes.
- Advanced features include style transfer between images and multi-turn editing for progressive scene building. Users can continuously refine aspects of the same image, such as adding different furniture pieces or background elements.
- The model is available free in the Gemini app, with generated images including visible watermarks and invisible SynthID digital watermarks. Google prices developer access at $30 per million output tokens through the Gemini API.
- The model achieved top ratings on LMArena under the anonymous codename “nano-banana” before its official launch. Google claims users prefer it over OpenAI and other competitors based on Elo scoring methodology.
My take: If you have ever tried using ChatGPT to make minor modifications to an image, you know how poor the results that come out of it can be. That is not the case with “Nano Banana”, the working name for Google’s updated Gemini 2.5 Flash Image model. I have already used Gemini 2.5 Flash Image for dozens of images, and it has amazingly good context awareness. Ask it to move things around, rotate objects, or change the camera angle: it can do all of it, and the consistency of the output compared to the source image is often remarkably close. If you haven’t tried it yet you should, it’s unlike anything you have seen before. If you need inspiration on how to use it for real production work, check this video: Google Nano Banana is WILD – 50+ Use Cases – YouTube
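For developers, the model is also reachable programmatically through the Gemini API (the $30 per million output tokens mentioned above). Below is a minimal sketch of what an image-editing call could look like with the google-genai Python SDK; the model identifier gemini-2.5-flash-image-preview, the response_modalities setting, and the file names are assumptions based on how earlier Gemini image models were exposed, so check Google’s current documentation before relying on it.

```python
# Hedged sketch: edit a photo with Gemini 2.5 Flash Image ("Nano Banana") via the Gemini API.
# Assumes the google-genai SDK and the model id "gemini-2.5-flash-image-preview".
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

source = Image.open("portrait.png")  # hypothetical input image
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        "Replace the background with a sunset beach, keep the person unchanged",
        source,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text parts and inline image data; save the first image returned.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("portrait_edited.png")
    elif part.text is not None:
        print(part.text)
```

API output is also watermarked with SynthID, as noted above.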
Stanford Study Finds AI Reduces Entry-Level Employment by 13%
https://digitaleconomy.stanford.edu/wp-content/uploads/2025/08/Canaries_BrynjolfssonChandarChen.pdf
The News:
- Stanford researchers analyzed ADP payroll data from millions of U.S. workers and found that AI automation has caused a 13% employment decline for workers aged 22-25 in the most AI-exposed occupations since late 2022.
- Employment dropped specifically in roles where AI automates rather than augments work, including software development, customer service, accounting, and marketing positions.
- Experienced workers in the same AI-exposed occupations maintained stable employment or continued growing, while younger workers in less-exposed fields like nursing also saw normal growth patterns.
- The study covered data from January 2021 through July 2025, using ADP’s database as the largest payroll processor in the United States, tracking employment changes across tens of thousands of firms.
- Labor market adjustments appeared primarily through employment reductions rather than wage changes, suggesting possible wage stickiness during the early AI adoption period.
My take: If AI reduced entry-level employment by 13% over the past three years, I have no doubt it will reduce it by at least 30% in the coming years, given the more advanced agentic AI models launched earlier this year. So what can you do if you are a young, promising engineer targeting an area soon to be dominated by AI automation, like programming, testing, customer service, accounting or marketing? First, university degrees will probably be required for everyone wanting to work in these areas, especially for young talent. If you only have a 2-year degree, your skills when you finish will not be at the level where you can control an advanced AI agent creating production-quality source code at a large company. You need deep skills in architecture, computer science, algorithms and statistics, and 2 years is not enough to get there. It can actually be a risk for companies to employ young talent without university degrees when things like software development are moving 5-10 times faster than before. Secondly, you need to build your AI and prompting skills. I believe most job interviews within 12 months will require you to demonstrate how well you can work with AI to be productive and competitive. It will not be enough to leave school with good grades; you must know how to use AI systems. But if you are in this position, with a B.Sc. or M.Sc. degree and strong AI skills, I believe you will be one of the most sought-after resources on the market in the coming years.
Tencent Hunyuan Releases HunyuanVideo-Foley Open-Source Video-to-Audio AI
http://szczesnys.github.io/hunyuanvideo-foley/
The News:
- Tencent Hunyuan released HunyuanVideo-Foley, an open-source end-to-end Text-Video-to-Audio (TV2A) framework that generates synchronized audio for video content.
- The model generates contextually-aware soundscapes trained on a 100,000-hour multimodal dataset covering natural landscapes, animated shorts, and complex video scenes.
- HunyuanVideo-Foley processes both visual and textual information through a Multimodal Diffusion Transformer (MMDiT) architecture that balances video and text inputs to create layered, detail-rich sound effects.
- The system outputs 48kHz professional-grade audio using a custom Audio VAE that reconstructs sound effects, music, and vocals without artifacts.
- Benchmarks show the model achieves state-of-the-art performance across audio fidelity, visual-semantic alignment, and temporal synchronization, outperforming all open-source models.
My take: Wow, if you have a minute to spare, go to their web page and click some of the videos. Just send it a video, describe the sound you want, and HunyuanVideo-Foley creates it as 48 kHz high-resolution audio! It’s not perfect, but wow, what an incredible tool to have when making videos!
Microsoft Releases VibeVoice, Long-Form Multi-Speaker Text-to-Speech Model
https://microsoft.github.io/VibeVoice
The News:
- Microsoft released VibeVoice-1.5B, an open-source text-to-speech framework that generates conversational audio up to 90 minutes long with up to four distinct speakers.
- The framework employs a dual architecture with a Large Language Model understanding textual context and dialogue flow, paired with a diffusion head generating acoustic details.
- VibeVoice supports cross-lingual synthesis between English and Chinese, handles sequential turn-taking between speakers, and can generate singing in addition to speech.
- Microsoft released the model under MIT license for research purposes only, embedding audible disclaimers and watermarks in generated audio to prevent misuse.
- Hardware requirements include approximately 7 GB of GPU VRAM, making it accessible on consumer graphics cards like RTX 3060.
My take: Well, it’s still not up to par with Google NotebookLM, but at least it’s a step in the right direction for Microsoft. If you want to check it out, there are lots of examples on Microsoft’s VibeVoice page. I believe owning models like this is critical for large companies like Microsoft and Google, so expect VibeVoice to keep improving at a rapid pace over the coming year.
Microsoft Introduces Two In-House AI Models for Speech and Text Generation
https://microsoft.ai/news/two-new-in-house-models
The News:
- Microsoft AI introduced MAI-Voice-1 and MAI-1-preview, marking the company’s first in-house AI models after years of relying on OpenAI partnerships.
- MAI-Voice-1 generates one minute of natural speech in under one second using a single GPU, supporting both single and multi-speaker scenarios. The voice model already powers Copilot Daily news summaries and Podcast features, with additional testing available through Copilot Labs.
- MAI-1-preview represents Microsoft’s first end-to-end trained foundation model, built using approximately 15,000 NVIDIA H100 GPUs and GB200 cluster infrastructure. The text model targets consumer applications with instruction-following capabilities and everyday conversational tasks rather than enterprise use cases.
- MAI-1-preview begins public testing on the LMArena platform for community evaluation, with a gradual rollout to select Copilot text features planned over the coming weeks.
My take: Microsoft MAI-1-preview currently ranks 13th on LMArena for text-based tasks, placing it behind models from OpenAI, Google, Anthropic, and xAI. Building and launching custom models reflects a broader industry trend where partnerships have started to evolve into competitive relationships. After investing over $13 billion in OpenAI since 2019, Microsoft now lists OpenAI as a competitor in its annual reports. As this competition grows, I fully expect Copilot and ChatGPT to evolve in completely different ways, which means that if you are really good at prompting ChatGPT you might have difficulties getting good results from Copilot, and vice versa. Even if they use the same underlying model there are still so many ways each company can make it work differently from their competitors.
xAI Launches Grok Code Fast 1, Speed-Focused Model for Autonomous Coding Tasks
https://x.ai/news/grok-code-fast-1
The News:
- xAI released Grok Code Fast 1 (codename “Sonic”), a specialized coding model built from scratch that performs autonomous programming tasks like project development and codebase analysis.
- The model processes at 92 tokens per second with a 256,000-token context window.
- Pricing starts at $0.20 per million input tokens and $1.50 per million output tokens, approximately 80-95% cheaper than competing models.
- The model achieved 70.8% accuracy on SWE-Bench Verified benchmark, placing it among top-tier coding AI systems.
- Free access runs for seven days through GitHub Copilot, Cursor, Cline, Opencode, Windsurf, Roo Code, and Kilo Code platforms.
- Supports multiple programming languages including TypeScript, Python, Java, C++, Rust, and Go with visible reasoning traces during code generation.
My take: As usual, scoring impressively on benchmarks is not at all an indicator of how well a model works for actual everyday tasks. Preliminary user feedback on Grok Code Fast 1 has so far been quite mixed. While it’s fast and cheap, it also seems to have a tendency to mess up tasks once they become complex. So, should you use it? Most probably not. When it comes to privacy, I could not find a single mention of whether code sent to Grok Code Fast 1 is saved by xAI and used to train future models, which is quite remarkable. Until this is cleared up, I strongly recommend you do not use Grok Code Fast 1 for anything remotely sensitive.
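If you want to try it outside the coding tools listed above, xAI exposes an OpenAI-compatible API, so a quick evaluation can be done with the standard OpenAI Python SDK. Here is a minimal sketch; the base URL and the model identifier grok-code-fast-1 are assumptions taken from xAI’s announcement, so verify them against the xAI docs, and keep the privacy caveat above in mind before sending anything sensitive.

```python
# Hedged sketch: calling Grok Code Fast 1 through xAI's OpenAI-compatible endpoint.
# The base_url and model id are assumptions from xAI's announcement; check the xAI docs.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-code-fast-1",
    messages=[
        {"role": "system", "content": "You are a fast, pragmatic coding assistant."},
        {"role": "user", "content": "Write a Python function that validates IPv4 addresses."},
    ],
)
print(response.choices[0].message.content)
```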
OpenAI Makes Realtime Voice API Generally Available with New gpt-realtime Model
https://openai.com/index/introducing-gpt-realtime
The News:
- OpenAI launched gpt-realtime, a single-model speech-to-speech system that eliminates the traditional chain of separate speech-to-text, reasoning, and text-to-speech models, reducing latency and preserving audio nuances like tone and pauses.
- The new model achieved 82.8% accuracy on the Big Bench Audio benchmark, compared to 65.6% for the December 2024 version.
- Instruction following accuracy increased from 20.6% to 30.5% on MultiChallenge audio tests, with better handling of complex multi-step requests and language switching mid-sentence.
- The Realtime API now also supports remote MCP server connections, image inputs, and phone calling through SIP protocol, plus two new voices called Cedar and Marin.
- Pricing dropped 20% to $32 per million audio input tokens and $64 per million audio output tokens.
My take: Moving from multiple models working in sequence to a single model handling everything is the main news here. The OpenAI Realtime API has been in beta since October last year, and now it’s finally generally available. Early feedback from forums has been overwhelmingly positive, especially on latency and voice quality. If you are developing speech-to-speech solutions today and have not yet tested gpt-realtime, you probably should.
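If you are curious what “single model, single connection” looks like in practice, the sketch below opens a raw WebSocket session against the Realtime API using the websocket-client package. The endpoint path, the gpt-realtime model id, the marin voice name, and the event shapes follow the earlier beta protocol and OpenAI’s announcement, so treat them as assumptions and confirm against the current API reference.

```python
# Hedged sketch: minimal Realtime API session over WebSocket (websocket-client package).
# Endpoint, model id, voice name, and event payloads are assumptions based on the beta protocol.
import json
import os

from websocket import create_connection

url = "wss://api.openai.com/v1/realtime?model=gpt-realtime"
ws = create_connection(url, header=[f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}"])

# Configure the session; field names may differ slightly in the GA protocol.
ws.send(json.dumps({
    "type": "session.update",
    "session": {"voice": "marin", "instructions": "You are a concise voice assistant."},
}))

# Ask the model to produce a response.
ws.send(json.dumps({"type": "response.create", "response": {"instructions": "Say hello."}}))

# Print server event types until the response completes; audio arrives as base64-encoded chunks.
while True:
    event = json.loads(ws.recv())
    print(event["type"])
    if event["type"] in ("response.done", "error"):
        break
ws.close()
```

In a real application you would stream microphone audio up and play the audio deltas back as they arrive; this skeleton is just enough to confirm connectivity and watch the event flow.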
OpenAI Codex Gets GPT-5 Update with IDE Integration
The News:
- OpenAI Codex, the AI coding assistant, now runs on GPT-5, available through existing ChatGPT Plus, Pro, Team, and Enterprise plans without additional cost.
- It gained a new IDE extension supporting Visual Studio Code, Cursor, Windsurf, and other VS Code forks.
- The CLI was also enhanced with an improved terminal design, image inputs, message queuing, approval modes, and web search functionality.
- Seamless task handoff between local development and cloud environments lets developers start work locally, delegate to Codex cloud for background processing, then pull results back without losing context.
- GitHub integration enables automatic pull request reviews that analyze code against intended functionality, examine codebase dependencies, and can execute code to validate changes.
My take: While these improvements are welcome, there are still many ways in which Codex falls behind Claude Code. Claude Code always proposes 2-3 options, keeps todo lists, initiates parallel agent tasks, can write summaries as it progresses, and is very good at advanced tool usage in the terminal. Claude Code also has small usability features like tab completion for file names and quick shell actions triggered with “!”, and its basic display formatting makes reading large amounts of text so much easier than in Codex. OpenAI Codex CLI still feels like a very immature tool, and I am actually not even sure OpenAI uses it themselves in the terminal.
Apple Adds Claude AI to Xcode Development Environment
https://www.macrumors.com/2025/08/28/xcode-gpt-5-claude-integration
The News:
- Apple released Xcode 26 Beta 7 with native Claude integration, allowing developers to use Anthropic’s Claude Sonnet 4 directly for coding tasks within Apple’s development environment.
- Developers can access Claude through the Intelligence settings panel in Xcode by connecting their existing paid Claude account.
- Apple also upgraded Xcode’s ChatGPT integration to GPT-5, making it the default option with two variants: standard GPT-5 for quick coding tasks and GPT-5 Reasoning for complex problems that require more processing time.
My take: Early feedback from Reddit indicates that this is a special version of Claude Sonnet 4 with more knowledge of SwiftUI and Apple frameworks than the regular Claude Sonnet 4 accessible through Claude Code. If this is true, I expect we could see many different versions of Claude Sonnet 4 in the coming months: how about a Claude Sonnet 4 built specifically for Android/Kotlin? If you are an Xcode developer, try it out and let me know in the comments how it worked for you.
Alibaba Open-Sources Wan2.2-S2V Audio-to-Video AI Model
https://www.alibabacloud.com/en/press-room/alibaba-introduces-open-source-model-for-digital?_p_lc=1
The News:
- Alibaba released Wan2.2-S2V, a 14-billion parameter open-source model that transforms single portrait photos and audio clips into “film-quality” animated videos.
- The model generates realistic facial expressions, body movements, and natural lip synchronization for dialogue, singing, and performance content across multiple framing options including portrait, bust, and full-body perspectives.
- It supports both cartoon characters and human avatars, processes multiple characters within scenes, and outputs videos at 480P and 720P resolutions.
- The model is open-source and available for download on Hugging Face, GitHub, and ModelScope platforms.
My take: The movements generated by the model are clearly well-synced to the audio, but they all look artificial and jumpy. Maybe it works for short clips on social media channels, but this is definitely not capable of the “realistic visual effects” announced. Still, it’s open source, so if you want to play around with it, go ahead. Just don’t expect it to produce cinematic-quality video.