Tech Insights 2024 Week 47

Imagine a future where you can have a single autonomous robot perform almost any type of surgery known to man. You could fly it out to developing countries or disaster zones, or use it to drastically shorten hospital queues. Researchers at John Hopkins University just announced a breakthrough where a robot could execute the same surgical procedures as skillfully as human doctors, trained only by watching videos of seasoned surgeons in action.

Among other news this week: ChatGPT just announced native app-integration on macOS, meaning that ChatGPT can now read data from select apps. In January OpenAI is set to launch their new AI Agent “Operator”, and having direct API access to select apps means we will soon be able to automate things we can only dream about today. Google also launched a new version of Gemini that places it #1 on Lmarena together with a native iPhone app, and it will be interesting to see if these two improvements are enough to make people switch from their trusty Claude or ChatGPT over to Google.

THIS WEEK’S NEWS:

  1. New AI Robot Learns Surgery by Watching Videos
  2. LangChain Releases State of AI Agents Report
  3. Stripe Introduces Payment Integration for AI Agents
  4. Humans Prefer AI Poetry Over Human Poetry
  5. ChatGPT Launches Third-Party App Integrations for macOS
  6. Google Updates Gemini – Now Ranks Number 1 on Lmarena
  7. Google Launches Gemini iPhone App
  8. TikTok Launches Symphony Creative Studio
  9. Alibaba Cloud Launches Qwen2.5-Coder-32B
  10. Google Just Released AlphaFold 3 on GitHub
  11. Anthropic Launches Prompt Improver

New AI Robot Learns Surgery by Watching Videos

https://hub.jhu.edu/2024/11/11/surgery-robots-trained-with-videos

The News:

  • Researchers at John Hopkins University just announced a breakthrough where a robot executed surgical procedures as skillfully as human doctors, trained only by watching videos of seasoned surgeons.
  • Using a new imitation learning approach, the system trained with hundreds of surgical videos captured by da Vinci robot wrist cameras.
  • The AI model combines ChatGPT-style architecture with kinematics, teaching the robot to “speak surgery” through mathematical movements.
  • The AI model also showed unexpected adaptability, like automatically retrieving dropped needles — a skill it wasn’t explicitly programmed to perform.

My take: Robotic and autonomous surgery is definitely the way forward, and maybe in a not too distant future most of our daily surgeries will be done autonomously by robots. The potential benefits are be enormous – not only from being able to reduce waiting queues but also for developing countries and disaster zones. You could ship surgery robots to wherever they are needed. This work done by John Hopkins University really looks like it’s paving the way for such a future.

LangChain Releases State of AI Agents Report

https://www.langchain.com/stateofaiagents

The News:

  • LangChain just published a comprehensive report called “State of AI Agents”, where they surveyed 1,300 professionals from engineers and product managers to business leaders and executives to report on the state of AI agents.
  • 51% of all respondents use AI agents in production today, where among mid-sized companies (100-200 employees) 63% of those companies already use AI agents in production.
  • 78% of all respondents have active plans to implement AI agents into production soon.
  • The top use cases for agents include performing research and summarization (58%), followed by streamlining tasks for personal productivity or assistance (53.5%).

My take: My take for top trend in 2025 is AI Agents. OpenAI is about to launch their new AI Agent “Operator” in January, which can control your web browser and complete real, multi-step tasks with minimal human oversight. ChatGPT is already integrating natively with developer tools on macOS, and you can expect that integration to grow to more applications and that ChatGPT and Operator will be able to do some amazing things next year you did not think was possible.

Read more:

Humans Prefer AI Poetry Over Human Poetry

https://www.nature.com/articles/s41598-024-76900-1

The News:

  • In a new study by University of Pittsburgh with over 1,600 participants, researchers revealed that AI can now generate poetry that readers not only struggle to distinguish from human-written texts, but actually prefer over works by legendary poets like Shakespeare and Dickinson.
  • AI-generated poems were consistently rated higher across 13 different qualitative measures, including rhythm, beauty, and emotional impact.
  • Five poems rated as ‘least likely’ to be human were written by famous poets, while four rated most “human-like” were AI-generated.
  • When participants were explicitly told poems were AI-generated, they rated them lower regardless of authorship.

My take: Even in creative domains such as poetry it’s getting increasingly difficult to distinguish between AI and human writing. I still meet people that truly believe AI will never be a match for human creativity, however if you look at the results from this massive study I think most of us clearly sees where all this is going. For me the main question is: Will it take 1 year, 3 years or 5 years before we have an AI that writes better texts than any living human on earth? My bet is on 3 years.

ChatGPT Launches Third-Party App Integrations for macOS

https://help.openai.com/en/articles/10119604-work-with-apps-on-macos

The News:

  • OpenAI just released a major update to ChatGPT for macOS that enables integration with third-party applications, initially focusing on developer tools.
  • The update allows ChatGPT to read content from compatible apps like VS Code, Terminal, iTerm2, TextEdit and Apple’s Xcode without manual copying and pasting to the ChatGPT window.
  • The integration uses macOS Accessibility API for most apps, while VS Code requires a specific extension installation.

What you might have missed: ChatGPT for macOS is a true native Swift app, compared to ChatGPT for Windows, and Claude for macOS and Windows (that use a Chrome-wrapper in Electron). Being native means ChatGPT can integrate with the system in ways that is almost impossible to achieve if you are using a web-page wrapped inside Electron, like with Claude.

My take: While it looks great, in practice the only thing it does is automatic copying from Visual Studio Code, Terminal or XCode to the ChatGPT application. You still have to copy and merge generated code back manually. After having used Cursor for over 4 months this is still far behind the experience you have working in Cursor with Claude in a split-window setup. Technically however it is definitely possible for OpenAI in the future to also add support for writing code directly into Visual Studio Code and XCode, and if they implement that functionality things are definitely getting more interesting.

Google Updates Gemini – Now Ranks Number 1 on Lmarena

https://twitter.com/lmarena_ai/status/1857110672565494098

The News:

  • Google just updated Gemini to version EXP1114 and it now ranks #1 overall with an impressive 40+ score leap — matching GPT-4o and surpassing o1-preview. It also claims #1 on Vision leaderboard.
  • Gemini-Exp-1114 excels across technical and creative domains:
    • Overall #3 -> #1
    • Math: #3 -> #1
    • Hard Prompts: #4 -> #1
    • Creative Writing #2 -> #1
    • Vision: #2 -> #1
    • Coding: #5 -> #3
  • You can access gemini-exp-1114 in Google AI Studio.

My take: It’s about time Google caught up with the rest, anything else would have been a mistake. Google invented the transformer architecture in 2017, and they also invented test-time-compute this year that is the foundation for OpenAI o1-preview. So it was about time their model started to perform at least as good as the rest. However I’m not sure performance alone is enough to get traction. First of all is the domain name, https://gemini.google.com/. Compared to https://chat.com used by OpenAI it would have made more sense to use https://gemini.com, but that site is owned by a platform that sells crypto currency. Also, if you use a Google Business account, the only way to get access to Gemini is either if your organization purchases Gemini Business access, or that you set up different profiles in your web browser to be able to toggle to a personal profile before visiting the Gemini web page. All in all I don’t think the performance is the main issue with Gemini, it’s getting access to it and it’s tight connection with Google that’s the top issues.

Read more:

Google Launches Gemini iPhone App

https://twitter.com/sundarpichai/status/1857100676884574399

The News:

  • Google just launched their Gemini iPhone app in the app store.
  • The app has support for 13 languages, but is missing support for Finnish, Norwegian and Swedish.
  • The app supports Gemini Live, image analysis and image generation.

My take: Google Gemini has always struggled behind both Claude and ChatGPT in performance and quality of results, but with the latest release of Gemini (EXP 1114, see above), Gemini suddenly got much more interesting. Will you be switching to Gemini or what does it take for you to leave Claude or ChatGPT?

TikTok Launches Symphony Creative Studio

https://ads.tiktok.com/business/en-US/blog/symphony-creative-studio

The News:

  • After being announced five months ago, Tiktok finally launched their new AI-powered video-generation tool called Symphony Creative Studio.
  • Symphony Creative Studio converts product information or URLs directly into TikTok-ready videos, drawing from their top-performing content styles.
  • Symphony Creative Studio supports AI avatars, where users can choose from pre-built or customized options with the ability to edit voice, position, style, and more.
  • Symphony Creative Studio also supports translation and dubbing, enabling automatic content conversion into multiple languages in over 30 languages with lip-sync capabilities.
  • Symphony Creative Studio includes Your Daily Video Generations that automatically creates new video options based on brand history and platform trends.

My take: If you are a small company with limited budget and want to make ads for TikTok, I can definitely see the temptation to use a tool like Symphony Creative Studio. It presents you with beautiful young people that will say anything you want in any language you want, you can publish new video ads on a daily basis, and it is much cheaper than going to an ad agency and working with real people. But I personally believe using these tools is damaging for your brand. People really dislike content they know came from an AI, and since these videos will be clearly labeled as such, my recommendation for any serious brand is to stay as far from these kinds of tools as possible at least until it is proven that at least some people find these ads acceptable.

Alibaba Cloud Launches Qwen2.5-Coder-32B

https://qwenlm.github.io/blog/qwen2.5-coder-family

The News:

  • Alibaba Cloud just launched Qwen2.5-Coder-32B, an open-source model that on benchmarks “matches the coding capabilities of GPT-4o”.
  • Qwen2.5-Coder-32B supports over 40 different programming languages with 128K context length.
  • Qwen2.5-Coder-32B also has “multi-language code repair capabilities”, that can help users fix errors in their code. In Aider, a popular benchmark for code repair, Qwen2.5-Coder-32B-Instruct scored 73.7, very similar to GPT-4o.
  • Qwen2.5-Coder-32B is released as open-source under the Apache 2.0 license.

My take: Being an open-source model, the performance of Qwen2.5-Coder is amazing. And looking at the image above, it might look like Anthropic Claude is just “slightly better” than GPT-4o and Qwen 2.5 (78% vs 69%). In practice however those percentages is what makes working with Claude so fun and enjoyable, and working with GPT-4o for coding not very fun and enjoyable at all. For me, Claude 3.5 Sonnet (version 20241022) just passed the breakpoint of being “good enough for all coding”, and I have no doubts that Qwen and ChatGPT might reach that point too within 6-12 months. And when they do, if they still keep their Apache 2.0 license, this model will be extremely attractive to use since you can use it offline completely free of charge! Until then however I strongly recommend that you stick with Claude 3.5 and Cursor for all your programming needs.

Google Just Released AlphaFold 3 on GitHub

https://github.com/google-deepmind/alphafold3

The News:

  • Google just opened up its nobel-prize winning AlphaFold 3 protein prediction model, enabling academic researchers to access both code and training weights since its limited release in May.
  • While it now is more “open”, commercial use of AlphaFold 3 is still forbidden, and you cannot use it directly or indirectly for activities that contribute to commercial objectives.
  • You also cannot use AlphaFold 3 in any patent or trademark work, that requires a separate permission.

My take: If you are an academic researcher within MedTech you can start exploring all the possibilities with AlphaFold 3 right now. For commercial companies however you need to initiate a dialog with Google DeepMind and pay for licensing, or wait until the open-source alternative OpenFold3 is released later this year.

Read more:

Stripe Introduces Payment Integration for AI Agents

https://stripe.dev/blog/adding-payments-to-your-agentic-workflows

The News:

  • Stripe has announced a new integration system that allows AI agents to handle payment processing within agentic workflows.
  • The system enables AI agents to autonomously create charges, handle subscriptions, and manage refunds while maintaining security and compliance standards.
  • The platform supports both one-time payments and recurring billing scenarios, with customizable workflow triggers based on payment events.

My take: Still not convinced AI Agents are just about to explode globally? Everything that can be automated will become automated, and I believe this will happen much sooner than anyone could have ever guessed. The integration of payment processing into agentic workflows could revolutionize how businesses handle transactions, particularly in scenarios involving complex, multi-step processes. An AI agent could now handle the entire customer journey from initial contact through to payment processing and follow-up.

Anthropic Launches Prompt Improver

https://www.anthropic.com/news/prompt-improver

The News:

  • Anthropic just launched a new prompt improver that allows developers to take existing prompts and use Claude to refine them using advanced prompt engineering techniques.
  • The prompt improver automatically refines prompts using techniques like chain-of-thought reasoning, example standardization, and prefill addition.
  • Based on tests, the prompt improver increased accuracy by 30% for a multilabel classification test and brought word count adherence up to 100% for a summarization task.

My take: It’s so fascinating to see all these new AI tools getting invented and developed based on new needs. We are breaking new grounds with Machine Learning, and much like when the Internet was getting traction 25 years ago, much of the work today is creating the right tools to increase efficiency with all our new technologies. This prompt improver by Anthropic looks great, and I think this or a similar tool will soon be a requirement for all companies working with generative AI going forward.