NVIDIA Vera Rubin Architecture Unveiled: Next-Generation AI Supercomputer Cuts Inference Cost per Token to One-Tenth

NVIDIA has unveiled its next-generation AI supercomputer platform, ‘Vera Rubin.’ The goal is to achieve 5x the inference performance compared to Blackwell, and reduce the cost per token by a factor of 10. Scheduled for release in the second half of 2026, it sets a new standard for AI computing.

Announced at CES 2026, the Vera Rubin platform consists of six new chips. According to NVIDIA’s official announcement, its core is the NVL72 rack-scale configuration, which pairs the Rubin GPU with the Vera CPU and bundles 72 GPUs into a single system to handle inference for large-scale AI models.

Of particular note is the cost-effectiveness. According to a Tom’s Hardware report, the cost per token can be cut to one-tenth of the Blackwell architecture’s. Inference costs are among the biggest burdens for AI service operators, so if this figure holds up, the ripple effect across the industry will be significant.

The Vera CPU adopts an ARM-based architecture to improve power efficiency, and GPU-to-GPU communication bandwidth has been greatly expanded through the 6th-generation NVLink interconnect. The NVIDIA blog frames the Rubin platform as part of a blueprint that spans autonomous driving and the open model ecosystem: not just a hardware upgrade, but a strategy to redesign the entire AI infrastructure.

The emergence of Vera Rubin has the potential to fundamentally change the cost structure of the AI industry. If inference costs are actually reduced by 10x, it will open an era where small and medium-sized enterprises can also operate large-scale AI services. Of course, actual performance needs to be verified after release, but it is clear that NVIDIA’s roadmap will once again reshape industry standards. The adoption rate of cloud service providers after its release in the second half of the year will be key.

FAQ

Q: When will NVIDIA Vera Rubin be released?

A: NVIDIA has announced its release in the second half of 2026. The exact month has not yet been disclosed.

Q: What improvements have been made compared to Blackwell?

A: Inference performance is improved by up to 5x, and the cost per token is reduced to 1/10th. The 6th generation NVLink and ARM-based Vera CPU have also been newly introduced.

Q: What is the Vera Rubin NVL72 configuration?

A: It is a configuration that integrates 72 Rubin GPUs into a single rack-scale system. It is designed to handle the training and inference of large-scale AI models in a single system.

LangSmith Now Available on Google Cloud Marketplace — 3 Key Takeaways

  • LangChain’s LangSmith is now listed on the Google Cloud Marketplace.
  • Pay with GCP committed spend credits, streamlining the procurement process.
  • Integrates with GCP services like Vertex AI, Gemini, and BigQuery.

LangSmith’s Marketplace Listing

LangChain has listed its agent engineering platform, LangSmith, on the Google Cloud Marketplace. GCP customers can purchase directly through their existing cloud accounts.[LangChain Blog]

The key is applying committed spend credits. You can use your GCP investment towards a LangSmith subscription, eliminating the need for a separate budget. Billing is also integrated into your GCP invoice.[Google Cloud Marketplace]

Key Features and GCP Integration

LangSmith handles the building, testing, deployment, and monitoring of AI agents on a single platform. Observability allows you to track individual interactions, and evaluation features support pre-deployment testing and production monitoring.[LangSmith Docs]

Integration with GCP services is also extensive, including Vertex AI, Gemini, AlloyDB, and BigQuery. Deployment options include SaaS, hybrid, and self-hosted on GKE.[LangChain Blog]

Impact on the Enterprise Market

The marketplace listing signals a full-scale push into the enterprise market. Large corporations prefer procuring through cloud marketplaces because approvals are faster and it’s handled within existing contracts.

Amid intensifying competition in LLM observability, deep integration with the GCP ecosystem becomes a differentiating factor.

Frequently Asked Questions (FAQ)

Q: What are the benefits of purchasing through the marketplace?

A: You can use GCP committed spend credits for subscription costs. Billing is integrated into your cloud invoice without separate procurement processes. You can purchase directly with your existing GCP account, accelerating adoption.

Q: What kind of tool is LangSmith?

A: It’s an integrated platform for building, testing, deploying, and monitoring AI agents and LLM apps. It combines observability and evaluation features to help operate production AI systems.

Q: What are the deployment options?

A: There are three options: fully managed SaaS, hybrid (data stored within your VPC), and self-hosted on GKE. It supports Helm and Terraform, allowing you to choose data locations that align with your security policies.


If you found this helpful, please subscribe to AI Digester.


MiniCPM-o 4.5 — Surpassing GPT-4o with a 9B On-Device Multimodal Model [GitHub]

MiniCPM-o 4.5: Multimodal AI That Runs on Your Smartphone

  • GitHub Stars: 23.6k
  • Language: Python
  • License: Apache 2.0

Why This Project is Trending

MiniCPM-o 4.5 is an open-source multimodal LLM released by OpenBMB in February 2026. With only 9B parameters, it surpasses GPT-4o on vision benchmarks and approaches Gemini 2.5 Flash.[GitHub]

There are very few open-source models that support full-duplex live streaming. It can simultaneously process what you see, hear, and say on your smartphone.[HuggingFace]

What Can It Do?

  • Vision Understanding: Processes images up to 1.8 million pixels and performs OCR. Scores 77.6 on OpenCompass.
  • Real-time Voice Conversation: Bilingual conversation in English and Chinese. Voice cloning is also available.
  • Full-Duplex Streaming: Simultaneously processes video and audio input, and text and voice output.
  • Proactive Interaction: Sends notifications proactively based on scene recognition.

Quick Start

# Run with Ollama
ollama run minicpm-o-4_5

# Full-duplex mode with Docker
docker pull openbmb/minicpm-o:latest

Where Can You Use It?

A real-time video translation assistant is the first thing that comes to mind. Just show a document to the camera and it translates it instantly. It’s also great as an accessibility aid. You can create an app that describes the surrounding environment in real-time. It can also be used as a local AI assistant that runs without cloud API costs.[GitHub]

Things to Keep in Mind

  • The full model requires 20GB or more of VRAM. You can lower the requirements with the int4 quantization version.
  • Voice functionality is only available in English and Chinese. Korean voice is not supported.
  • Full-duplex mode is in the experimental stage.
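The VRAM figures above track simple weight-size arithmetic. A rough sketch, weights only (KV cache and runtime overhead add several more GB on top):

```shell
# Back-of-envelope weight memory for a 9B-parameter model.
# bf16 stores 2 bytes per parameter; int4 stores roughly 0.5 bytes.
awk 'BEGIN {
  params = 9e9
  printf "bf16 weights: %.1f GB\n", params * 2   / 1e9   # 18.0 GB
  printf "int4 weights: %.1f GB\n", params * 0.5 / 1e9   # 4.5 GB
}'
```

With activations and cache on top, 18 GB of bf16 weights needing a 20GB+ card, and the 4.5 GB int4 build fitting in 8GB, are both consistent with this estimate.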

Frequently Asked Questions (FAQ)

Q: What hardware can MiniCPM-o 4.5 run on?

A: The full model requires a GPU with 20GB or more of VRAM; the int4 quantized version runs inference in 8GB. You can also run it locally on a Mac with Ollama or llama.cpp, and an official Docker image is provided.

Q: How does it compare to GPT-4o?

A: It scored 77.6 on the OpenCompass benchmark, surpassing GPT-4o. It recorded 87.6 on MMBench, 80.1 on MathVista, and 876 on OCRBench. This is based on vision performance, and there may be differences in text-only tasks.

Q: Can it be used commercially?

A: Commercial use is possible under the Apache 2.0 license. You are free to modify and redistribute the source code. Please check the license before production, as the copyright of the content within the training data needs to be verified separately.



AI Video Generation Tool Comparison: Sora 2 vs Veo 3.1 vs Kling 3.0 — Who Will Be the Strongest in 2026?

The AI video generation market in 2026 has officially entered a heated three-way race. OpenAI’s Sora 2, Google’s Veo 3.1, and Kling 3.0 from China’s Kuaishou are all vying for dominance. Let’s compare the strengths and weaknesses of each tool based on actual performance.

First, in terms of video quality, Veo 3.1 is currently considered to generate the most realistic videos. According to Powtoon’s comparative analysis, Veo 3.1 surpasses the other tools at rendering facial expressions and hand gestures, and subtle details like skin texture and light reflection come close to real life.

Sora 2, on the other hand, shines in creative direction and cinematic composition. Its prompt interpretation is outstanding, effectively translating abstract concepts into video. Kling 3.0 is noteworthy for its cost-effectiveness: in WaveSpeedAI’s comparison test, it had the fastest generation speed and excellent quality for the price.

Voice and audio integration are also important differentiators. Veo 3.1 ships with native audio generation that automatically creates sound effects and background music synchronized with the video. Sora 2 has recently added audio as well, but it still generates and synthesizes audio separately, which makes results less natural. InVideo’s review finds that Kling 3.0 has the highest lip-sync accuracy of the three.

On pricing, Sora 2 is accessible at $20 per month as part of the ChatGPT Plus subscription. Veo 3.1 requires a Google AI Pro subscription, which is more expensive. Kling 3.0 uses credit-based billing, which is advantageous for light users.

In conclusion, there is no absolute winner. PXZ AI’s real-world usage tests also showed that the recommended tool varied depending on the use case. Veo 3.1 is suitable for realistic videos, Sora 2 for creative content, and Kling 3.0 for fast and affordable work. All three tools are expected to undergo major updates in the second half of 2026, which will intensify the competition. Ultimately, the real winner in this market will be the user.

FAQ

Q: Which AI video generation tool do you recommend most for beginners?

A: Sora 2 is the most accessible. You can use it immediately with a ChatGPT Plus subscription, and its excellent prompt interpretation makes it easy for beginners to get the results they want.

Q: Which of the three tools can generate the longest videos?

A: Kling 3.0 supports videos up to 2 minutes, which is the longest. Veo 3.1 supports up to 1 minute, and Sora 2 supports up to 20 seconds. However, it becomes more difficult to maintain consistency with longer videos.

Q: Are there any copyright issues when using it for commercial purposes?

A: All three tools grant commercial usage rights in their paid plans. However, if the generated video contains real people or brands, a separate legal review is required. You must check the terms of service of each service.

3 Google Online Safety Features for Kids Revealed [2026]

Google’s Safer Internet Day Announcements — Top 3 Highlights

  • Enhanced Default Protection for SafeSearch and Family Link
  • Introduction of Quality Principles for YouTube Supervised Accounts
  • New AI Learning Safety Guidelines

What Google is Offering for Child and Teen Protection

On February 10, 2026, Google announced online safety features for children and teens in celebration of Safer Internet Day.[Google Blog] This year’s theme is “Smart tech, safe choices.”

SafeSearch is enabled by default for child accounts. Family Link allows you to manage screen time, app approvals, and content filters in one place.[Google Blog]

YouTube Supervised Accounts and Quality Principles

YouTube Supervised Accounts allow parents to monitor their child’s uploads, subscriptions, and comment activity. It’s structured to allow teens autonomy while keeping parents informed.

The Quality Principles newly introduced this year surface more age-appropriate, high-quality content for teens.[Google Blog] The focus is on promoting good content rather than just blocking the bad.

In the Age of AI, Kids Want Guidance

According to Google, the purpose of AI use is shifting from entertainment to learning. Teens want to learn with AI, but with guidance rather than going it alone.[The Hans India]

The “School time” feature restricts devices during class hours. Be Internet Awesome provides digital citizenship education resources.

Frequently Asked Questions (FAQ)

Q: What features does Family Link offer?

A: It manages screen time, app installation approvals, content filtering, and privacy settings in one app. SafeSearch is enabled by default to filter out inappropriate search results. It helps you understand your child’s device usage patterns.

Q: How is a YouTube Supervised Account different from a regular account?

A: It’s an account where parents can see their child’s uploads, subscriptions, and comment activity. It allows teens autonomy while keeping parents informed. Starting this year, Quality Principles have been added to surface more age-appropriate content.

Q: What is the theme of Safer Internet Day?

A: It’s “Smart tech, safe choices.” As AI becomes deeply integrated into children’s lives, the focus is on how to safely use chatbots, algorithms, and learning apps. It emphasizes critical thinking and adult guidance.



Claude Code v2.1.38 – 5 Bug Fixes and Security Patch

Claude Code v2.1.38: 5 Bug Fixes and Enhanced Security

  • Fixed scroll, tab key, and session duplication bugs in the VS Code extension
  • Improved heredoc parsing to prevent command injection
  • Enhanced sandbox security by blocking writes to the .claude/skills directory

What’s Fixed in This Patch

Anthropic has released Claude Code v2.1.38. This version is a patch that addresses several issues that arose in v2.1.37.[GitHub Release]

The most noticeable fix is the VS Code terminal scrolling issue. In v2.1.37, there was a bug where the terminal would jump to the top when scrolling, which has now been resolved. The issue where the Tab key was queuing slash commands instead of autocompleting has also been fixed.

Security-Related Improvements

There are two security-related changes. Heredoc delimiter parsing has been improved to prevent command smuggling.[GitHub Release] Additionally, writes to the .claude/skills directory have been blocked in sandbox mode.

This measure prevents malicious prompts from manipulating skill files. Security is becoming increasingly important for AI coding tools, so these proactive measures are meaningful.[Claude Code GitHub]

Things VS Code Extension Users Should Know

This release is particularly important for VS Code extension users. The issue of duplicate sessions being created when resuming a session has also been fixed. A bug where text disappeared between tool calls when not using streaming has also been resolved.

The bash permission matching issue for commands using environment variable wrappers has also been fixed. Overall, this is a release focused on stability and security. While there are no major feature additions, it’s worth updating if you’re a developer who uses Claude Code on a daily basis.

Frequently Asked Questions (FAQ)

Q: How do I update to Claude Code v2.1.38?

A: The VS Code extension updates automatically from the marketplace, or you can update it manually by searching for Claude Code in the Extensions tab. CLI users can install the latest version via npm, for example npm install -g @anthropic-ai/claude-code@latest in the terminal.

Q: Can I upgrade directly from v2.1.37 to v2.1.38?

A: Yes. v2.1.38 is a patch release that fixes regression bugs in v2.1.37. If you’re a v2.1.37 user, it’s actually a good idea to update quickly. The scrolling bug and tab key issues are resolved, improving the user experience.

Q: What is heredoc command smuggling?

A: A heredoc is a way to pass multi-line text in a shell script. If delimiter parsing is incomplete, an attacker can inject unintended commands. This patch strengthens delimiter parsing to block this attack path.
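To make that concrete, here is a minimal, harmless illustration of the mechanism (a generic shell sketch, not the actual Claude Code vulnerability): in a well-formed heredoc, everything up to the closing delimiter is data, so smuggling only becomes possible when a tool re-parsing shell input finds the delimiter in the wrong place.

```shell
# Everything between <<'EOF' and the EOF delimiter line is stdin data.
# The dangerous-looking line below is printed, never executed.
cat <<'EOF'
rm -rf /tmp/x   # plain text inside the heredoc
EOF

# A parser that mis-detects the delimiter would treat lines after the
# "false" end as live commands instead of data; that is the smuggling.
```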



GitHub Down Three Times in One Day, Developer Workflows Paralyzed [2026]

GitHub Suffered Three Outages in One Day: A Summary

  • On February 9th, GitHub experienced three outages on the same day.
  • Almost all services were affected, including Actions and Copilot.
  • Complaints are growing due to frequent outages in the past two weeks.

What Happened to GitHub on February 9th?

On February 9th (UTC), GitHub experienced at least three outages in one day. The largest outage began at 19:01 UTC. Performance degradation was detected in Git Operations, Issues, and Actions, and within minutes it spread to Copilot, Pull Requests, Webhooks, Pages, and Codespaces.[GitHub Status]

Mitigation measures were applied at 19:29 UTC, and full recovery was achieved at 20:09 UTC. The outage lasted approximately one hour.[GitHub Status]
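The stated timestamps work out to just over an hour; a quick sanity check in shell arithmetic:

```shell
# Minutes from first detection (19:01 UTC) to full recovery (20:09 UTC).
start=$(( 19 * 60 + 1 ))    # 19:01 as minutes since midnight
end=$(( 20 * 60 + 9 ))      # 20:09
echo "$(( end - start )) minutes"   # prints "68 minutes"
```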

Three Outages in One Day, Repeated Over the Past Two Weeks

According to EagleStatus, separate outages were also recorded at 11:26 AM and 12:12 PM, in addition to the major incident.[EagleStatus] On February 2nd, Actions runners were down for 5 hours, and on February 3rd, 4% of Copilot requests failed.[GitHub Status]

Impact on Development Workflow

GitHub outages lead to CI/CD pipeline disruptions, delayed PR reviews, and paralysis of Webhook-integrated services. If Copilot is also affected, the flow dependent on AI coding tools is also disrupted.

GitHub has stated that it will share a root cause analysis. With three outages in one day, it is highly likely to be an infrastructure issue.

Frequently Asked Questions (FAQ)

Q: Which services were affected by the GitHub outage on February 9th?

A: Most core services were affected, including Git Operations, Issues, Actions, Pull Requests, Packages, Pages, Codespaces, Webhooks, and Copilot. The largest outage started at 19:01 UTC and was recovered at 20:09 UTC.

Q: What is the impact of a GitHub outage on CI/CD?

A: If Actions is down, builds, tests, and deployments are all delayed. Webhooks are also affected, which can also interrupt Slack notifications or external integration services.

Q: Where can I check the status of GitHub outages?

A: You can check the official status page (githubstatus.com) in real-time. Email subscriptions are also available. Third-party monitoring services like EagleStatus are also helpful.
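githubstatus.com is a standard Statuspage site, so alongside the web page it also exposes machine-readable JSON (the live endpoint is /api/v2/status.json). The sketch below parses a canned payload of that shape so it runs offline; for real monitoring you would pipe in the output of curl -s https://www.githubstatus.com/api/v2/status.json instead.

```shell
# Extract the human-readable status from a Statuspage-style payload.
status_json='{"status":{"indicator":"none","description":"All Systems Operational"}}'
echo "$status_json" | python3 -c \
  'import sys, json; print(json.load(sys.stdin)["status"]["description"])'
```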



ChatGPT Free and Go Plans to Show Ads — Only Paid Plans Are Ad-Free [2026]

ChatGPT Ads: 3 Key Takeaways

  • Ads have begun appearing on the free and Go plans.
  • Plus, Pro, and Enterprise subscribers are ad-free.
  • OpenAI states that AI responses are not influenced by ads.

ChatGPT Now Has Ads

OpenAI has started inserting ads into the free version of ChatGPT and the $8/month Go plan. This is being tested with adult users in the US.[The Verge] The ads appear at the bottom of the response and feature products related to the conversation topic.[OpenAI]

They are labeled as “Sponsored” for distinction. However, it might be annoying for free users.

No Ads for Paid Subscribers

The $20/month Plus, $200/month Pro, and Enterprise subscriptions are ad-free.[CNBC] If you hate ads, a paid subscription is the answer. The Go plan’s $8/month launch in 171 countries likewise pairs a low-cost subscription with ad revenue.

OpenAI has stated that they will not sell user data to advertisers. Ads will not be shown to users under 18, nor will they appear next to political or health-related topics.

Is the Age of AI Chatbot Ads Dawning?

OpenAI’s annual revenue target is $25 billion.[Bloomberg] Subscription fees alone evidently aren’t enough, so they are bringing an advertising model to conversational AI, much like Google search ads.

I’m a little worried about ads being mixed into AI responses. OpenAI says there’s “no impact on responses,” but we’ll have to see how that holds up over the long run.

Frequently Asked Questions (FAQ)

Q: Which ChatGPT plans show ads?

A: Ads are shown to free and $8/month Go plan users who are adult users in the US. They do not appear for Plus, Pro, and Enterprise paid subscribers. To avoid ads, you need at least a Plus subscription.

Q: Do ads affect AI responses?

A: OpenAI has stated that ads do not affect the content of responses. They are displayed separately at the bottom of the response with a “Sponsored” label. They have also stated that they will not sell user data to advertisers.

Q: What is the ChatGPT Go plan?

A: It’s a low-cost subscription launched in August 2025. It costs $8/month in the US and is available in 171 countries. It’s cheaper than Plus, but the difference is that it includes ads.



The End of the SaaS Era? 3 Predictions from Databricks CEO

Databricks CEO’s 3 Key SaaS Outlooks

  • SaaS isn’t dead, but AI will soon make it irrelevant
  • AI creates new competitors, putting pressure on existing SaaS
  • Subscription software model itself may be reorganized

Remarks by Databricks CEO Ali Ghodsi

Databricks CEO Ali Ghodsi has made a provocative prediction: SaaS is “not dead, but AI will soon make it irrelevant.”[TechCrunch]

The key is “transformation,” not “extinction.” The logic is that the value of existing models decreases as AI-powered competitors emerge.

How AI is Shaking the SaaS Market

AI is lowering the barriers to entry for development. Small teams can quickly implement features that only large SaaS companies used to provide.[TechCrunch]

Instead of subscription-based SaaS, we’re entering a world where AI instantly generates the features you need. Since Databricks itself is a data and AI platform, this outlook aligns with its own positioning.[Databricks]

Which SaaS is at Risk?

Simple, feature-based SaaS is the most at risk. Areas like project management and basic CRM can be quickly replaced by AI.

On the other hand, infrastructure-heavy SaaS, like data pipelines and security, is relatively safe. Ultimately, fate depends on “what kind of SaaS” it is.[TechCrunch]

Frequently Asked Questions (FAQ)

Q: Is SaaS really disappearing?

A: It won’t disappear completely. However, as AI lowers development costs, the value of subscription-based SaaS may weaken. The simpler the feature SaaS, the higher the risk of replacement, and the more complex the infrastructure type, the less the impact.

Q: What kind of company is Databricks?

A: It is a data lakehouse platform company. Founded by the creators of Apache Spark, it integrates data and AI workloads. It was valued at $62 billion at the end of 2024.

Q: What happens if AI replaces SaaS?

A: Instead of monthly subscriptions, a method of AI instantly generating software as needed will spread. This is advantageous for small businesses and individuals, but a threat to large SaaS companies.



Collection of 73 Claude Code Plugins [GitHub]

Claude Code Automation Plugin Roundup: 73 Gems

  • GitHub Stars: 28,200+
  • Languages: Markdown, JSON
  • License: MIT

Why This Project Is Taking Off

As Claude Code usage skyrockets, so does the demand for automation. wshobson/agents is an open-source marketplace boasting 73 plugins, 112 agents, and 146 skills[GitHub]. Its modular design lets you install only what you need, keeping things lean.

What Can You Do With It?

  • Multi-Agent Orchestration: Multiple agents can perform code reviews, debugging, and security scans in parallel.
  • Progressive Disclosure: Skills are only loaded when activated, eliminating token waste.
  • Agent Teams: 7 presets enable team-based workflows[GitHub].
  • 4-Tier Model Strategy: Automatically allocates models (from Opus to Haiku) based on task importance.
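As a simplified sketch of what the tier strategy means in practice (the real mapping lives in the repo’s plugin configs; the function and tier names here are illustrative, not taken from the project):

```shell
# Map a task's importance to a model tier, from most capable (opus)
# down to cheapest (haiku). Purely illustrative.
tier_for() {
  case "$1" in
    critical) echo "opus"   ;;
    standard) echo "sonnet" ;;
    *)        echo "haiku"  ;;
  esac
}

tier_for critical   # prints "opus"
tier_for routine    # prints "haiku"
```

Routing cheap tasks to smaller models is also how the token-drain caveat below is mitigated in practice.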

Quick Start

# Add the marketplace
/plugin marketplace add wshobson/agents

# Install your desired plugin
/plugin install python-development

Ideal Use Cases

It’s perfect for full-stack projects where you’re running frontend, backend, and testing simultaneously. If your team needs security audits, the security scanning plugin can automatically catch vulnerabilities during code reviews.

It’s also a boon for developers who frequently create Python microservices. The agent can assist with everything from scaffolding to CI/CD setup[Plugin Reference].

Things to Keep in Mind

  • Requires a paid Claude Code subscription.
  • Heavy use of Opus agents can quickly drain your token allowance.
  • Installing everything can be overwhelming. Stick to what you need.

Frequently Asked Questions (FAQ)

Q: Is wshobson/agents free?

A: The project itself is free under the MIT license. However, you’ll need a Claude Code subscription, which is a paid service from Anthropic and incurs separate costs. There are no additional fees for installing plugins, but the tokens used by the agents count towards your subscription limit.

Q: Do I need to install all 73 plugins?

A: No, you don’t need to install everything. Just pick and choose what you want. If you’re only doing Python development, python-development might be all you need. Each plugin consists of an average of 3.4 components, so it’s easy to get started.

Q: Will it conflict with my existing Claude Code settings?

A: It’s designed to layer on top of your existing setup. It shouldn’t conflict with your CLAUDE.md or personal settings. Removing a plugin will revert to the original state. Installing multiple plugins in the same area might cause priority issues.

