MiniCPM-o 4.5: Multimodal AI That Runs on Your Smartphone
- GitHub Stars: 23.6k
- Language: Python
- License: Apache 2.0
Why This Project is Trending
MiniCPM-o 4.5 is an open-source multimodal LLM released by OpenBMB in February 2026. With only 9B parameters, it surpasses GPT-4o on vision benchmarks and approaches Gemini 2.5 Flash.[GitHub]
It is also one of the few open-source models to support full-duplex live streaming: it can simultaneously process what the camera sees, what the microphone hears, and what you say, directly on your smartphone.[HuggingFace]
What Can It Do?
- Vision Understanding: Processes images of up to 1.8 million pixels and performs OCR; scores 77.6 on OpenCompass (see the sketch after this list).
- Real-time Voice Conversation: Bilingual conversation in English and Chinese. Voice cloning is also available.
- Full-Duplex Streaming: Handles video and audio input alongside text and voice output, all at the same time.
- Proactive Interaction: Sends notifications proactively based on scene recognition.
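Here is a minimal single-image chat sketch for the vision side. It assumes the 4.5 checkpoint keeps the transformers chat() interface of earlier MiniCPM-o releases; the repo id, test image name, and dtype setting below are illustrative assumptions, so check the model card for the exact usage.
# Minimal single-image chat sketch. Assumes the 4.5 checkpoint keeps the chat()
# interface of earlier MiniCPM-o releases; the repo id below is an assumption.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-o-4_5"  # hypothetical id, confirm on Hugging Face
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("receipt.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": [image, "Read the text in this image and summarize it."]}]
print(model.chat(msgs=msgs, tokenizer=tokenizer))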
Quick Start
# Run with Ollama
ollama run minicpm-o-4_5
# Full-duplex mode with Docker
docker pull openbmb/minicpm-o:latest
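Once the Ollama model is pulled, you can talk to it through Ollama's local REST API (port 11434 by default). A rough sketch, assuming the model tag from the quick start above and Ollama's standard base64 image field:
# Sketch: send an image question to the local Ollama server.
import base64
import requests

with open("menu.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "minicpm-o-4_5",  # tag from the quick start above
    "messages": [{
        "role": "user",
        "content": "Translate the text in this photo into English.",
        "images": [img_b64],
    }],
    "stream": False,
})
print(resp.json()["message"]["content"])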
Where Can You Use It?
A real-time video translation assistant is the most obvious use case: point the camera at a document and it translates it on the spot (a rough sketch follows below). It also works well as an accessibility aid, for example an app that describes the surrounding environment in real time. And because it runs locally, it can serve as an on-device AI assistant with no cloud API costs.[GitHub]
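As a rough sketch of the translation-assistant idea, the loop below grabs webcam frames with OpenCV and asks the model to translate any visible text. It reuses the model and tokenizer loaded in the earlier sketch, and the prompt, camera index, and pacing are all illustrative.
# Sketch: webcam-to-translation loop, reusing model/tokenizer from the earlier sketch.
import time
import cv2
from PIL import Image

cap = cv2.VideoCapture(0)  # default camera
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV returns BGR arrays; the model expects an RGB PIL image
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        msgs = [{"role": "user", "content": [image, "Translate any visible text into English."]}]
        print(model.chat(msgs=msgs, tokenizer=tokenizer))
        time.sleep(2)  # throttle; real-time pacing depends on your hardware
finally:
    cap.release()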
Things to Keep in Mind
- The full model requires 20GB or more of VRAM. You can lower this requirement with the int4 quantized version.
- Voice functionality is only available in English and Chinese. Korean voice is not supported.
- Full-duplex mode is in the experimental stage.
Frequently Asked Questions (FAQ)
Q: What hardware can MiniCPM-o 4.5 run on?
A: The full model requires a GPU with 20GB or more of VRAM, while the int4 quantized version runs inference in about 8GB. You can also run it locally on a Mac with Ollama or llama.cpp, and an official Docker image is provided.
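For the 8GB-class setup, a plausible path is loading the int4 checkpoint through transformers. The "-int4" repo id below is an assumption modeled on earlier MiniCPM-o releases, so verify the exact id on Hugging Face:
# Sketch: loading an int4 quantized checkpoint to fit in roughly 8GB of VRAM.
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-o-4_5-int4"  # hypothetical id, confirm on Hugging Face
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)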
Q: How does it compare to GPT-4o?
A: On the OpenCompass vision benchmark it scored 77.6, surpassing GPT-4o, and it recorded 87.6 on MMBench, 80.1 on MathVista, and 876 on OCRBench. These are vision benchmarks, so results on text-only tasks may differ.
Q: Can it be used commercially?
A: Yes, commercial use is permitted under the Apache 2.0 license, and you are free to modify and redistribute the source code. Before moving to production, note that the copyright status of content in the training data is a separate matter and should be verified independently.
If you found this helpful, please subscribe to AI Digester.
References
- MiniCPM-o GitHub Repository – OpenBMB (2026-02-06)
- MiniCPM-o 4.5 Model Card – Hugging Face (2026-02-06)
- MiniCPM-o 4.5 Release Announcement – OpenBMB X (2026-02-02)