With PaliGemma 2, Gemma 2 models can see, understand, and interact with visual input, creating various new possibilities.

Google recently released PaliGemma 2, the next generation of its tunable vision-language model. Built on the performant Gemma 2 models, PaliGemma 2 adds the power of vision and makes it easier than ever to fine-tune for exceptional performance. It lets these models see, understand, and interact with visual input, opening up a world of new possibilities.
With PaliGemma 2, Google has introduced several upgrades:
- Scalable performance: Optimize performance for any task with PaliGemma 2’s multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px).
- Long captioning: Generate detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene.
- Expanding to new horizons: Google’s research demonstrates performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation; see the detailed technical report.
PaliGemma 2 emphasizes accessibility, with models designed to operate on low-precision formats for on-device inference. According to researchers, “Quantization of models for CPU-only environments retains nearly equivalent quality, making it suitable for broader deployments.”
How To Get Started with PaliGemma 2
To explore the potential of PaliGemma 2, you can:
- Download the models and code. The pre-trained models and code are available on Hugging Face and Kaggle.
- Explore the comprehensive documentation and example notebooks to quickly integrate these models into your projects.
- Use your preferred tools and frameworks, including Hugging Face Transformers, PyTorch, Keras, JAX, and Gemma.cpp. A minimal Transformers example is sketched below.
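
As a starting point, here is a minimal sketch of long captioning with Hugging Face Transformers. The model id google/paligemma2-3b-pt-224, the "<image>caption en" prompt format, and the PaliGemmaForConditionalGeneration / AutoProcessor classes are assumptions carried over from how the original PaliGemma checkpoints are served, not details confirmed in this article; check the model cards on Hugging Face for the exact identifiers.

```python
# A minimal sketch of caption generation with a PaliGemma 2 checkpoint via
# Hugging Face Transformers. The model id, prompt format, and class names are
# assumptions based on the original PaliGemma integration; check the model
# cards on Hugging Face for the exact identifiers.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed Hugging Face model id (3B, 224px)

processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg").convert("RGB")  # any local image
prompt = "<image>caption en"  # captioning prompt convention used by PaliGemma

inputs = (
    processor(text=prompt, images=image, return_tensors="pt")
    .to(torch.bfloat16)
    .to(model.device)
)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Drop the prompt tokens and decode only the newly generated caption text.
caption = processor.decode(output[0][input_len:], skip_special_tokens=True)
print(caption)
```

Swapping in a larger variant or a higher resolution should only require changing the model id; fine-tuning follows the same Transformers workflow covered in the documentation and example notebooks.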
In May 2024, Google introduced PaliGemma, a powerful open vision-language model (VLM) inspired by PaLI-3 and designed for class-leading fine-tuning performance on a wide range of vision-language tasks.
Stay Tuned to The Future Talk for more such interesting insights!