Pixtral 12B (2409)
flagshipPixtral 12B (September 2024) is Mistral AI's first multimodal model, combining a 12-billion parameter vision encoder with their language model architecture to understand and reason about images alongside text. It handles image description, visual question answering, document understanding, and chart analysis.
The model processes high-resolution images natively without requiring resizing or cropping, preserving fine details in documents, diagrams, and photographs. It was trained on a diverse multimodal dataset and supports arbitrary aspect ratios and resolutions up to its context limit.
Pixtral 12B is open-weight and designed to compete with proprietary multimodal models, offering strong visual understanding capabilities for developers and researchers.
Providers for Pixtral 12B (2409)
2 routes · sorted by uptimeClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.