Pixtral 12B

flagship

Mistral · released 2024-09-01 · text

currently routing · 4.2k rpm

128M tokens

Context

— / 1M

Input

— / 1M

Output

— t/s

Speed

open

License

/ ABOUT

Pixtral 12B (September 2024) is Mistral AI's first multimodal model, combining a 12-billion parameter vision encoder with their language model architecture to understand and reason about images alongside text. It handles image description, visual question answering, document understanding, and chart analysis.

The model processes high-resolution images natively without requiring resizing or cropping, preserving fine details in documents, diagrams, and photographs. It was trained on a diverse multimodal dataset and supports arbitrary aspect ratios and resolutions up to its context limit.

Pixtral 12B is open-weight and designed to compete with proprietary multimodal models, offering strong visual understanding capabilities for developers and researchers.