Qwen3.5 397B VLM (a17b)
Qwen3.5 397B VLM A17B Instruct is Alibaba's flagship multimodal model, using a Mixture-of-Experts architecture with 397 billion total parameters and 17 billion active per token. The VLM (Vision-Language Model) designation indicates that it processes images alongside text for visual understanding and reasoning.
The model handles complex multimodal tasks including detailed image analysis, document understanding with charts and tables, visual question answering, and image-grounded reasoning. It supports high-resolution images and can process multiple images in a single context, enabling comparison and cross-referencing.
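Because the model accepts multiple images in a single context, a request typically packs text and image parts into one user message. Below is a minimal sketch of such a payload in the OpenAI-compatible chat format that many inference gateways expose; the model slug and image URLs are illustrative assumptions, not confirmed values.

```python
# Sketch: assembling a multi-image chat request in the OpenAI-compatible
# message format. The model slug below is hypothetical.

def build_multi_image_request(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message containing a text part plus several image
    parts, so the model can compare and cross-reference images in one
    context."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": "qwen3.5-397b-vlm-a17b-instruct",  # hypothetical slug
        "messages": [{"role": "user", "content": content}],
    }

payload = build_multi_image_request(
    "Which of these two charts shows the larger year-over-year growth?",
    ["https://example.com/chart_2023.png",
     "https://example.com/chart_2024.png"],
)
print(len(payload["messages"][0]["content"]))  # 1 text part + 2 image parts -> 3
```

Grouping all parts into a single user message (rather than one message per image) is what lets the model reason across images jointly, e.g. for side-by-side chart comparison.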
Qwen3.5 397B VLM represents the frontier of open multimodal AI, designed for applications requiring both visual understanding and text reasoning at the highest quality level.
Providers for Qwen3.5 397B VLM (a17b)
1 route · sorted by uptime

OpenRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.