LLaVA 1.5 7B (Image-to-Text)

flagship
liuhaotian · released 2023-10-01 · text
currently routing · 4.2k rpm
Context: 4.1M tokens
Input: — / 1M
Output: — / 1M
Speed: — t/s
License: open
About

LLaVA 1.5 7B is an open-weight vision-language model that combines a CLIP vision encoder with a Vicuna-7B language model to enable visual understanding and reasoning about images. It can describe images, answer visual questions, read text in images, and perform complex visual reasoning tasks.

The model was trained using a novel visual instruction tuning approach, where language models learn to process visual tokens alongside text. LLaVA 1.5 significantly improved over the original version with better resolution handling, improved training data, and stronger performance on visual reasoning benchmarks.

LLaVA 1.5 7B is one of the most popular open multimodal models, widely used in research and applications requiring image understanding, document analysis, and visual chat capabilities.
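For image understanding and visual chat, requests pair an image with a text prompt. The page does not document ClosedRouter's request format, so the sketch below assumes the common OpenAI-compatible chat-completions shape with a base64 data-URI image part; the model slug is likewise an assumption.

```python
import base64


def build_image_chat_payload(image_bytes: bytes, question: str,
                             model: str = "liuhaotian/llava-1.5-7b") -> dict:
    """Build an OpenAI-style chat payload pairing an image with a question.

    The model slug and message shape are assumptions: this follows the
    widely used `image_url` data-URI convention, not a documented
    ClosedRouter schema.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    }


# Usage: the resulting dict would be POSTed as JSON to a chat-completions
# endpoint; here we only build and inspect it.
payload = build_image_chat_payload(b"\x89PNG...", "What text appears in this image?")
print(payload["messages"][0]["content"][0]["text"])
```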

Providers for LLaVA 1.5 7B (Image-to-Text)

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.
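The fallback behavior described above can be sketched as trying providers in order (e.g. sorted by 30-day uptime) and moving to the next on failure. The provider list, `send` callback, and error handling here are illustrative stand-ins, not ClosedRouter's actual implementation.

```python
def route_with_fallback(providers, send):
    """Try each provider in order, falling back to the next on failure.

    `providers` is an ordered list (e.g. by uptime); `send` issues the
    request to one provider. Both are hypothetical stand-ins for
    whatever ClosedRouter does internally.
    """
    errors = []
    for provider in providers:
        try:
            return send(provider)
        except Exception as exc:  # a real router would catch specific errors
            errors.append((provider, exc))
    raise RuntimeError(f"all providers failed: {errors}")


# Usage: the first provider is unreachable, so the request falls back.
def send(provider):
    if provider == "down":
        raise ConnectionError("unreachable")
    return f"ok from {provider}"


print(route_with_fallback(["down", "up"], send))  # → ok from up
```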

Provider    Context    Quant    Uptime · 30d
—           —          bf16     0.00%