Llama Nemotron Embed 1B v2

flagship
NVIDIA · released 2025-02-01 · text
currently routing · 4.2k rpm
Context: 8.2M tokens
Input: — / 1M
Output: — / 1M
Speed: — t/s
License: open
/ ABOUT

NVIDIA Llama Nemotron Embed 1B v2 is a text embedding model built on the Llama architecture and fine-tuned by NVIDIA to generate vector representations of text. At 1 billion parameters, it delivers strong embedding quality while remaining efficient enough for large-scale deployment.

The model produces embeddings optimized for retrieval, semantic search, and text similarity tasks. Compared with the original, the v2 update handles long documents better, improves multilingual support, and performs better on retrieval benchmarks.

Llama Nemotron Embed 1B v2 is part of NVIDIA's Nemotron model family, offering enterprise-grade embeddings for search, RAG, and recommendation systems.
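
To illustrate the retrieval and similarity use cases described above, here is a minimal sketch of ranking documents against a query by cosine similarity. It does not call the model: the 3-dimensional vectors are hypothetical stand-ins for real embeddings, and the helper names `cosine_similarity` and `rank_documents` are assumptions made for this example, not part of any NVIDIA API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model output (hypothetical values).
query = [1.0, 0.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real search or RAG pipeline the vectors would come from the embedding model and the ranking would typically be delegated to a vector index, but the scoring idea is the same.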

Providers for Llama Nemotron Embed 1B v2

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.

Provider | Context | Quant | Uptime · 30d
(not listed) | (not listed) | bf16 | 0.00%
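
The fallback behavior described above can be pictured with a small sketch. This is not ClosedRouter's implementation, which the page does not describe. It only shows the general pattern of trying providers in a preferred order and moving on when one fails. The function `call_with_fallback` and the two stand-in providers are hypothetical.

```python
def call_with_fallback(providers, request):
    """Try each (name, send) pair in order; return the first success.

    `providers` is an ordered list, e.g. sorted by uptime (assumption).
    Raises RuntimeError if every provider fails.
    """
    errors = []
    for name, send in providers:
        try:
            return name, send(request)
        except Exception as exc:  # collect the failure and try the next one
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")


def flaky_provider(request):
    """Hypothetical provider that is always unavailable."""
    raise ConnectionError("provider unavailable")


def healthy_provider(request):
    """Hypothetical provider that returns a placeholder embedding."""
    return {"embedding": [0.0, 0.0, 0.0]}


used, result = call_with_fallback(
    [("primary", flaky_provider), ("backup", healthy_provider)],
    "text to embed",
)
print(used)
```

With a single route, as listed here, there is nothing to fall back to, so a failure of that one provider would surface directly to the caller.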