Llama Nemotron Embed 1B v2

flagship

NVIDIA · released 2025-02-01 · text

currently routing · 4.2k rpm

Context: 8.2M tokens
Input: — / 1M
Output: — / 1M
Speed: — t/s
License: open

/ ABOUT

NVIDIA Llama Nemotron Embed 1B v2 is a text embedding model built on the Llama architecture, fine-tuned by NVIDIA for generating high-quality vector representations. With 1 billion parameters, it delivers strong embedding quality while remaining efficient enough for large-scale deployment.

The model produces embeddings optimized for retrieval, semantic search, and text similarity tasks. The v2 update improves on the original in three areas: handling of long documents, multilingual support, and performance on retrieval benchmarks.
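To make the retrieval and similarity use case concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. The three-dimensional vectors and the document ids are toy values invented for illustration, not output from this model, and the helper names `cosine_similarity` and `rank_documents` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Sort (doc_id, score) pairs from most to least similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values; a real model returns far more dimensions).
docs = {
    "gpu-guide": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
    "cuda-notes": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice the query and document vectors would come from the embedding model, and a vector index would typically replace the linear scan once the corpus grows.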

Llama Nemotron Embed 1B v2 is part of NVIDIA's Nemotron model family, offering enterprise-grade embeddings for search, retrieval-augmented generation (RAG), and recommendation systems.

Providers for Llama Nemotron Embed 1B v2

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.

Provider      Context   Quant   Uptime · 30d
NVIDIA NIM    —         bf16    0.00%
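The automatic fallback behavior described above can be pictured with a small client-side sketch: try providers in priority order and return the first successful result. This is an illustration of the general pattern only, not the router's actual implementation; `embed_with_fallback`, `AllProvidersFailed`, and both stub providers are hypothetical names.

```python
class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def embed_with_fallback(text, providers):
    """Try each (name, callable) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_provider(text):
    """Stand-in for a provider that is currently down."""
    raise TimeoutError("simulated outage")


def stub_provider(text):
    """Stand-in for a healthy provider; returns a fake two-dimensional vector."""
    return [float(len(text)), 0.0]


used, vector = embed_with_fallback(
    "hello",
    [("primary", flaky_provider), ("backup", stub_provider)],
)
print(used, vector)
```

With a single route listed, as on this page, there is nothing to fall back to, so a failure of the only provider would surface directly to the caller.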