Llama Nemotron Rerank 1B v2

flagship

NVIDIA · released 2025-02-01 · text


Context: 4.1M tokens
Input: — / 1M
Output: — / 1M
Speed: — t/s
License: open

/ ABOUT

NVIDIA Llama Nemotron Rerank 1B v2 is a cross-encoder reranking model built on the Llama architecture and fine-tuned by NVIDIA for search relevance. It scores each query-document pair for relevance. When applied as a second-stage reranker over the candidates returned by a first-stage retriever, these scores reorder the results so that the most relevant documents come first.
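A minimal Python sketch of second-stage reranking follows. It assumes the candidates have already been returned by a first-stage retriever. `toy_score` is a hypothetical word-overlap scorer standing in for the model; in a real pipeline the relevance score for each pair would come from the reranker's serving API, which is not shown here.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for the reranker: fraction of query words found in doc.

    A real deployment would send each (query, doc) pair to the cross-encoder
    and use the relevance score it returns instead.
    """
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)


def rerank(query: str,
           candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every first-stage candidate against the query and keep the top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    first_stage = [
        "GPU drivers for Linux",
        "Reranking improves search precision",
        "Search engines rank documents",
    ]
    for doc, score in rerank("improve search ranking precision", first_stage, toy_score, top_k=2):
        print(f"{score:.2f}  {doc}")
```

The scorer is passed in as a function so the same `rerank` loop works unchanged when the toy scorer is replaced by calls to the actual model.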

The model improves retrieval precision by jointly encoding the query and each candidate document, so relevance is judged from their full interaction rather than from separately computed representations. The v2 update improves ranking quality across diverse domains, including technical, legal, medical, and general web content.

Llama Nemotron Rerank 1B v2 is designed for enterprise search applications, RAG pipelines, and information retrieval systems where ranking accuracy is critical.

Providers for Llama Nemotron Rerank 1B v2

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.
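The fallback behavior described above can be sketched as follows. The sketch assumes that routes are tried in descending 30-day uptime and that any exception from a route triggers a fallback to the next one. `Route`, `route_request`, and `AllRoutesFailed` are hypothetical names for illustration only. This is not ClosedRouter's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    name: str
    uptime_30d: float                 # fraction of the last 30 days the route was up, 0.0 to 1.0
    send: Callable[[dict], dict]      # hypothetical transport: sends a request, returns a response


class AllRoutesFailed(Exception):
    """Raised when every route errors out."""


def route_request(routes: List[Route], request: dict) -> dict:
    errors = []
    # Try the most reliable route first, falling back in order of uptime.
    for route in sorted(routes, key=lambda r: r.uptime_30d, reverse=True):
        try:
            return route.send(request)
        except Exception as exc:  # any failure moves on to the next route
            errors.append(f"{route.name}: {exc}")
    raise AllRoutesFailed("; ".join(errors) or "no routes configured")


if __name__ == "__main__":
    def flaky(req: dict) -> dict:
        raise RuntimeError("provider unavailable")

    def healthy(req: dict) -> dict:
        return {"served_by": "backup", "echo": req}

    routes = [Route("backup", 0.95, healthy), Route("primary", 0.99, flaky)]
    print(route_request(routes, {"query": "example"}))
```

Here "primary" is tried first because of its higher uptime. When it raises, the request falls through to "backup".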

Provider · Context · Quant · Uptime (30d)
NVIDIA NIM · — · bf16 · 0.00%