
Starling LM 7B Beta

berkeley-nest · released 2023-11-01 · text
Context: —
Input: — / 1M tokens
Output: — / 1M tokens
Speed: — t/s
License: open
About

Starling LM 7B Beta is an open-weight language model developed by UC Berkeley researchers (the berkeley-nest team), fine-tuned from OpenChat 3.5 using Reinforcement Learning from AI Feedback (RLAIF). It demonstrates that AI-generated feedback can effectively improve language model alignment, achieving strong results on chat and reasoning benchmarks.

The model uses a reward model trained on pairwise preferences (drawn from the Nectar dataset of GPT-4-ranked responses) and applies Proximal Policy Optimization (PPO) to align the base model toward more helpful, harmless, and honest outputs. Starling-7B showed that RLAIF can rival or exceed RLHF with human annotations in certain scenarios.
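To make the pairwise-preference idea concrete, here is a minimal sketch (an illustrative assumption, not Starling's actual training code) of the Bradley-Terry-style loss commonly used to train reward models on preference pairs: the score of the preferred response is pushed above the score of the rejected one.

```python
# Sketch of a pairwise-preference reward loss (hypothetical, for illustration):
# given scalar reward-model scores for a "chosen" and a "rejected" response,
# minimize -log(sigmoid(r_chosen - r_rejected)).
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)) for stability
    return math.log1p(math.exp(-margin))

# The loss is small when the reward model ranks the pair correctly,
# and large when it prefers the rejected response.
print(pairwise_reward_loss(2.0, 0.0))  # correctly ranked pair: low loss
print(pairwise_reward_loss(0.0, 2.0))  # misranked pair: high loss
```

Averaged over many such pairs, this objective yields a scalar reward signal that PPO can then optimize against.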

Starling LM contributed to the research community's understanding of AI self-improvement and alignment techniques, offering an open alternative for developers seeking capable chat models.

Providers for Starling LM 7B Beta

1 route · sorted by uptime

ClosedRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.
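The fallback behavior described above can be sketched as follows; the function and provider names here are hypothetical assumptions, not a real routing API.

```python
# Hypothetical sketch of provider routing with automatic fallbacks:
# try providers in uptime order and fall through to the next on failure.
def route_with_fallback(providers, prompt, call_provider):
    last_err = None
    for provider in providers:        # assumed pre-sorted by uptime
        try:
            return call_provider(provider, prompt)
        except RuntimeError as err:   # provider down or over capacity
            last_err = err            # remember the error, try the next
    raise RuntimeError("all providers failed") from last_err

# Example with a fake provider call: the first provider fails, the second succeeds.
def fake_call(provider, prompt):
    if provider == "down":
        raise RuntimeError("unavailable")
    return f"{provider}: ok"

print(route_with_fallback(["down", "up"], "hi", fake_call))  # -> up: ok
```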

Provider | Context | Quant | Uptime (30d)
—        | —       | bf16  | 0.00%