Starling LM 7B Beta
Starling LM 7B Beta is an open-weight language model developed by researchers at UC Berkeley, fine-tuned from Openchat 3.5 using Reinforcement Learning from AI Feedback (RLAIF). It demonstrates that AI-generated feedback can effectively improve language model alignment, achieving strong results on chat and reasoning benchmarks.
The model uses a novel reward model trained on pairwise preferences and applies Proximal Policy Optimization (PPO) to align the base model toward more helpful, harmless, and honest outputs. Starling-7B showed that RLAIF can rival or exceed RLHF with human annotations in certain scenarios.
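The pairwise-preference training mentioned above is commonly formulated as a Bradley-Terry objective: the reward model is penalized when it scores the dispreferred response higher than the preferred one. As a minimal illustrative sketch (not the Starling team's actual training code), the per-pair loss can be written as:

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected
    one under the Bradley-Terry model: P(chosen > rejected) = sigmoid(r_c - r_r)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Training drives this loss down: a wider margin in favor of the
# preferred response costs less, while an inverted ranking costs more.
print(bradley_terry_loss(2.0, 0.5))  # correct ranking, modest loss
print(bradley_terry_loss(0.5, 2.0))  # inverted ranking, larger loss
```

In an RLAIF pipeline, the preference labels feeding this loss come from an AI judge rather than human annotators; the trained reward model then supplies the scalar reward that PPO optimizes against.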
Starling LM contributed to the research community's understanding of AI self-improvement and alignment techniques, offering an open alternative for developers seeking capable chat models.
Providers for Starling LM 7B Beta
OpenRouter routes requests to the providers best able to handle your prompt size and parameters, with automatic fallbacks to maximize uptime.