Reinforcement learning from human feedback

Category: Machine Learning

Part of speech: noun phrase

Definition: A variant of reinforcement learning in which the reward signal is derived from human feedback, typically via a reward model trained on human preference judgments, rather than from a hand-engineered reward function; in LLM research it is widely used to align model behavior with human preferences [Kaufmann et al. 2023].

Example in context: “Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) to human preferences.” [Lang et al. 2024]

Synonym: RLHF

Related terms: preference modeling, reward model, alignment, supervised fine-tuning, constitutional AI
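
Illustrative sketch: a minimal, hypothetical example of the preference-based reward modeling step that RLHF builds on, using a Bradley-Terry style loss over chosen/rejected response pairs. The toy `RewardModel` class and random embeddings below are illustrative stand-ins; real systems train a reward head on top of a pretrained LLM and then optimize the policy against it (e.g. with PPO).

```python
# Hypothetical sketch of RLHF's reward-modeling step, not a specific library's API.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a response representation to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the human-preferred response should score higher.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training loop on random "response embeddings" standing in for LLM features.
torch.manual_seed(0)
dim, n_pairs = 16, 256
chosen = torch.randn(n_pairs, dim) + 0.5    # responses humans preferred
rejected = torch.randn(n_pairs, dim) - 0.5  # responses humans rejected

model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The learned reward model then supplies the reward signal when fine-tuning the
# language model with a policy-gradient method such as PPO.
print(f"final preference loss: {loss.item():.4f}")
```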
