Superalignment

AI Safety

noun

Definition: The problem of aligning or controlling AI systems whose capabilities exceed direct human evaluative ability, so that they remain consistent with human values and safety requirements at superhuman levels of performance. This formulation is used in recent survey literature on artificial superintelligence and scalable oversight [Kim et al. 2024].

Example in context: "Superalignment, the alignment of AI systems with human values and safety requirements at superhuman levels of capability, aims to address two primary goals: scalability in supervision to provide high-quality guidance signals and robust governance to ensure alignment with human values." [Kim et al. 2024]

Related terms: alignment, scalable oversight, weak-to-strong generalization, superintelligence, AI safety
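The "scalability in supervision" goal above is often studied through weak-to-strong generalization: a weaker supervisor produces imperfect labels, and the question is whether a stronger learner trained on those labels can outperform its supervisor. The following toy sketch illustrates the setup; the data, noise rate, and model are all hypothetical choices for demonstration, not any particular published method.

```python
import numpy as np

# Toy sketch of weak-to-strong generalization: a weak supervisor
# produces noisy labels for a simple concept, and a stronger learner
# trained only on those noisy labels can still recover the underlying
# rule with higher accuracy than the supervisor itself.
rng = np.random.default_rng(0)

# Ground-truth concept: the sign of a linear function of the features.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(2000, 2))
y_true = (X @ w_true > 0).astype(int)

# Weak supervisor: correct only ~80% of the time (simulated by
# randomly flipping 20% of the true labels).
flip = rng.random(len(y_true)) < 0.2
y_weak = np.where(flip, 1 - y_true, y_true)

# "Strong" learner: logistic regression fit by gradient descent
# on the weak labels only -- it never sees y_true.
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # sigmoid predictions
    w -= 0.1 * (X.T @ (p - y_weak)) / len(y_weak)

pred = (X @ w > 0).astype(int)
weak_acc = (y_weak == y_true).mean()
strong_acc = (pred == y_true).mean()
print(f"weak supervisor accuracy: {weak_acc:.2f}")
print(f"strong student accuracy:  {strong_acc:.2f}")
```

Because the label noise here is symmetric, the student's decision boundary still converges toward the true concept, so its accuracy against the ground truth exceeds the supervisor's. Real superalignment settings are far harder: the "noise" in weak supervision of superhuman systems is not random, which is why scalable oversight is an open research problem rather than a solved one.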
