Bilingual Evaluation Understudy – Artificial Intelligence terminology

Bilingual Evaluation Understudy (BLEU) NLP

noun phrase

Definition: An automatic evaluation metric for machine translation that scores a candidate translation by its n-gram overlap with one or more human reference translations. It combines modified (clipped) n-gram precisions, typically for n = 1 to 4, through a geometric mean and multiplies the result by a brevity penalty that discourages overly short outputs [Papineni et al. 2002].
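The calculation in the definition can be made concrete with a short sketch. The Python below is a minimal, unsmoothed, sentence-level illustration. It assumes pre-tokenized input, uniform weights over n = 1 to 4, and the closest reference length for the brevity penalty. The function names (ngrams, modified_precision, brevity_penalty, sentence_bleu) are hypothetical and are not the API of any toolkit. Papineni et al. define BLEU over a whole test corpus, and practical tools such as sacreBLEU add standardized tokenization and optional smoothing. Scores from this sketch are therefore not directly comparable to published figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is counted at most as
    many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def brevity_penalty(candidate, references):
    """BP = 1 if c > r, else exp(1 - r / c), where c is the candidate length
    and r is the closest reference length (this sketch breaks ties toward
    the shorter reference)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one candidate: BP times the geometric mean of the
    modified precisions for n = 1..max_n, with uniform weights 1 / max_n."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        # log(0) is undefined; without smoothing the score collapses to 0.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)
```

Clipping is what makes the precision "modified". The degenerate candidate "the the the the the the the" against the reference "the cat is on the mat" scores a unigram precision of 2/7 rather than 7/7, because "the" occurs at most twice in that reference.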

Example in context: “Specifically, we utilize two well-known automatic metrics: BLEU (Bilingual Evaluation Understudy) and perplexity (PPL), for assessing the quality of text generation and machine translation.” [Liu and Yin 2023]

Synonyms: BLEU; BLEU score

Related terms: machine translation evaluation; ROUGE; METEOR; BERTScore