GLEUEvaluator(n: int = 4, deduplicate: bool = False, name: Union[str, NoneType] = None) → None¶
Sentence-level evaluation metric correlating with BLEU on corpus-level.
From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf)
GLEU is the minimum of recall and precision of all n-grams up to n in references and hypotheses.
Ngram counts are based on the bleu methods.
__init__(n: int = 4, deduplicate: bool = False, name: Union[str, NoneType] = None) → None¶
Initialize self. See help(type(self)) for accurate signature.
gleu(references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float¶
Compute GLEU on a corpus with multiple references (no smoothing).
- hypotheses – List of hypotheses
- references – LIst of references. There can be more than one reference.
- ngrams – Maximum order of n-grams. Default 4.
- case_sensitive – Perform case-sensitive computation. Default True.
total_precision_recall(references_list: List[List[List[str]]], ngrams: int, case_sensitive: bool) → Tuple[float, float]¶
Compute a modified n-gram precision and recall on a sentence list.
- hypotheses – List of output sentences as lists of words
- references_list – List of lists of reference sentences (as lists of words)
- ngrams – n-gram order
- case_sensitive – Whether to perform case-sensitive computation