neuralmonkey.evaluators.gleu module
class neuralmonkey.evaluators.gleu.GLEUEvaluator(n: int = 4, deduplicate: bool = False, name: str = None) → None
Bases: neuralmonkey.evaluators.evaluator.Evaluator
Sentence-level evaluation metric that correlates with BLEU at the corpus level.
From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf)
GLEU is the minimum of the recall and the precision computed over all n-grams of order up to n in the references and hypotheses.
N-gram counts are based on the BLEU methods.
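The definition above can be illustrated with a minimal sketch for a single hypothesis and a single reference. This is not the Neural Monkey implementation: the names `ngram_counts` and `sketch_gleu` are hypothetical, and returning 0.0 for empty input is an assumption.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], max_n: int) -> Counter:
    """Count all n-grams of order 1..max_n in a token list."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sketch_gleu(hypothesis: List[str],
                reference: List[str],
                max_n: int = 4) -> float:
    """Minimum of n-gram precision and recall (hypothetical helper)."""
    hyp = ngram_counts(hypothesis, max_n)
    ref = ngram_counts(reference, max_n)

    # Counter intersection keeps the smaller count of each shared n-gram.
    matches = sum((hyp & ref).values())
    hyp_total = sum(hyp.values())
    ref_total = sum(ref.values())

    # Assumption: an empty hypothesis or reference scores zero.
    if hyp_total == 0 or ref_total == 0:
        return 0.0

    precision = matches / hyp_total
    recall = matches / ref_total
    return min(precision, recall)
```

For example, with `max_n=2`, the hypothesis `the cat sat` against the reference `the cat sat down` shares 5 of its 5 n-grams with the reference, which has 7, so precision is 1 and recall is 5/7; the sketch returns the smaller value.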
__init__(n: int = 4, deduplicate: bool = False, name: str = None) → None
Initialize self. See help(type(self)) for accurate signature.
static gleu(hypotheses: List[List[str]], references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float
Compute GLEU on a corpus with multiple references (no smoothing).
Parameters:
- hypotheses – List of hypotheses.
- references – List of references. There can be more than one reference per hypothesis.
- ngrams – Maximum order of n-grams. Default 4.
- case_sensitive – Perform case-sensitive computation. Default True.
score_batch(hypotheses: List[List[str]], references: List[List[str]]) → float
Score a batch of hypothesis/reference pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters:
- hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
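The averaging behaviour described for the default implementation can be sketched as follows. The names `sketch_score_batch` and `exact_match` are hypothetical stand-ins rather than part of Neural Monkey, and returning 0.0 for an empty batch is an assumption.

```python
from typing import Callable, List


def sketch_score_batch(
        hypotheses: List[List[str]],
        references: List[List[str]],
        score_instance: Callable[[List[str], List[str]], float]) -> float:
    """Average a per-instance score over a batch (hypothetical helper)."""
    if len(hypotheses) != len(references):
        raise ValueError("Hypotheses and references must have equal length.")

    # Assumption: an empty batch scores zero instead of raising an error.
    if not hypotheses:
        return 0.0

    total = sum(score_instance(hyp, ref)
                for hyp, ref in zip(hypotheses, references))
    return total / len(hypotheses)


def exact_match(hypothesis: List[str], reference: List[str]) -> float:
    """Toy per-instance score used only for this illustration."""
    return 1.0 if hypothesis == reference else 0.0
```

Calling `sketch_score_batch([["a"], ["b"]], [["a"], ["c"]], exact_match)` averages one match and one mismatch.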
static total_precision_recall(hypotheses: List[List[str]], references_list: List[List[List[str]]], ngrams: int, case_sensitive: bool) → Tuple[float, float]
Compute a modified n-gram precision and recall on a sentence list.
Parameters:
- hypotheses – List of output sentences as lists of words
- references_list – List of lists of reference sentences (as lists of words)
- ngrams – n-gram order
- case_sensitive – Whether to perform case-sensitive computation
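One way to accumulate precision and recall over a list of sentences with several references each is sketched below. How the library itself combines multiple references is not stated here, so choosing the reference with the largest n-gram overlap is an assumption, as are the hypothetical names `count_ngrams` and `sketch_total_precision_recall` and the aggregation over all orders from 1 to `max_n`.

```python
from collections import Counter
from typing import List, Tuple


def count_ngrams(tokens: List[str], max_n: int,
                 case_sensitive: bool) -> Counter:
    """Count n-grams of order 1..max_n, optionally lowercasing first."""
    if not case_sensitive:
        tokens = [token.lower() for token in tokens]
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sketch_total_precision_recall(
        hypotheses: List[List[str]],
        references_list: List[List[List[str]]],
        max_n: int,
        case_sensitive: bool) -> Tuple[float, float]:
    """Corpus-level clipped precision and recall (hypothetical helper)."""
    matched = hyp_total = ref_total = 0

    for hypothesis, references in zip(hypotheses, references_list):
        hyp = count_ngrams(hypothesis, max_n, case_sensitive)

        # Assumption: use the reference sharing the most n-grams.
        best_match, best_ref_size = 0, 0
        for index, reference in enumerate(references):
            ref = count_ngrams(reference, max_n, case_sensitive)
            # Intersection clips each n-gram count to the reference count.
            overlap = sum((hyp & ref).values())
            if index == 0 or overlap > best_match:
                best_match, best_ref_size = overlap, sum(ref.values())

        matched += best_match
        hyp_total += sum(hyp.values())
        ref_total += best_ref_size

    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall
```

Taking the minimum of the two returned values gives a corpus-level score in the spirit of the GLEU definition given for the class.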