neuralmonkey.evaluators.bleu module

class neuralmonkey.evaluators.bleu.BLEUEvaluator(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None

Bases: neuralmonkey.evaluators.evaluator.Evaluator

__init__(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None

Instantiate BLEU evaluator.

Parameters:
  • n – Maximum order of n-grams considered.
  • deduplicate – Flag whether repeated tokens should be treated as one.
  • name – Name displayed in the logs and TensorBoard.
  • multiple_references_separator – Token that separates multiple reference sentences. If None, the reference is assumed to be a single sentence.
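
A minimal construction sketch (argument values are arbitrary examples, not recommendations):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    # Corpus-level BLEU up to 4-grams under a custom log name.
    bleu4 = BLEUEvaluator(n=4, name="BLEU-4")

    # Variant that collapses repeated tokens and splits references on a
    # separator token ("###" is just an illustrative choice).
    bleu_multi = BLEUEvaluator(deduplicate=True,
                               multiple_references_separator="###")
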
static bleu(hypotheses: List[List[str]], references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True)

Compute BLEU on a corpus with multiple references.

The n-grams are uniformly weighted.

By default, smoothing is applied as in the reference implementation at: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference for each hypothesis.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
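
A usage sketch of the static corpus-level computation (tokenized toy data; the hypotheses argument is the one documented above):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
    # One list of references per hypothesis; each reference is a list of words.
    references = [[["the", "cat", "sat", "on", "a", "mat"],
                   ["a", "cat", "was", "sitting", "on", "the", "mat"]]]

    score = BLEUEvaluator.bleu(hypotheses, references, ngrams=4)
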
static deduplicate_sentences(sentences: List[List[str]]) → List[List[str]]
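
The collapsing of repeated tokens referred to by the deduplicate flag can be pictured with this standalone sketch (an assumption about the behaviour, not the library's code; it collapses consecutive repeats only):

    from itertools import groupby
    from typing import List

    def collapse_repeats(sentence: List[str]) -> List[str]:
        # Keep one token from each run of identical consecutive tokens.
        return [token for token, _ in groupby(sentence)]

    print(collapse_repeats(["really", "really", "good", "good", "movie"]))
    # ['really', 'good', 'movie']
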
static effective_reference_length(hypotheses: List[List[str]], references_list: List[List[List[str]]]) → int

Compute the effective reference corpus length.

The effective reference corpus length is based on the best match length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
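
A standalone sketch of the best-match-length idea described above (not the library's implementation):

    from typing import List

    def effective_ref_length(hypotheses: List[List[str]],
                             references_list: List[List[List[str]]]) -> int:
        # For each hypothesis, take the length of the reference whose length
        # is closest to the hypothesis length, and sum over the corpus.
        total = 0
        for hyp, refs in zip(hypotheses, references_list):
            best = min(refs, key=lambda ref: abs(len(ref) - len(hyp)))
            total += len(best)
        return total
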
static merge_max_counters(counters: List[collections.Counter]) → collections.Counter

Merge counters using maximum values.
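
Taking element-wise maxima is what the union operator (|) of collections.Counter computes, so the behaviour can be illustrated as:

    from collections import Counter
    from functools import reduce

    counters = [Counter({"the": 2, "cat": 1}), Counter({"the": 1, "mat": 3})]
    merged = reduce(lambda a, b: a | b, counters, Counter())
    print(merged)  # Counter({'mat': 3, 'the': 2, 'cat': 1})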

static minimum_reference_length(hypotheses: List[List[str]], references_list: List[List[str]]) → int

Compute the minimum reference corpus length.

The minimum reference corpus length is based on the shortest reference sentence length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
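
A standalone sketch of summing the shortest reference length per sentence (not the library's code; references are nested as lists of token lists):

    from typing import List

    def min_ref_length(references_list: List[List[List[str]]]) -> int:
        # For each sentence, take the shortest available reference and sum
        # those lengths over the corpus.
        return sum(min(len(ref) for ref in refs) for refs in references_list)
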
static modified_ngram_precision(hypotheses: List[List[str]], references_list: List[List[List[str]]], n: int, case_sensitive: bool) → Tuple[float, int]

Compute the modified n-gram precision on a list of sentences.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • n – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation
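
The standard computation behind modified precision clips each hypothesis n-gram count by the maximum count seen in any reference, then divides the clipped total by the total number of hypothesis n-grams. A standalone sketch (case handling omitted; not the library's code):

    from collections import Counter
    from typing import List, Tuple

    def modified_precision(hypotheses: List[List[str]],
                           references_list: List[List[List[str]]],
                           n: int) -> Tuple[float, int]:
        def ngrams(sent: List[str]) -> Counter:
            return Counter(tuple(sent[i:i + n])
                           for i in range(len(sent) - n + 1))

        matched, total = 0, 0
        for hyp, refs in zip(hypotheses, references_list):
            hyp_counts = ngrams(hyp)
            max_ref = Counter()
            for ref in refs:
                max_ref |= ngrams(ref)                       # element-wise max
            matched += sum((hyp_counts & max_ref).values())  # clipped matches
            total += sum(hyp_counts.values())
        return (matched / total if total > 0 else 0.0), total
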
static ngram_counts(sentence: List[str], n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter

Get n-grams from a sentence.

Parameters:
  • sentence – Sentence as a list of words
  • n – n-gram order
  • lowercase – Convert ngrams to lowercase
  • delimiter – Delimiter used to create counter entries.
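
Assuming the method behaves as documented (n-grams joined with the delimiter and counted), a usage sketch:

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    sentence = ["The", "cat", "sat", "on", "the", "mat"]
    bigrams = BLEUEvaluator.ngram_counts(sentence, n=2, lowercase=True)
    # Expected entries with lowercase=True and the default " " delimiter:
    # {"the cat": 1, "cat sat": 1, "sat on": 1, "on the": 1, "the mat": 1}
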
score_batch(hypotheses: List[List[str]], references: List[List[str]]) → float

Score a batch of hyp/ref pairs.

The default implementation of this method calls score_instance for each instance in the batch and returns the average score.

Parameters:
  • hypotheses – List of model predictions.
  • references – List of golden outputs.
Returns:

A float.
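
Putting it together, an end-to-end sketch (toy data; whitespace tokenization is an arbitrary choice):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    evaluator = BLEUEvaluator(n=4, name="BLEU")

    hypotheses = ["the cat sat on the mat".split(),
                  "there is a dog".split()]
    references = ["the cat sat on a mat".split(),
                  "there is a dog outside".split()]

    print(evaluator.score_batch(hypotheses, references))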