neuralmonkey.evaluators.bleu module

class neuralmonkey.evaluators.bleu.BLEUEvaluator(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None

Bases: neuralmonkey.evaluators.evaluator.Evaluator

__init__(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None

Instantiate BLEU evaluator.

Parameters:
  • n – Maximum order of n-grams considered.
  • deduplicate – Flag whether repeated tokens should be treated as one.
  • name – Name displayed in the logs and TensorBoard.
  • multiple_references_separator – Token that separates multiple reference sentences. If None, the reference is assumed to be a single sentence.
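
A minimal construction sketch (argument values are arbitrary examples, not recommendations):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    # Corpus-level BLEU up to 4-grams under a custom log name.
    bleu4 = BLEUEvaluator(n=4, name="BLEU-4")

    # Variant that collapses repeated tokens and splits references on a
    # separator token ("###" is just an illustrative choice).
    bleu_multi = BLEUEvaluator(deduplicate=True,
                               multiple_references_separator="###")
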
static bleu(hypotheses: List[List[str]], references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True)

Compute BLEU on a corpus with multiple references.

The n-grams are uniformly weighted.

By default, smoothing is applied as in the reference implementation at: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference for each hypothesis.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
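
A usage sketch of the static corpus-level computation (tokenized toy data; the hypotheses argument is the one documented above):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
    # One list of references per hypothesis; each reference is a list of words.
    references = [[["the", "cat", "sat", "on", "a", "mat"],
                   ["a", "cat", "was", "sitting", "on", "the", "mat"]]]

    score = BLEUEvaluator.bleu(hypotheses, references, ngrams=4)
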
static deduplicate_sentences(sentences: List[List[str]]) → List[List[str]]
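
The collapsing of repeated tokens referred to by the deduplicate flag can be pictured with this standalone sketch (an assumption about the behaviour, not the library's code; it collapses consecutive repeats only):

    from itertools import groupby
    from typing import List

    def collapse_repeats(sentence: List[str]) -> List[str]:
        # Keep one token from each run of identical consecutive tokens.
        return [token for token, _ in groupby(sentence)]

    print(collapse_repeats(["really", "really", "good", "good", "movie"]))
    # ['really', 'good', 'movie']
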
static effective_reference_length(hypotheses: List[List[str]], references_list: List[List[List[str]]]) → int

Compute the effective reference corpus length.

The effective reference corpus length is based on the best match length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
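
A standalone sketch of the best-match-length idea described above (not the library's implementation):

    from typing import List

    def effective_ref_length(hypotheses: List[List[str]],
                             references_list: List[List[List[str]]]) -> int:
        # For each hypothesis, take the length of the reference whose length
        # is closest to the hypothesis length, and sum over the corpus.
        total = 0
        for hyp, refs in zip(hypotheses, references_list):
            best = min(refs, key=lambda ref: abs(len(ref) - len(hyp)))
            total += len(best)
        return total
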
static merge_max_counters(counters: List[collections.Counter]) → collections.Counter

Merge counters using maximum values.
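
Taking element-wise maxima is what the union operator (|) of collections.Counter computes, so the behaviour can be illustrated as:

    from collections import Counter
    from functools import reduce

    counters = [Counter({"the": 2, "cat": 1}), Counter({"the": 1, "mat": 3})]
    merged = reduce(lambda a, b: a | b, counters, Counter())
    print(merged)  # Counter({'mat': 3, 'the': 2, 'cat': 1})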

static minimum_reference_length(hypotheses: List[List[str]], references_list: List[List[str]]) → int

Compute the minimum reference corpus length.

The minimum reference corpus length is based on the shortest reference sentence length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
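
A standalone sketch of summing the shortest reference length per sentence (not the library's code; references are nested as lists of token lists):

    from typing import List

    def min_ref_length(references_list: List[List[List[str]]]) -> int:
        # For each sentence, take the shortest available reference and sum
        # those lengths over the corpus.
        return sum(min(len(ref) for ref in refs) for refs in references_list)
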
static modified_ngram_precision(hypotheses: List[List[str]], references_list: List[List[List[str]]], n: int, case_sensitive: bool) → Tuple[float, int]

Compute the modified n-gram precision on a list of sentences.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • n – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation
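
The standard computation behind modified precision clips each hypothesis n-gram count by the maximum count seen in any reference, then divides the clipped total by the total number of hypothesis n-grams. A standalone sketch (case handling omitted; not the library's code):

    from collections import Counter
    from typing import List, Tuple

    def modified_precision(hypotheses: List[List[str]],
                           references_list: List[List[List[str]]],
                           n: int) -> Tuple[float, int]:
        def ngrams(sent: List[str]) -> Counter:
            return Counter(tuple(sent[i:i + n])
                           for i in range(len(sent) - n + 1))

        matched, total = 0, 0
        for hyp, refs in zip(hypotheses, references_list):
            hyp_counts = ngrams(hyp)
            max_ref = Counter()
            for ref in refs:
                max_ref |= ngrams(ref)                       # element-wise max
            matched += sum((hyp_counts & max_ref).values())  # clipped matches
            total += sum(hyp_counts.values())
        return (matched / total if total > 0 else 0.0), total
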
static ngram_counts(sentence: List[str], n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter

Get n-grams from a sentence.

Parameters:
  • sentence – Sentence as a list of words
  • n – n-gram order
  • lowercase – Convert ngrams to lowercase
  • delimiter – Delimiter used to create counter entries.
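
Assuming the method behaves as documented (n-grams joined with the delimiter and counted), a usage sketch:

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    sentence = ["The", "cat", "sat", "on", "the", "mat"]
    bigrams = BLEUEvaluator.ngram_counts(sentence, n=2, lowercase=True)
    # Expected entries with lowercase=True and the default " " delimiter:
    # {"the cat": 1, "cat sat": 1, "sat on": 1, "on the": 1, "the mat": 1}
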
score_batch(hypotheses: List[List[str]], references: List[List[str]]) → float

Score a batch of hyp/ref pairs.

The default implementation of this method calls score_instance for each instance in the batch and returns the average score.

Parameters:
  • hypotheses – List of model predictions.
  • references – List of golden outputs.
Returns:

A float.
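
Putting it together, an end-to-end sketch (toy data; whitespace tokenization is an arbitrary choice):

    from neuralmonkey.evaluators.bleu import BLEUEvaluator

    evaluator = BLEUEvaluator(n=4, name="BLEU")

    hypotheses = ["the cat sat on the mat".split(),
                  "there is a dog".split()]
    references = ["the cat sat on a mat".split(),
                  "there is a dog outside".split()]

    print(evaluator.score_batch(hypotheses, references))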