neuralmonkey.evaluators.bleu module
-
class
neuralmonkey.evaluators.bleu.
BLEUEvaluator
(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None
Bases: neuralmonkey.evaluators.evaluator.Evaluator
-
__init__
(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None
Instantiate BLEU evaluator.
Parameters: - n – Maximum n-gram order considered.
- deduplicate – Flag indicating whether repeated tokens should be treated as one.
- name – Name displayed in the logs and TensorBoard.
- multiple_references_separator – Token that separates multiple reference sentences. If None, the reference is assumed to be a single sentence.
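The role of multiple_references_separator can be illustrated with a minimal sketch. The helper name split_references and the "|||" token are hypothetical and chosen only for illustration; this is not the library's own code, just one plausible way a tokenized reference could be split at a separator token.

```python
from typing import List, Optional


def split_references(reference: List[str],
                     separator: Optional[str] = None) -> List[List[str]]:
    """Split one tokenized reference into several at each separator token.

    Hypothetical helper: it only mirrors the documented behaviour that a
    missing separator means "the reference is a single sentence".
    """
    if separator is None:
        # No separator configured: the whole token list is one reference.
        return [reference]

    references: List[List[str]] = [[]]
    for token in reference:
        if token == separator:
            references.append([])  # start collecting the next reference
        else:
            references[-1].append(token)
    return references


# Two alternative references packed into one token list.
refs = split_references(["a", "cat", "|||", "the", "cat"], separator="|||")
```

With the separator set, refs holds two token lists; with separator=None the input comes back wrapped as a single reference.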
-
static
bleu
(references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True)
Compute BLEU on a corpus with multiple references.
The n-grams are uniformly weighted.
The default is to use smoothing as in the reference implementation at: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873
Parameters: - hypotheses – List of hypotheses.
- references – List of references. There can be more than one reference per hypothesis.
- ngrams – Maximum order of n-grams. Default 4.
- case_sensitive – Perform case-sensitive computation. Default True.
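To make "uniformly weighted" concrete, the following sketch combines per-order n-gram precisions into a BLEU-style score using a geometric mean and the usual brevity penalty. It is a simplified illustration under stated assumptions: the function name combine_bleu is hypothetical, no smoothing is applied (unlike the default described above), and the result is left on a 0 to 1 scale.

```python
import math
from typing import List


def combine_bleu(precisions: List[float], hyp_len: int, ref_len: int) -> float:
    """Geometric mean of n-gram precisions (uniform weights) times the
    brevity penalty. Unsmoothed: any zero precision yields a score of 0.
    """
    if hyp_len == 0 or not precisions or min(precisions) <= 0.0:
        return 0.0

    # Uniform weights: every order contributes 1/N to the log-average.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)

    # Penalize hypotheses that are not longer than the reference length.
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)

    return brevity_penalty * math.exp(log_mean)


score = combine_bleu([0.5, 0.5], hyp_len=10, ref_len=10)
```

Here the lengths are equal, so the brevity penalty is 1 and the score is the geometric mean of the two precisions.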
-
static
deduplicate_sentences
() → List[List[str]]
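This method has no description here. Going by the deduplicate flag of the constructor ("repeated tokens should be treated as one"), one plausible reading is that consecutive repeated tokens are collapsed. The sketch below shows that reading; collapse_repeats is a hypothetical name, and whether the library collapses only adjacent repeats is an assumption.

```python
from typing import List


def collapse_repeats(sentences: List[List[str]]) -> List[List[str]]:
    """Collapse runs of identical adjacent tokens in each sentence.

    Assumption: only consecutive repeats are merged; a token that
    reappears later in the sentence is kept.
    """
    deduplicated = []
    for sentence in sentences:
        collapsed: List[str] = []
        for token in sentence:
            # Keep the token only if it differs from the previous one kept.
            if not collapsed or collapsed[-1] != token:
                collapsed.append(token)
        deduplicated.append(collapsed)
    return deduplicated


result = collapse_repeats([["the", "the", "cat", "cat", "the"]])
```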
-
static
effective_reference_length
(references_list: List[List[List[str]]]) → int
Compute the effective reference corpus length.
The effective reference corpus length is based on best match length.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of references (as lists of words)
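A "best match length" is commonly understood as the length of the reference closest in length to the hypothesis. The sketch below sums that length over the corpus. The function name effective_length is hypothetical, and breaking ties in favour of the shorter reference is an assumption, not something stated on this page.

```python
from typing import List


def effective_length(hypotheses: List[List[str]],
                     references_list: List[List[List[str]]]) -> int:
    """Sum, over all sentences, the reference length closest to the
    hypothesis length (ties go to the shorter reference; an assumption).
    """
    total = 0
    for hypothesis, references in zip(hypotheses, references_list):
        hyp_len = len(hypothesis)
        # Sort key: distance to the hypothesis length first, then length.
        total += min((len(ref) for ref in references),
                     key=lambda ref_len: (abs(ref_len - hyp_len), ref_len))
    return total


hyps = [["a", "b", "c"], ["a", "b", "c", "d"]]
refs = [[["x"] * 2, ["x"] * 5],   # closest to length 3 is 2
        [["x"] * 3, ["x"] * 5]]   # tie for length 4, shorter one wins
length = effective_length(hyps, refs)
```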
-
static
merge_max_counters
() → collections.Counter
Merge counters using maximum values.
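Merging by maximum is what the standard library's Counter union operator does, so a minimal sketch can build on it. The name merge_max is hypothetical; in BLEU with several references, such a merge is typically used to get the highest count of each n-gram in any single reference.

```python
import operator
from collections import Counter
from functools import reduce
from typing import Iterable


def merge_max(counters: Iterable[Counter]) -> Counter:
    """Merge counters, keeping the maximum count seen for each key.

    Counter's ``|`` operator is the multiset union, i.e. a per-key maximum.
    """
    return reduce(operator.or_, counters, Counter())


merged = merge_max([Counter(a=2, b=1), Counter(a=1, b=3, c=1)])
```

In this example each key ends up with the larger of its two counts, and c, present in only one counter, is carried over unchanged.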
-
static
minimum_reference_length
(references_list: List[List[str]]) → int
Compute the minimum reference corpus length.
The minimum reference corpus length is based on the shortest reference sentence length.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of references (as lists of words)
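As a minimal sketch of the description above: for every sentence take the length of its shortest reference and sum these over the corpus. The name min_reference_length is hypothetical, and the sketch assumes references_list is a list (per sentence) of lists of tokenized references, as the parameter description suggests.

```python
from typing import List


def min_reference_length(references_list: List[List[List[str]]]) -> int:
    """Sum the length of the shortest reference for each sentence."""
    return sum(min(len(ref) for ref in references)
               for references in references_list)


total = min_reference_length([
    [["a", "b"], ["a"]],        # shortest reference has 1 token
    [["a", "b", "c"]],          # single reference with 3 tokens
])
```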
-
static
modified_ngram_precision
(references_list: List[List[List[str]]], n: int, case_sensitive: bool) → Tuple[float, int]
Compute the modified n-gram precision on a list of sentences.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of reference sentences (as lists of words)
- n – n-gram order
- case_sensitive – Whether to perform case-sensitive computation
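Modified n-gram precision clips each hypothesis n-gram count at the highest count of that n-gram in any single reference. The sketch below illustrates the idea; the names clipped_precision and ngram_counter are hypothetical, the computation is always case-sensitive, and returning the total number of hypothesis n-grams as the second tuple element is an assumption about what the int in the signature stands for.

```python
from collections import Counter
from typing import List, Tuple


def ngram_counter(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of one sentence as tuples of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypotheses: List[List[str]],
                      references_list: List[List[List[str]]],
                      n: int) -> Tuple[float, int]:
    """Corpus-level modified n-gram precision and hypothesis n-gram total."""
    matched = 0
    total = 0
    for hypothesis, references in zip(hypotheses, references_list):
        hyp_counts = ngram_counter(hypothesis, n)

        # Highest count of each n-gram in any one reference.
        max_ref_counts: Counter = Counter()
        for reference in references:
            max_ref_counts |= ngram_counter(reference, n)

        # ``&`` takes the per-key minimum, which performs the clipping.
        matched += sum((hyp_counts & max_ref_counts).values())
        total += sum(hyp_counts.values())

    precision = matched / total if total else 0.0
    return precision, total


hyp = [["the"] * 7]
refs = [[["the", "cat", "is", "on", "the", "mat"],
         ["there", "is", "a", "cat", "on", "the", "mat"]]]
precision, count = clipped_precision(hyp, refs, n=1)
```

In this well-known example "the" occurs at most twice in any one reference, so only 2 of the 7 hypothesis unigrams count as matches.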
-
static
ngram_counts
(n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter
Get n-grams from a sentence.
Parameters: - sentence – Sentence as a list of words
- n – n-gram order
- lowercase – Convert n-grams to lowercase
- delimiter – Delimiter used to join tokens into counter entries
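A minimal sketch of n-gram counting along the lines of the parameters above: tokens are optionally lowercased, and each n-gram is stored as a single string joined with the delimiter. The function name count_ngrams is hypothetical and the details may differ from the library's implementation.

```python
from collections import Counter
from typing import List


def count_ngrams(sentence: List[str], n: int, lowercase: bool = False,
                 delimiter: str = " ") -> Counter:
    """Count n-grams of a tokenized sentence, keyed by delimiter-joined strings."""
    tokens = [token.lower() for token in sentence] if lowercase else sentence
    # A sentence shorter than n yields an empty range and an empty counter.
    return Counter(delimiter.join(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


bigrams = count_ngrams(["The", "cat", "the", "cat"], n=2, lowercase=True)
```

With lowercasing enabled, "The cat" and "the cat" fall into the same counter entry.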
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float
Score a batch of hypothesis/reference pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
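The default behaviour described above (score each instance, then average) can be sketched as follows. The function average_score and the exact_match scorer are hypothetical stand-ins written for illustration only; they do not reproduce the library's score_instance.

```python
from typing import Callable, List


def average_score(hypotheses: List[List[str]],
                  references: List[List[str]],
                  score_instance: Callable[[List[str], List[str]], float]
                  ) -> float:
    """Average a per-instance score over a batch of hypothesis/reference pairs."""
    if len(hypotheses) != len(references):
        raise ValueError("Batch sizes of hypotheses and references differ.")
    if not hypotheses:
        return 0.0
    scores = [score_instance(hyp, ref)
              for hyp, ref in zip(hypotheses, references)]
    return sum(scores) / len(scores)


def exact_match(hypothesis: List[str], reference: List[str]) -> float:
    """Toy per-instance scorer: 1.0 for identical token lists, else 0.0."""
    return 1.0 if hypothesis == reference else 0.0


batch_score = average_score([["a", "b"], ["c"]], [["a", "b"], ["d"]],
                            exact_match)
```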
-