neuralmonkey.evaluators package

Submodules

neuralmonkey.evaluators.accuracy module

class neuralmonkey.evaluators.accuracy.AccuracyEvaluator(name: str = 'Accuracy') → None

Bases: object

static compare_scores(score1: float, score2: float) → int
class neuralmonkey.evaluators.accuracy.AccuracySeqLevelEvaluator(name: str = 'AccuracySeqLevel') → None

Bases: object

static compare_scores(score1: float, score2: float) → int
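Judging by their names, the two evaluators above score token-level and whole-sequence accuracy respectively. The following is a minimal stdlib sketch of those two computations, plus the sign-comparison convention that `compare_scores` typically encodes (higher accuracy is better); the function bodies are illustrative, not the library's code.

```python
from typing import List


def token_accuracy(decoded: List[List[str]], references: List[List[str]]) -> float:
    """Fraction of token positions where the decoded token matches the reference."""
    matches = total = 0
    for dec, ref in zip(decoded, references):
        matches += sum(d == r for d, r in zip(dec, ref))
        total += max(len(dec), len(ref))  # length mismatches count as errors
    return matches / total if total else 0.0


def seq_accuracy(decoded: List[List[str]], references: List[List[str]]) -> float:
    """Fraction of sequences that match their reference exactly."""
    if not decoded:
        return 0.0
    return sum(d == r for d, r in zip(decoded, references)) / len(decoded)


def compare_scores(score1: float, score2: float) -> int:
    """Sign convention (assumed): positive if score1 is better, since higher is better."""
    return (score1 > score2) - (score1 < score2)
```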

neuralmonkey.evaluators.average module

class neuralmonkey.evaluators.average.AverageEvaluator(name: str) → None

Bases: object

Just average the numeric output of a runner.

neuralmonkey.evaluators.beer module

class neuralmonkey.evaluators.beer.BeerWrapper(wrapper: str, name: str = 'BEER', encoding: str = 'utf-8') → None

Bases: object

Wrapper for BEER scorer.

Paper: http://aclweb.org/anthology/D14-1025 Code: https://github.com/stanojevic/beer

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes

neuralmonkey.evaluators.bleu module

class neuralmonkey.evaluators.bleu.BLEUEvaluator(n: int = 4, deduplicate: bool = False, name: typing.Union[str, NoneType] = None) → None

Bases: object

static bleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True)

Compute BLEU on a corpus with multiple references.

The n-grams are uniformly weighted.

By default, smoothing is applied as in the reference implementation at: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
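As a compact illustration of what the `bleu` method computes, here is a self-contained sketch of corpus-level BLEU with uniform n-gram weights, per-sentence count clipping against multiple references, and the standard brevity penalty. Note that it omits the mteval-style smoothing the library applies by default, so it is a simplified re-implementation, not the library's code.

```python
import math
from collections import Counter
from typing import List


def _ngram_counts(sentence: List[str], n: int) -> Counter:
    return Counter(tuple(sentence[i:i + n]) for i in range(len(sentence) - n + 1))


def bleu(hypotheses: List[List[str]],
         references: List[List[List[str]]],
         ngrams: int = 4) -> float:
    """Corpus-level BLEU, uniform weights, no smoothing."""
    log_precision_sum = 0.0
    for n in range(1, ngrams + 1):
        matched = total = 0
        for hyp, refs in zip(hypotheses, references):
            hyp_counts = _ngram_counts(hyp, n)
            max_ref = Counter()
            for ref in refs:
                max_ref |= _ngram_counts(ref, n)  # Counter union keeps per-key maxima
            # Clip hypothesis counts by the best count over all references.
            matched += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total += sum(hyp_counts.values())
        if matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total)
    hyp_len = sum(len(h) for h in hypotheses)
    # Effective reference length: per sentence, the reference length closest
    # to the hypothesis length (shorter wins ties).
    ref_len = sum(min((len(r) for r in refs), key=lambda l: (abs(l - len(hyp)), l))
                  for hyp, refs in zip(hypotheses, references))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_precision_sum / ngrams)
```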
static compare_scores(score1: float, score2: float) → int
static deduplicate_sentences(sentences: typing.List[typing.List[str]]) → typing.List[typing.List[str]]
static effective_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]]) → int

Compute the effective reference corpus length.

The effective reference corpus length is based on best match length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
static merge_max_counters(counters: typing.List[collections.Counter]) → collections.Counter

Merge counters using maximum values.
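Merging by maximum is the clipping step used when scoring against multiple references. A one-line sketch, relying on the fact that `Counter` union (`|`) keeps the per-key maximum:

```python
from collections import Counter
from functools import reduce
from typing import List


def merge_max_counters(counters: List[Counter]) -> Counter:
    """Merge counters, keeping the maximum count for each key."""
    return reduce(lambda a, b: a | b, counters, Counter())
```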

static minimum_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[str]]) → int

Compute the minimum reference corpus length.

The minimum reference corpus length is based on the shortest reference sentence length.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
static modified_ngram_precision(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], n: int, case_sensitive: bool) → typing.Tuple[float, int]

Compute the modified n-gram precision on a list of sentences.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • n – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation
static ngram_counts(sentence: typing.List[str], n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter

Get n-grams from a sentence.

Parameters:
  • sentence – Sentence as a list of words
  • n – n-gram order
  • lowercase – Convert n-grams to lowercase
  • delimiter – Delimiter used to join words into counter entries
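Following the documented signature, a plausible sketch of `ngram_counts` (the body is illustrative, not the library's code):

```python
from collections import Counter
from typing import List


def ngram_counts(sentence: List[str], n: int,
                 lowercase: bool, delimiter: str = " ") -> Counter:
    """Count n-grams in a tokenized sentence, keyed by delimiter-joined strings."""
    words = [w.lower() for w in sentence] if lowercase else sentence
    return Counter(delimiter.join(words[i:i + n])
                   for i in range(len(words) - n + 1))
```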

neuralmonkey.evaluators.bleu_ref module

class neuralmonkey.evaluators.bleu_ref.BLEUReferenceImplWrapper(wrapper, name='BLEU', encoding='utf-8')

Bases: object

Wrapper for TectoMT’s wrapper for reference NIST and BLEU scorer.

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes

neuralmonkey.evaluators.chrf module

class neuralmonkey.evaluators.chrf.ChrFEvaluator(n: int = 6, beta: float = 1, ignored_symbols: typing.Union[typing.List[str], NoneType] = None, name: typing.Union[str, NoneType] = None) → None

Bases: object

Compute ChrF score.

See http://www.statmt.org/wmt15/pdf/WMT49.pdf
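chrF is an F-score over character n-grams, averaged over orders 1..n, with beta weighting recall against precision. The sketch below shows that core computation on a single sentence pair; how the library handles `ignored_symbols`, whitespace, and corpus-level aggregation is not shown here.

```python
from collections import Counter


def _char_ngrams(text: str, n: int) -> Counter:
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis: str, reference: str, n: int = 6, beta: float = 1.0) -> float:
    """Average character n-gram F-beta score over orders 1..n."""
    scores = []
    for order in range(1, n + 1):
        hyp, ref = _char_ngrams(hypothesis, order), _char_ngrams(reference, order)
        if not hyp or not ref:
            continue  # sentence shorter than this n-gram order
        overlap = sum((hyp & ref).values())  # clipped matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        b2 = beta * beta
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0
```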

neuralmonkey.evaluators.edit_distance module

class neuralmonkey.evaluators.edit_distance.EditDistanceEvaluator(name: str = 'Edit distance') → None

Bases: object

static compare_scores(score1: float, score2: float) → int
static ratio(str1: str, str2: str) → float
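A stdlib sketch of the two methods: Levenshtein distance via dynamic programming, and a similarity ratio in [0, 1] derived from it. The exact normalization the library uses is an assumption; here it is distance over the longer string.

```python
def edit_distance(str1: str, str2: str) -> int:
    """Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, 1):
        curr = [i]
        for j, c2 in enumerate(str2, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (c1 != c2)))      # substitution
        prev = curr
    return prev[-1]


def ratio(str1: str, str2: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings."""
    if not str1 and not str2:
        return 1.0
    return 1.0 - edit_distance(str1, str2) / max(len(str1), len(str2))
```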

neuralmonkey.evaluators.f1_bio module

class neuralmonkey.evaluators.f1_bio.F1Evaluator(name: str = 'F1 measure') → None

Bases: object

F1 evaluator for BIO tagging, e.g. NP chunking.

Entities are annotated with B (beginning of an entity) and I (continuation of an entity); all remaining tokens are labeled O (outside any entity).

static chunk2set(seq: typing.List[str]) → typing.Set[str]
static f1_score(decoded: typing.List[str], reference: typing.List[str]) → float
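The following sketch shows the usual shape of span-level F1 for BIO tagging: decode each tag sequence into a set of entity spans, then score exact span matches. Here spans are (start, end) index pairs rather than whatever string encoding `chunk2set` actually uses; the logic is illustrative.

```python
from typing import List, Set, Tuple


def bio_chunks(seq: List[str]) -> Set[Tuple[int, int]]:
    """Extract entity spans (start, end) from a BIO tag sequence."""
    chunks, start = set(), None
    for i, tag in enumerate(seq):
        if tag == "B":                  # a new entity begins (closing any open one)
            if start is not None:
                chunks.add((start, i))
            start = i
        elif tag == "O":                # outside: close any open entity
            if start is not None:
                chunks.add((start, i))
            start = None
        # tag == "I" continues the current entity, if any
    if start is not None:
        chunks.add((start, len(seq)))
    return chunks


def f1_score(decoded: List[str], reference: List[str]) -> float:
    """F1 over exact-match entity spans."""
    dec, ref = bio_chunks(decoded), bio_chunks(reference)
    if not dec or not ref:
        return 0.0
    tp = len(dec & ref)
    precision, recall = tp / len(dec), tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```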

neuralmonkey.evaluators.gleu module

class neuralmonkey.evaluators.gleu.GLEUEvaluator(n: int = 4, deduplicate: bool = False, name: typing.Union[str, NoneType] = None) → None

Bases: object

Sentence-level evaluation metric that correlates with BLEU at the corpus level.

From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf)

GLEU is the minimum of recall and precision of all n-grams up to n in references and hypotheses.

N-gram counts are computed with the same methods as in the BLEU evaluator.

static gleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float

Compute GLEU on a corpus with multiple references (no smoothing).

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
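A compact sketch of the GLEU idea stated above: collect all n-grams up to order n for hypotheses and references, count clipped matches, and take the minimum of precision and recall. How the original GNMT formulation aggregates multiple references differs in detail; here reference counts are merged by per-key maximum, so treat this as an approximation.

```python
from collections import Counter
from typing import List


def _all_ngrams(sentence: List[str], max_n: int) -> Counter:
    """All n-grams of orders 1..max_n, pooled into one counter."""
    return Counter(tuple(sentence[i:i + n])
                   for n in range(1, max_n + 1)
                   for i in range(len(sentence) - n + 1))


def gleu(hypotheses: List[List[str]],
         references: List[List[List[str]]],
         ngrams: int = 4) -> float:
    """GLEU: min of pooled n-gram precision and recall (no smoothing)."""
    matched = hyp_total = ref_total = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_counts = _all_ngrams(hyp, ngrams)
        best = Counter()
        for ref in refs:
            best |= _all_ngrams(ref, ngrams)   # per-key maxima over references
        matched += sum((hyp_counts & best).values())  # clipped matches
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(best.values())
    if hyp_total == 0 or ref_total == 0:
        return 0.0
    return min(matched / hyp_total, matched / ref_total)
```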
static total_precision_recall(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], ngrams: int, case_sensitive: bool) → typing.Tuple[float, float]

Compute a modified n-gram precision and recall on a sentence list.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • ngrams – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation

neuralmonkey.evaluators.mse module

class neuralmonkey.evaluators.mse.MeanSquaredErrorEvaluator(name: str = 'MeanSquaredError') → None

Bases: object

static compare_scores(score1: float, score2: float) → int

neuralmonkey.evaluators.multeval module

class neuralmonkey.evaluators.multeval.MultEvalWrapper(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None

Bases: object

Wrapper for mult-eval’s reference BLEU and METEOR scorer.

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes

neuralmonkey.evaluators.ter module

class neuralmonkey.evaluators.ter.TEREvaluator(name: str = 'TER') → None

Bases: object

Compute TER using the pyter library.

neuralmonkey.evaluators.wer module

class neuralmonkey.evaluators.wer.WEREvaluator(name: str = 'WER') → None

Bases: object

Compute WER (word error rate, used in speech recognition).
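WER is the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal single-pair sketch (the library's corpus-level aggregation is not shown):

```python
from typing import List


def wer(hypothesis: List[str], reference: List[str]) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    prev = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, 1):
        curr = [i]
        for j, ref_word in enumerate(reference, 1):
            curr.append(min(prev[j] + 1,                            # deletion
                            curr[j - 1] + 1,                        # insertion
                            prev[j - 1] + (hyp_word != ref_word)))  # substitution
        prev = curr
    if not reference:
        return float(bool(hypothesis))
    return prev[-1] / len(reference)
```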

Module contents