neuralmonkey.evaluators package

Submodules

neuralmonkey.evaluators.accuracy module

class neuralmonkey.evaluators.accuracy.AccuracyEvaluator(name='Accuracy')

Bases: object

static compare_scores(score1: float, score2: float) → int
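The comparison convention is not documented here. A plausible stdlib sketch, assuming higher accuracy is better (the return convention is illustrative, not taken from neuralmonkey's source):

```python
def compare_scores(score1: float, score2: float) -> int:
    """Three-way comparison: 1 if score1 is better, -1 if worse, 0 if equal.

    Assumes higher accuracy is better.
    """
    return (score1 > score2) - (score1 < score2)
```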

neuralmonkey.evaluators.beer module

class neuralmonkey.evaluators.beer.BeerWrapper(wrapper: str, name: str = 'BEER', encoding: str = 'utf-8') → None

Bases: object

Wrapper for BEER scorer.

Paper: http://aclweb.org/anthology/D14-1025 Code: https://github.com/stanojevic/beer

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes
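A minimal sketch of what such a serialization step typically does, assuming tokens are joined by spaces and sentences by newlines (this layout is an assumption for illustration, not BEER's documented input format):

```python
def serialize_to_bytes(sentences, encoding="utf-8"):
    """Join tokens with spaces and sentences with newlines, then encode."""
    return "\n".join(" ".join(sent) for sent in sentences).encode(encoding)
```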

neuralmonkey.evaluators.bleu module

class neuralmonkey.evaluators.bleu.BLEUEvaluator(n: int = 4, deduplicate: bool = False, name: typing.Union[str, NoneType] = None) → None

Bases: object

static bleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True)

Computes BLEU on a corpus with multiple references using uniform weights. By default, smoothing is applied as in the reference implementation at: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
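As an illustration of the computation, here is a self-contained, unsmoothed re-implementation of corpus-level BLEU with uniform weights (the function names and the brevity-penalty tie-breaking are illustrative sketches, not neuralmonkey's actual code, which additionally applies the mteval-v13a smoothing):

```python
import math
from collections import Counter

def ngram_counter(sentence, n):
    """All n-grams of order n as space-joined strings."""
    return Counter(" ".join(sentence[i:i + n])
                   for i in range(len(sentence) - n + 1))

def corpus_bleu(hypotheses, references, max_order=4):
    """Unsmoothed corpus-level BLEU with uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_order + 1):
        matched = total = 0
        for hyp, refs in zip(hypotheses, references):
            hyp_counts = ngram_counter(hyp, n)
            # Clip each hypothesis n-gram by its maximum count over references.
            max_ref = Counter()
            for ref in refs:
                for gram, cnt in ngram_counter(ref, n).items():
                    max_ref[gram] = max(max_ref[gram], cnt)
            matched += sum(min(cnt, max_ref[gram])
                           for gram, cnt in hyp_counts.items())
            total += sum(hyp_counts.values())
        if matched == 0:  # no smoothing: a zero precision zeroes the score
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty against the closest-length ("best match") reference.
    hyp_len = sum(len(h) for h in hypotheses)
    ref_len = sum(min((abs(len(r) - len(h)), len(r)) for r in refs)[1]
                  for h, refs in zip(hypotheses, references))
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_order)
```

A perfect match yields 1.0; without smoothing, any missing n-gram order drops the score to 0.0, which is why the real evaluator smooths.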
static compare_scores(score1: float, score2: float) → int
static deduplicate_sentences(sentences: typing.List[typing.List[str]]) → typing.List[typing.List[str]]
static effective_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]]) → int

Computes the effective reference corpus length (based on the best match length).

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
static merge_max_counters(counters: typing.List[collections.Counter]) → collections.Counter

Merge counters using maximum values.
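The merge keeps, for every key, the largest count seen in any input counter (this is the clipping step of modified n-gram precision). A minimal stdlib sketch:

```python
from collections import Counter

def merge_max_counters(counters):
    """Merge counters, keeping the maximum count for each key."""
    merged = Counter()
    for counter in counters:
        for key, count in counter.items():
            merged[key] = max(merged[key], count)
    return merged
```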

static minimum_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[str]]) → int

Computes the effective reference corpus length (based on the shortest reference sentence length).

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of references (as lists of words)
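Per the docstring, this length depends only on the references: for each sentence, the shortest reference is counted. A sketch under that reading (the hypotheses parameter is kept for signature parity but unused here):

```python
def minimum_reference_length(hypotheses, references_list):
    """Sum of the shortest reference sentence length for each position.

    `hypotheses` is unused in this sketch; it mirrors the documented signature.
    """
    return sum(min(len(ref) for ref in refs) for refs in references_list)
```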
static modified_ngram_precision(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], n: int, case_sensitive: bool) → typing.Tuple[float, int]

Computes the modified n-gram precision on a list of sentences.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • n – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation
static ngram_counts(sentence: typing.List[str], n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter

Get n-grams from a sentence.

Parameters:
  • sentence – Sentence as a list of words
  • n – n-gram order
  • lowercase – Convert ngrams to lowercase
  • delimiter – delimiter to use to create counter entries
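A minimal sketch matching the documented signature: slide a window of size n over the token list and join each window with the delimiter to form counter keys (an illustrative re-implementation, not neuralmonkey's source):

```python
from collections import Counter

def ngram_counts(sentence, n, lowercase, delimiter=" "):
    """Counter of order-n n-grams, each joined with `delimiter`."""
    if lowercase:
        sentence = [word.lower() for word in sentence]
    return Counter(delimiter.join(sentence[i:i + n])
                   for i in range(len(sentence) - n + 1))
```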

neuralmonkey.evaluators.bleu_ref module

class neuralmonkey.evaluators.bleu_ref.BLEUReferenceImplWrapper(wrapper, name='BLEU', encoding='utf-8')

Bases: object

Wrapper for TectoMT's wrapper for the reference NIST and BLEU scorer.

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes

neuralmonkey.evaluators.edit_distance module

class neuralmonkey.evaluators.edit_distance.EditDistanceEvaluator(name: str = 'Edit distance') → None

Bases: object

static compare_scores(score1: float, score2: float) → int
static ratio(str1: str, str2: str) → float
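One possible stdlib implementation of such a similarity ratio uses difflib; note the actual evaluator may instead normalize Levenshtein edit distance, so treat this as an illustrative sketch:

```python
from difflib import SequenceMatcher

def ratio(str1: str, str2: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, str1, str2).ratio()
```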

neuralmonkey.evaluators.f1_bio module

class neuralmonkey.evaluators.f1_bio.F1Evaluator(name='F1 measure')

Bases: object

F1 evaluator for BIO tagging, e.g. NP chunking.

Entities are annotated as the beginning of an entity (B) or a continuation of the entity (I); everything else is outside any entity (O).

static chunk2set(seq: typing.List[str]) → typing.Set[str]
static f1_score(decoded: typing.List[str], reference: typing.List[str]) → float
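Chunk-level F1 compares the sets of entity spans extracted from the BIO tags. A sketch of that scheme, with spans encoded as "start-end" strings (the exact span encoding is an illustrative assumption):

```python
def chunk2set(seq):
    """Extract entity spans from a BIO tag sequence as 'start-end' strings."""
    chunks, start = set(), None
    for i, tag in enumerate(seq):
        if tag.startswith("B"):
            if start is not None:        # close the previous entity
                chunks.add(f"{start}-{i}")
            start = i
        elif tag.startswith("O"):
            if start is not None:
                chunks.add(f"{start}-{i}")
            start = None
    if start is not None:                # entity running to the end
        chunks.add(f"{start}-{len(seq)}")
    return chunks

def f1_score(decoded, reference):
    """Chunk-level F1 between decoded and reference BIO sequences."""
    dec, ref = chunk2set(decoded), chunk2set(reference)
    true_pos = len(dec & ref)
    if true_pos == 0:
        return 0.0
    precision, recall = true_pos / len(dec), true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)
```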

neuralmonkey.evaluators.gleu module

class neuralmonkey.evaluators.gleu.GLEUEvaluator(n=4, deduplicate=False, name=None)

Bases: object

Sentence-level evaluation metric that correlates with BLEU on corpus-level. From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf)

GLEU is the minimum of recall and precision of all n-grams up to n in references and hypotheses.

N-gram counts are computed using the BLEU evaluator's methods.

static gleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float

Computes GLEU on a corpus with multiple references. No smoothing.

Parameters:
  • hypotheses – List of hypotheses
  • references – List of references. There can be more than one reference.
  • ngrams – Maximum order of n-grams. Default 4.
  • case_sensitive – Perform case-sensitive computation. Default True.
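The definition above — the minimum of n-gram precision and recall — can be sketched at the sentence level as follows (a simplified single-reference illustration, not neuralmonkey's corpus-level code):

```python
from collections import Counter

def gleu(hypothesis, reference, max_order=4):
    """Sentence-level GLEU: min of n-gram precision and recall."""
    def all_ngrams(tokens):
        counts = Counter()
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
        return counts

    hyp, ref = all_ngrams(hypothesis), all_ngrams(reference)
    if not hyp or not ref:
        return 0.0
    overlap = sum((hyp & ref).values())  # clipped matching n-grams
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return min(precision, recall)
```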
static total_precision_recall(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], ngrams: int, case_sensitive: bool) → typing.Tuple[float, float]

Computes the modified n-gram precision and recall on a list of sentences.

Parameters:
  • hypotheses – List of output sentences as lists of words
  • references_list – List of lists of reference sentences (as lists of words)
  • ngrams – n-gram order
  • case_sensitive – Whether to perform case-sensitive computation

neuralmonkey.evaluators.multeval module

class neuralmonkey.evaluators.multeval.MultEvalWrapper(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None

Bases: object

Wrapper for mult-eval’s reference BLEU and METEOR scorer.

serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes

neuralmonkey.evaluators.ter module

class neuralmonkey.evaluators.ter.TEREvalutator(name='TER')

Bases: object

Compute TER using the pyter library.

Module contents