neuralmonkey.evaluators package
Submodules
neuralmonkey.evaluators.accuracy module
neuralmonkey.evaluators.beer module
- class neuralmonkey.evaluators.beer.BeerWrapper(wrapper: str, name: str = 'BEER', encoding: str = 'utf-8') → None
  Bases: object
  Wrapper for BEER scorer.
  Paper: http://aclweb.org/anthology/D14-1025
  Code: https://github.com/stanojevic/beer
  - serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes
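  A minimal usage sketch based on the signatures above; the path to the BEER script is a placeholder and the example data is illustrative, not taken from the library:

    from neuralmonkey.evaluators.beer import BeerWrapper

    # Placeholder path to a local BEER installation's run script.
    beer = BeerWrapper(wrapper="/path/to/beer_2.0/beer")

    # Evaluators work with tokenized sentences, i.e. lists of words.
    hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]

    # serialize_to_bytes prepares the token lists as the byte stream handed
    # to the external scorer, using the wrapper's `encoding`.
    payload = beer.serialize_to_bytes(hypotheses)
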
neuralmonkey.evaluators.bleu module
- class neuralmonkey.evaluators.bleu.BLEUEvaluator(n: int = 4, deduplicate: bool = False, name: typing.Union[str, NoneType] = None) → None
  Bases: object
  - static bleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True)
    Computes BLEU on a corpus with multiple references using uniform weights. By default, smoothing is applied as in the reference implementation at https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873
    Parameters:
    - hypotheses – List of hypotheses.
    - references – List of references. There can be more than one reference per hypothesis.
    - ngrams – Maximum order of n-grams. Default 4.
    - case_sensitive – Perform case-sensitive computation. Default True.
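    A short usage sketch based on the signature above (the sentences are illustrative, not taken from the library):

      from neuralmonkey.evaluators.bleu import BLEUEvaluator

      hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
      # One list of references per hypothesis; each reference is a list of words.
      references = [[["the", "cat", "is", "on", "the", "mat"],
                     ["there", "is", "a", "cat", "on", "the", "mat"]]]

      score = BLEUEvaluator.bleu(hypotheses, references, ngrams=4, case_sensitive=True)
      print(score)
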
  - static compare_scores(score1: float, score2: float) → int
  - static deduplicate_sentences(sentences: typing.List[typing.List[str]]) → typing.List[typing.List[str]]
  - static effective_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]]) → int
    Computes the effective reference corpus length (based on best match length).
    Parameters:
    - hypotheses – List of output sentences as lists of words.
    - references_list – List of lists of references (as lists of words).
  - static merge_max_counters(counters: typing.List[collections.Counter]) → collections.Counter
    Merge counters using maximum values.
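    An illustrative standalone sketch of the max-merge idea using only the standard library (not the library's own implementation):

      from collections import Counter

      def merge_max(counters):
          """Merge Counters so that each key keeps its maximum count."""
          merged = Counter()
          for counter in counters:
              for key, count in counter.items():
                  merged[key] = max(merged[key], count)
          return merged

      # Used in BLEU-style scoring to combine n-gram counts from multiple references.
      print(merge_max([Counter({"the cat": 2}), Counter({"the cat": 1, "a dog": 1})]))
      # Counter({'the cat': 2, 'a dog': 1})
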
  - static minimum_reference_length(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[str]]) → int
    Computes the minimum reference corpus length (based on the shortest reference sentence length).
    Parameters:
    - hypotheses – List of output sentences as lists of words.
    - references_list – List of lists of references (as lists of words).
  - static modified_ngram_precision(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], n: int, case_sensitive: bool) → typing.Tuple[float, int]
    Computes the modified n-gram precision on a list of sentences.
    Parameters:
    - hypotheses – List of output sentences as lists of words.
    - references_list – List of lists of reference sentences (as lists of words).
    - n – n-gram order.
    - case_sensitive – Whether to perform case-sensitive computation.
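    A standalone sketch of what "modified" precision means (hypothesis n-gram counts clipped by their maximum count in any reference); this is illustrative, not the library's own code:

      from collections import Counter

      def modified_precision(hyp_ngrams: Counter, ref_ngrams_list):
          """Return clipped n-gram precision for one sentence."""
          max_ref = Counter()
          for ref in ref_ngrams_list:
              for ngram, count in ref.items():
                  max_ref[ngram] = max(max_ref[ngram], count)
          # Clip each hypothesis count by the best reference count.
          clipped = sum(min(count, max_ref[ngram]) for ngram, count in hyp_ngrams.items())
          total = sum(hyp_ngrams.values())
          return clipped / total if total else 0.0
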
  - static ngram_counts(sentence: typing.List[str], n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter
    Get n-grams from a sentence.
    Parameters:
    - sentence – Sentence as a list of words.
    - n – n-gram order.
    - lowercase – Convert n-grams to lowercase.
    - delimiter – Delimiter used to join words into counter entries.
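    An illustrative sketch of the documented behaviour (each n-gram's words joined by the delimiter to form counter keys); not necessarily the library's exact implementation:

      from collections import Counter

      def ngram_counts(sentence, n, lowercase=False, delimiter=" "):
          """Count the n-grams of a tokenized sentence, joined by `delimiter`."""
          words = [w.lower() for w in sentence] if lowercase else sentence
          grams = (delimiter.join(words[i:i + n]) for i in range(len(words) - n + 1))
          return Counter(grams)

      print(ngram_counts(["the", "cat", "sat", "on", "the", "mat"], 2))
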
neuralmonkey.evaluators.bleu_ref module
neuralmonkey.evaluators.edit_distance module
neuralmonkey.evaluators.f1_bio module
- class neuralmonkey.evaluators.f1_bio.F1Evaluator(name='F1 measure')
  Bases: object
  F1 evaluator for BIO tagging, e.g. NP chunking.
  The entities are annotated as beginning of the entity (B) or continuation of the entity (I); the rest is outside the entity (O).
  - static chunk2set(seq: typing.List[str]) → typing.Set[str]
  - static f1_score(decoded: typing.List[str], reference: typing.List[str]) → float
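  A short usage sketch based on the signatures above (the BIO tag sequences are illustrative):

    from neuralmonkey.evaluators.f1_bio import F1Evaluator

    # BIO tag sequences for one sentence: decoded output vs. the reference.
    decoded = ["B", "I", "O", "B", "O"]
    reference = ["B", "I", "O", "O", "B"]

    print(F1Evaluator.f1_score(decoded, reference))
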
neuralmonkey.evaluators.gleu module
- class neuralmonkey.evaluators.gleu.GLEUEvaluator(n=4, deduplicate=False, name=None)
  Bases: object
  Sentence-level evaluation metric that correlates with BLEU on the corpus level. From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf).
  GLEU is the minimum of recall and precision of all n-grams up to n in references and hypotheses.
  N-gram counts are based on the BLEU methods.
  - static gleu(hypotheses: typing.List[typing.List[str]], references: typing.List[typing.List[typing.List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float
    Computes GLEU on a corpus with multiple references. No smoothing.
    Parameters:
    - hypotheses – List of hypotheses.
    - references – List of references. There can be more than one reference per hypothesis.
    - ngrams – Maximum order of n-grams. Default 4.
    - case_sensitive – Perform case-sensitive computation. Default True.
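    A short usage sketch based on the signature above (illustrative data):

      from neuralmonkey.evaluators.gleu import GLEUEvaluator

      hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
      references = [[["the", "cat", "is", "on", "the", "mat"]]]

      print(GLEUEvaluator.gleu(hypotheses, references, ngrams=4))
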
  - static total_precision_recall(hypotheses: typing.List[typing.List[str]], references_list: typing.List[typing.List[typing.List[str]]], ngrams: int, case_sensitive: bool) → typing.Tuple[float, float]
    Computes the modified n-gram precision and recall on a list of sentences.
    Parameters:
    - hypotheses – List of output sentences as lists of words.
    - references_list – List of lists of reference sentences (as lists of words).
    - ngrams – n-gram order.
    - case_sensitive – Whether to perform case-sensitive computation.
neuralmonkey.evaluators.multeval module
- class neuralmonkey.evaluators.multeval.MultEvalWrapper(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None
  Bases: object
  Wrapper for mult-eval’s reference BLEU and METEOR scorer.
  - serialize_to_bytes(sentences: typing.List[typing.List[str]]) → bytes
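  A minimal instantiation sketch based on the signature above; the path is a placeholder and the 'meteor' metric value is an assumption drawn from the class description:

    from neuralmonkey.evaluators.multeval import MultEvalWrapper

    # Placeholder path to a local multeval checkout's run script.
    meteor = MultEvalWrapper(wrapper="/path/to/multeval.sh",
                             metric="meteor",  # assumed value; default is 'bleu'
                             language="en")
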
neuralmonkey.evaluators.ter module
- class neuralmonkey.evaluators.ter.TEREvalutator(name='TER')
  Bases: object
  Compute TER using the pyter library.
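  For reference, a sentence-level sketch of the underlying pyter call on tokenized input (assuming pyter exposes a ter(hypothesis, reference) function; the sentences are illustrative):

    import pyter

    hypothesis = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()

    # Translation Edit Rate: word-level edits (including shifts) divided by reference length.
    print(pyter.ter(hypothesis, reference))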