neuralmonkey.evaluators.evaluator module

class neuralmonkey.evaluators.evaluator.Evaluator(name: str = None) → None

Bases: typing.Generic

Base class for evaluators in Neural Monkey.

Each evaluator has a __call__ method that returns a score for a batch of model predictions given the references. This class provides default implementations of the score_batch and score_instance methods.
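The following minimal sketch shows how the default methods described in this section fit together. SketchEvaluator is a hypothetical stand-in written in plain Python, not Neural Monkey's actual code. In particular, having __call__ delegate directly to score_batch is an assumption.

```python
from typing import Generic, List, Optional, TypeVar

EvalType = TypeVar("EvalType")


class SketchEvaluator(Generic[EvalType]):
    """Hypothetical stand-in for Evaluator, mirroring the documented defaults."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name

    def __call__(self, hypotheses: List[EvalType],
                 references: List[EvalType]) -> float:
        # Assumption: __call__ simply delegates to score_batch.
        return self.score_batch(hypotheses, references)

    def score_batch(self, hypotheses: List[EvalType],
                    references: List[EvalType]) -> float:
        # Default: average of the per-instance scores.
        scores = [self.score_instance(hyp, ref)
                  for hyp, ref in zip(hypotheses, references)]
        return sum(scores) / len(scores)

    def score_instance(self, hypothesis: EvalType,
                       reference: EvalType) -> float:
        # Default: 1.0 for an exact match, 0.0 otherwise.
        return 1.0 if hypothesis == reference else 0.0


accuracy = SketchEvaluator(name="accuracy")
print(accuracy(["cat", "dog", "cow"], ["cat", "dog", "pig"]))  # 0.6666666666666666
```

With only the defaults, calling the evaluator therefore computes plain accuracy over the batch.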

__init__(name: str = None) → None

Initialize self. See help(type(self)) for accurate signature.

static compare_scores(score1: float, score2: float) → int

Compare scores using this evaluator.

The default implementation regards the bigger score as better.

Parameters:
    score1 – The first score.
    score2 – The second score.

Returns: An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
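The following sketch reproduces the documented comparison contract as plain functions. compare_error_rates and best_score are hypothetical helpers. They illustrate how a metric where lower is better could invert the default and how a comparator of this shape can pick the best score with functools.cmp_to_key.

```python
from functools import cmp_to_key
from typing import Callable, List


def compare_scores(score1: float, score2: float) -> int:
    """Default behaviour described above: the bigger score is better."""
    # (a > b) - (a < b) yields 1, -1 or 0 as an int.
    return (score1 > score2) - (score1 < score2)


def compare_error_rates(score1: float, score2: float) -> int:
    """Hypothetical override for a metric where a lower score is better."""
    return compare_scores(score2, score1)


def best_score(scores: List[float],
               compare: Callable[[float, float], int] = compare_scores) -> float:
    # max() with a comparator-derived key returns the "best" score.
    return max(scores, key=cmp_to_key(compare))


print(best_score([0.31, 0.42, 0.38]))                       # 0.42
print(best_score([0.31, 0.42, 0.38], compare_error_rates))  # 0.31
```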

name

score_batch(hypotheses: List[EvalType], references: List[EvalType]) → float

Score a batch of hyp/ref pairs.

The default implementation of this method calls score_instance for each instance in the batch and returns the average score.

Parameters:
    hypotheses – List of model predictions.
    references – List of golden outputs.
Returns: A float.
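Written out, the default batch score for N hyp/ref pairs (h_i, r_i) is the unweighted mean of the instance scores. Each instance counts equally, regardless of its size:

```latex
\mathrm{score\_batch}(h, r) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{score\_instance}(h_i, r_i)
```

Combined with the default exact-match score_instance, this is accuracy. For example, 3 matching pairs out of 4 give 3/4 = 0.75.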

score_instance(hypothesis: EvalType, reference: EvalType) → float

Score a single hyp/ref pair.

The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.

Parameters:
    hypothesis – The model prediction.
    reference – The golden output.
Returns: A float.
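When the hypotheses and references are whole token sequences, the default equality check gives no partial credit. A pair scores 1.0 only if every token matches, which amounts to sentence-level exact match. The function below is a hypothetical re-statement of the documented default, not the library's code. It also shows a Python detail worth knowing: a list never equals a tuple, even with the same items.

```python
from typing import Any


def default_score_instance(hypothesis: Any, reference: Any) -> float:
    # Mirrors the documented default: exact equality gives 1.0, anything else 0.0.
    return 1.0 if hypothesis == reference else 0.0


hyp = ["the", "cat", "sat"]
ref = ["the", "cat", "sits"]
print(default_score_instance(hyp, ref))          # 0.0: one differing token fails the pair
print(default_score_instance(hyp, list(hyp)))    # 1.0
print(default_score_instance(["a"], ("a",)))     # 0.0: list vs. tuple never compare equal
```

SequenceEvaluator, described below, provides token-level credit instead.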

class neuralmonkey.evaluators.evaluator.SequenceEvaluator(name: str = None) → None

Bases: neuralmonkey.evaluators.evaluator.Evaluator

Base class for token-level evaluators that work with sequences.

score_batch(hypotheses: List[Sequence[EvalType]], references: List[Sequence[EvalType]]) → float

Score a batch of sequences.

The default implementation assumes that each hypothesis has the same length as its reference and operates on the token level: token-level scores from the whole batch are averaged together, rather than averaging each sequence first.

Parameters:
    hypotheses – List of model predictions.
    references – List of golden outputs.
Returns: A float.
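The difference between pooling token scores and averaging each sequence first matters when sequences differ in length. The sketch below uses plain exact-match token scores and two hypothetical helper functions. It contrasts the documented token-level average with a per-sequence average on a small batch.

```python
from typing import List, Sequence


def token_level_score(hypotheses: List[Sequence[str]],
                      references: List[Sequence[str]]) -> float:
    """Pool token scores from the whole batch, then average (documented default)."""
    # zip() silently truncates unequal lengths, hence the equal-length assumption.
    scores = [1.0 if h == r else 0.0
              for hyp, ref in zip(hypotheses, references)
              for h, r in zip(hyp, ref)]
    return sum(scores) / len(scores)


def sequence_level_score(hypotheses: List[Sequence[str]],
                         references: List[Sequence[str]]) -> float:
    """Contrast only: average within each sequence, then across sequences."""
    per_seq = [sum(1.0 if h == r else 0.0 for h, r in zip(hyp, ref)) / len(ref)
               for hyp, ref in zip(hypotheses, references)]
    return sum(per_seq) / len(per_seq)


hyps = [["a"], ["b", "c", "d", "e"]]
refs = [["a"], ["b", "x", "x", "x"]]
print(token_level_score(hyps, refs))     # 0.4   (2 of 5 tokens match)
print(sequence_level_score(hyps, refs))  # 0.625 (mean of 1.0 and 0.25)
```

Under token-level pooling, longer sequences carry proportionally more weight.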

score_token(hyp_token: EvalType, ref_token: EvalType) → float

Score a single hyp/ref pair of tokens.

The default implementation returns 1.0 if the tokens are equal, 0.0 otherwise.

Parameters:
    hyp_token – A prediction token.
    ref_token – A golden token.
Returns: A score for the token hyp/ref pair.
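Because score_batch aggregates whatever score_token returns, a subclass can change the token metric without touching the batching logic. Both classes below are hypothetical: SketchSequenceEvaluator imitates the documented defaults in plain Python, and CaseInsensitiveTokenEvaluator is an illustrative override, not part of Neural Monkey.

```python
from typing import List, Sequence


class SketchSequenceEvaluator:
    """Hypothetical stand-in for SequenceEvaluator's documented defaults."""

    def score_batch(self, hypotheses: List[Sequence[str]],
                    references: List[Sequence[str]]) -> float:
        # Token-level scores from the whole batch are averaged together.
        scores = [self.score_token(h, r)
                  for hyp, ref in zip(hypotheses, references)
                  for h, r in zip(hyp, ref)]
        return sum(scores) / len(scores)

    def score_token(self, hyp_token: str, ref_token: str) -> float:
        # Default: 1.0 for identical tokens, 0.0 otherwise.
        return 1.0 if hyp_token == ref_token else 0.0


class CaseInsensitiveTokenEvaluator(SketchSequenceEvaluator):
    """Hypothetical subclass: only score_token is overridden."""

    def score_token(self, hyp_token: str, ref_token: str) -> float:
        return 1.0 if hyp_token.lower() == ref_token.lower() else 0.0


hyps = [["The", "Cat"], ["sat", "DOWN"]]
refs = [["the", "cat"], ["sat", "down"]]
print(SketchSequenceEvaluator().score_batch(hyps, refs))        # 0.25
print(CaseInsensitiveTokenEvaluator().score_batch(hyps, refs))  # 1.0
```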

neuralmonkey.evaluators.evaluator.check_lengths(scorer)