neuralmonkey.evaluators.evaluator module

class neuralmonkey.evaluators.evaluator.Evaluator(name: str = None) → None

Bases: typing.Generic

Base class for evaluators in Neural Monkey.

Each evaluator has a __call__ method which returns a score for a batch of model predictions given the references. This class provides default implementations of the score_batch and score_instance methods.

__init__(name: str = None) → None

Initialize self. See help(type(self)) for accurate signature.

static compare_scores(score1: float, score2: float) → int

Compare scores using this evaluator.

The default implementation regards the bigger score as better.

Parameters:
  • score1 – The first score.
  • score2 – The second score.
Returns:

An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.

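The comparison contract described above can be sketched in plain Python. This is a stand-alone illustration of the documented behavior, not the library's source; the function reuses the method's name only for readability.

```python
def compare_scores(score1: float, score2: float) -> int:
    """Return 1 if score1 is better, -1 if score2 is better, 0 if equal.

    Sketch of the documented default: the bigger score counts as better.
    """
    # (a > b) - (a < b) evaluates to 1, -1, or 0 for ordered values.
    return (score1 > score2) - (score1 < score2)


print(compare_scores(0.9, 0.5))   # first score is bigger
print(compare_scores(0.2, 0.5))   # second score is bigger
print(compare_scores(0.5, 0.5))   # equal scores
```

An evaluator for a metric where lower is better (an error rate, for instance) would presumably need to override this comparison with the ordering reversed.
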
name
score_batch(hypotheses: List[EvalType], references: List[EvalType]) → float

Score a batch of hyp/ref pairs.

The default implementation of this method calls score_instance for each instance in the batch and returns the average score.

Parameters:
  • hypotheses – List of model predictions.
  • references – List of golden outputs.
Returns:

A float.
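The averaging described for the default score_batch can be sketched as follows. This is a minimal stand-alone sketch, not the library's code: mean_instance_score is a hypothetical helper, and rejecting empty or misaligned batches is an assumption made here, since the documentation does not say how those cases are handled.

```python
from typing import Callable, List, TypeVar

EvalType = TypeVar("EvalType")


def mean_instance_score(
        hypotheses: List[EvalType],
        references: List[EvalType],
        score_instance: Callable[[EvalType, EvalType], float]) -> float:
    """Average a per-instance score over aligned hyp/ref pairs."""
    # Assumption: the two lists must be aligned and non-empty.
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references differ in length")
    if not hypotheses:
        raise ValueError("cannot score an empty batch")

    scores = [score_instance(hyp, ref)
              for hyp, ref in zip(hypotheses, references)]
    return sum(scores) / len(scores)


# Exact-match scoring: three of the four predictions are correct.
accuracy = mean_instance_score(
    ["a", "b", "c", "d"],
    ["a", "x", "c", "d"],
    lambda hyp, ref: 1.0 if hyp == ref else 0.0)
print(accuracy)
```
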

score_instance(hypothesis: EvalType, reference: EvalType) → float

Score a single hyp/ref pair.

The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.

Parameters:
  • hypothesis – The model prediction.
  • reference – The golden output.
Returns:

A float.
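As an illustration of the default described above, exact-match scoring is all-or-nothing for any comparable type. The function below is a hypothetical stand-alone sketch rather than the method itself.

```python
def exact_match(hypothesis, reference) -> float:
    """Return 1.0 when the two values are equal and 0.0 otherwise."""
    return 1.0 if hypothesis == reference else 0.0


print(exact_match("cat", "cat"))
print(exact_match("cat", "dog"))
# A sequence that differs in a single token still scores 0.0 as a whole.
print(exact_match(["the", "cat"], ["the", "dog"]))
```

Because a whole instance either matches or does not, a metric that should reward partial matches would need its own score_instance.
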

class neuralmonkey.evaluators.evaluator.SequenceEvaluator(name: str = None) → None

Bases: neuralmonkey.evaluators.evaluator.Evaluator

Base class for token-level evaluators that work with sequences.

score_batch(hypotheses: List[Sequence[EvalType]], references: List[Sequence[EvalType]]) → float

Score batch of sequences.

The default implementation assumes equal sequence lengths and operates on the token level: token-level scores from the whole batch are averaged together, in contrast to averaging within each sequence first.

Parameters:
  • hypotheses – List of model predictions.
  • references – List of golden outputs.
Returns:

A float.
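The difference between the two averaging schemes is easiest to see on a batch with sequences of different lengths. Both functions below are hypothetical stand-alone sketches that use exact token match as the token score; they are not the library's implementation, and raising on a length mismatch or an empty batch is an assumption.

```python
from typing import List, Sequence


def token_level_average(hypotheses: List[Sequence[str]],
                        references: List[Sequence[str]]) -> float:
    """Pool token scores from the whole batch, then average once."""
    total = 0.0
    count = 0
    for hyp, ref in zip(hypotheses, references):
        # Assumption: each hyp/ref pair must have the same length.
        if len(hyp) != len(ref):
            raise ValueError("sequence lengths differ")
        for hyp_token, ref_token in zip(hyp, ref):
            total += 1.0 if hyp_token == ref_token else 0.0
            count += 1
    if count == 0:
        raise ValueError("no tokens to score")
    return total / count


def per_sequence_average(hypotheses: List[Sequence[str]],
                         references: List[Sequence[str]]) -> float:
    """Average within each non-empty sequence, then across sequences."""
    seq_scores = [
        sum(1.0 if h == r else 0.0 for h, r in zip(hyp, ref)) / len(hyp)
        for hyp, ref in zip(hypotheses, references)]
    return sum(seq_scores) / len(seq_scores)


hyps = [["a"], ["a", "b", "c"]]
refs = [["a"], ["x", "y", "z"]]
print(token_level_average(hyps, refs))   # 1 correct token out of 4
print(per_sequence_average(hyps, refs))  # mean of 1.0 and 0.0
```

With token-level pooling, longer sequences contribute more tokens and therefore carry more weight in the batch score.
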

score_token(hyp_token: EvalType, ref_token: EvalType) → float

Score a single hyp/ref pair of tokens.

The default implementation returns 1.0 if the tokens are equal, 0.0 otherwise.

Parameters:
  • hyp_token – A prediction token.
  • ref_token – A golden token.
Returns:

A score for the token hyp/ref pair.
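A token scorer is a small function of two tokens, so a variant is straightforward to write. The sketch below shows the documented default next to a hypothetical case-insensitive alternative; neither is taken from the library's source.

```python
def score_token(hyp_token, ref_token) -> float:
    """Documented default: 1.0 if the tokens are equal, 0.0 otherwise."""
    return 1.0 if hyp_token == ref_token else 0.0


def score_token_caseless(hyp_token: str, ref_token: str) -> float:
    """Hypothetical variant that ignores letter case."""
    return 1.0 if hyp_token.lower() == ref_token.lower() else 0.0


print(score_token("Prague", "prague"))
print(score_token_caseless("Prague", "prague"))
```
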

neuralmonkey.evaluators.evaluator.check_lengths(scorer)