neuralmonkey.evaluators.multeval module

class neuralmonkey.evaluators.multeval.MultEvalWrapper(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None

Bases: neuralmonkey.evaluators.evaluator.Evaluator

Wrapper for the MultEval reference scorers (BLEU, TER and METEOR).

__init__(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None

Initialize the wrapper.

Parameters:
  • wrapper – Path to the multeval.sh script.
  • name – Name of the evaluator.
  • encoding – Encoding of the input files.
  • language – Language of the hypotheses and references.
  • metric – Evaluation metric; one of “bleu”, “ter” or “meteor”.

score_batch(hypotheses: List[List[str]], references: List[List[str]]) → float

Score a batch of hyp/ref pairs.

The default implementation of this method calls score_instance for each instance in the batch and returns the average score.

Parameters:
  • hypotheses – List of model predictions.
  • references – List of golden outputs.
Returns:

The score for the batch, as a float.
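
The default behaviour described above (score each instance, then average) can be sketched as follows. This is only an illustration: AveragingEvaluator is a hypothetical stand-in rather than the Neural Monkey Evaluator class, its exact-match score_instance is a placeholder metric, and returning 0.0 for an empty batch is an assumption.

```python
from typing import List


class AveragingEvaluator:
    """Hypothetical stand-in used only to illustrate per-instance averaging."""

    def score_instance(self, hypothesis: List[str],
                       reference: List[str]) -> float:
        # Placeholder metric (assumption): 1.0 on exact token match, else 0.0.
        return 1.0 if hypothesis == reference else 0.0

    def score_batch(self, hypotheses: List[List[str]],
                    references: List[List[str]]) -> float:
        if len(hypotheses) != len(references):
            raise ValueError("hypotheses and references differ in length")
        if not hypotheses:
            # Assumption: an empty batch scores 0.0.
            return 0.0
        # Score every hyp/ref pair and return the arithmetic mean.
        total = sum(self.score_instance(hyp, ref)
                    for hyp, ref in zip(hypotheses, references))
        return total / len(hypotheses)
```

Corpus-level metrics such as BLEU are generally not an average of per-sentence scores, so a wrapper around an external scorer may not follow this default.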

serialize_to_bytes(sentences: List[List[str]]) → bytes
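
No description is given for this method. Judging only from the signature and the encoding parameter, one plausible reading is that it joins each sentence's tokens with spaces, puts one sentence per line, and encodes the result. The sketch below assumes exactly that; the output format, including the trailing newline, is a guess, and it is written as a free function rather than the actual method.

```python
from typing import List


def serialize_to_bytes(sentences: List[List[str]],
                       encoding: str = "utf-8") -> bytes:
    # Assumption: tokens are space-separated, one sentence per line.
    lines = [" ".join(tokens) for tokens in sentences]
    # Assumption: the output ends with a trailing newline.
    return ("\n".join(lines) + "\n").encode(encoding)
```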