neuralmonkey.trainers.self_critical_objective module

Training objective for self-critical learning.

Self-critical learning is a modification of the REINFORCE algorithm that uses the reward of the train-time decoder output as a baseline in the update step.

For more details see: https://arxiv.org/pdf/1612.00563.pdf

class neuralmonkey.trainers.self_critical_objective.SelfCriticalObjective(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → None

Bases: neuralmonkey.trainers.objective.Objective

__init__(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → None

Self-critical objective.

Parameters:
	decoder – A recurrent decoder.
	reward_function – A reward function that computes the score in Python.
	weight – Mixing weight for a trainer.
Returns:	An Objective object to be used in the generic trainer.
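As an illustration of the expected reward_function signature, here is a minimal sketch of a custom reward. The name token_accuracy_reward is hypothetical, and the (time, batch) layout of the index arrays is an assumption that is not stated on this page, so verify it against the decoder you use.

```python
import numpy as np


def token_accuracy_reward(references: np.ndarray,
                          hypotheses: np.ndarray) -> np.ndarray:
    """Per-sentence fraction of positions where the hypothesis matches.

    Assumption: both arrays hold integer token indices laid out as
    (time, batch); padding and end-of-sentence handling are ignored.
    """
    # Compare only the time steps that both arrays cover.
    steps = min(references.shape[0], hypotheses.shape[0])
    matches = references[:steps] == hypotheses[:steps]
    # Average over time to obtain one reward per sentence in the batch.
    return matches.mean(axis=0).astype(np.float32)


references = np.array([[1, 4], [2, 5], [3, 6]])
hypotheses = np.array([[1, 4], [2, 0], [0, 0]])
rewards = token_accuracy_reward(references, hypotheses)
print(rewards.shape)  # one reward per sentence in the batch
```

Following the signature documented above, such a function would be passed as SelfCriticalObjective(decoder=..., reward_function=token_accuracy_reward).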

loss

Return the loss tensor fetched by the trainer.

weight

Return the weight of this objective.

The loss is multiplied by this weight so that the gradients can be balanced when several objectives are trained together.

Returns:	An optional tensor. If None, a default weight of 1 is assumed.
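The effect of the weight can be sketched as follows. This is only an illustration with plain Python floats standing in for tensors; combine_losses is a hypothetical helper, not part of the trainer API.

```python
def combine_losses(objectives):
    """Sum (loss, weight) pairs, treating a weight of None as 1."""
    total = 0.0
    for loss, weight in objectives:
        total += loss * (1.0 if weight is None else weight)
    return total


# One objective with the default weight, one scaled down to a half.
total_loss = combine_losses([(2.0, None), (4.0, 0.5)])
print(total_loss)  # 4.0
```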

neuralmonkey.trainers.self_critical_objective.reinforce_score(reward: tensorflow.python.framework.ops.Tensor, baseline: tensorflow.python.framework.ops.Tensor, decoded: tensorflow.python.framework.ops.Tensor, logits: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

Cost function whose derivative is the REINFORCE equation.

This implements the antiderivative (primitive function) of the central equation of the REINFORCE algorithm, which estimates the gradients of the loss with respect to the decoder logits.
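For orientation, the equation in question can be written as below. The notation is an assumption introduced here, not taken from the module: s_t are the decoder logits at step t, p_t = softmax(s_t), w_t is the decoded word, 1_{w_t} is its one-hot vector, r is the reward and b is the baseline.

```latex
% REINFORCE gradient estimate with a baseline, per time step t
\[
  \frac{\partial L}{\partial s_t} \approx (r - b)\,\bigl(p_t - 1_{w_t}\bigr)
\]
% A cost whose derivative with respect to s_t is the expression above,
% provided that (r - b) is held constant during differentiation
\[
  \tilde{L} = -(r - b) \sum_t \log p_t(w_t)
\]
```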

The function relies on the fact that the second term of the product in the REINFORCE equation (the difference between the word distribution and the one-hot vector of the decoded word) is the derivative of the negative log-likelihood of the decoded word. The reward function and the baseline are, however, treated as constants, so they influence the derivative only multiplicatively.
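The relationship can be checked numerically for a single time step. The sketch below uses only the standard library rather than TensorFlow, and the helper names (softmax, surrogate_cost, reinforce_gradient, numeric_gradient) are hypothetical. It compares the closed-form gradient with finite differences of the cost, with the reward difference held constant.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surrogate_cost(reward, baseline, decoded, logits):
    """Return -(reward - baseline) * log p(decoded).

    The reward difference is a plain number here, so it acts as a constant
    factor when the cost is differentiated with respect to the logits.
    """
    return -(reward - baseline) * math.log(softmax(logits)[decoded])


def reinforce_gradient(reward, baseline, decoded, logits):
    """Closed form: (reward - baseline) * (softmax(logits) - one_hot(decoded))."""
    return [
        (reward - baseline) * (p - (1.0 if i == decoded else 0.0))
        for i, p in enumerate(softmax(logits))
    ]


def numeric_gradient(reward, baseline, decoded, logits, eps=1e-6):
    """Central finite differences of surrogate_cost for each logit."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        diff = (surrogate_cost(reward, baseline, decoded, up)
                - surrogate_cost(reward, baseline, decoded, down))
        grad.append(diff / (2 * eps))
    return grad


logits = [0.5, -1.0, 2.0, 0.1]
closed = reinforce_gradient(reward=0.8, baseline=0.3, decoded=2, logits=logits)
approx = numeric_gradient(reward=0.8, baseline=0.3, decoded=2, logits=logits)
max_gap = max(abs(c - a) for c, a in zip(closed, approx))
print(max_gap)  # should be very close to zero
```

In a TensorFlow graph the same constancy would typically be obtained by stopping the gradient through the reward difference; how exactly the module does this is not described on this page.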

neuralmonkey.trainers.self_critical_objective.sentence_bleu(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray

Compute index-based sentence-level BLEU score.

Computes sentence-level BLEU on the indices output by the decoder, i.e. whatever the decoder uses as a unit is used as a token in the BLEU computation, ignoring the fact that the tokens may be subword units.
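To make the idea of an index-based score concrete, here is a minimal sketch of sentence-level BLEU for one pair of index sequences. It is not the module's implementation: sentence_bleu_sketch and ngram_counts are hypothetical names, and the add-one smoothing is an assumption chosen so that short sentences do not score zero.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a sequence of token indices."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(reference, hypothesis, max_n=4):
    """Smoothed sentence-level BLEU over integer indices."""
    if not hypothesis:
        return 0.0
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matched = sum((hyp & ref).values())
        total = sum(hyp.values())
        # Add-one smoothing (an assumption of this sketch).
        log_precisions += math.log((matched + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_precisions / max_n)


# The indices may stand for words, subword units, or characters.
exact = sentence_bleu_sketch([5, 6, 7, 8, 9], [5, 6, 7, 8, 9])
partial = sentence_bleu_sketch([5, 6, 7, 8, 9], [5, 6, 7, 3, 9])
print(exact, partial)
```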

neuralmonkey.trainers.self_critical_objective.sentence_gleu(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray

Compute index-based GLEU score.

GLEU is a sentence-level metric used in Google’s Neural MT as a reward in reinforcement learning (https://arxiv.org/abs/1609.08144). It is the minimum of precision and recall computed on 1- to 4-grams.

It operates on the indices emitted by the decoder, which are not necessarily tokens (they could be characters or subword units).
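The definition above can be sketched for a single sentence pair as follows. This is an illustration rather than the module's code: sentence_gleu_sketch and all_ngrams are hypothetical names, and pooling the 1- to 4-gram counts before taking precision and recall is an assumption about how the metric is aggregated.

```python
from collections import Counter


def all_ngrams(tokens, max_n=4):
    """Count every n-gram of order 1 to max_n in a sequence of indices."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu_sketch(reference, hypothesis, max_n=4):
    """Minimum of n-gram precision and recall over index sequences."""
    ref = all_ngrams(reference, max_n)
    hyp = all_ngrams(hypothesis, max_n)
    ref_total = sum(ref.values())
    hyp_total = sum(hyp.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0
    # Matching n-grams, clipped to the smaller of the two counts.
    matched = sum((ref & hyp).values())
    precision = matched / hyp_total
    recall = matched / ref_total
    return min(precision, recall)


# The hypothesis drops the last unit: precision is 6/6, recall is 6/10.
score = sentence_gleu_sketch([1, 2, 3, 4], [1, 2, 3])
print(score)  # 0.6
```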