neuralmonkey.trainers.self_critical_objective module¶
Training objective for self-critical learning.
Self-critical learning is a modification of the REINFORCE algorithm that uses the reward of the train-time decoder output as a baseline in the update step.
For more details see: https://arxiv.org/pdf/1612.00563.pdf
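The core idea can be sketched as follows (a toy illustration, not the neuralmonkey implementation; the rewards here stand for any sentence-level metric, such as GLEU, evaluated on a sampled output and on the greedy train-time output):

```python
import numpy as np

def self_critical_advantage(reward_sampled: np.ndarray,
                            reward_greedy: np.ndarray) -> np.ndarray:
    """Advantage used in self-critical training.

    The reward of the greedy (train-time) decoder output serves as the
    baseline, so only samples that beat the model's own greedy decoding
    receive a positive learning signal.
    """
    return reward_sampled - reward_greedy

# Toy per-sentence rewards for a batch of three sentences.
adv = self_critical_advantage(np.array([0.6, 0.2, 0.5]),
                              np.array([0.4, 0.4, 0.5]))
```

Only the first sentence, whose sampled output scored better than the greedy one, gets a positive update.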

class
neuralmonkey.trainers.self_critical_objective.
SelfCriticalObjective
(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → None¶ Bases:
neuralmonkey.trainers.objective.Objective

__init__
(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → None¶ Self-critical objective.
Parameters:  decoder – A recurrent decoder.
 reward_function – A reward function computing score in Python.
 weight – Mixing weight for a trainer.
Returns: An Objective object to be used in the generic trainer.

loss
¶ Return the loss tensor fetched by the trainer.

weight
¶ Return the weight of this objective.
The loss will be multiplied by this weight so that the gradients can be controlled when there are multiple objectives.
Returns: An optional tensor. If None, default weight of 1 is assumed.


neuralmonkey.trainers.self_critical_objective.
reinforce_score
(reward: tensorflow.python.framework.ops.Tensor, baseline: tensorflow.python.framework.ops.Tensor, decoded: tensorflow.python.framework.ops.Tensor, logits: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Cost function whose derivative is the REINFORCE equation.
This implements the primitive function of the central equation of the REINFORCE algorithm, which estimates the gradients of the loss with respect to the decoder logits.
It uses the fact that the second term of the product (the difference between the word distribution and the one-hot vector of the decoded word) is the derivative of the negative log-likelihood of the decoded word. The reward function and the baseline are, however, treated as constants, so they influence the derivative only multiplicatively.
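The relationship can be checked numerically with a NumPy sketch (an illustration, not the actual TensorFlow code): the primitive function is the scaled negative log-likelihood of the decoded word, and its gradient with respect to the logits is the scaled difference between the softmax distribution and the one-hot vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_score(reward, baseline, decoded, logits):
    # Primitive function: (reward - baseline) * NLL of the decoded word.
    # The reward and the baseline are treated as constants.
    nll = -np.log(softmax(logits)[decoded])
    return (reward - baseline) * nll

def reinforce_grad(reward, baseline, decoded, logits):
    # Analytic gradient w.r.t. the logits: the REINFORCE equation.
    one_hot = np.eye(len(logits))[decoded]
    return (reward - baseline) * (softmax(logits) - one_hot)

# Central-difference check of the gradient on a toy example.
logits = np.array([1.0, 2.0, 0.5])
reward, baseline, decoded = 0.8, 0.3, 1
eps = 1e-6
num_grad = np.array([
    (reinforce_score(reward, baseline, decoded, logits + eps * np.eye(3)[i])
     - reinforce_score(reward, baseline, decoded, logits - eps * np.eye(3)[i]))
    / (2 * eps)
    for i in range(3)])
```

The numerical gradient of the primitive function matches the REINFORCE equation, which is exactly the property the cost function exploits.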

neuralmonkey.trainers.self_critical_objective.
sentence_bleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute index-based sentence-level BLEU score.
Computes sentence-level BLEU on the indices output by the decoder, i.e. whatever the decoder uses as a unit is treated as a token in the BLEU computation, disregarding the fact that the tokens may be subword units.
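A minimal sketch of index-based sentence-level BLEU (a hypothetical helper, not the neuralmonkey implementation; it uses add-one smoothing so that a missing higher-order n-gram does not zero the whole score):

```python
import math
from collections import Counter

def ngrams(seq, n):
    """Multiset of n-grams of an index sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def index_sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU where tokens are decoder output indices."""
    if not hypothesis:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hypothesis, n)
        ref_ngrams = ngrams(reference, n)
        # Clipped n-gram matches via multiset intersection.
        matches = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        # Add-one smoothing for sentence-level scores.
        log_prec += math.log((matches + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(log_prec / max_n)

score = index_sentence_bleu([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

A perfect hypothesis scores 1.0; in training, such a per-sentence score would serve as the reward function of the objective.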

neuralmonkey.trainers.self_critical_objective.
sentence_gleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute index-based GLEU score.
GLEU is a sentence-level metric used in Google's Neural MT as a reward in reinforcement learning (https://arxiv.org/abs/1609.08144). It is the minimum of precision and recall over 1- to 4-grams.
It operates over the indices emitted by the decoder, which are not necessarily tokens (they could be characters or subword units).
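A minimal sketch of the GLEU computation over index sequences (an illustration following the GNMT definition, not the neuralmonkey code): pool all 1- to 4-gram matches and take the minimum of the pooled precision and recall.

```python
from collections import Counter

def ngram_counts(seq, max_n=4):
    """Pooled multiset of all 1- to max_n-grams of an index sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return counts

def index_sentence_gleu(reference, hypothesis, max_n=4):
    """GLEU: minimum of pooled 1-4-gram precision and recall."""
    hyp = ngram_counts(hypothesis, max_n)
    ref = ngram_counts(reference, max_n)
    matches = sum((hyp & ref).values())
    precision = matches / max(sum(hyp.values()), 1)
    recall = matches / max(sum(ref.values()), 1)
    return min(precision, recall)

score = index_sentence_gleu([1, 2, 3, 4], [1, 2, 3, 4])
```

Unlike BLEU, GLEU needs no brevity penalty: a short hypothesis is punished through low recall, which makes it better behaved as a per-sentence reinforcement learning reward.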