neuralmonkey.trainers.self_critical_objective module¶
Training objective for selfcritical learning.
Selfcritic learning is a modification of the REINFORCE algorithm that uses the reward of the traintime decoder output as a baseline in the update step.
For more details see: https://arxiv.org/pdf/1612.00563.pdf

neuralmonkey.trainers.self_critical_objective.
reinforce_score
(reward: tensorflow.python.framework.ops.Tensor, baseline: tensorflow.python.framework.ops.Tensor, decoded: tensorflow.python.framework.ops.Tensor, logits: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Cost function whose derivative is the REINFORCE equation.
This implements the primitive function to the central equation of the REINFORCE algorithm that estimates the gradients of the loss with respect to decoder logits.
It uses the fact that the second term of the product (the difference of the word distribution and one hot vector of the decoded word) is a derivative of negative log likelihood of the decoded word. The reward function and the baseline are however treated as a constant, so they influence the derivate only multiplicatively.

neuralmonkey.trainers.self_critical_objective.
self_critical_objective
(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → neuralmonkey.trainers.generic_trainer.Objective¶ Selfcritical objective.
Parameters:  decoder – A recurrent decoder.
 reward_function – A reward function computing score in Python.
 weight – Mixing weight for a trainer.
Returns: Objective object to be used in generic trainer.

neuralmonkey.trainers.self_critical_objective.
sentence_bleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute indexbased sentencelevel BLEU score.
Computes sentence level BLEU on indices outputed by the decoder, i.e. whatever the decoder uses as a unit is used a token in the BLEU computation, ignoring the tokens may be subword units.

neuralmonkey.trainers.self_critical_objective.
sentence_gleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute indexbased GLEU score.
GLEU score is a sentencelevel metric used in Google’s Neural MT as a reward in reinforcement learning (https://arxiv.org/abs/1609.08144). It is a minimum of precision and recall on 1 to 4grams.
It operates over the indices emitted by the decoder which are not necessarily tokens (could be characters or subword units).