Decoding functions using multiple attentions for RNN decoders.
The attention mechanisms used in Neural Monkey are inherited from the
BaseAttention class defined in this module.
The attention function can be viewed as a soft lookup over an associative memory. The query vector is used to compute similarity scores over the keys of the associative memory, and the resulting scores are used as weights in a weighted sum of the values associated with the keys. We call the (unnormalized) similarity scores energies, we call the energies after (softmax) normalization the attention distribution, and we call the resulting weighted sum of states a context vector.
Note that it is possible (and true in most cases) that the attention keys are equal to the values. In the case of self-attention, even the queries come from the same set of vectors.
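As a minimal illustration of this soft lookup, the following NumPy sketch (not Neural Monkey code; soft_lookup and its arguments are illustrative names) computes the energies, normalizes them into an attention distribution, and returns the context vector:

import numpy as np

def soft_lookup(query, keys, values, mask=None):
    # Unnormalized similarity scores ("energies") of the query with each key.
    energies = keys @ query
    if mask is not None:
        energies = np.where(mask, energies, -np.inf)
    # Softmax normalization turns the energies into the attention distribution.
    distribution = np.exp(energies - energies.max())
    distribution /= distribution.sum()
    # The context vector is the distribution-weighted sum of the values.
    return distribution @ values, distribution

keys = values = np.random.randn(7, 4)  # keys equal to the values, as is common
context, dist = soft_lookup(np.random.randn(4), keys, values)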
To abstract over different flavors of attention mechanism, we conceptualize the procedure as follows: each attention object has an attention function which operates on the query tensor. The attention function receives the query tensor (the decoder state) and optionally the previous state of the decoder, and computes the context vector. The function also receives a loop state, which is used to store data in an autoregressive loop that generates a sequence. The attention uses the loop state to store the attention distributions and context vectors in time. This structure is called the attention loop state.
To be able to initialize the loop state, each attention object that uses this feature defines the initial_loop_state function, which creates the initial loop state populated with empty tensors.
Since there can be many modes in which the decoder that uses the attention operates, the attention objects have the finalize_loop method, which takes the last attention loop state and the name of the mode (a string) and processes this data to be available in the histories dictionary. The most common example of two such modes are the train and runtime modes of the autoregressive decoder.
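Schematically, a decoder drives these pieces as sketched below. This is a hedged illustration assuming a concrete BaseAttention subclass instance; the real decoders run the loop inside tf.while_loop rather than a Python for-loop:

from neuralmonkey.attention.base_attention import BaseAttention

def run_attention_loop(att: BaseAttention, queries, prev_states, inputs):
    # Start from the empty loop state defined by the attention object.
    loop_state = att.initial_loop_state()
    contexts = []
    for query, prev_state, dec_input in zip(queries, prev_states, inputs):
        # Each call returns the context vector and an updated loop state.
        context, loop_state = att.attention(
            query, prev_state, dec_input, loop_state)
        contexts.append(context)
    # Make the collected distributions available under the mode key.
    att.finalize_loop("runtime", loop_state)
    return contexts, att.histories["runtime"]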
BaseAttention(name: str, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶
The abstract class for the attention mechanism flavors.
__init__(name: str, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶
Create a new BaseAttention object.
attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: Any) → Tuple[tensorflow.python.framework.ops.Tensor, Any]¶
Get context vector for a given query.
context_vector_size¶
Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
finalize_loop(key: str, last_loop_state: Any) → None¶
Store the attention histories from loop state under a given key.
- key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
histories¶
Return the attention histories dictionary.
Use this property only after it has been populated by finalize_loop.
Returns: The attention histories dictionary.
initial_loop_state() → Any¶
Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
visualize_attention(key: str, max_outputs: int = 16) → None¶
Include the attention histories under a given key into a summary.
- key – The key to the attention histories dictionary.
- max_outputs – Maximum number of images to save.
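Continuing the run_attention_loop sketch above, a typical (illustrative) use after a finished decoding loop would be:

att.finalize_loop("train", last_loop_state)
att.visualize_attention("train", max_outputs=8)  # image summaries of the alignments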
empty_attention_loop_state(batch_size: Union[int, tensorflow.python.framework.ops.Tensor], length: Union[int, tensorflow.python.framework.ops.Tensor], dimension: Union[int, tensorflow.python.framework.ops.Tensor]) → neuralmonkey.attention.namedtuples.AttentionLoopState¶
Create an empty attention loop state.
The attention loop state is a technical object for storing the attention distributions and the context vectors in time. It is used with the tf.while_loop dynamic implementation of decoders.
- batch_size – The size of the batch.
- length – The number of encoder states (keys).
- dimension – The dimension of the context vector.
This function returns an empty attention loop state, which contains two empty tensors: one for the attention distributions in time, and one for the attention context vectors in time.
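A self-contained sketch of such a structure is shown below; the real AttentionLoopState lives in neuralmonkey.attention.namedtuples, and the field names here are assumptions that follow the description above:

from collections import namedtuple
import tensorflow as tf

# Assumed fields: "weights" for the distributions, "contexts" for the vectors.
AttentionLoopState = namedtuple("AttentionLoopState", ["contexts", "weights"])

def empty_loop_state(batch_size, length, dimension):
    # Zero-length time axis; the decoding loop concatenates one slice per step.
    return AttentionLoopState(
        contexts=tf.zeros([0, batch_size, dimension]),
        weights=tf.zeros([0, batch_size, length]))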
get_attention_mask(encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → Union[tensorflow.python.framework.ops.Tensor, NoneType]¶
Return the temporal or spatial mask of an encoder.
Parameters: encoder – The encoder to get the mask from.
Returns: Either a 2D or a 3D tensor, depending on whether the encoder is temporal (e.g. a recurrent encoder) or spatial (e.g. a CNN encoder).
get_attention_states(encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → tensorflow.python.framework.ops.Tensor¶
Return the temporal or spatial states of an encoder.
Parameters: encoder – The encoder with the states to attend.
Returns: Either a 3D or a 4D tensor, depending on whether the encoder is temporal (e.g. a recurrent encoder) or spatial (e.g. a CNN encoder). The first two dimensions are (batch, time).
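A hedged sketch of the dispatch these two helpers perform (the attribute names temporal_states and spatial_states come from the TemporalStateful and SpatialStateful interfaces; the exact implementation in this module may differ):

from neuralmonkey.model.stateful import TemporalStateful, SpatialStateful

def states_of(encoder):
    # Temporal encoders expose 3D states (batch, time, state); spatial
    # encoders expose 4D states with two positional dimensions.
    if isinstance(encoder, TemporalStateful):
        return encoder.temporal_states
    if isinstance(encoder, SpatialStateful):
        return encoder.spatial_states
    raise TypeError("Unsupported encoder type: {}".format(type(encoder)))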