neuralmonkey.attention package¶
Submodules¶
neuralmonkey.attention.base_attention module¶
Decoding functions using multiple attentions for RNN decoders.
See http://arxiv.org/abs/1606.07481
The attention mechanisms used in Neural Monkey are inherited from the BaseAttention class defined in this module.
Each attention object has an attention function which operates on the attention_states tensor. The attention function receives the query tensor, the decoder's previous state and input, and its inner state, which may bear an arbitrary structure of information. The default structure for this is the AttentionLoopState, which contains a growing array of attention distributions and context vectors in time. That is why the BaseAttention class also has the initial_loop_state function.
Mainly for illustration purposes, the attention objects can keep their histories, a dictionary populated with the attention distributions in time for every decoder that used this attention object. This is needed because, for example, the recurrent decoder can be run twice for each sentence: once in training mode, in which the decoder gets the reference tokens on the input, and once in running mode, in which it gets its own outputs. The histories object is constructed after decoding, and its construction should be triggered manually from the decoder by calling the finalize_loop method.
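The following is a minimal, self-contained sketch of how a decoder drives an attention object through this interface. The ToyAttention class, the NumPy tensors, and the dot-product energies are hypothetical stand-ins; only the method names and the AttentionLoopState(contexts, weights) structure come from the documentation above.

from collections import namedtuple
import numpy as np

AttentionLoopState = namedtuple("AttentionLoopState", ["contexts", "weights"])

class ToyAttention:
    """Hypothetical attention object mimicking the BaseAttention interface."""

    def __init__(self, attention_states):
        self.attention_states = attention_states  # shape (time, dim)
        self.histories = {}

    def initial_loop_state(self):
        # Start with empty histories of contexts and attention distributions.
        return AttentionLoopState(contexts=[], weights=[])

    def attention(self, query, decoder_prev_state, decoder_input,
                  loop_state, step):
        # Dot-product energies followed by a softmax -- a stand-in for the
        # actual attention types implemented in Neural Monkey.
        energies = self.attention_states @ query
        weights = np.exp(energies - energies.max())
        weights /= weights.sum()
        context = weights @ self.attention_states
        next_state = AttentionLoopState(
            contexts=loop_state.contexts + [context],
            weights=loop_state.weights + [weights])
        return context, next_state

    def finalize_loop(self, key, last_loop_state):
        # Populate the histories dictionary under the decoder's key.
        self.histories[key] = np.stack(last_loop_state.weights)

# Driving the object the way a decoder would:
attn = ToyAttention(np.random.rand(7, 4))
state = attn.initial_loop_state()
for step in range(3):
    query = np.random.rand(4)
    context, state = attn.attention(query, None, None, state, step)
attn.finalize_loop("train", state)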
-
class
neuralmonkey.attention.base_attention.
AttentionLoopState
(contexts, weights)¶ Bases:
tuple
-
contexts
¶ Alias for field number 0
-
weights
¶ Alias for field number 1
-
-
class
neuralmonkey.attention.base_attention.
BaseAttention
(name: str, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]¶ Get context vector for a given query.
-
context_vector_size
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → typing.Dict[tensorflow.python.framework.ops.Tensor, typing.Any]¶
-
finalize_loop
(key: str, last_loop_state: typing.Any) → None¶
-
histories
¶
-
initial_loop_state
() → typing.Any¶ Get initial loop state for the attention object.
-
visualize_attention
(key: str) → None¶
-
-
neuralmonkey.attention.base_attention.
empty_attention_loop_state
() → neuralmonkey.attention.base_attention.AttentionLoopState¶ Create an empty attention loop state.
The attention loop state is a technical object for storing the attention distributions and the context vectors in time. It is used with the tf.while_loop dynamic implementation of the decoder.
This function returns an empty attention loop state, which means there are two empty arrays, one for the attention distributions in time, and one for the attention context vectors in time.
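As a hedged illustration (assuming TensorFlow 1.x semantics; the dtype and TensorArray parameters below are guesses, not the function's exact implementation), the returned value plausibly looks like this:

import tensorflow as tf
from neuralmonkey.attention.base_attention import AttentionLoopState

empty_state = AttentionLoopState(
    # Dynamically sized arrays that the decoder's tf.while_loop grows,
    # one entry per decoding step.
    contexts=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True,
                            name="contexts"),
    weights=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True,
                           name="weights"))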
-
neuralmonkey.attention.base_attention.
get_attention_mask
(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → typing.Union[tensorflow.python.framework.ops.Tensor, NoneType]¶
-
neuralmonkey.attention.base_attention.
get_attention_states
(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → tensorflow.python.framework.ops.Tensor¶
neuralmonkey.attention.combination module¶
Attention combination strategies.
This module implements attention combination strategies for the multi-encoder scenario, when we may want to combine the hidden states of the encoders in a more complicated fashion.
Currently there are two attention combination strategies, flat and hierarchical (see the paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).
The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to any of the encoders and instead extract information from its own hidden state (see the paper Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).
-
class
neuralmonkey.attention.combination.
FlatMultiAttention
(name: str, encoders: typing.List[typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Flat attention combination strategy.
Using this attention combination strategy, the hidden states of the encoders are first projected to the same space (with a different projection for each encoder) and then a joint distribution over all the hidden states is computed. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.
See equations 8 to 10 in the Attention Combination Strategies paper.
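Below is a rough NumPy sketch of the flat combination idea, not the FlatMultiAttention implementation itself. The projection matrices, the Bahdanau-style energy parameters w_query and v, and the omission of the sentinel are all simplifying assumptions.

import numpy as np

def flat_combination(query, encoder_states, state_projs, ctx_projs, w_query, v):
    # encoder_states: list of (len_i, dim_i) arrays, one per encoder
    # state_projs / ctx_projs: per-encoder projection matrices
    # Project every encoder's states into a shared attention space.
    projected = np.concatenate(
        [states @ proj for states, proj in zip(encoder_states, state_projs)])
    # Joint feed-forward energies and a single softmax over ALL hidden states.
    energies = np.tanh(projected + query @ w_query) @ v
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()
    # Context: weighted sum of another projection of the encoders' states.
    ctx_states = np.concatenate(
        [states @ proj for states, proj in zip(encoder_states, ctx_projs)])
    return weights @ ctx_states, weights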
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]¶
-
context_vector_size
¶
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None¶
-
get_encoder_projections
(scope)¶
-
initial_loop_state
() → neuralmonkey.attention.base_attention.AttentionLoopState¶
-
-
class
neuralmonkey.attention.combination.
HierarchicalLoopState
(child_loop_states, loop_state)¶ Bases:
tuple
-
child_loop_states
¶ Alias for field number 0
-
loop_state
¶ Alias for field number 1
-
-
class
neuralmonkey.attention.combination.
HierarchicalMultiAttention
(name: str, attentions: typing.List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Hierarchical attention combination.
The hierarchical attention combination strategy first computes the context vector for each encoder separately, using whatever attention type each encoder has. After that, it computes a second attention over the resulting context vectors and, optionally, the sentinel vector.
See equations 6 and 7 in the Attention Combination Strategies paper.
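A rough NumPy sketch of the hierarchical idea follows; the child_attentions callables and the second-level feed-forward energy (parameters w_query, v) are illustrative assumptions, not the HierarchicalMultiAttention implementation.

import numpy as np

def hierarchical_combination(query, child_attentions, ctx_projs, w_query, v):
    # child_attentions: callables returning a context vector for the query,
    # one per encoder (each may be a different attention type).
    contexts = [attend(query) for attend in child_attentions]
    # Project the child contexts into a shared space and attend over them.
    projected = np.stack([c @ p for c, p in zip(contexts, ctx_projs)])
    energies = np.tanh(projected + query @ w_query) @ v
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()
    # The final context is a weighted sum of the projected child contexts.
    return weights @ projected, weights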
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.combination.HierarchicalLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.combination.HierarchicalLoopState]¶
-
context_vector_size
¶
-
finalize_loop
(key: str, last_loop_state: typing.Any) → None¶
-
initial_loop_state
() → neuralmonkey.attention.combination.HierarchicalLoopState¶
-
-
class
neuralmonkey.attention.combination.
MultiAttention
(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
Base class for attention combination.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]¶ Get context vector for given decoder state.
-
attn_size
¶
-
neuralmonkey.attention.coverage module¶
Coverage attention introduced in Tu et al. (2016).
See arxiv.org/abs/1601.04811
The CoverageAttention class inherits from the basic feed-forward attention introduced by Bahdanau et al. (2015).
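To illustrate the coverage idea, here is a hedged NumPy sketch of the mechanism from Tu et al., not this class's exact get_energies (fertility modeling is omitted and the parameter names w_query, w_key, w_cov, v are assumptions). The energies additionally receive the accumulated attention weights from previous decoding steps:

import numpy as np

def coverage_energies(query, keys, past_weights, w_query, w_key, w_cov, v):
    # past_weights: list of earlier attention distributions, one per step.
    # Coverage = how much attention each source position has received so far.
    coverage = (np.sum(past_weights, axis=0) if past_weights
                else np.zeros(len(keys)))
    # Feed-forward (Bahdanau-style) energies with an extra coverage term.
    hidden = np.tanh(keys @ w_key
                     + query @ w_query
                     + np.outer(coverage, w_cov))
    return hidden @ v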
-
class
neuralmonkey.attention.coverage.
CoverageAttention
(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, max_fertility: int = 5, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.feed_forward.Attention
-
get_energies
(y: tensorflow.python.framework.ops.Tensor, weights_in_time: tensorflow.python.ops.tensor_array_ops.TensorArray)¶
-
neuralmonkey.attention.feed_forward module¶
The feed-forward attention mechanism.
This is the attention mechanism used in Bahdanau et al. (2015).
See arxiv.org/abs/1409.0473
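As a hedged illustration of this energy function, here is a NumPy sketch under simplified assumptions; the parameter names do not necessarily match the class's attributes below, and masking and dropout are omitted.

import numpy as np

def feed_forward_attention(query, states, w_query, w_key, bias, v):
    # e_i = v^T tanh(W_q q + W_k h_i + b)   -- Bahdanau-style energies
    energies = np.tanh(query @ w_query + states @ w_key + bias) @ v
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()                 # softmax over source positions
    return weights @ states, weights         # context vector, distribution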
-
class
neuralmonkey.attention.feed_forward.
Attention
(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]¶
-
attention_mask
¶
-
attention_states
¶
-
bias_term
¶
-
context_vector_size
¶
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None¶
-
get_energies
(y, _)¶
-
initial_loop_state
() → neuralmonkey.attention.base_attention.AttentionLoopState¶
-
key_projection_matrix
¶
-
projection_bias_vector
¶
-
query_projection_matrix
¶
-
similarity_bias_vector
¶
-
state_size
¶
-
neuralmonkey.attention.scaled_dot_product module¶
The scaled dot-product attention mechanism defined in Vaswani et al. (2017).
The attention energies are computed as dot products between the query vector and the key vectors. The query vector is scaled down by the square root of its dimensionality. This attention function has no trainable parameters.
See arxiv.org/abs/1706.03762
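A minimal NumPy sketch of the computation described above (illustrative only, not the ScaledDotProdAttention implementation; multiple heads, masking and dropout are omitted):

import numpy as np

def scaled_dot_product(query, keys, values):
    # query: (d,), keys: (time, d), values: (time, d_v)
    energies = (keys @ query) / np.sqrt(query.shape[0])  # query scaled by sqrt(d)
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()                              # softmax over time
    return weights @ values                               # context vector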
-
class
neuralmonkey.attention.scaled_dot_product.
MultiHeadAttention
(name: str, n_heads: int, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA]¶
-
attention_single_head
(query: tensorflow.python.framework.ops.Tensor, keys: tensorflow.python.framework.ops.Tensor, values: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]¶
-
context_vector_size
¶
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA) → None¶
-
initial_loop_state
() → neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA¶
-
visualize_attention
(key: str) → None¶
-
-
class
neuralmonkey.attention.scaled_dot_product.
MultiHeadLoopStateTA
(contexts, head_weights)¶ Bases:
tuple
-
contexts
¶ Alias for field number 0
-
head_weights
¶ Alias for field number 1
-
-
class
neuralmonkey.attention.scaled_dot_product.
ScaledDotProdAttention
(name: str, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None¶ Bases:
neuralmonkey.attention.scaled_dot_product.MultiHeadAttention