neuralmonkey.attention package

Submodules

neuralmonkey.attention.base_attention module

Decoding functions using multiple attentions for RNN decoders.

See http://arxiv.org/abs/1606.07481

The attention mechanisms used in Neural Monkey inherit from the BaseAttention class defined in this module.

Each attention object has an attention function which operates on the attention_states tensor. The attention function receives the query tensor, the decoder's previous state and input, and its inner state, which can have an arbitrary structure. The default structure is the AttentionLoopState, which contains growing arrays of attention distributions and context vectors over time; this is why the BaseAttention class also provides the initial_loop_state function, which supplies the initial value of this state.

Mainly for illustration purposes, attention objects can keep their histories: a dictionary populated with attention distributions over time for every decoder that used the attention object. This is because, for example, the recurrent decoder can be run twice for each sentence: once in training mode, where the decoder receives the reference tokens on its input, and once in running mode, where it receives its own outputs. The histories object is constructed after decoding, and its construction must be triggered manually from the decoder by calling the finalize_loop method.
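For illustration, the following sketch (toy NumPy code, not part of Neural Monkey; the ToyAttention class and all variable names are invented) shows the calling protocol implied by this interface: the decoder obtains an initial loop state, calls attention once per decoding step, and hands the last loop state back through finalize_loop so that the histories dictionary gets populated.

    # Toy sketch of the BaseAttention calling protocol; only the method names
    # and the order of the calls follow the description above.
    import numpy as np

    class ToyAttention:
        def __init__(self, attention_states):
            self.attention_states = attention_states  # shape (enc_len, state_size)
            self.histories = {}

        def initial_loop_state(self):
            return []  # here, simply a list of attention distributions

        def attention(self, query, dec_prev_state, dec_input, loop_state, step):
            energies = self.attention_states @ query             # dot-product energies
            weights = np.exp(energies) / np.exp(energies).sum()  # attention distribution
            context = weights @ self.attention_states            # context vector
            return context, loop_state + [weights]

        def finalize_loop(self, key, last_loop_state):
            self.histories[key] = np.stack(last_loop_state)      # (steps, enc_len)

    attn = ToyAttention(np.random.rand(7, 4))
    loop_state = attn.initial_loop_state()
    for step in range(3):                        # three decoding steps
        query = np.random.rand(4)
        context, loop_state = attn.attention(query, None, None, loop_state, step)
    attn.finalize_loop("decoder_train", loop_state)
    print(attn.histories["decoder_train"].shape)  # (3, 7)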

class neuralmonkey.attention.base_attention.AttentionLoopState(contexts, weights)

Bases: tuple

contexts

Alias for field number 0

weights

Alias for field number 1

class neuralmonkey.attention.base_attention.BaseAttention(name: str, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.model.model_part.ModelPart

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]

Get context vector for a given query.

context_vector_size
feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → typing.Dict[tensorflow.python.framework.ops.Tensor, typing.Any]
finalize_loop(key: str, last_loop_state: typing.Any) → None
histories
initial_loop_state() → typing.Any

Get initial loop state for the attention object.

visualize_attention(key: str) → None
neuralmonkey.attention.base_attention.empty_attention_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState

Create an empty attention loop state.

The attention loop state is a technical object for storing the attention distributions and the context vectors over time. It is used with the tf.while_loop dynamic implementation of the decoder.

This function returns an empty attention loop state, i.e. two empty arrays: one for the attention distributions over time and one for the attention context vectors over time.
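As a hedged sketch of what such an empty loop state amounts to (illustrative, not the library implementation), one can picture a named tuple holding two dynamically sized, zero-length TensorArrays that the decoder's tf.while_loop body writes into at every step:

    import tensorflow as tf
    from collections import namedtuple

    # Illustrative stand-in for the AttentionLoopState named tuple.
    AttentionLoopState = namedtuple("AttentionLoopState", ["contexts", "weights"])

    empty_state = AttentionLoopState(
        contexts=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True),
        weights=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True))

    # Inside the decoder's tf.while_loop body, step i would extend the state
    # roughly like this:
    #   new_state = AttentionLoopState(
    #       contexts=loop_state.contexts.write(i, context_vector),
    #       weights=loop_state.weights.write(i, attention_distribution))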

neuralmonkey.attention.base_attention.get_attention_mask(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → typing.Union[tensorflow.python.framework.ops.Tensor, NoneType]
neuralmonkey.attention.base_attention.get_attention_states(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → tensorflow.python.framework.ops.Tensor

neuralmonkey.attention.combination module

Attention combination strategies.

This module implements attention combination strategies for the multi-encoder scenario, in which we may want to combine the hidden states of the encoders in a more complicated fashion.

Currently, there are two attention combination strategies, flat and hierarchical (see the paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).

The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to the encoders and instead extract information from its own hidden state (see the paper Knowing when to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).

class neuralmonkey.attention.combination.FlatMultiAttention(name: str, encoders: typing.List[typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Flat attention combination strategy.

Using this attention combination strategy, the hidden states of the encoders are first projected into a common space (with a different projection for each encoder), and a joint distribution over all the hidden states is computed. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.

See equations 8 to 10 in the Attention Combination Strategies paper.
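The following NumPy sketch illustrates the flow of the flat strategy (shapes, variable names and the random stand-in projection matrices are assumptions; bias terms and the sentinel are omitted):

    import numpy as np

    def flat_attention(query, encoder_states, attn_size):
        # encoder_states: list of arrays of shape (len_i, state_size_i)
        rng = np.random.default_rng(0)
        energies, value_projections = [], []
        for states in encoder_states:
            # a different projection for each encoder into the shared space
            w_energy = rng.normal(size=(states.shape[1], attn_size))
            w_value = rng.normal(size=(states.shape[1], attn_size))
            hidden = np.tanh(states @ w_energy + query)            # (len_i, attn_size)
            energies.append(hidden @ rng.normal(size=attn_size))   # (len_i,)
            value_projections.append(states @ w_value)             # (len_i, attn_size)
        # joint distribution over all hidden states of all encoders
        joint = np.concatenate(energies)
        weights = np.exp(joint) / np.exp(joint).sum()
        values = np.concatenate(value_projections, axis=0)
        return weights @ values                                    # context vector

    context = flat_attention(np.zeros(8),
                             [np.random.rand(5, 6), np.random.rand(3, 4)], 8)
    print(context.shape)  # (8,)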

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None
get_encoder_projections(scope)
initial_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState
class neuralmonkey.attention.combination.HierarchicalLoopState(child_loop_states, loop_state)

Bases: tuple

child_loop_states

Alias for field number 0

loop_state

Alias for field number 1

class neuralmonkey.attention.combination.HierarchicalMultiAttention(name: str, attentions: typing.List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Hierarchical attention combination.

The hierarchical attention combination strategy first computes a context vector for each encoder separately, using whatever attention type each encoder has. After that, it computes a second attention over the resulting context vectors and, optionally, the sentinel vector.

See equations 6 and 7 in the Attention Combination Strategies paper.
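An illustrative NumPy sketch of the second-level attention (the child contexts would come from the child attentions' own attention calls; the projection matrices here are random stand-ins and the sentinel is omitted):

    import numpy as np

    def hierarchical_attention(query, child_contexts, attn_size):
        # child_contexts: list of per-encoder context vectors of equal size
        rng = np.random.default_rng(0)
        stacked = np.stack(child_contexts)                  # (n_children, ctx_size)
        w = rng.normal(size=(stacked.shape[1], attn_size))
        energies = np.tanh(stacked @ w + query) @ rng.normal(size=attn_size)
        weights = np.exp(energies) / np.exp(energies).sum()
        return weights @ stacked                            # final context vector

    context = hierarchical_attention(np.zeros(8),
                                     [np.random.rand(6), np.random.rand(6)], 8)
    print(context.shape)  # (6,)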

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.combination.HierarchicalLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.combination.HierarchicalLoopState]
context_vector_size
finalize_loop(key: str, last_loop_state: typing.Any) → None
initial_loop_state() → neuralmonkey.attention.combination.HierarchicalLoopState
class neuralmonkey.attention.combination.MultiAttention(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

Base class for attention combination.

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]

Get context vector for given decoder state.

attn_size

neuralmonkey.attention.coverage module

Coverage attention introduced in Tu et al. (2016).

See arxiv.org/abs/1601.04811

The CoverageAttention class inherits from the basic feed-forward attention introduced by Bahdanau et al. (2015).
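The gist of the coverage extension, sketched in NumPy (illustrative only; the projections and weights below are random stand-ins, and the fertility model from the paper is left out): the feed-forward energies receive an extra term computed from the attention weights accumulated over the previous decoding steps, which discourages re-attending to already covered source positions.

    import numpy as np

    def coverage_energies(projected_query, projected_keys, past_weights, v, w_cov):
        # past_weights: (steps_so_far, enc_len); its column sums approximate
        # how much attention each source position has received so far
        coverage = past_weights.sum(axis=0, keepdims=True).T        # (enc_len, 1)
        return np.tanh(projected_keys + projected_query + coverage @ w_cov) @ v

    enc_len, attn_size = 5, 8
    rng = np.random.default_rng(0)
    energies = coverage_energies(rng.normal(size=attn_size),
                                 rng.normal(size=(enc_len, attn_size)),
                                 np.abs(rng.normal(size=(3, enc_len))),
                                 rng.normal(size=attn_size),
                                 rng.normal(size=(1, attn_size)))
    print(energies.shape)  # (5,)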

class neuralmonkey.attention.coverage.CoverageAttention(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, max_fertility: int = 5, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.feed_forward.Attention

get_energies(y: tensorflow.python.framework.ops.Tensor, weights_in_time: tensorflow.python.ops.tensor_array_ops.TensorArray)

neuralmonkey.attention.feed_forward module

The feed-forward attention mechanism.

This is the attention mechanism used in Bahdanau et al. (2015).

See arxiv.org/abs/1409.0473
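A minimal NumPy sketch of the feed-forward scoring function (names and shapes are illustrative; the real class uses trained projection matrices and the bias terms listed below): the energies come from a one-layer network over the projected query and projected encoder states, a softmax turns them into the attention distribution, and the context vector is the weighted sum of the encoder states.

    import numpy as np

    def feed_forward_attention(query, encoder_states, state_size):
        rng = np.random.default_rng(0)
        w_query = rng.normal(size=(query.shape[0], state_size))
        w_keys = rng.normal(size=(encoder_states.shape[1], state_size))
        v = rng.normal(size=state_size)
        energies = np.tanh(query @ w_query + encoder_states @ w_keys) @ v
        weights = np.exp(energies) / np.exp(energies).sum()  # attention distribution
        context = weights @ encoder_states                    # context vector
        return context, weights

    context, weights = feed_forward_attention(
        np.random.rand(10), np.random.rand(7, 6), state_size=8)
    print(context.shape, weights.shape)  # (6,) (7,)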

class neuralmonkey.attention.feed_forward.Attention(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]
attention_mask
attention_states
bias_term
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None
get_energies(y, _)
hidden_features
initial_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState
key_projection_matrix
projection_bias_vector
query_projection_matrix
similarity_bias_vector
state_size

neuralmonkey.attention.scaled_dot_product module

The scaled dot-product attention mechanism defined in Vaswani et al. (2017).

The attention energies are computed as dot products between the query vector and the key vectors. The query vector is scaled down by the square root of its dimensionality. This attention function has no trainable parameters.

See arxiv.org/abs/1706.03762
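The computation is simple enough to sketch directly in NumPy (the inputs below are random placeholders; this single-head sketch ignores the head splitting done by MultiHeadAttention):

    import numpy as np

    def scaled_dot_product_attention(query, keys, values):
        energies = keys @ query / np.sqrt(query.shape[0])     # scaled dot products
        weights = np.exp(energies) / np.exp(energies).sum()   # attention distribution
        return weights @ values                               # context vector

    query = np.random.rand(16)
    keys = np.random.rand(9, 16)
    values = np.random.rand(9, 16)
    print(scaled_dot_product_attention(query, keys, values).shape)  # (16,)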

class neuralmonkey.attention.scaled_dot_product.MultiHeadAttention(name: str, n_heads: int, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA]
attention_single_head(query: tensorflow.python.framework.ops.Tensor, keys: tensorflow.python.framework.ops.Tensor, values: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA) → None
initial_loop_state() → neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA
visualize_attention(key: str) → None
class neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA(contexts, head_weights)

Bases: tuple

contexts

Alias for field number 0

head_weights

Alias for field number 1

class neuralmonkey.attention.scaled_dot_product.ScaledDotProdAttention(name: str, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.scaled_dot_product.MultiHeadAttention

Module contents