neuralmonkey.attention package

Submodules

neuralmonkey.attention.base_attention module

Decoding functions using multiple attentions for RNN decoders.

See http://arxiv.org/abs/1606.07481

The attention mechanisms used in Neural Monkey inherit from the BaseAttention class defined in this module.

Each attention object has an attention function which operates on the attention_states tensor. The attention function receives the query tensor, the decoder's previous state and input, and its inner state, which can have an arbitrary structure. The default structure is the AttentionLoopState, which contains growing arrays of attention distributions and context vectors over time; this is why the BaseAttention class also provides the initial_loop_state function, which supplies the initial value of this state.

Mainly for illustration purposes, attention objects can keep their histories: a dictionary populated with attention distributions over time for every decoder that used the attention object. This is because, for example, the recurrent decoder can be run twice for each sentence: once in training mode, where the decoder receives the reference tokens on its input, and once in running mode, where it receives its own outputs. The histories object is constructed after decoding, and its construction must be triggered manually from the decoder by calling the finalize_loop method.
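For illustration, the following sketch (toy NumPy code, not part of Neural Monkey; the ToyAttention class and all variable names are invented) shows the calling protocol implied by this interface: the decoder obtains an initial loop state, calls attention once per decoding step, and hands the last loop state back through finalize_loop so that the histories dictionary gets populated.

    # Toy sketch of the BaseAttention calling protocol; only the method names
    # and the order of the calls follow the description above.
    import numpy as np

    class ToyAttention:
        def __init__(self, attention_states):
            self.attention_states = attention_states  # shape (enc_len, state_size)
            self.histories = {}

        def initial_loop_state(self):
            return []  # here, simply a list of attention distributions

        def attention(self, query, dec_prev_state, dec_input, loop_state, step):
            energies = self.attention_states @ query             # dot-product energies
            weights = np.exp(energies) / np.exp(energies).sum()  # attention distribution
            context = weights @ self.attention_states            # context vector
            return context, loop_state + [weights]

        def finalize_loop(self, key, last_loop_state):
            self.histories[key] = np.stack(last_loop_state)      # (steps, enc_len)

    attn = ToyAttention(np.random.rand(7, 4))
    loop_state = attn.initial_loop_state()
    for step in range(3):                        # three decoding steps
        query = np.random.rand(4)
        context, loop_state = attn.attention(query, None, None, loop_state, step)
    attn.finalize_loop("decoder_train", loop_state)
    print(attn.histories["decoder_train"].shape)  # (3, 7)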

class neuralmonkey.attention.base_attention.AttentionLoopState(contexts, weights)

Bases: tuple

contexts

Alias for field number 0

weights

Alias for field number 1

class neuralmonkey.attention.base_attention.BaseAttention(name: str, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.model.model_part.ModelPart

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]

Get context vector for a given query.

context_vector_size
feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → typing.Dict[tensorflow.python.framework.ops.Tensor, typing.Any]
finalize_loop(key: str, last_loop_state: typing.Any) → None
histories
initial_loop_state() → typing.Any

Get initial loop state for the attention object.

visualize_attention(key: str) → None
neuralmonkey.attention.base_attention.empty_attention_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState

Create an empty attention loop state.

The attention loop state is a technical object for storing the attention distributions and the context vectors over time. It is used with the tf.while_loop dynamic implementation of the decoder.

This function returns an empty attention loop state, i.e. two empty arrays: one for the attention distributions over time and one for the attention context vectors over time.
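As a hedged sketch of what such an empty loop state amounts to (illustrative, not the library implementation), one can picture a named tuple holding two dynamically sized, zero-length TensorArrays that the decoder's tf.while_loop body writes into at every step:

    import tensorflow as tf
    from collections import namedtuple

    # Illustrative stand-in for the AttentionLoopState named tuple.
    AttentionLoopState = namedtuple("AttentionLoopState", ["contexts", "weights"])

    empty_state = AttentionLoopState(
        contexts=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True),
        weights=tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True))

    # Inside the decoder's tf.while_loop body, step i would extend the state
    # roughly like this:
    #   new_state = AttentionLoopState(
    #       contexts=loop_state.contexts.write(i, context_vector),
    #       weights=loop_state.weights.write(i, attention_distribution))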

neuralmonkey.attention.base_attention.get_attention_mask(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → typing.Union[tensorflow.python.framework.ops.Tensor, NoneType]
neuralmonkey.attention.base_attention.get_attention_states(encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → tensorflow.python.framework.ops.Tensor

neuralmonkey.attention.combination module

Attention combination strategies.

This module implements attention combination strategies for the multi-encoder scenario, in which we may want to combine the hidden states of the encoders in a more complicated fashion.

Currently, there are two attention combination strategies, flat and hierarchical (see the paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).

The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to the encoders and instead extract information from its own hidden state (see the paper Knowing when to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).

class neuralmonkey.attention.combination.FlatMultiAttention(name: str, encoders: typing.List[typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Flat attention combination strategy.

Using this attention combination strategy, the hidden states of the encoders are first projected into a common space (with a different projection for each encoder), and a joint distribution over all the hidden states is computed. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.

See equations 8 to 10 in the Attention Combination Strategies paper.
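The following NumPy sketch illustrates the flow of the flat strategy (shapes, variable names and the random stand-in projection matrices are assumptions; bias terms and the sentinel are omitted):

    import numpy as np

    def flat_attention(query, encoder_states, attn_size):
        # encoder_states: list of arrays of shape (len_i, state_size_i)
        rng = np.random.default_rng(0)
        energies, value_projections = [], []
        for states in encoder_states:
            # a different projection for each encoder into the shared space
            w_energy = rng.normal(size=(states.shape[1], attn_size))
            w_value = rng.normal(size=(states.shape[1], attn_size))
            hidden = np.tanh(states @ w_energy + query)            # (len_i, attn_size)
            energies.append(hidden @ rng.normal(size=attn_size))   # (len_i,)
            value_projections.append(states @ w_value)             # (len_i, attn_size)
        # joint distribution over all hidden states of all encoders
        joint = np.concatenate(energies)
        weights = np.exp(joint) / np.exp(joint).sum()
        values = np.concatenate(value_projections, axis=0)
        return weights @ values                                    # context vector

    context = flat_attention(np.zeros(8),
                             [np.random.rand(5, 6), np.random.rand(3, 4)], 8)
    print(context.shape)  # (8,)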

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None
get_encoder_projections(scope)
initial_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState
class neuralmonkey.attention.combination.HierarchicalLoopState(child_loop_states, loop_state)

Bases: tuple

child_loop_states

Alias for field number 0

loop_state

Alias for field number 1

class neuralmonkey.attention.combination.HierarchicalMultiAttention(name: str, attentions: typing.List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Hierarchical attention combination.

The hierarchical attention combination strategy first computes a context vector for each encoder separately, using whatever attention type each encoder has. After that, it computes a second attention over the resulting context vectors and, optionally, the sentinel vector.

See equations 6 and 7 in the Attention Combination Strategies paper.
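An illustrative NumPy sketch of the second-level attention (the child contexts would come from the child attentions' own attention calls; the projection matrices here are random stand-ins and the sentinel is omitted):

    import numpy as np

    def hierarchical_attention(query, child_contexts, attn_size):
        # child_contexts: list of per-encoder context vectors of equal size
        rng = np.random.default_rng(0)
        stacked = np.stack(child_contexts)                  # (n_children, ctx_size)
        w = rng.normal(size=(stacked.shape[1], attn_size))
        energies = np.tanh(stacked @ w + query) @ rng.normal(size=attn_size)
        weights = np.exp(energies) / np.exp(energies).sum()
        return weights @ stacked                            # final context vector

    context = hierarchical_attention(np.zeros(8),
                                     [np.random.rand(6), np.random.rand(6)], 8)
    print(context.shape)  # (6,)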

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.combination.HierarchicalLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.combination.HierarchicalLoopState]
context_vector_size
finalize_loop(key: str, last_loop_state: typing.Any) → None
initial_loop_state() → neuralmonkey.attention.combination.HierarchicalLoopState
class neuralmonkey.attention.combination.MultiAttention(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

Base class for attention combination.

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: typing.Any, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, typing.Any]

Get context vector for given decoder state.

attn_size

neuralmonkey.attention.coverage module

Coverage attention introduced in Tu et al. (2016).

See arxiv.org/abs/1601.04811

The CoverageAttention class inherits from the basic feed-forward attention introduced by Bahdanau et al. (2015).
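The gist of the coverage extension, sketched in NumPy (illustrative only; the projections and weights below are random stand-ins, and the fertility model from the paper is left out): the feed-forward energies receive an extra term computed from the attention weights accumulated over the previous decoding steps, which discourages re-attending to already covered source positions.

    import numpy as np

    def coverage_energies(projected_query, projected_keys, past_weights, v, w_cov):
        # past_weights: (steps_so_far, enc_len); its column sums approximate
        # how much attention each source position has received so far
        coverage = past_weights.sum(axis=0, keepdims=True).T        # (enc_len, 1)
        return np.tanh(projected_keys + projected_query + coverage @ w_cov) @ v

    enc_len, attn_size = 5, 8
    rng = np.random.default_rng(0)
    energies = coverage_energies(rng.normal(size=attn_size),
                                 rng.normal(size=(enc_len, attn_size)),
                                 np.abs(rng.normal(size=(3, enc_len))),
                                 rng.normal(size=attn_size),
                                 rng.normal(size=(1, attn_size)))
    print(energies.shape)  # (5,)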

class neuralmonkey.attention.coverage.CoverageAttention(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, max_fertility: int = 5, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.feed_forward.Attention

get_energies(y: tensorflow.python.framework.ops.Tensor, weights_in_time: tensorflow.python.ops.tensor_array_ops.TensorArray)

neuralmonkey.attention.feed_forward module

The feed-forward attention mechanism.

This is the attention mechanism used in Bahdanau et al. (2015).

See arxiv.org/abs/1409.0473
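A minimal NumPy sketch of the feed-forward scoring function (names and shapes are illustrative; the real class uses trained projection matrices and the bias terms listed below): the energies come from a one-layer network over the projected query and projected encoder states, a softmax turns them into the attention distribution, and the context vector is the weighted sum of the encoder states.

    import numpy as np

    def feed_forward_attention(query, encoder_states, state_size):
        rng = np.random.default_rng(0)
        w_query = rng.normal(size=(query.shape[0], state_size))
        w_keys = rng.normal(size=(encoder_states.shape[1], state_size))
        v = rng.normal(size=state_size)
        energies = np.tanh(query @ w_query + encoder_states @ w_keys) @ v
        weights = np.exp(energies) / np.exp(energies).sum()  # attention distribution
        context = weights @ encoder_states                    # context vector
        return context, weights

    context, weights = feed_forward_attention(
        np.random.rand(10), np.random.rand(7, 6), state_size=8)
    print(context.shape, weights.shape)  # (6,) (7,)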

class neuralmonkey.attention.feed_forward.Attention(name: str, encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.base_attention.AttentionLoopState, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.base_attention.AttentionLoopState]
attention_mask
attention_states
bias_term
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.base_attention.AttentionLoopState) → None
get_energies(y, _)
hidden_features
initial_loop_state() → neuralmonkey.attention.base_attention.AttentionLoopState
key_projection_matrix
projection_bias_vector
query_projection_matrix
similarity_bias_vector
state_size

neuralmonkey.attention.scaled_dot_product module

The scaled dot-product attention mechanism defined in Vaswani et al. (2017).

The attention energies are computed as dot products between the query vector and the key vectors. The query vector is scaled down by the square root of its dimensionality. This attention function has no trainable parameters.

See arxiv.org/abs/1706.03762
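The computation is simple enough to sketch directly in NumPy (the inputs below are random placeholders; this single-head sketch ignores the head splitting done by MultiHeadAttention):

    import numpy as np

    def scaled_dot_product_attention(query, keys, values):
        energies = keys @ query / np.sqrt(query.shape[0])     # scaled dot products
        weights = np.exp(energies) / np.exp(energies).sum()   # attention distribution
        return weights @ values                               # context vector

    query = np.random.rand(16)
    keys = np.random.rand(9, 16)
    values = np.random.rand(9, 16)
    print(scaled_dot_product_attention(query, keys, values).shape)  # (16,)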

class neuralmonkey.attention.scaled_dot_product.MultiHeadAttention(name: str, n_heads: int, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA, step: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA]
attention_single_head(query: tensorflow.python.framework.ops.Tensor, keys: tensorflow.python.framework.ops.Tensor, values: tensorflow.python.framework.ops.Tensor) → typing.Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
context_vector_size
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA) → None
initial_loop_state() → neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA
visualize_attention(key: str) → None
class neuralmonkey.attention.scaled_dot_product.MultiHeadLoopStateTA(contexts, head_weights)

Bases: tuple

contexts

Alias for field number 0

head_weights

Alias for field number 1

class neuralmonkey.attention.scaled_dot_product.ScaledDotProdAttention(name: str, keys_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: typing.Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, save_checkpoint: str = None, load_checkpoint: str = None) → None

Bases: neuralmonkey.attention.scaled_dot_product.MultiHeadAttention

Module contents