neuralmonkey.attention.combination module

Attention combination strategies.

This module implements attention combination strategies for the multi-encoder scenario, where we may want to combine the hidden states of the encoders in a more complicated fashion.

Currently there are two attention combination strategies: flat and hierarchical (see the paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).

The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to the encoders and instead extract information from its own hidden state (see the paper Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).

class neuralmonkey.attention.combination.FlatMultiAttention(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Flat attention combination strategy.

Using this attention combination strategy, the hidden states of the encoders are first projected to a common space (with a different projection for each encoder), and a joint attention distribution is computed over all of the projected hidden states. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.

See equations 8 to 10 in the Attention Combination Strategies paper.
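In rough notation (the symbols below only loosely follow the paper and may differ from it in detail), the flat strategy scores every hidden state of every encoder against the decoder state s_i, normalizes over all of them jointly, and sums a second projection of the states:

    e_{ij}^{(k)} = v_a^\top \tanh\left( W_a s_i + U_a^{(k)} h_j^{(k)} \right)

    \alpha_{ij}^{(k)} = \frac{\exp\left( e_{ij}^{(k)} \right)}{\sum_{n=1}^{N} \sum_{m} \exp\left( e_{im}^{(n)} \right)}

    c_i = \sum_{k=1}^{N} \sum_{j} \alpha_{ij}^{(k)} \, U_c^{(k)} h_j^{(k)}

Here h_j^{(k)} is the j-th hidden state of the k-th of N encoders, U_a^{(k)} and U_c^{(k)} are the encoder-specific projections, and c_i is the resulting context vector. With use_sentinels=True, the sentinel vector enters the sums as one additional hidden state.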

__init__(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create a new BaseAttention object.

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.AttentionLoopState]

Get context vector for given decoder state.

context_vector_size

Return the static size of the context vector.

Returns: An integer specifying the context vector dimension.
finalize_loop(key: str, last_loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → None

Store the attention histories from loop state under a given key.

Parameters:
  • key – The key to the histories dictionary to store the data in.
  • last_loop_state – The loop state object from the last state of the decoding loop.
get_encoder_projections(scope)
initial_loop_state() → neuralmonkey.attention.namedtuples.AttentionLoopState

Get initial loop state for the attention object.

Returns: The newly created initial loop state object.
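As an illustration only, the sketch below constructs a flat combination over two encoders using the constructor documented above. The encoder objects enc_de and enc_fr are assumed to be already-built instances implementing TemporalStateful (or SpatialStateful); the names and parameter values are purely illustrative.

    from neuralmonkey.attention.combination import FlatMultiAttention

    # enc_de, enc_fr: pre-built encoder objects (assumed to exist),
    # e.g. recurrent sentence encoders over two source languages.
    flat_attention = FlatMultiAttention(
        name="attention_flat",
        encoders=[enc_de, enc_fr],
        attention_state_size=256,
        share_attn_projections=False,
        use_sentinels=True)

During decoding, the decoder then calls the attention() method of this object at every step to obtain the joint context vector.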
class neuralmonkey.attention.combination.HierarchicalMultiAttention(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.attention.combination.MultiAttention

Hierarchical attention combination.

The hierarchical attention combination strategy first computes a context vector for each encoder separately, using whatever attention type each encoder has. After that, it computes a second attention over the resulting context vectors and, optionally, the sentinel vector.

See equations 6 and 7 in the Attention Combination Strategies paper.
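Roughly (again, the notation only loosely follows the paper), given the per-encoder context vectors c_i^{(k)} produced by the child attention objects, the second-level attention is:

    e_i^{(k)} = v_b^\top \tanh\left( W_b s_i + U_b^{(k)} c_i^{(k)} \right)

    \beta_i^{(k)} = \frac{\exp\left( e_i^{(k)} \right)}{\sum_{n=1}^{N} \exp\left( e_i^{(n)} \right)}

    c_i = \sum_{k=1}^{N} \beta_i^{(k)} \, U_c^{(k)} c_i^{(k)}

With use_sentinels=True, the sentinel vector is scored as one more candidate alongside the per-encoder contexts.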

__init__(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create a new BaseAttention object.

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.HierarchicalLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.HierarchicalLoopState]

Get context vector for given decoder state.

context_vector_size

Return the static size of the context vector.

Returns: An integer specifying the context vector dimension.
finalize_loop(key: str, last_loop_state: Any) → None

Store the attention histories from loop state under a given key.

Parameters:
  • key – The key to the histories dictionary to store the data in.
  • last_loop_state – The loop state object from the last state of the decoding loop.
initial_loop_state() → neuralmonkey.attention.namedtuples.HierarchicalLoopState

Get initial loop state for the attention object.

Returns: The newly created initial loop state object.
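For comparison with the flat sketch above, a hierarchical combination wraps per-encoder attention objects rather than the encoders themselves. In this sketch, attn_de and attn_fr are assumed to be already-built BaseAttention instances (one per encoder); the values are again illustrative.

    from neuralmonkey.attention.combination import HierarchicalMultiAttention

    # attn_de, attn_fr: pre-built BaseAttention objects (assumed to
    # exist), each computing its own context vector over one encoder.
    hier_attention = HierarchicalMultiAttention(
        name="attention_hier",
        attentions=[attn_de, attn_fr],
        attention_state_size=256,
        use_sentinels=True,
        share_attn_projections=False)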
class neuralmonkey.attention.combination.MultiAttention(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.attention.base_attention.BaseAttention

Base class for attention combination.

__init__(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create a new BaseAttention object.

attention(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: Any) → Tuple[tensorflow.python.framework.ops.Tensor, Any]

Get context vector for given decoder state.

attn_size