neuralmonkey.attention.combination module¶
Attention combination strategies.
This module implements attention combination strategies for the multi-encoder scenario, in which we may want to combine the hidden states of the encoders in a more complicated fashion.
Currently there are two attention combination strategies, flat and hierarchical (see paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).
The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to the encoders at all and instead to extract information from its own hidden state (see paper Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).
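The effect of the sentinel can be sketched as follows: the sentinel vector is appended to the attendable states, so the softmax may assign mass to it instead of to the encoders. This is a minimal NumPy illustration with dot-product energies; all names are hypothetical and the real implementation builds a TensorFlow graph.

```python
import numpy as np

def attend_with_sentinel(query, states, sentinel_vec):
    """Dot-product attention over encoder states plus a sentinel.

    The sentinel is appended as one extra attendable state; the softmax
    mass it receives is how much the decoder "looks away" from the
    encoders and relies on its own hidden state instead.
    """
    all_states = np.vstack([states, sentinel_vec])  # (T + 1, d)
    scores = all_states @ query                     # (T + 1,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                            # attention distribution
    context = alpha @ all_states                    # weighted sum of states
    return context, alpha[-1]                       # alpha[-1]: sentinel weight
```

The returned sentinel weight can be inspected to see how much the decoder chose not to attend to the encoders at a given step.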
-
class
neuralmonkey.attention.combination.
FlatMultiAttention
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Flat attention combination strategy.
Using this attention combination strategy, the hidden states of the encoders are first projected to a common space (with a different projection for each encoder), and a joint distribution is computed over all the hidden states. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.
See equations 8 to 10 in the Attention Combination Strategies paper.
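The flat strategy can be sketched in NumPy as follows. This is a simplified illustration, not the library's implementation: it uses dot-product energies in place of the paper's MLP scorer, and all parameter names are hypothetical.

```python
import numpy as np

def flat_attention(query, encoder_states, proj_mats, ctx_mats):
    """Flat combination sketch (cf. eqs. 8 to 10).

    All hidden states of all encoders are projected into one shared
    attention space, a single softmax is taken jointly over all of
    them, and the context vector is a weighted sum of a second
    projection of the same states.
    """
    energies, projected = [], []
    for states, W, U in zip(encoder_states, proj_mats, ctx_mats):
        for h in states:
            energies.append(query @ (W @ h))  # joint energy over every state
            projected.append(U @ h)           # second projection for the sum
    alpha = np.exp(np.array(energies) - max(energies))
    alpha /= alpha.sum()                      # one distribution over all states
    return sum(a * p for a, p in zip(alpha, projected))
```

Note that the per-encoder projection matrices let encoders with different hidden-state sizes share one joint distribution, which is the point of projecting to a common space first.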
-
__init__
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.AttentionLoopState]¶ Get context vector for given decoder state.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
get_encoder_projections
(scope)¶
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.AttentionLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
-
class
neuralmonkey.attention.combination.
HierarchicalMultiAttention
(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Hierarchical attention combination.
The hierarchical attention combination strategy first computes a context vector for each encoder separately, using whatever attention type that encoder has. It then computes a second attention over the resulting context vectors and, optionally, the sentinel vector.
See equations 6 and 7 in the Attention Combination Strategies paper.
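The second-level attention can be sketched as follows, assuming the child attentions have already produced one context vector per encoder. This is a simplified NumPy illustration of the equations (parameter names hypothetical), not the library's TensorFlow implementation.

```python
import numpy as np

def hierarchical_combination(query, contexts, Wq, Wc_list, v, U_list):
    """Second-level attention over per-encoder contexts (cf. eqs. 6 and 7).

    Scores each child context with an MLP energy, takes a softmax
    over the encoders, and returns a weighted sum of projected
    contexts as the final context vector.
    """
    energies = np.array([
        v @ np.tanh(Wq @ query + Wc @ c)      # MLP energy per encoder
        for Wc, c in zip(Wc_list, contexts)
    ])
    beta = np.exp(energies - energies.max())
    beta /= beta.sum()                        # distribution over encoders
    return sum(b * (U @ c) for b, U, c in zip(beta, U_list, contexts))
```

As in the flat sketch, the per-encoder projections allow child contexts of different sizes to be combined into one output vector.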
-
__init__
(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.HierarchicalLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.HierarchicalLoopState]¶ Get context vector for given decoder state.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: Any) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.HierarchicalLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
-
class
neuralmonkey.attention.combination.
MultiAttention
(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
Base class for attention combination.
-
__init__
(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: Any) → Tuple[tensorflow.python.framework.ops.Tensor, Any]¶ Get context vector for given decoder state.
-
attn_size
¶
-