neuralmonkey.attention.transformer_cross_layer module
Input combination strategies for multi-source Transformer decoder.
neuralmonkey.attention.transformer_cross_layer.flat(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with flat input combination.
The procedure is as follows:
1. concatenate the states and masks along the time axis
2. run attention over the concatenation
Parameters:
- queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use in the attention over the concatenated encoder states.
- attention_dropout_callback – The dropout function to apply inside the attention.
- dropout_callback – The dropout function to apply on the output of the attention.
Returns: A Tensor that contains the context vector.
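The two steps above can be sketched in NumPy. This is a toy, unbatched, single-head dot-product attention; the helpers `softmax` and `attend` and the sample shapes are illustrative assumptions, not Neural Monkey's multi-head TensorFlow implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, states, mask):
    # scaled dot-product attention; mask is 1.0 for valid time steps
    logits = queries @ states.T / np.sqrt(states.shape[-1])
    logits = np.where(mask[None, :] > 0, logits, -1e9)
    return softmax(logits) @ states

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))                  # (decoder steps, dim)
encoder_states = [rng.normal(size=(5, 8)), rng.normal(size=(3, 8))]
encoder_masks = [np.ones(5), np.array([1.0, 1.0, 0.0])]

# 1. concatenate the states and masks along the time axis
states_all = np.concatenate(encoder_states, axis=0)   # (5 + 3, dim)
mask_all = np.concatenate(encoder_masks, axis=0)

# 2. run attention over the concatenation
context = attend(queries, states_all, mask_all)
print(context.shape)  # (4, 8)
```

Because the masks are concatenated along with the states, padded positions of every encoder are excluded from the single joint attention distribution.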
neuralmonkey.attention.transformer_cross_layer.hierarchical(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], heads_hier: int, attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with hierarchical input combination.
The procedure is as follows:
1. normalize queries
2. attend to every encoder
3. attend to the resulting context vectors (reuse normalized queries)
4. apply dropout, add residual connection and return
Parameters:
- queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- heads_hier – Number of attention heads to use in the second attention.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply in the second attention and over the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
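The four steps can be sketched in NumPy. This is a hedged toy version (single-head, unbatched, identity dropout); the stacking of per-encoder contexts into a second "time" axis is my reading of the hierarchical strategy, and the helper names are illustrative, not Neural Monkey's API:

```python
import numpy as np

def lnorm(x, eps=1e-6):
    # layer normalization over the feature axis (no learned gain/bias)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, states, mask):
    logits = queries @ states.T / np.sqrt(states.shape[-1])
    logits = np.where(mask[None, :] > 0, logits, -1e9)
    return softmax(logits) @ states

dropout = lambda x: x  # identity stands in for the dropout callbacks

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
encoder_states = [rng.normal(size=(5, 8)), rng.normal(size=(3, 8))]
encoder_masks = [np.ones(5), np.ones(3)]

# 1. normalize queries
norm_q = lnorm(queries)

# 2. attend to every encoder
contexts = [attend(norm_q, s, m)
            for s, m in zip(encoder_states, encoder_masks)]

# 3. attend to the resulting context vectors (reuse normalized queries);
#    each decoder step attends over its own per-encoder contexts
stacked = np.stack(contexts, axis=1)               # (steps, encoders, dim)
logits = np.einsum("sd,sed->se", norm_q, stacked) / np.sqrt(stacked.shape[-1])
weights = softmax(logits)
hier_context = np.einsum("se,sed->sd", weights, stacked)

# 4. apply dropout, add residual connection and return
output = queries + dropout(hier_context)
print(output.shape)  # (4, 8)
```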
neuralmonkey.attention.transformer_cross_layer.parallel(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with parallel input combination.
The procedure is as follows:
1. normalize queries
2. attend and dropout independently for every encoder
3. sum up the results
4. add residual and return
Parameters:
- queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
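The parallel steps can be sketched in NumPy (again a toy single-head, unbatched attention with identity dropout; helper names are illustrative assumptions, not part of Neural Monkey):

```python
import numpy as np

def lnorm(x, eps=1e-6):
    # layer normalization over the feature axis (no learned gain/bias)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, states, mask):
    # scaled dot-product attention; mask is 1.0 for valid time steps
    logits = queries @ states.T / np.sqrt(states.shape[-1])
    logits = np.where(mask[None, :] > 0, logits, -1e9)
    return softmax(logits) @ states

dropout = lambda x: x  # identity stands in for the dropout callbacks

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
encoder_states = [rng.normal(size=(5, 8)), rng.normal(size=(3, 8))]
encoder_masks = [np.ones(5), np.ones(3)]

# 1. normalize queries
norm_q = lnorm(queries)

# 2.-3. attend and dropout independently for every encoder, sum the results
summed = sum(dropout(attend(norm_q, s, m))
             for s, m in zip(encoder_states, encoder_masks))

# 4. add residual and return
output = queries + summed
print(output.shape)  # (4, 8)
```

Unlike the flat strategy, each encoder here gets its own attention distribution, and the per-encoder contexts are combined by summation rather than by a joint softmax.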
neuralmonkey.attention.transformer_cross_layer.serial(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with serial input combination.
The procedure is as follows:
1. repeat for every encoder:
   - lnorm + attend + dropout + add residual
   - update queries between layers
Parameters:
- queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
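The serial loop can be sketched in NumPy (a toy single-head, unbatched attention with identity dropout; helper names are illustrative assumptions, not Neural Monkey's API):

```python
import numpy as np

def lnorm(x, eps=1e-6):
    # layer normalization over the feature axis (no learned gain/bias)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, states, mask):
    # scaled dot-product attention; mask is 1.0 for valid time steps
    logits = queries @ states.T / np.sqrt(states.shape[-1])
    logits = np.where(mask[None, :] > 0, logits, -1e9)
    return softmax(logits) @ states

dropout = lambda x: x  # identity stands in for the dropout callbacks

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
encoder_states = [rng.normal(size=(5, 8)), rng.normal(size=(3, 8))]
encoder_masks = [np.ones(5), np.ones(3)]

# repeat for every encoder: lnorm + attend + dropout + add residual,
# updating the queries between layers
output = queries
for states, mask in zip(encoder_states, encoder_masks):
    output = output + dropout(attend(lnorm(output), states, mask))
print(output.shape)  # (4, 8)
```

Because each encoder's residual output becomes the query for the next encoder, the order of the encoders matters in this strategy, unlike in flat or parallel combination.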
neuralmonkey.attention.transformer_cross_layer.single(queries: tensorflow.python.framework.ops.Tensor, states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor, n_heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], normalize: bool = True, use_dropout: bool = True, residual: bool = True, use_att_transform_bias: bool = False)

Run attention on a single encoder.
Parameters:
- queries – The input for the attention.
- states – The encoder states (keys & values).
- mask – The temporal mask of the encoder.
- n_heads – Number of attention heads to use.
- attention_dropout_callback – Dropout function to apply in attention.
- dropout_callback – Dropout function to apply on the attention output.
- normalize – If True, run layer normalization on the queries.
- use_dropout – If True, perform dropout on the attention output.
- residual – If True, sum the context vector with the input queries.
- use_att_transform_bias – If True, enable bias in the attention head projections (for all queries, keys and values).
Returns: A Tensor that contains the context vector.
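The role of the `normalize`, `use_dropout` and `residual` flags can be sketched with a toy NumPy version of this building block. The sketch is single-head and unbatched, so `n_heads` and `use_att_transform_bias` are omitted; the function body and helpers here are hedged illustrations, not Neural Monkey's TensorFlow implementation:

```python
import numpy as np

def lnorm(x, eps=1e-6):
    # layer normalization over the feature axis (no learned gain/bias)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def single(queries, states, mask, attention_dropout_callback, dropout_callback,
           normalize=True, use_dropout=True, residual=True):
    # optionally layer-normalize the queries before attending
    q = lnorm(queries) if normalize else queries
    # scaled dot-product attention; mask is 1.0 for valid time steps,
    # with dropout applied to the attention weights
    logits = q @ states.T / np.sqrt(states.shape[-1])
    logits = np.where(mask[None, :] > 0, logits, -1e9)
    weights = attention_dropout_callback(softmax(logits))
    context = weights @ states
    # optionally apply dropout to the attention output
    if use_dropout:
        context = dropout_callback(context)
    # optionally sum the context vector with the input queries
    return queries + context if residual else context

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
states = rng.normal(size=(5, 8))
mask = np.ones(5)
identity = lambda x: x  # identity stands in for the dropout callbacks
context = single(queries, states, mask, identity, identity)
print(context.shape)  # (4, 8)
```

With identity dropout callbacks, setting `residual=True` simply adds `queries` to the result obtained with `residual=False`, which is how the combination strategies above compose this primitive.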