neuralmonkey.attention.transformer_cross_layer module

Input combination strategies for multi-source Transformer decoder.

neuralmonkey.attention.transformer_cross_layer.flat(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with flat input combination.

The procedure is as follows:
  1. concatenate the states and masks along the time axis
  2. run attention over the concatenation

Parameters:
  • queries – The input for the attention.
  • encoder_states – The states of each encoder.
  • encoder_masks – The temporal mask of each encoder.
  • heads – Number of attention heads to use in the single attention over the concatenated encoders.
  • attention_dropout_callback – The dropout function to apply in the attention.
  • dropout_callback – The dropout function to apply on the output of the attention.
Returns:

A Tensor that contains the context vector.
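
As an illustration, here is a minimal NumPy sketch of the flat combination: a toy single-head dot-product attention (attend) run over the concatenated encoder states. The helper name, shapes and toy inputs are assumptions for illustration only; the actual implementation is multi-head TensorFlow attention with learned projections and additionally applies attention_dropout_callback and dropout_callback, which this sketch omits.

    import numpy as np

    def attend(q, s, m):
        """Toy single-head scaled dot-product attention (no learned projections)."""
        logits = (q @ s.T) / np.sqrt(q.shape[-1])          # [query_time, key_time]
        logits = np.where(m[None, :] > 0, logits, -1e9)    # mask out padded positions
        weights = np.exp(logits - logits.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)          # softmax over key_time
        return weights @ s                                 # [query_time, dim]

    queries = np.random.randn(5, 8)                        # [query_time, dim]
    enc_states = [np.random.randn(7, 8), np.random.randn(4, 8)]
    enc_masks = [np.ones(7), np.ones(4)]

    # Flat combination: concatenate the encoder states and masks along the
    # time axis, then run a single attention over the concatenation.
    states = np.concatenate(enc_states, axis=0)            # [7 + 4, dim]
    mask = np.concatenate(enc_masks, axis=0)               # [7 + 4]
    context = attend(queries, states, mask)                # [query_time, dim]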

neuralmonkey.attention.transformer_cross_layer.hierarchical(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], heads_hier: int, attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with hierarchical input combination.

The procedure is as follows:
  1. normalize the queries
  2. attend to every encoder
  3. attend to the resulting context vectors (reusing the normalized queries)
  4. apply dropout, add the residual connection, and return

Parameters:
  • queries – The input for the attention.
  • encoder_states – The states of each encoder.
  • encoder_masks – The temporal mask of each encoder.
  • heads – Number of attention heads to use for each encoder.
  • heads_hier – Number of attention heads to use in the second attention.
  • attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
  • dropout_callback – The dropout function to apply in the second attention and over the outputs of each sub-attention.
Returns:

A Tensor that contains the context vector.
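
A minimal sketch of the hierarchical combination, reusing the toy attend helper and inputs (queries, enc_states, enc_masks) from the flat sketch above. It is an illustrative assumption about the data flow only: for each query position, the second-level attention here treats the per-encoder context vectors as a key/value sequence of length n_encoders; layer normalization, multi-head projections and dropout are omitted.

    # Hierarchical combination: first attend to every encoder with the queries,
    # then attend over the resulting per-encoder context vectors.
    contexts = [attend(queries, s, m) for s, m in zip(enc_states, enc_masks)]

    hier_keys = np.stack(contexts, axis=1)                 # [query_time, n_encoders, dim]
    hier_mask = np.ones(len(contexts))                     # all context vectors are valid
    second = np.stack([attend(queries[t:t + 1], hier_keys[t], hier_mask)[0]
                       for t in range(queries.shape[0])])  # [query_time, dim]
    output = queries + second                              # residual connection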

neuralmonkey.attention.transformer_cross_layer.parallel(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with parallel input combination.

The procedure is as follows:
  1. normalize the queries
  2. attend and apply dropout independently for every encoder
  3. sum up the results
  4. add the residual connection and return

Parameters:
  • queries – The input for the attention.
  • encoder_states – The states of each encoder.
  • encoder_masks – The temporal mask of each encoder.
  • heads – Number of attention heads to use for each encoder.
  • attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
  • dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns:

A Tensor that contains the context vector.
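
A minimal sketch of the parallel combination, again reusing the toy attend helper and inputs from the flat sketch above; layer normalization, multi-head projections and dropout are omitted.

    # Parallel combination: attend to each encoder independently with the same
    # queries, sum the resulting context vectors, and add the residual.
    contexts = [attend(queries, s, m) for s, m in zip(enc_states, enc_masks)]
    output = queries + np.sum(contexts, axis=0)            # [query_time, dim]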

neuralmonkey.attention.transformer_cross_layer.serial(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Run attention with serial input combination.

The procedure is as follows:
  1. repeat for every encoder:
     • apply layer normalization, attend, apply dropout, and add the residual connection
     • update the queries, so that the output of one sub-attention becomes the query input of the next
Parameters:
  • queries – The input for the attention.
  • encoder_states – The states of each encoder.
  • encoder_masks – The temporal mask of each encoder.
  • heads – Number of attention heads to use for each encoder.
  • attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
  • dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns:

A Tensor that contains the context vector.
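
A minimal sketch of the serial combination, reusing the toy attend helper and inputs from the flat sketch above; layer normalization, multi-head projections and dropout are omitted.

    # Serial combination: chain the encoders; each sub-attention adds its
    # context to the queries, and the updated queries feed the next attention.
    q = queries
    for s, m in zip(enc_states, enc_masks):
        q = q + attend(q, s, m)                            # residual after each encoder
    output = q                                             # [query_time, dim]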

neuralmonkey.attention.transformer_cross_layer.single(queries: tensorflow.python.framework.ops.Tensor, states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor, n_heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], normalize: bool = True, use_dropout: bool = True, residual: bool = True, use_att_transform_bias: bool = False)

Run attention on a single encoder.

Parameters:
  • queries – The input for the attention.
  • states – The encoder states (keys & values).
  • mask – The temporal mask of the encoder.
  • n_heads – Number of attention heads to use.
  • attention_dropout_callback – Dropout function to apply in attention.
  • dropout_callback – Dropout function to apply on the attention output.
  • normalize – If True, run layer normalization on the queries.
  • use_dropout – If True, perform dropout on the attention output.
  • residual – If True, sum the context vector with the input queries.
  • use_att_transform_bias – If True, enable bias in the attention head projections (for all queries, keys and values).
Returns:

A Tensor that contains the context vector.
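
A minimal sketch of the per-encoder procedure that single performs, reusing the toy attend helper and inputs from the flat sketch above. The toy layer_norm has no learned scale or bias, and the single-head attend stands in for the real multi-head attention with n_heads heads; the comments map each step onto the normalize, use_dropout and residual parameters.

    def layer_norm(x, eps=1e-6):
        """Toy layer normalization over the last axis (no learned scale or bias)."""
        return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

    states, mask = enc_states[0], enc_masks[0]             # a single encoder
    normed = layer_norm(queries)                           # normalize=True
    context = attend(normed, states, mask)                 # single-head stand-in for n_heads
    # use_dropout=True would pass `context` through dropout_callback here
    output = queries + context                             # residual=True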