neuralmonkey.encoders.transformer module
Implementation of the encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762

class neuralmonkey.encoders.transformer.TransformerEncoder(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStatefulWithOutput

__init__(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create an encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
Parameters:
  input_sequence – Embedded input sequence.
  name – Name of the encoder. Should be unique across all Neural Monkey objects.
  reuse – Reuse the model variables.
  dropout_keep_prob – Probability of keeping a value during dropout.
  target_space_id – Specifies the modality of the target space.
  use_att_transform_bias – Add bias when transforming the qkv vectors for attention.
  use_positional_encoding – If True, a position encoding signal is added to the input.
Keyword Arguments:
  ff_hidden_size – Size of the feed-forward sublayers.
  n_heads – Number of self-attention heads.
  depth – Number of layers.
  attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
  input_for_cross_attention – An attendable model part that is attended to using cross-attention in every layer of this encoder, analogously to how the encoder is attended in the decoder.
  n_cross_att_heads – Number of heads used in the cross-attention.
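
In a typical Neural Monkey experiment, model parts such as this encoder are declared in the experiment's INI configuration. The snippet below is only a minimal sketch of a direct Python instantiation, using hypothetical hyperparameter values and assuming input_sequence is an already built embedded sequence (any TemporalStateful object):

    from neuralmonkey.encoders.transformer import TransformerEncoder

    # Minimal sketch with hypothetical values; `input_sequence` is assumed to be
    # an existing TemporalStateful object (e.g. an embedded word sequence).
    encoder = TransformerEncoder(
        name="encoder",                  # unique across all Neural Monkey objects
        input_sequence=input_sequence,   # assumed: embedded input sequence
        ff_hidden_size=2048,             # width of the feed-forward sublayers
        depth=6,                         # number of stacked encoder layers
        n_heads=8,                       # self-attention heads per layer
        dropout_keep_prob=0.9,
        attention_dropout_keep_prob=0.9,
        use_positional_encoding=True)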

cross_attention_sublayer(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

dependencies

Return a list of attribute names regarded as dependents.

encoder_inputs

feedforward_sublayer(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

Create the feed-forward network sublayer.

layer(level: int) → neuralmonkey.encoders.transformer.TransformerLayer

modality_matrix

Create an embedding matrix for varying target modalities.

Used to embed different target space modalities in the tensor2tensor models (e.g. during zero-shot translation).

output

Return the object output.

A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.

self_attention_sublayer(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor

Create the encoder self-attention sublayer.
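
The three sublayer builders above (self_attention_sublayer, cross_attention_sublayer and feedforward_sublayer) are combined by the layer method. The following is only a schematic sketch of that composition, inferred from the signatures documented here rather than taken from the source; each sublayer is assumed to apply dropout, a residual connection and layer normalization as described in Vaswani et al. (2017):

    from neuralmonkey.encoders.transformer import TransformerEncoder, TransformerLayer

    def build_layer(encoder: TransformerEncoder, level: int) -> TransformerLayer:
        # Hypothetical helper illustrating the assumed sublayer ordering.
        prev_layer = encoder.layer(level - 1)                 # layer 0 wraps the embedded inputs
        states = encoder.self_attention_sublayer(prev_layer)  # multi-head self-attention
        if encoder.input_for_cross_attention is not None:     # assumed attribute; optional cross-attention
            states = encoder.cross_attention_sublayer(states)
        states = encoder.feedforward_sublayer(states)         # position-wise feed-forward network
        return TransformerLayer(states=states, mask=prev_layer.temporal_mask)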

target_modality_embedding

Gather the correct embedding of the target space modality.

See TransformerEncoder.modality_matrix for more information.
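
Conceptually, this amounts to selecting one row of modality_matrix by the target_space_id passed to the constructor. A hedged sketch of that lookup (hypothetical sizes and id; the actual implementation may differ):

    import tensorflow as tf

    # Hypothetical shapes: 32 possible target modalities, model dimension 512.
    modality_matrix = tf.get_variable("target_modality_embedding", shape=[32, 512])
    target_modality_embedding = tf.gather(modality_matrix, 7)  # 7 = hypothetical target_space_id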

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.

temporal_states

Return object states in time.

A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).


class neuralmonkey.encoders.transformer.TransformerLayer(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Bases: neuralmonkey.model.stateful.TemporalStateful

__init__(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Initialize self. See help(type(self)) for accurate signature.

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.

temporal_states

Return object states in time.

A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).


neuralmonkey.encoders.transformer.position_signal(dimension: int, length: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
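
This function presumably computes the sinusoidal position encoding signal of Vaswani et al. (2017), section 3.5, as a TensorFlow tensor for a dynamic length. Below is a NumPy sketch of the same signal for a fixed length (an illustration only; the exact channel layout in the library's implementation may differ):

    import numpy as np

    def position_signal_np(dimension: int, length: int) -> np.ndarray:
        # Sinusoidal timing signal; assumes an even `dimension`.
        positions = np.arange(length)[:, np.newaxis]                      # shape (length, 1)
        rates = np.exp(np.arange(0, dimension, 2)
                       * -(np.log(10000.0) / dimension))                  # shape (dimension / 2,)
        signal = np.zeros((length, dimension), dtype=np.float32)
        signal[:, 0::2] = np.sin(positions * rates)                       # even channels
        signal[:, 1::2] = np.cos(positions * rates)                       # odd channels
        return signal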