neuralmonkey.encoders.transformer module

Implementation of the encoder of the Transformer model.

Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762

class neuralmonkey.encoders.transformer.TransformerEncoder(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStatefulWithOutput

__init__(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create an encoder of the Transformer model.

Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762

Parameters:
  • input_sequence – Embedded input sequence.
  • name – Name of the encoder. Should be unique across all Neural Monkey objects.
  • reuse – Reuse the model variables.
  • dropout_keep_prob – Probability of keeping a value during dropout.
  • target_space_id – Specifies the modality of the target space.
  • use_att_transform_bias – Add bias when transforming qkv vectors for attention.
  • use_positional_encoding – If True, position encoding signal is added to the input.
Keyword Arguments:
 
  • ff_hidden_size – Size of the feedforward sublayers.
  • n_heads – Number of the self-attention heads.
  • depth – Number of sublayers.
  • attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
  • input_for_cross_attention – An attendable model part that is attended to using cross-attention on every layer of the encoder, analogously to how the encoder is attended in the decoder.
  • n_cross_att_heads – Number of heads used in the cross-attention.
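
The following is a minimal sketch of constructing the encoder directly in Python, assuming an embedded input sequence (a TemporalStateful, e.g. an embedded word sequence) named embedded_inputs has already been built elsewhere; the hyperparameter values are illustrative only.

    from neuralmonkey.encoders.transformer import TransformerEncoder

    # embedded_inputs is assumed to be a TemporalStateful created
    # elsewhere in the model setup.
    encoder = TransformerEncoder(
        name="transformer_encoder",
        input_sequence=embedded_inputs,
        ff_hidden_size=2048,
        depth=6,
        n_heads=8,
        dropout_keep_prob=0.9,
        attention_dropout_keep_prob=0.9)

In a typical Neural Monkey experiment these arguments are supplied through the configuration file rather than constructed in Python code.
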
cross_attention_sublayer(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
dependencies

Return a list of attribute names regarded as dependents.

encoder_inputs
feedforward_sublayer(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

Create the feed-forward network sublayer.
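
As a rough illustration of what a position-wise feed-forward sublayer computes (projection to ff_hidden_size with ReLU, dropout, projection back to the model dimension), a sketch follows; the residual connection and layer normalization that wrap the sublayer, as well as the exact variable names, are omitted and may differ from the actual implementation.

    import tensorflow as tf

    def feedforward_sketch(layer_input: tf.Tensor,
                           ff_hidden_size: int,
                           dropout_keep_prob: float) -> tf.Tensor:
        # Width of the model (last dimension of the layer input).
        model_dim = layer_input.shape.as_list()[-1]

        # Expand to the feed-forward hidden size with a ReLU nonlinearity.
        hidden = tf.layers.dense(layer_input, ff_hidden_size,
                                 activation=tf.nn.relu, name="ff_hidden")
        hidden = tf.nn.dropout(hidden, keep_prob=dropout_keep_prob)

        # Project back to the model dimension.
        output = tf.layers.dense(hidden, model_dim, name="ff_output")
        return tf.nn.dropout(output, keep_prob=dropout_keep_prob)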

layer(level: int) → neuralmonkey.encoders.transformer.TransformerLayer
modality_matrix

Create an embedding matrix for varying target modalities.

Used to embed different target space modalities in the tensor2tensor models (e.g. during zero-shot translation).
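
Conceptually this amounts to a learned matrix with one row per target-space modality, from which target_modality_embedding later gathers the row selected by target_space_id. A hedged sketch of the idea (sizes and variable names are assumptions, not the actual implementation):

    import tensorflow as tf

    num_modalities = 32     # assumed number of distinct target modalities
    model_dim = 512         # assumed model dimension

    # One embedding row per target-space modality.
    modality_matrix = tf.get_variable(
        "modality_matrix", shape=[num_modalities, model_dim])

    # target_space_id selects the row for the current target modality.
    target_space_id = tf.constant(5)
    modality_embedding = tf.gather(modality_matrix, target_space_id)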

output

Return the object output.

A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.

self_attention_sublayer(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor

Create the encoder self-attention sublayer.
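
For orientation, here is a single-head sketch of scaled dot-product self-attention over the previous layer's states; the actual sublayer splits queries, keys and values into n_heads heads and is wrapped in a residual connection with layer normalization, so this is an illustration rather than the library's code.

    import tensorflow as tf

    def self_attention_sketch(states: tf.Tensor,
                              mask: tf.Tensor,
                              keep_prob: float) -> tf.Tensor:
        dim = states.shape.as_list()[-1]

        # Linear projections of the states to queries, keys and values.
        queries = tf.layers.dense(states, dim, use_bias=False, name="q")
        keys = tf.layers.dense(states, dim, use_bias=False, name="k")
        values = tf.layers.dense(states, dim, use_bias=False, name="v")

        # Scaled dot-product energies, shape (batch, time, time).
        logits = tf.matmul(queries, keys, transpose_b=True) / (dim ** 0.5)

        # Mask out padded key positions before the softmax.
        logits += (1.0 - tf.expand_dims(mask, 1)) * -1e9

        weights = tf.nn.dropout(tf.nn.softmax(logits), keep_prob=keep_prob)
        return tf.matmul(weights, values)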

target_modality_embedding

Gather correct embedding of the target space modality.

See TransformerEncoder.modality_matrix for more information.

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
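
For example, a batch of two sequences of lengths 3 and 2 padded to length 4 would have the following mask (a small illustration using tf.sequence_mask, which the library does not necessarily use internally):

    import tensorflow as tf

    mask = tf.sequence_mask([3, 2], maxlen=4, dtype=tf.float32)
    # [[1., 1., 1., 0.],
    #  [1., 1., 0., 0.]]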

temporal_states

Return object states in time.

A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).

class neuralmonkey.encoders.transformer.TransformerLayer(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Bases: neuralmonkey.model.stateful.TemporalStateful

__init__(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Initialize self. See help(type(self)) for accurate signature.

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.

temporal_states

Return object states in time.

A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).

neuralmonkey.encoders.transformer.position_signal(dimension: int, length: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
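
Builds the position encoding signal that is added to the encoder input when use_positional_encoding is True. A NumPy sketch of the standard sinusoidal signal from Vaswani et al. (2017) follows; the library function returns a TF tensor and may lay out the sine/cosine components differently (interleaved versus concatenated halves).

    import numpy as np

    def position_signal_sketch(dimension: int, length: int) -> np.ndarray:
        positions = np.arange(length)[:, np.newaxis]          # (length, 1)
        dims = np.arange(dimension // 2)[np.newaxis, :]       # (1, dim / 2)
        angles = positions / np.power(10000.0, 2.0 * dims / dimension)

        signal = np.zeros((length, dimension))
        signal[:, 0::2] = np.sin(angles)   # sines in even columns
        signal[:, 1::2] = np.cos(angles)   # cosines in odd columns
        return signal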