neuralmonkey.encoders.transformer module
Implementation of the encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
class neuralmonkey.encoders.transformer.TransformerEncoder(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStatefulWithOutput
__init__(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create an encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
Parameters:
- input_sequence – Embedded input sequence.
- name – Name of the encoder. Should be unique across all Neural Monkey objects.
- reuse – Reuse the model variables.
- dropout_keep_prob – Probability of keeping a value during dropout.
- target_space_id – Specifies the modality of the target space.
- use_att_transform_bias – Add bias when transforming the query, key, and value vectors for attention.
- use_positional_encoding – If True, a positional encoding signal is added to the input.
Keyword Arguments:
- ff_hidden_size – Size of the feedforward sublayers.
- n_heads – Number of self-attention heads.
- depth – Number of sublayers.
- attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
- input_for_cross_attention – An attendable model part that is attended to with cross-attention on every layer of the encoder, analogously to how the encoder is attended to in the decoder.
- n_cross_att_heads – Number of heads used in the cross-attention.
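In a Neural Monkey INI configuration file, an encoder with these parameters might be set up roughly as follows (the section name `[encoder]` and the object reference `<embedded_input>` are illustrative, not taken from this page):

```ini
[encoder]
class=encoders.transformer.TransformerEncoder
name="encoder"
input_sequence=<embedded_input>
ff_hidden_size=2048
depth=6
n_heads=8
dropout_keep_prob=0.9
attention_dropout_keep_prob=0.9
```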
cross_attention_sublayer(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
dependencies
Return a list of attribute names regarded as dependents.
encoder_inputs
feedforward_sublayer(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
Create the feed-forward network sublayer.
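The feed-forward sublayer applies the same two-layer network at every time step, as in Vaswani et al. (2017). A minimal NumPy sketch (not the library's actual TensorFlow implementation; the weight names are illustrative):

```python
import numpy as np

def feedforward_sublayer(layer_input, w1, b1, w2, b2):
    """Position-wise feed-forward network: max(0, x @ W1 + b1) @ W2 + b2."""
    # ReLU projection up to the feed-forward hidden size
    hidden = np.maximum(0.0, layer_input @ w1 + b1)  # (batch, time, ff_hidden_size)
    # Linear projection back down to the model dimension
    return hidden @ w2 + b2                          # (batch, time, model_dim)

# Toy dimensions: model_dim=4, ff_hidden_size=8
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 4))                 # (batch, time, model_dim)
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 4)), np.zeros(4)
out = feedforward_sublayer(x, w1, b1, w2, b2)
print(out.shape)  # (2, 5, 4)
```

Because the transformation is position-wise, the time dimension is untouched and only the last axis is projected.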
layer(level: int) → neuralmonkey.encoders.transformer.TransformerLayer
modality_matrix
Create an embedding matrix for varying target modalities.
Used to embed different target space modalities in the tensor2tensor models (e.g. during zero-shot translation).
model_dimension
output
Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
self_attention_sublayer(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor
Create the encoder self-attention sublayer.
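The self-attention sublayer attends over the previous layer's temporal states, masking out padded positions. A single-head scaled dot-product sketch in NumPy (the real sublayer is multi-headed and implemented in TensorFlow; projection names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(states, mask, wq, wk, wv):
    """Scaled dot-product self-attention.

    states: (batch, time, dim); mask: (batch, time) of 0/1 floats.
    """
    q, k, v = states @ wq, states @ wk, states @ wv
    # Attention energies, scaled by sqrt of the key dimension
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (batch, time, time)
    # Padded key positions get a large negative score before the softmax
    scores = np.where(mask[:, None, :] > 0, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v                                        # (batch, time, dim)

rng = np.random.default_rng(1)
states = rng.normal(size=(2, 4, 6))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.float32)
wq, wk, wv = (rng.normal(size=(6, 6)) for _ in range(3))
out = self_attention(states, mask, wq, wk, wv)
print(out.shape)  # (2, 4, 6)
```

Masking before the softmax ensures that no probability mass falls on padding, so sequences of different lengths can share a batch.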
target_modality_embedding
Gather the correct embedding of the target space modality.
See TransformerEncoder.modality_matrix for more information.
temporal_mask
Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
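Such a mask can be derived from the sequence lengths in a batch; a NumPy illustration (the helper name is hypothetical, not part of the library API):

```python
import numpy as np

def temporal_mask(lengths, max_time):
    """Build a float32 (batch, time) mask: 1.0 up to each sequence length, 0.0 after."""
    positions = np.arange(max_time)[None, :]          # (1, time)
    return (positions < np.asarray(lengths)[:, None]).astype(np.float32)

mask = temporal_mask([3, 5, 1], max_time=5)
print(mask)
# [[1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 1.]
#  [1. 0. 0. 0. 0.]]
```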
temporal_states
Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
class neuralmonkey.encoders.transformer.TransformerLayer(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Bases: neuralmonkey.model.stateful.TemporalStateful
__init__(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None
Initialize self. See help(type(self)) for accurate signature.
temporal_mask
Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
temporal_states
Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
neuralmonkey.encoders.transformer.position_signal(dimension: int, length: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
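A NumPy sketch of a sinusoidal position signal in the tensor2tensor style (sines in the first half of the channels, cosines in the second half); the timescale parameters are the conventional defaults from Vaswani et al. (2017), not values confirmed by this page:

```python
import numpy as np

def position_signal(dimension, length, min_timescale=1.0, max_timescale=1.0e4):
    """Sinusoidal position signal of shape (1, length, dimension)."""
    position = np.arange(length, dtype=np.float32)
    num_timescales = dimension // 2
    # Geometric progression of wavelengths from min_timescale to max_timescale
    log_increment = np.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = min_timescale * np.exp(-np.arange(num_timescales) * log_increment)
    scaled_time = position[:, None] * inv_timescales[None, :]  # (length, dimension/2)
    signal = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)
    return signal[None, :, :]  # leading axis broadcasts over the batch

sig = position_signal(8, 10)
print(sig.shape)  # (1, 10, 8)
```

Because the signal depends only on position and channel, it can be precomputed once and added to the embedded inputs when use_positional_encoding is enabled.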