neuralmonkey.encoders.transformer module
Implementation of the encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
class neuralmonkey.encoders.transformer.TransformerEncoder(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStatefulWithOutput
__init__(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create an encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
Parameters:
- input_sequence – Embedded input sequence.
- name – Name of the encoder. Should be unique across all Neural Monkey objects.
- reuse – Reuse the model variables.
- dropout_keep_prob – Probability of keeping a value during dropout.
- target_space_id – Specifies the modality of the target space.
- use_att_transform_bias – Add bias when transforming the query, key, and value vectors for attention.
- use_positional_encoding – If True, a positional encoding signal is added to the input.
Keyword Arguments:
- ff_hidden_size – Size of the feedforward sublayers.
- n_heads – Number of self-attention heads.
- depth – Number of sublayers.
- attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
- input_for_cross_attention – An attendable model part that is attended to with cross-attention on every layer of the encoder, analogously to how the encoder is attended to in the decoder.
- n_cross_att_heads – Number of heads used in the cross-attention.
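In a Neural Monkey INI configuration file, an encoder with these parameters might be set up roughly as follows (the section name `[encoder]` and the object reference `<embedded_input>` are illustrative, not taken from this page):

```ini
[encoder]
class=encoders.transformer.TransformerEncoder
name="encoder"
input_sequence=<embedded_input>
ff_hidden_size=2048
depth=6
n_heads=8
dropout_keep_prob=0.9
attention_dropout_keep_prob=0.9
```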
cross_attention_sublayer(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
dependencies
Return a list of attribute names regarded as dependents.
encoder_inputs
feedforward_sublayer(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
Create the feed-forward network sublayer.
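The feed-forward sublayer applies the same two-layer network at every time step, as in Vaswani et al. (2017). A minimal NumPy sketch (not the library's actual TensorFlow implementation; the weight names are illustrative):

```python
import numpy as np

def feedforward_sublayer(layer_input, w1, b1, w2, b2):
    """Position-wise feed-forward network: max(0, x @ W1 + b1) @ W2 + b2."""
    # ReLU projection up to the feed-forward hidden size
    hidden = np.maximum(0.0, layer_input @ w1 + b1)  # (batch, time, ff_hidden_size)
    # Linear projection back down to the model dimension
    return hidden @ w2 + b2                          # (batch, time, model_dim)

# Toy dimensions: model_dim=4, ff_hidden_size=8
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 4))                 # (batch, time, model_dim)
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 4)), np.zeros(4)
out = feedforward_sublayer(x, w1, b1, w2, b2)
print(out.shape)  # (2, 5, 4)
```

Because the transformation is position-wise, the time dimension is untouched and only the last axis is projected.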
layer(level: int) → neuralmonkey.encoders.transformer.TransformerLayer
modality_matrix
Create an embedding matrix for varying target modalities.
Used to embed different target space modalities in the tensor2tensor models (e.g. during zero-shot translation).
model_dimension
output
Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
self_attention_sublayer(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor
Create the encoder self-attention sublayer.
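The self-attention sublayer attends over the previous layer's temporal states, masking out padded positions. A single-head scaled dot-product sketch in NumPy (the real sublayer is multi-headed and implemented in TensorFlow; projection names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(states, mask, wq, wk, wv):
    """Scaled dot-product self-attention.

    states: (batch, time, dim); mask: (batch, time) of 0/1 floats.
    """
    q, k, v = states @ wq, states @ wk, states @ wv
    # Attention energies, scaled by sqrt of the key dimension
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (batch, time, time)
    # Padded key positions get a large negative score before the softmax
    scores = np.where(mask[:, None, :] > 0, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v                                        # (batch, time, dim)

rng = np.random.default_rng(1)
states = rng.normal(size=(2, 4, 6))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.float32)
wq, wk, wv = (rng.normal(size=(6, 6)) for _ in range(3))
out = self_attention(states, mask, wq, wk, wv)
print(out.shape)  # (2, 4, 6)
```

Masking before the softmax ensures that no probability mass falls on padding, so sequences of different lengths can share a batch.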
target_modality_embedding
Gather the correct embedding of the target space modality.
See TransformerEncoder.modality_matrix for more information.
temporal_mask
Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
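Such a mask can be derived from the sequence lengths in a batch; a NumPy illustration (the helper name is hypothetical, not part of the library API):

```python
import numpy as np

def temporal_mask(lengths, max_time):
    """Build a float32 (batch, time) mask: 1.0 up to each sequence length, 0.0 after."""
    positions = np.arange(max_time)[None, :]          # (1, time)
    return (positions < np.asarray(lengths)[:, None]).astype(np.float32)

mask = temporal_mask([3, 5, 1], max_time=5)
print(mask)
# [[1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 1.]
#  [1. 0. 0. 0. 0.]]
```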
temporal_states
Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
class neuralmonkey.encoders.transformer.TransformerLayer(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None

Bases: neuralmonkey.model.stateful.TemporalStateful
__init__(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None
Initialize self. See help(type(self)) for accurate signature.
temporal_mask
Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
temporal_states
Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
neuralmonkey.encoders.transformer.position_signal(dimension: int, length: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
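A NumPy sketch of a sinusoidal position signal in the tensor2tensor style (sines in the first half of the channels, cosines in the second half); the timescale parameters are the conventional defaults from Vaswani et al. (2017), not values confirmed by this page:

```python
import numpy as np

def position_signal(dimension, length, min_timescale=1.0, max_timescale=1.0e4):
    """Sinusoidal position signal of shape (1, length, dimension)."""
    position = np.arange(length, dtype=np.float32)
    num_timescales = dimension // 2
    # Geometric progression of wavelengths from min_timescale to max_timescale
    log_increment = np.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = min_timescale * np.exp(-np.arange(num_timescales) * log_increment)
    scaled_time = position[:, None] * inv_timescales[None, :]  # (length, dimension/2)
    signal = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)
    return signal[None, :, :]  # leading axis broadcasts over the batch

sig = position_signal(8, 10)
print(sig.shape)  # (1, 10, 8)
```

Because the signal depends only on position and channel, it can be precomputed once and added to the embedded inputs when use_positional_encoding is enabled.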