neuralmonkey.decoders.transformer module

Implementation of the decoder of the Transformer model.

Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762

class neuralmonkey.decoders.transformer.TransformerDecoder(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, ff_hidden_size: int, n_heads_self: int, n_heads_enc: Union[List[int], int], depth: int, max_output_len: int, attention_combination_strategy: str = 'serial', n_heads_hier: int = None, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = True, label_smoothing: float = None, self_attention_dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: Union[float, List[float]] = 1.0, use_att_transform_bias: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.decoders.autoregressive.AutoregressiveDecoder

__init__(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, ff_hidden_size: int, n_heads_self: int, n_heads_enc: Union[List[int], int], depth: int, max_output_len: int, attention_combination_strategy: str = 'serial', n_heads_hier: int = None, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = True, label_smoothing: float = None, self_attention_dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: Union[float, List[float]] = 1.0, use_att_transform_bias: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Create a decoder of the Transformer model.

Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762

Parameters:
  • encoders – Input encoders for the decoder.
  • vocabulary – Target vocabulary.
  • data_id – Target data series.
  • name – Name of the decoder. Should be unique across all Neural Monkey objects.
  • max_output_len – Maximum length of an output sequence.
  • dropout_keep_prob – Probability of keeping a value during dropout.
  • embedding_size – Size of embedding vectors for target words.
  • embeddings_source – Embedded sequence to take embeddings from.
  • tie_embeddings – Use decoder.embedding_matrix also in place of the output decoding matrix.
  • ff_hidden_size – Size of the feedforward sublayers.
  • n_heads_self – Number of self-attention heads.
  • n_heads_enc – Number of attention heads over each encoder. Either a list whose length equals the number of encoders, or a single integer; in the latter case, the same number of heads is used for all encoders.
  • attention_combination_strategy – One of serial, parallel, flat, hierarchical. Controls how the encoder-decoder attention over multiple encoders is combined.
  • n_heads_hier – Number of attention heads for the second attention in the hierarchical attention combination.
  • depth – Number of sublayers.
  • label_smoothing – A label smoothing parameter for cross entropy loss computation.
  • attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
  • supress_unk – If true, the decoder will not produce symbols for unknown tokens.
  • reuse – Reuse the variables from the given model part.
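
A minimal instantiation sketch follows. The encoder and vocabulary objects are assumed to already exist, and the hyperparameter values (inner feed-forward size 2048, 8 attention heads, 6 layers, as in the base model of Vaswani et al.) are illustrative only, not recommended settings:

    from neuralmonkey.decoders.transformer import TransformerDecoder

    decoder = TransformerDecoder(
        name="decoder",
        encoders=[encoder],      # an existing TemporalStateful encoder
        vocabulary=vocabulary,   # a Vocabulary over the target data
        data_id="target",
        ff_hidden_size=2048,
        n_heads_self=8,
        n_heads_enc=8,           # same head count for every encoder
        depth=6,
        max_output_len=50,
        dropout_keep_prob=0.9)
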
dimension
embed_inputs(inputs: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
embedded_train_inputs
encoder_attention_sublayer(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

Create the encoder-decoder attention sublayer.
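
In this sublayer the queries come from the decoder states and the keys and values from the encoder states. A minimal single-head NumPy sketch of the underlying scaled dot-product attention (Vaswani et al., 2017) is given below; it is not the actual multi-head TensorFlow implementation, and all names and shapes are hypothetical:

    import numpy as np

    def scaled_dot_product_attention(queries, keys, values, mask=None):
        # One attention energy per (query position, key position) pair.
        d_k = queries.shape[-1]
        scores = queries @ keys.T / np.sqrt(d_k)
        if mask is not None:
            # Masked positions get -inf, so their softmax weight is zero.
            scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ values

    # Encoder-decoder attention: queries from the decoder layer,
    # keys and values from the encoder's hidden states.
    dec_queries = np.random.randn(5, 64)   # 5 target positions, model size 64
    enc_states = np.random.randn(7, 64)    # 7 source positions
    context = scaled_dot_product_attention(dec_queries, enc_states, enc_states)
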

feedforward_sublayer(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor

Create the feed-forward network sublayer.
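
This is the position-wise feed-forward network of Vaswani et al. (2017): two linear transformations with a ReLU between them, where the inner dimension is given by ff_hidden_size. A hedged NumPy sketch, not the actual implementation:

    import numpy as np

    def feedforward(layer_input, w_1, b_1, w_2, b_2):
        # Project to ff_hidden_size, apply ReLU, project back to model size.
        hidden = np.maximum(0.0, layer_input @ w_1 + b_1)
        return hidden @ w_2 + b_2

    x = np.random.randn(5, 64)        # 5 positions, model size 64
    w_1 = np.random.randn(64, 256)    # ff_hidden_size = 256 (illustrative)
    b_1 = np.zeros(256)
    w_2 = np.random.randn(256, 64)
    b_2 = np.zeros(64)
    out = feedforward(x, w_1, b_1, w_2, b_2)   # shape (5, 64)
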

get_body(train_mode: bool, sample: bool = False, temperature: float = 1.0) → Callable

Return the while loop body function.
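
The returned function is the body of the TensorFlow while loop that drives autoregressive decoding: each iteration runs the symbols decoded so far through the decoder stack and picks the next symbol, either greedily or by sampling with the given temperature. A conceptual plain-Python sketch of the greedy case, in which decoder_step is a hypothetical function mapping the decoded prefix to next-step logits:

    import numpy as np

    def decode_greedy(decoder_step, start_symbol, end_symbol, max_output_len):
        # Conceptual loop only; the real decoder uses tf.while_loop.
        symbols = [start_symbol]
        for _ in range(max_output_len):
            logits = decoder_step(symbols)
            next_symbol = int(np.argmax(logits))
            symbols.append(next_symbol)
            if next_symbol == end_symbol:
                break
        return symbols
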

get_initial_loop_state() → neuralmonkey.decoders.autoregressive.LoopState
layer(level: int, inputs: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → neuralmonkey.encoders.transformer.TransformerLayer
output_dimension
self_attention_sublayer(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor

Create the decoder self-attention sublayer with output mask.
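
The output mask prevents each position from attending to subsequent positions, which keeps decoding autoregressive. A self-contained NumPy sketch of such a causal mask applied to single-head attention (illustrative shapes and names, not the actual implementation):

    import numpy as np

    # Lower-triangular causal mask: position i may attend only to
    # positions <= i, i.e. symbols that have already been generated.
    length = 5
    causal_mask = np.tril(np.ones((length, length), dtype=bool))

    states = np.random.randn(length, 64)       # one head, model size 64
    scores = states @ states.T / np.sqrt(64)
    scores = np.where(causal_mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    masked_context = weights @ states
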

train_logits
class neuralmonkey.decoders.transformer.TransformerHistories

Bases: neuralmonkey.decoders.transformer.TransformerHistories

The loop state histories for the transformer decoder.

Shares attributes with the DecoderHistories class. The special attributes are listed below.

decoded_symbols

A tensor which stores the decoded symbols.

input_mask

A float tensor of zeros and ones which marks the valid positions in the input.