neuralmonkey.model.sequence module

Module which implements the Sequence class and a few of its subclasses.

class neuralmonkey.model.sequence.EmbeddedFactorSequence(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedFactorSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.sequence.Sequence

A sequence that stores one or more embedded inputs (factors).

__init__(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedFactorSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new instance of EmbeddedFactorSequence.

Takes three lists (vocabularies, data series IDs, and embedding sizes) and constructs a Sequence object. The supplied lists must be equal in length, and their indices must correspond to each other.

Parameters:
  • name – The name for the ModelPart object
  • vocabularies – A list of Vocabulary objects used for each factor
  • data_ids – A list of strings identifying the data series used for each factor
  • embedding_sizes – A list of integers specifying the size of the embedding vector for each factor
  • max_length – The maximum length of the sequences
  • add_start_symbol – Includes <s> in the sequence
  • add_end_symbol – Includes </s> in the sequence
  • scale_embeddings_by_depth – Set to True for T2T import compatibility
  • embeddings_source – EmbeddedFactorSequence from which the embeddings will be reused.
  • save_checkpoint – The save_checkpoint parameter for ModelPart
  • load_checkpoint – The load_checkpoint parameter for ModelPart
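
The requirement that the three lists line up index by index can be illustrated with a small standalone check. This is only a sketch in plain Python: `check_factor_lists` is a hypothetical helper that is not part of Neural Monkey, and plain strings stand in for Vocabulary objects.

```python
from typing import List, Sequence, Tuple


def check_factor_lists(
        vocabularies: Sequence[object],
        data_ids: Sequence[str],
        embedding_sizes: Sequence[int]) -> List[Tuple[object, str, int]]:
    """Pair up per-factor settings, failing if the lists differ in length.

    Hypothetical helper for illustration only.
    """
    lengths = {len(vocabularies), len(data_ids), len(embedding_sizes)}
    if len(lengths) != 1:
        raise ValueError(
            "vocabularies, data_ids and embedding_sizes must have equal "
            "length, got {}, {} and {}".format(
                len(vocabularies), len(data_ids), len(embedding_sizes)))
    # Index i of every list describes factor i.
    return list(zip(vocabularies, data_ids, embedding_sizes))


# Strings stand in for Vocabulary objects (an assumption of this sketch).
factors = check_factor_lists(
    ["word_vocab", "tag_vocab"], ["source", "source_tags"], [300, 20])
```

Each resulting tuple describes one factor: its vocabulary, its data series ID, and its embedding size.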
embedding_matrices

Return a list of embedding matrices for each factor.

feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]

Feed the placeholders with the data.

Parameters:
  • dataset – The dataset.
  • train – A flag whether the train mode is enabled.
Returns:

The constructed feed dictionary that contains the factor data and the mask.

input_factors
input_shapes
input_types
tb_embedding_visualization(logdir: str, prj: tensorflow.contrib.tensorboard.plugins.projector)

Link embeddings with vocabulary wordlist.

Used for tensorboard visualization.

Parameters:
  • logdir – directory where model is stored
  • prj – TensorBoard projector module for storing linking info.
temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
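
To make the mask layout concrete, the following sketch builds a (batch, time) mask from a list of sequence lengths using plain Python lists instead of tensors. `build_temporal_mask` is a hypothetical name, and the way Neural Monkey computes the mask internally may differ.

```python
from typing import List


def build_temporal_mask(lengths: List[int]) -> List[List[float]]:
    """Return a (batch, time) mask: 1.0 for real steps, 0.0 for padding.

    Hypothetical helper for illustration only.
    """
    max_time = max(lengths) if lengths else 0
    return [
        [1.0 if step < length else 0.0 for step in range(max_time)]
        for length in lengths
    ]


# Three sequences of lengths 3, 1 and 2, padded to the longest one.
mask = build_temporal_mask([3, 1, 2])
```

Every row covers the same number of time steps, and the zeros mark the padded positions of the shorter sequences.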

temporal_states

Return the embedded factors.

A 3D Tensor of shape (batch, time, dimension), where dimension is the sum of the embedding sizes supplied to the constructor.
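
A last dimension equal to the sum of the embedding sizes is what one would get by concatenating the per-factor embeddings at each time step. The sketch below shows this for a single time step with plain lists. Concatenation is an assumption consistent with the stated shape, and `concat_factor_embeddings` is a hypothetical helper.

```python
from typing import List


def concat_factor_embeddings(
        factor_embeddings: List[List[float]]) -> List[float]:
    """Concatenate the embedding vectors of all factors for one time step.

    Hypothetical helper for illustration only.
    """
    combined = []  # type: List[float]
    for vector in factor_embeddings:
        combined.extend(vector)
    return combined


word_vector = [0.1, 0.2, 0.3]  # factor with embedding size 3
tag_vector = [0.5, 0.6]        # factor with embedding size 2
state = concat_factor_embeddings([word_vector, tag_vector])
```

Here the resulting state has 3 + 2 = 5 components, matching the sum of the two embedding sizes.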

class neuralmonkey.model.sequence.EmbeddedSequence(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.sequence.EmbeddedFactorSequence

A sequence of embedded inputs (for a single factor).

__init__(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new instance of EmbeddedSequence.

Parameters:
  • name – The name for the ModelPart object
  • vocabulary – A Vocabulary object used for the sequence data
  • data_id – A string that identifies the data series used for the sequence data
  • embedding_size – An integer that specifies the size of the embedding vector for the sequence data
  • max_length – The maximum length of the sequences
  • add_start_symbol – Includes <s> in the sequence
  • add_end_symbol – Includes </s> in the sequence
  • scale_embeddings_by_depth – Set to True for T2T import compatibility
  • embeddings_source – EmbeddedSequence from which the embeddings will be reused.
  • save_checkpoint – The save_checkpoint parameter for ModelPart
  • load_checkpoint – The load_checkpoint parameter for ModelPart
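
The effect of the add_start_symbol and add_end_symbol flags can be pictured on a list of tokens. This is a simplified sketch under assumptions: `wrap_tokens` is a hypothetical helper, the symbols are assumed to be placed at the beginning and end of the sequence, and the interaction with max_length is left out.

```python
from typing import List


def wrap_tokens(tokens: List[str],
                add_start_symbol: bool = False,
                add_end_symbol: bool = False) -> List[str]:
    """Optionally add <s> before and </s> after a token sequence.

    Hypothetical helper for illustration only.
    """
    wrapped = list(tokens)  # copy so the input list is left untouched
    if add_start_symbol:
        wrapped.insert(0, "<s>")
    if add_end_symbol:
        wrapped.append("</s>")
    return wrapped


sentence = ["a", "small", "example"]
with_both = wrap_tokens(sentence, add_start_symbol=True, add_end_symbol=True)
with_none = wrap_tokens(sentence)
```

With both flags left at their default of False, the sequence is passed through unchanged.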
data_id

Return the input data series identifier.

embedding_matrix

Return the embedding matrix for the sequence.

inputs

Return a 2D placeholder for the sequence inputs.

vocabulary

Return the input vocabulary.

class neuralmonkey.model.sequence.Sequence(name: str, max_length: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStateful

Base class for a data sequence.

This abstract class represents a batch of sequences of Tensors of possibly different lengths.

Sequence is essentially a temporal stateful object whose states and mask are fed, or computed from fed values. It is also a ModelPart and can therefore store variables such as embedding matrices.

__init__(name: str, max_length: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new Sequence object.

Parameters:
  • name – The name for the ModelPart object
  • max_length – Maximum length of sequences in the object (not checked)
  • save_checkpoint – The save_checkpoint parameter for ModelPart
  • load_checkpoint – The load_checkpoint parameter for ModelPart