neuralmonkey.model.sequence module

Module which implements the Sequence class and a few of its subclasses.

class neuralmonkey.model.sequence.EmbeddedFactorSequence(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedFactorSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.sequence.Sequence

A sequence that stores one or more embedded inputs (factors).

__init__(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedFactorSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new instance of EmbeddedFactorSequence.

Takes three lists (vocabularies, data series IDs, and embedding sizes) and constructs a Sequence object. The supplied lists must be equal in length, and the same index in each list must refer to the same factor.

Parameters:

name – The name for the ModelPart object
vocabularies – A list of Vocabulary objects used for each factor
data_ids – A list of strings identifying the data series used for each factor
embedding_sizes – A list of integers specifying the size of the embedding vector for each factor
max_length – The maximum length of the sequences
add_start_symbol – Includes <s> in the sequence
add_end_symbol – Includes </s> in the sequence
scale_embeddings_by_depth – Set to True for T2T import compatibility
embeddings_source – EmbeddedFactorSequence from which the embeddings will be reused.
save_checkpoint – The save_checkpoint parameter for ModelPart
load_checkpoint – The load_checkpoint parameter for ModelPart
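
The requirement that the three per-factor lists line up can be illustrated with a short, library-independent sketch. The helper `check_factor_lists` and the factor names below are hypothetical and are not part of Neural Monkey; the sketch only shows the "equal length, matching indices" contract described above.

```python
from typing import List, Tuple


def check_factor_lists(vocabularies: List[object],
                       data_ids: List[str],
                       embedding_sizes: List[int]
                       ) -> List[Tuple[object, str, int]]:
    """Pair up per-factor settings, rejecting lists of unequal length.

    Hypothetical helper for illustration only; not a Neural Monkey API.
    """
    lengths = {len(vocabularies), len(data_ids), len(embedding_sizes)}
    if len(lengths) != 1:
        raise ValueError("vocabularies, data_ids and embedding_sizes "
                         "must have the same length")
    # Index i of every list describes the same factor.
    return list(zip(vocabularies, data_ids, embedding_sizes))


# Two made-up factors: word forms and part-of-speech tags.
# Plain strings stand in for Vocabulary objects here.
factors = check_factor_lists(["form_vocab", "tag_vocab"],
                             ["source_forms", "source_tags"],
                             [256, 32])
```

Each resulting tuple groups the vocabulary, data series ID, and embedding size of one factor.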

embedding_matrices: Return a list of embedding matrices, one for each factor.

feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]

Feed the placeholders with the data.

Parameters:

dataset – The dataset.
train – A flag indicating whether the train mode is enabled.

Returns:	The constructed feed dictionary that contains the factor data and the mask.

input_factors

input_shapes

input_types

tb_embedding_visualization(logdir: str, prj: tensorflow.contrib.tensorboard.plugins.projector)

Link embeddings with vocabulary wordlist.

Used for TensorBoard visualization.

Parameters:

logdir – Directory where the model is stored.
prj – TensorBoard projector for storing linking info.

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
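
To make the mask layout concrete, here is a minimal pure-Python sketch that builds such a (batch, time) mask from sequence lengths. The function name `build_temporal_mask` is hypothetical; in Neural Monkey the mask is a TensorFlow tensor, and nested lists are used here only for illustration.

```python
from typing import List


def build_temporal_mask(lengths: List[int],
                        max_time: int) -> List[List[float]]:
    """Return a (batch, time) mask: 1.0 on real positions, 0.0 on padding.

    Hypothetical helper for illustration only; not a Neural Monkey API.
    """
    return [[1.0 if t < length else 0.0 for t in range(max_time)]
            for length in lengths]


# A batch of three sequences with lengths 3, 1 and 2, padded to 3 steps.
mask = build_temporal_mask([3, 1, 2], max_time=3)
```

Each row corresponds to one sequence in the batch, and the number of ones in a row equals that sequence's length.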

temporal_states

Return the embedded factors.

A 3D Tensor of shape (batch, time, dimension), where dimension is the sum of the embedding sizes supplied to the constructor.
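
The shape statement above can be sketched without TensorFlow: each factor is looked up in its own embedding table and the resulting vectors are concatenated along the last axis, so the final dimension is the sum of the embedding sizes. The function `embed_factors` and the toy tables are assumptions made for illustration; the actual implementation operates on tensors.

```python
from typing import List

Matrix = List[List[float]]


def embed_factors(factor_ids: List[List[List[int]]],
                  embedding_tables: List[Matrix]) -> List[List[List[float]]]:
    """Look up every factor and concatenate the vectors per time step.

    factor_ids holds one (batch, time) index array per factor;
    embedding_tables holds one (vocabulary, embedding_size) table per factor.
    Hypothetical helper for illustration only; not a Neural Monkey API.
    """
    batch_size = len(factor_ids[0])
    time_steps = len(factor_ids[0][0])
    states = []
    for b in range(batch_size):
        sentence = []
        for t in range(time_steps):
            vector: List[float] = []
            for ids, table in zip(factor_ids, embedding_tables):
                vector.extend(table[ids[b][t]])
            sentence.append(vector)
        states.append(sentence)
    return states


# Factor 0 has embedding size 2, factor 1 has embedding size 1.
form_table = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4]]
tag_table = [[0.0], [0.5]]

# One sequence of two time steps; the indices are made up.
states = embed_factors([[[1, 2]], [[1, 0]]], [form_table, tag_table])
```

Here `states` has shape (1, 2, 3), where 3 is the sum of the two embedding sizes.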

class neuralmonkey.model.sequence.EmbeddedSequence(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.sequence.EmbeddedFactorSequence

A sequence of embedded inputs (for a single factor).

__init__(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, max_length: int = None, add_start_symbol: bool = False, add_end_symbol: bool = False, scale_embeddings_by_depth: bool = False, embeddings_source: Union[neuralmonkey.model.sequence.EmbeddedSequence, NoneType] = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new instance of EmbeddedSequence.

Parameters:

name – The name for the ModelPart object
vocabulary – A Vocabulary object used for the sequence data
data_id – A string that identifies the data series used for the sequence data
embedding_size – An integer that specifies the size of the embedding vector for the sequence data
max_length – The maximum length of the sequences
add_start_symbol – Includes <s> in the sequence
add_end_symbol – Includes </s> in the sequence
scale_embeddings_by_depth – Set to True for T2T import compatibility
embeddings_source – EmbeddedSequence from which the embeddings will be reused.
save_checkpoint – The save_checkpoint parameter for ModelPart
load_checkpoint – The load_checkpoint parameter for ModelPart
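
As a rough illustration of the add_start_symbol and add_end_symbol flags, the sketch below wraps a token list with `<s>` and `</s>`. The helper `add_symbols` is hypothetical, and placing `<s>` first and `</s>` last is an assumption about the flags' intent. How the symbols interact with max_length is not specified here, so the sketch does not model it.

```python
from typing import List

START_SYMBOL = "<s>"
END_SYMBOL = "</s>"


def add_symbols(tokens: List[str],
                add_start_symbol: bool = False,
                add_end_symbol: bool = False) -> List[str]:
    """Return a copy of tokens with the optional boundary symbols added.

    Hypothetical helper for illustration only; not a Neural Monkey API.
    """
    result = list(tokens)
    if add_start_symbol:
        result.insert(0, START_SYMBOL)
    if add_end_symbol:
        result.append(END_SYMBOL)
    return result


plain = add_symbols(["hello", "world"])
wrapped = add_symbols(["hello", "world"],
                      add_start_symbol=True, add_end_symbol=True)
```

With both flags left at their default of False, the token list is returned unchanged.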

data_id: Return the input data series identifier.

embedding_matrix: Return the embedding matrix for the sequence.

inputs: Return a 2D placeholder for the sequence inputs.

vocabulary: Return the input vocabulary.

class neuralmonkey.model.sequence.Sequence(name: str, max_length: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStateful

Base class for a data sequence.

This abstract class represents a batch of sequences of Tensors of possibly different lengths.

Sequence is essentially a temporal stateful object whose states and mask are either fed directly or computed from fed values. It is also a ModelPart and can therefore store variables such as embedding matrices.

__init__(name: str, max_length: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Construct a new Sequence object.

Parameters:

name – The name for the ModelPart object
max_length – Maximum length of sequences in the object (not checked)
save_checkpoint – The save_checkpoint parameter for ModelPart
load_checkpoint – The load_checkpoint parameter for ModelPart