neuralmonkey.encoders.cnn_encoder module

CNN for image processing.

class neuralmonkey.encoders.cnn_encoder.CNNEncoder(name: str, data_id: str, convolutions: List[Union[Tuple[str, int, int, str, int], Tuple[str, int, int], Tuple[str, int, int, str]]], image_height: int, image_width: int, pixel_dim: int, fully_connected: List[int] = None, batch_normalize: bool = False, dropout_keep_prob: float = 0.5, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.SpatialStatefulWithOutput

An image encoder.

It projects the input image through a series of convolutional operations. The projected image is vertically cut and fed to stacked RNN layers which encode the image into a single vector.

__init__(name: str, data_id: str, convolutions: List[Union[Tuple[str, int, int, str, int], Tuple[str, int, int], Tuple[str, int, int, str]]], image_height: int, image_width: int, pixel_dim: int, fully_connected: List[int] = None, batch_normalize: bool = False, dropout_keep_prob: float = 0.5, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Initialize a convolutional network for image processing.

The convolutional network can consist of plain convolutions, max-pooling or average-pooling layers, and residual blocks. In the configuration, they are specified using the following tuples.

  • convolution: (“C”, kernel_size, stride, padding, out_channels);
  • max / average pooling: (“M”/“A”, kernel_size, stride, padding);
  • residual block: (“R”, kernel_size, out_channels).

Padding must be either “valid” or “same”.
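
As a rough illustration of the tuple format, the sketch below lists one layer of each kind and traces the size of the feature map through them. The helper trace_shape is hypothetical and not part of Neural Monkey. It assumes the usual TensorFlow output-size rules ("same": ceil(size / stride); "valid": ceil((size - kernel_size + 1) / stride)) and assumes that a residual block leaves height and width unchanged.

```python
import math
from typing import List, Tuple, Union

ConvSpec = Union[Tuple[str, int, int, str, int],
                 Tuple[str, int, int, str],
                 Tuple[str, int, int]]

# One layer of each kind, in the tuple format described above.
convolutions: List[ConvSpec] = [
    ("C", 3, 1, "same", 32),   # convolution: kernel 3, stride 1, 32 channels
    ("M", 2, 2, "valid"),      # max pooling: kernel 2, stride 2
    ("R", 3, 32),              # residual block: kernel 3, 32 channels
    ("A", 2, 2, "same"),       # average pooling: kernel 2, stride 2
]


def _out_size(size: int, kernel: int, stride: int, padding: str) -> int:
    """Output length along one axis (TensorFlow-style rule, assumed here)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("Padding must be 'valid' or 'same', got: " + padding)


def trace_shape(specs: List[ConvSpec], height: int, width: int,
                channels: int) -> Tuple[int, int, int]:
    """Hypothetical helper: follow (height, width, channels) through specs."""
    for spec in specs:
        kind = spec[0]
        if kind == "C":
            _, kernel, stride, padding, channels = spec
        elif kind in ("M", "A"):
            _, kernel, stride, padding = spec
        elif kind == "R":
            # Assumption: a residual block keeps the spatial size and only
            # sets the number of channels.
            _, kernel, channels = spec
            continue
        else:
            raise ValueError("Unknown layer type: " + kind)
        height = _out_size(height, kernel, stride, padding)
        width = _out_size(width, kernel, stride, padding)
    return height, width, channels


print(trace_shape(convolutions, 224, 224, 3))  # (56, 56, 32)
```

Under these assumptions, a 224 x 224 RGB input ends up as a 56 x 56 map with 32 channels. The same list could be passed as the convolutions argument of the encoder.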

Parameters:
  • convolutions – Configuration of convolutional layers.
  • data_id – Identifier of the data series in the dataset.
  • image_height – Height of the input image in pixels.
  • image_width – Width of the image.
  • pixel_dim – Number of color channels in the input images.
  • dropout_keep_prob – Probability of keeping neurons active in dropout. Dropout is done between all convolutional layers and the fully connected layers.
batch_norm_callback(layer_output: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor
feed_dict(dataset: neuralmonkey.dataset.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]
image_input
image_mask
image_processing_layers

Do all convolutions and return the last convolutional map.

No dropout is applied between the convolutional layers. By default, the activation function is ReLU.

output

Output vector of the CNN.

If fully connected layers are specified, they are applied on top of the last convolutional map. Dropout is applied between all layers, and the default activation function is ReLU. These are projection layers only; no softmax is applied.

If no fully_connected layers are specified, the average-pooled last convolutional map is used as the vector output.
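
To make the average-pooling case concrete, the following sketch averages a tiny feature map over its spatial positions, producing one value per channel. It uses plain nested lists for a single image. The name average_pool_map is hypothetical, and since it is not stated here whether the encoder takes the mask into account when averaging, the sketch simply averages over all positions.

```python
from typing import List


def average_pool_map(feature_map: List[List[List[float]]]) -> List[float]:
    """Average a (width, height, channels) map over all spatial positions."""
    # Flatten the two spatial axes into one list of per-position vectors.
    positions = [cell for column in feature_map for cell in column]
    channels = len(positions[0])
    return [sum(cell[c] for cell in positions) / len(positions)
            for c in range(channels)]


# A 2 x 2 map with 2 channels.
last_map = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]
vector = average_pool_map(last_map)
print(vector)  # [2.5, 25.0]
```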

spatial_mask

Return mask for the spatial_states.

A 3D Tensor of shape (batch, width, height) of type float32 which masks the spatial states so that they can be of different shapes. The mask should only contain ones or zeros.

spatial_states

Return object states in space.

A 4D Tensor of shape (batch, width, height, state_size) which contains the states of the object in space (e.g. the final layer of a convolutional network processing an image).
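
The relation between the mask and differently sized inputs can be sketched as follows. How the encoder actually constructs its mask is not described here; the hypothetical helper build_spatial_mask assumes that smaller images are padded on the right and at the bottom to a common size, and it uses nested lists in place of tensors.

```python
from typing import List, Tuple


def build_spatial_mask(sizes: List[Tuple[int, int]], max_width: int,
                       max_height: int) -> List[List[List[float]]]:
    """Return a (batch, width, height) mask: 1.0 inside each image, else 0.0."""
    return [
        [[1.0 if x < width and y < height else 0.0
          for y in range(max_height)]
         for x in range(max_width)]
        for width, height in sizes
    ]


# Two images of (width, height) = (3, 2) and (2, 1), padded to 3 x 2.
mask = build_spatial_mask([(3, 2), (2, 1)], max_width=3, max_height=2)
print(mask[0])  # [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(mask[1])  # [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
```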

class neuralmonkey.encoders.cnn_encoder.CNNTemporalView(name: str, cnn: neuralmonkey.encoders.cnn_encoder.CNNEncoder) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.TemporalStatefulWithOutput

Slice the convolutional maps left to right.

__init__(name: str, cnn: neuralmonkey.encoders.cnn_encoder.CNNEncoder) → None

Initialize self. See help(type(self)) for accurate signature.

get_dependencies() → Set[ModelPart]

Recursively collect all encoders and decoders.

output

Return the object output.

A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.

temporal_mask

Return mask for the temporal_states.

A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
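
As a small illustration of how such a mask is typically consumed, the sketch below derives sequence lengths from a (batch, time) mask and zeroes out the padded states. Both helper names are hypothetical and nested lists stand in for tensors; this is not taken from the Neural Monkey code.

```python
from typing import List


def sequence_lengths(temporal_mask: List[List[float]]) -> List[int]:
    """Number of valid time steps per sequence (sum of ones in each row)."""
    return [int(sum(row)) for row in temporal_mask]


def apply_mask(states: List[List[List[float]]],
               temporal_mask: List[List[float]]) -> List[List[List[float]]]:
    """Multiply (batch, time, state_size) states by a (batch, time) mask."""
    return [
        [[value * m for value in state] for state, m in zip(seq, mask_row)]
        for seq, mask_row in zip(states, temporal_mask)
    ]


example_mask = [[1.0, 1.0, 1.0, 0.0],
                [1.0, 1.0, 0.0, 0.0]]
print(sequence_lengths(example_mask))  # [3, 2]

# One sequence with two time steps; the second step is padding.
masked = apply_mask([[[0.5, 1.5], [2.0, 3.0]]], [[1.0, 0.0]])
print(masked)  # [[[0.5, 1.5], [0.0, 0.0]]]
```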

temporal_states

Return object states in time.

A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
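
One way to picture slicing a convolutional map left to right is to treat each column of the map as one time step and to concatenate the channel vectors of that column into a single state. The sketch below does this for a single image with nested lists. It is an assumption about the general idea, not the documented implementation; slice_left_to_right is a hypothetical name, and the axis order follows the (width, height, state_size) layout given for spatial_states.

```python
from typing import List


def slice_left_to_right(
        spatial_states: List[List[List[float]]]) -> List[List[float]]:
    """Turn a (width, height, channels) map into (time, height * channels).

    Each column (fixed x position) becomes one time step.
    """
    return [[value for cell in column for value in cell]
            for column in spatial_states]


# Width 3, height 2, 2 channels.
states = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
temporal = slice_left_to_right(states)
print(temporal)
# [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
```

In this picture the number of time steps equals the width of the map, and the state size equals height times the number of channels.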

neuralmonkey.encoders.cnn_encoder.plain_convolution(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, specification: Tuple[str, int, int, str, int], batch_norm_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, int]
neuralmonkey.encoders.cnn_encoder.pooling(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, specification: Tuple[str, int, int, str], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]
neuralmonkey.encoders.cnn_encoder.residual_block(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, prev_channels: int, specification: Tuple[str, int, int], batch_norm_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, int]