neuralmonkey.encoders.imagenet_encoder module

Pre-trained ImageNet networks.

class neuralmonkey.encoders.imagenet_encoder.ImageNet(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.SpatialStatefulWithOutput

Pre-trained ImageNet network.

We use the ImageNet networks as they are in the tensorflow/models repository. In order to use them, you need to clone the repository and configure the ImageNet object so that it has the full path to “research/slim” in the repository. See the TFSlim documentation in that directory for information about checkpoints of the pre-trained models.
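The setup above makes the TF-Slim network definitions importable from a local clone. The sketch below shows one way to do that. It assumes that `research/slim` contains a `nets/` package and that prepending its path to `sys.path` is enough to import it. The helper `add_slim_to_path` is hypothetical and is not part of Neural Monkey's API.

```python
import os
import sys


def add_slim_to_path(slim_models_path: str) -> None:
    """Make TF-Slim's ``nets`` package importable (hypothetical helper).

    Assumes ``slim_models_path`` points to ``research/slim`` inside a local
    clone of tensorflow/models, i.e. a directory containing ``nets/``.
    """
    path = os.path.abspath(slim_models_path)
    if not os.path.isdir(os.path.join(path, "nets")):
        raise ValueError(
            "Expected the 'research/slim' directory of tensorflow/models, "
            "got: {}".format(path))
    # Prepend so that this clone wins over any other package named 'nets'.
    if path not in sys.path:
        sys.path.insert(0, path)
```

After calling `add_slim_to_path("/path/to/models/research/slim")` (placeholder path), modules such as `nets.vgg` from the clone should become importable.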

__init__(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Initialize pre-trained ImageNet network.

  • name – Name of the model part (the ImageNet network will be in its own scope, independently of the name).
  • data_id – Id of the series with images (list of 3D numpy arrays).
  • network_type – Identifier of the ImageNet network from TFSlim.
  • spatial_layer – String identifier of the convolutional map (model’s endpoint). Check the TFSlim documentation for endpoint specifications.
  • encoded_layer – String identifier of the network layer that will be used as the input of a decoder. None means averaging the convolutional maps.
  • slim_models_path – Path to the Slim models in the tensorflow/models repository.
  • load_checkpoint – Checkpoint file from which the pre-trained network is loaded.

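When encoded_layer is None, the convolutional maps are averaged into one vector per image. The NumPy sketch below illustrates what such averaging over the two spatial axes can look like. It assumes a (batch, width, height, channels) layout, matching the spatial states described for this class. It is an illustration, not code taken from the Neural Monkey source.

```python
import numpy as np


def average_spatial_states(spatial_states: np.ndarray) -> np.ndarray:
    """Average a (batch, width, height, channels) map into (batch, channels).

    Illustrative sketch of "averaging the convolutional maps".
    """
    if spatial_states.ndim != 4:
        raise ValueError("expected a 4D array, got shape {}".format(
            spatial_states.shape))
    # Taking the mean over both spatial axes leaves one vector per image.
    return spatial_states.mean(axis=(1, 2))


states = np.arange(2 * 3 * 3 * 4, dtype=np.float32).reshape(2, 3, 3, 4)
encoded = average_spatial_states(states)
print(encoded.shape)  # (2, 4)
```

The result has the (batch, state_size) shape that this class documents for its output.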
feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]

Return a feed dictionary for the given feedable object.

  • dataset – A dataset instance from which to get the data.
  • train – Boolean indicating whether the model runs in training mode.

A FeedDict dictionary object.
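The feed_dict method turns a series of images from the dataset into the value fed to the network's input. The sketch below shows the core batching step under several assumptions. The dataset is modeled as a plain dict from series ids to lists of 3D numpy arrays. The placeholder is stood in for by the hypothetical string key `"images"`, whereas TensorFlow would use a tensor here. The `train` flag is ignored. Neither `build_image_feed` nor its parameters exist in Neural Monkey.

```python
from typing import Dict, Sequence

import numpy as np


def build_image_feed(series: Dict[str, Sequence[np.ndarray]],
                     data_id: str,
                     placeholder_key: str = "images") -> Dict[str, np.ndarray]:
    """Stack the images of one series into a single 4D batch (sketch)."""
    images = list(series[data_id])
    if not images:
        raise ValueError("series '{}' is empty".format(data_id))
    expected = images[0].shape
    for image in images:
        # Every image must be a 3D array of identical shape to be stackable.
        if image.ndim != 3 or image.shape != expected:
            raise ValueError("inconsistent image shape: {} vs {}".format(
                image.shape, expected))
    # np.stack adds a leading batch axis in front of the three image axes.
    batch = np.stack(images).astype(np.float32)
    return {placeholder_key: batch}


series = {"images": [np.zeros((4, 5, 3)), np.ones((4, 5, 3))]}
feed = build_image_feed(series, "images")
print(feed["images"].shape)  # (2, 4, 5, 3)
```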


Return the object output.

A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.


Return mask for the spatial_states.

A 3D Tensor of shape (batch, width, height) of type float32 that masks the spatial states, so that they can be of different shapes. The mask should contain only ones and zeros.
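The mask marks which spatial positions are valid when states of different sizes share one padded array. The NumPy sketch below builds such a (batch, width, height) float32 mask. It assumes each item's valid region occupies the top-left corner, with padding after it. The function `build_spatial_mask` and its arguments are hypothetical. If all images in a batch have the same size, every entry of this mask is one.

```python
from typing import List, Tuple

import numpy as np


def build_spatial_mask(sizes: List[Tuple[int, int]],
                       width: int, height: int) -> np.ndarray:
    """Return a (batch, width, height) float32 mask of ones and zeros."""
    mask = np.zeros((len(sizes), width, height), dtype=np.float32)
    for i, (w, h) in enumerate(sizes):
        if w > width or h > height:
            raise ValueError("size {} exceeds ({}, {})".format(
                (w, h), width, height))
        # Assumption: valid positions are top-left, padding follows them.
        mask[i, :w, :h] = 1.0
    return mask


mask = build_spatial_mask([(2, 3), (1, 2)], width=2, height=3)
print(mask.shape)  # (2, 2, 3)
```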


Return object states in space.

A 4D Tensor of shape (batch, width, height, state_size) that contains the states of the object in space (e.g., the final layer of a convolutional network processing an image).

class neuralmonkey.encoders.imagenet_encoder.ImageNetSpec

Bases: neuralmonkey.encoders.imagenet_encoder.ImageNetSpec

Specification of the ImageNet encoder.

Do not use this object directly; instead, use one of the get_* functions in this module.


The variable scope of the network to use.


A tuple of two integers giving the image width and height in pixels.


The function that receives an image and applies the network.
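The specification bundles three values: the variable scope, the image size, and the function that applies the network. The sketch below models such a bundle as an immutable `typing.NamedTuple`. The class name `ImageNetSpecSketch`, its field names, the identity "network" and the 224×224 size are all illustrative stand-ins, not this module's actual definitions. The scope is deliberately left untyped because its concrete type is not stated here.

```python
from typing import Any, Callable, NamedTuple, Tuple


class ImageNetSpecSketch(NamedTuple):
    """Illustrative stand-in for ImageNetSpec; field names are hypothetical."""
    scope: Any                        # variable scope of the network (type left open)
    image_size: Tuple[int, int]       # (width, height) in pixels
    apply_net: Callable[[Any], Any]   # receives images, applies the network


def identity_net(images: Any) -> Any:
    # Stand-in for a real network: returns its input unchanged.
    return images


spec = ImageNetSpecSketch(scope="toy_net", image_size=(224, 224),
                          apply_net=identity_net)
print(spec.image_size)  # (224, 224)
```

A named tuple keeps the specification read-only, so callers cannot accidentally modify a shared spec.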

neuralmonkey.encoders.imagenet_encoder.get_alexnet() → neuralmonkey.encoders.imagenet_encoder.ImageNetSpec
neuralmonkey.encoders.imagenet_encoder.get_resnet_by_type(resnet_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]
neuralmonkey.encoders.imagenet_encoder.get_vgg_by_type(vgg_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]
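Both get_resnet_by_type and get_vgg_by_type return a zero-argument callable rather than a specification. The sketch below shows that factory pattern with a closure. The type strings `"vgg_16"` and `"vgg_19"`, the size table, the minimal `Spec` tuple and the name `get_vgg_by_type_sketch` are all assumptions made for illustration. The sketch validates the type eagerly, which is a choice of this sketch rather than documented behavior.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Spec(NamedTuple):
    """Minimal hypothetical specification used only in this sketch."""
    name: str
    image_size: Tuple[int, int]


# Hypothetical lookup table; a real version would reference TF-Slim networks.
_VGG_SIZES: Dict[str, Tuple[int, int]] = {
    "vgg_16": (224, 224),
    "vgg_19": (224, 224),
}


def get_vgg_by_type_sketch(vgg_type: str) -> Callable[[], Spec]:
    """Return a zero-argument function that builds the spec on demand."""
    if vgg_type not in _VGG_SIZES:
        raise ValueError("Unknown VGG type: {}".format(vgg_type))

    def get_vgg() -> Spec:
        # Nothing is constructed until the returned function is called.
        return Spec(name=vgg_type, image_size=_VGG_SIZES[vgg_type])

    return get_vgg


factory = get_vgg_by_type_sketch("vgg_16")
print(factory().name)  # vgg_16
```

Returning a callable matches the Callable[[], ImageNetSpec] return types above and lets the construction work be deferred until the specification is actually needed.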