neuralmonkey.encoders.imagenet_encoder module

Pre-trained ImageNet networks.

class neuralmonkey.encoders.imagenet_encoder.ImageNet(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Bases: neuralmonkey.model.model_part.ModelPart, neuralmonkey.model.stateful.SpatialStatefulWithOutput

Pre-trained ImageNet network.

We use the ImageNet networks as they are provided in the tensorflow/models repository (https://github.com/tensorflow/models). In order to use them, you need to clone the repository and configure the ImageNet object with the full path to “research/slim” inside the clone. Visit https://github.com/tensorflow/models/tree/master/research/slim for information about checkpoints of the pre-trained models.
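In an experiment, the encoder is normally set up from the INI configuration file; the following is a minimal Python sketch of a direct instantiation, where the paths, the network type and the endpoint name are illustrative assumptions only:

    from neuralmonkey.encoders.imagenet_encoder import ImageNet

    # All paths and identifiers below are placeholders for illustration.
    imagenet = ImageNet(
        name="imagenet_vgg",
        data_id="images",
        network_type="vgg_16",
        slim_models_path="/path/to/tensorflow-models/research/slim",
        load_checkpoint="/path/to/vgg_16.ckpt",
        spatial_layer="vgg_16/conv5/conv5_3")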

__init__(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None

Initialize pre-trained ImageNet network.

Parameters:
  • name – Name of the model part (the ImageNet network will be in its own scope, independently of name).
  • data_id – ID of the data series with images (a list of 3D numpy arrays).
  • network_type – Identifier of the ImageNet network from TF-Slim.
  • spatial_layer – String identifier of the convolutional map (the model’s endpoint). Check the TF-Slim documentation for the endpoint specifications.
  • encoded_layer – String identifier of the network layer that will be used as input to a decoder. None means the convolutional maps are averaged.
  • slim_models_path – Path to the Slim models in the tensorflow/models repository.
  • load_checkpoint – Checkpoint file from which the pre-trained network is loaded.
end_points
feed_dict(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]

Return a feed dictionary for the given feedable object.

Parameters:
  • dataset – A dataset instance from which to get the data.
  • train – Boolean indicating whether the model runs in training mode.
Returns:

A FeedDict dictionary object.
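A hedged sketch of feeding the encoder outside of the standard runners (the dataset construction and session handling are simplified assumptions; normally the experiment runners take care of this):

    import tensorflow as tf

    # Assuming `imagenet` is an ImageNet instance and `dataset` is a
    # neuralmonkey.dataset.Dataset containing the configured image series.
    fd = imagenet.feed_dict(dataset, train=False)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        states, mask = sess.run(
            [imagenet.spatial_states, imagenet.spatial_mask], feed_dict=fd)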

input_image
input_shapes
input_types
output

Return the object output.

A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.

spatial_mask

Return mask for the spatial_states.

A 3D Tensor of shape (batch, width, height) of type float32 which masks the spatial states so that they can be of different shapes. The mask should only contain ones or zeros.

spatial_states

Return object states in space.

A 4D Tensor of shape (batch, width, height, state_size) which contains the states of the object in space (e.g., the final layer of a convolutional network processing an image).
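When encoded_layer is None, the output property corresponds to an average of the spatial states over the width and height dimensions. A minimal sketch of that reduction (not the encoder’s actual code):

    import tensorflow as tf

    # spatial_states: (batch, width, height, state_size)
    spatial_states = tf.placeholder(tf.float32, [None, 7, 7, 512])
    # Averaging over the two spatial dimensions yields the 2D
    # (batch, state_size) output described above.
    output = tf.reduce_mean(spatial_states, axis=[1, 2])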

class neuralmonkey.encoders.imagenet_encoder.ImageNetSpec

Bases: neuralmonkey.encoders.imagenet_encoder.ImageNetSpec

Specification of the Imagenet encoder.

Do not use this object directly; instead, use one of the get_* functions in this module.

scope

The variable scope of the network to use.

image_size

A tuple of two integers giving the image width and height in pixels.

apply_net

The function that receives an image and applies the network.
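A hedged sketch of how a caller might read the specification fields; the single-argument call to apply_net and the handling of its return value are assumptions, since only the field descriptions above are documented:

    import tensorflow as tf
    from neuralmonkey.encoders.imagenet_encoder import get_alexnet

    spec = get_alexnet()

    # image_size is documented as (width, height) in pixels.
    width, height = spec.image_size
    images = tf.placeholder(tf.float32, [None, height, width, 3])

    # apply_net is documented only as "receives an image and applies the
    # network"; the single-argument call and the returned value are
    # assumptions here.  spec.scope identifies the variable scope in which
    # the network variables live.
    features = spec.apply_net(images)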

neuralmonkey.encoders.imagenet_encoder.get_alexnet() → neuralmonkey.encoders.imagenet_encoder.ImageNetSpec
neuralmonkey.encoders.imagenet_encoder.get_resnet_by_type(resnet_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]
neuralmonkey.encoders.imagenet_encoder.get_vgg_by_type(vgg_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]
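Note the asymmetry in the signatures above: get_alexnet returns a specification directly, while get_resnet_by_type and get_vgg_by_type return zero-argument callables that build the specification. A short hedged sketch (the concrete type identifiers follow the TF-Slim naming and are assumptions here):

    from neuralmonkey.encoders.imagenet_encoder import (
        get_alexnet, get_resnet_by_type, get_vgg_by_type)

    alexnet_spec = get_alexnet()
    resnet_spec = get_resnet_by_type("resnet_v2_50")()  # identifier assumed
    vgg_spec = get_vgg_by_type("vgg_16")()              # identifier assumed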