neuralmonkey.learning_utils module

neuralmonkey.learning_utils.evaluation(evaluators, dataset, runners, execution_results, result_data)

Evaluate the model outputs.

Parameters:

evaluators – List of tuples of series and evaluation functions.
dataset – Dataset against which the evaluation is done.
runners – List of runners (contains series ids and loss names).
execution_results – Execution results that include the loss values.
result_data – Dictionary from series names to list of outputs.

Returns:

Dictionary of evaluation names and their values, which includes the metrics applied to the respective series and the loss values from the run.
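
As a rough illustration of the evaluation step described above, the sketch below applies each evaluation function to a generated series and its reference series, then merges the metric values with the loss values. It is not Neural Monkey's implementation: `evaluate_sketch`, `normalize`, `accuracy`, the plain dictionaries standing in for the dataset and the execution results, and the `series/metric` key format are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Metric = Callable[[List[Any], List[Any]], float]


def accuracy(hypotheses: List[Any], references: List[Any]) -> float:
    """Fraction of references matched exactly by the output at the same index."""
    if not references:
        return 0.0
    matches = sum(h == r for h, r in zip(hypotheses, references))
    return matches / len(references)


def normalize(evaluator: Sequence[Any]) -> Tuple[str, str, Metric]:
    """Expand (series, function) to (generated, reference, function)."""
    if len(evaluator) == 2:
        name, func = evaluator
        return name, name, func
    generated, reference, func = evaluator
    return generated, reference, func


def evaluate_sketch(evaluators: List[Sequence[Any]],
                    reference_data: Dict[str, List[Any]],
                    result_data: Dict[str, List[Any]],
                    losses: Dict[str, float]) -> Dict[str, float]:
    """Combine loss values with metrics computed on the generated series."""
    results = dict(losses)  # start from the loss values of the run
    for evaluator in evaluators:
        generated, reference, func = normalize(evaluator)
        # Skip metrics whose series are missing on either side.
        if generated not in result_data or reference not in reference_data:
            continue
        # Assumed key format: "<generated series>/<function name>".
        key = "{}/{}".format(generated, func.__name__)
        results[key] = func(result_data[generated], reference_data[reference])
    return results


scores = evaluate_sketch(
    evaluators=[("target", accuracy)],
    reference_data={"target": ["a", "b", "x", "d"]},
    result_data={"target": ["a", "b", "c", "d"]},
    losses={"target/loss": 0.5},
)
print(scores)
```

In this toy run three of the four outputs match their references, so the metric is stored next to the loss value under the assumed `target/accuracy` key.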

neuralmonkey.learning_utils.print_final_evaluation(name: str, eval_result: Dict[str, float]) → None

Print final evaluation from a test dataset.

neuralmonkey.learning_utils.run_on_dataset(tf_manager: neuralmonkey.tf_manager.TensorFlowManager, runners: List[neuralmonkey.runners.base_runner.BaseRunner], dataset: neuralmonkey.dataset.dataset.Dataset, postprocess: Union[List[Tuple[str, Callable]], NoneType], write_out: bool = False, batch_size: Union[int, NoneType] = None, log_progress: int = 0) → Tuple[List[neuralmonkey.runners.base_runner.ExecutionResult], Dict[str, List[Any]]]

Apply the model on a dataset and optionally write outputs to files.

Parameters:

tf_manager – TensorFlow manager with initialized sessions.
runners – List of runners that are executed on the dataset.
dataset – The dataset on which the model will be executed.
evaluators – List of evaluators that are used for the model evaluation if the target data are provided.
postprocess – Optional list of (str, callable) tuples used to postprocess the model outputs.
write_out – Flag whether the outputs should be printed to a file defined in the dataset object.
batch_size – size of the minibatch
log_progress – log progress every X seconds
extra_fetches – Extra tensors to evaluate for each batch.

Returns:

A tuple of the resulting sentences or NumPy arrays and, if available, the evaluation results as a dictionary mapping each evaluation function to its value.
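
To make the roles of `batch_size` and `postprocess` more concrete, here is a minimal sketch of running a model over a dataset in minibatches and then deriving additional series from the collected outputs. It does not use the Neural Monkey API. `run_in_batches`, the `predict` callable, the `"output"` series name, the treatment of `batch_size=None` as a single batch, and the `(dataset, result_data)` call signature of the postprocessors are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Postprocessor = Callable[[List[Any], Dict[str, List[Any]]], List[Any]]


def run_in_batches(predict: Callable[[List[Any]], List[Any]],
                   examples: List[Any],
                   batch_size: Optional[int] = None,
                   postprocess: Optional[List[Tuple[str, Postprocessor]]] = None
                   ) -> Dict[str, List[Any]]:
    """Apply `predict` to minibatches and collect the outputs by series name."""
    if batch_size is None:
        # Assumption: without a batch size, process everything in one batch.
        batch_size = max(len(examples), 1)

    outputs = []  # type: List[Any]
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        outputs.extend(predict(batch))

    result_data = {"output": outputs}

    # Each postprocessor generates one additional, named series.
    if postprocess is not None:
        for series_name, function in postprocess:
            result_data[series_name] = function(examples, result_data)
    return result_data


def uppercase(batch: List[str]) -> List[str]:
    """Toy stand-in for a model."""
    return [text.upper() for text in batch]


def output_lengths(examples: List[str],
                   result_data: Dict[str, List[str]]) -> List[int]:
    """Toy postprocessor: length of every generated output."""
    return [len(text) for text in result_data["output"]]


results = run_in_batches(uppercase, ["a", "bb", "ccc"], batch_size=2,
                         postprocess=[("output_length", output_lengths)])
print(results)
```

Postprocessing runs once after all batches have been collected, so a postprocessor sees the complete output series rather than a single minibatch.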

neuralmonkey.learning_utils.training_loop(tf_manager: neuralmonkey.tf_manager.TensorFlowManager, epochs: int, trainer: neuralmonkey.trainers.generic_trainer.GenericTrainer, batch_size: int, log_directory: str, evaluators: List[Union[Tuple[str, Any], Tuple[str, str, Any]]], runners: List[neuralmonkey.runners.base_runner.BaseRunner], train_dataset: neuralmonkey.dataset.dataset.Dataset, val_dataset: Union[neuralmonkey.dataset.dataset.Dataset, List[neuralmonkey.dataset.dataset.Dataset]], test_datasets: Union[List[neuralmonkey.dataset.dataset.Dataset], NoneType] = None, logging_period: Union[str, int] = 20, validation_period: Union[str, int] = 500, val_preview_input_series: Union[List[str], NoneType] = None, val_preview_output_series: Union[List[str], NoneType] = None, val_preview_num_examples: int = 15, train_start_offset: int = 0, runners_batch_size: Union[int, NoneType] = None, initial_variables: Union[str, List[str], NoneType] = None, postprocess: Union[List[Tuple[str, Callable]], NoneType] = None) → None

Execute the training loop for given graph and data.

Parameters:

tf_manager – TensorFlowManager with initialized sessions.
epochs – Number of epochs for which the algorithm will learn.
trainer – The trainer object containing the TensorFlow code for computing the loss and the optimization operation.
batch_size – number of examples in one mini-batch
log_directory – Directory where the TensorBoard log will be generated. If None, nothing will be done.
evaluators – List of evaluators. The last evaluator is used as the main one. An evaluator is a tuple of the name of the generated series, the name of the dataset series the generated one is evaluated against, and the evaluation function. If only one series name is provided, the generated and dataset series have the same name.
runners – List of runners for logging and evaluation runs
train_dataset – Dataset used for training
val_dataset – Dataset used for validation. Can be a Dataset or a list of datasets. The last dataset is used as the main one for storing the best results. When using multiple datasets, it is recommended to name them for better TensorBoard visualization.
test_datasets – List of datasets used for testing
logging_period – Number of batches after which logging happens. It can also be given as a time period in a format such as: 3s; 4m; 6h; 1d; 3m15s; 3seconds; 4minutes; 6hours; 1days
validation_period – Number of batches after which validation happens. It can also be given as a time period in the same format as logging_period.
val_preview_input_series – which input series to preview in validation
val_preview_output_series – which output series to preview in validation
val_preview_num_examples – how many examples should be printed during validation
train_start_offset – how many lines from the training dataset should be skipped. The training starts from the next batch.
runners_batch_size – batch size of runners. It is the same as batch_size if not specified
initial_variables – variables used for initialization, for example for continuation of training. Provide it with a path to your model directory and its checkpoint file group common prefix, e.g. “variables.data”, or “variables.data.3” in case of multiple checkpoints per experiment.
postprocess – A function which takes the dataset with its output series and generates additional series from them.
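
The period format accepted by `logging_period` and `validation_period` can be illustrated with a small parser. This is only a sketch of one way to interpret the listed examples (`3s`, `4m`, `6h`, `1d`, `3m15s`, `3seconds`, `4minutes`, `6hours`, `1days`), not the parser Neural Monkey uses. The function name `parse_period`, the regular expression, and the decision to treat an integer (or a string of digits) as a number of batches are assumptions.

```python
import re
from datetime import timedelta
from typing import Union

# Seconds per unit, keyed by the first letter of the unit name.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# One "<number><unit>" component; long unit names are tried before short ones.
_PART = re.compile(r"(\d+)\s*(seconds?|minutes?|hours?|days?|s|m|h|d)")


def parse_period(period: Union[int, str]) -> Union[int, timedelta]:
    """Return a batch count (int) or a time period (timedelta)."""
    if isinstance(period, int):
        return period

    text = period.strip().lower()
    if text.isdigit():
        # Assumption: a bare number given as a string is a batch count.
        return int(text)

    total_seconds = 0
    position = 0
    for match in _PART.finditer(text):
        # Components must follow each other without unparsed text in between.
        if match.start() != position:
            raise ValueError("Cannot parse period: {!r}".format(period))
        amount = int(match.group(1))
        unit = match.group(2)[0]
        total_seconds += amount * _UNIT_SECONDS[unit]
        position = match.end()

    if position == 0 or position != len(text):
        raise ValueError("Cannot parse period: {!r}".format(period))
    return timedelta(seconds=total_seconds)


for example in [20, "3s", "4m", "6h", "1d", "3m15s", "3seconds", "1days"]:
    print(example, "->", parse_period(example))
```

Returning two different types keeps the distinction between "every N batches" and "every T seconds" explicit, so a caller can branch on `isinstance(value, timedelta)`.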