Welcome to Neural Monkey’s documentation!¶
Neural Monkey is an open-source toolkit for sequence learning using TensorFlow.
If you want to dig in the code, you can browse the repository.
Getting Started¶
Installation¶
Before you start, make sure that you already have Python 3.5, pip and git installed.
Create and activate a virtual environment to install the package into:
$ python3 -m venv nm
$ source nm/bin/activate
# after this, your prompt should change
Then clone Neural Monkey from GitHub and switch to its root directory:
(nm)$ git clone https://github.com/ufal/neuralmonkey
(nm)$ cd neuralmonkey
Run pip to install all requirements. For the CPU version, install the dependencies with this command:
(nm)$ pip install --upgrade -r requirements.txt
For the GPU version, install the dependencies with this command:
(nm)$ pip install --upgrade -r requirements-gpu.txt
If you are using the GPU version, make sure that the LD_LIBRARY_PATH environment variable points to the lib and lib64 directories of your CUDA and cuDNN installations. Similarly, your PATH variable should point to the bin subdirectory of the CUDA installation directory.
You made it! Neural Monkey is now installed!
Note for Ubuntu 14.04 Users¶
If you get Segmentation fault errors at the very end of the training process, you can either ignore it, or follow the steps outlined in this document.
Package Overview¶
This overview should provide you with a basic insight into how Neural Monkey conceptualizes the problem of sequence-to-sequence learning and what the data flow during training and running models looks like.
Loading and Processing Datasets¶
We call a dataset a collection of named data series. By a series we mean a list of data items of the same type representing one type of input or desired output of a model. In the simple case of machine translation, there are two series: a list of source-language sentences and a list of target-language sentences.
The following scheme captures how a dataset is created from input data.
The dataset is created in the following steps:
- An input file is read using a reader. A reader can, e.g., load a file containing paths to JPEG images and load them as numpy arrays, or read tokenized text as a list of lists (sentences) of string tokens.
- Series created by the readers can be preprocessed by series-level preprocessors. An example of such preprocessing is byte-pair encoding, which loads a list of merges and segments the text accordingly.
- The final step before creating a dataset is applying dataset-level preprocessors, which can take multiple series and output a new series.
Currently, there are two implementations of a dataset: an in-memory dataset, which stores all data in memory, and a lazy dataset, which reads the input files gradually and keeps in memory only the batches necessary for the computation.
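As an illustration only (plain Python, not the actual Neural Monkey dataset classes), the difference between the two behaviours corresponds roughly to reading a whole file into a list versus iterating over it with a generator:

# Illustrative sketch only; Neural Monkey's dataset classes are more involved.

def load_in_memory(path):
    """In-memory style: the whole series is read into a list at once."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]

def load_lazily(path):
    """Lazy style: sentences are yielded on demand, so only the current
    batch has to fit into memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.split()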
Training and Running a Model¶
This section describes the training and running workflow. The main concepts and their interconnection can be seen in the following scheme.
The dataset series can be used to create a vocabulary. A vocabulary represents an indexed set of tokens and provides functionality for converting lists of tokenized sentences into matrices of token indices and vice versa. Vocabularies are used by encoders and decoders for feeding the provided series into the neural network.
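As a rough, hypothetical sketch (not the actual Vocabulary class or its API), converting a batch of tokenized sentences into a padded matrix of token indices might look like this:

# Hypothetical sketch of vocabulary lookup, not Neural Monkey's Vocabulary API.
PAD, UNK = 0, 1
vocab = {"<pad>": PAD, "<unk>": UNK, "Bärbel": 2, "has": 3, "a": 4, "cat": 5, ".": 6}

def sentences_to_matrix(sentences, max_len):
    """Turn tokenized sentences into a (batch, max_len) matrix of token indices."""
    matrix = []
    for sentence in sentences:
        indices = [vocab.get(token, UNK) for token in sentence[:max_len]]
        indices += [PAD] * (max_len - len(indices))  # pad short sentences
        matrix.append(indices)
    return matrix

print(sentences_to_matrix([["Bärbel", "has", "a", "dog", "."]], max_len=6))
# [[2, 3, 4, 1, 6, 0]]  -- "dog" is out of vocabulary, so it maps to <unk>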
The model itself is defined by encoders and decoders. Most of the TensorFlow code is in the encoders and decoders. Encoders are parts of the model which take some input and compute a representation of it. Decoders are model parts that produce some outputs. Our definition of encoders and decoders is more general than in classical sequence-to-sequence learning. An encoder can be, for example, a convolutional network processing an image. The RNN decoder is for us only one special type of decoder; a decoder can also be a sequence labeler or a simple multilayer-perceptron classifier.
Decoders are executed using so-called runners. Different runners represent different ways of running the model. We might want to get a single best estimation, get an n-best list, or a sample from the model. We might want to use an RNN decoder to get the decoded sequences, or we might be interested in the word alignment obtained by its attention model. This is all done by employing different runners over the decoders. The outputs of the runners can be subject to further post-processing.
In addition to the runners, each training experiment has to have its trainer. A trainer is a special case of a runner that actually modifies the parameters of the model. It collects the objective functions and uses them in an optimizer.
Neural Monkey manages TensorFlow sessions using an object called TensorFlow manager. Its basic capability is to execute runners on provided datasets.
Post-Editing Task Tutorial¶
This tutorial will guide you through designing your first experiment in Neural Monkey.
Before we get started with the tutorial, please check that you have the Neural Monkey package properly installed and working.
Part I. - The Task¶
This section gives an overall description of the task we will try to solve in this tutorial. To make things more interesting than plain machine translation, let’s try the automatic post-editing task (APE, rhyming well with Neural Monkey).
In short, automatic post-editing is a task in which we have a source language sentence (let’s call it f, as grown-ups do), a machine-translated sentence of f (I actually don’t know what grown-ups call this, so let’s call it e'), and we are expected to generate another sentence in the same language as e' but cleaned of all the errors that the machine translation system has made (let’s call this cleaned sentence e). Consider this small example:
- Source sentence f: Bärbel hat eine Katze.
- Machine-translated sentence e': Bärbel has a dog.
- Corrected translation e: Bärbel has a cat.
In the example, the machine translation system wrongly translated the German word “Katze” as the English word “dog”. It is up to the post-editing system to fix this error.
In theory (and in practice), we regard the machine translation task as searching for a target sentence e* that has the highest probability of being the translation given the source sentence f. You can put it into a formula:
e* = argmax_e p(e|f)
In the post-editing task, the formula is slightly different:
e* = argmax_e p(e|f, e')
If you think about this a little, there are two ways one can look at this task. One is that we are translating the machine-translated sentence from a kind of synthetic language into a proper one, with the additional knowledge of what the source sentence was. The second view regards this as an ordinary machine translation task, with a little help from another MT system.
In our tutorial, we will assume that the MT system used to produce the sentence e' was good enough. We thus generally trust it and expect to make only small edits to the translated sentence in order to make it fully correct. This means that we don’t need to train a whole new MT system that would translate the source sentences from scratch. Instead, we will build a system that will tell us how to edit the machine-translated sentence e'.
Part II. - The Edit Operations¶
How can an automatic system tell us how to edit a sentence? Here’s one way to do it: we will design a set of edit operations and train the system to generate a sequence of these operations. If we consider a sequence of edit operations a function R (as in rewrite), which transforms one sequence to another, we can adapt the formulas above to suit our needs:
R* = argmax_R p(R(e')|f, e')
e* = R*(e')
So we are searching for the best edit function R* that, once applied to e', will give us the corrected output e*.
Another question is what the class of all possible edit functions should look like. For now, we simply limit them to functions that can be defined as sequences of edit operations.
The edit function R processes the input sequence token by token in the left-to-right direction. It has a pointer to the input sequence, which starts by pointing to the first word of the sequence.
We design three types of edit operations as follows:
- KEEP - this operation copies the current word to the output and moves the pointer to the next token of the input,
- DELETE - this operation does not emit anything to the output and moves the pointer to the next token of the input,
- INSERT - this operation puts a word on the output, leaving the pointer to the input intact.
The edit function applies all its operations to the input sentence. We handle malformed edit sequences simply: if the pointer reaches the end of the input sequence, the operations KEEP and DELETE do nothing. If the sequence of edits ends before the end of the input sentence is reached, we apply as many additional KEEP operations as needed to reach the end of the input sequence.
Let’s see another example:
Bärbel has a dog .
KEEP KEEP KEEP DELETE cat KEEP
The word “cat” on the second line is an INSERT operation parameterized by the word “cat”. If we apply all the edit operations to the input (i.e. keep the words “Bärbel”, “has”, “a”, and “.”, delete the word “dog” and put the word “cat” in its place), we get the corrected target sentence.
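To make the semantics of the three operations concrete, here is a small illustrative Python sketch (not the Neural Monkey implementation) that applies a sequence of edit operations to a tokenized sentence, including the handling of malformed sequences described above. The token names <keep> and <delete> follow the special tokens mentioned later in the vocabulary section:

# Illustrative sketch of applying KEEP/DELETE/INSERT edits (not Neural Monkey code).
def apply_edits(source_tokens, edits, keep="<keep>", delete="<delete>"):
    output = []
    pointer = 0
    for op in edits:
        if op == keep:
            if pointer < len(source_tokens):      # KEEP past the end does nothing
                output.append(source_tokens[pointer])
                pointer += 1
        elif op == delete:
            if pointer < len(source_tokens):      # DELETE past the end does nothing
                pointer += 1
        else:                                     # INSERT: emit the word, keep the pointer
            output.append(op)
    output.extend(source_tokens[pointer:])        # implicit KEEPs until the end
    return output

print(apply_edits("Bärbel has a dog .".split(),
                  ["<keep>", "<keep>", "<keep>", "<delete>", "cat", "<keep>"]))
# ['Bärbel', 'has', 'a', 'cat', '.']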
Part III. - The Data¶
We are going to use the data for WMT 16 shared APE task. You can get them at the WMT 16 website or directly at the Lindat repository. There are three files in the repository:
- TrainDev.zip - contains training and development data set
- Test.zip - contains source and translated test data
- test_pe.zip - contains the post-edited test data
Now, before we start, let’s create our experiment directory, in which we will place all our work. We shall call it, for example, exp-nm-ape (feel free to choose another weird string).
Extract all the files into the exp-nm-ape/data directory. Rename the files and directories so you get this directory structure:
exp-nm-ape
|
\== data
|
|== train
| |
| |== train.src
| |== train.mt
| \== train.pe
|
|== dev
| |
| |== dev.src
| |== dev.mt
| \== dev.pe
|
\== test
|
|== test.src
|== test.mt
\== test.pe
The data is already tokenized, so we don’t need to run any preprocessing tools. The format of the data is plain text with one sentence per line. There are 12k training triplets of sentences, 1k development triplets, and 2k evaluation triplets.
Preprocessing of the Data¶
The next phase is to prepare the post-editing sequences that we should learn during training. We apply the Levenshtein algorithm to find the shortest edit path from the translated sentence to the post-edited sentence. As a little coding exercise, you can implement your own script that does the job, or you may use our preprocessing script from the Neural Monkey package. For this, in the neuralmonkey root directory, run:
scripts/postedit_prepare_data.py \
--translated-sentences=exp-nm-ape/data/train/train.mt \
--target-sentences=exp-nm-ape/data/train/train.pe \
> exp-nm-ape/data/train/train.edits
And the same for the development data.
NOTE: You may have to change the path to the exp-nm-ape directory if it is not located inside the repository root directory.
NOTE 2: There is a hidden option of the preparation script (--target-german=True) which turns on some steps tailored for better processing of German text. In this tutorial, we are not going to use it.
If you look at the preprocessed files, you will see that the KEEP and DELETE operations are represented with special tokens while the INSERT operations are represented simply with the word they insert.
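If you chose the coding exercise instead, a rough sketch of deriving such edit sequences could use Python’s difflib. Note that this is only an illustration; the provided script uses its own Levenshtein-based implementation and may produce slightly different edit paths:

# Rough sketch of deriving edit sequences with difflib (illustration only).
from difflib import SequenceMatcher

def derive_edits(translated, post_edited, keep="<keep>", delete="<delete>"):
    edits = []
    matcher = SequenceMatcher(a=translated, b=post_edited, autojunk=False)
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op == "equal":
            edits.extend([keep] * (a_end - a_start))
        else:  # "replace", "delete" or "insert"
            edits.extend([delete] * (a_end - a_start))
            edits.extend(post_edited[b_start:b_end])   # INSERTs are the words themselves
    return edits

print(derive_edits("Bärbel has a dog .".split(), "Bärbel has a cat .".split()))
# ['<keep>', '<keep>', '<keep>', '<delete>', 'cat', '<keep>']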
Congratulations! Now, you should have train.edits, dev.edits and test.edits files all in their respective data directories. We can now move to work with Neural Monkey configurations!
Part IV. - The Model Configuration¶
In Neural Monkey, all information about a model and its training is stored in configuration files. The syntax of these files is a plain INI syntax (more specifically, the one which gets processed by Python’s ConfigParser). The configuration file is structured into a set of sections, each describing a part of the training. In this section, we will go through all of them and write our configuration file needed for the training of the post-editing task.
First of all, create a file called post-edit.ini and put it inside the exp-nm-ape directory. Put all the snippets that we will describe in the following paragraphs into this file.
1 - Datasets¶
For training, we prepare two datasets. The first dataset will serve for the training, the second one for validation. In Neural Monkey, each dataset contains a number of so-called data series. In our case, we will call the data series source, translated, and edits. Each of those series will contain the respective set of sentences.
It is assumed that all series within a given dataset have the same number of elements (i.e. sentences in our case).
The configuration of the datasets looks like this:
[train_dataset]
class=dataset.load_dataset_from_files
s_source="exp-nm-ape/data/train/train.src"
s_translated="exp-nm-ape/data/train/train.mt"
s_edits="exp-nm-ape/data/train/train.edits"
[val_dataset]
class=dataset.load_dataset_from_files
s_source="exp-nm-ape/data/dev/dev.src"
s_translated="exp-nm-ape/data/dev/dev.mt"
s_edits="exp-nm-ape/data/dev/dev.edits"
Note that the series names (source, translated, and edits) are arbitrary and defined by their first mention. The s_ prefix stands for “series” and is used only here in the dataset sections, not later when the series are referred to.
These two INI sections represent two calls to the function dataset.load_dataset_from_files, with the series file paths as keyword arguments. The function serves as a constructor and builds an object for every call. So at the end, we will have two objects representing the two datasets.
2 - Vocabularies¶
Each encoder and decoder which deals with language data operates with some kind of vocabulary. In our case, the vocabulary is just a list of all unique words in the training data. Note that apart from the special <keep> and <delete> tokens, the vocabularies for the translated and edits series are from the same language. We can save some memory and perhaps improve the quality of the target language embeddings by sharing vocabularies for these datasets. Therefore, we need to create only two vocabulary objects:
[source_vocabulary]
class=vocabulary.from_dataset
datasets=[<train_dataset>]
series_ids=["source"]
max_size=50000
[target_vocabulary]
class=vocabulary.from_dataset
datasets=[<train_dataset>]
series_ids=["edits", "translated"]
max_size=50000
The first vocabulary object (called source_vocabulary) represents the (English) vocabulary used for this task. The 50,000 is the maximum size of the vocabulary. If the actual vocabulary of the data was bigger, the rare words would be replaced by the <unk> token (hardcoded in Neural Monkey, not part of the 50,000 items), which stands for unknown words. In our case, however, the vocabularies of the datasets are much smaller, so we won’t lose any words.
Both vocabularies are created out of the training dataset, as specified by the line datasets=[<train_dataset>] (more datasets could be given in the list). This means that if there are any unseen words in the development or test data, our model will treat them as unknown words.
We know that the languages in the translated and edits series are the same (except for the KEEP and DELETE tokens in the edits), so we create a unified vocabulary for them. This is achieved by specifying series_ids=["edits", "translated"]. The one-hot encodings (or more precisely, indices to the vocabulary) will be identical for words in translated and edits.
3 - Encoders¶
Our network will have two inputs. Therefore, we must design two separate encoders. The first encoder will process source sentences, and the second will process translated sentences, i.e. the candidate translations that we are expected to post-edit. This is the configuration of the encoder for the source sentences:
[src_encoder]
class=encoders.recurrent.SentenceEncoder
rnn_size=300
max_input_len=50
embedding_size=300
dropout_keep_prob=0.8
data_id="source"
name="src_encoder"
vocabulary=<source_vocabulary>
This configuration initializes a new instance of the sentence encoder with the hidden state size set to 300 and the maximum input length set to 50. (Longer sentences are trimmed.) The sentence encoder looks up the words in a word embedding matrix. The size of the embedding vector used for each word from the source vocabulary is set to 300. The source data series is fed to this encoder. 20% of the weights are dropped out during training from the word embeddings and from the attention vectors computed over the hidden states of this encoder. Note that the name attribute must be set in each encoder and decoder in order to prevent collisions of the names of TensorFlow graph nodes.
The configuration of the second encoder follows:
[trans_encoder]
class=encoders.recurrent.SentenceEncoder
rnn_size=300
max_input_len=50
embedding_size=300
dropout_keep_prob=0.8
data_id="translated"
name="trans_encoder"
vocabulary=<target_vocabulary>
This config creates a second encoder for the translated data series. The setting is the same as for the first encoder, except for the different vocabulary and name.
To be able to use the attention mechanism, we need to define the attention components for each encoder we want to process. In our tutorial, we use attention over both of the encoders:
[src_attention]
class=attention.Attention
name="attention_src_encoder"
encoder=<src_encoder>
dropout_keep_prob=0.8
state_size=300
[trans_attention]
class=attention.Attention
name="attention_trans_encoder"
encoder=<trans_encoder>
dropout_keep_prob=0.8
state_size=300
4 - Decoder¶
Now, we configure perhaps the most important object of the training - the decoder. Without further ado, here it goes:
[decoder]
class=decoders.decoder.Decoder
name="decoder"
encoders=[<trans_encoder>, <src_encoder>]
attentions=[<trans_attention>, <src_attention>]
rnn_size=300
max_output_len=50
embeddings_source=<trans_encoder.input_sequence>
dropout_keep_prob=0.8
data_id="edits"
vocabulary=<target_vocabulary>
As in the case of encoders, the decoder needs its RNN and embedding size settings, maximum output length, dropout parameter, and vocabulary settings.
The outputs of the individual encoders are by default simply concatenated and projected to the decoder hidden state (of rnn_size). Internally, the code is ready to support arbitrary mappings by adding one more parameter here: encoder_projection. For an additional view on the encoders, we use the two attention mechanism objects defined in the previous section.
Note that you may set rnn_size to None. Neural Monkey will then directly use the concatenation of encoder states without any mapping. This is particularly useful when you have just one encoder, as in MT.
The line embeddings_source=<trans_encoder.input_sequence> means that the embeddings (including the embedding size) are to be shared with the input sequence object of the trans_encoder (the input sequence object is an underlying structure that manages the input layer of an encoder and, in the case of SentenceEncoder, provides access to the word embeddings).
The loss of the decoder is computed against the edits data series of whatever dataset the decoder will be applied to.
5 - Runner and Trainer¶
As their names suggest, runners and trainers are used for running and training models. The trainer object provides the optimization operation to the graph. In the case of the cross-entropy trainer (used in our tutorial), the default optimizer is Adam and it is run against the decoder’s loss, with added L2 regularization (controlled by the l2_weight parameter of the trainer). The runner is used to process a dataset with the model and return the decoded sentences and (if possible) the decoder losses.
We define these two objects like this:
[trainer]
class=trainers.cross_entropy_trainer.CrossEntropyTrainer
decoders=[<decoder>]
l2_weight=1.0e-8
[runner]
class=runners.runner.GreedyRunner
decoder=<decoder>
output_series="greedy_edits"
Note that a runner can only have one decoder, but during training you can train several decoders, all contributing to the loss function.
The purpose of the trainer is to optimize the model, so we are not interested in the actual outputs it produces, only the loss compared to the reference outputs (and the loss is calculated by the given decoder).
The purpose of the runner is to get the actual outputs; for further use, they are collected into a new series called greedy_edits (see the line output_series=) of whatever dataset the runner will be applied to.
6 - Evaluation Metrics¶
During validation, the whole validation dataset gets processed by the models and the decoded sentences are evaluated against a reference to provide the user with the state of the training. For this, we need to specify evaluator objects which will be used to score the outputted sentences. In our case, we will use BLEU and TER:
[bleu]
class=evaluators.bleu.BLEUEvaluator
name="BLEU-4"
7 - TensorFlow Manager¶
In order to handle global variables such as how many CPU cores TensorFlow should use, you need to specify a “TensorFlow manager”:
[tf_manager]
class=tf_manager.TensorFlowManager
num_threads=4
num_sessions=1
minimize_metric=True
save_n_best=3
8 - Main Configuration Section¶
Almost there! The last part of the configuration puts all the pieces together. It is called main and specifies the rest of the training parameters:
[main]
name="post editing"
output="exp-nm-ape/training"
runners=[<runner>]
tf_manager=<tf_manager>
trainer=<trainer>
train_dataset=<train_dataset>
val_dataset=<val_dataset>
evaluation=[("greedy_edits", "edits", <bleu>), ("greedy_edits", "edits", evaluators.ter.TER)]
batch_size=128
runners_batch_size=256
epochs=100
validation_period=1000
logging_period=20
The output parameter specifies the directory in which all the files generated by the training (used for replicability of the experiment, logging, and saving the best model variables) are stored. It is also worth noting that if the output directory exists, the training is not run, unless the line overwrite_output_dir=True is also included here.
The runners, tf_manager, trainer, train_dataset and val_dataset options are self-explanatory.
The parameter evaluation takes a list of tuples, where each tuple contains:
- the name of the output series (as produced by some runner), greedy_edits here,
- the name of the reference series of the dataset, edits here,
- the reference to the evaluation algorithm, <bleu> and evaluators.ter.TER in the two tuples here.
The batch_size parameter controls how many sentences will be in one training mini-batch. When the model does not fit into GPU memory, it might be a good idea to start reducing this number before anything else. The larger the batch size, however, the sooner the training should converge to the optimum.
Runners are less memory-demanding, so runners_batch_size can be set higher than batch_size.
The epochs parameter specifies the number of passes through the training data that the training loop should do. There is no early stopping mechanism in Neural Monkey yet; the training can, however, be resumed after the end. The training can be safely ctrl+C’ed at any time: Neural Monkey preserves the last save_n_best best model variables saved on the disk.
The validation and logging periods specify how often to measure the model’s performance on the training batch (logging_period) and on the validation data (validation_period). Note that both logging and validation involve running the runners over the current batch or the validation data, respectively. If this happens too often, the time needed to train the model can grow significantly.
At each validation (and logging), the output is scored using the specified evaluation metrics. The last of the evaluation metrics (TER in our case) is used to keep track of the model performance over time. Whenever the score on the validation data is better than any of the save_n_best (3 in our case) previously saved models, the model is saved, discarding unnecessary lower-scoring models.
Part V. - Running an Experiment¶
Now that we have prepared the data and the experiment INI file, we can run the training. If your Neural Monkey installation is OK, you can just run this command from the root directory of the Neural Monkey repository:
bin/neuralmonkey-train exp-nm-ape/post-edit.ini
You should see the training program reporting the parsing of the configuration file, initializing the model, and eventually the training process. If everything goes well, the training should run for 100 epochs. You should see a new line with the status of the model’s performance on the current batch every few seconds, and there should be a validation report printed every few minutes.
As given in the main.output config line, Neural Monkey creates the directory exp-nm-ape/training with these files:
- git_commit - the Git hash of the current Neural Monkey revision.
- git_diff - the diff between the clean checkout and the working copy.
- experiment.ini - the INI file used for running the training (a simple copy of the file NM was started with).
- experiment.log - the output log of the training script.
- checkpoint - a file created by TensorFlow which keeps track of saved variables.
- events.out.tfevents.<TIME>.<HOST> - a file created by TensorFlow which keeps the summaries for TensorBoard visualisation.
- variables.data[.<N>] - a set of files with the N best saved models.
- variables.data.best - a symbolic link that points to the variable file with the best model.
Part VI. - Evaluation of the Trained Model¶
If you have reached this point, you have nearly everything this tutorial offers. The last step of this tutorial is to take the trained model and apply it to a previously unseen dataset. For this you will need two additional configuration files. But fear not - it’s not going to be that difficult. The first configuration file is the specification of the model. We have this from Part IV; only a small optional change is needed. The second configuration file tells the run script which datasets to process.
The optional change to the model INI file prevents the training dataset from loading. This is a flaw in the present design and it is planned to be changed. The procedure is simple:
1. Copy the file post-edit.ini into e.g. post-edit.test.ini.
2. Open the post-edit.test.ini file and remove the train_dataset and val_dataset sections, as well as the train_dataset and val_dataset configuration from the [main] section.
Now we have to make another file specifying the testing dataset configuration. We will call this file post-edit_run.ini:
[main]
test_datasets=[<eval_data>]
[eval_data]
class=dataset.load_dataset_from_files
s_source="exp-nm-ape/data/test/test.src"
s_translated="exp-nm-ape/data/test/test.mt"
s_greedy_edits_out="exp-nm-ape/test_output.edits"
The dataset specifies the two input series s_source and s_translated (the candidate MT output to be post-edited) as in the training. The series s_edits (containing the reference edits) is not present in the evaluation dataset, because we do not want to use the reference edits to compute the loss at this point. Usually, we don’t even know the correct output at runtime.
Instead, we introduce the output series s_greedy_edits_out (the prefix s_ and the suffix _out are hardcoded in Neural Monkey, and the series name in between has to match the name of the series produced by the runner).
The line s_greedy_edits_out= specifies the file where the output should be saved. (You may want to alter the path to the exp-nm-ape directory if it is not located inside the Neural Monkey package root dir.)
We have all that we need to run the trained model on the evaluation dataset. From the root directory of the Neural Monkey repository, run:
bin/neuralmonkey-run exp-nm-ape/post-edit.test.ini exp-nm-ape/post-edit_run.ini
At the end, you should see a new file exp-nm-ape/test_output.edits. As you will notice, the contents of this file are sequences of edit operations which, if applied to the machine-translated sentences, generate the output that we want. The final step is to call the provided post-processing script. Again, feel free to write your own as a simple exercise:
scripts/postedit_reconstruct_data.py \
--edits=exp-nm-ape/test_output.edits \
--translated-sentences=exp-nm-ape/data/test/test.mt \
> test_output.pe
Now, you can run the official tools (like mteval or the tercom software available on the WMT 16 website) to measure the score of test_output.pe on the data/test/test.pe reference evaluation dataset.
Part VII. - Conclusions¶
This tutorial gave you the basic overview of how to design your experiments using Neural Monkey. The sample experiment was the task of automatic post-editing. We got the data from the WMT 16 APE shared task and pre-processed them to fit our needs. We have written the configuration file and run the training. At the end, we evaluated the model on the test dataset.
If you want to learn more, the next step is perhaps to browse the examples directory in the Neural Monkey repository and see some further possible setups. If you are planning to just design an experiment using existing modules, you can start by editing one of those examples as well.
If you want to dig in the code, you can browse the repository. Please feel free to fork the repository and to send us pull requests. The API documentation is currently under construction, but it already contains a little information about Neural Monkey objects and their configuration options.
Have fun!
Machine Translation Tutorial¶
This tutorial will guide you through designing Machine Translation experiments in Neural Monkey. We assume that you have already read the post-editing tutorial.
The goal of the translation task is to translate sentences from one language into another. For this tutorial we use data from the WMT 16 IT-domain translation shared task in the English-to-Czech direction.
WMT is an annual machine translation conference where academic groups compete in translating different datasets over various language pairs.
Part I. - The Data¶
We are going to use the data for the WMT 16 IT-domain translation shared task. You can get them at the WMT IT Translation Shared Task webpage: download the Batch1 and Batch2 answers and Batch3 as a test set, or get them directly here and testset.
Note: In this tutorial we are using only a small dataset as an example, which is not big enough for real-life machine translation training.
The downloaded archive contains several files for different languages, of which we use only the following as our training, validation, and test sets:
1. Batch1a_cs.txt and Batch1a_en.txt as our training set
2. Batch2a_cs.txt and Batch2a_en.txt as our validation set
3. Batch3a_en.txt as our test set
Now, before we start, let’s make our experiment directory, in which we place all our work. Let’s call it exp-nm-mt.
First extract all the downloaded files, then gzip the individual files and arrange them into the following directory structure:
exp-nm-mt
|
\== data
|
|== train
| |
| |== Batch1a_en.txt.gz
| \== Batch1a_cs.txt.gz
|
|== dev
| |
| |== Batch2a_en.txt.gz
| \== Batch2a_cs.txt.gz
|
\== test
|
\== Batch3a_en.txt.gz
The gzipping is not necessary; if you put the dataset there in plain text, it will work the same way. Neural Monkey recognizes gzipped files by their MIME type and chooses the correct way to open them.
TODO: The dataset is not tokenized and needs to be preprocessed.
Byte Pair Encoding¶
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Byte pair encoding (BPE) enables open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words. More information can be found in the paper https://arxiv.org/abs/1508.07909. BPE creates a list of merges that are used for splitting out-of-vocabulary words. An example of such splitting:
basketball => basket@@ ball
Postprocessing can be manually done by:
sed "s/@@ //g"
but Neural Monkey manages it for you.
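To get an intuition for how a list of merges is applied, here is a heavily simplified Python sketch of BPE segmentation of a single word (illustration only; the real implementation in neuralmonkey/lib/subword_nmt handles details such as end-of-word markers and vocabulary thresholds):

# Heavily simplified sketch of applying BPE merges to a single word (illustration only).
def segment(word, merges):
    """merges is an ordered list of (left, right) symbol pairs learned from data."""
    symbols = list(word)
    for left, right in merges:              # apply merges in the learned order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]
            else:
                i += 1
    # every symbol except the last one gets the "@@ " continuation marker
    return "@@ ".join(symbols)

merges = [("b", "a"), ("ba", "s"), ("bas", "k"), ("bask", "e"), ("baske", "t"),
          ("ba", "l"), ("bal", "l")]
print(segment("basketball", merges))   # basket@@ ball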
BPE Generation¶
In order to use BPE, you must first generate a merge file over all the data. This file is generated from both the source and target datasets. You can generate it by running the following script:
neuralmonkey/lib/subword_nmt/learn_bpe.py -s 50000 < DATA > merge_file.bpe
With the data from this tutorial it would be the following command:
paste Batch1a_en.txt Batch1a_cs.txt \
| neuralmonkey/lib/subword_nmt/learn_bpe.py -s 8000 \
> exp-nm-mt/data/merge_file.bpe
You can change the number of merges; this number is equivalent to the size of the vocabulary. Do not forget that the input is a file containing both the source and target sides.
Part II. - The Model Configuration¶
In this section, we create the configuration file translation.ini needed for the machine translation training. We mention only the differences from the main post-editing tutorial.
1 - Datasets¶
For training, we prepare two datasets. Since we are using BPE, we need to define the preprocessor. The configuration of the datasets looks like this:
[train_data]
class=dataset.load_dataset_from_files
s_source="exp-nm-mt/data/train/Batch1a_en.txt.gz"
s_target="exp-nm-mt/data/train/Batch1a_cs.txt.gz"
preprocessors=[("source", "source_bpe", <bpe_preprocess>), ("target", "target_bpe", <bpe_preprocess>)]
[val_data]
class=dataset.load_dataset_from_files
s_source="exp-nm-mt/data/dev/Batch2a_en.txt.gz"
s_target="exp-nm-mt/data/dev/Batch2a_cs.txt.gz"
preprocessors=[("source", "source_bpe", <bpe_preprocess>), ("target", "target_bpe", <bpe_preprocess>)]
2 - Preprocessor and Postprocessor¶
We need to tell Neural Monkey how it should handle preprocessing and postprocessing due to the BPE:
[bpe_preprocess]
class=processors.bpe.BPEPreprocessor
merge_file="exp-nm-mt/data/merge_file.bpe"
[bpe_postprocess]
class=processors.bpe.BPEPostprocessor
3 - Vocabularies¶
For both the encoder and the decoder, we use a shared vocabulary created from the BPE merges:
[shared_vocabulary]
class=vocabulary.from_bpe
path="exp-nm-mt/data/merge_file.bpe"
4 - Encoder and Decoder¶
The encoder and decoder are similar to those from the post-editing tutorial:
[encoder]
class=encoders.recurrent.SentenceEncoder
name="sentence_encoder"
rnn_size=300
max_input_len=50
embedding_size=300
dropout_keep_prob=0.8
data_id="source_bpe"
vocabulary=<shared_vocabulary>
[attention]
class=attentions.Attention
name="att_sent_enc"
encoder=<encoder>
state_size=300
dropout_keep_prob=0.8
[decoder]
class=decoders.decoder.Decoder
name="decoder"
encoders=[<encoder>]
attentions=[<attention>]
rnn_size=256
embedding_size=300
dropout_keep_prob=0.8
data_id="target_bpe"
vocabulary=<shared_vocabulary>
max_output_len=50
Notice that both the encoder and the decoder use as their input data series the data preprocessed by <bpe_preprocess>.
5 - Training Sections¶
The following sections are described in more detail in the post-editing tutorial:
[trainer]
class=trainers.cross_entropy_trainer.CrossEntropyTrainer
decoders=[<decoder>]
l2_weight=1.0e-8
[runner]
class=runners.runner.GreedyRunner
decoder=<decoder>
output_series="series_named_greedy"
postprocess=<bpe_postprocess>
[bleu]
class=evaluators.bleu.BLEUEvaluator
name="BLEU-4"
[tf_manager]
class=tf_manager.TensorFlowManager
num_threads=4
num_sessions=1
minimize_metric=False
save_n_best=3
As for the main configuration section, do not forget to add the BPE postprocessing:
[main]
name="machine translation"
output="exp-nm-mt/out-example-translation"
runners=[<runner>]
tf_manager=<tf_manager>
trainer=<trainer>
train_dataset=<train_data>
val_dataset=<val_data>
evaluation=[("series_named_greedy", "target", <bleu>), ("series_named_greedy", "target", evaluators.ter.TER)]
batch_size=80
runners_batch_size=256
epochs=10
validation_period=5000
logging_period=80
Part III. - Running and Evaluation of the Experiment¶
1 - Training¶
The training can be run as simply as:
bin/neuralmonkey-train exp-nm-mt/translation.ini
2 - Resuming Training¶
If training stopped and you want to resume it, you can load pre-trained parameters by specifying the initial_variables of the model in the [main] section:
[main]
initial_variables=/path/to/variables.data
Note that there is actually no file called variables.data, but three files with this common prefix. The initial_variables config value should correspond to this prefix.
3 - Evaluation¶
As for the evaluation, you need to create translation_run.ini:
[main]
test_datasets=[<eval_data>]
; We saved 3 models (save_n_best=3), so there are
; multiple models we could translate with.
; We can go with the best model, or select one manually:
;variables=["exp-nm-mt/out-example-translation/variables.data.0"]
[bpe_preprocess]
class=processors.bpe.BPEPreprocessor
merge_file="exp-nm-mt/data/merge_file.bpe"
[eval_data]
class=dataset.load_dataset_from_files
s_source="exp-nm-mt/data/test/Batch3a_en.txt.gz"
s_series_named_greedy_out="exp-nm-mt/out-example-translation/evaluation.txt.out"
preprocessors=[("source", "source_bpe", <bpe_preprocess>)]
and run:
bin/neuralmonkey-run exp-nm-mt/translation.ini exp-nm-mt/translation_run.ini
You are ready to experiment with your own models.
Configuration¶
Experiments with Neural Monkey are configured using configuration files which specify the architecture of the model, meta-parameters of the learning, the data, the way the data are processed, and the way the model is run.
Syntax¶
The configuration files are based on the syntax of INI files; see, e.g., the corresponding Wikipedia page.
Neural Monkey INI files contain key-value pairs, delimited by an equal sign (=) with no spaces around it. The key-value pairs are grouped into sections (Neural Monkey requires all pairs to belong to a section).
Every section starts with its header which consists of the section name in square brackets. Everything below the header is considered a part of the section.
Comments can appear on their own (otherwise empty) line, prefixed either with a hash sign (#) or a semicolon (;), and possibly indented.
The configuration introduces several additional constructs for the values. There are both atomic and compound values.
Supported atomic values are:
- booleans: literals True and False
- integers: strings that could be interpreted as integers by Python (e.g., 1, 002)
- floats: strings that could be interpreted as floats by Python (e.g., 1.0, .123, 2., 2.34e-12)
- strings: string literals in quotes (e.g., "walrus", "5")
- section references: string literals in angle brackets (e.g., <encoder>); sections are later interpreted as Python objects
- Python names: strings without quotes which are neither booleans, integers, floats, nor section references (e.g., neuralmonkey.encoders.SentenceEncoder)
On top of that, there are two compound types with syntax taken from Python:
- lists: comma-separated values in square brackets (e.g., [1, 2, 3])
- tuples: comma-separated values in round brackets (e.g., ("target", <ter>))
Variables¶
The configuration file can contain a [vars] section, defining variables which can be used in the rest of the config file. Their values can have any of the types listed above. To reference a variable, use its name preceded by a dollar sign (e.g. $variable). Variable values can also be included inside strings using the str.format() notation.
For example:
[vars]
parent_dir="experiments"
drop_keep_p=0.5
output_dir="{parent_dir}/test_drop{drop_keep_p:.2f}"
prefix="my"
[main]
output=$output_dir
...
[encoder]
name="{prefix}_encoder"
dropout_keep_prob=$drop_keep_p
...
Interpretation¶
Each configuration file contains a [main] section which is interpreted as a dictionary having keys specified in the section and values which are the results of interpretation of the right-hand sides.
Both the atomic and compound types taken from Python (i.e., everything except the section references) are interpreted as their Python counterparts. (So if you write 42, Neural Monkey actually sees 42.)
Section references are interpreted as references to objects constructed when interpreting the referenced section. (So if you write <session_manager> in a right-hand side and a section [session_manager] later in the file, Neural Monkey will construct a Python object based on the key-value pairs in the section [session_manager].)
Every section except the [main] and [vars] sections needs to contain the key class with a value of a Python name which is a callable (e.g., a class constructor or a function). The other keys are used as named arguments of the callable.
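Conceptually, interpreting a section therefore amounts to resolving the class value to a callable and passing the remaining keys as keyword arguments. The following is a simplified sketch of this idea, not the actual code of the neuralmonkey.config package (which, for instance, also resolves short module paths and section references):

# Simplified sketch of how a config section could be interpreted (illustration only).
import importlib

def build_object(section):
    """section: a dict parsed from an INI section, e.g.
    {"class": "neuralmonkey.vocabulary.from_dataset", "datasets": [...],
     "series_ids": ["source"], "max_size": 50000}."""
    module_name, _, attr = section["class"].rpartition(".")
    constructor = getattr(importlib.import_module(module_name), attr)
    kwargs = {key: value for key, value in section.items() if key != "class"}
    return constructor(**kwargs)  # the remaining keys become named arguments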
Session Manager¶
This and the following sections describe the TensorFlow manager from the user’s perspective: what can be configured in Neural Monkey with respect to TensorFlow. The configuration of the TensorFlow manager is specified within the INI file in a section with the class neuralmonkey.tf_manager.TensorFlowManager:
[session_manager]
class=tf_manager.TensorFlowManager
...
The session_manager configuration object is then referenced from the main section of the configuration:
[main]
tf_manager=<session_manager>
...
Training on GPU¶
You can easily switch between the CPU and GPU versions by running your experiments in a virtual environment containing either the CPU or the GPU version of TensorFlow, without any changes to the config files.
Similarly, standard techniques like setting the environment variable CUDA_VISIBLE_DEVICES can be used to control which GPUs are accessible to Neural Monkey.
By default, Neural Monkey prefers to allocate GPU memory stepwise, only as needed. This can create problems with memory fragmentation. If you know that you can allocate the whole memory at once, add the following parameter to the session_manager section:
gpu_allow_growth=False
You can also restrict TensorFlow to use only a fixed proportion of GPU memory:
per_process_gpu_memory_fraction=0.65
This parameter tells TensorFlow to use only 65% of GPU memory.
Training on CPUs¶
TensorFlow Manager settings also affect training on CPUs.
The line:
num_threads=4
indicates that 4 CPUs should be used for TensorFlow computations.
API Documentation¶
neuralmonkey package¶
The neuralmonkey package is the root package of this project.
Sub-modules
Decoding functions using multiple attentions for RNN decoders.
See http://arxiv.org/abs/1606.07481
The attention mechanisms used in Neural Monkey are inherited from the BaseAttention class defined in this module.
The attention function can be viewed as a soft lookup over an associative memory. The query vector is used to compute a similarity score of the keys of the associative memory, and the resulting scores are used as weights in a weighted sum of the values associated with the keys. We call the (unnormalized) similarity scores energies, we call the energies after (softmax) normalization the attention distribution, and we call the resulting weighted sum of states the context vector.
Note that it is possible (and true in most cases) that the attention keys are equal to the values. In case of self-attention, even queries are from the same set of vectors.
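As a minimal numerical sketch of this view (plain NumPy for illustration, not the actual TensorFlow implementation), the energies, the attention distribution, and the context vector can be computed as follows:

# Minimal NumPy sketch of attention as a soft lookup over an associative memory.
import numpy as np

def soft_lookup(query, keys, values):
    """query: (dim,), keys: (time, dim), values: (time, value_dim)."""
    energies = keys @ query                        # unnormalized similarity scores
    weights = np.exp(energies - energies.max())    # softmax -> attention distribution
    weights /= weights.sum()
    context = weights @ values                     # weighted sum of values = context vector
    return context, weights

keys = values = np.random.randn(5, 8)              # keys often equal the values
query = np.random.randn(8)
context_vector, distribution = soft_lookup(query, keys, values)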
To abstract over different flavors of the attention mechanism, we conceptualize the procedure as follows: each attention object has the attention function which operates on the query tensor. The attention function receives the query tensor (the decoder state) and optionally the previous state of the decoder, and computes the context vector. The function also receives a loop state, which is used to store data in an autoregressive loop that generates a sequence.
The attention uses the loop state to store the attention distributions and context vectors in time. This structure is called AttentionLoopState.
To be able to initialize the loop state, each attention object that uses this feature defines the initial_loop_state function with empty tensors.
Since there can be many modes in which the decoder that uses the attention operates, the attention objects have the finalize_loop method, which takes the last attention loop state and the name of the mode (a string) and processes this data to be available in the histories dictionary. The single most used example of two modes are the train and runtime modes of the autoregressive decoder.
-
class
neuralmonkey.attention.base_attention.
BaseAttention
(name: str, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
The abstract class for the attention mechanism flavors.
-
__init__
(name: str, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: Any) → Tuple[tensorflow.python.framework.ops.Tensor, Any]¶ Get context vector for a given query.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: Any) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
histories
¶ Return the attention histories dictionary.
Use this property after it has been populated.
Returns: The attention histories dictionary.
-
initial_loop_state
() → Any¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
visualize_attention
(key: str, max_outputs: int = 16) → None¶ Include the attention histories under a given key into a summary.
Parameters: - key – The key to the attention histories dictionary.
- max_outputs – Maximum number of images to save.
-
-
neuralmonkey.attention.base_attention.
empty_attention_loop_state
(batch_size: Union[int, tensorflow.python.framework.ops.Tensor], length: Union[int, tensorflow.python.framework.ops.Tensor], dimension: Union[int, tensorflow.python.framework.ops.Tensor]) → neuralmonkey.attention.namedtuples.AttentionLoopState¶ Create an empty attention loop state.
The attention loop state is a technical object for storing the attention distributions and the context vectors in time. It is used with the
tf.while_loop
dynamic implementation of decoders.Parameters: - batch_size – The size of the batch.
- length – The number of encoder states (keys).
- dimension – The dimension of the context vector
Returns: An empty attention loop state, which consists of two empty tensors: one for the attention distributions in time, and one for the attention context vectors in time.
-
neuralmonkey.attention.base_attention.
get_attention_mask
(encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → Union[tensorflow.python.framework.ops.Tensor, NoneType]¶ Return the temporal or spatial mask of an encoder.
Parameters: encoder – The encoder to get the mask from. Returns: Either a 2D or a 3D tensor, depending on whether the encoder is temporal (e.g. recurrent encoder) or spatial (e.g. a CNN encoder).
-
neuralmonkey.attention.base_attention.
get_attention_states
(encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]) → tensorflow.python.framework.ops.Tensor¶ Return the temporal or spatial states of an encoder.
Parameters: encoder – The encoder with the states to attend. Returns: Either a 3D or a 4D tensor, depending on whether the encoder is temporal (e.g. recurrent encoder) or spatial (e.g. a CNN encoder). The first two dimensions are (batch, time).
Attention combination strategies.
This modules implements attention combination strategies for multi-encoder scenario when we may want to combine the hidden states of the encoders in more complicated fashion.
Currently there are two attention combination strategies flat and hierarchical (see paper Attention Combination Strategies for Multi-Source Sequence-to-Sequence Learning).
The combination strategies may use the sentinel mechanism, which allows the decoder not to attend to any of the encoders and instead extract information from its own hidden state (see paper Knowing when to Look: Adaptive Attention via a Visual Sentinel for Image Captioning).
-
class
neuralmonkey.attention.combination.
FlatMultiAttention
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Flat attention combination strategy.
Using this attention combination strategy, the hidden states of the encoders are first projected to the same space (a different projection for each encoder) and then we compute a joint distribution over all the hidden states. The context vector is then a weighted sum of another projection of the encoders' hidden states. The sentinel vector can be added as an additional hidden state.
See equations 8 to 10 in the Attention Combination Strategies paper.
-
__init__
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.AttentionLoopState]¶ Get context vector for given decoder state.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
get_encoder_projections
(scope)¶
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.AttentionLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
-
class
neuralmonkey.attention.combination.
HierarchicalMultiAttention
(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.combination.MultiAttention
Hierarchical attention combination.
Hierarchical attention combination strategy first computes the context vector for each encoder separately using whatever attention type the encoders have. After that it computes a second attention over the resulting context vectors and optionally the sentinel vector.
See equations 6 and 7 in the Attention Combination Strategies paper.
-
__init__
(name: str, attentions: List[neuralmonkey.attention.base_attention.BaseAttention], attention_state_size: int, use_sentinels: bool, share_attn_projections: bool, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.HierarchicalLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.HierarchicalLoopState]¶ Get context vector for given decoder state.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: Any) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.HierarchicalLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
-
class
neuralmonkey.attention.combination.
MultiAttention
(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
Base class for attention combination.
-
__init__
(name: str, attention_state_size: int, share_attn_projections: bool = False, use_sentinels: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: Any) → Tuple[tensorflow.python.framework.ops.Tensor, Any]¶ Get context vector for given decoder state.
-
attn_size
¶
-
attn_v
¶
-
Coverage attention introduced in Tu et al. (2016).
See arxiv.org/abs/1601.04811
The CoverageAttention class inherits from the basic feed-forward attention introduced by Bahdanau et al. (2015)
-
class
neuralmonkey.attention.coverage.
CoverageAttention
(name: str, encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, max_fertility: int = 5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.feed_forward.Attention
-
__init__
(name: str, encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, max_fertility: int = 5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
get_energies
(y: tensorflow.python.framework.ops.Tensor, weights_in_time: tensorflow.python.framework.ops.Tensor)¶
-
The feed-forward attention mechanism.
This is the attention mechanism used in Bahdanau et al. (2015)
See arxiv.org/abs/1409.0473
-
class
neuralmonkey.attention.feed_forward.
Attention
(name: str, encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
-
__init__
(name: str, encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], dropout_keep_prob: float = 1.0, state_size: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.AttentionLoopState]¶ Get context vector for a given query.
-
attention_mask
¶
-
attention_states
¶
-
bias_term
¶
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
get_energies
(y, _)¶
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.AttentionLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
key_projection_matrix
¶
-
projection_bias_vector
¶
-
query_projection_matrix
¶
-
similarity_bias_vector
¶
-
state_size
¶
-
-
class
neuralmonkey.attention.namedtuples.
AttentionLoopState
¶ Bases:
neuralmonkey.attention.namedtuples.AttentionLoopState
Basic loop state of an attention mechanism.
-
contexts
¶ A tensor of shape
(query_time, batch, context_dim)
which stores the context vectors for every decoder time step.
-
weights
¶ A tensor of shape
(query_time, batch, keys_len)
which stores the attention distribution over the keys given the query in each decoder time step.
-
-
class
neuralmonkey.attention.namedtuples.
HierarchicalLoopState
¶ Bases:
neuralmonkey.attention.namedtuples.HierarchicalLoopState
Loop state of the hierarchical attention mechanism.
The input to the hierarchical attention is the output of a set of underlying (child) attentions. To record the inner states of the underlying attentions, we use the
HierarchicalLoopState
, which holds information about both the underlying attentions, and the top-level attention itself.-
child_loop_states
¶ A list of attention loop states of the underlying attention mechanisms.
-
loop_state
¶ The attention loop state of the top-level attention.
-
-
class
neuralmonkey.attention.namedtuples.
MultiHeadLoopState
¶ Bases:
neuralmonkey.attention.namedtuples.MultiHeadLoopState
Loop state of a multi-head attention.
-
contexts
¶ A tensor of shape
(query_time, batch, context_dim)
which stores the context vectors for every decoder time step.
-
head_weights
¶ A tensor of shape
(query_time, n_heads, batch, keys_len)
which stores the attention distribution over the keys given the query in each decoder time step for each attention head.
-
The scaled dot-product attention mechanism defined in Vaswani et al. (2017).
The attention energies are computed as dot products between the query vector and the key vector. The query vector is scaled down by the square root of its dimensionality. This attention function has no trainable parameters.
See arxiv.org/abs/1706.03762
-
class
neuralmonkey.attention.scaled_dot_product.
MultiHeadAttention
(name: str, n_heads: int, keys_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
-
__init__
(name: str, n_heads: int, keys_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.MultiHeadLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.MultiHeadLoopState]¶ Run a multi-head attention getting context vector for a given query.
This method is an API wrapper for the global function ‘attention’ defined in this module. It transforms a query of shape (batch, query_size) to shape (batch, 1, query_size) and applies the attention function. The output context has shape (batch, 1, value_size) and the weights have shape (batch, n_heads, 1, time(k)). The output is then processed to produce the output vector of contexts and the following attention loop state.
Parameters: - query – Input query for the current decoding step of shape(batch, query_size).
- decoder_prev_state – Previous state of the decoder.
- decoder_input – Input to the RNN cell of the decoder.
- loop_state – Attention loop state.
Returns: Vector of contexts and the following attention loop state.
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.namedtuples.MultiHeadLoopState) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.MultiHeadLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
visualize_attention
(key: str, max_outputs: int = 16) → None¶ Include the attention histories under a given key into a summary.
Parameters: - key – The key to the attention histories dictionary.
- max_outputs – Maximum number of images to save.
-
-
class
neuralmonkey.attention.scaled_dot_product.
ScaledDotProdAttention
(name: str, keys_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.scaled_dot_product.MultiHeadAttention
-
__init__
(name: str, keys_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], values_encoder: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
-
neuralmonkey.attention.scaled_dot_product.
attention
(queries: tensorflow.python.framework.ops.Tensor, keys: tensorflow.python.framework.ops.Tensor, values: tensorflow.python.framework.ops.Tensor, keys_mask: tensorflow.python.framework.ops.Tensor, num_heads: int, dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], masked: bool = False, use_bias: bool = False) → tensorflow.python.framework.ops.Tensor¶ Run multi-head scaled dot-product attention.
See arxiv.org/abs/1706.03762
When performing multi-head attention, the query, key and value vectors are first split into sets of smaller vectors, one for each attention head. Next, they are transformed using a linear layer, and a separate attention (from the corresponding head) is applied to each transformed triple of query, key and value. The resulting contexts from each head are then concatenated and a linear layer is applied to this concatenated output. This can be summarized by the following equations:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) * W_o
head_i = Attention(Q * W_Q_i, K * W_K_i, V * W_V_i)
The scaled dot-product attention is a simple dot product between the query and a transposed key vector. The result is then scaled using the square root of the key vector dimension, and a softmax layer is applied. Finally, the output of the softmax layer is multiplied by the value vector. See the following equation:
Attention(Q, K, V) = softmax(Q * K^T / √(d_k)) * V
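For illustration, the equations above can be written as a small single-head NumPy sketch (no masking, dropout or trainable projections; names are illustrative, not the library code):

    import numpy as np

    def scaled_dot_product(queries, keys, values):
        # queries: (time_q, d_k), keys: (time_k, d_k), values: (time_k, d_v)
        energies = queries @ keys.T / np.sqrt(keys.shape[-1])
        weights = np.exp(energies - energies.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ values                          # (time_q, d_v)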
Parameters: - queries – Input queries of shape
(batch, time(q), k_channels)
. - keys – Input keys of shape
(batch, time(k), k_channels)
. - values – Input values of shape
(batch, time(k), v_channels)
. - keys_mask – A float Tensor for masking sequences in keys.
- num_heads – Number of attention heads.
- dropout_callback – Callable function implementing dropout.
- masked – Boolean indicating whether we want to mask future energies.
- use_bias – If True, enable bias in the attention head projections (for all queries, keys and values).
Returns: Contexts of shape
(batch, time(q), v_channels)
and weights of shape (batch, time(q), time(k)).
-
neuralmonkey.attention.scaled_dot_product.
empty_multi_head_loop_state
(batch_size: Union[int, tensorflow.python.framework.ops.Tensor], num_heads: Union[int, tensorflow.python.framework.ops.Tensor], length: Union[int, tensorflow.python.framework.ops.Tensor], dimension: Union[int, tensorflow.python.framework.ops.Tensor]) → neuralmonkey.attention.namedtuples.MultiHeadLoopState¶
-
neuralmonkey.attention.scaled_dot_product.
mask_energies
(energies_4d: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor, mask_value=-1000000000.0) → tensorflow.python.framework.ops.Tensor¶ Apply mask to the attention energies before passing to softmax.
Parameters: - energies_4d – Energies of shape
(batch, n_heads, time(q), time(k))
. - mask – Float Tensor of zeros and ones of shape
(batch, time(k))
, specifies valid positions in the energies tensor. - mask_value – Value used to mask energies. Default value taken from tensor2tensor.
Returns: Energies (logits) of valid positions. Same shape as
energies_4d
.
Note: We do not use
mask_value=-np.inf
to avoid potential underflow.
-
neuralmonkey.attention.scaled_dot_product.
mask_future
(energies: tensorflow.python.framework.ops.Tensor, mask_value=-1000000000.0) → tensorflow.python.framework.ops.Tensor¶ Mask energies of keys using lower triangular matrix.
The mask simulates autoregressive decoding, preventing the attention from looking at what has not yet been decoded. The mask is not necessary during training, when true output values are used instead of the decoded ones.
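A minimal NumPy sketch of the lower-triangular masking described here (illustrative only, not the library implementation):

    import numpy as np

    def mask_future_sketch(energies, mask_value=-1e9):
        # energies: (..., time_q, time_k); keep the lower triangle (and the
        # diagonal), replace everything above the diagonal with mask_value
        time_q, time_k = energies.shape[-2], energies.shape[-1]
        lower = np.tril(np.ones((time_q, time_k)))
        return energies * lower + (1.0 - lower) * mask_value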
Parameters: - energies – A tensor to mask.
- mask_value – Value used to mask energies.
Returns: Masked energies tensor.
-
neuralmonkey.attention.scaled_dot_product.
split_for_heads
(x: tensorflow.python.framework.ops.Tensor, n_heads: int, head_dim: int) → tensorflow.python.framework.ops.Tensor¶ Split a tensor for multi-head attention.
Split last dimension of 3D vector of shape
(batch, time, dim)
and return a 4D vector with shape (batch, n_heads, time, dim/n_heads).
Parameters: - x – input Tensor of shape
(batch, time, dim)
. - n_heads – Number of attention heads.
- head_dim – Dimension of the attention heads.
Returns: A 4D Tensor of shape
(batch, n_heads, time, head_dim/n_heads)
.
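The splitting can be sketched in TensorFlow as a reshape followed by a transpose; this is a simplified illustration assuming dim == n_heads * head_dim, not the library implementation:

    import tensorflow as tf

    def split_for_heads_sketch(x, n_heads, head_dim):
        # x: (batch, time, dim) with dim == n_heads * head_dim
        batch, time = tf.shape(x)[0], tf.shape(x)[1]
        x = tf.reshape(x, [batch, time, n_heads, head_dim])
        return tf.transpose(x, [0, 2, 1, 3])  # (batch, n_heads, time, head_dim)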
-
class
neuralmonkey.attention.stateful_context.
StatefulContext
(name: str, encoder: neuralmonkey.model.stateful.Stateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.attention.base_attention.BaseAttention
Provides a Stateful encoder’s output as context to a decoder.
This is not really an attention mechanism, but rather a hack which (mis)uses the attention interface to provide a “static” context vector to the decoder cell. In other words, the context vector is the same for all positions in the sequence and doesn’t depend on the query vector.
To use this, simply pass an instance of this class to the decoder using the attentions parameter.
-
__init__
(name: str, encoder: neuralmonkey.model.stateful.Stateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new
BaseAttention
object.
-
attention
(query: tensorflow.python.framework.ops.Tensor, decoder_prev_state: tensorflow.python.framework.ops.Tensor, decoder_input: tensorflow.python.framework.ops.Tensor, loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → Tuple[tensorflow.python.framework.ops.Tensor, neuralmonkey.attention.namedtuples.AttentionLoopState]¶ Get context vector for a given query.
-
attention_mask
¶
-
attention_states
¶
-
context_vector_size
¶ Return the static size of the context vector.
Returns: An integer specifying the context vector dimension.
-
finalize_loop
(key: str, last_loop_state: neuralmonkey.attention.namedtuples.AttentionLoopState) → None¶ Store the attention histories from loop state under a given key.
Parameters: - key – The key to the histories dictionary to store the data in.
- last_loop_state – The loop state object from the last state of the decoding loop.
-
initial_loop_state
() → neuralmonkey.attention.namedtuples.AttentionLoopState¶ Get initial loop state for the attention object.
Returns: The newly created initial loop state object.
-
state_size
¶
-
visualize_attention
(key: str, max_outputs: int = 16) → None¶ Include the attention histories under a given key into a summary.
Parameters: - key – The key to the attention histories dictionary.
- max_outputs – Maximum number of images to save.
-
Input combination strategies for multi-source Transformer decoder.
-
neuralmonkey.attention.transformer_cross_layer.
flat
(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor¶ Run attention with flat input combination.
The procedure is as follows:
1. concatenate the states and mask along the time axis
2. run attention over the concatenation
Parameters: - queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- attention_dropout_callback – Dropout function to apply in the attention over the concatenated encoder states.
- dropout_callback – The dropout function to apply on the output of the attention.
Returns: A Tensor that contains the context vector.
-
neuralmonkey.attention.transformer_cross_layer.
hierarchical
(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], heads_hier: int, attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor¶ Run attention with hierarchical input combination.
The procedure is as follows:
1. normalize queries
2. attend to every encoder
3. attend to the resulting context vectors (reuse normalized queries)
4. apply dropout, add residual connection and return
Parameters: - queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- heads_hier – Number of attention heads to use in the second attention.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply in the second attention and over the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
-
neuralmonkey.attention.transformer_cross_layer.
parallel
(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor¶ Run attention with parallel input combination.
The procedure is as follows:
1. normalize queries
2. attend and dropout independently for every encoder
3. sum up the results
4. add residual and return
Parameters: - queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
-
neuralmonkey.attention.transformer_cross_layer.
serial
(queries: tensorflow.python.framework.ops.Tensor, encoder_states: List[tensorflow.python.framework.ops.Tensor], encoder_masks: List[tensorflow.python.framework.ops.Tensor], heads: List[int], attention_dropout_callbacks: List[Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor¶ Run attention with serial input combination.
The procedure is as follows (a minimal sketch is given after this entry). Repeat for every encoder:
- lnorm + attend + dropout + add residual
- update queries between layers
Parameters: - queries – The input for the attention.
- encoder_states – The states of each encoder.
- encoder_masks – The temporal mask of each encoder.
- heads – Number of attention heads to use for each encoder.
- attention_dropout_callbacks – Dropout functions to apply in attention over each encoder.
- dropout_callback – The dropout function to apply on the outputs of each sub-attention.
Returns: A Tensor that contains the context vector.
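A hedged sketch of the serial strategy in terms of the single helper documented below in this module (illustrative only, not the library implementation):

    from neuralmonkey.attention.transformer_cross_layer import single

    def serial_sketch(queries, encoder_states, encoder_masks, heads,
                      attention_dropout_callbacks, dropout_callback):
        x = queries
        for states, mask, n_heads, att_drop in zip(
                encoder_states, encoder_masks, heads,
                attention_dropout_callbacks):
            # layer norm + attention + dropout + residual; the result
            # becomes the queries for the next encoder
            x = single(x, states, mask, n_heads,
                       attention_dropout_callback=att_drop,
                       dropout_callback=dropout_callback)
        return x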
-
neuralmonkey.attention.transformer_cross_layer.
single
(queries: tensorflow.python.framework.ops.Tensor, states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor, n_heads: int, attention_dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], dropout_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], normalize: bool = True, use_dropout: bool = True, residual: bool = True, use_att_transform_bias: bool = False)¶ Run attention on a single encoder.
Parameters: - queries – The input for the attention.
- states – The encoder states (keys & values).
- mask – The temporal mask of the encoder.
- n_heads – Number of attention heads to use.
- attention_dropout_callback – Dropout function to apply in attention.
- dropout_callback – Dropout function to apply on the attention output.
- normalize – If True, run layer normalization on the queries.
- use_dropout – If True, perform dropout on the attention output.
- residual – If True, sum the context vector with the input queries.
- use_att_transform_bias – If True, enable bias in the attention head projections (for all queries, keys and values).
Returns: A Tensor that contains the context vector.
Abstract class for autoregressive decoding.
Either for the recurrent decoder, or for the transformer decoder.
The autoregressive decoder uses the while loop to get the outputs. Descendants should only specify the initial state and the while loop body.
-
class
neuralmonkey.decoders.autoregressive.
AutoregressiveDecoder
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, max_output_len: int, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = False, label_smoothing: float = None, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
-
__init__
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, max_output_len: int, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = False, label_smoothing: float = None, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Initialize parameters common for all autoregressive decoders.
Parameters: - name – Name of the decoder. Should be unique across all Neural Monkey objects.
- vocabulary – Target vocabulary.
- data_id – Target data series.
- max_output_len – Maximum length of an output sequence.
- reuse – Reuse the variables from the model part.
- dropout_keep_prob – Probability of keeping a value during dropout.
- embedding_size – Size of embedding vectors for target words.
- embeddings_source – Embedded sequence to take embeddings from.
- tie_embeddings – Use decoder.embedding_matrix also in place of the output decoding matrix.
- label_smoothing – Label smoothing parameter.
- supress_unk – If true, decoder will not produce symbols for unknown tokens.
-
cost
¶
-
decoded
¶
-
decoding_b
¶
-
decoding_loop
(train_mode: bool, sample: bool = False, temperature: float = 1) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]¶ Run the decoding while loop.
Calls get_initial_loop_state and constructs tf.while_loop with the continuation criterion returned from loop_continue_criterion, and body function returned from get_body.
After finishing the tf.while_loop, it calls finalize_loop to further postprocess the final decoder loop state (usually by stacking Tensors containing decoding histories).
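Schematically, the wiring described above looks roughly like this; a simplified sketch based on the documented method names, with shape invariants and exact signatures omitted:

    import tensorflow as tf

    def decoding_loop_sketch(decoder, train_mode: bool):
        initial_state = decoder.get_initial_loop_state()
        final_state = tf.while_loop(
            decoder.loop_continue_criterion,   # stop when decoding is finished
            decoder.get_body(train_mode),      # one decoding step
            initial_state)
        decoder.finalize_loop(final_state, train_mode)  # stack the histories
        return final_state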
Parameters: - train_mode – Boolean flag, telling whether this is a training run.
- sample – Boolean flag, telling whether we should sample the output symbols from the output distribution instead of using argmax or gold data.
- temperature – float value specifying the softmax temperature
-
decoding_w
¶
-
embedding_matrix
¶ Variables and operations for embedding of input words.
If we are reusing word embeddings, this function takes the embedding matrix from the first encoder.
-
embedding_size
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Populate the feed dictionary for the decoder object.
Parameters: - dataset – The dataset to use for the decoder.
- train – Boolean flag, telling whether this is a training run.
-
finalize_loop
(final_loop_state: neuralmonkey.decoders.autoregressive.LoopState, train_mode: bool) → None¶ Execute post-while loop operations.
Parameters: - final_loop_state – Decoder loop state at the end of the decoding loop.
- train_mode – Boolean flag, telling whether this is a training run.
-
get_body
(train_mode: bool, sample: bool = False, temperature: float = 1) → Callable¶ Return the while loop body function.
-
get_initial_loop_state
() → neuralmonkey.decoders.autoregressive.LoopState¶
-
get_logits
(state: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Project the decoder’s output layer to logits over the vocabulary.
-
go_symbols
¶
-
loop_continue_criterion
(*args) → tensorflow.python.framework.ops.Tensor¶ Decide whether to break out of the while loop.
Parameters: loop_state – LoopState
instance (see the docs for this module). Represents current decoder loop state.
-
output_dimension
¶
-
runtime_logits
¶
-
runtime_logprobs
¶
-
runtime_loop_result
¶
-
runtime_loss
¶
-
runtime_mask
¶
-
runtime_output_states
¶
-
runtime_xents
¶
-
train_inputs
¶
-
train_logits
¶
-
train_logprobs
¶
-
train_loop_result
¶
-
train_loss
¶
-
train_mask
¶
-
train_output_states
¶
-
train_xents
¶
-
-
class
neuralmonkey.decoders.autoregressive.
DecoderConstants
¶ Bases:
neuralmonkey.decoders.autoregressive.DecoderConstants
The constants used by an autoregressive decoder.
-
train_inputs
¶ During training, this is populated by the target token ids.
-
-
class
neuralmonkey.decoders.autoregressive.
DecoderFeedables
¶ Bases:
neuralmonkey.decoders.autoregressive.DecoderFeedables
The input of a single step of an autoregressive decoder.
-
step
¶ A scalar int tensor, stores the number of the current time step.
-
finished
¶ A boolean tensor of shape
(batch)
, which says whether the decoding of a sentence in the batch is finished or not. (E.g. whether the end token has already been generated.)
-
input_symbol
An int
batch
-sized tensor with the inputs to the decoder. During inference, this contains the previously generated tokens. During training, this contains the reference tokens.
-
prev_logits
¶ A tensor of shape
(batch, vocabulary)
. Contains the logits from the previous decoding step.
-
-
class
neuralmonkey.decoders.autoregressive.
DecoderHistories
¶ Bases:
neuralmonkey.decoders.autoregressive.DecoderHistories
The values collected during the run of an autoregressive decoder.
-
logits
¶ A tensor of shape
(time, batch, vocabulary)
which contains the unnormalized output scores of words in a vocabulary.
-
decoder_outputs
¶ A tensor of shape
(time, batch, state_size)
. The states of the decoder before the final output (logit) projection.
-
outputs
¶ An int tensor of shape
(time, batch)
. Stores the generated symbols. (Either an argmax-ed value from the logits, or a target token, during training.)
-
mask
¶ A float tensor of zeros and ones of shape
(time, batch)
. Keeps track of valid positions in the decoded data.
-
-
class
neuralmonkey.decoders.autoregressive.
LoopState
¶ Bases:
neuralmonkey.decoders.autoregressive.LoopState
The loop state object.
The LoopState is a structure that works with the tf.while_loop function. It stores all the information that does not stay constant during the decoder run.
-
histories
¶ A set of tensors that grow in time as the decoder proceeds.
-
constants
¶ A set of independent tensors that do not change during the entire decoder run.
-
feedables
¶ A set of tensors used as the input of a single decoder step.
-
Beam search decoder.
This module implements the beam search algorithm for autoregressive decoders.
Like any autoregressive decoder, this decoder works dynamically, which means
it uses the tf.while_loop
function, conditioned on both the maximum output
length and the list of finished hypotheses.
The beam search decoder uses four data structures during the decoding process:
SearchState
, SearchResults
, BeamSearchLoopState
, and
BeamSearchOutput
. The purpose of each is described in its own docstring.
These structures help the decoder to keep track of the decoding, enabling it to be called e.g. during ensembling, when the content of the structures can be changed and then fed back to the model.
The implementation mimics the API of the AutoregressiveDecoder
class. There
are functions that prepare and return values that are supplied to the
tf.while_loop
function.
-
class
neuralmonkey.decoders.beam_search_decoder.
BeamSearchDecoder
(name: str, parent_decoder: neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, beam_size: int, max_steps: int, length_normalization: float) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
In-graph beam search decoder.
The hypothesis scoring algorithm is taken from https://arxiv.org/pdf/1609.08144.pdf. Length normalization is parameter alpha from equation 14.
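For reference, the length-normalized score from Eq. 14 of the cited paper (Wu et al., 2016) can be sketched as follows; the coverage penalty term of the paper is omitted and this is not the decoder's exact code:

    def length_normalized_score(logprob_sum: float, length: int,
                                alpha: float) -> float:
        # lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha
        length_penalty = ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)
        return logprob_sum / length_penalty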
-
__init__
(name: str, parent_decoder: neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, beam_size: int, max_steps: int, length_normalization: float) → None¶ Construct the beam search decoder graph.
Parameters: - name – The name for the model part.
- parent_decoder – An autoregressive decoder from which to sample.
- beam_size – The number of hypotheses in the beam.
- max_steps – The maximum number of time steps to perform.
- length_normalization – The alpha parameter from Eq. 14 in the paper.
-
decoder_state
¶
-
decoding_loop
() → neuralmonkey.decoders.beam_search_decoder.BeamSearchOutput¶ Create the decoding loop.
This function mimics the behavior of the
decoding_loop
method of the AutoregressiveDecoder
, except the initial loop state is created outside this method because it is accessed and fed during ensembling.
TODO: The
finalize_loop
method and the handling of attention loop states might be implemented in the future.
Returns: This method returns a populated BeamSearchOutput
object.
-
expand_to_beam
(val: tensorflow.python.framework.ops.Tensor, dim: int = 0) → tensorflow.python.framework.ops.Tensor¶ Copy a tensor along a new beam dimension.
Parameters: - val – The
Tensor
to expand. - dim – The dimension along which to expand. Usually, the batch axis.
Returns: The expanded tensor.
-
get_body
() → Callable[[Any], neuralmonkey.decoders.beam_search_decoder.BeamSearchLoopState]¶ Return a body function for
tf.while_loop
.
Returns: A function that performs a single decoding step.
-
get_initial_loop_state
() → neuralmonkey.decoders.beam_search_decoder.BeamSearchLoopState¶ Construct the initial loop state for the beam search decoder.
During the construction, the body function of the underlying decoder is called once to retrieve the initial log probabilities of the first token.
The values are initialized as follows:
search_state
- logprob_sum – For each sentence in the batch, the logprob sum of the first hypothesis in the beam is set to zero while the others are set to negative infinity.
- prev_logprobs – The softmax over the logits from the initial decoder step.
- lengths – All zeros.
- finished – All false.
search_results
- scores – A (batch, beam)-sized tensor of zeros.
- token_ids – A (1, batch, beam)-sized tensor filled with indices of decoder-specific initial input symbols (usually start symbol IDs).
decoder_loop_state
- The loop state of the underlying autoregressive decoder, as returned from the initial call to the body function.
Returns: A populated BeamSearchLoopState
structure.
-
loop_continue_criterion
(*args) → tensorflow.python.framework.ops.Tensor¶ Decide whether to break out of the while loop.
The criterion for stopping the loop is that either all hypotheses are finished or a maximum number of steps has been reached. Here the number of steps is the number of steps of the underlying decoder minus one, because this function is evaluated after the decoder step has been called and its step has been incremented. This is caused by the fact that we call the decoder body function at the end of the beam body function. (And that, in turn, is to support ensembling.)
Parameters: args – A BeamSearchLoopState
instance.
Returns: A scalar boolean Tensor.
-
search_results
¶
-
search_state
¶
-
vocabulary
¶
-
-
class
neuralmonkey.decoders.beam_search_decoder.
BeamSearchLoopState
¶ Bases:
neuralmonkey.decoders.beam_search_decoder.BeamSearchLoopState
The loop state of the beam search decoder.
A loop state object that is used for transferring data between cycles through the symbolic while loop. It groups together the
SearchState
andSearchResults
structures and also keeps track of the underlying decoder loop state.-
search_state
¶ A
SearchState
object representing the current search state.
-
search_results
¶ The growing
SearchResults
object which accumulates the outputs of the decoding process.
-
decoder_loop_state
¶ The current loop state of the underlying autoregressive decoder.
-
-
class
neuralmonkey.decoders.beam_search_decoder.
BeamSearchOutput
¶ Bases:
neuralmonkey.decoders.beam_search_decoder.BeamSearchOutput
The final structure that is returned from the while loop.
-
last_search_step_output
¶ A populated
SearchResults
object.
-
last_dec_loop_state
¶ Final loop state of the underlying decoder.
-
last_search_state
¶ Final loop state of the beam search decoder.
-
attention_loop_states
¶ The final loop states of the attention objects.
-
-
class
neuralmonkey.decoders.beam_search_decoder.
SearchResults
¶ Bases:
neuralmonkey.decoders.beam_search_decoder.SearchResults
The intermediate results of the beam search decoding.
A cumulative structure that holds the actual decoded tokens and hypothesis scores (after applying a length penalty term).
-
scores
¶ A
(time, batch, beam)
-shaped tensor with the scores for each hypothesis. The score is computed from thelogprob_sum
of a hypothesis and accounting for the hypothesis length.
-
token_ids
¶ A
(time, batch, beam)
-shaped tensor with the vocabulary indices of the tokens in each hypothesis.
-
-
class
neuralmonkey.decoders.beam_search_decoder.
SearchState
¶ Bases:
neuralmonkey.decoders.beam_search_decoder.SearchState
Search state of a beam search decoder.
This structure keeps track of a current state of the beam search algorithm. The search state contains tensors that represent hypotheses in the beam, namely their log probability, length, and distribution over the vocabulary when decoding the last word, as well as if the hypothesis is finished or not.
-
logprob_sum
¶ A
(batch, beam)
-shaped tensor with the sums of token log-probabilities of each hypothesis.
-
prev_logprobs
¶ A
(batch, beam, vocabulary)
-sized tensor. Stores the log-distribution over the vocabulary from the previous decoding step for each hypothesis.
-
lengths
¶ A
(batch, beam)
-shaped tensor with the lengths of the hypotheses.
-
finished
¶ A boolean tensor with shape
(batch, beam)
. Marks finished and unfinished hypotheses.
-
-
class
neuralmonkey.decoders.classifier.
Classifier
(name: str, encoders: List[neuralmonkey.model.stateful.Stateful], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, layers: List[int], activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_keep_prob: float = 0.5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
A simple MLP classifier over encoders.
The API pretends it is an RNN decoder which always generates a sequence of length exactly one.
-
__init__
(name: str, encoders: List[neuralmonkey.model.stateful.Stateful], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, layers: List[int], activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_keep_prob: float = 0.5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new instance of the sequence classifier.
Parameters: - name – Name of the decoder. Should be unique across all Neural Monkey objects
- encoders – Input encoders of the decoder
- vocabulary – Target vocabulary
- data_id – Target data series
- layers –
List defining the structure of the network. INI example: layers=[100,20,5] creates a classifier with hidden layers of
size 100, 20 and 5, and one output layer whose size depends on the vocabulary size. - activation_fn – Activation function used on the output of each hidden layer.
- dropout_keep_prob – Probability of keeping a value during dropout
-
cost
¶
-
decoded
¶
-
decoded_logits
¶
-
decoded_seq
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
loss_with_decoded_ins
¶
-
loss_with_gt_ins
¶
-
runtime_logprobs
¶
-
runtime_loss
¶
-
train_loss
¶
-
-
class
neuralmonkey.decoders.ctc_decoder.
CTCDecoder
(name: str, encoder: neuralmonkey.model.stateful.TemporalStateful, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, max_length: int = None, merge_repeated_targets: bool = False, merge_repeated_outputs: bool = True, beam_width: int = 1, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
Connectionist Temporal Classification.
See tf.nn.ctc_loss, tf.nn.ctc_greedy_decoder etc.
-
__init__
(name: str, encoder: neuralmonkey.model.stateful.TemporalStateful, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, max_length: int = None, merge_repeated_targets: bool = False, merge_repeated_outputs: bool = True, beam_width: int = 1, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
cost
¶
-
decoded
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
logits
¶
-
runtime_loss
¶
-
train_loss
¶
-
train_targets
¶
-
-
class
neuralmonkey.decoders.decoder.
Decoder
(encoders: List[neuralmonkey.model.stateful.Stateful], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, name: str, max_output_len: int, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = False, label_smoothing: float = None, rnn_size: int = None, output_projection: Union[Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int], Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]] = None, encoder_projection: Callable[[tensorflow.python.framework.ops.Tensor, int, List[neuralmonkey.model.stateful.Stateful]], tensorflow.python.framework.ops.Tensor] = None, attentions: List[neuralmonkey.attention.base_attention.BaseAttention] = None, attention_on_input: bool = False, rnn_cell: str = 'GRU', conditional_gru: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.decoders.autoregressive.AutoregressiveDecoder
A class managing parts of the computation graph used during decoding.
-
__init__
(encoders: List[neuralmonkey.model.stateful.Stateful], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, name: str, max_output_len: int, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = False, label_smoothing: float = None, rnn_size: int = None, output_projection: Union[Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int], Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]] = None, encoder_projection: Callable[[tensorflow.python.framework.ops.Tensor, int, List[neuralmonkey.model.stateful.Stateful]], tensorflow.python.framework.ops.Tensor] = None, attentions: List[neuralmonkey.attention.base_attention.BaseAttention] = None, attention_on_input: bool = False, rnn_cell: str = 'GRU', conditional_gru: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a refactored version of monster decoder.
Parameters: - encoders – Input encoders of the decoder.
- vocabulary – Target vocabulary.
- data_id – Target data series.
- name – Name of the decoder. Should be unique across all Neural Monkey objects.
- max_output_len – Maximum length of an output sequence.
- dropout_keep_prob – Probability of keeping a value during dropout.
- embedding_size – Size of embedding vectors for target words.
- embeddings_source – Embedded sequence to take embeddings from.
- tie_embeddings – Use decoder.embedding_matrix also in place of the output decoding matrix.
- rnn_size – Size of the decoder hidden state, if None set according to encoders.
- output_projection – How to generate distribution over vocabulary from decoder_outputs.
- encoder_projection – How to construct initial state from encoders.
- attentions – The attention objects to use. Optional.
- rnn_cell – RNN Cell used by the decoder (GRU or LSTM).
- conditional_gru – Flag whether to use the Conditional GRU architecture.
- attention_on_input – Flag whether attention from previous decoding step should be combined with the input in the next step.
- supress_unk – If true, decoder will not produce symbols for unknown tokens.
- reuse – Reuse the model variables from the given model part.
-
embed_input_symbol
(*args) → tensorflow.python.framework.ops.Tensor¶
-
encoder_projection
¶
-
finalize_loop
(final_loop_state: neuralmonkey.decoders.autoregressive.LoopState, train_mode: bool) → None¶ Execute post-while loop operations.
Parameters: - final_loop_state – Decoder loop state at the end of the decoding loop.
- train_mode – Boolean flag, telling whether this is a training run.
-
get_body
(train_mode: bool, sample: bool = False, temperature: float = 1) → Callable¶ Return the while loop body function.
-
get_initial_loop_state
() → neuralmonkey.decoders.autoregressive.LoopState¶
-
initial_state
¶ Compute initial decoder state.
The part of the computation graph that computes the initial state of the decoder.
-
input_plus_attention
(*args) → tensorflow.python.framework.ops.Tensor¶ Merge input and previous attentions.
Input and previous attentions are merged into a single vector of the size of the embedding.
-
output_dimension
¶
-
output_projection
¶
-
output_projection_spec
¶
-
rnn_size
¶
-
-
class
neuralmonkey.decoders.decoder.
RNNFeedables
¶ Bases:
neuralmonkey.decoders.decoder.RNNFeedables
The feedables used by an RNN-based decoder.
Shares attributes with the
DecoderFeedables
class. The special attributes are listed below.-
prev_rnn_state
¶ The recurrent state from the previous step. A tensor of shape
(batch, rnn_size)
-
prev_rnn_output
¶ The output of the recurrent network from the previous step. A tensor of shape
(batch, output_size)
-
prev_contexts
¶ A list of context vectors returned from attention mechanisms. Tensors of shape
(batch, encoder_state_size)
for each attended encoder.
-
-
class
neuralmonkey.decoders.decoder.
RNNHistories
¶ Bases:
neuralmonkey.decoders.decoder.RNNHistories
The loop state histories for RNN-based decoders.
Shares attributes with the
DecoderHistories
class. The special attributes are listed below.-
attention_histories
¶ A list of
AttentionLoopState
objects (or similar) populated by values from the attention mechanisms used in the decoder.
-
Encoder Projection Module.
This module contains different variants of projection of encoders into the initial state of the decoder.
Encoder projections are specified in the configuration file. Each encoder
projection function has a unified type EncoderProjection
, which is a
callable that takes three arguments:
train_mode
– boolean tensor specifying whether the train mode is on
rnn_size
– the size of the resulting initial state
encoders
– a list of
Stateful
objects used as the encoders.
To enable further parameterization of encoder projection functions, one can use higher-order functions.
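As an example of such a higher-order function, here is a hedged sketch of a custom encoder projection. It assumes that each Stateful encoder exposes its final state as the output property; the name is illustrative and the function is not part of the library:

    import tensorflow as tf

    def nonlinear_encoder_projection(activation=tf.tanh):
        """Return an EncoderProjection-style callable."""
        def project(train_mode, rnn_size, encoders):
            # concatenate encoder final states and project them to rnn_size
            concatenated = tf.concat([e.output for e in encoders], axis=1)
            return tf.layers.dense(concatenated, rnn_size,
                                   activation=activation)
        return project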
-
neuralmonkey.decoders.encoder_projection.
concat_encoder_projection
(train_mode: tensorflow.python.framework.ops.Tensor, rnn_size: int = None, encoders: List[neuralmonkey.model.stateful.Stateful] = None) → tensorflow.python.framework.ops.Tensor¶ Concatenate the encoded values of the encoders.
-
neuralmonkey.decoders.encoder_projection.
empty_initial_state
(train_mode: tensorflow.python.framework.ops.Tensor, rnn_size: int, encoders: List[neuralmonkey.model.stateful.Stateful] = None) → tensorflow.python.framework.ops.Tensor¶ Return an empty vector.
-
neuralmonkey.decoders.encoder_projection.
linear_encoder_projection
(dropout_keep_prob: float) → Callable[[tensorflow.python.framework.ops.Tensor, int, List[neuralmonkey.model.stateful.Stateful]], tensorflow.python.framework.ops.Tensor]¶ Return a linear encoder projection.
Return a projection function which applies dropout on concatenated encoder final states and returns a linear projection to a rnn_size-sized tensor.
Parameters: dropout_keep_prob – The dropout keep probability
-
neuralmonkey.decoders.encoder_projection.
nematus_projection
(dropout_keep_prob: float = 1.0) → Callable[[tensorflow.python.framework.ops.Tensor, int, List[neuralmonkey.model.stateful.Stateful]], tensorflow.python.framework.ops.Tensor]¶ Return encoder projection used in Nematus.
The initial state is a dense projection with tanh activation computed on the averaged states of the encoders. Dropout is applied to the means (before the projection).
Parameters: dropout_keep_prob – The dropout keep probability.
Output Projection Module.
This module contains different variants of projection functions of decoder outputs into the logit function inputs.
Output projections are specified in the configuration file. Each output
projection function has a unified type OutputProjection
, which is a
callable that takes four arguments and returns a tensor:
prev_state
– the hidden state of the decoder
prev_output
– embedding of the previously decoded word (or train input)
ctx_tensors
– a list of context vectors (for each attention object)
To enable further parameterization of output projection functions, one can use higher-order functions.
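A hedged sketch of a custom output projection following this interface. The callable's exact fourth positional argument is not documented in this excerpt, so it is accepted and ignored here, and all names are illustrative rather than part of the library API:

    import tensorflow as tf

    def concat_dense_output(output_size: int):
        def projection(prev_state, prev_output, ctx_tensors, *unused):
            # concatenate decoder state, previous output embedding and contexts
            combined = tf.concat(
                [prev_state, prev_output] + list(ctx_tensors), axis=1)
            return tf.layers.dense(combined, output_size, activation=tf.tanh)
        # a (callable, output size) pair, analogous to maxout_output below
        return projection, output_size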
-
neuralmonkey.decoders.output_projection.
maxout_output
(maxout_size: int) → Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int]¶ Apply maxout.
Compute the RNN output from the previous state and output, and the context tensors returned from attention mechanisms, as described in the article.
This function corresponds to the equations for computing t_tilde in the Bahdanau et al. (2015) paper (page 14), with the maxout projection, before the last linear projection.
Parameters: maxout_size – The size of the hidden maxout layer in the deep output.
Returns: The maxout projection of the concatenated inputs.
-
neuralmonkey.decoders.output_projection.
mlp_output
(layer_sizes: List[int], activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function tanh>, dropout_keep_prob: float = 1.0) → Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int]¶ Apply a multilayer perceptron.
Compute RNN deep output using the multilayer perceptron with a specified activation function. (Pascanu et al., 2013 [https://arxiv.org/pdf/1312.6026v5.pdf])
Parameters: - layer_sizes – A list of sizes of the hidden layers of the MLP
- dropout_keep_prob – the dropout keep probability
- activation – The activation function to use in each layer.
-
neuralmonkey.decoders.output_projection.
nematus_output
(output_size: int, activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function tanh>, dropout_keep_prob: float = 1.0) → Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int]¶ Apply nonlinear one-hidden-layer deep output.
Implementation consistent with Nematus. Can be used instead of (and is in theory equivalent to) nonlinear_output.
Projects the RNN state, the embedding of the previously outputted word, and the concatenation of all context vectors into a shared vector space, sums them up and applies a hyperbolic tangent activation function.
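The projection described here can be sketched as follows (an illustrative simplification, not the Nematus-consistent implementation itself; names are hypothetical):

    import tensorflow as tf

    def nematus_style_output(prev_state, prev_output, ctx_tensors, output_size):
        # project each input into a shared space of size output_size ...
        state_part = tf.layers.dense(prev_state, output_size)
        embed_part = tf.layers.dense(prev_output, output_size)
        context_part = tf.layers.dense(tf.concat(ctx_tensors, axis=1),
                                       output_size)
        # ... sum them up and apply the hyperbolic tangent
        return tf.tanh(state_part + embed_part + context_part)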
-
neuralmonkey.decoders.output_projection.
nonlinear_output
(output_size: int, activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function tanh>) → Tuple[Callable[[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, List[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], int]¶
-
class
neuralmonkey.decoders.sequence_labeler.
SequenceLabeler
(name: str, encoder: Union[neuralmonkey.encoders.recurrent.RecurrentEncoder, neuralmonkey.encoders.facebook_conv.SentenceEncoder], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
Classifier assigning a label to each encoder state.
-
__init__
(name: str, encoder: Union[neuralmonkey.encoders.recurrent.RecurrentEncoder, neuralmonkey.encoders.facebook_conv.SentenceEncoder], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
cost
¶
-
decoded
¶
-
decoding_b
¶
-
decoding_residual_w
¶
-
decoding_w
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
logits
¶
-
logprobs
¶
-
runtime_loss
¶
-
train_loss
¶
-
-
class
neuralmonkey.decoders.sequence_regressor.
SequenceRegressor
(name: str, encoders: List[neuralmonkey.model.stateful.Stateful], data_id: str, layers: List[int] = None, activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_keep_prob: float = 1.0, dimension: int = 1, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
A simple MLP regression over encoders.
The API pretends it is an RNN decoder which always generates a sequence of length exactly one.
-
__init__
(name: str, encoders: List[neuralmonkey.model.stateful.Stateful], data_id: str, layers: List[int] = None, activation_fn: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_keep_prob: float = 1.0, dimension: int = 1, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
cost
¶
-
decoded
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
predictions
¶
-
runtime_loss
¶
-
train_loss
¶
-
Implementation of the decoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
-
class
neuralmonkey.decoders.transformer.
TransformerDecoder
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, ff_hidden_size: int, n_heads_self: int, n_heads_enc: Union[List[int], int], depth: int, max_output_len: int, attention_combination_strategy: str = 'serial', n_heads_hier: int = None, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = True, label_smoothing: float = None, self_attention_dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: Union[float, List[float]] = 1.0, use_att_transform_bias: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.decoders.autoregressive.AutoregressiveDecoder
-
__init__
(name: str, encoders: List[Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful]], vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, ff_hidden_size: int, n_heads_self: int, n_heads_enc: Union[List[int], int], depth: int, max_output_len: int, attention_combination_strategy: str = 'serial', n_heads_hier: int = None, dropout_keep_prob: float = 1.0, embedding_size: int = None, embeddings_source: neuralmonkey.model.sequence.EmbeddedSequence = None, tie_embeddings: bool = True, label_smoothing: float = None, self_attention_dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: Union[float, List[float]] = 1.0, use_att_transform_bias: bool = False, supress_unk: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a decoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
Parameters: - encoders – Input encoders for the decoder.
- vocabulary – Target vocabulary.
- data_id – Target data series.
- name – Name of the decoder. Should be unique across all Neural Monkey objects.
- max_output_len – Maximum length of an output sequence.
- dropout_keep_prob – Probability of keeping a value during dropout.
- embedding_size – Size of embedding vectors for target words.
- embeddings_source – Embedded sequence to take embeddings from.
- tie_embeddings – Use decoder.embedding_matrix also in place of the output decoding matrix.
- ff_hidden_size – Size of the feedforward sublayers.
- n_heads_self – Number of the self-attention heads.
- n_heads_enc – Number of the attention heads over each encoder.
Either a list whose length must be equal to the length of
encoders
, or a single integer. In the latter case, the number of heads is equal for all encoders. - attention_combination_strategy – One of
serial
,parallel
,flat
,hierarchical
. Controls the attention combination strategy for enc-dec attention. - n_heads_hier – Number of the attention heads for the second
attention in the
hierarchical
attention combination. - depth – Number of sublayers.
- label_smoothing – A label smoothing parameter for cross entropy loss computation.
- attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
- supress_unk – If True, the decoder will not produce symbols for unknown tokens.
- reuse – Reuse the variables from the given model part.
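A minimal construction sketch, assuming that encoder (a TransformerEncoder) and vocab (the target Vocabulary) have been created elsewhere; all hyperparameter values below are illustrative only:

from neuralmonkey.decoders.transformer import TransformerDecoder

# encoder and vocab are assumed to be pre-built objects; sizes are illustrative.
decoder = TransformerDecoder(
    name="decoder",
    encoders=[encoder],
    vocabulary=vocab,
    data_id="target",
    ff_hidden_size=2048,
    n_heads_self=8,
    n_heads_enc=8,
    depth=6,
    max_output_len=50,
    dropout_keep_prob=0.9)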
-
dimension
¶
-
embed_inputs
(inputs: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶
-
embedded_train_inputs
¶
-
encoder_attention_sublayer
(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Create the encoder-decoder attention sublayer.
-
feedforward_sublayer
(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Create the feed-forward network sublayer.
-
get_body
(train_mode: bool, sample: bool = False, temperature: float = 1.0) → Callable¶ Return the while loop body function.
-
get_initial_loop_state
() → neuralmonkey.decoders.autoregressive.LoopState¶
-
layer
(level: int, inputs: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → neuralmonkey.encoders.transformer.TransformerLayer¶
-
output_dimension
¶
-
self_attention_sublayer
(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor¶ Create the decoder self-attention sublayer with output mask.
-
train_logits
¶
-
-
class
neuralmonkey.decoders.transformer.
TransformerHistories
¶ Bases:
neuralmonkey.decoders.transformer.TransformerHistories
The loop state histories for the transformer decoder.
Shares attributes with the
DecoderHistories
class. The special attributes are listed below.-
decoded_symbols
¶ A tensor which stores the decoded symbols.
-
input_mask
¶ A float tensor with zeros and ones which marks the valid positions on the input.
-
-
class
neuralmonkey.decoders.word_alignment_decoder.
WordAlignmentDecoder
(encoder: neuralmonkey.encoders.recurrent.RecurrentEncoder, decoder: neuralmonkey.decoders.decoder.Decoder, data_id: str, name: str, reuse: neuralmonkey.model.model_part.ModelPart = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
A decoder that computes soft alignment from an attentive encoder.
Loss is computed as cross-entropy against a reference alignment.
-
__init__
(encoder: neuralmonkey.encoders.recurrent.RecurrentEncoder, decoder: neuralmonkey.decoders.decoder.Decoder, data_id: str, name: str, reuse: neuralmonkey.model.model_part.ModelPart = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
alignment_target
¶
-
cost
¶
-
decoded
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
ref_alignment
¶
-
runtime_loss
¶
-
runtime_outputs
¶
-
train_loss
¶
-
-
class
neuralmonkey.encoders.attentive.
AttentiveEncoder
(name: str, input_sequence: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], hidden_size: int, num_heads: int, output_size: int = None, state_proj_size: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
An encoder with attention over the input and a fixed-dimension output.
Based on “A Structured Self-attentive Sentence Embedding”, https://arxiv.org/abs/1703.03130.
The encoder combines a sequence of vectors into a fixed-size matrix where each row of the matrix is computed using a different attention head. This matrix is exposed as the
temporal_states
property (the time dimension corresponds to the different attention heads). The output
property provides a flattened and, optionally, projected representation of this matrix.-
__init__
(name: str, input_sequence: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful], hidden_size: int, num_heads: int, output_size: int = None, state_proj_size: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Initialize an instance of the encoder.
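A minimal construction sketch, assuming input_seq is a TemporalStateful object (for example an embedded word sequence) created elsewhere; the sizes are illustrative only:

from neuralmonkey.encoders.attentive import AttentiveEncoder

# input_seq is an assumed pre-built embedded sequence; sizes are illustrative.
sentence_encoder = AttentiveEncoder(
    name="self_attentive_encoder",
    input_sequence=input_seq,
    hidden_size=300,
    num_heads=10,
    dropout_keep_prob=0.8)

The temporal_states property of the resulting object then has 10 "time" steps, one per attention head, and the output property flattens them into a single vector.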
-
attention_weights
¶
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
CNN for image processing.
-
class
neuralmonkey.encoders.cnn_encoder.
CNNEncoder
(name: str, data_id: str, convolutions: List[Union[Tuple[str, int, int, str, int], Tuple[str, int, int], Tuple[str, int, int, str]]], image_height: int, image_width: int, pixel_dim: int, fully_connected: List[int] = None, batch_normalize: bool = False, dropout_keep_prob: float = 0.5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.SpatialStatefulWithOutput
An image encoder.
It projects the input image through a series of convolutional operations. The projected image is vertically cut and fed to stacked RNN layers which encode the image into a single vector.
-
__init__
(name: str, data_id: str, convolutions: List[Union[Tuple[str, int, int, str, int], Tuple[str, int, int], Tuple[str, int, int, str]]], image_height: int, image_width: int, pixel_dim: int, fully_connected: List[int] = None, batch_normalize: bool = False, dropout_keep_prob: float = 0.5, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Initialize a convolutional network for image processing.
The convolutional network can consist of plain convolutions, max-pooling layers and residual blocks. In the configuration, they are specified using the following tuples.
- convolution: (“C”, kernel_size, stride, padding, out_channel);
- max / average pooling: (“M”/”A”, kernel_size, stride, padding);
- residual block: (“R”, kernel_size, out_channels).
Padding must be either “valid” or “same”.
Parameters: - convolutions – Configuration of convolutional layers.
- data_id – Identifier of the data series in the dataset.
- image_height – Height of the input image in pixels.
- image_width – Width of the image.
- pixel_dim – Number of color channels in the input images.
- dropout_keep_prob – Probability of keeping neurons active in dropout. Dropout is done between all convolutional layers and fully connected layer.
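A construction sketch illustrating the convolution specification tuples described above; the series name and all sizes are illustrative only:

from neuralmonkey.encoders.cnn_encoder import CNNEncoder

# Two plain convolutions, one max-pooling layer and one residual block,
# written as the ("C", ...), ("M", ...) and ("R", ...) tuples from above.
cnn = CNNEncoder(
    name="cnn",
    data_id="images",
    convolutions=[("C", 3, 1, "same", 32),
                  ("M", 2, 2, "same"),
                  ("C", 3, 1, "same", 64),
                  ("R", 3, 64)],
    image_height=224,
    image_width=224,
    pixel_dim=3,
    fully_connected=[512],
    dropout_keep_prob=0.5)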
-
batch_norm_callback
(layer_output: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
image_input
¶
-
image_mask
¶
-
image_processing_layers
¶ Do all convolutions and return the last convolutional map.
No dropout is applied between the convolutional layers. By default, the activation function is ReLU.
-
output
¶ Output vector of the CNN.
If fully connected layers are specified, they are applied on top of the last convolutional map. Dropout is applied between all layers; the default activation function is ReLU. There are only projection layers, no softmax is applied.
If no fully connected layer is specified, the average-pooled last convolutional map is used as the vector output.
-
spatial_mask
¶ Return mask for the spatial_states.
A 3D Tensor of shape (batch, width, height) of type float32 which masks the spatial states so that they can be of different shapes. The mask should only contain ones or zeros.
-
spatial_states
¶ Return object states in space.
A 4D Tensor of shape (batch, width, height, state_size) which contains the states of the object in space (e.g. the final layer of a convolutional network processing an image).
-
-
class
neuralmonkey.encoders.cnn_encoder.
CNNTemporalView
(name: str, cnn: neuralmonkey.encoders.cnn_encoder.CNNEncoder) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
Slice the convolutional maps left to right.
-
__init__
(name: str, cnn: neuralmonkey.encoders.cnn_encoder.CNNEncoder) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
dependencies
¶ Return a list of attribute names regarded as dependents.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
-
neuralmonkey.encoders.cnn_encoder.
plain_convolution
(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, specification: Tuple[str, int, int, str, int], batch_norm_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, int]¶
-
neuralmonkey.encoders.cnn_encoder.
pooling
(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, specification: Tuple[str, int, int, str], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]¶
-
neuralmonkey.encoders.cnn_encoder.
residual_block
(prev_layer: tensorflow.python.framework.ops.Tensor, prev_mask: tensorflow.python.framework.ops.Tensor, prev_channels: int, specification: Tuple[str, int, int], batch_norm_callback: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], layer_num: int) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor, int]¶
From the paper Convolutional Sequence to Sequence Learning.
http://arxiv.org/abs/1705.03122
-
class
neuralmonkey.encoders.facebook_conv.
SentenceEncoder
(name: str, input_sequence: neuralmonkey.model.sequence.EmbeddedSequence, conv_features: int, encoder_layers: int, kernel_width: int = 5, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
-
__init__
(name: str, input_sequence: neuralmonkey.model.sequence.EmbeddedSequence, conv_features: int, encoder_layers: int, kernel_width: int = 5, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Construct a new parameterized object.
Parameters: - name – The name for the model part. Will be used in the variable and name scopes.
- reuse – Optional parameterized part with which to share parameters.
- save_checkpoint – Optional path to a checkpoint file which will store the parameters of this object.
- load_checkpoint – Optional path to a checkpoint file from which to load initial variables for this object.
- initializers – An InitializerSpecs instance with specification of the initializers.
-
order_embeddings
¶
-
ordered_embedded_inputs
¶
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
Pre-trained ImageNet networks.
-
class
neuralmonkey.encoders.imagenet_encoder.
ImageNet
(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.SpatialStatefulWithOutput
Pre-trained ImageNet network.
We use the ImageNet networks as they are in the tensorflow/models repository (https://github.com/tensorflow/models). In order to use them, you need to clone the repository and configure the ImageNet object so that it has a full path to “research/slim” in the repository. Visit https://github.com/tensorflow/models/tree/master/research/slim for information about checkpoints of the pre-trained models.
-
__init__
(name: str, data_id: str, network_type: str, slim_models_path: str, load_checkpoint: str = None, spatial_layer: str = None, encoded_layer: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Initialize pre-trained ImageNet network.
Parameters: - name – Name of the model part (the ImageNet network will be in its scope, independently of this name).
- data_id – Id of series with images (list of 3D numpy arrays)
- network_type – Identifier of ImageNet network from TFSlim.
- spatial_layer – String identifier of the convolutional map (model’s endpoint). Check TFSlim documentation for end point specifications.
- encoded_layer – String id of the network layer that will be used as input of a decoder. None means averaging the convolutional maps.
- slim_models_path – Path to the Slim models in the tensorflow/models repository.
- load_checkpoint – Checkpoint file from which the pre-trained network is loaded.
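A construction sketch; the network type identifier, layer name and file paths below are hypothetical placeholders and must be replaced by an actual TF-Slim model name, endpoint and checkpoint:

from neuralmonkey.encoders.imagenet_encoder import ImageNet

# All string values below are placeholders for an actual TF-Slim setup.
imagenet = ImageNet(
    name="imagenet",
    data_id="images",
    network_type="vgg_16",
    slim_models_path="/path/to/models/research/slim",
    load_checkpoint="/path/to/vgg_16.ckpt",
    spatial_layer="vgg_16/conv5/conv5_3")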
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
input_image
¶
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
spatial_mask
¶ Return mask for the spatial_states.
A 3D Tensor of shape (batch, width, height) of type float32 which masks the spatial states so that they can be of different shapes. The mask should only contain ones or zeros.
-
spatial_states
¶ Return object states in space.
A 4D Tensor of shape (batch, width, height, state_size) which contains the states of the object in space (e.g. the final layer of a convolutional network processing an image).
-
-
class
neuralmonkey.encoders.imagenet_encoder.
ImageNetSpec
¶ Bases:
neuralmonkey.encoders.imagenet_encoder.ImageNetSpec
Specification of the ImageNet encoder.
Do not use this object directly; instead, use one of the get_* functions in this module.
-
scope
¶ The variable scope of the network to use.
-
image_size
¶ A tuple of two integers giving the image width and height in pixels.
-
apply_net
¶ The function that receives an image and applies the network.
-
-
neuralmonkey.encoders.imagenet_encoder.
get_alexnet
() → neuralmonkey.encoders.imagenet_encoder.ImageNetSpec¶
-
neuralmonkey.encoders.imagenet_encoder.
get_resnet_by_type
(resnet_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]¶
-
neuralmonkey.encoders.imagenet_encoder.
get_vgg_by_type
(vgg_type: str) → Callable[[], neuralmonkey.encoders.imagenet_encoder.ImageNetSpec]¶
-
class
neuralmonkey.encoders.numpy_stateful_filler.
SpatialFiller
(name: str, input_shape: List[int], data_id: str, projection_dim: int = None, ff_hidden_dim: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.SpatialStatefulWithOutput
Placeholder class for 3D numerical input.
This model part is used to feed 3D tensors (e.g., pre-trained convolutional maps for image captioning). Optionally, the states are projected to a given size.
-
__init__
(name: str, input_shape: List[int], data_id: str, projection_dim: int = None, ff_hidden_dim: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Instantiate SpatialFiller.
Parameters: - name – Name of the model part.
- input_shape – Dimensionality of the input.
- data_id – Name of the data series with numpy objects.
- projection_dim – Optional, dimension of the states projection.
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
spatial_mask
¶ Return mask for the spatial_states.
A 3D Tensor of shape (batch, width, height) of type float32 which masks the spatial states so that they can be of different shapes. The mask should only contain ones or zeros.
-
spatial_states
¶ Return object states in space.
A 4D Tensor of shape (batch, width, height, state_size) which contains the states of the object in space (e.g. the final layer of a convolutional network processing an image).
-
-
class
neuralmonkey.encoders.numpy_stateful_filler.
StatefulFiller
(name: str, dimension: int, data_id: str, output_shape: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.Stateful
Placeholder class for stateful input.
This model part is used to feed 1D tensors to the model. Optionally, it projects the states to a given dimension.
-
__init__
(name: str, dimension: int, data_id: str, output_shape: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Instantiate StatefulFiller.
Parameters: - name – Name of the model part.
- dimension – Dimensionality of the input.
- data_id – Series containing the numpy objects.
- output_shape – Dimension of optional state projection.
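A minimal sketch of feeding pre-computed vectors into a model; the series name and dimensions are illustrative only:

from neuralmonkey.encoders.numpy_stateful_filler import StatefulFiller

# Feeds 1D numpy vectors stored in the (illustrative) series "sent_vectors"
# and projects them to 256 dimensions.
filler = StatefulFiller(
    name="vector_input",
    dimension=512,
    data_id="sent_vectors",
    output_shape=256)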
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Return a feed dictionary for the given feedable object.
Parameters: - dataset – A dataset instance from which to get the data.
- train – Boolean indicating whether the model runs in training mode.
Returns: A FeedDict dictionary object.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
-
class
neuralmonkey.encoders.pooling.
SequenceAveragePooling
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.encoders.pooling.SequencePooling
An average pooling layer over a sequence.
Averages a sequence over time to produce a single state.
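A minimal sketch, assuming encoder is any TemporalStateful model part (for instance a SentenceEncoder) created elsewhere:

from neuralmonkey.encoders.pooling import SequenceAveragePooling

# encoder is an assumed pre-built temporal model part; its states are
# averaged over time into a single output state.
avg_pool = SequenceAveragePooling(
    name="avg_pool",
    input_sequence=encoder)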
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
-
class
neuralmonkey.encoders.pooling.
SequenceMaxPooling
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.encoders.pooling.SequencePooling
A max pooling layer over a sequence.
Takes the maximum of a sequence over time to produce a single state.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
-
class
neuralmonkey.encoders.pooling.
SequencePooling
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.Stateful
An abstract pooling layer over a sequence.
-
__init__
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Initialize an instance of the pooling layer.
-
-
class
neuralmonkey.encoders.raw_rnn_encoder.
RawRNNEncoder
(name: str, data_id: str, input_size: int, rnn_layers: List[Union[Tuple[int], Tuple[int, str], Tuple[int, str, str]]], max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
A raw RNN encoder that gets input as a tensor.
-
__init__
(name: str, data_id: str, input_size: int, rnn_layers: List[Union[Tuple[int], Tuple[int, str], Tuple[int, str, str]]], max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new instance of the encoder.
Parameters: - data_id – Identifier of the data series fed to this encoder
- name – A unique identifier for this encoder
- rnn_layers – A list of tuples specifying the size and, optionally, the direction (‘forward’, ‘backward’ or ‘bidirectional’) and cell type (‘GRU’ or ‘LSTM’) of each RNN layer.
- dropout_keep_prob – The dropout keep probability (default 1.0)
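A construction sketch illustrating the rnn_layers tuple format; the series name and sizes are illustrative only:

from neuralmonkey.encoders.raw_rnn_encoder import RawRNNEncoder

# Two stacked layers: a bidirectional GRU followed by a forward LSTM,
# written as (size, direction, cell type) tuples.
raw_encoder = RawRNNEncoder(
    name="raw_encoder",
    data_id="features",
    input_size=40,
    rnn_layers=[(300, "bidirectional", "GRU"),
                (300, "forward", "LSTM")],
    dropout_keep_prob=0.8)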
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Populate the feed dictionary with the encoder inputs.
Parameters: - dataset – The dataset to use
- train – Boolean flag telling whether it is training time
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
-
class
neuralmonkey.encoders.recurrent.
DeepSentenceEncoder
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, rnn_sizes: List[int], rnn_directions: List[str], rnn_cell: str = 'GRU', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, embedding_initializer: Callable = None) → None¶ Bases:
neuralmonkey.encoders.recurrent.SentenceEncoder
-
__init__
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, rnn_sizes: List[int], rnn_directions: List[str], rnn_cell: str = 'GRU', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, embedding_initializer: Callable = None) → None¶ Create a new instance of the deep sentence encoder.
Parameters: - name – ModelPart name.
- vocabulary – The input vocabulary.
- data_id – The input sequence data ID.
- embedding_size – The dimension of the embedding vectors in the input sequence.
- max_input_len – Maximum length of the input sequence (disregard tokens after this position).
- rnn_sizes – The list of dimensions of the RNN hidden state vectors in respective layers.
- rnn_cell – One of “GRU”, “NematusGRU”, “LSTM”. Which kind of memory cell to use.
- rnn_directions – The list of rnn directions in the respective layers. Should be equally long as rnn_sizes. Each item must be one of “forward”, “backward”, “bidirectional”. Determines in what order to process the input sequence. Note that choosing “bidirectional” will double the resulting vector dimension as well as the number of the parameters in the given layer.
- add_residual – Add residual connections to each RNN layer output.
- dropout_keep_prob – 1 - dropout probability.
- save_checkpoint – ModelPart save checkpoint file.
- load_checkpoint – ModelPart load checkpoint file.
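A construction sketch, assuming vocab is a Vocabulary instance built elsewhere; note that rnn_directions must have the same length as rnn_sizes:

from neuralmonkey.encoders.recurrent import DeepSentenceEncoder

# vocab is an assumed pre-built Vocabulary; sizes are illustrative.
deep_encoder = DeepSentenceEncoder(
    name="deep_encoder",
    vocabulary=vocab,
    data_id="source",
    embedding_size=300,
    rnn_sizes=[256, 256],
    rnn_directions=["bidirectional", "forward"],
    rnn_cell="LSTM",
    dropout_keep_prob=0.8)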
-
rnn
¶ Run stacked RNN given sizes and directions.
Inputs of the first RNN are the RNN inputs to the encoder. Outputs from each layer are used as inputs to the next one. As a final state of the stacked RNN, the final state of the final layer is used.
-
-
class
neuralmonkey.encoders.recurrent.
FactoredEncoder
(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, input_initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.encoders.recurrent.RecurrentEncoder
-
__init__
(name: str, vocabularies: List[neuralmonkey.vocabulary.Vocabulary], data_ids: List[str], embedding_sizes: List[int], rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, input_initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new instance of the factored encoder.
Parameters: - name – ModelPart name.
- vocabularies – The vocabularies for each factor.
- data_ids – The input sequence data ID for each factor.
- embedding_sizes – The dimension of the embedding vectors in the input sequence for each factor.
- max_input_len – Maximum length of the input sequence (disregard tokens after this position).
- rnn_size – The dimension of the RNN hidden state vector.
- rnn_cell – One of “GRU”, “NematusGRU”, “LSTM”. Which kind of memory cell to use.
- rnn_direction – One of “forward”, “backward”, “bidirectional”. In what order to process the input sequence. Note that choosing “bidirectional” will double the resulting vector dimension as well as the number of encoder parameters.
- add_residual – Add residual connections to the RNN layer output.
- dropout_keep_prob – 1 - dropout probability.
- save_checkpoint – ModelPart save checkpoint file.
- load_checkpoint – ModelPart load checkpoint file.
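A construction sketch, assuming word_vocab and pos_vocab are Vocabulary instances built elsewhere; each factor has its own data series and embedding size:

from neuralmonkey.encoders.recurrent import FactoredEncoder

# word_vocab and pos_vocab are assumed pre-built vocabularies; the series
# names and sizes are illustrative.
factored_encoder = FactoredEncoder(
    name="factored_encoder",
    vocabularies=[word_vocab, pos_vocab],
    data_ids=["source", "source_pos"],
    embedding_sizes=[300, 50],
    rnn_size=256)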
-
-
class
neuralmonkey.encoders.recurrent.
RNNSpec
¶ Bases:
neuralmonkey.encoders.recurrent.RNNSpec
Recurrent neural network specifications.
-
size
¶ The state size.
-
direction
¶ The RNN processing direction. One of
forward
,backward
, andbidirectional
.
-
cell_type
¶ The recurrent cell type to use. Refer to
encoders.recurrent.RNN_CELL_TYPES
for possible values.
-
-
class
neuralmonkey.encoders.recurrent.
RecurrentEncoder
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
-
__init__
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new instance of a recurrent encoder.
Parameters: - name – ModelPart name.
- input_sequence – The input sequence for the encoder.
- rnn_size – The dimension of the RNN hidden state vector.
- rnn_cell – One of “GRU”, “NematusGRU”, “LSTM”. Which kind of memory cell to use.
- rnn_direction – One of “forward”, “backward”, “bidirectional”. In what order to process the input sequence. Note that choosing “bidirectional” will double the resulting vector dimension as well as the number of encoder parameters.
- add_residual – Add residual connections to the RNN layer output.
- dropout_keep_prob – 1 - dropout probability.
- save_checkpoint – ModelPart save checkpoint file.
- load_checkpoint – ModelPart load checkpoint file.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
rnn
¶
-
rnn_input
¶
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
-
class
neuralmonkey.encoders.recurrent.
SentenceEncoder
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, embedding_initializer: Callable = None) → None¶ Bases:
neuralmonkey.encoders.recurrent.RecurrentEncoder
-
__init__
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, rnn_size: int, rnn_cell: str = 'GRU', rnn_direction: str = 'bidirectional', add_residual: bool = False, max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None, embedding_initializer: Callable = None) → None¶ Create a new instance of the sentence encoder.
Parameters: - name – ModelPart name.
- vocabulary – The input vocabulary.
- data_id – The input sequence data ID.
- embedding_size – The dimension of the embedding vectors in the input sequence.
- max_input_len – Maximum length of the input sequence (disregard tokens after this position).
- rnn_size – The dimension of the RNN hidden state vector.
- rnn_cell – One of “GRU”, “NematusGRU”, “LSTM”. Which kind of memory cell to use.
- rnn_direction – One of “forward”, “backward”, “bidirectional”. In what order to process the input sequence. Note that choosing “bidirectional” will double the resulting vector dimension as well as the number of encoder parameters.
- add_residual – Add residual connections to the RNN layer output.
- dropout_keep_prob – 1 - dropout probability.
- save_checkpoint – ModelPart save checkpoint file.
- load_checkpoint – ModelPart load checkpoint file.
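A construction sketch, assuming vocab is a Vocabulary instance built elsewhere; with the default bidirectional direction, the output state has twice the size of rnn_size:

from neuralmonkey.encoders.recurrent import SentenceEncoder

# vocab is an assumed pre-built Vocabulary; sizes are illustrative.
encoder = SentenceEncoder(
    name="sentence_encoder",
    vocabulary=vocab,
    data_id="source",
    embedding_size=300,
    rnn_size=256,
    rnn_cell="GRU",
    dropout_keep_prob=0.8)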
-
-
neuralmonkey.encoders.recurrent.
rnn_layer
(rnn_input: tensorflow.python.framework.ops.Tensor, lengths: tensorflow.python.framework.ops.Tensor, rnn_spec: neuralmonkey.encoders.recurrent.RNNSpec, add_residual: bool) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]¶ Construct a RNN layer given its inputs and specs.
Parameters: - rnn_inputs – The input sequence to the RNN.
- lengths – Lengths of input sequences.
- rnn_spec – A valid RNNSpec tuple specifying the network architecture.
- add_residual – Add residual connections to the layer output.
Encoder for sentences without explicit segmentation.
-
class
neuralmonkey.encoders.sentence_cnn_encoder.
SentenceCNNEncoder
(name: str, input_sequence: neuralmonkey.model.sequence.Sequence, segment_size: int, highway_depth: int, rnn_size: int, filters: List[Tuple[int, int]], dropout_keep_prob: float = 1.0, use_noisy_activations: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
Recurrent over Convolutional Encoder.
Encoder processing a sentence using a CNN and then running a bidirectional RNN on the result.
Based on: Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation.
See https://arxiv.org/pdf/1610.03017.pdf
-
__init__
(name: str, input_sequence: neuralmonkey.model.sequence.Sequence, segment_size: int, highway_depth: int, rnn_size: int, filters: List[Tuple[int, int]], dropout_keep_prob: float = 1.0, use_noisy_activations: bool = False, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new instance of the sentence encoder.
Parameters: - name – A unique identifier for this encoder
- segment_size – The size of the segments over which we apply max-pooling.
- highway_depth – Depth of the highway layer.
- rnn_size – The size of the encoder’s hidden state. Note that the actual encoder output state size will be twice as long because it is the result of concatenation of forward and backward hidden states.
- filters – Specification of CNN filters. It is a list of tuples specifying the filter size and number of channels.
Keyword Arguments: dropout_keep_prob – The dropout keep probability (default 1.0)
-
bidirectional_rnn
¶
-
cnn_encoded
¶ 1D convolution with max-pooling that processes characters.
-
highway_layer
¶ Highway net projection following the CNN.
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
rnn_cells
() → Tuple[tensorflow.python.ops.rnn_cell_impl.RNNCell, tensorflow.python.ops.rnn_cell_impl.RNNCell]¶ Return the graph template for creating RNN memory cells.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
Encoder for sentence classification with 1D convolutions and max-pooling.
-
class
neuralmonkey.encoders.sequence_cnn_encoder.
SequenceCNNEncoder
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, filters: List[Tuple[int, int]], max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.Stateful
Encoder processing a sequence using a CNN.
-
__init__
(name: str, vocabulary: neuralmonkey.vocabulary.Vocabulary, data_id: str, embedding_size: int, filters: List[Tuple[int, int]], max_input_len: int = None, dropout_keep_prob: float = 1.0, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create a new instance of the CNN sequence encoder.
Based on: Yoon Kim: Convolutional Neural Networks for Sentence Classification (http://emnlp2014.org/papers/pdf/EMNLP2014181.pdf)
Parameters: - vocabulary – Input vocabulary
- data_id – Identifier of the data series fed to this encoder
- name – A unique identifier for this encoder
- max_input_len – Maximum length of an encoded sequence
- embedding_size – The size of the embedding vector assigned to each word
- filters – Specification of CNN filters. It is a list of tuples specifying the filter size and number of channels.
- dropout_keep_prob – The dropout keep probability (default 1.0)
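A construction sketch, assuming vocab is a Vocabulary instance built elsewhere; filters are (width, channels) tuples, here 100 filters of widths 3, 4 and 5, similar to the setup in the cited paper:

from neuralmonkey.encoders.sequence_cnn_encoder import SequenceCNNEncoder

# vocab is an assumed pre-built Vocabulary; the filter spec is illustrative.
cnn_encoder = SequenceCNNEncoder(
    name="cnn_encoder",
    vocabulary=vocab,
    data_id="source",
    embedding_size=300,
    filters=[(3, 100), (4, 100), (5, 100)])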
-
embedded_inputs
¶
-
feed_dict
(dataset: neuralmonkey.dataset.Dataset, train: bool = False) → Dict[tensorflow.python.framework.ops.Tensor, Any]¶ Populate the feed dictionary with the encoder inputs.
Parameters: - dataset – The dataset to use
- train – Boolean flag telling whether it is training time
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
Implementation of the encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
-
class
neuralmonkey.encoders.transformer.
TransformerEncoder
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
,neuralmonkey.model.stateful.TemporalStatefulWithOutput
-
__init__
(name: str, input_sequence: neuralmonkey.model.stateful.TemporalStateful, ff_hidden_size: int, depth: int, n_heads: int, dropout_keep_prob: float = 1.0, attention_dropout_keep_prob: float = 1.0, target_space_id: int = None, use_att_transform_bias: bool = False, use_positional_encoding: bool = True, input_for_cross_attention: Union[neuralmonkey.model.stateful.TemporalStateful, neuralmonkey.model.stateful.SpatialStateful] = None, n_cross_att_heads: int = None, reuse: neuralmonkey.model.model_part.ModelPart = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Create an encoder of the Transformer model.
Described in Vaswani et al. (2017), arxiv.org/abs/1706.03762
Parameters: - input_sequence – Embedded input sequence.
- name – Name of the encoder. Should be unique across all Neural Monkey objects.
- reuse – Reuse the model variables.
- dropout_keep_prob – Probability of keeping a value during dropout.
- target_space_id – Specifies the modality of the target space.
- use_att_transform_bias – Add bias when transforming qkv vectors for attention.
- use_positional_encoding – If True, position encoding signal is added to the input.
Keyword Arguments: - ff_hidden_size – Size of the feedforward sublayers.
- n_heads – Number of the self-attention heads.
- depth – Number of sublayers.
- attention_dropout_keep_prob – Probability of keeping a value during dropout on the attention output.
- input_for_cross_attention – An attendable model part that is attended using cross-attention on every layer of the encoder, analogously to how the encoder is attended in the decoder.
- n_cross_att_heads – Number of heads used in the cross-attention.
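A construction sketch, assuming embedded_input is an embedded (TemporalStateful) sequence created elsewhere; the hyperparameters correspond to commonly used Transformer sizes and are illustrative only:

from neuralmonkey.encoders.transformer import TransformerEncoder

# embedded_input is an assumed pre-built embedded sequence.
transformer_encoder = TransformerEncoder(
    name="transformer_encoder",
    input_sequence=embedded_input,
    ff_hidden_size=2048,
    depth=6,
    n_heads=8,
    dropout_keep_prob=0.9)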
-
cross_attention_sublayer
(queries: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶
-
dependencies
¶ Return a list of attribute names regarded as dependents.
-
encoder_inputs
¶
-
feedforward_sublayer
(layer_input: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Create the feed-forward network sublayer.
-
layer
(level: int) → neuralmonkey.encoders.transformer.TransformerLayer¶
-
modality_matrix
¶ Create an embedding matrix for varying target modalities.
Used to embed different target space modalities in the tensor2tensor models (e.g. during the zero-shot translation).
-
output
¶ Return the object output.
A 2D Tensor of shape (batch, state_size) which contains the resulting state of the object.
-
self_attention_sublayer
(prev_layer: neuralmonkey.encoders.transformer.TransformerLayer) → tensorflow.python.framework.ops.Tensor¶ Create the encoder self-attention sublayer.
-
target_modality_embedding
¶ Gather correct embedding of the target space modality.
See TransformerEncoder.modality_matrix for more information.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
-
class
neuralmonkey.encoders.transformer.
TransformerLayer
(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None¶ Bases:
neuralmonkey.model.stateful.TemporalStateful
-
__init__
(states: tensorflow.python.framework.ops.Tensor, mask: tensorflow.python.framework.ops.Tensor) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
temporal_mask
¶ Return mask for the temporal_states.
A 2D Tensor of shape (batch, time) of type float32 which masks the temporal states so each sequence can have a different length. It should only contain ones or zeros.
-
temporal_states
¶ Return object states in time.
A 3D Tensor of shape (batch, time, state_size) which contains the states of the object in time (e.g. hidden states of a recurrent encoder).
-
-
neuralmonkey.encoders.transformer.
position_signal
(dimension: int, length: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶
-
class
neuralmonkey.evaluators.accuracy.
AccuracyEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.SequenceEvaluator
Accuracy Evaluator.
This class uses the default SequenceEvaluator implementation, i.e. it works on sequences of equal lengths (but can be used for others as well) and uses == as the token scorer.
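A usage sketch, assuming the evaluator is called with (hypotheses, references) in the same order as score_batch:

from neuralmonkey.evaluators.accuracy import AccuracyEvaluator

# Token-level accuracy over equally long sequences: three of the four
# tokens match, so the score is 0.75.
accuracy = AccuracyEvaluator(name="Accuracy")
score = accuracy([["B", "I", "O", "O"]], [["B", "I", "I", "O"]])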
-
class
neuralmonkey.evaluators.accuracy.
AccuracySeqLevelEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Sequence-level accuracy evaluator.
This class uses the default evaluator implementation. It gives 1.0 to equal sequences and 0.0 to others, averaging the scores over the batch.
-
class
neuralmonkey.evaluators.average.
AverageEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Just average the numeric output of a runner.
-
score_instance
(hypothesis: float, reference: float) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
-
class
neuralmonkey.evaluators.beer.
BeerWrapper
(wrapper: str, name: str = 'BEER', encoding: str = 'utf-8') → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Wrapper for BEER scorer.
Paper: http://aclweb.org/anthology/D14-1025 Code: https://github.com/stanojevic/beer
-
__init__
(wrapper: str, name: str = 'BEER', encoding: str = 'utf-8') → None¶ Initialize the BEER wrapper.
Parameters: - name – Name of the evaluator.
- wrapper – Path to the BEER’s executable.
- encoding – Data encoding.
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
serialize_to_bytes
(sentences: List[List[str]]) → bytes¶
-
-
class
neuralmonkey.evaluators.bleu.
BLEUEvaluator
(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
-
__init__
(n: int = 4, deduplicate: bool = False, name: str = None, multiple_references_separator: str = None) → None¶ Instantiate BLEU evaluator.
Parameters: - n – Longest n-grams considered.
- deduplicate – Flag whether repeated tokens should be treated as one.
- name – Name displayed in the logs and TensorBoard.
- multiple_references_separator – Token that separates multiple
reference sentences. If
None
, it assumes the reference is one sentence only.
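A usage sketch; hypotheses and references are lists of tokenized sentences, and the evaluator is assumed to be called with (hypotheses, references) in the same order as score_batch:

from neuralmonkey.evaluators.bleu import BLEUEvaluator

bleu = BLEUEvaluator(n=4, name="BLEU-4")
hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
references = [["the", "cat", "is", "on", "the", "mat"]]
score = bleu(hypotheses, references)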
-
static
bleu
(references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True)¶ Compute BLEU on a corpus with multiple references.
The n-grams are uniformly weighted.
Default is to use smoothing as in reference implementation on: https://github.com/ufal/qtleap/blob/master/cuni_train/bin/mteval-v13a.pl#L831-L873
Parameters: - hypotheses – List of hypotheses
- references – List of references. There can be more than one reference.
- ngrams – Maximum order of n-grams. Default 4.
- case_sensitive – Perform case-sensitive computation. Default True.
-
static
deduplicate_sentences
() → List[List[str]]¶
-
static
effective_reference_length
(references_list: List[List[List[str]]]) → int¶ Compute the effective reference corpus length.
The effective reference corpus length is based on best match length.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of references (as lists of words)
-
static
merge_max_counters
() → collections.Counter¶ Merge counters using maximum values.
-
static
minimum_reference_length
(references_list: List[List[str]]) → int¶ Compute the minimum reference corpus length.
The minimum reference corpus length is based on the shortest reference sentence length.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of references (as lists of words)
-
static
modified_ngram_precision
(references_list: List[List[List[str]]], n: int, case_sensitive: bool) → Tuple[float, int]¶ Compute the modified n-gram precision on a list of sentences.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of reference sentences (as lists of words)
- n – n-gram order
- case_sensitive – Whether to perform case-sensitive computation
-
static
ngram_counts
(n: int, lowercase: bool, delimiter: str = ' ') → collections.Counter¶ Get n-grams from a sentence.
Parameters: - sentence – Sentence as a list of words
- n – n-gram order
- lowercase – Convert ngrams to lowercase
- delimiter – delimiter to use to create counter entries
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
-
class
neuralmonkey.evaluators.chrf.
ChrFEvaluator
(n: int = 6, beta: float = 1.0, ignored_symbols: List[str] = None, name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Compute ChrF score.
See http://www.statmt.org/wmt15/pdf/WMT49.pdf
-
__init__
(n: int = 6, beta: float = 1.0, ignored_symbols: List[str] = None, name: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
chr_p
(hyp_ngrams: List[Dict[str, int]], ref_ngrams: List[Dict[str, int]]) → float¶
-
chr_r
(hyp_ngrams: List[Dict[str, int]], ref_ngrams: List[Dict[str, int]]) → float¶
-
score_instance
(hypothesis: List[str], reference: List[str]) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
-
class
neuralmonkey.evaluators.edit_distance.
EditDistanceEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
-
static
compare_scores
(score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
- Returns
- An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
score_instance
(hypothesis: List[str], reference: List[str]) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
static
-
class
neuralmonkey.evaluators.evaluator.
Evaluator
(name: str = None) → None¶ Bases:
typing.Generic
Base class for evaluators in Neural Monkey.
Each evaluator has a __call__ method which returns a score for a batch of model predictions given the references. This class provides default implementations of the score_batch and score_instance functions.
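A minimal sketch of a custom evaluator that only overrides score_instance and relies on the default score_batch to average over the batch; the metric itself (hypothesis-to-reference length ratio) is purely illustrative:

from typing import List
from neuralmonkey.evaluators.evaluator import Evaluator

class LengthRatioEvaluator(Evaluator):
    # Illustrative metric: ratio of hypothesis length to reference length.

    def score_instance(self, hypothesis: List[str],
                       reference: List[str]) -> float:
        # The default score_batch averages these values over the batch.
        if not reference:
            return 0.0
        return len(hypothesis) / len(reference)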
-
__init__
(name: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
static
compare_scores
(score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
- Returns
- An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
name
¶
-
score_batch
(hypotheses: List[EvalType], references: List[EvalType]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
score_instance
(hypothesis: EvalType, reference: EvalType) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
-
class
neuralmonkey.evaluators.evaluator.
SequenceEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Base class for token-level evaluators that work with sequences.
-
score_batch
(hypotheses: List[Sequence[EvalType]], references: List[Sequence[EvalType]]) → float¶ Score batch of sequences.
The default implementation assumes equal sequence lengths and operates on the token level (i.e. token-level scores from the whole batch are averaged (in contrast to averaging each sequence first)).
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
score_token
(hyp_token: EvalType, ref_token: EvalType) → float¶ Score a single hyp/ref pair of tokens.
The default implementation returns 1.0 if the tokens are equal, 0.0 otherwise.
Parameters: - hyp_token – A prediction token.
- ref_token – A golden token.
Returns: A score for the token hyp/ref pair.
-
-
neuralmonkey.evaluators.evaluator.
check_lengths
(scorer)¶
-
class
neuralmonkey.evaluators.f1_bio.
F1Evaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
F1 evaluator for BIO tagging, e.g. NP chunking.
The entities are annotated as beginning of the entity (B), continuation of the entity (I), the rest is outside the entity (O).
-
static
chunk2set
() → Set[str]¶
-
score_instance
(hypothesis: List[str], reference: List[str]) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
static
-
class
neuralmonkey.evaluators.gleu.
GLEUEvaluator
(n: int = 4, deduplicate: bool = False, name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Sentence-level evaluation metric correlating with BLEU on corpus-level.
From “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” by Wu et al. (https://arxiv.org/pdf/1609.08144v2.pdf)
GLEU is the minimum of recall and precision of all n-grams up to n in references and hypotheses.
N-gram counts are based on the BLEU methods.
-
__init__
(n: int = 4, deduplicate: bool = False, name: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
static
gleu
(hypotheses: List[List[str]], references: List[List[List[str]]], ngrams: int = 4, case_sensitive: bool = True) → float¶ Compute GLEU on a corpus with multiple references (no smoothing).
Parameters: - hypotheses – List of hypotheses
- references – List of references. There can be more than one reference.
- ngrams – Maximum order of n-grams. Default 4.
- case_sensitive – Perform case-sensitive computation. Default True.
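A small usage sketch (the tokenized sentences are illustrative); score_batch takes one reference per hypothesis, while the static gleu function accepts a list of references for each hypothesis:
from neuralmonkey.evaluators.gleu import GLEUEvaluator

gleu = GLEUEvaluator(n=4, name="GLEU")
hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
references = [["the", "cat", "is", "on", "the", "mat"]]

# batch-level score with a single reference per sentence
score = gleu.score_batch(hypotheses, references)

# corpus-level score with (possibly) multiple references per sentence
corpus_score = GLEUEvaluator.gleu(hypotheses, [[ref] for ref in references])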
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
static
total_precision_recall
(hypotheses: List[List[str]], references_list: List[List[List[str]]], ngrams: int, case_sensitive: bool) → Tuple[float, float]¶ Compute a modified n-gram precision and recall on a sentence list.
Parameters: - hypotheses – List of output sentences as lists of words
- references_list – List of lists of reference sentences (as lists of words)
- ngrams – n-gram order
- case_sensitive – Whether to perform case-sensitive computation
-
-
class
neuralmonkey.evaluators.mse.
MeanSquaredErrorEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.SequenceEvaluator
Mean squared error evaluator.
Assumes equal vector length across the batch (see SequenceEvaluator.score_batch).
-
static
compare_scores
(score1: float, score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
Returns: An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
score_token
(hyp_elem: float, ref_elem: float) → float¶ Score a single hyp/ref pair of tokens.
The default implementation returns 1.0 if the tokens are equal, 0.0 otherwise.
Parameters: - hyp_token – A prediction token.
- ref_token – A golden token.
Returns: A score for the token hyp/ref pair.
-
-
class
neuralmonkey.evaluators.mse.
PairwiseMeanSquaredErrorEvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Pairwise mean squared error evaluator.
For vectors of different dimension across the batch.
-
static
compare_scores
(score1: float, score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
Returns: An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
score_instance
(hypothesis: List[float], reference: List[float]) → float¶ Compute mean square error between two vectors.
-
-
class
neuralmonkey.evaluators.multeval.
MultEvalWrapper
(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Wrapper for mult-eval’s reference BLEU and METEOR scorer.
-
__init__
(wrapper: str, name: str = 'MultEval', encoding: str = 'utf-8', metric: str = 'bleu', language: str = 'en') → None¶ Initialize the wrapper.
Parameters: - wrapper – Path to multeval.sh script
- name – Name of the evaluator
- encoding – Encoding of input files
- language – Language of hypotheses and references
- metric – Evaluation metric: “bleu”, “ter”, or “meteor”
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
serialize_to_bytes
(sentences: List[List[str]]) → bytes¶
-
-
class
neuralmonkey.evaluators.sacrebleu.
SacreBLEUEvaluator
(name: str, smooth: str = 'exp', smooth_floor: float = 0.0, force: bool = False, lowercase: bool = False, tokenize: str = 'none', use_effective_order: bool = False) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
SacreBLEU evaluator wrapper.
-
__init__
(name: str, smooth: str = 'exp', smooth_floor: float = 0.0, force: bool = False, lowercase: bool = False, tokenize: str = 'none', use_effective_order: bool = False) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
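A brief instantiation sketch; the constructor arguments mirror the SacreBLEU options, and the values shown are illustrative:
from neuralmonkey.evaluators.sacrebleu import SacreBLEUEvaluator

bleu = SacreBLEUEvaluator(name="BLEU", tokenize="none", lowercase=False)
score = bleu.score_batch(
    hypotheses=[["the", "cat", "sat", "on", "the", "mat"]],
    references=[["the", "cat", "is", "on", "the", "mat"]])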
-
-
class
neuralmonkey.evaluators.ter.
TEREvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Compute TER using the pyter library.
-
static
compare_scores
(score1: float, score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
Returns: An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
score_instance
(hypothesis: List[str], reference: List[str]) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
-
class
neuralmonkey.evaluators.wer.
WEREvaluator
(name: str = None) → None¶ Bases:
neuralmonkey.evaluators.evaluator.Evaluator
Compute WER (word error rate, used in speech recognition).
-
static
compare_scores
(score1: float, score2: float) → int¶ Compare scores using this evaluator.
The default implementation regards the bigger score as better.
Parameters: - score1 – The first score.
- score2 – The second score.
Returns: An int. When score1 is better, returns 1. When score2 is better, returns -1. When the scores are equal, returns 0.
-
score_batch
(hypotheses: List[List[str]], references: List[List[str]]) → float¶ Score a batch of hyp/ref pairs.
The default implementation of this method calls score_instance for each instance in the batch and returns the average score.
Parameters: - hypotheses – List of model predictions.
- references – List of golden outputs.
Returns: A float.
-
score_instance
(hypothesis: List[str], reference: List[str]) → float¶ Score a single hyp/ref pair.
The default implementation of this method returns 1.0 when the hypothesis and the reference are equal and 0.0 otherwise.
Parameters: - hypothesis – The model prediction.
- reference – The golden output.
Returns: A float.
-
-
class
neuralmonkey.runners.base_runner.
BaseRunner
(output_series: str, decoder: MP) → None¶ Bases:
neuralmonkey.runners.base_runner.GraphExecutor
,typing.Generic
Base class for runners.
Runners are graph executors that retrieve tensors from the model without changing the model parameters. Each runner has a top-level model part it relates to.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
-
__init__
(output_series: str, decoder: MP) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
decoder_data_id
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.base_runner.
ExecutionResult
¶ Bases:
neuralmonkey.runners.base_runner.ExecutionResult
A data structure that represents the result of a graph execution.
The goal of each runner is to populate this structure and set it as its
self._result
.-
outputs
¶ A batch of outputs of the runner.
-
losses
¶ A (possibly empty) list of loss values computed during the run.
-
scalar_summaries
¶ A TensorFlow summary object with scalar values.
-
histogram_summaries
¶ A TensorFlow summary object with histograms.
-
image_summaries
¶ A TensorFlow summary object with images.
-
-
class
neuralmonkey.runners.base_runner.
GraphExecutor
(dependencies: Set[neuralmonkey.model.model_part.GenericModelPart]) → None¶ Bases:
neuralmonkey.model.model_part.GenericModelPart
The abstract parent class of all graph executors.
In Neural Monkey, a graph executor is an object that retrieves tensors from the computational graph. The two major groups of graph executors are trainers and runners.
Each graph executor is an instance of the GenericModelPart class, which means it has parameterized and feedable dependencies which reference the model part objects that need to be created in order to compute the tensors of interest (called “fetches”).
Every graph executor has a method called get_executable, which returns a GraphExecutor.Executable instance, which specifies what tensors to execute and collects results from the session execution.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
typing.Generic
Abstract base class for executables.
Executables are objects associated with the graph executors. Each executable has two main functions: next_to_execute and collect_results. These functions are called in a loop, until the executable’s result has been set.
To make use of Mypy’s type checking, the executables are generic and are parameterized by the type of their graph executor. Since Python does not know the concept of nested classes, each executable receives the instance of the graph executor through its constructor.
When subclassing GraphExecutor, it is also necessary to subclass the Executable class and name it Executable, so it overrides the definition of this class. Following this guideline, the default implementation of the get_executable function on the graph executor will work without the need to override it.
-
__init__
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
collect_results
(results: List[Dict]) → None¶
-
executor
¶
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
result
¶
-
set_result
(outputs: List[Any], losses: List[float], scalar_summaries: tensorflow.core.framework.summary_pb2.Summary, histogram_summaries: tensorflow.core.framework.summary_pb2.Summary, image_summaries: tensorflow.core.framework.summary_pb2.Summary) → None¶
-
-
__init__
(dependencies: Set[neuralmonkey.model.model_part.GenericModelPart]) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
dependencies
¶ Return a list of attribute names regarded as dependents.
-
feedables
¶
-
fetches
¶
-
get_executable
(compute_losses: bool, summaries: bool, num_sessions: int) → neuralmonkey.runners.base_runner.GraphExecutor.Executable¶
-
parameterizeds
¶
-
-
neuralmonkey.runners.base_runner.
reduce_execution_results
(execution_results: List[neuralmonkey.runners.base_runner.ExecutionResult]) → neuralmonkey.runners.base_runner.ExecutionResult¶ Aggregate execution results into one.
-
class
neuralmonkey.runners.beamsearch_runner.
BeamSearchRunner
(output_series: str, decoder: neuralmonkey.decoders.beam_search_decoder.BeamSearchDecoder, rank: int = 1, postprocess: Callable[[List[str]], List[str]] = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
A runner which takes the output from a beam search decoder.
The runner and the beam search decoder support ensembling.
-
class
Executable
(executor: neuralmonkey.runners.beamsearch_runner.BeamSearchRunner, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
__init__
(executor: neuralmonkey.runners.beamsearch_runner.BeamSearchRunner, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
collect_results
(results: List[Dict]) → None¶
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
prepare_results
(output)¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.beam_search_decoder.BeamSearchDecoder, rank: int = 1, postprocess: Callable[[List[str]], List[str]] = None) → None¶ Initialize the beam search runner.
Parameters: - output_series – Name of the series produced by the runner.
- decoder – The beam search decoder to use.
- rank – The hypothesis from the beam to select. Setting rank to 1 selects the best hypothesis.
- postprocess – The postprocessor to apply to the output data.
-
fetches
¶
-
loss_names
¶
-
-
neuralmonkey.runners.beamsearch_runner.
beam_search_runner_range
(output_series: str, decoder: neuralmonkey.decoders.beam_search_decoder.BeamSearchDecoder, max_rank: int = None, postprocess: Callable[[List[str]], List[str]] = None) → List[neuralmonkey.runners.beamsearch_runner.BeamSearchRunner]¶ Return beam search runners for a range of ranks from 1 to max_rank.
This means there are max_rank output series, where the n-th series contains the n-th best hypothesis from the beam search.
Parameters: - output_series – Prefix of output series.
- decoder – Beam search decoder shared by all runners.
- max_rank – Maximum rank of the hypotheses.
- postprocess – Series-level postprocess applied on output.
Returns: List of beam search runners getting hypotheses with rank from 1 to max_rank.
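A sketch of both usages; bs_decoder stands for an already configured BeamSearchDecoder instance and the series names are illustrative:
from neuralmonkey.runners.beamsearch_runner import (
    BeamSearchRunner, beam_search_runner_range)

# a single runner returning the best hypothesis from the beam
best_runner = BeamSearchRunner(
    output_series="target", decoder=bs_decoder, rank=1)

# one runner (and hence one output series) per rank, from 1 to 5
all_runners = beam_search_runner_range(
    output_series="target", decoder=bs_decoder, max_rank=5)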
-
class
neuralmonkey.runners.ctc_debug_runner.
CTCDebugRunner
(output_series: str, decoder: neuralmonkey.decoders.ctc_decoder.CTCDecoder) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
A runner that prints out the raw CTC output, including the blank symbols.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.ctc_decoder.CTCDecoder) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.label_runner.
LabelRunner
(output_series: str, decoder: neuralmonkey.decoders.sequence_labeler.SequenceLabeler, postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.sequence_labeler.SequenceLabeler, postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
A runner outputting logits or a normalized distribution from a decoder.
-
class
neuralmonkey.runners.logits_runner.
LogitsRunner
(output_series: str, decoder: neuralmonkey.decoders.classifier.Classifier, normalize: bool = True, pick_index: int = None, pick_value: str = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
A runner which takes the output from decoder.decoded_logits.
The logits / normalized probabilities are output as tab-separated string values. If the decoder produces a list of logits (as the recurrent decoder does), the tab-separated arrays are separated with commas. Alternatively, we may be interested in a single dimension of the distribution.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.classifier.Classifier, normalize: bool = True, pick_index: int = None, pick_value: str = None) → None¶ Initialize the logits runner.
Parameters: - output_series – Name of the series produced by the runner.
- decoder – A decoder having logits.
- normalize – Flag whether the logits should be normalized with softmax.
- pick_index – If not None, it specifies the index of the logit or the probability that should be on output.
- pick_value – If not None, it specifies a value from the decoder’s vocabulary whose logit or probability should be on output.
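For instance, to emit the normalized probability of a single vocabulary item, the runner could be set up as in the following sketch (the classifier object and the value "positive" are illustrative):
from neuralmonkey.runners.logits_runner import LogitsRunner

prob_runner = LogitsRunner(
    output_series="positive_probability",
    decoder=classifier,          # an already configured Classifier decoder
    normalize=True,              # apply softmax to the logits
    pick_value="positive")       # output only this vocabulary item's probability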
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.perplexity_runner.
PerplexityRunner
(output_series: str, decoder: neuralmonkey.decoders.autoregressive.AutoregressiveDecoder) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.autoregressive.AutoregressiveDecoder) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.plain_runner.
PlainRunner
(output_series: str, decoder: Union[neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, neuralmonkey.decoders.ctc_decoder.CTCDecoder, neuralmonkey.decoders.classifier.Classifier, neuralmonkey.decoders.sequence_labeler.SequenceLabeler], postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
A runner which takes the output from decoder.decoded.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: Union[neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, neuralmonkey.decoders.ctc_decoder.CTCDecoder, neuralmonkey.decoders.classifier.Classifier, neuralmonkey.decoders.sequence_labeler.SequenceLabeler], postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.regression_runner.
RegressionRunner
(output_series: str, decoder: neuralmonkey.decoders.sequence_regressor.SequenceRegressor, postprocess: Callable[[List[float]], List[float]] = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
A runner that takes the predictions of a sequence regressor.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, decoder: neuralmonkey.decoders.sequence_regressor.SequenceRegressor, postprocess: Callable[[List[float]], List[float]] = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.runner.
GreedyRunner
(output_series: str, decoder: Union[neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, neuralmonkey.decoders.classifier.Classifier], postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
-
__init__
(output_series: str, decoder: Union[neuralmonkey.decoders.autoregressive.AutoregressiveDecoder, neuralmonkey.decoders.classifier.Classifier], postprocess: Callable[[List[List[str]]], List[List[str]]] = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.tensor_runner.
RepresentationRunner
(output_series: str, encoder: neuralmonkey.model.model_part.GenericModelPart, attribute: str = 'output', select_session: int = None) → None¶ Bases:
neuralmonkey.runners.tensor_runner.TensorRunner
Runner printing out representation from an encoder.
Use this runner to get a representation of the input (or other data) from one of the Neural Monkey encoders.
-
__init__
(output_series: str, encoder: neuralmonkey.model.model_part.GenericModelPart, attribute: str = 'output', select_session: int = None) → None¶ Initialize the representation runner.
Parameters: - output_series – Name of the output series with vectors.
- encoder – The encoder to use. This can be any
GenericModelPart
object. - attribute – The name of the encoder attribute that contains the data.
- select_session – ID of the TensorFlow session used in case of model ensembles.
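A configuration sketch; encoder stands for any built GenericModelPart whose output attribute holds the representation, and the series name is illustrative:
from neuralmonkey.runners.tensor_runner import RepresentationRunner

repr_runner = RepresentationRunner(
    output_series="sentence_vectors",
    encoder=encoder,
    attribute="output")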
-
-
class
neuralmonkey.runners.tensor_runner.
TensorRunner
(output_series: str, modelparts: List[neuralmonkey.model.model_part.GenericModelPart], tensors: List[str], batch_dims: List[int], tensors_by_name: List[str], batch_dims_by_name: List[int], select_session: int = None, single_tensor: bool = False) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
Runner class for printing tensors from a model.
Use this runner if you want to retrieve a specific tensor from the model using a given dataset. The runner generates an output data series which will contain the tensors in a dictionary of numpy arrays.
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, modelparts: List[neuralmonkey.model.model_part.GenericModelPart], tensors: List[str], batch_dims: List[int], tensors_by_name: List[str], batch_dims_by_name: List[int], select_session: int = None, single_tensor: bool = False) → None¶ Construct a new
TensorRunner
object. Note that at this time, one must specify the top-level objects so that it is ensured that the graph is built. The reason for this behavior is that the graph is constructed lazily; therefore, if the tensors to store are provided by indirect reference (name), the system does not know early enough that it needs to create them.
Parameters: - output_series – The name of the generated output data series.
- modelparts – A list of
GenericModelPart
objects that hold the tensors that will be retrieved. - tensors – A list of names of tensors that should be retrieved.
- batch_dims – A list of integers that correspond to the batch dimension in each wanted tensor.
- tensors_by_name – A list of tensor names to fetch. If a tensor is not in the graph, a warning is generated and the tensor is ignored.
- batch_dims_by_name – A list of integers that correspond to the batch dimension in each wanted tensor specified by name.
- select_session – An optional integer specifying the session to use in case of ensembling. When not used, tensors from all sessions are stored. In case of a single session, this option has no effect.
- single_tensor – If True, it is assumed that only one tensor is to be fetched, and the execution result will consist of this tensor only. If False, the result will be a dict mapping tensor names to NumPy arrays.
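A sketch of fetching a single tensor by attribute name; encoder and the attribute name "output" are illustrative and must correspond to an existing model part and attribute:
from neuralmonkey.runners.tensor_runner import TensorRunner

tensor_runner = TensorRunner(
    output_series="encoder_states",
    modelparts=[encoder],
    tensors=["output"],
    batch_dims=[0],            # the batch dimension of the fetched tensor
    tensors_by_name=[],
    batch_dims_by_name=[],
    single_tensor=True)        # the result is the tensor itself, not a dict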
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.runners.word_alignment_runner.
WordAlignmentRunner
(output_series: str, attention: neuralmonkey.attention.base_attention.BaseAttention, decoder: neuralmonkey.decoders.decoder.Decoder) → None¶ Bases:
neuralmonkey.runners.base_runner.BaseRunner
-
class
Executable
(executor: Executor, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
collect_results
(results: List[Dict]) → None¶
-
-
__init__
(output_series: str, attention: neuralmonkey.attention.base_attention.BaseAttention, decoder: neuralmonkey.decoders.decoder.Decoder) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
loss_names
¶
-
-
class
neuralmonkey.trainers.cross_entropy_trainer.
CrossEntropyTrainer
(decoders: List[Any], decoder_weights: List[Union[tensorflow.python.framework.ops.Tensor, float, NoneType]] = None, l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Bases:
neuralmonkey.trainers.generic_trainer.GenericTrainer
-
__init__
(decoders: List[Any], decoder_weights: List[Union[tensorflow.python.framework.ops.Tensor, float, NoneType]] = None, l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
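A minimal construction sketch; decoder is assumed to be a decoder exposing a cross-entropy cost, and the regularization values are illustrative:
from neuralmonkey.trainers.cross_entropy_trainer import CrossEntropyTrainer

trainer = CrossEntropyTrainer(
    decoders=[decoder],
    l2_weight=1.0e-8,
    clip_norm=1.0)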
-
-
neuralmonkey.trainers.cross_entropy_trainer.
xent_objective
(decoder, weight=None) → neuralmonkey.trainers.generic_trainer.Objective¶ Get XENT objective from decoder with cost.
-
class
neuralmonkey.trainers.delayed_update_trainer.
DelayedUpdateTrainer
(batches_per_update: int, objectives: List[neuralmonkey.trainers.generic_trainer.Objective], l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Bases:
neuralmonkey.trainers.generic_trainer.GenericTrainer
-
class
Executable
(executor: neuralmonkey.trainers.delayed_update_trainer.DelayedUpdateTrainer, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
__init__
(executor: neuralmonkey.trainers.delayed_update_trainer.DelayedUpdateTrainer, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
collect_results
(results: List[Dict]) → None¶
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
-
__init__
(batches_per_update: int, objectives: List[neuralmonkey.trainers.generic_trainer.Objective], l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
accumulate_ops
¶
-
cumulator_counter
¶
-
diff_buffer
¶
-
existing_grads_and_vars
¶
-
gradient_buffers
¶
-
objective_buffers
¶
-
raw_gradients
¶ Return averaged gradients over buffers.
-
reset_ops
¶
-
summaries
¶
-
-
class
neuralmonkey.trainers.generic_trainer.
GenericTrainer
(objectives: List[neuralmonkey.trainers.generic_trainer.Objective], l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Bases:
neuralmonkey.runners.base_runner.GraphExecutor
-
class
Executable
(executor: neuralmonkey.trainers.generic_trainer.GenericTrainer, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Bases:
neuralmonkey.runners.base_runner.Executable
-
__init__
(executor: neuralmonkey.trainers.generic_trainer.GenericTrainer, compute_losses: bool, summaries: bool, num_sessions: int) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
collect_results
(results: List[Dict]) → None¶
-
next_to_execute
() → Tuple[Union[Dict, List], List[Dict[tensorflow.python.framework.ops.Tensor, Union[int, float, numpy.ndarray]]]]¶ Get the tensors and additional feed dicts for execution.
-
-
__init__
(objectives: List[neuralmonkey.trainers.generic_trainer.Objective], l1_weight: float = 0.0, l2_weight: float = 0.0, clip_norm: float = None, optimizer: tensorflow.python.training.optimizer.Optimizer = None, var_scopes: List[str] = None, var_collection: str = None) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
static
default_optimizer
()¶
-
differentiable_loss_sum
¶ Compute the differentiable loss (including regularization).
-
fetches
¶
-
gradients
¶
-
objective_values
¶ Compute unweighted losses for fetching.
-
raw_gradients
¶ Compute the gradients.
-
regularization_losses
¶ Compute the regularization losses, e.g. L1 and L2.
-
summaries
¶
-
train_op
¶ Construct the training op.
-
var_list
¶
-
-
class
neuralmonkey.trainers.generic_trainer.
Objective
¶ Bases:
neuralmonkey.trainers.generic_trainer.Objective
The training objective.
-
name
¶ The name for the objective. Used in TensorBoard.
-
decoder
¶ The decoder which generates the value to optimize.
-
loss
¶ The loss tensor fetched by the trainer.
-
gradients
¶ Manually specified gradients. Useful for reinforcement learning.
-
weight
¶ The weight of this objective. The loss will be multiplied by this so the gradients can be controlled in case of multiple objectives.
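A sketch of building an objective by hand and passing it to a GenericTrainer; decoder is assumed to expose its loss tensor as decoder.cost, and the objective name is illustrative:
from neuralmonkey.trainers.generic_trainer import GenericTrainer, Objective

objective = Objective(
    name="train_xent",       # label shown in TensorBoard (illustrative)
    decoder=decoder,
    loss=decoder.cost,
    gradients=None,          # let the trainer differentiate the loss
    weight=None)             # no extra weighting of this objective

trainer = GenericTrainer(objectives=[objective], clip_norm=1.0)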
-
-
class
neuralmonkey.trainers.multitask_trainer.
MultitaskTrainer
(trainers: List[neuralmonkey.trainers.generic_trainer.GenericTrainer]) → None¶ Bases:
neuralmonkey.runners.base_runner.GraphExecutor
Wrapper for scheduling multitask training.
The wrapper contains a list of trainer objects. They are called in the order defined by this list, thus simulating a task-switching schedule.
-
__init__
(trainers: List[neuralmonkey.trainers.generic_trainer.GenericTrainer]) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
fetches
¶
-
get_executable
(compute_losses: bool = True, summaries: bool = True, num_sessions: int = 1) → neuralmonkey.runners.base_runner.GraphExecutor.Executable¶
-
Training objectives for reinforcement learning.
-
neuralmonkey.trainers.rl_trainer.
rl_objective
(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], subtract_baseline: bool = False, normalize: bool = False, temperature: float = 1.0, ce_smoothing: float = 0.0, alpha: float = 1.0, sample_size: int = 1) → neuralmonkey.trainers.generic_trainer.Objective¶ Construct RL objective for training with sentence-level feedback.
Depending on the options the objective corresponds to:
1) sample_size = 1, normalize = False, ce_smoothing = 0.0: Bandit objective (Eq. 2) described in ‘Bandit Structured Prediction for Neural Sequence-to-Sequence Learning’ (http://www.aclweb.org/anthology/P17-1138). It’s recommended to set subtract_baseline = True.
2) sample_size > 1, normalize = True, ce_smoothing = 0.0: Minimum Risk Training as described in ‘Minimum Risk Training for Neural Machine Translation’ (http://www.aclweb.org/anthology/P16-1159) (Eq. 12).
3) sample_size > 1, normalize = False, ce_smoothing = 0.0: The Google ‘Reinforce’ objective as proposed in ‘Google’s NMT System: Bridging the Gap between Human and Machine Translation’ (https://arxiv.org/pdf/1609.08144.pdf) (Eq. 8).
4) sample_size > 1, normalize = False, ce_smoothing > 0.0: Google’s ‘Mixed’ objective in the above paper (Eq. 9), where ce_smoothing implements alpha.
Note that ‘alpha’ controls the sharpness of the normalized distribution, while ‘temperature’ controls the sharpness during sampling.
Parameters: - decoder – a recurrent decoder to sample from
- reward_function – any evaluator object
- subtract_baseline – avg reward is subtracted from obtained reward
- normalize – the probabilities of the samples are re-normalized
- sample_size – number of samples to obtain feedback for
- ce_smoothing – add cross-entropy loss with this coefficient to loss
- alpha – determines the shape of the normalized distribution
- temperature – the softmax temperature for sampling
Returns: Objective object to be used in generic trainer
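A sketch of a Minimum Risk Training-style setup (sample_size > 1, normalize = True); decoder stands for an already built recurrent decoder, and sentence_gleu is used here as the reward function:
from neuralmonkey.trainers.generic_trainer import GenericTrainer
from neuralmonkey.trainers.rl_trainer import rl_objective
from neuralmonkey.trainers.self_critical_objective import sentence_gleu

mrt_objective = rl_objective(
    decoder=decoder,
    reward_function=sentence_gleu,
    normalize=True,
    sample_size=5)

trainer = GenericTrainer(objectives=[mrt_objective])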
Training objective for self-critical learning.
Self-critical learning is a modification of the REINFORCE algorithm that uses the reward of the train-time decoder output as a baseline in the update step.
For more details see: https://arxiv.org/pdf/1612.00563.pdf
-
neuralmonkey.trainers.self_critical_objective.
reinforce_score
(reward: tensorflow.python.framework.ops.Tensor, baseline: tensorflow.python.framework.ops.Tensor, decoded: tensorflow.python.framework.ops.Tensor, logits: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Cost function whose derivative is the REINFORCE equation.
This implements the primitive function to the central equation of the REINFORCE algorithm that estimates the gradients of the loss with respect to decoder logits.
It uses the fact that the second term of the product (the difference of the word distribution and the one-hot vector of the decoded word) is a derivative of the negative log-likelihood of the decoded word. The reward function and the baseline are, however, treated as constants, so they influence the derivative only multiplicatively.
-
neuralmonkey.trainers.self_critical_objective.
self_critical_objective
(decoder: neuralmonkey.decoders.decoder.Decoder, reward_function: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], weight: float = None) → neuralmonkey.trainers.generic_trainer.Objective¶ Self-critical objective.
Parameters: - decoder – A recurrent decoder.
- reward_function – A reward function computing score in Python.
- weight – Mixing weight for a trainer.
Returns: Objective object to be used in generic trainer.
-
neuralmonkey.trainers.self_critical_objective.
sentence_bleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute index-based sentence-level BLEU score.
Computes sentence-level BLEU on indices output by the decoder, i.e. whatever the decoder uses as a unit is used as a token in the BLEU computation, ignoring that the tokens may be sub-word units.
-
neuralmonkey.trainers.self_critical_objective.
sentence_gleu
(references: numpy.ndarray, hypotheses: numpy.ndarray) → numpy.ndarray¶ Compute index-based GLEU score.
GLEU score is a sentence-level metric used in Google’s Neural MT as a reward in reinforcement learning (https://arxiv.org/abs/1609.08144). It is a minimum of precision and recall on 1- to 4-grams.
It operates over the indices emitted by the decoder which are not necessarily tokens (could be characters or subword units).
Unit tests for the multitask trainer.
-
class
neuralmonkey.trainers.test_multitask_trainer.
TestMP
(name: str, reuse: Union[neuralmonkey.model.model_part.ModelPart, NoneType] = None, save_checkpoint: str = None, load_checkpoint: str = None, initializers: List[Tuple[str, Callable]] = None) → None¶ Bases:
neuralmonkey.model.model_part.ModelPart
-
loss
¶
-
-
class
neuralmonkey.trainers.test_multitask_trainer.
TestMultitaskTrainer
(methodName='runTest')¶ Bases:
unittest.case.TestCase
-
setUp
()¶ Hook method for setting up the test fixture before exercising it.
-
classmethod
setUpClass
()¶ Hook method for setting up class fixture before running tests in the class.
-
test_mt_trainer
()¶
-
API checking module.
This module serves as a library of API checks used as assertions during constructing the computational graph.
-
exception
neuralmonkey.checking.
CheckingException
¶ Bases:
Exception
-
neuralmonkey.checking.
assert_same_shape
(tensor_a: tensorflow.python.framework.ops.Tensor, tensor_b: tensorflow.python.framework.ops.Tensor) → None¶ Check if two tensors have the same shape.
-
neuralmonkey.checking.
assert_shape
(tensor: tensorflow.python.framework.ops.Tensor, expected_shape: List[Union[int, NoneType]]) → None¶ Check shape of a tensor.
Parameters: - tensor – Tensor to be checked.
- expected_shape – Expected shape where None means the same as in TF and -1 means not checking the dimension.
-
neuralmonkey.checking.
check_dataset_and_coders
(dataset: neuralmonkey.dataset.Dataset, runners: Iterable[neuralmonkey.runners.base_runner.BaseRunner]) → None¶
Implementation of the dataset class.
-
class
neuralmonkey.dataset.
BatchingScheme
(batch_size: int, batch_bucket_span: int = None, token_level_batching: bool = False, bucketing_ignore_series: List[str] = None, use_leftover_buckets: bool = True) → None¶ Bases:
object
-
__init__
(batch_size: int, batch_bucket_span: int = None, token_level_batching: bool = False, bucketing_ignore_series: List[str] = None, use_leftover_buckets: bool = True) → None¶ Construct the batching scheme.
-
batch_size
¶ Number of examples in one mini-batch.
-
batch_bucket_span
¶ The span of the bucket for bucketed batching.
-
token_level_batching
¶ Count the batch_size in individual tokens in the batch instead of in examples.
-
bucketing_ignore_series
¶ Series to ignore during bucketing.
-
use_leftover_buckets
¶ Whether to throw out bucket contents at the end of the epoch or to use them.
-
-
-
class
neuralmonkey.dataset.
Dataset
(name: str, iterators: Dict[str, Callable[[], Iterator]], outputs: Dict[str, Tuple[str, Callable[[str, Any], NoneType]]] = None, buffer_size: Tuple[int, int] = None, shuffled: bool = False) → None¶ Bases:
object
Buffered and batched dataset.
This class serves as collection of data series for particular encoders and decoders in the model.
Dataset has a number of data series, which are sequences of data (of any type) that should have the same length. The series are loaded into a buffer and can be read lazily.
Using the batches method, dataset yields batches, through which the data are accessed by the model.
-
__init__
(name: str, iterators: Dict[str, Callable[[], Iterator]], outputs: Dict[str, Tuple[str, Callable[[str, Any], NoneType]]] = None, buffer_size: Tuple[int, int] = None, shuffled: bool = False) → None¶ Construct a new instance of the dataset class.
Do not call this method from the configuration directly. Instead, use the from_files function of this module.
The dataset iterators are provided through factory functions, which return the opened iterators when called with no arguments.
Parameters: - name – The name for the dataset.
- iterators – A series-iterator generator mapping.
- lazy – If False, load the data from iterators to a list and store the list in memory.
- buffer_size – Use this tuple as a minimum and maximum buffer size for pre-loading data. This should be (a few times) larger than the batch size used for mini-batching. When the buffer size gets under the lower threshold, it is refilled with the new data and optionally reshuffled. If the buffer size is None, all data is loaded into memory.
- shuffled – Whether to shuffle the buffer during batching.
-
batches
(scheme: neuralmonkey.dataset.BatchingScheme) → Iterator[Dataset]¶ Split the dataset into batches.
Parameters: scheme – BatchingScheme configuration object. Returns: Generator yielding the batches.
-
get_series
(name: str) → Iterator¶ Get the data series with a given name.
Parameters: name – The name of the series to fetch. Returns: A freshly initialized iterator over the data series. Raises: KeyError if the series does not exist.
-
has_series
(name: str) → bool¶ Check if the dataset contains a series of a given name.
Parameters: name – Series name Returns: True if the dataset contains the series, False otherwise.
-
maybe_get_series
(name: str) → Union[Iterator, NoneType]¶ Get the data series with a given name, if it exists.
Parameters: name – The name of the series to fetch. Returns: The data series or None if it does not exist.
-
series
¶
-
subset
(start: int, length: int) → neuralmonkey.dataset.Dataset¶ Create a subset of the dataset.
The sub-dataset will inherit the laziness, buffer size and shuffling from the parent dataset.
Parameters: - start – Index of the first data instance in the dataset.
- length – Number of instances to include in the subset.
Returns: A subset Dataset object.
-
-
neuralmonkey.dataset.
from_files
(*args, **kwargs)¶
-
neuralmonkey.dataset.
load
(name: str, series: List[str], data: List[Union[str, List[str], Tuple[Union[str, List[str]], Callable[[List[str]], Any]], Tuple[Callable, str], Callable[[Dataset], Iterator[DataType]]]], outputs: List[Union[Tuple[str, str], Tuple[str, str, Callable[[str, Any], NoneType]]]] = None, buffer_size: int = None, shuffled: bool = False) → neuralmonkey.dataset.Dataset¶ Create a dataset using specification from the configuration.
The dataset provides iterators over data series. The dataset has a buffer, which pre-fetches a given number of the data series lazily. In case the dataset is not lazy (buffer size is None), the iterators are built on top of in-memory arrays. Otherwise, the iterators operate on the data sources directly.
Parameters: - name – The name of the dataset.
- series – A list of names of data series the dataset contains.
- data – The specification of the data sources for each series.
- outputs – A list of output specifications.
- buffer_size – The size of the buffer. If set, the dataset will be loaded lazily into the buffer (useful for large datasets). The buffer size specifies the number of sequences to pre-load. This is useful for pseudo-shuffling of large data on-the-fly. Ideally, this should be (much) larger than the batch size. Note that the buffer gets refilled each time its size is less than half the buffer_size. When refilling, the buffer gets refilled to the specified size.
- shuffled – Whether to shuffle the dataset buffer (done upon refill).
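A usage sketch; the file paths are illustrative and each series is assumed to be tokenized plain text readable by the default reader:
from neuralmonkey.dataset import BatchingScheme, load

train_data = load(
    name="train",
    series=["source", "target"],
    data=["data/train.src", "data/train.tgt"])

scheme = BatchingScheme(batch_size=32)
for batch in train_data.batches(scheme):
    # each batch is itself a Dataset restricted to the batched examples
    source_sentences = list(batch.get_series("source"))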
-
neuralmonkey.dataset.
load_dataset_from_files
(name: str, lazy: bool = False, preprocessors: List[Tuple[str, str, Callable]] = None, **kwargs) → neuralmonkey.dataset.Dataset¶ Load a dataset from the files specified by the provided arguments.
Paths to the data are provided in a form of dictionary.
Keyword Arguments: - name – The name of the dataset to use. If None (default), the name will be inferred from the file names.
- lazy – Boolean flag specifying whether to use lazy loading (useful for large files). Note that the lazy dataset cannot be shuffled. Defaults to False.
- preprocessor – A callable used for preprocessing of the input sentences.
- kwargs – Dataset keyword argument specs. These parameters should begin with ‘s_’ prefix and may end with ‘_out’ suffix. For example, a data series ‘source’ which specifies the source sentences should be initialized with the ‘s_source’ parameter, which specifies the path and optionally the reader of the source file. If runners generate data of the ‘target’ series, the output file should be initialized with the ‘s_target_out’ parameter. Series identifiers should not contain underscores. Dataset-level preprocessors are defined with ‘pre_’ prefix followed by a new series name. In case of the pre-processed series, a callable taking the dataset and returning a new series is expected as a value.
Returns: The newly created dataset.
Provides a high-level API for training and using a model.
-
class
neuralmonkey.experiment.
Experiment
(config_path: str, train_mode: bool = False, overwrite_output_dir: bool = False, config_changes: List[str] = None) → None¶ Bases:
object
-
__init__
(config_path: str, train_mode: bool = False, overwrite_output_dir: bool = False, config_changes: List[str] = None) → None¶ Initialize a Neural Monkey experiment.
Parameters: - config_path – The path to the experiment configuration file.
- train_mode – Indicates whether the model should be prepared for training.
- overwrite_output_dir – Indicates whether an existing experiment should be reused. If True, this overrides the setting in the configuration file.
- config_changes – A list of modifications that will be made to the loaded configuration file before parsing.
-
build_model
() → None¶ Build the configuration and the computational graph.
This function is invoked by all of the main entrypoints of the Experiment class (train, evaluate, run). It manages the building of the TensorFlow graph.
The building procedure is executed as follows:
1. Random seeds are set.
2. Configuration is built (instantiated) and normalized.
3. TODO(tf-data) tf.data.Dataset instance is created and registered in the model parts. (This is not implemented yet!)
4. Graph executors are “blessed”. This causes the rest of the TF graph to be built.
5. Sessions are initialized using the TF Manager object.
Raises: RuntimeError when the model is already built.
-
evaluate
(dataset: neuralmonkey.dataset.Dataset, write_out: bool = False, batch_size: int = None, log_progress: int = 0) → Dict[str, Any]¶ Run the model on a given dataset and evaluate the outputs.
Parameters: - dataset – The dataset on which the model will be executed.
- write_out – Flag whether the outputs should be printed to a file defined in the dataset object.
- batch_size – size of the minibatch
- log_progress – log progress every X seconds
Returns: Dictionary of evaluation names and their values which includes the metrics applied on respective series loss and loss values from the run.
-
classmethod
get_current
() → neuralmonkey.experiment.Experiment¶ Return the experiment that is currently being built.
-
get_initializer
(var_name: str, default: Callable = None) → Union[Callable, NoneType]¶ Return the initializer associated with the given variable name.
Calling the method marks the given initializer as used.
-
get_path
(filename: str, cont_index: int = None) → str¶ Return the path to the most recent version of the given file.
-
load_variables
(variable_files: List[str] = None) → None¶ Load variables from files.
Parameters: variable_files – A list of checkpoint file prefixes. A TF checkpoint is usually three files with a common prefix. This list should have the same number of files as there are sessions in the tf_manager object.
-
model
¶ Get configuration namespace of the experiment.
The Experiment stores the configuration recipe in self.config. When the configuration is built (meaning the classes referenced from the config file are instantiated), it is saved in the model property of the experiment.
Returns: The built namespace config object. Raises: RuntimeError when the configuration model has not been built.
-
run_model
(dataset: neuralmonkey.dataset.Dataset, write_out: bool = False, batch_size: int = None, log_progress: int = 0) → Tuple[List[neuralmonkey.runners.base_runner.ExecutionResult], Dict[str, List[Any]]]¶ Run the model on a given dataset.
Parameters: - dataset – The dataset on which the model will be executed.
- write_out – Flag whether the outputs should be printed to a file defined in the dataset object.
- batch_size – size of the minibatch
- log_progress – log progress every X seconds
Returns: A list of `ExecutionResult`s and a dictionary of the output series.
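A sketch of the typical Python workflow for a trained experiment; the configuration, checkpoint and data paths, as well as the "target" series name, are illustrative:
from neuralmonkey.dataset import load
from neuralmonkey.experiment import Experiment

exp = Experiment(config_path="experiments/translation.ini")
exp.build_model()
exp.load_variables(["experiments/translation/variables.data"])

test_data = load(name="test", series=["source"], data=["data/test.src"])
results, output_series = exp.run_model(test_data, write_out=False)
translations = output_series["target"]  # assuming a runner that produces the "target" series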
-
train
() → None¶ Train model specified by this experiment.
This function is one of the main functions (entrypoints) called on the experiment. It builds the model (if needed) and runs the training procedure.
Raises: RuntimeError when the experiment is not intended for training.
-
update_initializers
(initializers: Iterable[Tuple[str, Callable]]) → None¶ Update the dictionary mapping variable names to initializers.
-
-
neuralmonkey.experiment.
create_config
(train_mode: bool = True) → neuralmonkey.config.configuration.Configuration¶
-
neuralmonkey.experiment.
save_git_info
(git_commit_file: str, git_diff_file: str, branch: str = 'HEAD', repo_dir: str = None) → None¶
-
neuralmonkey.experiment.
visualize_embeddings
(sequences: List[neuralmonkey.model.sequence.EmbeddedFactorSequence], output_dir: str) → None¶
Collection of various functions and function wrappers.
-
neuralmonkey.functions.
inverse_sigmoid_decay
(param, rate, min_value: float = 0.0, max_value: float = 1.0, name: Union[str, NoneType] = None, dtype=tf.float32) → tensorflow.python.framework.ops.Tensor¶ Compute an inverse sigmoid decay: k/(k+exp(x/k)).
The result will be scaled to the range (min_value, max_value).
Parameters: - param – The parameter x from the formula.
- rate – Non-negative k from the formula.
-
neuralmonkey.functions.
noam_decay
(learning_rate: float, model_dimension: int, warmup_steps: int) → tensorflow.python.framework.ops.Tensor¶ Return decay function as defined in Vaswani et al., 2017, Equation 3.
https://arxiv.org/abs/1706.03762
lrate = (d_model)^-0.5 * min(step_num^-0.5, step_num * warmup_steps^-1.5)
Parameters: - model_dimension – Size of the hidden states of decoder and encoder
- warmup_steps – Number of warm-up steps
-
neuralmonkey.functions.
piecewise_function
(param, values, changepoints, name=None, dtype=tf.float32)¶ Compute a piecewise function.
Parameters: - param – The function parameter.
- values – List of function values (numbers or tensors).
- changepoints – Sorted list of points where the function changes from one value to the next. Must be one item shorter than values.
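For example, a stepwise learning-rate schedule can be expressed with piecewise_function; the values and the changepoint below are illustrative:
import tensorflow as tf

from neuralmonkey.functions import piecewise_function

step = tf.train.get_or_create_global_step()

# 1e-3 for the first 10,000 steps, 1e-4 afterwards
learning_rate = piecewise_function(
    param=step,
    values=[1.0e-3, 1.0e-4],
    changepoints=[10000])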
-
neuralmonkey.learning_utils.
evaluation
(evaluators, batch, runners, execution_results, result_data)¶ Evaluate the model outputs.
Parameters: - evaluators – List of tuples of series and evaluation functions.
- batch – Batch of data against which the evaluation is done.
- runners – List of runners (contains series ids and loss names).
- execution_results – Execution results that include the loss values.
- result_data – Dictionary from series names to list of outputs.
Returns: Dictionary of evaluation names and their values which includes the metrics applied on respective series loss and loss values from the run.
-
neuralmonkey.learning_utils.
print_final_evaluation
(name: str, eval_result: Dict[str, float]) → None¶ Print final evaluation from a test dataset.
-
neuralmonkey.learning_utils.
run_on_dataset
(tf_manager: neuralmonkey.tf_manager.TensorFlowManager, runners: List[neuralmonkey.runners.base_runner.BaseRunner], dataset: neuralmonkey.dataset.Dataset, postprocess: Union[List[Tuple[str, Callable]], NoneType], batching_scheme: neuralmonkey.dataset.BatchingScheme, write_out: bool = False, log_progress: int = 0) → Tuple[List[neuralmonkey.runners.base_runner.ExecutionResult], Dict[str, List[Any]]]¶ Apply the model on a dataset and optionally write outputs to files.
This function processes the dataset in batches and optionally prints out the execution progress.
Parameters: - tf_manager – TensorFlow manager with initialized sessions.
- runners – A list of runners to execute on the dataset.
- dataset – The dataset on which the model will be executed.
- evaluators – List of evaluators that are used for the model evaluation if the target data are provided.
- postprocess – Dataset-level postprocessors
- write_out – Flag whether the outputs should be printed to a file defined in the dataset object.
- batching_scheme – Scheme used for batching.
- log_progress – log progress every X seconds
- extra_fetches – Extra tensors to evaluate for each batch.
Returns: Tuple of resulting sentences/numpy arrays, and evaluation results, if available, as a dictionary mapping evaluation functions to values.
-
neuralmonkey.learning_utils.
training_loop
(tf_manager: neuralmonkey.tf_manager.TensorFlowManager, epochs: int, trainers: List[Union[neuralmonkey.trainers.generic_trainer.GenericTrainer, neuralmonkey.trainers.multitask_trainer.MultitaskTrainer]], batching_scheme: neuralmonkey.dataset.BatchingScheme, runners_batching_scheme: neuralmonkey.dataset.BatchingScheme, log_directory: str, evaluators: List[Union[Tuple[str, Any], Tuple[str, str, Any]]], main_metric: str, runners: List[neuralmonkey.runners.base_runner.BaseRunner], train_dataset: neuralmonkey.dataset.Dataset, val_datasets: List[neuralmonkey.dataset.Dataset], test_datasets: Union[List[neuralmonkey.dataset.Dataset], NoneType], log_timer: Callable[[int, float], bool], val_timer: Callable[[int, float], bool], val_preview_input_series: Union[List[str], NoneType], val_preview_output_series: Union[List[str], NoneType], val_preview_num_examples: int, postprocess: Union[List[Tuple[str, Callable]], NoneType], train_start_offset: int, initial_variables: Union[str, List[str], NoneType], final_variables: str) → None¶ Execute the training loop for given graph and data.
Parameters: - tf_manager – TensorFlowManager with initialized sessions.
- epochs – Number of epochs for which the algorithm will learn.
- trainer – The trainer object containg the TensorFlow code for computing the loss and optimization operation.
- batch_size – Number of examples in one mini-batch.
- batching_scheme – Batching scheme specification. Cannot be provided when batch_size is specified.
- log_directory – Directory where the TensorBoard log will be generated. If None, nothing will be done.
- evaluators – List of evaluators. The last evaluator is used as the main. An evaluator is a tuple of the name of the generated series, the name of the dataset series the generated one is evaluated with and the evaluation function. If only one series names is provided, it means the generated and dataset series have the same name.
- runners – List of runners for logging and evaluation runs
- train_dataset – Dataset used for training
- val_dataset – Dataset used for validation. Can be a Dataset or a list of datasets. The last dataset is used as the main one for storing best results. When using multiple datasets, it is recommended to name them for better TensorBoard visualization.
- test_datasets – List of datasets used for testing
- logging_period – after how many batches should the logging happen. It can also be defined as a time period in format like: 3s; 4m; 6h; 1d; 3m15s; 3seconds; 4minutes; 6hours; 1days
- validation_period – after how many batches should the validation happen. It can also be defined as a time period in same format as logging
- val_preview_input_series – which input series to preview in validation
- val_preview_output_series – which output series to preview in validation
- val_preview_num_examples – how many examples should be printed during validation
- train_start_offset – how many lines from the training dataset should be skipped. The training starts from the next batch.
- runners_batch_size – batch size of runners. Reuses the training batching scheme with bucketing turned off.
- initial_variables – variables used for initialization, for example for continuation of training. Provide it with a path to your model directory and its checkpoint file group common prefix, e.g. “variables.data”, or “variables.data.3” in case of multiple checkpoints per experiment.
- postprocess – A function which takes the dataset with its output series and generates additional series from them.
-
class
neuralmonkey.logging.
Logging
¶ Bases:
object
-
static
debug
(label: str = None)¶
-
debug_disabled_for
= ['']¶
-
static
debug_enabled
()¶
-
debug_enabled_for
= ['none']¶
-
static
log
(color: str = 'yellow') → None¶ Log a message with a colored timestamp.
-
log_file
= None¶
-
static
log_print
() → None¶ Print a string both to the console and to the log file if it is defined.
-
static
notice
() → None¶ Log a notice with a colored timestamp.
-
static
print_header
(path: str) → None¶ Print the title of the experiment and a set of arguments it uses.
-
static
set_log_file
() → None¶ Set up the file where the logging will be done.
-
strict_mode
= ''¶
-
static
warn
() → None¶ Log a warning.
-
-
neuralmonkey.logging.
debug
(message: str, label: str = None)¶
-
neuralmonkey.logging.
debug_enabled
(label: str = None)¶
-
neuralmonkey.logging.
log
(message: str, color: str = 'yellow') → None¶ Log a message with a colored timestamp.
-
neuralmonkey.logging.
log_print
(text: str) → None¶ Print a string both to the console and to the log file if it is defined.
-
neuralmonkey.logging.
notice
(message: str) → None¶ Log a notice with a colored timestamp.
-
neuralmonkey.logging.
warn
(message: str) → None¶ Log a warning.
-
neuralmonkey.run.
load_runtime_config
(config_path: str) → argparse.Namespace¶ Load a runtime configuration file.
-
neuralmonkey.run.
main
() → None¶
TensorFlow Manager.
TensorFlow manager is a helper object in Neural Monkey which manages TensorFlow sessions, execution of the computation graph, and saving and restoring of model variables.
-
class
neuralmonkey.tf_manager.
TensorFlowManager
(num_sessions: int, num_threads: int, save_n_best: int = 1, minimize_metric: bool = False, gpu_allow_growth: bool = True, per_process_gpu_memory_fraction: float = 1.0, enable_tf_debug: bool = False) → None¶ Bases:
object
Interface between the computational graph, data and TF sessions.
-
sessions
¶ List of active Tensorflow sessions.
-
__init__
(num_sessions: int, num_threads: int, save_n_best: int = 1, minimize_metric: bool = False, gpu_allow_growth: bool = True, per_process_gpu_memory_fraction: float = 1.0, enable_tf_debug: bool = False) → None¶ Initialize a TensorflowManager.
At this moment the graph must already exist. This method initializes required number of TensorFlow sessions and initializes them with provided variable files if they are provided.
Parameters: - num_sessions – Number of sessions to be initialized.
- num_threads – Number of threads sessions will run in.
- save_n_best – How many best models to keep
- minimize_metric – Whether the best model is the one with the lowest or the highest score
- gpu_allow_growth – TF to allocate incrementally, not all at once.
- per_process_gpu_memory_fraction – Limit TF memory use.
-
best_vars_file
¶
-
execute
(batch: neuralmonkey.dataset.Dataset, feedables: Set[neuralmonkey.model.feedable.Feedable], runners: Sequence[neuralmonkey.runners.base_runner.GraphExecutor], train: bool = False, compute_losses: bool = True, summaries: bool = True) → List[neuralmonkey.runners.base_runner.ExecutionResult]¶ Execute runners on a batch of data.
First, extract executables from the provided runners, telling the runners whether to compute also losses and summaries. Second, until all executables are satisfied (have the result attribute set), run the executables on the batch.
Parameters: - batch – A batch of data.
- execution_scripts – List of runners to execute.
- train – Training mode flag (this value is fed to the train_mode placeholders in model parts).
- compute_losses – Flag to runners whether run loss operations.
- summaries – Flag to runners whether to run summary operations.
Returns: A list of ExecutionResult tuples, one for each executable (runner).
-
init_saving
(vars_prefix: str) → None¶
-
initialize_model_parts
(runners: Sequence[neuralmonkey.runners.base_runner.GraphExecutor], save: bool = False) → None¶ Initialize model parts variables from their checkpoints.
-
initialize_sessions
() → None¶
-
restore
(variable_files: Union[str, List[str]]) → None¶
-
restore_best_vars
() → None¶
-
save
(variable_files: Union[str, List[str]]) → None¶
-
validation_hook
(score: float, epoch: int, batch: int) → None¶
-
-
neuralmonkey.tf_manager.
get_default_tf_manager
() → neuralmonkey.tf_manager.TensorFlowManager¶
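For illustration, a sketch of how a TensorFlowManager instance might be declared in an experiment configuration file (the values are illustrative, not recommended defaults):
[tf_manager]
class=tf_manager.TensorFlowManager
num_sessions=1
num_threads=4
save_n_best=3
minimize_metric=False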
A set of helper functions for TensorFlow.
neuralmonkey.tf_utils.append_tensor(tensor: tensorflow.python.framework.ops.Tensor, appendval: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶ Append an N-D Tensor to an (N+1)-D Tensor.
Parameters:
- tensor – The original Tensor.
- appendval – The Tensor to add.
Returns: An (N+1)-D Tensor with appendval on the last position.
neuralmonkey.tf_utils.gather_flat(x: tensorflow.python.framework.ops.Tensor, indices: tensorflow.python.framework.ops.Tensor, batch_size: Union[int, tensorflow.python.framework.ops.Tensor] = 1, beam_size: Union[int, tensorflow.python.framework.ops.Tensor] = 1) → tensorflow.python.framework.ops.Tensor¶ Gather values from the flattened (shape=[batch * beam, …]) input.
This function expects a flattened tensor whose first dimension has batch x beam elements. Using the given batch and beam size, it reshapes the input tensor to a tensor of shape (batch, beam, ...) and gathers the values from it using the index tensor.
Parameters:
- x – A flattened Tensor from which to gather values.
- indices – Index tensor.
- batch_size – The size of the batch.
- beam_size – The size of the beam.
Returns: The Tensor of gathered values.
neuralmonkey.tf_utils.get_initializer(var_name: str, default: Callable = None) → Union[Callable, NoneType]¶ Return the initializer associated with the given variable name.
The name of the current variable scope is prepended to the variable name.
This should only be called during model building.
neuralmonkey.tf_utils.get_shape_list(x: tensorflow.python.framework.ops.Tensor) → List[Union[int, tensorflow.python.framework.ops.Tensor]]¶ Return a list of dims, statically where possible.
Compute the static shape of a tensor. Where a dimension is not static (e.g. the batch or time dimension), a symbolic Tensor is returned instead.
Based on tensor2tensor.
Parameters: x – The Tensor to process.
Returns: A list of integers and Tensors.
neuralmonkey.tf_utils.get_state_shape_invariants(state: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.tensor_shape.TensorShape¶ Return the shape invariant of a tensor.
This function computes the loosened shape invariant of a state tensor. The only invariant dimension is the state size dimension, which is the last one.
Based on tensor2tensor.
Parameters: state – The state tensor.
Returns: A TensorShape object with all but the last dimension set to None.
neuralmonkey.tf_utils.get_variable(name: str, shape: List[int] = None, dtype: tensorflow.python.framework.dtypes.DType = None, initializer: Callable = None, **kwargs) → tensorflow.python.ops.variables.Variable¶ Get an existing variable with these parameters or create a new one.
This is a wrapper around tf.get_variable. The initializer parameter is treated as a default which can be overridden by a call to update_initializers.
This should only be called during model building.
neuralmonkey.tf_utils.layer_norm(x: tensorflow.python.framework.ops.Tensor, epsilon: float = 1e-06) → tensorflow.python.framework.ops.Tensor¶ Layer normalize the tensor x, averaging over the last dimension.
Implementation based on tensor2tensor.
Parameters:
- x – The Tensor to normalize.
- epsilon – The smoothing parameter of the normalization.
Returns: The normalized tensor.
neuralmonkey.tf_utils.partial_transpose(x: tensorflow.python.framework.ops.Tensor, indices: List[int]) → tensorflow.python.framework.ops.Tensor¶ Do a transpose on a subset of tensor dimensions.
Compute a permutation of the first k dimensions of a tensor.
Parameters:
- x – The Tensor to transpose.
- indices – The permutation of the first k dimensions of x.
Returns: The transposed tensor.
neuralmonkey.tf_utils.tf_print(tensor: tensorflow.python.framework.ops.Tensor, message: str = None, debug_label: str = None) → tensorflow.python.framework.ops.Tensor¶ Print the value of a tensor to the debug log.
Better than tf.Print, logs to the console only when the “tensorval” debug subject is turned on.
Idea found at: https://stackoverflow.com/a/39649614
Parameters: tensor – The tensor whose value to print.
Returns: As with tf.Print, this function returns a tensor identical to the input tensor, with the printing side-effect added.
neuralmonkey.tf_utils.update_initializers(initializers: Iterable[Tuple[str, Callable]]) → None¶
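For illustration, a minimal sketch of how get_shape_list and append_tensor from this module might be used in a TensorFlow 1.x-style graph (the shapes are illustrative):
import tensorflow as tf
from neuralmonkey.tf_utils import append_tensor, get_shape_list

# A batch-major tensor whose batch dimension is unknown at graph construction time.
states = tf.placeholder(tf.float32, shape=[None, 20, 512])
dims = get_shape_list(states)  # [<batch-size Tensor>, 20, 512]

# Append a 1-D step output to a 2-D history of previous step outputs.
history = tf.zeros([0, 512])
step_output = tf.ones([512])
history = append_tensor(history, step_output)  # shape (1, 512)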
Training script for sequence to sequence learning.
neuralmonkey.train.main() → None¶
Vocabulary class module.
This module implements the Vocabulary class and the helper functions that can be used to obtain a Vocabulary instance.
class neuralmonkey.vocabulary.Vocabulary(tokenized_text: List[str] = None, unk_sample_prob: float = 0.0) → None¶ Bases: collections.abc.Sized
__init__(tokenized_text: List[str] = None, unk_sample_prob: float = 0.0) → None¶ Create a new instance of a vocabulary.
Parameters: tokenized_text – The initial list of words to add.
add_characters(word: str) → None¶
add_tokenized_text(tokenized_text: List[str]) → None¶ Add words from a list to the vocabulary.
Parameters: tokenized_text – The list of words to add.
add_word(word: str, occurences: int = 1) → None¶ Add a word to the vocabulary.
Parameters:
- word – The word to add. If it is already there, increment the count.
- occurences – Increment the count of the word by this number of occurrences.
get_unk_sampled_word_index(word)¶ Return the index of the specified word with sampling of unknown words.
This method returns the index of the specified word in the vocabulary. If the frequency of the word in the vocabulary is 1 (the word was only seen once in the whole training dataset), then with a probability of self.unk_sample_prob the index of the unknown token is generated instead.
Parameters: word – The word to look up.
Returns: Index of the word, index of the unknown token if sampled, or index of the unknown token if the word is not present in the vocabulary.
get_word_index(word: str) → int¶ Return the index of the specified word.
Parameters: word – The word to look up.
Returns: Index of the word, or index of the unknown token if the word is not present in the vocabulary.
log_sample(size: int = 5) → None¶ Log a sample of the vocabulary.
Parameters: size – How many sample words to log.
save_wordlist(path: str, overwrite: bool = False, save_frequencies: bool = False, encoding: str = 'utf-8') → None¶ Save the vocabulary as a wordlist.
The file is ordered by the ids of words. This function is used mainly for embedding visualization.
Parameters:
- path – The path to save the file to.
- overwrite – Flag whether to overwrite an existing file. Defaults to False.
- save_frequencies – Flag whether frequencies should be stored. This parameter adds a header to the output file.
Raises: FileExistsError if the file exists and the overwrite flag is disabled.
sentences_to_tensor(sentences: List[List[str]], max_len: int = None, pad_to_max_len: bool = True, train_mode: bool = False, add_start_symbol: bool = False, add_end_symbol: bool = False) → Tuple[numpy.ndarray, numpy.ndarray]¶ Generate the tensor representation for the provided sentences.
Parameters:
- sentences – List of sentences as lists of tokens.
- max_len – If specified, all sentences will be truncated to this length.
- pad_to_max_len – If True, the tensor will be padded to max_len, even if all of the sentences are shorter. If False, the shape of the tensor will be determined by the maximum length of the sentences in the batch.
- train_mode – Flag whether we are training or not (enables/disables unk sampling).
- add_start_symbol – If True, the <s> token will be added to the beginning of each sentence vector. Enabling this option extends the maximum length by one.
- add_end_symbol – If True, the </s> token will be added to the end of each sentence vector, provided that the sentence is shorter than max_len. If not, the end token is not added. Unlike add_start_symbol, enabling this option does not alter the maximum length.
Returns: A tuple of a sentence tensor and a padding weight vector.
The shape of the tensor representing the sentences is either (batch_max_len, batch_size) or (batch_max_len+1, batch_size), depending on the value of the add_start_symbol argument. batch_max_len is the length of the longest sentence in the batch (including the optional </s> token), limited by max_len (if specified).
The shape of the padding vector is the same as of the sentence vector.
truncate(size: int) → None¶ Truncate the vocabulary to the requested size.
The infrequent tokens are discarded.
Parameters: size – The final size of the vocabulary.
truncate_by_min_freq(min_freq: int) → None¶ Truncate the vocabulary, keeping only words with a minimum frequency.
Parameters: min_freq – The minimum frequency of included words.
vectors_to_sentences(vectors: Union[List[numpy.ndarray], numpy.ndarray]) → List[List[str]]¶ Convert vectors of indices of vocabulary items to lists of words.
Parameters: vectors – List of vectors of vocabulary indices.
Returns: List of lists of words.
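For illustration, a minimal sketch of using the Vocabulary class directly from Python (the token lists are made up for the example):
from neuralmonkey.vocabulary import Vocabulary

vocab = Vocabulary()
vocab.add_tokenized_text(["the", "cat", "sat", "on", "the", "mat"])

# Convert a batch of tokenized sentences to index matrices and back.
sentences = [["the", "cat"], ["on", "the", "mat"]]
tensor, paddings = vocab.sentences_to_tensor(sentences, max_len=10, add_end_symbol=True)
decoded = vocab.vectors_to_sentences(tensor)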
neuralmonkey.vocabulary.from_dataset(datasets: List[neuralmonkey.dataset.Dataset], series_ids: List[str], max_size: int, save_file: str = None, overwrite: bool = False, min_freq: Union[int, NoneType] = None, unk_sample_prob: float = 0.5) → neuralmonkey.vocabulary.Vocabulary¶ Load a vocabulary from a dataset with an option to save it.
Parameters:
- datasets – A list of datasets from which to create the vocabulary.
- series_ids – A list of ids of series of the datasets that should be used for producing the vocabulary.
- max_size – The maximum size of the vocabulary.
- save_file – A file to save the vocabulary to. If None (default), the vocabulary will not be saved.
- overwrite – Overwrite an existing file.
- min_freq – Do not include words with a frequency smaller than this.
- unk_sample_prob – The probability with which to sample unks out of words with frequency 1. Defaults to 0.5.
Returns: The new Vocabulary instance.
neuralmonkey.vocabulary.from_file(*args, **kwargs) → neuralmonkey.vocabulary.Vocabulary¶
neuralmonkey.vocabulary.from_nematus_json(path: str, max_size: int = None, pad_to_max_size: bool = False) → neuralmonkey.vocabulary.Vocabulary¶ Load a vocabulary from the Nematus JSON format.
The JSON format is a flat dictionary that maps words to their index in the vocabulary.
Parameters:
- path – Path to the file.
- max_size – Maximum vocabulary size including the ‘unk’ and ‘eos’ symbols, but not including the <pad> and <s> symbols.
- pad_to_max_size – If specified, the vocabulary is padded with dummy symbols up to the specified maximum size.
neuralmonkey.vocabulary.from_t2t_vocabulary(path: str, encoding: str = 'utf-8') → neuralmonkey.vocabulary.Vocabulary¶ Load a vocabulary generated during tensor2tensor training.
Parameters:
- path – The path to the vocabulary file.
- encoding – The encoding of the vocabulary file (defaults to UTF-8).
Returns: The new Vocabulary instance.
neuralmonkey.vocabulary.from_wordlist(path: str, encoding: str = 'utf-8', contains_header: bool = True, contains_frequencies: bool = True) → neuralmonkey.vocabulary.Vocabulary¶ Load a vocabulary from a wordlist.
The file can either contain a plain list of words with no header, or it can contain words and their counts separated by a tab, with a header on the first line.
Parameters:
- path – The path to the wordlist file.
- encoding – The encoding of the wordlist file (defaults to UTF-8).
- contains_header – Whether the file has a header on the first line.
- contains_frequencies – Whether the file contains frequencies in the second column.
Returns: The new Vocabulary instance.
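For illustration, a sketch of how a vocabulary loaded with from_wordlist might be declared in an experiment configuration file (the path is a placeholder):
[vocabulary]
class=vocabulary.from_wordlist
path="data/vocabulary.tsv"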
neuralmonkey.vocabulary.initialize_vocabulary(directory: str, name: str, datasets: List[neuralmonkey.dataset.Dataset] = None, series_ids: List[str] = None, max_size: int = None) → neuralmonkey.vocabulary.Vocabulary¶ Initialize a vocabulary.
This function is supposed to initialize a vocabulary when called from the configuration file. It first checks whether the vocabulary can be loaded from the provided path and, if not, it tries to generate it from the provided datasets.
Parameters:
- directory – Directory where the vocabulary should be stored.
- name – Name of the vocabulary, which is also the name of the file it is stored in.
- datasets – A list of datasets from which the vocabulary can be created.
- series_ids – A list of ids of series of the datasets that should be used for producing the vocabulary.
- max_size – The maximum size of the vocabulary.
Returns: The new vocabulary.
neuralmonkey.vocabulary.is_special_token(word: str) → bool¶ Check whether a word is a special token (such as <pad> or <s>).
Parameters: word – The word to check.
Returns: True if the word is special, False otherwise.
The neuralmonkey package is the root package of this project.
Visualization¶
LogBook¶
Neural Monkey LogBook is a simple web application for previewing the outputs of experiments in the browser.
The experiment data are stored in a directory structure, where each experiment directory contains the experiment configuration, the state of the git repository the experiment was executed with, a detailed log of the computation, and other files necessary to execute the model that has been trained.
LogBook is meant as a complement to using TensorBoard, whose summaries are stored in the same directory structure.
How to run it¶
You can run the server using the following command:
bin/neuralmonkey-logbook --logdir=<experiments> --port=<port> --host=<host>
where <experiments> is the directory where the experiments are stored, <port> is the number of the port the server will run on, and <host> is the IP address of the host. The host defaults to 127.0.0.1; if you want the LogBook to be visible to other computers on the network, set the host to 0.0.0.0.
Then you can navigate in your browser to http://localhost:<port> to view the experiment logs.
TensorBoard¶
You can use TensorBoard (https://www.tensorflow.org/versions/r0.9/how_tos/summaries_and_tensorboard/index.html) to visualize your TensorFlow graph, see summaries of quantitative metrics about the execution of your graph, and show additional data, such as images, that pass through it.
You can start it with the following command:
tensorboard --logdir=<experiments>
Then navigate your browser to http://localhost:6006/ (or to a different port if TensorBoard assigns one) to view all the summaries of your experiment.
How to read TensorBoard¶
The step number shown in TensorBoard describes how many inputs (not batches) have been processed.
Attention Visualization¶
If you are using an attention decoder, visualization of the soft alignment of each sentence in the first validation batch will appear in the Images tab in TensorBoard. The images might look like this:

Here, the source sentence is on the vertical axis and the target sentence on
the horizontal axis. The size of each image is max_output_len * max_input_len
so most of the time, there will be some blank rows at the bottom and some trailing columns with “phantom” attention (corresponding to positions after the end of the output sentence).
You can use the tf_save_images.py
script to save the whole history of images as a sequence of PNG files:
# For the first sentence in the batch
scripts/tf_save_images.py events.out attention_0/image/0 --prefix images/attention_0_
Use feh
to view the images as a time-lapse:
feh -g 300x300 -Z --force-aliasing --slideshow-delay 0.2 images/attention_0_*.png
Or enlarge them and turn them into an animated GIF using:
convert images/attention_0_*.png -scale 300x300 images/attention_0.gif
Advanced Features¶
Byte Pair Encoding¶
This is explained in the machine translation tutorial.
Dropout¶
Neural networks with a large number of parameters have a serious problem with overfitting. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training, which prevents units from co-adapting too much. At test time, dropout is turned off. More information can be found in https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
If you want to enable dropout on an encoder or on the decoder, you can simply add dropout_keep_prob to the particular section:
[encoder]
class=encoders.recurrent.SentenceEncoder
dropout_keep_prob=0.8
...
or:
[decoder]
class=decoders.decoder.Decoder
dropout_keep_prob=0.8
...
Pervasive Dropout¶
Detailed information can be found in https://arxiv.org/abs/1512.05287
If you want to allow dropout on the recurrent layer of your encoder, add the use_pervasive_dropout parameter to its section; the dropout probability given by dropout_keep_prob will then also be applied to the recurrent connections:
[encoder]
class=encoders.recurrent.SentenceEncoder
dropout_keep_prob=0.8
use_pervasive_dropout=True
...
Attention Seeded by GIZA++ Word Alignments¶
todo: OC to reference the paper and describe how to use this in NM
Use SGE cluster array job for inference¶
To speed up the inference, the neuralmonkey-run binary provides the --grid option, which can be used when running the program as an SGE array job.
The run script makes use of the SGE_TASK_ID and SGE_TASK_STEPSIZE environment variables that are set in each computing node of the array job. If the --grid option is supplied and these variables are present, it runs the inference only on a subset of the dataset, specified by the variables.
Consider this example test_data.ini
:
[main]
test_datasets=[<dataset>]
variables=["path/to/variables.data"]
[dataset]
class=dataset.load_dataset_from_files
s_source="data/source.en"
s_target_out="out/target.de"
If we want to run a model configured in model.ini
on this dataset, we can
do:
neuralmonkey-run model.ini test_data.ini
And the program executes the model on the dataset loaded from
data/source.en
and stores the results in out/target.de
.
If the source file is large or if you use a slow inference method (such as beam search), you may want to split the source file into smaller parts and execute the model on all of them in parallel. If you have access to an SGE cluster, you don't have to do this manually - just create an array job and supply the --grid option to the program. Now, suppose that the source file contains 100,000 sentences and you want to split it into 100 parts and run it on the cluster. To accomplish this, just run:
qsub <qsub_options> -t 1-100000:1000 -b y \
"neuralmonkey-run --grid model.ini test_data.ini"
This will submit 100 jobs to your cluster. Each job will use its
SGE_TASK_ID
and SGE_TASK_STEPSIZE
parameters to determine its part of
the data to process. It then runs the inference only on the subset of the
dataset and stores the result in a suffixed file.
For example, if the SGE_TASK_ID
is 3, the SGE_TASK_STEPSIZE
is 100, and
the --grid
option is specified, the inference will be run on lines 201 to
300 of the file data/source.en
and the output will be written to
out/target.de.0000000200
.
After all the jobs are finished, you just need to manually run:
cat out/target.de.* > out/target.de
and delete the intermediate files. (Careful when your file has more than 10^10 lines - you need to concatenate the intermediate files in the right order!)
GPU Benchmarks¶
We have done some benchmarks at our department to find out the differences between GPUs, and we have decided to share them here. They therefore do not test the speed of Neural Monkey itself; rather, they compare different GPU cards with the same Neural Monkey setup.
The benchmark test consisted of one epoch of machine translation training in Neural Monkey on a fixed dataset. The size of the model fit nicely into 2 GB of memory, so GPUs with more memory could achieve better results with bigger models in comparison to CPUs. All GPUs used CUDA 8.0.
Setup (cc = CUDA capability) | Running time |
GeForce GTX 1080; cc6.1 | 9:55:58 |
GeForce GTX 1080; cc6.1 | 10:19:40 |
GeForce GTX 1080; cc6.1 | 12:34:34 |
GeForce GTX 1080; cc6.1 | 13:01:05 |
GeForce GTX Titan Z; cc3.5 | 16:05:24 |
Tesla K40c; cc3.5 | 22:41:01 |
Tesla K40c; cc3.5 | 22:43:10 |
Tesla K40c; cc3.5 | 24:19:45 |
16 cores Intel Xeon Sandy Bridge 2012 CPU | 46:33:14 |
16 cores Intel Xeon Sandy Bridge 2012 CPU | 52:36:56 |
Quadro K2000; cc3.0 | 59:47:58 |
8 cores Intel Xeon Sandy Bridge 2012 CPU | 60:39:17 |
GeForce GT 630; cc3.0 | 103:42:30 |
8 cores Intel Xeon Westmere 2010 CPU | 134:41:22 |
Development Guidelines¶
Github Workflow¶
This is a brief document about the Neural Monkey development workflow. Its primary aim is to describe the environment around the Github repository (e.g. continuous integration tests, documentation), pull requests, code-review, etc.
This document is written chronologically, from the point of view of a contributor.
Creating an issue¶
Every time there is a need to change the codebase, the contributor should create a corresponding issue in the Github repository.
The name of the issue should be comprehensive, and should summarize the issue in less than 10 words. In the issue description, all the relevant information should be mentioned, and, if applicable, a sketch of the solution should be given so the fashion and method of the solution can be subject to further discussion.
Labels¶
There are a number of label tags that provide an easier way to orient oneself among the issues. Here is an explanation of some of them, so that they are not used incorrectly (notably, there is a slight difference between “enhancement” and “feature”).
- bug: Use when there is something wrong in the current codebase that needs to be fixed. For example, “Random seeds are not working”
- documentation: Use when the main topic of the issue or pull request is to contribute to the documentation (be it a rst document or a request for more docstrings)
- tests: Similarly to documentation, use if the main topic of the issue is to write a test or to do changes to the testing process itself.
- feature: A request for implementing a feature regarding the training of the models or the models themselves, e.g. “Minimum risk training” or “Implementation of conditional GRU”.
- enhancement: A request for implementing a feature to Neural Monkey aimed at improving the user experience with the package, e.g. “GPU profiling” or “Logging of config building”.
- help wanted: Used as an additional label which specifies that solving the issue is suitable either for new contributors or for researchers who want to try out a feature that would otherwise be implemented only after a longer time.
- refactor: Refactor issues are requests for cleaning the codebase, using better ways to achieve the same results, conforming to a future API, etc. For example, “Rewrite decoder using decorators”
Todo
Replace text with label pictures from Github
Selecting an issue to work on and assigning people¶
Note
If you want to start working on something and don’t have a preference, check out the issues labeled “Help wanted”
When you decide to work on an issue, assign yourself to it and describe your plans on how you will proceed (in case there is no solution sketch provided in the issue description). This way, others may comment on your plans prior to the work, which can save a lot of time.
Please make sure that you put all additional information as a comment to the issue in case the issue has been discussed elsewhere.
Creating a branch¶
Prior to writing code (or at least before the first commit), you should create a
branch for solution of the issue. This command creates a new branch called
your_branch_name
and switches your working copy to that branch:
$ git checkout -b your_branch_name
Writing code¶
On the new branch, you can make changes and commit, until your solution is done.
It is worth noting that we are trying to keep our code clean by enforcing some code writing rules and guidelines. These are automatically checked by Travis CI on each push to the Github repository. Here is a list of tools used to check the quality of the code:
Todo
provide short description to the tools, check that markdownlint has correct URL
You can run the tests on your local machine by using the scripts (and requirements) from the tests/ directory of this package.
This is a usual mantra that you can use for committing and pushing to the remote branch in the repository:
$ git add .
$ git commit -m 'your commit message'
$ git push origin your_branch_name
Note
If you are working on a branch with someone else, it is always a good idea to do a git pull --rebase before pushing. This command updates your branch with the remote changes and applies your new commits on top of them.
Warning
If your commit message contains the string [ci skip]
the
continuous integration tests are not run. However, try not to use
this feature unless you know what you’re doing.
Creating a pull request¶
Whenever you want to add a feature or push a bugfix, you should make a new pull request, which can be reviewed and merged by someone else. The typical workflow should be as follows:
- Create a new branch, make your changes and push them to the repository.
- You should now see the new branch on the Github project page. When you open the branch page, click on “Create Pull request” button.
- When the pull request is created, the continuous integration tests are run on Travis. You can see the status of the test run on the pull request page. There is also a link to Travis, so you can inspect the results of the test run and make additional changes in order to make the tests pass, if needed. In addition to the code quality checks, unit and regression tests are run as well.
When you create a pull request, assign one or two people to do the review.
Code review and merging¶
Your pull requests should always be subject to code review. After you create the pull request, select one or two contributors and assign them to make a review.
This phase consists of a discussion about the introduced changes, suggestions, and other requirements made by the reviewers. Anyone who wants to do a review can contribute; the reviewer roles are not considered exclusive.
After all of the reviewers’ comments have been addressed and the reviewers approved the pull request, the pull request can be merged. It is usually a good idea to rebase the code to the recent version of master. Assuming your working copy is switched to the master branch, do:
$ git pull --rebase
$ git checkout your_branch_name
$ git rebase master
These commands first update your local copy of master from the remote
repository, then switch your working copy to the your_branch_name
branch,
and then rebases the branch on the updated master.
Rebasing is a process in which the commits from one branch (your_branch_name) are re-applied on top of a second branch (master), and the first branch is then set to point to the resulting HEAD.
Warning
Rebasing is a process which overwrites history. Therefore be absolutely sure that you know what are you doing. Usually if you work on a branch alone, rebasing is a safe procedure.
When the branch is rebased, you have to force-push it to the repository:
$ git push -f origin your_branch_name
This command overwrites your branch in the remote repository with your local branch (which is now rebased on master and therefore up to date).
Note
You can also use rebasing to update your branch to newer versions of master instead of merging master into the branch. Bear in mind, though, that you have to force-push these updates, so that no one keeps working on an outdated version of the branch.
Finally, one more round of tests is run and if everything is OK, you can click
the “Merge pull request” button, which executes the merge. You can also click
another button to delete the your_branch_name
branch from the repository
after the merge.
Documentation¶
Documentation related to GitHub is written in Markdown files; Python documentation
uses reStructuredText. This
concerns both the standalone documents (in /docs/
) and the docstrings in
source code.
Style of the Markdown files is automatically checked using Markdownlint.
Running tests¶
Every time a commit is pushed to the Github repository, the tests are run on Travis CI.
If you want to run the tests locally, install the required tools:
(nm)$ pip install --upgrade -r <(cat tests/*_requirements.txt)
Test scripts¶
Test scripts are located in the tests directory:
- tests_run.sh runs training with a small dataset and the small.ini configuration
- unit-tests_run.sh runs the unit tests
- lint_run.sh runs pylint
- mypy_run.sh runs mypy
All the scripts should be run from the main directory of the repository. There is also run_tests.sh in the main directory, which runs all of the tests above.