neuralmonkey.processors package

Submodules

neuralmonkey.processors.alignment module

class neuralmonkey.processors.alignment.WordAlignmentPreprocessor(source_len, target_len, dtype=<class 'numpy.float32'>, normalize=True, zero_based=True)

Bases: object

A preprocessor for word alignments in a text format.

One of the following formats is expected:

s1-t1 s2-t2 ...

s1:t1/w1 s2:t2/w2 ...

where each s and t is the index of a word in the source and target sentence, respectively, and w is the corresponding weight. If the weight is not given, it is assumed to be 1. The separators - and : are interchangeable.

The output of the preprocessor is an alignment matrix of the fixed shape (target_len, source_len) for each sentence.
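The following is a minimal sketch of how such an alignment line could be parsed into the fixed-shape matrix; `parse_alignment` is a hypothetical helper written for illustration, not part of the package API.

```python
def parse_alignment(line, source_len, target_len, normalize=True):
    """Parse one alignment line of "s-t" or "s:t/w" pairs into a
    (target_len, source_len) matrix of floats (hypothetical helper
    mirroring what WordAlignmentPreprocessor is documented to do)."""
    matrix = [[0.0] * source_len for _ in range(target_len)]
    for token in line.split():
        pair, _, weight = token.partition("/")
        # "-" and ":" are interchangeable separators
        s, t = pair.replace(":", "-").split("-")
        matrix[int(t)][int(s)] = float(weight) if weight else 1.0
    if normalize:
        # normalize each target row to sum to one (if non-empty)
        for row in matrix:
            total = sum(row)
            if total:
                row[:] = [w / total for w in row]
    return matrix
```

For example, `parse_alignment("0-0 1-1", 2, 2)` yields the 2x2 identity matrix.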

neuralmonkey.processors.bpe module

class neuralmonkey.processors.bpe.BPEPostprocessor(separator: str = '@@') → None

Bases: object

decode(sentence: typing.List[str]) → typing.List[str]
class neuralmonkey.processors.bpe.BPEPreprocessor(merge_file: str, separator: str = '@@', encoding: str = 'utf-8') → None

Bases: object

Wrapper class for Byte-Pair Encoding.

Paper: https://arxiv.org/abs/1508.07909
Code: https://github.com/rsennrich/subword-nmt
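The decoding direction can be sketched without the merge file: a token ending with the separator is glued to the token that follows it. This is an illustrative stand-in for `BPEPostprocessor.decode`, not the package's implementation.

```python
def bpe_decode(sentence, separator="@@"):
    """Join BPE subword units back into words: a token ending with the
    separator is merged with the following token (sketch of what
    BPEPostprocessor.decode does for the default "@@" separator)."""
    words = []
    buffer = ""
    for token in sentence:
        if token.endswith(separator):
            buffer += token[:-len(separator)]
        else:
            words.append(buffer + token)
            buffer = ""
    if buffer:  # dangling subword at the end of the sentence
        words.append(buffer)
    return words
```

For example, `bpe_decode(["un@@", "believ@@", "able", "news"])` gives `["unbelievable", "news"]`.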

neuralmonkey.processors.editops module

class neuralmonkey.processors.editops.Postprocess(source_id: str, edits_id: str, result_postprocess: typing.Callable[[typing.Iterable[typing.List[str]]], typing.Iterable[typing.List[str]]] = None) → None

Bases: object

Postprocessor applying edit operations on a series.

class neuralmonkey.processors.editops.Preprocess(source_id: str, target_id: str) → None

Bases: object

Preprocessor transforming two series into a series of edit operations.

neuralmonkey.processors.editops.convert_to_edits(source: typing.List[str], target: typing.List[str]) → typing.List[str]
neuralmonkey.processors.editops.reconstruct(source: typing.List[str], edits: typing.List[str]) → typing.List[str]
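The round trip between the two functions can be illustrated with a small sketch built on `difflib`: keep and delete operations consume source tokens, and any other token in the edit sequence is inserted literally. The operation tag names below are assumptions for illustration and may differ from those used by the package.

```python
import difflib

KEEP, DELETE = "<keep>", "<delete>"  # assumed tag names, for illustration

def convert_to_edits(source, target):
    """Encode target as edit operations over source: <keep> copies the
    next source token, <delete> skips it, anything else is inserted."""
    ops = []
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend([KEEP] * (i2 - i1))
        else:
            ops.extend([DELETE] * (i2 - i1))  # drop replaced/deleted tokens
            ops.extend(target[j1:j2])         # insert the new tokens
    return ops

def reconstruct(source, edits):
    """Invert convert_to_edits by replaying the edit sequence."""
    out, pos = [], 0
    for op in edits:
        if op == KEEP:
            out.append(source[pos])
            pos += 1
        elif op == DELETE:
            pos += 1
        else:
            out.append(op)
    return out
```

For any source/target pair, `reconstruct(source, convert_to_edits(source, target))` recovers the target.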

neuralmonkey.processors.german module

class neuralmonkey.processors.german.GermanPostprocessor(compounding=True, contracting=True, pronouns=True)

Bases: object

decode(sentence)
class neuralmonkey.processors.german.GermanPreprocessor(compounding=True, contracting=True, pronouns=True)

Bases: object

neuralmonkey.processors.helpers module

neuralmonkey.processors.helpers.pipeline(processors: typing.List[typing.Callable]) → typing.Callable

Concatenate processors.
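"Concatenate" here means left-to-right function composition: each processor's output feeds the next. A minimal sketch of the idea:

```python
def pipeline(processors):
    """Compose a list of callables left to right: the output of each
    processor becomes the input of the next (sketch of helpers.pipeline)."""
    def composed(data):
        for proc in processors:
            data = proc(data)
        return data
    return composed
```

For example, `pipeline([str.lower, str.split])("Hello World")` returns `["hello", "world"]`.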

neuralmonkey.processors.helpers.postprocess_char_based(sentences: typing.List[typing.List[str]]) → typing.List[typing.List[str]]
neuralmonkey.processors.helpers.preprocess_char_based(sentence: typing.List[str]) → typing.List[str]
neuralmonkey.processors.helpers.untruecase(sentences: typing.List[typing.List[str]]) → typing.Generator[typing.List[str], None, None]

neuralmonkey.processors.speech module

neuralmonkey.processors.speech.SpeechFeaturesPreprocessor(feature_type: str = 'mfcc', delta_order: int = 0, delta_window: int = 2, **kwargs) → typing.Callable

Calculate speech features.

First, the given type of features (e.g. MFCC) is computed using a window of length winlen and step winstep; for additional keyword arguments (specific to each feature type), see http://python-speech-features.readthedocs.io/. Then, delta features up to delta_order are added.

By default, 13 MFCCs per frame are computed. To add delta and delta-delta features (resulting in 39 coefficients per frame), set delta_order=2.

Parameters:
  • feature_type – mfcc, fbank, logfbank or ssc (default is mfcc)
  • delta_order – maximum order of the delta features (default is 0)
  • delta_window – window size for delta features (default is 2)
  • **kwargs – keyword arguments for the appropriate function from python_speech_features
Returns:

A numpy array of shape [num_frames, num_features].
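The delta step can be sketched in pure Python using the standard regression formula over a +/- delta_window frame window, with edge frames padded by repetition. This is an illustration of delta-feature computation under those assumptions, not the package's implementation; the base features (MFCC etc.) would come from python_speech_features.

```python
def add_deltas(features, delta_order=2, delta_window=2):
    """Append delta features up to delta_order to each frame.
    features: list of frames, each a list of float coefficients.
    Uses the regression formula
        delta[t] = sum_d d * (f[t+d] - f[t-d]) / (2 * sum_d d^2)
    with indices clamped at the edges (padding by repetition)."""
    num_frames = len(features)
    denom = 2 * sum(d * d for d in range(1, delta_window + 1))
    current = [list(f) for f in features]  # features to differentiate
    out = [list(f) for f in features]      # accumulated output frames
    for _ in range(delta_order):
        deltas = []
        for t in range(num_frames):
            frame = []
            for k in range(len(current[t])):
                num = sum(
                    d * (current[min(t + d, num_frames - 1)][k]
                         - current[max(t - d, 0)][k])
                    for d in range(1, delta_window + 1))
                frame.append(num / denom)
            deltas.append(frame)
        out = [o + d for o, d in zip(out, deltas)]
        current = deltas  # delta-delta differentiates the deltas
    return out
```

With 13 base coefficients and delta_order=2, each output frame has 39 coefficients, matching the description above.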

Module contents