neuralmonkey.processors package
Submodules
neuralmonkey.processors.alignment module
class neuralmonkey.processors.alignment.WordAlignmentPreprocessor(source_len, target_len, dtype=<class 'numpy.float32'>, normalize=True, zero_based=True)
Bases: object
A preprocessor for word alignments in a text format.
One of the following formats is expected:
s1-t1 s2-t2 ...
s1:t1/w1 s2:t2/w2 ...
where each s and t is the index of a word in the source and target sentence, respectively, and w is the corresponding weight. If the weight is not given, it is assumed to be 1. The separators - and : are interchangeable.
The output of the preprocessor is an alignment matrix of the fixed shape (target_len, source_len) for each sentence.
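The two text formats and the output shape described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name parse_alignment is hypothetical, silently skipping out-of-range index pairs is an assumption, and the normalize option is left out because its exact behaviour is not described here.

```python
import re

import numpy as np


def parse_alignment(line, source_len, target_len,
                    zero_based=True, dtype=np.float32):
    """Parse 's-t' or 's:t/w' items into a (target_len, source_len) matrix.

    Hypothetical helper for illustration only.
    """
    matrix = np.zeros((target_len, source_len), dtype=dtype)
    offset = 0 if zero_based else 1

    for item in line.split():
        # The separators '-' and ':' are interchangeable; '/w' is optional.
        match = re.fullmatch(r"(\d+)[-:](\d+)(?:/(\S+))?", item)
        if match is None:
            raise ValueError("Invalid alignment item: {}".format(item))

        src = int(match.group(1)) - offset
        tgt = int(match.group(2)) - offset
        # A missing weight is taken to be 1.
        weight = float(match.group(3)) if match.group(3) else 1.0

        # Assumption: pairs outside the fixed shape are ignored.
        if 0 <= src < source_len and 0 <= tgt < target_len:
            matrix[tgt, src] = weight

    return matrix


alignment = parse_alignment("0-0 1:2/0.5", source_len=3, target_len=3)
```

Rows of the result index target words and columns index source words, so the item 1:2/0.5 ends up in row 2, column 1.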
neuralmonkey.processors.bpe module
class neuralmonkey.processors.bpe.BPEPostprocessor(separator: str = '@@') → None
Bases: object
decode(sentence: typing.List[str]) → typing.List[str]
class neuralmonkey.processors.bpe.BPEPreprocessor(merge_file: str, separator: str = '@@', encoding: str = 'utf-8') → None
Bases: object
Wrapper class for Byte-Pair Encoding.
Paper: https://arxiv.org/abs/1508.07909
Code: https://github.com/rsennrich/subword-nmt
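In the subword-nmt convention, every subword unit that does not end a word carries the separator (by default '@@') as a suffix. The sketch below shows how such a segmentation can be undone on a tokenized sentence. The function name undo_bpe is hypothetical, and the code illustrates the convention rather than reproducing the BPEPostprocessor source.

```python
from typing import List


def undo_bpe(tokens: List[str], separator: str = "@@") -> List[str]:
    """Merge subword units marked with a trailing separator into words.

    Hypothetical helper for illustration only.
    """
    words = []
    prefix = ""
    for token in tokens:
        if token.endswith(separator):
            # The word continues in the next token; drop the marker.
            prefix += token[:-len(separator)]
        else:
            words.append(prefix + token)
            prefix = ""
    if prefix:
        # Keep a dangling prefix if the sentence ends with a marked unit.
        words.append(prefix)
    return words


merged = undo_bpe(["the", "new@@", "est", "mod@@", "el"])
```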
neuralmonkey.processors.editops module
class neuralmonkey.processors.editops.Postprocess(source_id: str, edits_id: str, result_postprocess: typing.Callable[[typing.Iterable[typing.List[str]]], typing.Iterable[typing.List[str]]] = None) → None
Bases: object
Postprocessor applying edit operations on a series.
class neuralmonkey.processors.editops.Preprocess(source_id: str, target_id: str) → None
Bases: object
Preprocessor transforming two series into a series of edit operations.
neuralmonkey.processors.editops.convert_to_edits(source: typing.List[str], target: typing.List[str]) → typing.List[str]
neuralmonkey.processors.editops.reconstruct(source: typing.List[str], edits: typing.List[str]) → typing.List[str]
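The relationship between the two functions above can be sketched as a round trip: one direction turns a source/target pair into edit operations, the other applies the operations to the source to recover the target. The operation vocabulary used here (the markers <keep> and <delete>, with any other token meaning "insert this word"), the use of difflib, and the function names to_edits and apply_edits are all assumptions made for illustration; the module's actual encoding may differ.

```python
import difflib
from typing import List

KEEP = "<keep>"      # hypothetical marker: copy the next source word
DELETE = "<delete>"  # hypothetical marker: skip the next source word


def to_edits(source: List[str], target: List[str]) -> List[str]:
    """Encode target as edit operations over source (illustrative)."""
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            edits.extend([KEEP] * (i2 - i1))
        elif tag == "delete":
            edits.extend([DELETE] * (i2 - i1))
        elif tag == "insert":
            edits.extend(target[j1:j2])
        else:  # "replace": delete the source span, insert the target span
            edits.extend([DELETE] * (i2 - i1))
            edits.extend(target[j1:j2])
    return edits


def apply_edits(source: List[str], edits: List[str]) -> List[str]:
    """Apply edit operations to source (illustrative)."""
    result = []
    position = 0
    for op in edits:
        if op == KEEP:
            result.append(source[position])
            position += 1
        elif op == DELETE:
            position += 1
        else:
            # Any other token is inserted verbatim.
            result.append(op)
    return result


src = "the cat sat on mat".split()
tgt = "a cat sat on the mat".split()
ops = to_edits(src, tgt)
restored = apply_edits(src, ops)
```

In this sketch the round trip recovers the target as long as the target itself does not contain the marker tokens.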
neuralmonkey.processors.german module
class neuralmonkey.processors.german.GermanPostprocessor(compounding=True, contracting=True, pronouns=True)
Bases: object
decode(sentence)
class neuralmonkey.processors.german.GermanPreprocessor(compounding=True, contracting=True, pronouns=True)
Bases: object
neuralmonkey.processors.helpers module
neuralmonkey.processors.helpers.pipeline(processors: typing.List[typing.Callable]) → typing.Callable
Concatenate processors.
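Concatenating processors amounts to function composition: the returned callable feeds the output of each processor into the next one. A minimal sketch follows, assuming the processors are applied in list order (the order is not stated here); the name compose is hypothetical.

```python
from typing import Any, Callable, List


def compose(processors: List[Callable]) -> Callable:
    """Return a callable that applies the processors one after another.

    Hypothetical helper; assumes left-to-right application.
    """
    def process(data: Any) -> Any:
        for processor in processors:
            data = processor(data)
        return data

    return process


lowercase_and_split = compose([str.lower, str.split])
tokens = lowercase_and_split("Hello World")
```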
neuralmonkey.processors.helpers.postprocess_char_based(sentences: typing.List[typing.List[str]]) → typing.List[typing.List[str]]
neuralmonkey.processors.helpers.preprocess_char_based(sentence: typing.List[str]) → typing.List[str]
neuralmonkey.processors.helpers.untruecase(sentences: typing.List[typing.List[str]]) → typing.Generator[[typing.List[str], NoneType], NoneType]
neuralmonkey.processors.speech module
neuralmonkey.processors.speech.SpeechFeaturesPreprocessor(feature_type: str = 'mfcc', delta_order: int = 0, delta_window: int = 2, **kwargs) → typing.Callable
Calculate speech features.
First, the given type of features (e.g. MFCC) is computed using a window of length winlen and step winstep; for additional keyword arguments (specific to each feature type), see http://python-speech-features.readthedocs.io/. Then, delta features up to delta_order are added.
By default, 13 MFCCs per frame are computed. To add delta and delta-delta features (resulting in 39 coefficients per frame), set delta_order=2.
Parameters:
- feature_type – mfcc, fbank, logfbank or ssc (default is mfcc)
- delta_order – maximum order of the delta features (default is 0)
- delta_window – window size for delta features (default is 2)
- **kwargs – keyword arguments for the appropriate function from python_speech_features
Returns: A numpy array of shape [num_frames, num_features].
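The feature counts quoted above follow from stacking: each delta order appends one more block of the same width, so 13 base coefficients with delta_order=2 give 13 × 3 = 39 per frame. The sketch below illustrates this on a random array standing in for the MFCCs. It uses the common regression formula for deltas with edge padding, which is assumed here and may differ in detail from what the preprocessor does; the function names delta and add_deltas are hypothetical.

```python
import numpy as np


def delta(features: np.ndarray, window: int) -> np.ndarray:
    """Regression-based delta over +/- window frames (illustrative)."""
    num_frames = features.shape[0]
    denominator = 2 * sum(n * n for n in range(1, window + 1))
    # Repeat the first and last frame so every frame has full context.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    result = np.zeros(features.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n:window + n + num_frames]
        behind = padded[window - n:window - n + num_frames]
        result += n * (ahead - behind)
    return result / denominator


def add_deltas(features: np.ndarray, delta_order: int = 0,
               delta_window: int = 2) -> np.ndarray:
    """Append delta features up to delta_order along the feature axis."""
    blocks = [np.asarray(features, dtype=float)]
    for _ in range(delta_order):
        # Each order is the delta of the previous block.
        blocks.append(delta(blocks[-1], delta_window))
    return np.concatenate(blocks, axis=1)


# Stand-in for 50 frames of 13 MFCCs (random data, not real audio).
mfcc_like = np.random.rand(50, 13)
stacked = add_deltas(mfcc_like, delta_order=2, delta_window=2)
```

The result keeps the [num_frames, num_features] layout, with the feature axis growing by a factor of delta_order + 1.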