neuralmonkey.dataset.helpers module
Helper functions for building datasets.

neuralmonkey.dataset.helpers.from_files(name: str, lazy: bool = False, preprocessors: List[Tuple[str, str, Callable]] = None, **kwargs) → neuralmonkey.dataset.dataset.Dataset
Load a dataset from the files specified by the provided arguments.
Paths to the data are provided in the form of a dictionary.
Keyword Arguments:
- name – The name of the dataset to use. If None (default), the name will be inferred from the file names.
- lazy – Boolean flag specifying whether to use lazy loading (useful for large files). Note that a lazy dataset cannot be shuffled. Defaults to False.
- preprocessors – Callables used for preprocessing the input sentences, given as a list of (str, str, Callable) tuples.
- kwargs – Dataset keyword argument specs. These parameters should begin with the ‘s_’ prefix and may end with the ‘_out’ suffix. For example, a data series ‘source’ that specifies the source sentences should be initialized with the ‘s_source’ parameter, which specifies the path and, optionally, the reader of the source file. If runners generate data of the ‘target’ series, the output file should be initialized with the ‘s_target_out’ parameter. Series identifiers should not contain underscores. Dataset-level preprocessors are defined with the ‘pre_’ prefix followed by a new series name. For a preprocessed series, the expected value is a callable that takes the dataset and returns a new series.
Returns: The newly created dataset.
Raises: Exception when no input files are provided.
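The naming convention for kwargs described above can be illustrated with a small, self-contained sketch. The helper `split_dataset_kwargs`, the file names, and the use of `ValueError` are assumptions made for illustration only; this is not Neural Monkey's own parsing code, just one way the ‘s_’, ‘_out’ and ‘pre_’ rules could be applied.

```python
from typing import Any, Dict, Tuple


def split_dataset_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Sort keyword arguments into input series, output files and preprocessors.

    Hypothetical helper: it only mirrors the naming convention from the
    documentation and is not part of the neuralmonkey package.
    """
    inputs: Dict[str, Any] = {}
    outputs: Dict[str, Any] = {}
    preprocessed: Dict[str, Any] = {}

    for key, value in kwargs.items():
        if key.startswith("s_") and key.endswith("_out"):
            # 's_target_out' -> output file for the 'target' series
            outputs[key[len("s_"):-len("_out")]] = value
        elif key.startswith("s_"):
            # 's_source' -> path (and optionally reader) of the 'source' series
            inputs[key[len("s_"):]] = value
        elif key.startswith("pre_"):
            # 'pre_chars' -> callable producing a new 'chars' series
            preprocessed[key[len("pre_"):]] = value
        else:
            raise ValueError("Unrecognized dataset argument: {}".format(key))

    if not inputs:
        # Assumption: the documented "Exception" is modelled as ValueError here.
        raise ValueError("No input files are provided.")
    return inputs, outputs, preprocessed


# Hypothetical file names, used only to show the key-to-series mapping.
inputs, outputs, preprocessed = split_dataset_kwargs(
    {"s_source": "train.en", "s_target_out": "out.de", "pre_chars": len}
)
```

Because series identifiers should not contain underscores, stripping the prefix and suffix is enough to recover the series name in this sketch.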

neuralmonkey.dataset.helpers.load_dataset_from_files(name: str, lazy: bool = False, preprocessors: List[Tuple[str, str, Callable]] = None, **kwargs) → neuralmonkey.dataset.dataset.Dataset
Load a dataset from the files specified by the provided arguments.
Paths to the data are provided in the form of a dictionary.
Keyword Arguments:
- name – The name of the dataset to use. If None (default), the name will be inferred from the file names.
- lazy – Boolean flag specifying whether to use lazy loading (useful for large files). Note that a lazy dataset cannot be shuffled. Defaults to False.
- preprocessors – Callables used for preprocessing the input sentences, given as a list of (str, str, Callable) tuples.
- kwargs – Dataset keyword argument specs. These parameters should begin with the ‘s_’ prefix and may end with the ‘_out’ suffix. For example, a data series ‘source’ that specifies the source sentences should be initialized with the ‘s_source’ parameter, which specifies the path and, optionally, the reader of the source file. If runners generate data of the ‘target’ series, the output file should be initialized with the ‘s_target_out’ parameter. Series identifiers should not contain underscores. Dataset-level preprocessors are defined with the ‘pre_’ prefix followed by a new series name. For a preprocessed series, the expected value is a callable that takes the dataset and returns a new series.
Returns: The newly created dataset.
Raises: Exception when no input files are provided.
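The note that a lazy dataset cannot be shuffled can be made concrete with a generic Python sketch. The function `tokenize_lazily` and the sample sentences are hypothetical, and the sketch assumes that lazy loading means streaming sentences one at a time; it does not reproduce how Neural Monkey implements the lazy flag.

```python
import random
from typing import Iterable, Iterator, List


def tokenize_lazily(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized sentence at a time without storing the rest.

    Hypothetical stand-in for a lazily loaded data series.
    """
    for line in lines:
        yield line.split()


sample_lines = ["a b", "c d e", "f"]

# Eager loading: all sentences are in memory, so they can be shuffled in place.
eager_sentences = list(tokenize_lazily(sample_lines))
random.shuffle(eager_sentences)

# Lazy loading: a generator has no length and no random access,
# so random.shuffle rejects it with a TypeError.
try:
    random.shuffle(tokenize_lazily(sample_lines))
    lazy_shuffle_failed = False
except TypeError:
    lazy_shuffle_failed = True
```

Under this assumption, the trade-off is memory against ordering: streaming keeps memory use low for large files, while shuffling needs every sentence to be available at once.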