neuralmonkey.dataset.lazy_dataset module
Lazy dataset which does not load the whole data into memory.

class neuralmonkey.dataset.lazy_dataset.LazyDataset(name: str, series_paths_and_readers: Dict[str, Tuple[List[str], Callable[[List[str]], Any]]], series_outputs: Dict[str, str], preprocessors: List[Tuple[str, str, Callable]] = None) → None
Bases: neuralmonkey.dataset.dataset.Dataset
Implements the lazy dataset.
The main difference between this implementation and the default one is that the file contents are not fully loaded into memory. Instead, every time get_series is called, a new file handle is created and a generator that yields lines from the file is returned.
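The behaviour described above can be illustrated with a minimal sketch. It is not the Neural Monkey implementation: `read_lines_lazily` and `LazySeriesSketch` are hypothetical names, and the sketch assumes plain UTF-8 text files with one item per line.

```python
from typing import Iterator, List


def read_lines_lazily(paths: List[str]) -> Iterator[str]:
    """Yield stripped lines from each file, one file after another."""
    for path in paths:
        # The handle is opened only when iteration reaches this file
        # and is closed as soon as the file is exhausted.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


class LazySeriesSketch:
    """Hypothetical stand-in for a lazy series; not the Neural Monkey class."""

    def __init__(self, paths: List[str]) -> None:
        self._paths = list(paths)

    def get_series(self) -> Iterator[str]:
        # Every call builds a fresh generator, so the data can be
        # traversed repeatedly without ever being held in memory.
        return read_lines_lazily(self._paths)
```

Because each call returns an independent generator, two consumers can iterate over the same series without interfering with each other, at the cost of re-reading the file on every pass.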

__init__(name: str, series_paths_and_readers: Dict[str, Tuple[List[str], Callable[[List[str]], Any]]], series_outputs: Dict[str, str], preprocessors: List[Tuple[str, str, Callable]] = None) → None
Create a new instance of the lazy dataset.
The lazy dataset first initializes the parent Dataset object with empty series.
Parameters:
- name – The name of the dataset.
- series_paths_and_readers – Dictionary that maps each series ID to a list of files and a reader.
- series_outputs – A mapping of series IDs to output files.
- preprocessors – The preprocessors to apply to the data series. Each preprocessor is defined by a source series ID, the ID of the resulting preprocessed series, and a function that is applied to the source series.
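As a rough illustration of the argument shapes in the signature, the values might be built as follows. This is a sketch under assumptions: `plain_text_reader` and `lowercase` are hypothetical helpers, the file paths and series IDs are placeholders, and applying the preprocessor function to one item at a time is an assumption rather than something this page states.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple


def plain_text_reader(paths: List[str]) -> Iterator[List[str]]:
    """Hypothetical reader: yield each line as a list of whitespace tokens."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Hypothetical preprocessor function (assumed to receive one item)."""
    return [token.lower() for token in tokens]


# Series ID -> (list of files, reader). The path is a placeholder.
series_paths_and_readers: Dict[
    str, Tuple[List[str], Callable[[List[str]], Any]]
] = {
    "source": (["data/train.en"], plain_text_reader),
}

# Series ID -> output file. The path is a placeholder.
series_outputs: Dict[str, str] = {"target": "out/train.de"}

# (source series ID, resulting series ID, function)
preprocessors: List[Tuple[str, str, Callable]] = [
    ("source", "source_lc", lowercase),
]
```

Nothing is read when these structures are built; the reader only touches the files once something iterates over the generator it returns.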

add_lazy_series(name: str) → None

add_series(name: str, series: List) → None

get_series(name: str) → Iterable
Get the data series with a given name.
This function opens a new file handle and returns a generator which yields preprocessed lines from the file.
Parameters: name – The name of the series to fetch.
Returns: The data series.
Raises: KeyError if the series does not exist.

has_series(name: str) → bool
Check if the dataset contains a series of a given name.
Parameters: name – Series name.
Returns: True if the dataset contains the series, False otherwise.

maybe_get_series(name: str) → Union[Iterable, NoneType]
Get the data series with a given name or None if it does not exist.
This function opens a new file handle and returns a generator which yields preprocessed lines from the file.
Parameters: name – The name of the series to fetch.
Returns: The data series or None if it does not exist.
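The difference between get_series and maybe_get_series comes down to how a missing name is reported. The sketch below mirrors the three documented method names on a hypothetical `SeriesLookupSketch` class; it assumes each series is stored as a zero-argument factory that produces a fresh iterator, which is a simplification and not the actual implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


class SeriesLookupSketch:
    """Hypothetical container; each series is a zero-argument factory."""

    def __init__(self, factories: Dict[str, Callable[[], Iterator]]) -> None:
        self._factories = factories

    def has_series(self, name: str) -> bool:
        return name in self._factories

    def get_series(self, name: str) -> Iterable:
        # A missing name raises KeyError from the dictionary lookup.
        return self._factories[name]()

    def maybe_get_series(self, name: str) -> Optional[Iterable]:
        # Same lookup, but a missing name yields None instead of an error.
        if not self.has_series(name):
            return None
        return self.get_series(name)
```

Callers that treat a series as optional can test the result of maybe_get_series against None, while callers that require the series can rely on the KeyError from get_series.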

series_ids

shuffle() → None
Do nothing; shuffling is not possible because the data is not held in memory.
TODO: this is related to the __len__ method.

subset(start: int, length: int) → neuralmonkey.dataset.dataset.Dataset