Dataset
Dataset and DataModule for past and future satellite data
- class cloudcasting.dataset.SatelliteDataModule(zarr_path: list[str] | str, history_mins: int, forecast_mins: int, sample_freq_mins: int, batch_size: int = 16, num_workers: int = 0, variables: list[str] | str | None = None, prefetch_factor: int | None = None, train_period: list[str | None] | tuple[str | None] | None = None, val_period: list[str | None] | tuple[str | None] | None = None, test_period: list[str | None] | tuple[str | None] | None = None, nan_to_num: bool = False, pin_memory: bool = False, persistent_workers: bool = False)
- __init__(zarr_path: list[str] | str, history_mins: int, forecast_mins: int, sample_freq_mins: int, batch_size: int = 16, num_workers: int = 0, variables: list[str] | str | None = None, prefetch_factor: int | None = None, train_period: list[str | None] | tuple[str | None] | None = None, val_period: list[str | None] | tuple[str | None] | None = None, test_period: list[str | None] | tuple[str | None] | None = None, nan_to_num: bool = False, pin_memory: bool = False, persistent_workers: bool = False)
A Lightning DataModule for loading past and future satellite data
- Parameters:
zarr_path (list[str] | str) – Path to the satellite data. Can be a string or list
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
batch_size (int) – Batch size. Defaults to 16.
num_workers (int) – Number of workers to use in multiprocess batch loading. Defaults to 0.
variables (list[str] | str) – The variables to load from the satellite data (defaults to all)
prefetch_factor (int) – Number of batches each worker loads in advance. Defaults to None.
train_period (list[str] | tuple[str] | None) – Date range filter for train dataloader
val_period (list[str] | tuple[str] | None) – Date range filter for validation dataloader
test_period (list[str] | tuple[str] | None) – Date range filter for test dataloader
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
pin_memory (bool) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. Defaults to False.
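The three timing parameters together determine how many frames make up each sample. The sketch below works through that arithmetic using only the standard library. It assumes the input window runs from `t0 - history_mins` up to and including `t0`, and that the target window starts one step after `t0` and ends at `t0 + forecast_mins`. Whether the library counts the `t0` frame this way is an assumption, and `window_times` is a hypothetical helper, not part of `cloudcasting`.

```python
from datetime import datetime, timedelta


def window_times(t0, history_mins, forecast_mins, sample_freq_mins):
    """Return (input_times, target_times) around the forecast init time t0.

    Assumption: inputs include t0 itself; targets start one step after t0.
    """
    step = timedelta(minutes=sample_freq_mins)
    n_history = history_mins // sample_freq_mins
    n_forecast = forecast_mins // sample_freq_mins
    # Oldest frame first, ending at t0
    input_times = [t0 - i * step for i in range(n_history, -1, -1)]
    # First frame after t0 up to t0 + forecast_mins
    target_times = [t0 + i * step for i in range(1, n_forecast + 1)]
    return input_times, target_times


t0 = datetime(2023, 1, 1, 12, 0)
inputs, targets = window_times(
    t0, history_mins=60, forecast_mins=180, sample_freq_mins=15
)
print(len(inputs), len(targets))
```

Under these assumptions, 60 minutes of history at a 15-minute sample frequency gives five input frames, and 180 minutes of forecast gives twelve target frames.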
- test_dataloader() DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]
Construct test dataloader
- class cloudcasting.dataset.SatelliteDataset(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)
- __init__(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)
A torch Dataset for loading past and future satellite data
- Parameters:
zarr_path (list[str] | str) – Path to the satellite data. Can be a string or list
start_time (str) – The satellite data is filtered to exclude timestamps before this
end_time (str) – The satellite data is filtered to exclude timestamps after this
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
variables (list[str] | str) – The variables to load from the satellite data (defaults to all)
preshuffle (bool) – Whether to shuffle the data - useful for validation. Defaults to False.
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
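The `start_time` and `end_time` filter can be pictured as an inclusive range check in which `None` leaves that side unbounded. The sketch below illustrates the idea with plain `datetime` objects. It assumes ISO-formatted date strings, and `filter_period` is a hypothetical helper rather than the library's implementation.

```python
from datetime import datetime


def filter_period(times, start_time=None, end_time=None):
    """Keep timestamps within [start_time, end_time]; None means unbounded."""
    start = datetime.fromisoformat(start_time) if start_time is not None else None
    end = datetime.fromisoformat(end_time) if end_time is not None else None
    return [
        t
        for t in times
        if (start is None or t >= start) and (end is None or t <= end)
    ]


# One timestamp per month of 2022
times = [datetime(2022, month, 1) for month in range(1, 13)]

# Open-ended start: everything up to the end of June
train_times = filter_period(times, None, "2022-06-30")
# Open-ended end: everything from July onwards
val_times = filter_period(times, "2022-07-01", None)
print(len(train_times), len(val_times))
```

The same pattern would apply to the `train_period`, `val_period`, and `test_period` arguments of `SatelliteDataModule`, assuming each is a `[start, end]` pair.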
- class cloudcasting.dataset.ValidationSatelliteDataset(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)
- __init__(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)
A torch Dataset used only in the validation procedure.
- Parameters:
zarr_path (list[str] | str) – Path to the satellite data for validation. Can be a string or list
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
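The effect of `nan_to_num=True` can be pictured as replacing every NaN pixel with -1 before the sample is returned. The sketch below uses nested lists and a hypothetical `replace_nans` helper to stay dependency-free. With NumPy arrays, `np.nan_to_num(array, nan=-1)` gives the same result in one call. How the library performs the replacement internally is not stated here.

```python
import math


def replace_nans(frame, fill_value=-1.0):
    """Return a copy of a 2D frame with NaN entries replaced by fill_value."""
    return [
        [fill_value if math.isnan(value) else value for value in row]
        for row in frame
    ]


# A tiny 2x2 "satellite frame" with two missing pixels
frame = [[0.2, float("nan")], [float("nan"), 0.9]]
cleaned = replace_nans(frame)
print(cleaned)
```

A sentinel of -1 is only unambiguous if valid values never take that value, for example when the data are scaled to a non-negative range. That is worth checking against the dataset in use.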