Dataset

Dataset and DataModule for past and future satellite data

A lightning DataModule for loading past and future satellite data

Parameters:

zarr_path (list[str] | str) – Path to the satellite data. Can be a string or list
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
batch_size (int) – Batch size. Defaults to 16.
num_workers (int) – Number of workers to use in multiprocess batch loading. Defaults to 0.
variables (list[str] | str) – The variables to load from the satellite data (defaults to all)
prefetch_factor (int) – Number of batches loaded in advance by each worker process
train_period (list[str] | tuple[str] | None) – Date range filter for train dataloader
val_period (list[str] | tuple[str] | None) – Date range filter for validation dataloader
test_period (list[str] | tuple[str] | None) – Date range filter for test dataloader
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
pin_memory (bool) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. Defaults to False.
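
To illustrate how history_mins, forecast_mins and sample_freq_mins relate to the number of frames in a sample, here is a minimal sketch. The helper window_offsets is hypothetical and not part of cloudcasting. It assumes that inputs span from t0 - history_mins up to and including t0, that targets start one step after t0, and that both durations are multiples of the sample frequency. The library may treat the boundaries differently.

```python
def window_offsets(history_mins: int, forecast_mins: int, sample_freq_mins: int):
    """Return (input_offsets, target_offsets) in minutes relative to t0.

    Assumptions (not confirmed by the library docs): inputs cover
    [t0 - history_mins, t0], targets cover (t0, t0 + forecast_mins],
    and both are sampled every sample_freq_mins.
    """
    if sample_freq_mins <= 0:
        raise ValueError("sample_freq_mins must be positive")
    if history_mins < 0 or forecast_mins < 0:
        raise ValueError("history_mins and forecast_mins must be non-negative")
    if history_mins % sample_freq_mins or forecast_mins % sample_freq_mins:
        raise ValueError("durations must be multiples of sample_freq_mins")

    # Offsets at or before t0 form the input window
    input_offsets = list(range(-history_mins, 1, sample_freq_mins))
    # Offsets strictly after t0 form the target window
    target_offsets = list(range(sample_freq_mins, forecast_mins + 1, sample_freq_mins))
    return input_offsets, target_offsets


inputs, targets = window_offsets(history_mins=60, forecast_mins=180, sample_freq_mins=15)
print(len(inputs), len(targets))
```

Under these assumptions, one hour of history at a 15-minute frequency gives five input frames, and three hours of forecast gives twelve target frames.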

test_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]: Construct test dataloader

train_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]: Construct train dataloader

val_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]: Construct validation dataloader
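
The loader options above interact in PyTorch: prefetch_factor and persistent_workers are only valid when num_workers is greater than 0, and torch.utils.data.DataLoader raises a ValueError otherwise. The sketch below shows one way the keyword arguments could be assembled so the default num_workers=0 stays valid. build_loader_kwargs is a hypothetical helper, not the DataModule's actual implementation.

```python
def build_loader_kwargs(
    batch_size: int = 16,
    num_workers: int = 0,
    prefetch_factor=None,
    pin_memory: bool = False,
    persistent_workers: bool = False,
) -> dict:
    """Assemble keyword arguments for torch.utils.data.DataLoader (hypothetical helper)."""
    kwargs = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
    }
    if num_workers > 0:
        # These options only apply to multiprocess loading
        kwargs["persistent_workers"] = persistent_workers
        if prefetch_factor is not None:
            kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


single_process = build_loader_kwargs()
multi_process = build_loader_kwargs(num_workers=4, prefetch_factor=2, persistent_workers=True)
```

The resulting dictionary could then be unpacked into a DataLoader call, for example DataLoader(dataset, **multi_process).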

class cloudcasting.dataset.SatelliteDataset(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)

__init__(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)

A torch Dataset for loading past and future satellite data

Parameters:

zarr_path (list[str] | str) – Path to the satellite data. Can be a string or list
start_time (str | None) – The satellite data is filtered to exclude timestamps before this
end_time (str | None) – The satellite data is filtered to exclude timestamps after this
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
variables (list[str] | str | None) – The variables to load from the satellite data. Defaults to None, which loads all variables.
preshuffle (bool) – Whether to shuffle the data, which is useful for validation. Defaults to False.
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
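
The start_time and end_time filtering described above can be sketched with the standard library. The wording implies inclusive bounds: timestamps before start_time and after end_time are dropped. filter_period is a hypothetical helper, and treating None as "no bound on that side" is an assumption based on the signature, not confirmed behaviour.

```python
from datetime import datetime, timedelta


def filter_period(timestamps, start_time=None, end_time=None):
    """Keep timestamps within [start_time, end_time]; None leaves that side open (assumed)."""
    start = datetime.fromisoformat(start_time) if start_time is not None else None
    end = datetime.fromisoformat(end_time) if end_time is not None else None
    return [
        t
        for t in timestamps
        if (start is None or t >= start) and (end is None or t <= end)
    ]


# Eight timestamps at 15-minute intervals, from 00:00 to 01:45
timestamps = [datetime(2022, 1, 1) + timedelta(minutes=15 * i) for i in range(8)]

selected = filter_period(timestamps, "2022-01-01T00:30:00", "2022-01-01T01:00:00")
unfiltered = filter_period(timestamps)
```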

class cloudcasting.dataset.ValidationSatelliteDataset(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)

__init__(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)

A torch Dataset used only in the validation procedure.

Parameters:

zarr_path (list[str] | str) – Path to the satellite data for validation. Can be a string or list
history_mins (int) – How many minutes of history will be used as input features
forecast_mins (int) – How many minutes of future will be used as target features
sample_freq_mins (int) – The sample frequency to use for the satellite data
nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
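
The effect of nan_to_num can be shown with a small pure-Python sketch. fill_nans is a hypothetical helper operating on nested lists; the library works on arrays, where a call such as numpy.nan_to_num(arr, nan=-1) would be one plausible equivalent. Using -1 as a sentinel is only unambiguous if valid pixel values are non-negative, which is assumed here.

```python
import math


def fill_nans(frame, fill_value=-1.0):
    """Replace NaN entries in a 2D list of floats with fill_value."""
    return [[fill_value if math.isnan(v) else v for v in row] for row in frame]


# A 2x2 frame with two missing pixels
frame = [[0.1, float("nan")], [float("nan"), 0.9]]
filled = fill_nans(frame)
print(filled)
```

With nan_to_num left at its default of False, a model or loss function has to handle NaN values itself, for example by masking them.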