Dataset

Dataset and DataModule for past and future satellite data

class cloudcasting.dataset.SatelliteDataModule(zarr_path: list[str] | str, history_mins: int, forecast_mins: int, sample_freq_mins: int, batch_size: int = 16, num_workers: int = 0, variables: list[str] | str | None = None, prefetch_factor: int | None = None, train_period: list[str | None] | tuple[str | None] | None = None, val_period: list[str | None] | tuple[str | None] | None = None, test_period: list[str | None] | tuple[str | None] | None = None, nan_to_num: bool = False, pin_memory: bool = False, persistent_workers: bool = False)
__init__(zarr_path: list[str] | str, history_mins: int, forecast_mins: int, sample_freq_mins: int, batch_size: int = 16, num_workers: int = 0, variables: list[str] | str | None = None, prefetch_factor: int | None = None, train_period: list[str | None] | tuple[str | None] | None = None, val_period: list[str | None] | tuple[str | None] | None = None, test_period: list[str | None] | tuple[str | None] | None = None, nan_to_num: bool = False, pin_memory: bool = False, persistent_workers: bool = False)

A Lightning DataModule for loading past and future satellite data

Parameters:
  • zarr_path (list[str] | str) – Path to the satellite data. Can be a string or a list of strings

  • history_mins (int) – How many minutes of history will be used as input features

  • forecast_mins (int) – How many minutes of future will be used as target features

  • sample_freq_mins (int) – The sample frequency to use for the satellite data

  • batch_size (int) – Batch size. Defaults to 16.

  • num_workers (int) – Number of workers to use in multiprocess batch loading. Defaults to 0.

  • variables (list[str] | str) – The variables to load from the satellite data (defaults to all)

  • prefetch_factor (int) – Number of batches prefetched by each worker process

  • train_period (list[str] | tuple[str] | None) – Date range filter for train dataloader

  • val_period (list[str] | tuple[str] | None) – Date range filter for validation dataloader

  • test_period (list[str] | tuple[str] | None) – Date range filter for test dataloader

  • nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.

  • pin_memory (bool) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. Defaults to False.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. Defaults to False.
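
A minimal sketch of how the parameters above might be assembled. The Zarr path and the period dates are hypothetical placeholders, and treating each period as a `[start, end]` pair (with `None` leaving that side open) is an assumption inferred from the type hints, not something stated here. The helper `build_datamodule` is illustrative and not part of the package.

```python
from pathlib import Path

# Hypothetical path: replace with the location of your own satellite Zarr store.
ZARR_PATH = "path/to/satellite.zarr"

datamodule_kwargs = {
    "zarr_path": ZARR_PATH,
    "history_mins": 60,       # one hour of past frames as input
    "forecast_mins": 180,     # three hours of future frames as target
    "sample_freq_mins": 15,
    "batch_size": 16,
    "num_workers": 0,
    # Assumption: each period is a [start, end] pair; None leaves that side open.
    # The dates themselves are made up for illustration.
    "train_period": [None, "2021-12-31"],
    "val_period": ["2022-01-01", "2022-06-30"],
    "test_period": ["2022-07-01", None],
    "nan_to_num": True,
}


def build_datamodule(kwargs):
    """Return a SatelliteDataModule, or None if the data or package is missing."""
    if not Path(kwargs["zarr_path"]).exists():
        return None
    try:
        from cloudcasting.dataset import SatelliteDataModule
    except ImportError:
        return None
    return SatelliteDataModule(**kwargs)


datamodule = build_datamodule(datamodule_kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to log or serialise the configuration alongside a training run.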

test_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]

Construct test dataloader

train_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]

Construct train dataloader

val_dataloader() → DataLoader[tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float32]]]]

Construct validation dataloader

class cloudcasting.dataset.SatelliteDataset(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)
__init__(zarr_path: list[str] | str, start_time: str | None, end_time: str | None, history_mins: int, forecast_mins: int, sample_freq_mins: int, variables: list[str] | str | None = None, preshuffle: bool = False, nan_to_num: bool = False)

A torch Dataset for loading past and future satellite data

Parameters:
  • zarr_path (list[str] | str) – Path to the satellite data. Can be a string or a list of strings

  • start_time (str) – The satellite data is filtered to exclude timestamps before this

  • end_time (str) – The satellite data is filtered to exclude timestamps after this

  • history_mins (int) – How many minutes of history will be used as input features

  • forecast_mins (int) – How many minutes of future will be used as target features

  • sample_freq_mins (int) – The sample frequency to use for the satellite data

  • variables (list[str] | str) – The variables to load from the satellite data (defaults to all)

  • preshuffle (bool) – Whether to shuffle the data - useful for validation. Defaults to False.

  • nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
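
To illustrate how `history_mins`, `forecast_mins` and `sample_freq_mins` interact, the sketch below finds the reference times for which a complete window of frames exists. It is a conceptual illustration only: the function `valid_t0_times` is hypothetical, and the assumption that a sample needs every frame from `t0 - history_mins` to `t0 + forecast_mins` at the sampling frequency may differ from what the package actually does.

```python
from datetime import datetime, timedelta


def valid_t0_times(timestamps, history_mins, forecast_mins, sample_freq_mins):
    """Return the t0 times whose full history and forecast windows are present."""
    available = set(timestamps)
    step = timedelta(minutes=sample_freq_mins)
    n_back = history_mins // sample_freq_mins   # frames before t0
    n_fwd = forecast_mins // sample_freq_mins   # frames after t0
    valid = []
    for t0 in sorted(available):
        # Every frame in [t0 - history, t0 + forecast] must be available.
        needed = (t0 + k * step for k in range(-n_back, n_fwd + 1))
        if all(t in available for t in needed):
            valid.append(t0)
    return valid


start = datetime(2022, 1, 1, 0, 0)
# Twelve frames at 15-minute spacing: 00:00 through 02:45.
timestamps = [start + timedelta(minutes=15 * i) for i in range(12)]
# Simulate a gap in the archive at 01:30.
timestamps.remove(datetime(2022, 1, 1, 1, 30))

t0s = valid_t0_times(timestamps, history_mins=30, forecast_mins=30, sample_freq_mins=15)
```

With a single missing frame, every reference time whose window touches 01:30 is dropped, which is why gaps in the source data reduce the number of usable samples by more than one.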

class cloudcasting.dataset.ValidationSatelliteDataset(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)
__init__(zarr_path: list[str] | str, history_mins: int, forecast_mins: int = 180, sample_freq_mins: int = 15, nan_to_num: bool = False)

A torch Dataset used only in the validation procedure.

Parameters:
  • zarr_path (list[str] | str) – Path to the satellite data for validation. Can be a string or a list of strings

  • history_mins (int) – How many minutes of history will be used as input features

  • forecast_mins (int) – How many minutes of future will be used as target features

  • sample_freq_mins (int) – The sample frequency to use for the satellite data

  • nan_to_num (bool) – Whether to convert NaNs to -1. Defaults to False.
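
As a quick check on the defaults in the signature above, the sketch below converts a duration in minutes into a number of sampling steps. The helper `num_steps` is hypothetical, and whether the dataset returns exactly this many frames (for example, whether the history window also includes the frame at the reference time) is an assumption to verify against the package.

```python
def num_steps(duration_mins, sample_freq_mins):
    """Number of sampling intervals that fit exactly into a duration."""
    if sample_freq_mins <= 0:
        raise ValueError("sample_freq_mins must be positive")
    if duration_mins % sample_freq_mins != 0:
        raise ValueError("duration must be a multiple of the sample frequency")
    return duration_mins // sample_freq_mins


# Defaults from the ValidationSatelliteDataset signature.
forecast_steps = num_steps(duration_mins=180, sample_freq_mins=15)

# A hypothetical history length chosen for illustration.
history_steps = num_steps(duration_mins=60, sample_freq_mins=15)
```

Under the default settings this gives 12 forecast steps, so a model evaluated with this dataset would need to predict 12 future frames per sample.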