yeoda package

Submodules

yeoda.datacube module

class yeoda.datacube.DataCube(raster_data)

Bases: object

Basic datacube class defining all selection and datacube operations.

add_dimension(name, values, inplace=False) → DataCube

Adds a new dimension to the datacube.

Parameters
  • name (str) – Name of the new dimension.

  • values (list) – Values along the new dimension (e.g., cloud cover, quality flag, …). The list must contain exactly one value per row of the file register.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

DataCube object with an additional dimension in the file register.

Return type

DataCube
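
The length requirement on values can be illustrated with a minimal sketch. It assumes the file register is modelled as a plain list of row dictionaries (yeoda itself keeps it in a DataFrame); the helper add_dimension_sketch and the column names are hypothetical and only illustrate the rule, not the library's implementation.

```python
import copy


def add_dimension_sketch(register, name, values):
    """Return a copy of `register` with a new column `name` (hypothetical helper)."""
    # One value per row is required, mirroring the documented constraint.
    if len(values) != len(register):
        raise ValueError("'values' needs one entry per row of the file register")
    new_register = copy.deepcopy(register)
    for row, value in zip(new_register, values):
        row[name] = value
    return new_register


register = [
    {"filepath": "a.tif", "time": "2020-01-01"},
    {"filepath": "b.tif", "time": "2020-01-02"},
]
extended = add_dimension_sketch(register, "cloud_cover", [10, 80])
```

Because the sketch copies the register first, it behaves like the inplace=False case: the original register is left untouched.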

align_dimension(other, name, inplace=False) → DataCube

Aligns this datacube with another datacube along the specified dimension name.

Parameters
  • other (DataCube) – Datacube to align with.

  • name (str) – Name of the dimension, which is used for aligning/filtering the values for all datacubes.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube with common values along the given dimension with respect to another datacube.

Return type

DataCube

apply_nan()

Converts no-data values of the internal data to np.nan. Note that this replacement implicitly converts the data type to float.

clear_ram()

Releases memory allocated by the internal data object.

clone() → DataCube

Clones, i.e. deep-copies, a datacube.

Returns

Cloned/copied datacube.

Return type

DataCube

close()

Closes open file handles.

property data_geom: RasterGeometry

Raster/tile geometry of the internal data.

property data_view: Dataset

View on internal raster data.

property dimensions: list

Dimensions of the datacube, i.e. the columns of the file register without the ‘filepath’ entry.

property file_register: DataFrame

File register of the datacube.

property filepaths: List[str]

Unique list of file paths stored in the file register. Note that this property does not preserve the order of the file paths in the file register.
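
The ordering caveat matters whenever file paths are paired with other columns by position. The following sketch uses plain Python and hypothetical paths to show the difference between an unordered and an order-preserving deduplication; it does not claim to reproduce how yeoda builds this property.

```python
paths = ["c.tif", "a.tif", "c.tif", "b.tif"]

# Set-based deduplication: unique entries, but no ordering guarantee.
unique_unordered = list(set(paths))

# Order-preserving alternative: dict keys keep insertion order (Python 3.7+).
unique_ordered = list(dict.fromkeys(paths))
```

If the register order is needed, it is safer to read the ‘filepath’ column of file_register directly.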

intersect(other, on_dimension=None, inplace=False) → DataCube

Intersects this datacube with another datacube. This is equivalent to an SQL INNER JOIN operation. In other words:

  • all uncommon columns and rows (if on_dimension is given) are removed

  • duplicates are removed

Parameters
  • other (DataCube) – Datacube to intersect with.

  • on_dimension (str, optional) – Dimension name to intersect on, meaning that only equal entries along this dimension will be retained.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Intersected datacube.

Return type

DataCube
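
The inner-join behaviour described above can be sketched with plain Python. The sketch assumes both file registers are non-empty lists of row dictionaries and that on_dimension, if given, is a column present in both; intersect_sketch and the sample columns are hypothetical and not the library's implementation.

```python
def intersect_sketch(rows_a, rows_b, on_dimension=None):
    """Inner-join-like intersection of two registers (lists of dicts)."""
    # Keep only the columns that both registers share.
    common_cols = [c for c in rows_a[0] if c in rows_b[0]]
    rows = [{c: r[c] for c in common_cols} for r in rows_a + rows_b]
    if on_dimension is not None:
        # Keep only rows whose value along the dimension occurs in both registers.
        shared = {r[on_dimension] for r in rows_a} & {r[on_dimension] for r in rows_b}
        rows = [r for r in rows if r[on_dimension] in shared]
    # Drop duplicate rows while keeping first-occurrence order.
    seen, result = set(), []
    for r in rows:
        key = tuple(r[c] for c in common_cols)
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result


a = [
    {"filepath": "a1.tif", "time": 1, "pol": "VV"},
    {"filepath": "a2.tif", "time": 2, "pol": "VV"},
]
b = [
    {"filepath": "b1.tif", "time": 2, "orbit": "A"},
    {"filepath": "b3.tif", "time": 3, "orbit": "A"},
]
result = intersect_sketch(a, b, on_dimension="time")
```

Here the uncommon columns ‘pol’ and ‘orbit’ disappear, and only the rows with the shared time value 2 remain.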

property is_empty: bool

Checks if the datacube is empty, i.e. does not contain any files.

property mosaic: MosaicGeometry

Mosaic geometry representing the spatial properties of the datacube.

property n_tiles: int

Number of tiles.

rename_dimensions(dimensions_map, inplace=False) → DataCube

Renames the dimensions of the datacube.

Parameters
  • dimensions_map (dict) – A dictionary representing the relation between old and new dimension names. The keys are the old dimension names, the values the new dimension names (e.g., {‘time_begin’: ‘time’}).

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

DataCube object with renamed dimensions/columns of the file register.

Return type

DataCube

select_bbox(bbox, sref=None, inplace=False) → DataCube

Selects a region from the datacube according to the given bounding box. The region is rectangular if the bounding box is provided in native units.

Parameters
  • bbox (list of 2 2-tuple) – Bounding box to select, i.e. [(x_min, y_min), (x_max, y_max)].

  • sref (geospade.crs.SpatialRef, optional) – CRS of the given bounding box coordinates. Defaults to the CRS of the mosaic.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube object with a file register and a mosaic only consisting of the intersected tiles.

Return type

DataCube
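
To make the bbox format and the tile selection concrete, here is a minimal sketch. It assumes axis-aligned tile extents given as (x_min, y_min, x_max, y_max) in the same CRS as the bounding box, so no reprojection is involved; the tile names, extents and the helper bbox_intersects are hypothetical.

```python
def bbox_intersects(bbox, extent):
    """True if `bbox` = [(x_min, y_min), (x_max, y_max)] overlaps `extent`."""
    (x_min, y_min), (x_max, y_max) = bbox
    ex_min, ey_min, ex_max, ey_max = extent
    return x_min < ex_max and x_max > ex_min and y_min < ey_max and y_max > ey_min


# Hypothetical tile extents (x_min, y_min, x_max, y_max) in the mosaic CRS.
tiles = {
    "T1": (0, 0, 100, 100),
    "T2": (100, 0, 200, 100),
    "T3": (0, 100, 100, 200),
}
bbox = [(50, 20), (150, 80)]
selected = [name for name, extent in tiles.items() if bbox_intersects(bbox, extent)]
```

Only the tiles touched by the bounding box (T1 and T2 in this example) would remain in the mosaic and the file register.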

select_by_dimension(expressions, name=None, inplace=False) → DataCube

Filters the datacube according to the given expressions and returns a (new) datacube.

Parameters
  • expressions (callable) –

    A list of functions, each expecting one input argument, which will later be replaced with the respective column of the file register. Each function returns a boolean value for every entry in the file register, which decides whether the entry is selected or not. Two examples are given below:

    • datacube.select_by_dimension(lambda s: s == "X", name='dim', inplace=True)

    • datacube.select_by_dimension(lambda t: (t >= start_time) & (t <= end_time), name='time', inplace=True)

  • name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Subset of the original datacube.

Return type

DataCube

Notes

The results of the expressions are concatenated via an OR operation.
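
The OR concatenation can be sketched as follows. As a simplification, the sketch applies each expression to single values of a plain list, whereas yeoda passes the whole column of the file register to the expression; select_by_expressions and the sample values are hypothetical.

```python
def select_by_expressions(values, expressions):
    """Return indices of `values` matched by at least one expression."""
    masks = [[bool(expr(v)) for v in values] for expr in expressions]
    # OR-combine the per-expression masks entry by entry.
    combined = [any(flags) for flags in zip(*masks)]
    return [i for i, keep in enumerate(combined) if keep]


polarisations = ["VV", "VH", "HH", "VV"]
expressions = [lambda s: s == "VV", lambda s: s == "HH"]
selected = select_by_expressions(polarisations, expressions)
```

An entry is kept as soon as one expression accepts it, so adding expressions can only widen the selection, never narrow it.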

select_files_with_pattern(pattern, full_path=False, inplace=False) → DataCube

Filters all file paths according to the given pattern.

Parameters
  • pattern (str) – A regular expression (e.g., “.*S1A.*GRD.*”).

  • full_path (boolean, optional) – Uses the full file paths for filtering if it is set to True. Otherwise, the file name is used (default value is False).

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

DataCube object with a filtered file register according to the given pattern.

Return type

DataCube
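
The effect of full_path can be illustrated with the standard re module. This sketch assumes that the pattern is matched from the start of the string (as re.match does); the actual matching function used by yeoda is not stated here, and filter_paths and the sample paths are hypothetical.

```python
import os
import re


def filter_paths(filepaths, pattern, full_path=False):
    """Keep paths whose file name (or full path) matches the regular expression."""
    regex = re.compile(pattern)
    selected = []
    for path in filepaths:
        target = path if full_path else os.path.basename(path)
        # Assumption: matching is anchored at the start, as with re.match.
        if regex.match(target):
            selected.append(path)
    return selected


filepaths = [
    "/data/S1A/D20200101_SIG0_GRD.tif",
    "/data/S1B/D20200102_SIG0_GRD.tif",
    "/data/S1A/S1A_D20200103_GRD.tif",
]
by_name = filter_paths(filepaths, ".*S1A.*GRD.*")
by_path = filter_paths(filepaths, ".*S1A.*GRD.*", full_path=True)
```

With full_path=True, the directory name "S1A" is enough for the first file to match, while the file-name-only search keeps just the third file.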

select_polygon(polygon, sref=None, apply_mask=True, inplace=False) → DataCube

Selects a region delineated by the given polygon from the datacube.

Parameters
  • polygon (ogr.Geometry) – Polygon specifying the pixels to collect.

  • sref (geospade.crs.SpatialRef, optional) – CRS of the given polygon. Defaults to the CRS of the mosaic.

  • apply_mask (bool, optional) – True if pixels outside the polygon should be set to a no data value (default). False if every pixel within the bounding box of the polygon should be included.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube object with a file register and a mosaic only consisting of the intersected tiles.

Return type

DataCube

select_px_window(row, col, height=1, width=1, inplace=False) → DataCube

Selects a rectangular region corresponding to the given pixel window from the datacube.

Parameters
  • row (int) – Top-left row number of the pixel window anchor.

  • col (int) – Top-left column number of the pixel window anchor.

  • height (int, optional) – Number of rows/height of the pixel window. Defaults to 1.

  • width (int, optional) – Number of columns/width of the pixel window. Defaults to 1.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube with data and a mosaic geometry consisting only of the tile intersected by the pixel window.

Return type

DataCube

Notes

The mosaic will only be sliced if it consists of one tile, which prevents ambiguities in the definition of the pixel window.
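
The meaning of the anchor and the window size can be sketched on a small nested list standing in for a single tile. The helper px_window and the sample raster are hypothetical; the sketch only illustrates the indexing convention (top-left anchor, height in rows, width in columns).

```python
def px_window(array, row, col, height=1, width=1):
    """Cut a (height x width) window whose top-left pixel is (row, col)."""
    return [line[col:col + width] for line in array[row:row + height]]


# 4 x 5 raster with values encoding (row, col) as 10 * row + col.
raster = [[10 * r + c for c in range(5)] for r in range(4)]
window = px_window(raster, row=1, col=2, height=2, width=3)
single = px_window(raster, row=3, col=0)
```

With the default height and width of 1, the window degenerates to a single pixel.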

select_tiles(tile_names, inplace=False) → DataCube

Selects the given tiles from the datacube.

Parameters
  • tile_names (list of str) – Tile names/IDs.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube with a mosaic and a file register only consisting of the given tiles.

Return type

DataCube

select_xy(x, y, sref=None, inplace=False) → DataCube

Selects a pixel from the datacube according to the given coordinate tuple.

Parameters
  • x (number) – Coordinate in X direction.

  • y (number) – Coordinate in Y direction.

  • sref (geospade.crs.SpatialRef, optional) – CRS of the given coordinate tuple. Defaults to the CRS of the mosaic.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Datacube object with a file register and a mosaic only consisting of the intersected tile containing information on the location of the single-pixel time series.

Return type

DataCube

sort_by_dimension(name, ascending=True, inplace=False) → DataCube

Sorts the datacube/file register according to the given dimension.

Parameters
  • name (str) – Name of the dimension.

  • ascending (bool, optional) – If true (default), sorts in ascending order, otherwise in descending order.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

Sorted datacube.

Return type

DataCube

split_by_dimension(expressions, name=None) → List[DataCube]

Creates a subset, i.e. a new datacube, from the original datacube for each expression.

Parameters
  • expressions (callable) –

    A list of functions, each expecting one input argument, which will later be replaced with the respective column of the file register. Each function returns a boolean value for every entry in the file register, which decides whether the entry is selected or not. Two examples are given below:

    • datacube.select_by_dimension(lambda s: s == "X", name='dim', inplace=True)

    • datacube.select_by_dimension(lambda t: (t >= start_time) & (t <= end_time), name='time', inplace=True)

  • name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.

Returns

datacubes – A list of datacubes corresponding to each expression.

Return type

list

split_by_temporal_freq(time_freq, name=None) → List[DataCube]

Temporally splits the original datacube according to a given frequency string.

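Parameters
  • time_freq (str) – Frequency string defining the temporal intervals to split by.

  • name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.
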
Returns

datacubes – A list of datacubes corresponding to the given temporal frequency intervals.

Return type

list

Notes

Empty datacubes are discarded.
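
The idea of a temporal split with discarded empty intervals can be sketched with the standard library. The sketch hard-codes a monthly frequency instead of parsing a frequency string, and works on plain dates rather than on a file register; split_monthly and the sample dates are hypothetical.

```python
from collections import OrderedDict
from datetime import date


def split_monthly(timestamps):
    """Group timestamps into per-month lists; months without entries never appear."""
    groups = OrderedDict()
    for ts in sorted(timestamps):
        groups.setdefault((ts.year, ts.month), []).append(ts)
    return list(groups.values())


timestamps = [date(2020, 1, 5), date(2020, 3, 1), date(2020, 1, 20), date(2020, 3, 15)]
subsets = split_monthly(timestamps)
```

February has no entries in this example, so the result contains two subsets rather than three.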

unite(other, inplace=False) → DataCube

Unites this datacube with another datacube. This is equivalent to an SQL UNION operation. In other words:

  • all columns are put into one DataFrame

  • duplicates are removed

  • gaps are filled with NaN

Parameters
  • other (DataCube) – Datacube to unite with.

  • inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.

Returns

United datacube.

Return type

DataCube
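
The three rules listed above can be sketched with plain Python. The sketch models the file registers as lists of row dictionaries instead of DataFrames; unite_sketch and the sample columns are hypothetical and not the library's implementation.

```python
import math


def unite_sketch(rows_a, rows_b):
    """Union of two registers (lists of dicts); missing entries become NaN."""
    # Collect all columns, keeping first-occurrence order.
    columns = list(dict.fromkeys(c for r in rows_a + rows_b for c in r))
    seen, result = set(), []
    for r in rows_a + rows_b:
        full = {c: r.get(c, math.nan) for c in columns}
        key = tuple(full[c] for c in columns)
        # math.nan is a single object, so identical rows hash and compare equal.
        if key not in seen:
            seen.add(key)
            result.append(full)
    return result


a = [{"filepath": "a.tif", "time": 1}, {"filepath": "b.tif", "time": 2}]
b = [{"filepath": "b.tif", "time": 2}, {"filepath": "c.tif", "orbit": "A"}]
united = unite_sketch(a, b)
```

The duplicated "b.tif" row appears once in the result, and the ‘orbit’ and ‘time’ gaps are filled with NaN.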

class yeoda.datacube.DataCubeReader(file_register, mosaic, stack_dimension='layer_id', tile_dimension='tile_id', file_class=None, file_class_kwargs=None, **kwargs)

Bases: DataCube

Datacube reader class inheriting from DataCube.

classmethod from_filepaths(filepaths, fn_class=<class 'geopathfinder.file_naming.SmartFilename'>, fields_def=None, fn_kwargs=None, mosaic=None, tile_class=<class 'geospade.raster.Tile'>, sref=None, file_class=None, file_class_kwargs=None, dimensions=None, tile_dimension='tile', stack_dimension='time', use_metadata=False, md_decoder=None, n_cores=1, **kwargs) → DataCubeReader

Creates a DataCubeReader instance from a list of file paths.

Parameters
  • filepaths (list of str) – List of file paths to ingest into the datacube.

  • fn_class (SmartFilename, optional) – Filename class used to interpret the file name. Defaults to SmartFilename.

  • fields_def (dict, optional) – Dictionary defining the elements of a specific file name. For further details take a look at geopathfinder’s SmartFilename class. This argument can be used if fn_class is None.

  • fn_kwargs (dict, optional) – Keyword arguments for fn_class.

  • mosaic (geospade.raster.MosaicGeometry, optional) – Mosaic representing the spatial allocation of the given files. The tile_dimension part of the file name must match the tile IDs/names of the mosaic. By default, a mosaic is automatically retrieved from the spatial extent of the files.

  • tile_class (geospade.raster.Tile, optional) – Tile class used for creating a default mosaic, if mosaic is not provided. Defaults to Tile.

  • sref (geospade.crs.SpatialRef, optional) – CRS of the given files. Defaults to the CRS of the mosaic.

  • file_class (class, optional) – Class used to open a reference file for retrieving basic information. Defaults to none, meaning that the datacube uses the default classes assigned to each file extension/data format.

  • file_class_kwargs (dict, optional) – Keyword arguments for file_class.

  • dimensions (list, optional) – Desired dimensions of the datacube in compliance with the chosen file naming convention.

  • tile_dimension (str, optional) – Dimension/column name of the dimension containing tile IDs in correspondence with the tiles in mosaic. Defaults to ‘tile’.

  • stack_dimension (str, optional) – Dimension/column name of the dimension along which the files are stacked (first axis), e.g. time, bands etc. Defaults to ‘time’.

  • use_metadata (bool, optional) – True if dimensions should be retrieved from the metadata of the files (defaults to False).

  • md_decoder (dict, optional) – Dictionary mapping dimension names/attribute names with decoding functions.

  • n_cores (int, optional) – Number of cores used to interpret files in parallel (defaults to 1).

  • kwargs (dict) – Keywords passed to the DataCubeReader constructor.

Returns

Datacube reader instance.

Return type

DataCubeReader

read(*args, **kwargs)

Reads data from disk.

Parameters
  • args (tuple) – Positional arguments for the RasterDataReader().read() function.

  • kwargs (dict) – Keyword arguments for the RasterDataReader().read() function.

Notes

Details about the available arguments can be retrieved from the respective read() functions in veranda.

class yeoda.datacube.DataCubeWriter(mosaic, file_register=None, data=None, ext='.nc', stack_dimension='layer_id', tile_dimension='tile_id', **kwargs)

Bases: DataCube

Datacube writer class inheriting from DataCube.

export(use_mosaic=False, data_variables=None, encoder=None, encoder_kwargs=None, overwrite=False, **kwargs)

Writes all internally stored data to disk.

Parameters
  • use_mosaic (bool, optional) – True if data should be written according to the mosaic. False if data composes a new tile and should not be tiled (default).

  • data_variables (list of str, optional) – Data variables to write. Defaults to None, i.e. all data variables are written.

  • encoder (callable, optional) – Function used to encode data before writing it to disk.

  • encoder_kwargs (dict, optional) – Keyword arguments for the encoder.

  • overwrite (bool, optional) – True if data should be overwritten, False if not (default).

classmethod from_data(data, dirpath, fn_class=<class 'geopathfinder.file_naming.SmartFilename'>, fn_map=None, def_fields=None, stack_groups=None, fn_groups_map=None, ext='.nc', mosaic=None, stack_dimension='layer_id', tile_dimension='tile_id', **kwargs) → DataCubeWriter

Creates a DataCubeWriter instance from an xarray dataset.

Parameters
  • data (xr.Dataset) – Raster data stored in memory to derive the mosaic and file register from.

  • dirpath (str) – Full directory path where the files are located/should be written to.

  • fn_class (SmartFilename, optional) – Filename class used to create a file name from the coordinates in data. Defaults to SmartFilename.

  • fn_map (dict, optional) – Dictionary mapping dimension/coordinate names of data with dimension names of the file naming convention.

  • def_fields (dict, optional) – Dictionary containing default attributes/values used when creating all file names.

  • stack_groups (dict, optional) – Defines the relation between the stack coordinates and a group ID, i.e. in what portions along the stack dimension the data should be written. The keys are the coordinates and the values are group IDs.

  • fn_groups_map (dict, optional) – If stack_groups is set, then you can assign new filename attributes to each group ID by using this argument. Its format should be a dictionary mapping group IDs (keys) to filename fields (values).

  • ext (str, optional) – File extension/format. Defaults to “.nc”.

  • mosaic (geospade.raster.MosaicGeometry) – Mosaic representing the spatial allocation of the given files. The tiles of the mosaic have to match the IDs/names of the tile_dimension column.

  • stack_dimension (str, optional) – Dimension/column name of the dimension along which the files are stacked (first axis), e.g. time, bands etc. Defaults to ‘layer_id’, i.e. the layer IDs are used as the main coordinates to stack the files.

  • tile_dimension (str, optional) – Dimension/column name of the dimension containing tile IDs in correspondence with the tiles in mosaic. Defaults to ‘tile_id’.

  • kwargs (dict) – Keywords passed to the DataCubeWriter class.

Returns

Datacube writer instance.

Return type

DataCubeWriter

write(data, use_mosaic=False, data_variables=None, encoder=None, encoder_kwargs=None, overwrite=False, **kwargs)

Writes a certain chunk of data to disk.

Parameters
  • data (xr.Dataset) – Data chunk to be written to disk or appended to existing data.

  • use_mosaic (bool, optional) – True if data should be written according to the mosaic. False if data composes a new tile and should not be tiled (default).

  • data_variables (list of str, optional) – Data variables to write. Defaults to None, i.e. all data variables are written.

  • encoder (callable, optional) – Function used to encode data before writing it to disk.

  • encoder_kwargs (dict, optional) – Keyword arguments for the encoder.

  • overwrite (bool, optional) – True if data should be overwritten, False if not (default).

  • kwargs (dict) – Keywords passed to the RasterDataWriter().write() method.

yeoda.datacube.parse_filepaths(slice_proc)

Parses a portion of file paths, i.e. retrieves decoded attributes from the file name itself or the metadata, and writes the output as a data frame to disk (for joining it with the output of all workers afterwards).

Parameters

slice_proc (slice) – Index range corresponding to the filepaths to parse.

yeoda.datacube.parse_init(filepaths, fn_class, fields_def, fn_kwargs, file_class, fc_kwargs, fn_dims, md_dims, md_decoder, tmp_dirpath)

Helper method for setting the entries of the global variable PROC_OBJS so that they are available during multiprocessing.

yeoda.errors module

exception yeoda.errors.DimensionUnkown(dimension_name)

Bases: KeyError

Class to handle exceptions thrown by unknown dimensions/columns in the file register of the datacube.

exception yeoda.errors.FileTypeUnknown(ext)

Bases: Exception

Class to handle exceptions thrown by unknown file types/extensions.

yeoda.utils module

yeoda.utils.create_fn_class(fields_def, pad='-', delimiter='_')

Creates a DefaultFilename class inheriting from SmartFilename.

Parameters
  • fields_def (OrderedDict) –

    Names of the fields (keys) in the right order and their definitions (values). Each definition contains:
    • "len": int

      Length of filename part (must be given). "0" to allow any length.

    • "start": int, optional

      Start index of filename part (default is 0).

    • "delim": str, optional

      Delimiter between this and the following filename part (default is the one from the parent class).

    • "pad": str, optional

      Padding for filename part (default is the one from the parent class).

  • ext (str, optional) – File name extension (default: None).

  • pad (str, optional) – Padding symbol (default: ‘-’).

Returns

DefaultFilename – Default filename class.

Return type

class
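
A fields_def following the structure described above might look like the sketch below. The field names and lengths are hypothetical and only illustrate the layout of the dictionary; the sketch deliberately stops short of calling create_fn_class.

```python
from collections import OrderedDict

# Hypothetical field definitions; the key order defines the order in the file name.
fields_def = OrderedDict([
    ("var_name", {"len": 4, "pad": "-"}),  # fixed length with a custom padding symbol
    ("time", {"len": 8, "delim": "_"}),    # fixed length with a custom delimiter
    ("tile", {"len": 0}),                    # "0" allows any length
])

# Sanity check: every field must define the mandatory "len" entry.
missing_len = [name for name, spec in fields_def.items() if "len" not in spec]
```

An OrderedDict makes the intended field order explicit, which matters because the order of the keys determines the order of the filename parts.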

yeoda.utils.to_list(value)

Takes a value and wraps it into a list if it is not already one. The result is returned. If None is passed, None is returned.

Parameters

value (object) – Value to convert.

Returns

A list that wraps the value.

Return type

list or None
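
The documented behaviour can be restated as a small re-implementation. This is a sketch derived from the description above, not the library source; in particular, how yeoda treats tuples or other iterables is not specified here, and the name to_list_sketch is hypothetical.

```python
def to_list_sketch(value):
    """Wrap `value` in a list unless it already is one; pass None through."""
    if value is None:
        return None
    if isinstance(value, list):
        return value
    return [value]


wrapped = to_list_sketch("time")
unchanged = to_list_sketch(["time", "tile"])
nothing = to_list_sketch(None)
```

Helpers like this let an argument accept either a single item or a list of items without extra checks at every call site.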

Module contents