yeoda package
Submodules
yeoda.datacube module
- class yeoda.datacube.DataCube(raster_data)[source]
Bases: object
Basic datacube class defining all selection and datacube operations.
- add_dimension(name, values, inplace=False) DataCube [source]
Adds a new dimension to the datacube.
- Parameters
name (str) – Name of the new dimension.
values (list) – Values along the new dimension (e.g., cloud cover, quality flag, …). They have to have the same length as all the rows in the file register.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
DataCube object with an additional dimension in the file register.
- Return type
DataCube
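The length requirement on values can be illustrated with a minimal sketch. It assumes the file register behaves like a table with one row per file; here it is modelled as a plain list of dictionaries, and the helper below is hypothetical, not yeoda's implementation.

```python
def add_dimension(file_register, name, values):
    """Return a copy of the register with one extra column (hypothetical helper)."""
    # One value is needed for every row of the file register.
    if len(values) != len(file_register):
        raise ValueError("values must have one entry per file register row")
    return [dict(row, **{name: value}) for row, value in zip(file_register, values)]


# Stand-in for a file register: one dictionary per file.
file_register = [
    {"filepath": "a.tif", "time": "2022-01-01"},
    {"filepath": "b.tif", "time": "2022-01-02"},
]
extended = add_dimension(file_register, "cloud_cover", [10, 85])
```

The original register is left untouched, which mirrors the behaviour described for inplace=False.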
- align_dimension(other, name, inplace=False) DataCube [source]
Aligns this datacube with another datacube along the specified dimension name.
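- Parameters
other (DataCube) – Datacube to align with.
name (str) – Name of the dimension to align along.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.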
- Returns
Datacube with common values along the given dimension with respect to another datacube.
- Return type
DataCube
- apply_nan()[source]
Converts no data values of the internal data to np.nan. Note that this replacement implicitly converts the data format to float.
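Why the conversion to float is unavoidable can be shown with NumPy alone. The snippet is a sketch of the general technique, not of yeoda's internals; the no data value -9999 and the array contents are made up for illustration.

```python
import numpy as np

nodata = -9999  # assumed no data value, for illustration only
data = np.array([[12, nodata], [7, 3]], dtype=np.int16)

# Integer arrays cannot represent NaN, so the data is cast to float first.
as_float = data.astype(np.float32)
as_float[data == nodata] = np.nan
```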
- clone() DataCube [source]
Clones, i.e. deep-copies a datacube.
- Returns
Cloned/copied datacube.
- Return type
DataCube
- property data_geom: RasterGeometry
Raster/tile geometry of the internal data.
- property data_view: Dataset
View on internal raster data.
- property dimensions: list
Dimensions of the datacube, i.e. the columns of the file register without the 'filepath' entry.
- property filepaths: List[str]
Unique list of file paths stored in the file register. Note that this property does not preserve the order of the file paths in the file register.
- intersect(other, on_dimension=None, inplace=False) DataCube [source]
Intersects this datacube with another datacube. This is equal to an SQL INNER JOIN operation. In other words:
all uncommon columns and rows (if on_dimension is given) are removed
duplicates are removed
- Parameters
other (DataCube) – Datacube to intersect with.
on_dimension (str, optional) – Dimension name to intersect on, meaning that only equal entries along this dimension will be retained.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Intersected datacube.
- Return type
DataCube
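The description above can be read as the following sketch. It is one possible interpretation, not yeoda's implementation: both registers are modelled as lists of dictionaries, the rows are reduced to the shared columns, rows without a partner along on_dimension are dropped, and duplicates are removed. The helper and the sample rows are hypothetical.

```python
def intersect(rows_a, rows_b, on_dimension=None):
    """Inner-join style intersection of two registers (hypothetical helper)."""
    if not rows_a or not rows_b:
        return []
    # Keep only the columns that occur in both registers.
    common = [c for c in rows_a[0] if c in rows_b[0]]
    rows = [{c: r[c] for c in common} for r in rows_a + rows_b]
    if on_dimension is not None:
        # Keep only entries whose value occurs in both registers.
        shared = {r[on_dimension] for r in rows_a} & {r[on_dimension] for r in rows_b}
        rows = [r for r in rows if r[on_dimension] in shared]
    # Remove duplicates while preserving the order.
    unique, seen = [], set()
    for row in rows:
        key = tuple(row.values())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


cube_a = [
    {"filepath": "a1.tif", "time": 1, "pol": "VV"},
    {"filepath": "a2.tif", "time": 2, "pol": "VV"},
]
cube_b = [
    {"filepath": "b2.tif", "time": 2, "orbit": "A"},
    {"filepath": "b3.tif", "time": 3, "orbit": "A"},
]
common_rows = intersect(cube_a, cube_b, on_dimension="time")
```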
- property mosaic: MosaicGeometry
Mosaic geometry representing the spatial properties of the datacube.
- rename_dimensions(dimensions_map, inplace=False) DataCube [source]
Renames the dimensions of the datacube.
- Parameters
dimensions_map (dict) – A dictionary representing the relation between old and new dimension names. The keys are the old dimension names, the values the new dimension names (e.g., {'time_begin': 'time'}).
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
DataCube object with renamed dimensions/columns of the file register.
- Return type
DataCube
- select_bbox(bbox, sref=None, inplace=False) DataCube [source]
Selects a region from the datacube according to the given bounding box. The region is rectangular if the bounding box is provided in native units.
- Parameters
bbox (list of 2 2-tuple) – Bounding box to select, i.e. [(x_min, y_min), (x_max, y_max)].
sref (geospade.crs.SpatialRef, optional) – CRS of the given bounding box coordinates. Defaults to the CRS of the mosaic.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Datacube object with a file register and a mosaic only consisting of the intersected tiles.
- Return type
DataCube
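How a bounding box in the [(x_min, y_min), (x_max, y_max)] format narrows a mosaic down to the intersected tiles can be sketched with plain extents. The tile names, extents and the helper are hypothetical; the real selection is done on the mosaic geometry.

```python
def tiles_in_bbox(tile_extents, bbox):
    """Return the names of all tiles overlapping the bounding box (hypothetical helper)."""
    (x_min, y_min), (x_max, y_max) = bbox
    selected = []
    for name, (t_x_min, t_y_min, t_x_max, t_y_max) in tile_extents.items():
        # Two axis-aligned rectangles overlap if they overlap along both axes.
        if t_x_min < x_max and t_x_max > x_min and t_y_min < y_max and t_y_max > y_min:
            selected.append(name)
    return selected


# Tile extents given as (x_min, y_min, x_max, y_max) in the CRS of the mosaic.
tile_extents = {
    "T1": (0, 0, 100, 100),
    "T2": (100, 0, 200, 100),
    "T3": (0, 100, 100, 200),
}
bbox = [(50, 20), (150, 80)]
selected = tiles_in_bbox(tile_extents, bbox)
```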
- select_by_dimension(expressions, name=None, inplace=False) DataCube [source]
Filters the datacube according to the given expressions and returns a (new) datacube.
- Parameters
expressions (list of callable) –
A list of functions expecting one input argument, which will be replaced with the respective column of the file register later on, and returning a boolean value for each entry in the file register (used to decide whether the entry is selected or not). Two examples are given below:
datacube.select_by_dimension(lambda s: s == "X", name='dim', inplace=True)
datacube.select_by_dimension(lambda t: (t >= start_time) & (t <= end_time), name='time', inplace=True)
name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Subset of the original datacube.
- Return type
DataCube
Notes
The results of the expressions are concatenated via an OR operation.
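The OR combination mentioned in the note can be sketched as follows. This is a simplified stand-in, not yeoda's code: the file register is a list of dictionaries and each expression is evaluated per entry, whereas the description above states that the functions receive the whole column. The helper and the sample data are hypothetical.

```python
def select_by_dimension(file_register, expressions, name):
    """Keep rows for which at least one expression is true (hypothetical helper)."""
    column = [row[name] for row in file_register]
    keep = [False] * len(column)
    for expression in expressions:
        # Results of the individual expressions are combined with OR.
        keep = [k or bool(expression(value)) for k, value in zip(keep, column)]
    return [row for row, k in zip(file_register, keep) if k]


file_register = [
    {"filepath": "a.tif", "pol": "VV"},
    {"filepath": "b.tif", "pol": "VH"},
    {"filepath": "c.tif", "pol": "HH"},
]
subset = select_by_dimension(
    file_register,
    [lambda p: p == "VV", lambda p: p == "HH"],
    name="pol",
)
```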
- select_files_with_pattern(pattern, full_path=False, inplace=False) DataCube [source]
Filters all file paths according to the given pattern.
- Parameters
pattern (str) – A regular expression (e.g., ".*S1A.*GRD.*").
full_path (boolean, optional) – Uses the full file paths for filtering if it is set to True. Otherwise, the file name is used (default value is False).
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
DataCube object with a filtered file register according to the given pattern.
- Return type
DataCube
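The effect of full_path can be sketched with the standard library. Whether yeoda matches from the start of the string or searches anywhere in it is not stated here, so the use of re.match is an assumption; the helper and the file paths are hypothetical.

```python
import os
import re


def select_files_with_pattern(filepaths, pattern, full_path=False):
    """Filter file paths with a regular expression (hypothetical helper)."""
    regex = re.compile(pattern)
    selected = []
    for filepath in filepaths:
        # Either the whole path or only the file name is tested.
        target = filepath if full_path else os.path.basename(filepath)
        if regex.match(target):  # assumption: matching starts at the beginning
            selected.append(filepath)
    return selected


filepaths = [
    "/data/S1A/D20220101_S1A_GRD.tif",
    "/data/S1A/D20220101_S1B_GRD.tif",
]
by_name = select_files_with_pattern(filepaths, ".*S1A.*GRD.*")
by_path = select_files_with_pattern(filepaths, ".*S1A.*GRD.*", full_path=True)
```

In this sketch the second file is only selected with full_path=True, because "S1A" occurs in its directory name but not in its file name.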
- select_polygon(polygon, sref=None, apply_mask=True, inplace=False) DataCube [source]
Selects a region delineated by the given polygon from the datacube.
- Parameters
polygon (ogr.Geometry) – Polygon specifying the pixels to collect.
sref (geospade.crs.SpatialRef, optional) – CRS of the given polygon. Defaults to the CRS of the mosaic.
apply_mask (bool, optional) – True if pixels outside the polygon should be set to a no data value (default). False if every pixel within the bounding box of the polygon should be included.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Datacube object with a file register and a mosaic only consisting of the intersected tiles.
- Return type
DataCube
- select_px_window(row, col, height=1, width=1, inplace=False) DataCube [source]
Selects a rectangular region corresponding to the given pixel window from the datacube.
- Parameters
row (int) – Top-left row number of the pixel window anchor.
col (int) – Top-left column number of the pixel window anchor.
height (int, optional) – Number of rows/height of the pixel window. Defaults to 1.
width (int, optional) – Number of columns/width of the pixel window. Defaults to 1.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Datacube with data and a mosaic geometry consisting only of the tile intersected by the pixel window.
- Return type
DataCube
Notes
The mosaic is only sliced if it consists of a single tile, to prevent ambiguities in the definition of the pixel window.
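The meaning of row, col, height and width can be sketched on a small in-memory tile. The nested list stands in for the pixel values of a single tile; the helper is hypothetical.

```python
def select_px_window(tile, row, col, height=1, width=1):
    """Cut a pixel window out of a 2D tile (hypothetical helper)."""
    # (row, col) is the top-left anchor, height and width count pixels.
    return [line[col:col + width] for line in tile[row:row + height]]


# 3 x 4 tile whose values encode their position as 10 * row + col.
tile = [[10 * r + c for c in range(4)] for r in range(3)]
window = select_px_window(tile, row=1, col=2, height=2, width=2)
single_pixel = select_px_window(tile, row=0, col=3)
```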
- select_tiles(tile_names, inplace=False) DataCube [source]
Selects the given tiles from the datacube.
- Parameters
tile_names (list of str) – Tile names/IDs.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Datacube with a mosaic and a file register only consisting of the given tiles.
- Return type
DataCube
- select_xy(x, y, sref=None, inplace=False) DataCube [source]
Selects a pixel from the datacube according to the given coordinate tuple.
- Parameters
x (number) – Coordinate in X direction.
y (number) – Coordinate in Y direction.
sref (geospade.crs.SpatialRef, optional) – CRS of the given coordinate tuple. Defaults to the CRS of the mosaic.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.
- Returns
Datacube object with a file register and a mosaic only consisting of the intersected tile containing information on the location of the single-pixel time series.
- Return type
DataCube
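Locating the pixel that contains a coordinate tuple can be sketched for the simplest case. The function below assumes a north-up, unrotated tile described by its upper-left corner and pixel sizes, with coordinates already given in the CRS of the tile; it is hypothetical and does not cover reprojection via sref.

```python
def xy_to_rc(x, y, ul_x, ul_y, x_pixel_size, y_pixel_size):
    """Convert world coordinates to a (row, col) pixel index (hypothetical helper)."""
    # Columns grow with x, rows grow with decreasing y (north-up raster).
    col = int((x - ul_x) // x_pixel_size)
    row = int((ul_y - y) // y_pixel_size)
    return row, col


row, col = xy_to_rc(
    x=1050.0, y=4980.0,
    ul_x=1000.0, ul_y=5000.0,
    x_pixel_size=10.0, y_pixel_size=10.0,
)
```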
- sort_by_dimension(name, ascending=True, inplace=False) DataCube [source]
Sorts the datacube/file register according to the given dimension.
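- Parameters
name (str) – Name of the dimension to sort by.
ascending (bool, optional) – If true (default), the values are sorted in ascending order, otherwise in descending order.
inplace (boolean, optional) – If true, the current class instance will be altered. If false (default), a new class instance will be returned.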
- Returns
Sorted datacube.
- Return type
DataCube
- split_by_dimension(expressions, name=None) List[DataCube] [source]
Creates subsets/a new datacube from the original datacube for each expression.
- Parameters
expressions (list of callable) –
A list of functions expecting one input argument, which will be replaced with the respective column of the file register later on, and returning a boolean value for each entry in the file register (used to decide whether the entry is selected or not). Two example expressions are given below (shown with select_by_dimension):
datacube.select_by_dimension(lambda s: s == "X", name='dim', inplace=True)
datacube.select_by_dimension(lambda t: (t >= start_time) & (t <= end_time), name='time', inplace=True)
name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.
- Returns
datacubes – A list of datacubes corresponding to each expression.
- Return type
List[DataCube]
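In contrast to a selection, a split yields one subset per expression. The sketch below models the file register as a list of dictionaries and evaluates each expression per entry; it is a simplified stand-in with hypothetical names, not yeoda's implementation.

```python
def split_by_dimension(file_register, expressions, name):
    """Create one subset of the register per expression (hypothetical helper)."""
    return [
        [row for row in file_register if expression(row[name])]
        for expression in expressions
    ]


file_register = [
    {"filepath": "a.tif", "orbit": "A"},
    {"filepath": "b.tif", "orbit": "D"},
    {"filepath": "c.tif", "orbit": "A"},
]
ascending, descending = split_by_dimension(
    file_register,
    [lambda o: o == "A", lambda o: o == "D"],
    name="orbit",
)
```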
- split_by_temporal_freq(time_freq, name=None) List[DataCube] [source]
Temporally splits the original datacube according to a given frequency string.
- Parameters
time_freq (str) – Pandas DateOffset frequency string (see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)
name (str, optional) – Name of the dimension. Defaults to the name of the stack dimension.
- Returns
datacubes – A list of datacubes corresponding to the given temporal frequency intervals.
- Return type
List[DataCube]
Notes
Empty datacubes are discarded.
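The idea of splitting into consecutive intervals and discarding the empty ones can be sketched for calendar months with the standard library. This is only an analogue of a monthly frequency string; the helper and the sample register are hypothetical and no pandas frequency parsing is involved.

```python
from datetime import date


def split_by_month(file_register, name="time"):
    """Split a register into monthly subsets, skipping empty months (hypothetical helper)."""
    if not file_register:
        return []
    times = [row[name] for row in file_register]
    start, end = min(times), max(times)
    subsets = []
    year, month = start.year, start.month
    # Walk through every month between the first and the last entry.
    while (year, month) <= (end.year, end.month):
        subset = [
            row for row in file_register
            if (row[name].year, row[name].month) == (year, month)
        ]
        if subset:  # empty intervals are discarded
            subsets.append(subset)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return subsets


file_register = [
    {"filepath": "a.tif", "time": date(2022, 1, 5)},
    {"filepath": "b.tif", "time": date(2022, 1, 20)},
    {"filepath": "c.tif", "time": date(2022, 3, 1)},
]
monthly = split_by_month(file_register)
```

February contains no files in this example, so only two subsets are returned.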
- class yeoda.datacube.DataCubeReader(file_register, mosaic, stack_dimension='layer_id', tile_dimension='tile_id', file_class=None, file_class_kwargs=None, **kwargs)[source]
Bases: DataCube
Datacube reader class inheriting from DataCube.
- classmethod from_filepaths(filepaths, fn_class=<class 'geopathfinder.file_naming.SmartFilename'>, fields_def=None, fn_kwargs=None, mosaic=None, tile_class=<class 'geospade.raster.Tile'>, sref=None, file_class=None, file_class_kwargs=None, dimensions=None, tile_dimension='tile', stack_dimension='time', use_metadata=False, md_decoder=None, n_cores=1, **kwargs) DataCubeReader [source]
Creates a DataCubeReader instance from a list of file paths.
- Parameters
filepaths (list of str) – List of file paths to ingest into the datacube.
fn_class (SmartFilename, optional) – Filename class used to interpret the file name. Defaults to SmartFilename.
fields_def (dict, optional) – Dictionary defining the elements of a specific file name. For further details take a look at geopathfinder’s SmartFilename class. This argument can be used if fn_class is None.
fn_kwargs (dict, optional) – Keyword arguments for fn_class.
mosaic (geospade.raster.MosaicGeometry, optional) – Mosaic representing the spatial allocation of the given files. The tile_dimension part of the file name must match the tile IDs/names of the mosaic. By default, a mosaic is automatically retrieved from the spatial extent of the files.
tile_class (geospade.raster.Tile, optional) – Tile class used for creating a default mosaic, if mosaic is not provided. Defaults to Tile.
sref (geospade.crs.SpatialRef, optional) – CRS of the given files. Defaults to the CRS of the mosaic.
file_class (class, optional) – Class used to open a reference file for retrieving basic information. Defaults to none, meaning that the datacube uses the default classes assigned to each file extension/data format.
file_class_kwargs (dict, optional) – Keyword arguments for file_class.
dimensions (list, optional) – Desired dimensions of the datacube in compliance with the chosen file naming convention.
tile_dimension (str, optional) – Name of the dimension/column containing the tile IDs corresponding to the tiles in mosaic. Defaults to 'tile'.
stack_dimension (str, optional) – Name of the dimension/column along which the files are stacked (first axis), e.g. time, bands etc. Defaults to 'time'.
use_metadata (bool, optional) – True if dimensions should be retrieved from the metadata of the files (defaults to False).
md_decoder (dict, optional) – Dictionary mapping dimension names/attribute names with decoding functions.
n_cores (int, optional) – Number of cores used to interpret files in parallel (defaults to 1).
kwargs (dict) – Keywords passed to the DataCubeReader constructor.
- Returns
Datacube reader instance.
- Return type
DataCubeReader
- class yeoda.datacube.DataCubeWriter(mosaic, file_register=None, data=None, ext='.nc', stack_dimension='layer_id', tile_dimension='tile_id', **kwargs)[source]
Bases: DataCube
Datacube writer class inheriting from DataCube.
- export(use_mosaic=False, data_variables=None, encoder=None, encoder_kwargs=None, overwrite=False, **kwargs)[source]
Writes all internally stored data to disk.
- Parameters
use_mosaic (bool, optional) – True if data should be written according to the mosaic. False if data composes a new tile and should not be tiled (default).
data_variables (list of str, optional) – Data variables to write. Defaults to None, i.e. all data variables are written.
encoder (callable, optional) – Function used to encode the data before writing it to disk.
encoder_kwargs (dict, optional) – Keyword arguments for the encoder.
overwrite (bool, optional) – True if data should be overwritten, False if not (default).
- classmethod from_data(data, dirpath, fn_class=<class 'geopathfinder.file_naming.SmartFilename'>, fn_map=None, def_fields=None, stack_groups=None, fn_groups_map=None, ext='.nc', mosaic=None, stack_dimension='layer_id', tile_dimension='tile_id', **kwargs) DataCubeWriter [source]
Creates a DataCubeWriter instance from an xarray dataset.
- Parameters
data (xr.Dataset) – Raster data stored in memory to derive the mosaic and file register from.
dirpath (str) – Full directory path where the files are located/should be written to.
fn_class (SmartFilename, optional) – Filename class used to create a file name from the coordinates in data. Defaults to SmartFilename.
fn_map (dict, optional) – Dictionary mapping dimension/coordinate names of data with dimension names of the file naming convention.
def_fields (dict, optional) – Dictionary containing default attributes/values used when creating all file names.
stack_groups (dict, optional) – Defines the relation between the stack coordinates and a group ID, i.e. in what portions along the stack dimension the data should be written. The keys are the coordinates and the value a group ID.
fn_groups_map (dict, optional) – If stack_groups is set, new filename attributes can be assigned to each group ID with this argument. Its format should be a dictionary mapping group IDs (keys) to filename fields (values).
ext (str, optional) – File extension/format. Defaults to “.nc”.
mosaic (geospade.raster.MosaicGeometry, optional) – Mosaic representing the spatial allocation of the given files. The tiles of the mosaic have to match the IDs/names of the tile_dimension column.
stack_dimension (str, optional) – Dimension/column name of the dimension, where to stack the files along (first axis), e.g. time, bands etc. Defaults to ‘layer_id’, i.e. the layer ID’s are used as the main coordinates to stack the files.
tile_dimension (str, optional) – Dimension/column name of the dimension containing tile ID’s in correspondence with the tiles in mosaic. Defaults to ‘tile_id’.
kwargs (dict) – Keywords passed to the DataCubeWriter class.
- Returns
Datacube writer instance.
- Return type
DataCubeWriter
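The relation between stack_groups and fn_groups_map can be sketched with plain dictionaries. The helper below only shows how coordinates are collected per group ID and paired with the filename fields of that group; it is hypothetical, and the field name "var_name" as well as the coordinates are made up for illustration.

```python
from collections import defaultdict


def group_stack_coords(stack_groups, fn_groups_map=None):
    """Collect stack coordinates per group ID (hypothetical helper)."""
    grouped = defaultdict(list)
    for coord, group_id in stack_groups.items():
        grouped[group_id].append(coord)
    fn_groups_map = fn_groups_map or {}
    # Each group would end up in its own file with its own filename fields.
    return {
        group_id: {"coords": coords, "fn_fields": fn_groups_map.get(group_id, {})}
        for group_id, coords in grouped.items()
    }


# Keys are coordinates along the stack dimension, values are group IDs.
stack_groups = {"2022-01-01": 0, "2022-01-02": 0, "2022-02-01": 1}
# Keys are group IDs, values are filename fields ("var_name" is an assumed field).
fn_groups_map = {0: {"var_name": "JAN"}, 1: {"var_name": "FEB"}}
plan = group_stack_coords(stack_groups, fn_groups_map)
```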
- write(data, use_mosaic=False, data_variables=None, encoder=None, encoder_kwargs=None, overwrite=False, **kwargs)[source]
Writes a certain chunk of data to disk.
- Parameters
data (xr.Dataset) – Data chunk to be written to disk or being appended to existing data.
use_mosaic (bool, optional) – True if data should be written according to the mosaic. False if data composes a new tile and should not be tiled (default).
data_variables (list of str, optional) – Data variables to write. Defaults to None, i.e. all data variables are written.
encoder (callable, optional) – Function used to encode the data before writing it to disk.
encoder_kwargs (dict, optional) – Keyword arguments for the encoder.
overwrite (bool, optional) – True if data should be overwritten, False if not (default).
kwargs (dict) – Keywords passed to the RasterDataWriter().write() method.
- yeoda.datacube.parse_filepaths(slice_proc)[source]
Parses a portion of the file paths, i.e. retrieves decoded attributes from the file name itself or from the metadata, and writes the output as a data frame to disk (for joining it with the output of all other workers afterwards).
- Parameters
slice_proc (slice) – Index range corresponding to the filepaths to parse.
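How a list of file paths might be divided into index ranges for several workers can be sketched as follows. The chunking scheme is an assumption made for illustration and not necessarily the one used by yeoda; the helper is hypothetical.

```python
def make_slices(n_files, n_cores):
    """Create one index range per worker (hypothetical helper)."""
    if n_files == 0:
        return []
    n_workers = max(1, min(n_cores, n_files))
    step = -(-n_files // n_workers)  # ceiling division
    return [slice(start, min(start + step, n_files)) for start in range(0, n_files, step)]


filepaths = [f"file_{i}.tif" for i in range(10)]
slices = make_slices(len(filepaths), n_cores=3)
# Each worker would parse filepaths[slice_proc] for its own slice.
portions = [filepaths[slice_proc] for slice_proc in slices]
```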
yeoda.errors module
yeoda.utils module
- yeoda.utils.create_fn_class(fields_def, pad='-', delimiter='_')[source]
Creates a DefaultFilename class inheriting from SmartFilename.
- Parameters
fields_def (OrderedDict) –
- Names of the fields (keys) in the right order, with their definitions (values). Each definition contains:
- "len": int
Length of the filename part (must be given). "0" allows any length.
- "start": int, optional
Start index of the filename part (default is 0).
- "delim": str, optional
Delimiter between this and the following filename part (default is the one from the parent class).
- "pad": str, optional
Padding for the filename part (default is the one from the parent class).
ext (str, optional) – File name extension (default: None).
pad (str, optional) – Padding symbol (default: '-').
delimiter (str, optional) – Delimiter between the filename parts (default: '_').
- Returns
DefaultFilename – Default filename class.
- Return type
class
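The roles of "len", pad and delimiter can be sketched without geopathfinder. The function below is a simplified stand-in: it assumes that values shorter than "len" are padded on the right and ignores "start" and per-field "delim"/"pad" entries. The field names and values are made up for illustration.

```python
from collections import OrderedDict


def build_filename(fields_def, values, pad="-", delimiter="_"):
    """Assemble a file name from field definitions (hypothetical helper)."""
    parts = []
    for name, definition in fields_def.items():
        value = str(values.get(name, ""))
        length = definition["len"]
        if length > 0:
            if len(value) > length:
                raise ValueError(f"value for '{name}' exceeds {length} characters")
            # Assumption: shorter values are right-padded with the padding symbol.
            value = value.ljust(length, pad)
        parts.append(value)
    return delimiter.join(parts)


fields_def = OrderedDict([
    ("var_name", {"len": 5}),
    ("date", {"len": 8}),
    ("tile", {"len": 0}),  # 0 allows any length
])
filename = build_filename(
    fields_def,
    {"var_name": "SIG0", "date": "20220101", "tile": "E048N012T6"},
)
```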