phenotypic.tools#
Developer tools shared across fungal colony plate workflows.
Lightweight helpers for timing, mask validation, constants, color conversions, error handling, and HDF storage used by the processing pipeline. Includes a timed execution decorator, mask validators, colourspace utilities, custom exceptions, and HDF helpers for persisting plate datasets and measurements.
- class phenotypic.tools.HDF(filepath, name: str, mode: Literal['single', 'set'])[source]#
Bases:
object
Interface for managing HDF5 files with support for single or set image modes, ensuring safe, compatible file access with retry and error-handling mechanisms.
The class facilitates operations on HDF5 files used for storing phenotypic data in both single-image and image-set modes. It includes utilities to handle locking errors, initializes proper HDF5 modes for compatibility, and provides safe access methods for writing.
- filepath#
Path to the HDF5 file on the filesystem.
- Type:
Path
- mode#
Specifies the mode for the HDF5 file, either single image or image set.
- Type:
Literal[‘single’, ‘set’]
- home_posix#
The specific root directory of the HDF5 resource in the file, derived based on its mode.
- Type:
str
- set_data_posix#
The subgroup path for the data entity in image set mode, if applicable.
- Type:
str, optional
- __init__(filepath, name: str, mode: Literal['single', 'set'])[source]#
Initializes a class instance to manage HDF5 file structures for single or set image data based on the given filepath, name of the resource, and operational mode.
- filepath#
Path to the HDF5 file.
- Type:
Path
- mode#
Operational mode determining the structure and organization within the HDF5 file. Must be either ‘single’ or ‘set’.
- Type:
Literal[‘single’, ‘set’]
- root_posix#
Posix path representing the root directory within the HDF5 file based on the mode.
- Type:
str
- home_posix#
Posix path representing the home directory for the resource within the HDF5 file based on the mode.
- Type:
str
- set_data_posix#
Posix path for the data subdirectory within the resource home directory. Only initialized in ‘set’ mode.
- Type:
Optional[str]
- Parameters:
filepath – Path to the target HDF5 file. Must have an HDF5-compatible extension, or a ValueError is raised.
name (str) – Name of the resource to be managed in the file. Used to construct the home directory for the resource within the HDF5 file.
mode (Literal['single', 'set']) – Operational mode. Specifies whether the resource represents a ‘single’ or ‘set’ image data. If the mode is invalid, a ValueError is raised.
- Raises:
ValueError – If the filepath does not have an HDF5-compatible extension.
ValueError – If the mode is neither ‘single’ nor ‘set’.
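The path layout implied by the two modes can be sketched from the class constants documented below. The helper `derive_home_posix` here is hypothetical (not part of the API) and only illustrates how `home_posix` and `set_data_posix` plausibly relate to the mode:

```python
from typing import Literal, Optional

# Class constants copied from the HDF class documented below.
SINGLE_IMAGE_ROOT_POSIX = '/phenotypic/images/'
IMAGE_SET_ROOT_POSIX = '/phenotypic/image_sets/'
IMAGE_SET_DATA_POSIX = 'data'

def derive_home_posix(name: str, mode: Literal['single', 'set']) -> tuple[str, Optional[str]]:
    """Hypothetical sketch of how home_posix / set_data_posix could be derived."""
    if mode == 'single':
        # Single images live under the single-image root; no data subgroup.
        return SINGLE_IMAGE_ROOT_POSIX + name, None
    if mode == 'set':
        # Image sets get a home group plus a 'data' subgroup for members.
        home = IMAGE_SET_ROOT_POSIX + name
        return home, home + '/' + IMAGE_SET_DATA_POSIX
    raise ValueError(f"mode must be 'single' or 'set', got {mode!r}")

home, data = derive_home_posix('plate_01', 'set')
# home == '/phenotypic/image_sets/plate_01'
# data == '/phenotypic/image_sets/plate_01/data'
```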
- static assert_swmr_on(g: Group) None[source]#
Assert that SWMR mode is enabled on the group’s file.
- Parameters:
g (Group) – HDF5 group to check.
- Raises:
RuntimeError – If SWMR mode is not enabled.
- Return type:
None
- static close_handle(handle: File | Group) None[source]#
Close the given HDF5 file or group handle.
- Parameters:
handle (File | Group)
- Return type:
None
- static get_group(handle: File, posix) Group[source]#
Retrieves or creates a group in an HDF5 file.
This method checks the validity of the provided HDF5 file handle and tries to retrieve the specified group based on the given posix path. If the group does not exist and the file is not opened in read-only mode, the group gets created. If the file is in read-only mode and the group does not exist, an error is raised.
- Parameters:
handle (File) – Open HDF5 file handle.
posix – Posix path of the group to retrieve or create.
- Returns:
The corresponding h5py group within the HDF5 file.
- Return type:
Group
- Raises:
ValueError – If the HDF5 file handle is invalid or no longer valid.
ValueError – If the file handle mode cannot be determined.
KeyError – If the specified group does not exist in read-only mode.
- get_home(handle)[source]#
Retrieves a specific group from an HDF file corresponding to single image data.
This method is used to fetch a predefined group from an HDF container, where the group is identified by a constant key related to single image data. The function provides a static interface allowing invocation without requiring an instance of the class.
- Parameters:
handle – The HDF file handle from which the group should be retrieved.
- Returns:
The group corresponding to single image data, retrieved based on the defined SINGLE_IMAGE_ROOT_POSIX.
- Raises:
Appropriate exceptions may be raised by the underlying HDF.get_group() method, depending on the implementation and the provided handle or key.
- get_protected_metadata_subgroup(handle: File, image_name: str) Group[source]#
Retrieve the protected-metadata subgroup for the given image.
- Parameters:
handle (File)
image_name (str)
- Return type:
Group
- get_public_metadata_subgroup(handle: File, image_name: str) Group[source]#
Retrieve the public-metadata subgroup for the given image.
- Parameters:
handle (File)
image_name (str)
- Return type:
Group
- static get_uncompressed_sizes_for_group(group: Group) tuple[dict[str, int], int][source]#
Recursively collect the uncompressed (logical) sizes of SWMR-compatible datasets.
This function walks the provided HDF5 group and inspects every dataset without reading any data. For each dataset that is compatible with SWMR writing rules (i.e., chunked layout and no variable-length data types), it computes the uncompressed size in bytes as: dtype.itemsize * number_of_elements.
Notes:
This works regardless of whether datasets are stored compressed on disk; the reported size is the logical size when uncompressed in memory.
Variable-length strings (and datasets containing variable-length fields) are excluded because they are not SWMR-write friendly and their uncompressed size cannot be determined from metadata alone.
The operation is safe under SWMR: it only reads object metadata, creates no new objects, and does not modify the file.
- Parameters:
group (Group) – The root h5py.Group to traverse.
- Returns:
A tuple (sizes, total_bytes), where sizes is a dict mapping absolute dataset paths (e.g., ‘/grp/ds’) to uncompressed size in bytes, and total_bytes is the sum of all values in sizes.
- Return type:
tuple[dict[str, int], int]
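The size arithmetic described above (dtype.itemsize times the element count) can be checked on plain in-memory numpy arrays; this is a sketch of the calculation, not the method itself, and the `/grp/...` paths are illustrative:

```python
import numpy as np

def uncompressed_size_bytes(arr: np.ndarray) -> int:
    # Logical (in-memory) size: bytes per element times element count,
    # independent of any on-disk compression.
    return arr.dtype.itemsize * arr.size

a = np.zeros((100, 50), dtype=np.float64)   # 8 bytes per element
b = np.zeros(10, dtype=np.int32)            # 4 bytes per element
sizes = {'/grp/a': uncompressed_size_bytes(a), '/grp/b': uncompressed_size_bytes(b)}
total_bytes = sum(sizes.values())
# sizes == {'/grp/a': 40000, '/grp/b': 40}; total_bytes == 40040
```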
- static load_frame(group: Group, *, require_swmr: bool = False) DataFrame[source]#
Load a pandas DataFrame from HDF5 storage.
- Parameters:
group (Group) – HDF5 group containing the DataFrame data.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Returns:
Reconstructed pandas DataFrame.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If data validation fails.
- Return type:
DataFrame
- static load_series(group: Group, *, dataset: str = 'values', index_dataset: str = 'index', require_swmr: bool = False) Series[source]#
Load a pandas Series from HDF5 storage.
Reconstructs the Series with original name, index names, order, and missingness. Respects logical length from attributes.
- Parameters:
group (Group) – HDF5 group containing the Series data.
dataset (str) – Name of the values dataset.
index_dataset (str) – Name of the index dataset.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Returns:
Reconstructed pandas Series.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If data validation fails.
- Return type:
Series
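"Respects logical length from attributes" means a preallocated dataset is usually larger than its valid contents, with an attribute recording how many leading elements are real (the preallocation docs below say the layout "sets len=0"). A minimal sketch of that read-back convention, using numpy arrays in place of HDF5 datasets:

```python
import numpy as np

# A preallocated dataset has more capacity than valid data; a 'len'
# attribute records how many leading elements have been written.
values = np.zeros(100, dtype=np.float64)   # preallocated capacity
values[:3] = [1.5, 2.5, 3.5]               # only 3 elements written so far
attrs = {'len': 3}                          # logical length stored as an attribute

logical = values[:attrs['len']]             # what a loader would reconstruct
# logical -> array([1.5, 2.5, 3.5]), not the full 100-element buffer
```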
- static preallocate_frame_layout(group: Group, dataframe: DataFrame, *, chunks: tuple[int, ...] = (25,), compression: str = 'gzip', preallocate: int = 100, string_fixed_length: int = 100, require_swmr: bool = False) None[source]#
Preallocate HDF5 layout for a pandas DataFrame without writing data.
Creates layout for shared index and column series using Series preallocation.
- Parameters:
group (Group) – HDF5 group to write to.
dataframe (DataFrame) – pandas DataFrame to create layout for.
chunks (tuple[int, ...]) – Chunk shape for new datasets.
compression (str) – Compression algorithm.
preallocate (int) – Initial allocation size.
string_fixed_length (int) – Character length for fixed-length strings.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled. If group.file.swmr_mode is True and datasets don’t exist.
ValueError – If DataFrame validation fails.
- Return type:
None
- static preallocate_series_layout(group: Group, series: Series, *, dataset: str = 'values', index_dataset: str = 'index', chunks: tuple[int, ...] = (25,), compression: str = 'gzip', preallocate: int = 100, string_fixed_length: int = 100) None[source]#
Preallocate HDF5 layout for a pandas Series without writing data.
Creates resizable, chunked, compressed datasets with initial shape (preallocate,) and maxshape (None,). Initializes masks to zeros and sets len=0.
- Parameters:
group (Group) – HDF5 group to write to.
series (Series) – pandas Series to create layout for (used for schema).
dataset (str) – Name for the values dataset.
index_dataset (str) – Name for the index dataset.
chunks (tuple[int, ...]) – Chunk shape for new datasets.
compression (str) – Compression algorithm.
preallocate (int) – Initial allocation size.
string_fixed_length (int) – Character length for fixed-length strings.
- Raises:
RuntimeError – If group.file.swmr_mode is True and the datasets don’t exist.
ValueError – If series validation fails.
- Return type:
None
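The `string_fixed_length` parameter exists because fixed-length byte strings are SWMR-friendly while variable-length strings are not (see the size-reporting notes above). A sketch of the encoding trade-off with plain numpy; the column values are made up for illustration:

```python
import numpy as np

string_fixed_length = 10  # analogous to the string_fixed_length parameter
labels = ['colony_a', 'a-very-long-colony-name']

# Fixed-length byte strings (dtype 'S<n>') have a size known from metadata
# alone; values longer than n bytes are silently truncated by numpy.
fixed = np.array([s.encode('utf-8') for s in labels], dtype=f'S{string_fixed_length}')
# fixed[1] == b'a-very-lon'  (truncated to 10 bytes)

decoded = [b.decode('utf-8') for b in fixed]
```

The truncation is the practical cost of this layout: choose `string_fixed_length` generously enough for the longest expected value.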
- safe_writer() File[source]#
Returns a writer object that provides safe and controlled write access to an HDF5 file at the specified filepath or creates it if it doesn’t exist. Ensures that the file uses the ‘latest’ version of the HDF5 library for compatibility and performance.
Handles HDF5 file locking conflicts by attempting to clear consistency flags and retrying file opening with exponential backoff.
- static save_array2hdf5(group, array, name, **kwargs)[source]#
Saves a given numpy array to an HDF5 group. If a dataset with the specified name already exists in the group, it checks if the shapes match. If the shapes match, it updates the existing dataset; otherwise, it removes the existing dataset and creates a new one with the specified name. If a dataset with the given name doesn’t exist, it creates a new dataset.
- Parameters:
group (h5py.Group) – The HDF5 group in which the dataset will be saved.
array (numpy.ndarray) – The data array to be stored in the dataset.
name (str) – The name of the dataset within the group.
**kwargs (dict) – Additional keyword arguments to pass when creating a new dataset.
- static save_frame_append(group: Group, dataframe: DataFrame, *, require_swmr: bool = True) None[source]#
Append a pandas DataFrame to existing HDF5 datasets.
- Parameters:
group (Group) – HDF5 group containing the existing datasets.
dataframe (DataFrame) – pandas DataFrame to append.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If validation fails or schema mismatch.
- Return type:
None
- static save_frame_new(group: Group, dataframe: DataFrame, *, chunks: tuple[int, ...] = (25,), compression: str = 'gzip', preallocate: int = 100, string_fixed_length: int = 100, require_swmr: bool = False) None[source]#
Create datasets and write a pandas DataFrame to HDF5.
- Parameters:
group (Group) – HDF5 group to write to.
dataframe (DataFrame) – pandas DataFrame to persist.
chunks (tuple[int, ...]) – Chunk shape for new datasets.
compression (str) – Compression algorithm for new datasets.
preallocate (int) – Initial allocation size for new datasets.
string_fixed_length (int) – Character length for fixed-length strings.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If DataFrame validation fails.
- Return type:
None
- static save_frame_update(group: Group, dataframe: DataFrame, *, start: int = 0, require_swmr: bool = True) None[source]#
Update a pandas DataFrame in HDF5 at specified position.
- Parameters:
group (Group) – HDF5 group containing the existing datasets.
dataframe (DataFrame) – pandas DataFrame with the new values.
start (int) – Row position at which to begin overwriting.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If validation fails or schema mismatch.
- Return type:
None
- static save_series_append(group: Group, series: Series, *, dataset: str = 'values', index_dataset: str = 'index', require_swmr: bool = True) None[source]#
Append a pandas Series to existing HDF5 datasets.
Appends at the end using current logical length. Resizes datasets if needed and updates logical length.
- Parameters:
group (Group) – HDF5 group containing the existing datasets.
series (Series) – pandas Series to append.
dataset (str) – Name of the values dataset.
index_dataset (str) – Name of the index dataset.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If validation fails or schema mismatch.
- Return type:
None
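The append semantics above ("appends at the end using current logical length, resizes datasets if needed, and updates logical length") can be sketched with a numpy array standing in for the HDF5 dataset; the doubling growth policy here is an assumption for illustration, not necessarily what the library does:

```python
import numpy as np

def append_logical(values: np.ndarray, attrs: dict, new: np.ndarray) -> np.ndarray:
    """Sketch: write at the current logical length, grow capacity if
    needed, then advance the stored length."""
    start, end = attrs['len'], attrs['len'] + len(new)
    if end > len(values):                          # "resizes datasets if needed"
        values = np.resize(values, max(end, 2 * len(values)))
    values[start:end] = new                        # append after existing data
    attrs['len'] = end                             # "updates logical length"
    return values

values = np.zeros(4)
attrs = {'len': 2}
values[:2] = [1.0, 2.0]
values = append_logical(values, attrs, np.array([3.0, 4.0, 5.0]))
# attrs['len'] == 5; values[:5] -> [1., 2., 3., 4., 5.]
```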
- static save_series_new(group: Group, series: Series, *, dataset: str = 'values', index_dataset: str = 'index', chunks: tuple[int, ...] = (25,), compression: str = 'gzip', preallocate: int = 100, string_fixed_length: int = 100, require_swmr: bool = False) None[source]#
Create datasets and write a pandas Series to HDF5.
Creates new datasets or reuses existing preallocated layout. Writes the first len(series) elements and sets logical length.
- Parameters:
group (Group) – HDF5 group to write to.
series (Series) – pandas Series to persist.
dataset (str) – Name for the values dataset.
index_dataset (str) – Name for the index dataset.
chunks (tuple[int, ...]) – Chunk shape for new datasets.
compression (str) – Compression algorithm for new datasets.
preallocate (int) – Initial allocation size for new datasets.
string_fixed_length (int) – Character length for fixed-length strings.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If series validation fails.
- Return type:
None
- static save_series_update(group: Group, series: Series, *, start: int = 0, dataset: str = 'values', index_dataset: str = 'index', require_swmr: bool = True) None[source]#
Update a pandas Series in HDF5 at specified position.
Overwrites [start:start+len(series)] and updates logical length to the largest contiguous written extent.
- Parameters:
group (Group) – HDF5 group containing the existing datasets.
series (Series) – pandas Series with the new values.
start (int) – Element position at which to begin overwriting.
dataset (str) – Name of the values dataset.
index_dataset (str) – Name of the index dataset.
require_swmr (bool) – If True, assert SWMR mode is enabled.
- Raises:
RuntimeError – If require_swmr=True and SWMR mode not enabled.
ValueError – If validation fails or schema mismatch.
- Return type:
None
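The update rule above (overwrite `[start:start+len(series)]`, then set the logical length to the largest contiguous written extent) reduces to a max over the old length and the end of the written window. A minimal numpy sketch of that bookkeeping:

```python
import numpy as np

def update_logical(values: np.ndarray, attrs: dict, new: np.ndarray, start: int = 0) -> None:
    """Sketch of the update semantics described above."""
    end = start + len(new)
    values[start:end] = new                  # overwrite in place
    # Logical length becomes the largest contiguous written extent.
    attrs['len'] = max(attrs['len'], end)

values = np.zeros(10)
attrs = {'len': 4}
values[:4] = [1.0, 2.0, 3.0, 4.0]

update_logical(values, attrs, np.array([20.0, 30.0]), start=1)
# overwrites inside existing data: attrs['len'] stays 4

update_logical(values, attrs, np.array([50.0, 60.0]), start=3)
# window [3:5] extends past the old length: attrs['len'] becomes 5
```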
- strict_writer() File[source]#
Provides access to an HDF5 file in read/write mode using the h5py library. This property is used to obtain an h5py.File object configured with the latest library version.
Note
If using SWMR mode, don’t forget to enable it:

    hdf = HDF(filepath)
    with hdf.writer as writer:
        writer.swmr_mode = True
        # rest of your code
- swmr_writer() File[source]#
Returns a writer object that provides safe SWMR-compatible write access to an HDF5 file. Creates the file if it doesn’t exist and enables SWMR mode properly.
This method ensures proper SWMR mode initialization by creating the file with the correct settings from the start, avoiding cache conflicts that occur when trying to enable SWMR mode after opening.
- EXT = {'.h5', '.hdf', '.hdf5', '.he5'}#
- IMAGE_MEASUREMENT_SUBGROUP_KEY = 'measurements'#
- IMAGE_SET_DATA_POSIX = 'data'#
- IMAGE_SET_ROOT_POSIX = '/phenotypic/image_sets/'#
- IMAGE_STATUS_SUBGROUP_KEY = 'status'#
- PROTECTED_METADATA_SUBGROUP_KEY = 'protected_metadata'#
- PUBLIC_METADATA_SUBGROUP_KEY = 'public_metadata'#
- SINGLE_IMAGE_ROOT_POSIX = '/phenotypic/images/'#
- phenotypic.tools.is_binary_mask(arr: numpy.ndarray)[source]#
Check whether the given array is a valid binary mask.
- Parameters:
arr (numpy.ndarray)
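The exact validation rules of `is_binary_mask` are not documented here; a plausible sketch (the library may accept or reject additional cases) treats boolean arrays, or numeric arrays containing only 0 and 1, as binary masks:

```python
import numpy as np

def is_binary_mask_sketch(arr: np.ndarray) -> bool:
    """Hedged sketch of a binary-mask check; the library's rules may differ."""
    if arr.dtype == bool:
        return True                              # boolean arrays are binary by construction
    # Numeric arrays qualify only if every value is 0 or 1.
    return bool(np.isin(arr, (0, 1)).all())

is_binary_mask_sketch(np.array([[0, 1], [1, 0]], dtype=np.uint8))   # True
is_binary_mask_sketch(np.array([[0, 255]], dtype=np.uint8))         # False
```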
- phenotypic.tools.timed_execution(func)[source]#
Decorator to measure and print the execution time of a function.
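A decorator of this kind is conventionally a thin wrapper around a monotonic timer; this sketch shows the standard pattern (the packaged `timed_execution` may format or route its output differently):

```python
import functools
import time

def timed_execution_sketch(func):
    """Sketch of a timing decorator in the style of timed_execution."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - t0:.4f}s")
        return result
    return wrapper

@timed_execution_sketch
def add(a, b):
    return a + b

add(1, 2)  # prints the elapsed time, then returns 3
```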
Submodules#
- phenotypic.tools.colourspaces_
- phenotypic.tools.constants_
BBOX: append_rst_to_doc(), category(), get_headers(), get_labels(), rst_table(), CATEGORY, CENTER_CC, CENTER_RR, MAX_CC, MAX_RR, MIN_CC, MIN_RR (plus inherited str methods)
GRID: append_rst_to_doc(), category(), get_headers(), get_labels(), rst_table(), CATEGORY, COL_INTERVAL_END, COL_INTERVAL_START, COL_NUM, ROW_INTERVAL_END, ROW_INTERVAL_START, ROW_NUM, SECTION_NUM (plus inherited str methods)
GRID_LINREG_STATS_EXTRACTOR
IMAGE_MODE: is_ambiguous(), is_array(), is_matrix(), is_none(), AMBIGUOUS_FORMATS, BGR, BGRA, CHANNELS_DEFAULT, DEFAULT_SCHEMA, GRAYSCALE, GRAYSCALE_SINGLE_CHANNEL, HSV, LINEAR_RGB, MATRIX_FORMATS, NONE, RGB, RGBA, RGBA_OR_BGRA, RGB_OR_BGR, SUPPORTED_FORMATS
IMAGE_TYPES
IO
METADATA: append_rst_to_doc(), category(), get_headers(), get_labels(), rst_table(), BIT_DEPTH, CATEGORY, IMAGE_NAME, IMAGE_TYPE, IMFORMAT, PARENT_IMAGE_NAME, PARENT_UUID, SUFFIX, UUID (plus inherited str methods)
MPL
OBJECT
PIPE_STATUS: append_rst_to_doc(), category(), get_headers(), get_labels(), rst_table(), CATEGORY, MEASURED, PROCESSED (plus inherited str methods)
- phenotypic.tools.exceptions_
ArrayKeyValueShapeMismatchError, DataIntegrityError, EmptyImageError, GridImageInputError, IllegalAssignmentError, IllegalElementAssignmentError, ImageOperationError, ImmutableComponentError, InputShapeMismatchError, InterfaceError, InvalidHsvSchemaError, InvalidMapValueError, InvalidMaskScalarValueError, InvalidMaskValueError, InvalidShapeError, MetadataKeySpacesError, MetadataKeyValueError, MetadataValueNonScalarError, NoArrayError, NoComponentError, NoImageDataError, NoObjectsError, NoOutputError, ObjectNotFoundError, OperationFailedError, OperationIntegrityError, OutputValueError, PhenoTypicError, UUIDReassignmentError, UnknownError, UnsupportedFileTypeError, UnsupportedImageFormat, UuidAssignmentError
- phenotypic.tools.funcs_
- phenotypic.tools.hdf_
HDF: filepath, name, mode, root_posix, home_posix, set_data_posix, SINGLE_IMAGE_ROOT_POSIX, IMAGE_SET_ROOT_POSIX, IMAGE_SET_DATA_POSIX, EXT, IMAGE_MEASUREMENT_SUBGROUP_KEY, IMAGE_STATUS_SUBGROUP_KEY, PROTECTED_METADATA_SUBGROUP_KEY, PUBLIC_METADATA_SUBGROUP_KEY, __init__(), assert_swmr_on(), close_handle(), get_data_group(), get_group(), get_home(), get_image_group(), get_image_measurement_subgroup(), get_protected_metadata_subgroup(), get_public_metadata_subgroup(), get_root_group(), get_status_subgroup(), get_uncompressed_sizes_for_group(), load_frame(), load_series(), preallocate_frame_layout(), preallocate_series_layout(), reader(), safe_writer(), save_array2hdf5(), save_frame_append(), save_frame_new(), save_frame_update(), save_series_append(), save_series_new(), save_series_update(), strict_writer(), swmr_reader(), swmr_writer()