phenotypic.analysis.EdgeCorrector
- class phenotypic.analysis.EdgeCorrector(on: str, groupby: list[str], time_label: str = 'Metadata_Time', nrows: int = 8, ncols: int = 12, top_n: int = 3, pvalue: float = 0.05, connectivity: int = 4, agg_func: str = 'mean', num_workers: int = 1)
Bases: SetAnalyzer
Analyzer for detecting and correcting edge effects in colony detection.
This class identifies colonies at grid edges (missing orthogonal neighbors) and caps their measurement values to prevent edge effects in growth assays. Edge colonies often show artificially inflated measurements due to lack of competition for resources.
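The notion of an edge colony (one missing a neighbor) can be illustrated with a small, library-independent sketch. It assumes section numbers are 0-indexed and laid out row-major across the grid, which this page does not state; the helper name `edge_sections` is hypothetical and not part of phenotypic.

```python
def edge_sections(occupied, nrows=8, ncols=12, connectivity=4):
    """Return the occupied section numbers that lack at least one neighbor.

    Assumes 0-indexed, row-major section numbering (an assumption made
    for this sketch, not something the library documentation confirms).
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")

    # Orthogonal neighbors, plus diagonals when connectivity is 8.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    occupied = set(occupied)
    edges = set()
    for num in occupied:
        row, col = divmod(num, ncols)
        for dr, dc in offsets:
            r, c = row + dr, col + dc
            inside = 0 <= r < nrows and 0 <= c < ncols
            # A neighbor is missing if it falls off the grid or is unoccupied.
            if not inside or (r * ncols + c) not in occupied:
                edges.add(num)
                break
    return edges


# A fully occupied 8 x 12 plate: only the outer ring counts as edge.
border = edge_sections(range(8 * 12))

# Remove one interior colony (section 13) and compare both connectivities.
with_gap = set(range(8 * 12)) - {13}
edges_4 = edge_sections(with_gap, connectivity=4)
edges_8 = edge_sections(with_gap, connectivity=8)
```

On a full plate the two connectivity modes select the same outer ring; they differ only when a position inside the grid is empty, because 8-connectivity also flags the diagonal neighbors of the gap.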
Category: EDGE_CORRECTION
Name – Description
CorrectedCap – The carrying capacity for the target measurement
Methods
__init__ – Initializes the class with specified parameters to configure the state of the object.
analyze – Analyze and apply edge correction to grid-based colony measurements.
results – Return the corrected measurement DataFrame.
show – Visualize edge correction results.
- __init__(on: str, groupby: list[str], time_label: str = 'Metadata_Time', nrows: int = 8, ncols: int = 12, top_n: int = 3, pvalue: float = 0.05, connectivity: int = 4, agg_func: str = 'mean', num_workers: int = 1)
Initializes the corrector with the measurement column, grouping columns, grid dimensions, connectivity mode, and aggregation settings. Arguments are validated on construction.
- Parameters:
on (str) – The dataset column to analyze or process.
groupby (list[str]) – List of column names for grouping the data.
time_label (str) – Specific time reference column, defaulting to “Metadata_Time”.
nrows (int) – Number of rows in the plate grid. Must be positive. Defaults to 8.
ncols (int) – Number of columns in the plate grid. Must be positive. Defaults to 12.
top_n (int) – Number of top results to analyze. Must be a positive integer.
pvalue (float) – Significance threshold for the statistical test comparing surrounded and edge colonies. Defaults to 0.05. Set to 0.0 to apply the correction to all plates.
connectivity (int) – The connectivity mode to use. Must be either 4 or 8.
agg_func (str) – Aggregation function to apply, defaulting to ‘mean’.
num_workers (int) – Number of workers for parallel processing.
- Raises:
ValueError – If connectivity is not 4 or 8.
ValueError – If nrows or ncols are not positive integers.
ValueError – If top_n is not a positive integer.
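The validation rules listed under Raises can be sketched as a standalone function. This is only an illustration of the documented checks, not the library's code; the name `validate_edge_params` and the error messages are hypothetical.

```python
def validate_edge_params(nrows, ncols, top_n, connectivity):
    """Raise ValueError for the invalid argument combinations documented above."""
    if connectivity not in (4, 8):
        raise ValueError(f"connectivity must be 4 or 8, got {connectivity!r}")

    for name, value in (("nrows", nrows), ("ncols", ncols), ("top_n", top_n)):
        # bool is a subclass of int, so it is rejected explicitly here.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")


# The documented defaults pass silently.
validate_edge_params(nrows=8, ncols=12, top_n=3, connectivity=4)

# An unsupported connectivity is rejected.
try:
    validate_edge_params(nrows=8, ncols=12, top_n=3, connectivity=6)
except ValueError as exc:
    message = str(exc)
```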
- analyze(data: pandas.DataFrame) → pandas.DataFrame
Analyze and apply edge correction to grid-based colony measurements.
This method processes the input DataFrame by grouping according to specified columns and applying edge correction to each group independently. Edge colonies (those missing orthogonal neighbors) have their measurements capped to prevent artificially inflated values.
- Parameters:
data (pandas.DataFrame) – DataFrame containing grid section numbers (GRID.SECTION_NUM) and measurement data. Must include all columns specified in self.groupby and self.on.
- Returns:
DataFrame with corrected measurement values. Original structure is preserved with only the measurement column modified for edge-affected rows.
- Raises:
KeyError – If required columns are missing from input DataFrame.
ValueError – If data is empty or malformed.
- Return type:
pandas.DataFrame
Examples
Applying edge correction to a 96-well plate dataset
>>> import pandas as pd
>>> import numpy as np
>>> from phenotypic.analysis import EdgeCorrector
>>> from phenotypic.tools.constants_ import GRID
>>>
>>> # Create sample grid data with measurements
>>> np.random.seed(42)
>>> data = pd.DataFrame({
...     'ImageName': ['img1'] * 96,
...     GRID.SECTION_NUM: range(96),
...     'Area': np.random.uniform(100, 500, 96)
... })
>>>
>>> # Apply edge correction
>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['ImageName'],
...     nrows=8,
...     ncols=12,
...     top_n=10
... )
>>> corrected = corrector.analyze(data)
>>>
>>> # Check results
>>> results = corrector.results()
Notes
Stores original data in self._original_data for comparison
Stores corrected data in self._latest_measurements for retrieval
Groups are processed independently with their own thresholds
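The per-group capping described above can be illustrated without the library. The sketch below assumes that each group's cap is the mean of the `top_n` largest surrounded (non-edge) values and that only edge values above the cap are lowered. This page does not spell out the actual threshold rule, so treat that rule, the helper name `cap_edge_values`, and the `'section'` key as hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cap_edge_values(rows, on, groupby, edge_sections, top_n=3):
    """Cap edge measurements per group; returns new rows in the input order.

    rows: list of dicts holding a 'section' key, the `on` key, and the
    groupby keys. The cap rule (mean of the top_n surrounded values) is
    an assumption made for this sketch.
    """
    # Collect surrounded (non-edge) values for each group.
    inner = defaultdict(list)
    for row in rows:
        if row["section"] not in edge_sections:
            inner[tuple(row[k] for k in groupby)].append(row[on])

    # Each group gets its own cap, computed independently of other groups.
    caps = {
        key: mean(sorted(values, reverse=True)[:top_n])
        for key, values in inner.items()
    }

    corrected = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        key = tuple(row[k] for k in groupby)
        if row["section"] in edge_sections and key in caps:
            new_row[on] = min(row[on], caps[key])
        corrected.append(new_row)
    return corrected


rows = [
    {"ImageName": "img1", "section": 0, "Area": 900},  # edge, inflated
    {"ImageName": "img1", "section": 1, "Area": 400},
    {"ImageName": "img1", "section": 2, "Area": 500},
    {"ImageName": "img1", "section": 3, "Area": 300},
    {"ImageName": "img2", "section": 0, "Area": 100},  # edge, below cap
    {"ImageName": "img2", "section": 1, "Area": 200},
    {"ImageName": "img2", "section": 2, "Area": 300},
]
corrected = cap_edge_values(
    rows, on="Area", groupby=["ImageName"], edge_sections={0}, top_n=2
)
```

Here the edge value in `img1` is lowered to that group's cap, while the edge value in `img2` already sits below its cap and is left unchanged; surrounded rows are never modified.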
- show(figsize: tuple[int, int] | None = None, max_groups: int = 20, collapsed: bool = True, criteria: dict[str, any] | None = None, **kwargs) → tuple[Figure, matplotlib.axes.Axes]
Visualize edge correction results.
Displays the distribution of measurements for the last time point, highlighting surrounded vs. edge colonies and the calculated correction threshold.
- Parameters:
figsize (tuple[int, int] | None) – Figure size (width, height).
max_groups (int) – Maximum number of groups to display.
collapsed (bool) – If True, show groups stacked vertically.
**kwargs – Additional matplotlib parameters to customize the plot. Common options include:
  - dpi: Figure resolution (default 100)
  - facecolor: Figure background color
  - edgecolor: Figure edge color
  - grid_alpha: Alpha value for grid lines
  - legend_loc: Legend location (default ‘best’)
  - legend_fontsize: Font size for legend (default 8 or 9)
  - marker_alpha: Alpha value for scatter plot markers
  - line_width: Line width for box plots and fence lines
- Returns:
Tuple of (Figure, Axes).
- Return type:
tuple[Figure, matplotlib.axes.Axes]
- results() → pandas.DataFrame
Return the corrected measurement DataFrame.
Returns the DataFrame with edge-corrected measurements from the most recent call to analyze(). This allows retrieval of results after processing.
- Returns:
DataFrame with corrected measurements. If analyze() has not been called, returns an empty DataFrame.
- Return type:
pandas.DataFrame
Examples
Retrieving corrected measurements after analysis
>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['ImageName']
... )
>>> corrected = corrector.analyze(data)
>>> results = corrector.results()  # Same as corrected
>>> assert results.equals(corrected)
Notes
Returns the DataFrame stored in self._latest_measurements
Contains the same structure as input but with corrected values
Use this method to retrieve results after calling analyze()