phenotypic.analysis.EdgeCorrector

class phenotypic.analysis.EdgeCorrector(on: str, groupby: list[str], time_label: str = 'Metadata_Time', nrows: int = 8, ncols: int = 12, top_n: int = 3, pvalue: float = 0.05, connectivity: int = 4, agg_func: str = 'mean', num_workers: int = 1)[source]

Bases: SetAnalyzer

Analyzer for detecting and correcting edge effects in arrayed colony growth.

This class identifies colonies at grid edges (missing orthogonal neighbors) and caps their measurement values to prevent edge effects in high-throughput phenotyping assays. Edge colonies often show artificially inflated measurements (larger areas, higher color intensity) due to lack of competition for resources from missing neighbors. The corrector uses permutation testing to determine if edge and interior colonies are statistically different before applying correction.

Intuition: In plate-based assays (96-well, 384-well), colonies at grid edges experience fundamentally different growth conditions: they lack orthogonal neighbors that would otherwise compete for nutrients and space. This causes edge colonies to appear larger/brighter than interior colonies under identical conditions, biasing downstream analyses. EdgeCorrector detects this asymmetry and caps measurements to a threshold derived from top interior colonies, preventing this systematic bias.
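The distinction between interior and edge positions can be sketched with a small NumPy example. This is an illustrative sketch only, not the library's implementation: the helper name `interior_mask` and the boolean `present` grid are assumptions introduced here. A position counts as interior when every neighbor in the chosen connectivity pattern holds a detected colony.

```python
import numpy as np

def interior_mask(present: np.ndarray, connectivity: int = 4) -> np.ndarray:
    """Mark grid positions whose neighbors are all present (hypothetical helper).

    `present` is an (nrows, ncols) boolean array, True where a colony was detected.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    # Pad with False so positions on the plate border always miss a neighbor.
    padded = np.pad(present, 1, constant_values=False)
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    nrows, ncols = present.shape
    surrounded = present.copy()
    for dr, dc in offsets:
        # Shifted view of the padded grid: the neighbor at offset (dr, dc).
        surrounded &= padded[1 + dr : 1 + dr + nrows, 1 + dc : 1 + dc + ncols]
    return surrounded

present = np.ones((8, 12), dtype=bool)  # full 96-well layout
present[3, 5] = False                   # one empty well inside the plate
interior = interior_mask(present)
edge = present & ~interior              # detected colonies missing a neighbor
```

In this sketch an empty well turns its orthogonal neighbors into edge colonies, which is why many empty or dead wells can leave too few interior colonies to work with.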

Use cases:

High-throughput phenotyping on standard plate layouts (8x12, 16x24, etc.)

Growth assays where colony size/intensity is a fitness proxy

Comparing genotypes across plates with multiple replicates per condition

Any analysis where spatial position should not correlate with phenotype

Caveats:

Requires multiple interior colonies to establish a reliable threshold

Edge correction assumes interior and edge colonies should have similar distributions; this may not hold in some experimental designs

If too many wells are empty or dead, detection of surrounded (interior) positions may fail

Permutation testing requires adequate sample sizes for statistical power

All measurements (not just edge colonies) are capped when correction is applied

Attributes:
nrows (int): Number of rows in the grid layout.
ncols (int): Number of columns in the grid layout.
top_n (int): Number of top-valued interior colonies to use for threshold calculation.
connectivity (int): Neighbor pattern: 4 (orthogonal) or 8 (with diagonals).
time_label (str): Column name containing time point information.
pvalue (float): P-value threshold for permutation test (0.0 disables test).
on (str): Name of measurement column to analyze and correct.
groupby (list[str]): Column names for grouping data by experiment/plate/condition.

Category: **EDGE_CORRECTION**
Name	Description
`Cap`	The carrying capacity for the target measurement
`NewVal`	The new value of the target measurement

Methods

`__init__`	Initialize EdgeCorrector with grid layout and correction parameters.
`analyze`	Analyze and apply edge correction to grid-based colony measurements.
`dash`	Interactive Plotly visualization of analysis results.
`results`	Return the corrected measurement DataFrame from the last analyze() call.
`show`	Visualize edge correction results with interior/edge colony comparisons.

Parameters:

on (str)
groupby (list[str])
time_label (str)
nrows (int)
ncols (int)
top_n (int)
pvalue (float)
connectivity (int)
agg_func (str)
num_workers (int)

__init__(on: str, groupby: list[str], time_label: str = 'Metadata_Time', nrows: int = 8, ncols: int = 12, top_n: int = 3, pvalue: float = 0.05, connectivity: int = 4, agg_func: str = 'mean', num_workers: int = 1)[source]

Initialize EdgeCorrector with grid layout and correction parameters.

Parameters:

on (str) – Name of the measurement column to analyze (e.g., ‘Area’, ‘Intensity’). Corrected values will be placed in a new column EDGE_CORRECTION.NEW_VAL-{on}. Original column is preserved unchanged.
groupby (list[str]) – Column names for grouping data independently (e.g., [‘ImageName’, ‘Condition’]). Each group gets its own threshold calculation.
time_label (str, optional) – Column name containing time point information. Defaults to “Metadata_Time”. The maximum time point per group is used to identify interior vs. edge colonies.
nrows (int, optional) – Number of rows in the grid layout. Defaults to 8 (standard 96-well plate). Must be positive. Affects edge detection logic.
ncols (int, optional) – Number of columns in the grid layout. Defaults to 12 (standard 96-well plate). Must be positive. Affects edge detection logic.
top_n (int, optional) – Number of top-valued interior colonies to use for threshold calculation. Defaults to 3. The threshold is the mean of the top_n interior colonies; larger values give more stable thresholds but may miss subtle edge effects.
pvalue (float, optional) – P-value threshold for the permutation test comparing interior vs. edge distributions. Defaults to 0.05. Set to 0.0 to disable statistical testing and apply correction to all groups. The test is run with scipy.stats.permutation_test using 1000 resamples.
connectivity (int, optional) – Neighbor pattern for interior cell detection. Defaults to 4 (orthogonal: North, South, East, West). Set to 8 to include diagonal neighbors. Affects how strictly “surrounded” is defined.
agg_func (str, optional) – Aggregation function for multiple measurements per section (well). Defaults to “mean”. See pandas.DataFrame.agg for options.
num_workers (int, optional) – Number of parallel workers for group processing. Defaults to 1 (serial). Use -1 for all CPU cores via joblib.Parallel.

Raises:

ValueError – If connectivity is not 4 or 8.
ValueError – If nrows or ncols are not positive integers.
ValueError – If top_n is not a positive integer.

Examples

Basic initialization with 96-well plate defaults:

>>> from phenotypic.analysis import EdgeCorrector
>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['ImageName'],
...     top_n=3,
...     pvalue=0.05
... )
>>> # nrows=8, ncols=12 are defaults for 96-well format

Custom grid layout (384-well format, 16x24):

>>> corrector = EdgeCorrector(
...     on='ColonyIntensity',
...     groupby=['Plate', 'Condition'],
...     nrows=16,
...     ncols=24,
...     top_n=3,
...     connectivity=8,  # Include diagonal neighbors
...     num_workers=4
... )

Aggressive correction (no statistical test):

>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['ImageName'],
...     pvalue=0.0,  # Apply to all groups regardless of stats
...     top_n=1  # Use single top value as threshold
... )

analyze(data: pandas.DataFrame) → pandas.DataFrame[source]

Analyze and apply edge correction to grid-based colony measurements.

This method processes the input DataFrame by grouping according to specified columns and applying edge correction to each group independently. For each group, it identifies edge colonies (those missing orthogonal neighbors at the final time point), compares their distributions to interior colonies via permutation test, and caps all measurements to a threshold derived from top interior colonies.

Edge correction assumes that interior and edge colonies under identical conditions should have similar phenotypic distributions. When they differ significantly (p < pvalue threshold), measurements are capped to prevent edge-driven bias in downstream analyses.
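The capping step described above can be sketched with pandas. This is a minimal sketch under stated assumptions, not the library's code: the function name `cap_to_interior_threshold` and the toy values are hypothetical. It takes the threshold as the mean of the `top_n` largest interior values and clips every measurement to it.

```python
import pandas as pd

def cap_to_interior_threshold(values: pd.Series, interior: pd.Series, top_n: int = 3):
    """Clip all values to the mean of the top_n interior values (hypothetical helper)."""
    # Threshold: mean of the top_n largest measurements among interior colonies.
    threshold = values[interior].nlargest(top_n).mean()
    # Cap every measurement, interior and edge alike, at that threshold.
    return values.clip(upper=threshold), threshold

values = pd.Series([10.0, 12.0, 11.0, 9.0, 20.0, 18.0])
interior = pd.Series([True, True, True, True, False, False])
capped, threshold = cap_to_interior_threshold(values, interior, top_n=3)
```

Here the largest interior value (12.0) is also reduced, which illustrates the caveat that all measurements are capped, not only those of edge colonies.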

Parameters:

data (pd.DataFrame) –

Input DataFrame containing grid measurements. Must include:

GRID.SECTION_NUM (str): Column with well/section indices (0-indexed flattened position: row * ncols + col)
self.on (str): Measurement column to analyze and correct
All columns in self.groupby: For independent group processing
self.time_label (str, optional): Time point column if not all observations are at the same time
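As a small illustration of the flattened section index (row * ncols + col), the sketch below converts an index back to grid coordinates. The helper names are hypothetical and not part of the library. Note that lying on the plate border is only one way to be an edge colony; a colony next to an empty well is also missing a neighbor.

```python
def section_to_row_col(section: int, ncols: int = 12) -> tuple[int, int]:
    """Invert section = row * ncols + col for a 0-indexed, row-major layout."""
    return divmod(section, ncols)

def is_plate_border(section: int, nrows: int = 8, ncols: int = 12) -> bool:
    """True when the position lies on the outermost row or column of the grid."""
    row, col = divmod(section, ncols)
    return row in (0, nrows - 1) or col in (0, ncols - 1)

border_sections = [s for s in range(96) if is_plate_border(s)]
```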

Returns:

Measurements with two new correction columns added:

EdgeCorrection_NewVal-{on}: Capped measurement values (clipped to threshold where edge effect detected)
EdgeCorrection_Cap-{on}: Threshold value used for correction

Original measurement column (self.on) remains unchanged. All other columns preserved from input. One row per well per group.

Return type:

pd.DataFrame

Raises:

KeyError – If required columns (GRID.SECTION_NUM, self.on, or any in self.groupby) are missing.
ValueError – If data is empty or has zero rows.

Notes

Stores original data in self._original_data for later visualization
Stores corrected data in self._latest_measurements for retrieval via results()
Groups are processed independently via joblib.Parallel if num_workers > 1
Aggregation (default: mean) is applied to multiple measurements per well
Edge correction is only applied if permutation test p-value < self.pvalue
If pvalue=0.0, correction is applied to all groups regardless of statistics
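The decision rule in the notes above can be sketched with scipy.stats.permutation_test. This is a sketch under assumptions: the test statistic (difference of means) and the synthetic data are chosen here for illustration, and the library may use a different statistic internally. Only the 1000 resamples and the comparison against the p-value threshold come from the documentation.

```python
import numpy as np
from scipy.stats import permutation_test

def mean_difference(x, y, axis=0):
    # Assumed statistic: difference between group means along the given axis.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

rng = np.random.default_rng(0)
interior = rng.normal(300, 20, size=60)  # synthetic interior measurements
edge = rng.normal(400, 20, size=36)      # synthetic, inflated edge measurements

result = permutation_test(
    (interior, edge),
    mean_difference,
    permutation_type="independent",  # shuffle group membership
    n_resamples=1000,
    alternative="two-sided",
    vectorized=True,
)
# Correct only when the two groups differ significantly.
apply_correction = result.pvalue < 0.05
```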

Examples

Basic edge correction on 96-well data:

>>> import pandas as pd
>>> import numpy as np
>>> from phenotypic.analysis import EdgeCorrector
>>> from phenotypic.tools_.measurement_info_ import GRID
>>> # Create sample 96-well data (8 rows x 12 cols)
>>> np.random.seed(42)
>>> data = pd.DataFrame({
...     'ImageName': ['img1'] * 96,
...     GRID.ROW_MAJOR_IDX: range(96),
...     'Metadata_Time': [1] * 96,
...     'Shape_Area': np.random.uniform(100, 500, 96)
... })
>>> # Edge colonies (row/col 0 or 7/11) have larger areas
>>> edge_idx = [i for i in range(96) if i//12 in (0,7) or i%12 in (0,11)]
>>> data.loc[edge_idx, 'Shape_Area'] *= 1.5
>>> # Apply correction
>>> corrector = EdgeCorrector(
...     on='Shape_Area',
...     groupby=['ImageName'],
...     top_n=5,
...     pvalue=0.05
... )
>>> corrected = corrector.analyze(data)
>>> # New columns created:
>>> # - 'EdgeCorrection_NewVal-Shape_Area': Area values capped at the threshold
>>> # - 'EdgeCorrection_Cap-Shape_Area': Threshold value used
>>> # Original 'Shape_Area' column unchanged

Multi-group edge correction (multiple plates and conditions):

>>> # Data from multiple plates and conditions
>>> data = pd.DataFrame({
...     'Plate': ['P1']*96 + ['P2']*96,
...     'Condition': ['WT']*48 + ['KO']*48 + ['WT']*48 + ['KO']*48,
...     GRID.ROW_MAJOR_IDX: list(range(96))*2,
...     'Metadata_Time': [1]*192,
...     'Area': np.random.uniform(100, 500, 192)
... })  
>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['Plate', 'Condition'],  # 4 independent corrections
...     nrows=8, ncols=12,
...     num_workers=4
... )
>>> corrected = corrector.analyze(data)  
>>> # Each plate-condition combo gets its own threshold

show(figsize: tuple[int, int] | None = None, max_groups: int = 20, collapsed: bool = True, criteria: dict[str, any] | None = None, **kwargs) → tuple[Figure, matplotlib.axes.Axes][source]

Visualize edge correction results with interior/edge colony comparisons.

Displays the distribution of measurements for the last time point per group, highlighting interior (surrounded) vs. edge colonies. Shows the calculated correction threshold and permutation test p-values. Interior colonies are shown in blue, edge colonies in red. Circles indicate measurements passing the threshold, X’s indicate capped measurements.

Parameters:

figsize (tuple[int, int], optional) – Figure size as (width, height) in inches. If None, auto-sized based on number of groups (single-group: 10x6, many groups: 10x max(6, 0.5*ngroups+2)).
max_groups (int, optional) – Maximum number of groups to display. Defaults to 20. If the data has more groups, a warning is printed and only the first max_groups are shown.
collapsed (bool, optional) – If True (default), show all groups stacked vertically on a single axis with y-offsets. If False, create a grid of subplots with one group per subplot.
criteria (dict[str, any], optional) – Filter groups before visualization using column-value criteria (e.g., {‘Plate’: ‘P1’, ‘Condition’: [‘WT’, ‘KO’]}). Filtering uses SetAnalyzer._filter_by with AND logic across criteria.
**kwargs –
Additional matplotlib parameters:
- dpi (int): Figure resolution, passed to plt.subplots()
- facecolor (str): Figure background color
- edgecolor (str): Figure edge color
- legend_fontsize (int): Font size for legend (default 9 for collapsed, 8 for individual)
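The AND logic of the criteria filter can be sketched as follows. This is not the code of SetAnalyzer._filter_by, whose implementation is not shown here; the function `filter_by_criteria` is a hypothetical stand-in that assumes scalar values match by equality and list values match by membership.

```python
import pandas as pd

def filter_by_criteria(df: pd.DataFrame, criteria: dict) -> pd.DataFrame:
    """Keep rows that satisfy every column-value criterion (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in criteria.items():
        if isinstance(wanted, (list, tuple, set)):
            # A collection means "any of these values" for that column.
            mask &= df[column].isin(list(wanted))
        else:
            mask &= df[column] == wanted
    return df[mask]

df = pd.DataFrame({
    "Plate": ["P1", "P1", "P2", "P2"],
    "Condition": ["WT", "KO", "WT", "MUT"],
    "Area": [100, 120, 110, 130],
})
subset = filter_by_criteria(df, {"Plate": "P1", "Condition": ["WT", "KO"]})
```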

Returns:

Tuple of (matplotlib Figure, Axes object(s)):

If collapsed=True: (Figure, single Axes)

If collapsed=False: (Figure, array of Axes)

Return type:

tuple[Figure, plt.Axes]

Raises:

RuntimeError – If analyze() has not been called (no results to display).
ValueError – If criteria filter leaves no matching data.

Notes

Interior colonies are those with all orthogonal neighbors present (4-connectivity)
Edge colonies are detected colonies that are missing at least one orthogonal neighbor
Threshold line (orange) is derived from top interior colonies
P-values are displayed between the interior and edge means (if pvalue != 0)
Permutation test uses 1000 resamples with two-sided alternative
Call analyze() before show()

Examples

Basic visualization of edge correction results:

>>> corrector = EdgeCorrector(on='Area', groupby=['ImageName'])
>>> corrected = corrector.analyze(data)  
>>> fig, ax = corrector.show()  
>>> # Single collapsed plot with all groups stacked vertically

Individual subplots per group:

>>> fig, axes = corrector.show(
...     collapsed=False,
...     figsize=(15, 10)
... )  
>>> # Grid of subplots, max 3 columns

Filtered visualization for specific plate:

>>> fig, ax = corrector.show(
...     criteria={'Plate': 'P1'},
...     max_groups=10,
...     figsize=(12, 8)
... )  

results() → pandas.DataFrame[source]

Return the corrected measurement DataFrame from the last analyze() call.

Retrieves the DataFrame with edge-corrected measurements produced by the most recent call to analyze(). Provides convenient access to results without retaining a local reference.

Returns:

Edge-corrected measurements with the original data plus two new correction columns:

EDGE_CORRECTION.NEW_VAL-{self.on}: Capped measurement values
EDGE_CORRECTION.CORRECTED_CAP-{self.on}: Threshold value used

The original measurement column (self.on) is preserved unchanged. If analyze() has not been called, returns an empty DataFrame.

Return type:

pd.DataFrame

Examples

Retrieving corrected measurements after analysis:

>>> corrector = EdgeCorrector(
...     on='Area',
...     groupby=['ImageName']
... )
>>> corrected = corrector.analyze(data)  
>>> results = corrector.results()  
>>> assert results.equals(corrected)  
>>> # Access corrected values
>>> corrected_areas = results['EdgeCorrection_NewVal-Area']
>>> thresholds = results['EdgeCorrection_Cap-Area']
>>> # Original 'Area' column also available for comparison
>>> original_areas = results['Area']  

Notes

Returns the DataFrame stored in self._latest_measurements
Same as the return value of analyze()
Always use this method rather than direct attribute access

dash(**kwargs)

Interactive Plotly visualization of analysis results.

Subclasses may override this method to provide an interactive Plotly figure equivalent to show().

Raises:

NotImplementedError – Unless overridden by a subclass.
