6. Fitting Logistic Growth Curves#


Background#

In this tutorial we will cover how to fit growth curves on data from PhenoTypic. For growth curves you’ll typically want to use metrics of size from pht.measure.MeasureSize: “Size_Area” or “Size_IntegratedIntensity”. These can also be obtained from pht.measure.MeasureArea or pht.measure.MeasureIntensity respectively.

Size Metric Comparison:

Metric

Pros

Cons

“Shape_Area”

Simple to set up Captures lateral growth

Misses Vertical Growth

“Intensity_IntegratedIntensity”

Captures lateral and vertical growth

Requires color correction if there’s significant color drift or lots of color variation between colonies

Phenotypic uses the following form of the logistic formula:

Logistic Kinetics Model#

\(N(t) = \frac{K } {1 + \frac{K - N0}{N0} e^{-r t}}\)

\(N_t\): population size at time t

\(N_0\): initial population size at time t

\(r\): growth rate

\(K\): Carrying capacity (maximum population size).

from which we can derive:

\(µ_{max} = \frac{Kr}{4}\)

\(µ_{max}\): maximum specific growth rate

To solver for the parameters, we use the following loss function with the scipy linear least squares solver

Loss Function#

\(J(K,N_0,r) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}(f_{K,N0,r}(t^{(i)}) - N_t^{(i)})^2) + \lambda(\frac{dN}{dt}^2 + N_0^2) + \alpha \frac{\lvert K-\max({N_t})\rvert}{N_t}\)

\(\lambda\): regularization term for growth rate and initial population size

\(\alpha\): penalty term for deviations in the carrying capacity relative to the largest measurement

Overview#

The data in this tutorial comes from the {doc} QuickStart tutorial

In this tutorial we use the following:

  • pht.analysis.LogGrowthModel

  • pht.analysis.TukeysOutlierRemover

  • pht.analysis.EdgeCorrector

[ ]:
import phenotypic as pht

meas = pht.data.load_quickstart_meas()
meas.head()

Adding in metadata post-analysis for downstream processing#

The original naming rule for this data was {Time}_{PlateSet}{Media}_{Replicate}

Based on this naming rule, let’s add the remaining metadata. These methods are all from pandas.DataFrame.

Note: This was the original naming rule, but it leads to some awkward splitting. It’s usually better to separate independent variables with underscores and related variables with dashes.

[2]:
# We split the ImageName using string and then unpack that into three new columns in the dataframe
meas[["Metadata_Time", "Metadata_SetMedia", "Metadata_Replicate"]] = meas[
    "Metadata_ImageName"
].str.split("_", expand=True)

# Lets now split the {PlateSet}{Media} into different columns too
meas[["Metadata_PlateSet", "Metadata_Media"]] = meas["Metadata_SetMedia"].str.split(
    "", expand=True
)[[1, 2]]

# Time is a number, but because we split it from its name we need to convert it to the proper datatype
meas["Metadata_Time"] = meas["Metadata_Time"].astype(int)
meas["Metadata_PlateSet"] = meas["Metadata_PlateSet"].astype(int)

# Lets see our newest changes
meas[
    [
        "Metadata_ImageName",
        "Metadata_Time",
        "Metadata_PlateSet",
        "Metadata_Media",
        "Metadata_Replicate",
    ]
].head()
[2]:
Metadata_ImageName Metadata_Time Metadata_PlateSet Metadata_Media Metadata_Replicate
0 2_1S_5 2 1 S 5
1 2_1S_5 2 1 S 5
2 2_1S_5 2 1 S 5
3 2_1S_5 2 1 S 5
4 2_1S_5 2 1 S 5

Adding in metadata like strain name#

While you can add the metadata per-image, its necessary to add it in post-extraction for things like colony strain names. To add in metadata like the strain name for each column to your data, we’ll use table joining.

[3]:
# Lets generate a synthetic table of strain names
import pandas as pd

# It's important to use the same names as the measurement dataframe
synth_names = pd.DataFrame(
    data={
        "Metadata_PlateSet": 1,
        "Grid_SectionNum": range(96),
        "Metadata_StrainName": [f"SYNTH-{idx:02d}" for idx in range(96)],
    }
)

synth_names
[3]:
Metadata_PlateSet Grid_SectionNum Metadata_StrainName
0 1 0 SYNTH-00
1 1 1 SYNTH-01
2 1 2 SYNTH-02
3 1 3 SYNTH-03
4 1 4 SYNTH-04
... ... ... ...
91 1 91 SYNTH-91
92 1 92 SYNTH-92
93 1 93 SYNTH-93
94 1 94 SYNTH-94
95 1 95 SYNTH-95

96 rows × 3 columns

[4]:
meas = meas.merge(synth_names, on=["Metadata_PlateSet", "Grid_SectionNum"], how="inner")
meas[
    ["Metadata_PlateSet", "Grid_SectionNum", "Metadata_StrainName", "Shape_Area"]
].head()
[4]:
Metadata_FileSuffix Metadata_BitDepth Metadata_ImageType Metadata_ImageName ObjectLabel Bbox_CenterRR Bbox_CenterCC Bbox_MinRR Bbox_MinCC Bbox_MaxRR ... ColorHSV_BrightnessStdDev ColorHSV_BrightnessCoeffVar ColorLab_ChromaEstimatedMean ColorLab_ChromaEstimatedMedian Metadata_Time Metadata_SetMedia Metadata_Replicate Metadata_PlateSet Metadata_Media Metadata_StrainName
0 .jpg 8 GridImage 2_1S_5 1 143.066954 638.928742 116 616 171 ... 0.020981 0.068122 2.324505 2.412023 2 1S 5 1 S SYNTH-03
1 .jpg 8 GridImage 2_1S_5 2 150.888348 489.121803 122 464 182 ... 0.049686 0.131528 2.861581 3.057672 2 1S 5 1 S SYNTH-02
2 .jpg 8 GridImage 2_1S_5 3 152.518557 962.014356 122 934 185 ... 0.046833 0.115846 3.028543 3.214427 2 1S 5 1 S SYNTH-05
3 .jpg 8 GridImage 2_1S_5 4 152.439723 1610.904644 125 1589 182 ... 0.045290 0.116357 4.114455 4.538226 2 1S 5 1 S SYNTH-09
4 .jpg 8 GridImage 2_1S_5 5 156.673430 802.163285 129 779 185 ... 0.043493 0.113999 2.941698 3.139710 2 1S 5 1 S SYNTH-04

5 rows × 178 columns

[6]:
# Lets isolate the measurements we care about
meas = meas[
    [
        "Metadata_ImageName",
        "Grid_SectionNum",
        "Metadata_Media",
        "Metadata_PlateSet",
        "Metadata_Replicate",
        "Metadata_StrainName",
        "Metadata_Time",
        "Shape_Area",
    ]
].sort_values(by=["Metadata_StrainName", "Metadata_Time"])

meas["Metadata_Time"] = meas["Metadata_Time"].astype(
    int
)  # Ensures that the time column is numeric
meas.head(10)
[6]:
Metadata_ImageName Grid_SectionNum Metadata_Media Metadata_PlateSet Metadata_Replicate Metadata_StrainName Metadata_Time Shape_Area
8 2_1S_5 0 S 1 5 SYNTH-00 2 2295.0
89 2_1S_6 0 S 1 6 SYNTH-00 2 2420.0
171 2_1S_7 0 S 1 7 SYNTH-00 2 2751.0
315 3_1S_5 0 S 1 5 SYNTH-00 3 2676.0
401 3_1S_6 0 S 1 6 SYNTH-00 3 2877.0
485 3_1S_7 0 S 1 7 SYNTH-00 3 3206.0
568 3_1S_8 0 S 1 8 SYNTH-00 3 2840.0
651 4_1S_5 0 S 1 5 SYNTH-00 4 2948.0
740 4_1S_6 0 S 1 6 SYNTH-00 4 3413.0
826 4_1S_7 0 S 1 7 SYNTH-00 4 3535.0

Filtering and correction#

In automated image processing, depending on the scene, objects can be:

  • oversegmented (some of the background is detected along with the object)

  • undersegmented (some of the object is not detected with the object)

  • noise that was detected as an object

Other factors that may mess with our growth curves include stochastic biological noise or experimental error as well. To refine the data to reflect the true growth kinetics better, we can apply tukey’s outlier and our edge correction method on the dataset.

[7]:
from phenotypic.analysis import TukeyOutlierRemover

tukey_outlier_analysis = TukeyOutlierRemover(
    on="Shape_Area", groupby=["Metadata_StrainName", "Metadata_Time"]
)

tukey_results = tukey_outlier_analysis.analyze(meas)

tukey_outlier_analysis.show(collapsed=True)
Warning: Displaying only first 20 of 451 groups
[7]:
(<Figure size 1000x1200 with 1 Axes>,
 <Axes: xlabel='Shape_Area', ylabel='Group'>)
../../../_images/user_guide_tutorial_notebooks_GrowthCurves_9_2.png

Since there’s a large amount of groups and they can’t all be displayed, only the first 20 are displayed, but this can be changed.

[8]:
from phenotypic.analysis import EdgeCorrector

edge_correcter = EdgeCorrector(
    on="Shape_Area",
    groupby=["Metadata_PlateSet", "Metadata_Media", "Metadata_Replicate"],
    time_label="Metadata_Time",
)

edge_corrected = edge_correcter.analyze(tukey_results)
edge_correcter.show(collapsed=True)
[8]:
(<Figure size 1000x600 with 1 Axes>,
 <Axes: title={'center': 'Edge Correction (Top N=3, p=0.05)'}, xlabel='Shape_Area'>)
../../../_images/user_guide_tutorial_notebooks_GrowthCurves_11_1.png

Fit model#

New parameters: > lam: The regularization factor for specific growth rate > alpha: The penalty term for the relative deviation from the max of the carrying capacity

[9]:
log_model = pht.analysis.LogGrowthModel(
    on="Shape_Area",
    groupby=["Metadata_StrainName"],
    agg_func="mean",
    time_label="Metadata_Time",
    lam=0.5,
    alpha=2.0,
    n_jobs=-1,
)
model_fit = log_model.analyze(meas)
model_fit.head()
[9]:
Metadata_StrainName LogGrowthModel_r LogGrowthModel_K LogGrowthModel_N0 LogGrowthModel_µmax LogGrowthModel_Kmax LogGrowthModel_NumSamples LogGrowthModel_OptimizerLoss LogGrowthModel_OptimizerStatus LogGrowthModel_MAE LogGrowthModel_MSE LogGrowthModel_RMSE
0 SYNTH-00 0.882962 3803.000000 731.050357 839.475765 3803.00 5 359014.371364 2 111.664757 19690.330182 140.322237
1 SYNTH-01 1.062408 3262.010706 363.235125 866.396198 3414.25 5 348964.083825 2 212.846392 51327.417300 226.555550
2 SYNTH-02 0.861747 3321.531212 663.466829 715.579876 3341.25 5 285067.340599 2 100.423547 18802.656973 137.122781
3 SYNTH-03 0.903318 2963.500000 347.147868 669.245998 2963.50 5 155206.330925 2 63.041344 5242.347575 72.404058
4 SYNTH-04 0.885146 3032.000000 593.850922 670.940923 3032.00 5 236673.126701 2 93.985535 14387.186777 119.946600

Parameters:

  • on: What column to measure on

  • groupby: What columns to form groups on. See also: pandas.DataFrame.groupby()

  • agg_func: How to aggregate replicate time points. Default is “mean”. For other options see: pandas.DataFrame.agg()

  • time_label: What column is to use for the time measurement. This should be added by the user either as metadata or externally.

Display results#

[10]:
log_model.show()
[10]:
(<Figure size 600x400 with 1 Axes>, <Axes: title={'center': 'mean±SE'}>)
../../../_images/user_guide_tutorial_notebooks_GrowthCurves_16_1.png
[11]:
log_model.show(criteria={"Metadata_StrainName": ["SYNTH-00"]})
[11]:
(<Figure size 600x400 with 1 Axes>, <Axes: title={'center': 'mean±SE'}>)
../../../_images/user_guide_tutorial_notebooks_GrowthCurves_17_1.png

If we want to display only one of the strains, we can use the criteria like above. You can specify which column you want to filter by as a key and provide the