{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 7: Measuring and Exporting\n", "\n", "Detection tells you *where* colonies are. Measurement tells you *what they\n", "are* — how big, how round, how bright. In this tutorial you will add\n", "measurements to a pipeline, extract a DataFrame of colony features, and\n", "export the results for downstream analysis.\n", "\n", "**What you will learn:**\n", "\n", "1. Add measurement operations to a pipeline\n", "2. Use `pipeline.apply_and_measure()` to get a DataFrame\n", "3. Understand the output columns\n", "4. Export to CSV and Parquet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import phenotypic as pht\n", "from phenotypic.data import load_yeast_plate\n", "from phenotypic.enhance import GaussianBlur, CLAHE\n", "from phenotypic.detect import OtsuDetector\n", "from phenotypic.measure import MeasureSize, MeasureShape, MeasureIntensity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a Pipeline with Measurements\n", "\n", "The `meas` parameter accepts a list of measurement operations. Each one\n", "extracts a different set of features from the detected colonies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plate = load_yeast_plate()\n", "\n", "pipeline = pht.ImagePipeline(\n", " ops=[GaussianBlur(sigma=2.0), CLAHE(clip_limit=0.01), OtsuDetector()],\n", " meas=[MeasureSize(), MeasureShape(), MeasureIntensity()],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply and Measure\n", "\n", "`.apply_and_measure()` runs the full pipeline (enhance → detect → measure)\n", "and returns a [pandas DataFrame](https://pandas.pydata.org/docs/user_guide/dsintro.html)\n", "with one row per detected colony." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pipeline.apply_and_measure(plate)\n", "print(f\"Measured {len(df)} colonies across {df.shape[1]} features\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore the Columns\n", "\n", "Each measurement operation contributes its own set of columns. Let's see\n", "what we got." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"All columns:\")\n", "for col in df.columns:\n", "    print(f\"  {col}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": "Here are the highlights from each measurement:\n\n**MeasureSize:**\n- `Size_Area` — colony area in pixels\n- `Size_IntegratedIntensity` — sum of grayscale pixel values\n\n**MeasureShape:**\n- `Shape_Circularity` — how round the colony is (1.0 = perfect circle)\n- `Shape_Solidity` — ratio of colony area to convex hull area\n- `Shape_Eccentricity` — elongation (0 = circular, approaching 1 = elongated)\n- `Shape_MajorAxisLength` / `Shape_MinorAxisLength` — fitted ellipse axes\n\n**MeasureIntensity:**\n- `Intensity_MeanIntensity` / `Intensity_MedianIntensity` — mean and median colony brightness\n- `Intensity_StandardDeviationIntensity` — variation within the colony\n- `Intensity_MinimumIntensity` / `Intensity_MaximumIntensity` — intensity extremes" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Statistics\n", "\n", "Since the result is a standard pandas DataFrame, you can use all the usual\n", "pandas methods to explore it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "df[[\"Size_Area\", \"Shape_Circularity\", \"Intensity_MeanIntensity\"]].describe()" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export to CSV\n", "\n", "For sharing with collaborators or importing into spreadsheet software,\n", "export to CSV." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"colony_measurements.csv\")\n", "print(\"Saved to colony_measurements.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export to Parquet\n", "\n", "For large datasets, [Parquet](https://parquet.apache.org/) is usually more\n", "efficient: files are compressed, column types are preserved, and loading is\n", "typically faster than CSV. `DataFrame.to_parquet()` needs a Parquet engine\n", "such as `pyarrow` or `fastparquet` installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.to_parquet(\"colony_measurements.parquet\")\n", "print(\"Saved to colony_measurements.parquet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.remove(\"colony_measurements.csv\")\n", "os.remove(\"colony_measurements.parquet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "You have extracted colony features and exported them for analysis:\n", "\n", "- **`meas=[MeasureSize(), MeasureShape(), MeasureIntensity()]`** — add measurements to a pipeline\n", "- **`pipeline.apply_and_measure(plate)`** — run the full pipeline and get a DataFrame\n", "- **`.to_csv()`** / **`.to_parquet()`** — export for downstream tools\n", "\n", "The result is a standard pandas DataFrame, so you can filter, group, plot,\n", "and analyze it with any tool in the Python ecosystem.\n", "\n", "**Next up:** [Tutorial 8: Using Prefab Pipelines](08_using_prefab_pipelines.ipynb) —\n", "discover PhenoTypic's pre-built pipelines for common organisms and plate types." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }