{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 7: Measuring and Exporting\n",
    "\n",
    "Detection tells you *where* colonies are. Measurement tells you *what they\n",
    "are* — how big, how round, how bright. In this tutorial you will add\n",
    "measurements to a pipeline, extract a DataFrame of colony features, and\n",
    "export the results for downstream analysis.\n",
    "\n",
    "**What you will learn:**\n",
    "\n",
    "1. Add measurement operations to a pipeline\n",
    "2. Use `pipeline.apply_and_measure()` to get a DataFrame\n",
    "3. Understand the output columns\n",
    "4. Export to CSV and Parquet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import phenotypic as pht\n",
    "from phenotypic.data import load_yeast_plate\n",
    "from phenotypic.enhance import GaussianBlur, CLAHE\n",
    "from phenotypic.detect import OtsuDetector\n",
    "from phenotypic.measure import MeasureSize, MeasureShape, MeasureIntensity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build a Pipeline with Measurements\n",
    "\n",
    "The `meas` parameter accepts a list of measurement operations. Each one\n",
    "extracts a different set of features from the detected colonies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plate = load_yeast_plate()\n",
    "\n",
    "pipeline = pht.ImagePipeline(\n",
    "    ops=[GaussianBlur(sigma=2.0), CLAHE(clip_limit=0.01), OtsuDetector()],\n",
    "    meas=[MeasureSize(), MeasureShape(), MeasureIntensity()],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Apply and Measure\n",
    "\n",
    "`.apply_and_measure()` runs the full pipeline (enhance → detect → measure)\n",
    "and returns a [pandas DataFrame](https://pandas.pydata.org/docs/user_guide/dsintro.html)\n",
    "with one row per detected colony."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pipeline.apply_and_measure(plate)\n",
    "print(f\"Measured {len(df)} colonies ({df.shape[1]} columns)\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explore the Columns\n",
    "\n",
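    "Each measurement operation contributes its own set of columns, and each\n",
    "column name starts with a prefix that identifies the operation (`Size_`,\n",
    "`Shape_`, `Intensity_`). List them all:"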
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"All columns:\")\n",
    "for col in df.columns:\n",
    "    print(f\"  {col}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Here are the highlights from each measurement:\n\n**MeasureSize:**\n- `Size_Area` — colony size in pixels\n- `Size_IntegratedIntensity` — sum of grayscale pixel values\n\n**MeasureShape:**\n- `Shape_Circularity` — how round the colony is (1.0 = perfect circle)\n- `Shape_Solidity` — ratio of colony area to convex hull area\n- `Shape_Eccentricity` — elongation (0 = circular, approaching 1 = elongated)\n- `Shape_MajorAxisLength` / `Shape_MinorAxisLength` — fitted ellipse axes\n\n**MeasureIntensity:**\n- `Intensity_MeanIntensity` / `Intensity_MedianIntensity` — average colony brightness\n- `Intensity_StandardDeviationIntensity` — variation within the colony\n- `Intensity_MinimumIntensity` / `Intensity_MaximumIntensity` — intensity extremes"
  },
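  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because each column name starts with a prefix naming the operation that produced\n",
    "it, you can group the columns by measurement. The cell below is a minimal sketch\n",
    "that assumes the prefix is everything before the first underscore, as in the\n",
    "names listed above. `columns_by_prefix` is a hypothetical helper written for this\n",
    "tutorial, not part of PhenoTypic. It runs on a hard-coded list of documented\n",
    "column names; in this notebook you can pass `df.columns` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "from collections.abc import Iterable\n",
    "\n",
    "\n",
    "def columns_by_prefix(columns: Iterable[str]) -> dict[str, list[str]]:\n",
    "    \"\"\"Group column names by the text before the first underscore.\n",
    "\n",
    "    Hypothetical helper, not part of PhenoTypic. Assumes names look like\n",
    "    \"<Operation>_<Feature>\", e.g. \"Size_Area\".\n",
    "    \"\"\"\n",
    "    groups: dict[str, list[str]] = defaultdict(list)\n",
    "    for col in columns:\n",
    "        name = str(col)\n",
    "        # Names without an underscore go into their own bucket.\n",
    "        prefix = name.split(\"_\", 1)[0] if \"_\" in name else \"(no prefix)\"\n",
    "        groups[prefix].append(name)\n",
    "    return dict(groups)\n",
    "\n",
    "\n",
    "# Column names taken from the list above; pass df.columns to use your results.\n",
    "documented = [\n",
    "    \"Size_Area\",\n",
    "    \"Size_IntegratedIntensity\",\n",
    "    \"Shape_Circularity\",\n",
    "    \"Shape_Solidity\",\n",
    "    \"Intensity_MeanIntensity\",\n",
    "]\n",
    "\n",
    "for prefix, cols in columns_by_prefix(documented).items():\n",
    "    print(f\"{prefix}: {cols}\")"
   ]
  },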
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick Statistics\n",
    "\n",
    "Since the result is a standard pandas DataFrame, you can use all the usual\n",
    "pandas methods to explore it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "df[[\"Size_Area\", \"Shape_Circularity\", \"Intensity_MeanIntensity\"]].describe()"
  },
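  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Boolean masks let you filter colonies on any combination of columns. The sketch\n",
    "below keeps colonies that are both large and round. The thresholds\n",
    "(`min_area=300`, `min_circularity=0.85`) and the three-row `example` frame are\n",
    "made-up values for illustration, not recommendations, and `filter_colonies` is a\n",
    "hypothetical helper. In this notebook, call `filter_colonies(df)` with thresholds\n",
    "that suit your plates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def filter_colonies(\n",
    "    frame: pd.DataFrame, min_area: float = 300, min_circularity: float = 0.85\n",
    ") -> pd.DataFrame:\n",
    "    \"\"\"Keep rows whose area and circularity both meet the thresholds.\"\"\"\n",
    "    big_enough = frame[\"Size_Area\"] >= min_area\n",
    "    round_enough = frame[\"Shape_Circularity\"] >= min_circularity\n",
    "    return frame[big_enough & round_enough]\n",
    "\n",
    "\n",
    "# Made-up values, only so this cell runs on its own.\n",
    "example = pd.DataFrame(\n",
    "    {\n",
    "        \"Size_Area\": [412, 120, 501],\n",
    "        \"Shape_Circularity\": [0.91, 0.95, 0.62],\n",
    "    }\n",
    ")\n",
    "\n",
    "kept = filter_colonies(example)\n",
    "print(f\"Kept {len(kept)} of {len(example)} colonies\")"
   ]
  },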
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export to CSV\n",
    "\n",
    "For sharing with collaborators or importing into spreadsheet software,\n",
    "export to CSV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv(\"colony_measurements.csv\")\n",
    "print(\"Saved to colony_measurements.csv\")"
   ]
  },
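  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default `to_csv()` also writes the DataFrame index as an unnamed first column,\n",
    "which `pd.read_csv()` brings back as a column called `Unnamed: 0`. If `df.index`\n",
    "is only a plain 0, 1, 2, ... row counter, you may prefer `to_csv(path, index=False)`;\n",
    "if it carries information you want to keep, leave it in and read the file with\n",
    "`pd.read_csv(path, index_col=0)`. The sketch below shows all three cases on a small\n",
    "made-up frame written to a temporary directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up values, only so this cell runs on its own.\n",
    "example = pd.DataFrame({\"Size_Area\": [412, 388], \"Shape_Circularity\": [0.91, 0.87]})\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    path = Path(tmp) / \"example.csv\"\n",
    "\n",
    "    # Default: the index is written as an unnamed first column.\n",
    "    example.to_csv(path)\n",
    "    with_index = pd.read_csv(path)\n",
    "    restored = pd.read_csv(path, index_col=0)\n",
    "\n",
    "    # index=False writes only the data columns.\n",
    "    example.to_csv(path, index=False)\n",
    "    without_index = pd.read_csv(path)\n",
    "\n",
    "print(list(with_index.columns))  # ['Unnamed: 0', 'Size_Area', 'Shape_Circularity']\n",
    "print(list(restored.columns))  # ['Size_Area', 'Shape_Circularity']\n",
    "print(list(without_index.columns))  # ['Size_Area', 'Shape_Circularity']"
   ]
  },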
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export to Parquet\n",
    "\n",
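    "For large datasets, [Parquet](https://parquet.apache.org/) is usually the\n",
    "better choice: it is a compressed, columnar format that stores column types\n",
    "alongside the data and typically loads faster than CSV. Writing Parquet from\n",
    "pandas requires an optional engine, either `pyarrow` or `fastparquet`."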
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_parquet(\"colony_measurements.parquet\")\n",
    "print(\"Saved to colony_measurements.parquet\")"
   ]
  },
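  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A categorical column is one concrete example of a type that CSV loses and Parquet\n",
    "keeps: CSV stores plain text, whereas Parquet stores a schema next to the data.\n",
    "The sketch below round-trips a small made-up frame through in-memory buffers.\n",
    "The `Strain` column is a hypothetical metadata column added only for this\n",
    "illustration and is not necessarily part of PhenoTypic's output. The Parquet half\n",
    "runs only when `pyarrow` is installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import importlib.util\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up values; \"Strain\" is a hypothetical categorical metadata column.\n",
    "example = pd.DataFrame(\n",
    "    {\n",
    "        \"Size_Area\": [412, 388, 501],\n",
    "        \"Strain\": pd.Categorical([\"WT\", \"mutA\", \"WT\"]),\n",
    "    }\n",
    ")\n",
    "\n",
    "# CSV round trip: to_csv() without a path returns the text as a string.\n",
    "csv_back = pd.read_csv(io.StringIO(example.to_csv(index=False)))\n",
    "print(\"CSV:\", csv_back[\"Strain\"].dtype)\n",
    "\n",
    "# Parquet round trip through a binary buffer (skipped if pyarrow is missing).\n",
    "parquet_back = None\n",
    "if importlib.util.find_spec(\"pyarrow\") is not None:\n",
    "    buffer = io.BytesIO()\n",
    "    example.to_parquet(buffer, engine=\"pyarrow\", index=False)\n",
    "    buffer.seek(0)\n",
    "    parquet_back = pd.read_parquet(buffer, engine=\"pyarrow\")\n",
    "    print(\"Parquet:\", parquet_back[\"Strain\"].dtype)"
   ]
  },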
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean Up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove(\"colony_measurements.csv\")\n",
    "os.remove(\"colony_measurements.parquet\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "You have extracted colony features and exported them for analysis:\n",
    "\n",
    "- **`meas=[MeasureSize(), MeasureShape(), MeasureIntensity()]`** — add measurements to a pipeline\n",
    "- **`pipeline.apply_and_measure(plate)`** — run the full pipeline and get a DataFrame\n",
    "- **`.to_csv()`** / **`.to_parquet()`** — export for downstream tools\n",
    "\n",
    "The result is a standard pandas DataFrame, so you can filter, group, plot,\n",
    "and analyze it with any tool in the Python ecosystem.\n",
    "\n",
    "**Next up:** [Tutorial 8: Using Prefab Pipelines](08_using_prefab_pipelines.ipynb) —\n",
    "discover PhenoTypic's pre-built pipelines for common organisms and plate types."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}