{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 6: Batch Processing\n", "\n", "When you have dozens or hundreds of plates to process, running Python code\n", "one image at a time is not practical. PhenoTypic's **command-line interface**\n", "(CLI) lets you apply a saved pipeline to an entire directory of plate images\n", "with built-in parallelism, checkpointing, and resume.\n", "\n", "**What you will learn:**\n", "\n", "1. Save a pipeline to JSON for CLI use\n", "2. Run batch processing from the command line\n", "3. Resume interrupted jobs\n", "4. Control parallelism and other options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Build and Save a Pipeline\n", "\n", "The CLI needs a saved pipeline JSON file. Let's create one with the\n", "enhance-detect-measure workflow we have been building throughout\n", "these tutorials." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import phenotypic as pht\n", "from phenotypic.enhance import GaussianBlur, CLAHE\n", "from phenotypic.detect import OtsuDetector\n", "from phenotypic.measure import MeasureSize, MeasureShape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline = pht.ImagePipeline(\n", " ops=[GaussianBlur(sigma=2.0), CLAHE(clip_limit=0.01), OtsuDetector()],\n", " meas=[MeasureSize(), MeasureShape()],\n", " name=\"batch_pipeline\",\n", ")\n", "pipeline.to_json(\"batch_pipeline.json\")\n", "print(\"Pipeline saved!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Run Batch Processing\n", "\n", "The CLI entry point is `python -m phenotypic`. 
At minimum it needs three\n", "arguments: the pipeline JSON file, the input directory containing your\n", "plate images, and the output directory where results will be saved.\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/\n", "```\n", "\n", "The CLI will:\n", "1. Discover all images in the input directory\n", "2. Apply the pipeline to each one\n", "3. Save processed images, overlays, and measurement CSVs to the output directory\n", "4. Checkpoint progress periodically" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Specify Image Type and Grid Dimensions\n", "\n", "For grid plates, tell the CLI to use `GridImage` and specify the grid layout:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --image-type GridImage \\\n", " --nrows 8 --ncols 12 \\\n", " --ext .png\n", "```\n", "\n", "- **`--image-type GridImage`** — load images as GridImage (default is Image)\n", "- **`--nrows 8 --ncols 12`** — 96-well grid layout (8 rows x 12 columns)\n", "- **`--ext .png`** — only process files with this extension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Parallelism\n", "\n", "Process multiple plates at once with `--n-jobs`:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --n-jobs 4\n", "```\n", "\n", "This processes 4 plates at a time. If `--n-jobs` is omitted, the CLI uses all\n", "available CPU cores." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Resume Interrupted Jobs\n", "\n", "If a batch job is interrupted (crash, timeout, Ctrl+C), you can resume\n", "from where it left off:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --resume\n", "```\n", "\n", "The CLI reads the checkpoint file in the output directory and skips plates\n", "that were already processed. 
Only unfinished plates are processed.\n", "\n", "To retry plates that previously *failed* (which a plain `--resume` would\n", "otherwise skip):\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --resume --retry-failures\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Useful Flags\n", "\n", "| Flag | Purpose |\n", "|------|---------|\n", "| `--dry-run` | Validate pipeline and list images without processing |\n", "| `--sample 5` | Process only 5 random images (great for testing) |\n", "| `--checkpoint-interval 50` | Save state every 50 images |\n", "| `--force-local` | Run locally even if SLURM is available |\n", "| `--restart` | Clear all state and start fresh |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.remove(\"batch_pipeline.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "You now know how to scale PhenoTypic from a single plate to hundreds:\n", "\n", "- **Save your pipeline** to JSON with `.to_json()`\n", "- **Run `python -m phenotypic`** with your pipeline, input directory, and output directory\n", "- **`--n-jobs`** controls parallelism\n", "- **`--resume`** picks up where you left off after interruptions\n", "- **`--dry-run`** and **`--sample`** let you test before committing to a full run\n", "\n", "**Next up:** [Tutorial 7: Measuring and Exporting](07_measuring_and_exporting.ipynb) —\n", "extract size, shape, and intensity measurements from detected colonies." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }